* [PATCH v5 00/60] target/riscv: support vector extension v0.7.1
@ 2020-03-12 14:58 ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

This patchset implements the vector extension for RISC-V on QEMU.

You can also find the patchset and all *test cases* in
my repo (https://github.com/romanheros/qemu.git, branch: vector-upstream-v3).
All the test cases are in the directory qemu/tests/riscv/vector/. They are
riscv64 Linux user-mode programs.

You can test the patchset with the script qemu/tests/riscv/vector/runcase.sh.

Features:
  * support the riscv-v-spec-0.7.1 specification (https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1/).
  * support the basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * Zvediv is not supported, as it is still changing.
  * SLEN always equals VLEN.
  * supported element widths: 8-bit, 16-bit, 32-bit, and 64-bit.

Changelog:
v5
  * fix a bug in TB flags.
v4
  * no change.
v3
  * move check code from execution time to translation time.
  * use a continuous memory block for the vector registers.
  * vector registers as direct fields in CPURISCVState.
  * support configuring VLEN from the QEMU command line.
  * support configuring ELEN from the QEMU command line.
  * support configuring the vector specification version from the QEMU command line.
  * probe pages before the real load or store access.
  * use probe_page_check for no-fault operations in Linux user mode.
  * generate an atomic exit exception when in a parallel environment.
  * fix many concrete bugs.

v2
  * use float16_compare{_quiet}.
  * only use GETPC() in the outermost helper.
  * add the ctx.ext_v property.


LIU Zhiwei (60):
  target/riscv: add vector extension field in CPURISCVState
  target/riscv: implementation-defined constant parameters
  target/riscv: support vector extension csr
  target/riscv: add vector configure instruction
  target/riscv: add vector stride load and store instructions
  target/riscv: add vector index load and store instructions
  target/riscv: add fault-only-first unit stride load
  target/riscv: add vector amo operations
  target/riscv: vector single-width integer add and subtract
  target/riscv: vector widening integer add and subtract
  target/riscv: vector integer add-with-carry / subtract-with-borrow
    instructions
  target/riscv: vector bitwise logical instructions
  target/riscv: vector single-width bit shift instructions
  target/riscv: vector narrowing integer right shift instructions
  target/riscv: vector integer comparison instructions
  target/riscv: vector integer min/max instructions
  target/riscv: vector single-width integer multiply instructions
  target/riscv: vector integer divide instructions
  target/riscv: vector widening integer multiply instructions
  target/riscv: vector single-width integer multiply-add instructions
  target/riscv: vector widening integer multiply-add instructions
  target/riscv: vector integer merge and move instructions
  target/riscv: vector single-width saturating add and subtract
  target/riscv: vector single-width averaging add and subtract
  target/riscv: vector single-width fractional multiply with rounding
    and saturation
  target/riscv: vector widening saturating scaled multiply-add
  target/riscv: vector single-width scaling shift instructions
  target/riscv: vector narrowing fixed-point clip instructions
  target/riscv: vector single-width floating-point add/subtract
    instructions
  target/riscv: vector widening floating-point add/subtract instructions
  target/riscv: vector single-width floating-point multiply/divide
    instructions
  target/riscv: vector widening floating-point multiply
  target/riscv: vector single-width floating-point fused multiply-add
    instructions
  target/riscv: vector widening floating-point fused multiply-add
    instructions
  target/riscv: vector floating-point square-root instruction
  target/riscv: vector floating-point min/max instructions
  target/riscv: vector floating-point sign-injection instructions
  target/riscv: vector floating-point compare instructions
  target/riscv: vector floating-point classify instructions
  target/riscv: vector floating-point merge instructions
  target/riscv: vector floating-point/integer type-convert instructions
  target/riscv: widening floating-point/integer type-convert
    instructions
  target/riscv: narrowing floating-point/integer type-convert
    instructions
  target/riscv: vector single-width integer reduction instructions
  target/riscv: vector widening integer reduction instructions
  target/riscv: vector single-width floating-point reduction
    instructions
  target/riscv: vector widening floating-point reduction instructions
  target/riscv: vector mask-register logical instructions
  target/riscv: vector mask population count vmpopc
  target/riscv: vmfirst find-first-set mask bit
  target/riscv: set-X-first mask bit
  target/riscv: vector iota instruction
  target/riscv: vector element index instruction
  target/riscv: integer extract instruction
  target/riscv: integer scalar move instruction
  target/riscv: floating-point scalar move instructions
  target/riscv: vector slide instructions
  target/riscv: vector register gather instruction
  target/riscv: vector compress instruction
  target/riscv: configure and turn on vector extension from command line

 target/riscv/Makefile.objs              |    2 +-
 target/riscv/cpu.c                      |   49 +
 target/riscv/cpu.h                      |   89 +-
 target/riscv/cpu_bits.h                 |   15 +
 target/riscv/csr.c                      |   75 +-
 target/riscv/helper.h                   | 1075 +++++
 target/riscv/insn32-64.decode           |   11 +
 target/riscv/insn32.decode              |  366 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 2386 ++++++++++++
 target/riscv/translate.c                |   24 +-
 target/riscv/vector_helper.c            | 4745 +++++++++++++++++++++++
 11 files changed, 8824 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

-- 
2.23.0



* [PATCH v5 01/60] target/riscv: add vector extension field in CPURISCVState
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang,
	Alistair Francis, LIU Zhiwei

The 32 vector registers will be viewed as a continuous memory block. This
avoids the conversion between an element index and a (regno, offset) pair,
so elements can be accessed directly by their offset from the base address
of the first vector register.
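
As an illustration (not part of the patch), element addressing into the flat
register array needs only offset arithmetic; the helper below is a sketch,
and vlenb/esz are assumed inputs rather than names taken from this patch:

    /* Sketch only: locate element "idx" of vector register "regno" inside
     * the flat vreg[] array added by this patch.  vlenb is the register
     * width in bytes and esz the element size in bytes; no (regno, offset)
     * conversion is needed. */
    static inline void *vreg_elem_ptr(uint64_t *vreg, uint32_t vlenb,
                                      uint32_t regno, uint32_t idx,
                                      uint32_t esz)
    {
        uint8_t *base = (uint8_t *)vreg;
        return base + (size_t)regno * vlenb + (size_t)idx * esz;
    }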

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/cpu.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3dcdf92227..0c1f7bdd8b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -94,9 +95,20 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 512
+
 struct CPURISCVState {
     target_ulong gpr[32];
     uint64_t fpr[32]; /* assume both F and D extensions */
+
+    /* vector coprocessor state. */
+    uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
+    target_ulong vxrm;
+    target_ulong vxsat;
+    target_ulong vl;
+    target_ulong vstart;
+    target_ulong vtype;
+
     target_ulong pc;
     target_ulong load_res;
     target_ulong load_val;
-- 
2.23.0



* [PATCH v5 02/60] target/riscv: implementation-defined constant parameters
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang,
	Alistair Francis, LIU Zhiwei

vlen is the vector register length in bits.
elen is the max element size in bits.
vext_spec is the vector specification version; the default value is v0.7.1.
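
For a concrete feel of these parameters, one possible configuration (the
values are illustrative, not defaults mandated by this patch):

    /* Example only: derived sizes for one possible configuration. */
    uint16_t vlen = 128;               /* vector register width in bits       */
    uint16_t elen = 64;                /* widest supported element, in bits   */
    unsigned vlenb = vlen / 8;         /* 16 bytes per vector register        */
    unsigned min_elems = vlen / elen;  /* 2 elements per register at SEW=ELEN */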

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/cpu.c | 7 +++++++
 target/riscv/cpu.h | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c0b7023100..6e4135583d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -106,6 +106,11 @@ static void set_priv_version(CPURISCVState *env, int priv_ver)
     env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+    env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
     env->features |= (1ULL << feature);
@@ -364,6 +369,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     CPURISCVState *env = &cpu->env;
     RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
     int priv_version = PRIV_VERSION_1_11_0;
+    int vext_version = VEXT_VERSION_0_07_1;
     target_ulong target_misa = 0;
     Error *local_err = NULL;
 
@@ -389,6 +395,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     }
 
     set_priv_version(env, priv_version);
+    set_vext_version(env, vext_version);
     set_resetvec(env, DEFAULT_RSTVEC);
 
     if (cpu->cfg.mmu) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0c1f7bdd8b..603715f849 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -84,6 +84,8 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
+#define VEXT_VERSION_0_07_1 0x00000701
+
 #define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
@@ -119,6 +121,7 @@ struct CPURISCVState {
     target_ulong guest_phys_fault_addr;
 
     target_ulong priv_ver;
+    target_ulong vext_ver;
     target_ulong misa;
     target_ulong misa_mask;
 
@@ -281,6 +284,8 @@ typedef struct RISCVCPU {
 
         char *priv_spec;
         char *user_spec;
+        uint16_t vlen;
+        uint16_t elen;
         bool mmu;
         bool pmp;
     } cfg;
-- 
2.23.0



* [PATCH v5 03/60] target/riscv: support vector extension csr
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.
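
For reference, the fcsr view that results from this patch (a sketch derived
from the FSR_VXRM/FSR_VXSAT shift constants added below; the variable names
are illustrative):

    /* fcsr layout with the vector fields folded in:
     *   [10:9] vxrm, [8] vxsat, [7:5] frm, [4:0] fflags */
    uint32_t fcsr = (vxrm << 9) | (vxsat << 8) | (frm << 5) | fflags;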

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu_bits.h | 15 +++++++++
 target/riscv/csr.c      | 75 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 7f64ee1174..8117e8b5a7 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA             (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC            (FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT      9
+#define FSR_VXRM            (0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT     8
+#define FSR_VXSAT           (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM             0x002
 #define CSR_FCSR            0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART          0x008
+#define CSR_VXSAT           0x009
+#define CSR_VXRM            0x00a
+#define CSR_VL              0xc20
+#define CSR_VTYPE           0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE           0xc00
 #define CSR_TIME            0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 11d184cd16..d71c49dfff 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
+    /* loose check condition for fcsr in vector extension */
+    if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+        return 0;
+    }
     if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
         return -1;
     }
@@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
     return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+    if (env->misa & RVV) {
+        return 0;
+    }
+    return -1;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -174,6 +186,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 #endif
     *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
         | (env->frm << FSR_RD_SHIFT);
+    if (vs(env, csrno) >= 0) {
+        *val |= (env->vxrm << FSR_VXRM_SHIFT)
+                | (env->vxsat << FSR_VXSAT_SHIFT);
+    }
     return 0;
 }
 
@@ -186,10 +202,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
     env->mstatus |= MSTATUS_FS;
 #endif
     env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+    if (vs(env, csrno) >= 0) {
+        env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+        env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
+    }
     riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
     return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+    *val = env->vtype;
+    return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+    *val = env->vl;
+    return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+    *val = env->vxrm;
+    return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+    env->vxrm = val;
+    return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+    *val = env->vxsat;
+    return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+    env->vxsat = val;
+    return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+    *val = env->vstart;
+    return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+    env->vstart = val;
+    return 0;
+}
+
 /* User Timers and Counters */
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
@@ -1269,7 +1337,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
     [CSR_FFLAGS] =              { fs,   read_fflags,      write_fflags      },
     [CSR_FRM] =                 { fs,   read_frm,         write_frm         },
     [CSR_FCSR] =                { fs,   read_fcsr,        write_fcsr        },
-
+    /* Vector CSRs */
+    [CSR_VSTART] =              { vs,   read_vstart,      write_vstart      },
+    [CSR_VXSAT] =               { vs,   read_vxsat,       write_vxsat       },
+    [CSR_VXRM] =                { vs,   read_vxrm,        write_vxrm        },
+    [CSR_VL] =                  { vs,   read_vl                             },
+    [CSR_VTYPE] =               { vs,   read_vtype                          },
     /* User Timers and Counters */
     [CSR_CYCLE] =               { ctr,  read_instret                        },
     [CSR_INSTRET] =             { ctr,  read_instret                        },
-- 
2.23.0



* [PATCH v5 04/60] target/riscv: add vector configure instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

vsetvl and vsetvli are the two configuration instructions for vl and vtype.
TB flags must be updated after a configuration instruction executes. The
(vill, lmul, sew) fields of vtype and a bit indicating (VSTART == 0 &&
VL == VLMAX) are placed within tb_flags.
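
A worked example of the resulting state (the numbers are assumed, not taken
from the patch, and follow the VLMAX simplification used below):

    /* Example only: VLEN = 128, SEW = 32 (vsew = 2), LMUL = 1 (vlmul = 0). */
    unsigned vlen  = 128, vsew = 2, vlmul = 0;
    unsigned vlmax = vlen >> (vsew + 3 - vlmul);   /* 128 >> 5 = 4            */
    unsigned avl   = 10;                           /* requested vector length */
    unsigned vl    = avl < vlmax ? avl : vlmax;    /* vl = 4, so VL == VLMAX  */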

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/Makefile.objs              |  2 +-
 target/riscv/cpu.h                      | 63 ++++++++++++++++++----
 target/riscv/helper.h                   |  2 +
 target/riscv/insn32.decode              |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +++++++++++++++++++++++++
 target/riscv/translate.c                | 17 +++++-
 target/riscv/vector_helper.c            | 53 +++++++++++++++++++
 7 files changed, 199 insertions(+), 12 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 603715f849..505d1a8515 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -99,6 +100,12 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, VLMUL, 0, 2)
+FIELD(VTYPE, VSEW, 2, 3)
+FIELD(VTYPE, VEDIV, 5, 2)
+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
     target_ulong gpr[32];
     uint64_t fpr[32]; /* assume both F and D extensions */
@@ -358,19 +365,62 @@ void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+    uint8_t sew, lmul;
+
+    sew = FIELD_EX64(vtype, VTYPE, VSEW);
+    lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
+    return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-                                        target_ulong *cs_base, uint32_t *flags)
+                                        target_ulong *cs_base, uint32_t *pflags)
 {
+    uint32_t flags = 0;
+
     *pc = env->pc;
     *cs_base = 0;
+
+    if (env->misa & RVV) {
+        uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+        bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+        flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+                    FIELD_EX64(env->vtype, VTYPE, VILL));
+        flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+                    FIELD_EX64(env->vtype, VTYPE, VSEW));
+        flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+                    FIELD_EX64(env->vtype, VTYPE, VLMUL));
+        flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+    } else {
+        flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+    }
+
 #ifdef CONFIG_USER_ONLY
-    *flags = TB_FLAGS_MSTATUS_FS;
+    flags |= TB_FLAGS_MSTATUS_FS;
 #else
-    *flags = cpu_mmu_index(env, 0);
+    flags |= cpu_mmu_index(env, 0);
     if (riscv_cpu_fp_enabled(env)) {
-        *flags |= env->mstatus & MSTATUS_FS;
+        flags |= env->mstatus & MSTATUS_FS;
     }
 #endif
+    *pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -411,9 +461,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
 #endif
+/* Vector functions */
+DEF_HELPER_3(vsetvl, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b883672e63..53340bdbc4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -62,6 +62,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
 @hfence_bvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -207,3 +208,7 @@ fcvt_w_d   1100001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_wu_d  1100001  00001 ..... ... ..... 1010011 @r2_rm
 fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
+
+# *** RV32V Extension ***
+vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
+vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
new file mode 100644
index 0000000000..da82c72bbf
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -0,0 +1,69 @@
+/*
+ * RISC-V translation routines for the RVV Standard Extension.
+ *
+ * Copyright (c) 2020 C-SKY Limited. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
+{
+    TCGv s1, s2, dst;
+    s2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
+    if (a->rs1 == 0) {
+        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
+        s1 = tcg_const_tl(RV_VLEN_MAX);
+    } else {
+        s1 = tcg_temp_new();
+        gen_get_gpr(s1, a->rs1);
+    }
+    gen_get_gpr(s2, a->rs2);
+    gen_helper_vsetvl(dst, cpu_env, s1, s2);
+    gen_set_gpr(a->rd, dst);
+    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
+    exit_tb(ctx);
+    ctx->base.is_jmp = DISAS_NORETURN;
+
+    tcg_temp_free(s1);
+    tcg_temp_free(s2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
+{
+    TCGv s1, s2, dst;
+    s2 = tcg_const_tl(a->zimm);
+    dst = tcg_temp_new();
+
+    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
+    if (a->rs1 == 0) {
+        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
+        s1 = tcg_const_tl(RV_VLEN_MAX);
+    } else {
+        s1 = tcg_temp_new();
+        gen_get_gpr(s1, a->rs1);
+    }
+    gen_helper_vsetvl(dst, cpu_env, s1, s2);
+    gen_set_gpr(a->rd, dst);
+    gen_goto_tb(ctx, 0, ctx->pc_succ_insn);
+    ctx->base.is_jmp = DISAS_NORETURN;
+
+    tcg_temp_free(s1);
+    tcg_temp_free(s2);
+    tcg_temp_free(dst);
+    return true;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 43bf7e39a6..af07ac4160 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -56,6 +56,12 @@ typedef struct DisasContext {
        to reset this known value.  */
     int frm;
     bool ext_ifencei;
+    /* vector extension */
+    bool vill;
+    uint8_t lmul;
+    uint8_t sew;
+    uint16_t vlen;
+    bool vl_eq_vlmax;
 } DisasContext;
 
 #ifdef TARGET_RISCV64
@@ -711,6 +717,7 @@ static bool gen_shift(DisasContext *ctx, arg_r *a,
 #include "insn_trans/trans_rva.inc.c"
 #include "insn_trans/trans_rvf.inc.c"
 #include "insn_trans/trans_rvd.inc.c"
+#include "insn_trans/trans_rvv.inc.c"
 #include "insn_trans/trans_privileged.inc.c"
 
 /* Include the auto-generated decoder for 16 bit insn */
@@ -745,10 +752,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     DisasContext *ctx = container_of(dcbase, DisasContext, base);
     CPURISCVState *env = cs->env_ptr;
     RISCVCPU *cpu = RISCV_CPU(cs);
+    uint32_t tb_flags = ctx->base.tb->flags;
 
     ctx->pc_succ_insn = ctx->base.pc_first;
-    ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK;
-    ctx->mstatus_fs = ctx->base.tb->flags & TB_FLAGS_MSTATUS_FS;
+    ctx->mem_idx = tb_flags & TB_FLAGS_MMU_MASK;
+    ctx->mstatus_fs = tb_flags & TB_FLAGS_MSTATUS_FS;
     ctx->priv_ver = env->priv_ver;
 #if !defined(CONFIG_USER_ONLY)
     if (riscv_has_ext(env, RVH)) {
@@ -772,6 +780,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->misa = env->misa;
     ctx->frm = -1;  /* unknown rounding mode */
     ctx->ext_ifencei = cpu->cfg.ext_ifencei;
+    ctx->vlen = cpu->cfg.vlen;
+    ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
+    ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
+    ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
+    ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
 }
 
 static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
new file mode 100644
index 0000000000..2afe716f2a
--- /dev/null
+++ b/target/riscv/vector_helper.c
@@ -0,0 +1,53 @@
+/*
+ * RISC-V Vector Extension Helpers for QEMU.
+ *
+ * Copyright (c) 2020 C-SKY Limited. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include <math.h>
+
+target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
+    target_ulong s2)
+{
+    int vlmax, vl;
+    RISCVCPU *cpu = env_archcpu(env);
+    uint16_t sew = 8 << FIELD_EX64(s2, VTYPE, VSEW);
+    uint8_t ediv = FIELD_EX64(s2, VTYPE, VEDIV);
+    bool vill = FIELD_EX64(s2, VTYPE, VILL);
+    target_ulong reserved = FIELD_EX64(s2, VTYPE, RESERVED);
+
+    if ((sew > cpu->cfg.elen) || vill || (ediv != 0) || (reserved != 0)) {
+        /* only set vill bit. */
+        env->vtype = FIELD_DP64(0, VTYPE, VILL, 1);
+        env->vl = 0;
+        env->vstart = 0;
+        return 0;
+    }
+
+    vlmax = vext_get_vlmax(cpu, s2);
+    if (s1 <= vlmax) {
+        vl = s1;
+    } else {
+        vl = vlmax;
+    }
+    env->vl = vl;
+    env->vtype = s2;
+    env->vstart = 0;
+    return vl;
+}
-- 
2.23.0



* [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address. They can be seen as a special
case of strided operations.
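
As a minimal illustrative sketch (not part of the patch), the address
pattern implemented by the helpers below can be written as two small C
functions; here i is the segment index, k the field index, msz the memory
element size in bytes and nf the number of fields per segment, matching
the variables used in vext_ldst_stride()/vext_ldst_us():

    #include <stdint.h>

    /* strided: segment i starts at base + i * stride */
    static uint64_t strided_addr(uint64_t base, uint64_t stride,
                                 uint32_t msz, uint32_t i, uint32_t k)
    {
        return base + stride * i + (uint64_t)k * msz;
    }

    /* unit-stride: the special case stride == nf * msz */
    static uint64_t unit_stride_addr(uint64_t base, uint32_t nf,
                                     uint32_t msz, uint32_t i, uint32_t k)
    {
        return base + (uint64_t)(i * nf + k) * msz;
    }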

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.h                      |   6 +
 target/riscv/helper.h                   | 105 ++++++
 target/riscv/insn32.decode              |  32 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
 target/riscv/translate.c                |   7 +
 target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
 6 files changed, 896 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 505d1a8515..b6ebb9b0eb 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
 typedef RISCVCPU ArchCPU;
 #include "exec/cpu-all.h"
 
+/* share data between vector helpers and decode code */
+FIELD(VDATA, MLEN, 0, 8)
+FIELD(VDATA, VM, 8, 1)
+FIELD(VDATA, LMUL, 9, 2)
+FIELD(VDATA, NF, 11, 4)
+
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
 FIELD(TB_FLAGS, SEW, 5, 3)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..87dfa90609 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 53340bdbc4..ef521152c5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -25,6 +25,7 @@
 %sh10    20:10
 %csr    20:12
 %rm     12:3
+%nf     29:3                     !function=ex_plus_1
 
 # immediates:
 %imm_i    20:s12
@@ -43,6 +44,8 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&r2nfvm    vm rd rs1 nf
+&rnfvm     vm rd rs1 rs2 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -62,6 +65,8 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
+@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
 
 # *** RV32V Extension ***
+
+# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
+vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
+vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
+vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
+vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
+vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
+
+vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
+vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
+vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
+vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
+
+# *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index da82c72bbf..d85f2aec68 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -15,6 +15,8 @@
  * You should have received a copy of the GNU General Public License along with
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
 
 static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
 {
@@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
     tcg_temp_free(dst);
     return true;
 }
+
+/* vector register offset from env */
+static uint32_t vreg_ofs(DisasContext *s, int reg)
+{
+    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
+}
+
+/* check functions */
+static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
+{
+    return !s->vill && ((s->misa & isa) == isa);
+}
+
+/*
+ * There are two rules checked here.
+ *
+ * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
+ *
+ * 2. For all widening instructions, the destination LMUL value must also be
+ *    a supported LMUL value. (Section 11.2)
+ */
+static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
+{
+    /*
+     * The destination vector register group results are arranged as if both
+     * SEW and LMUL were at twice their current settings. (Section 11.2).
+     */
+    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
+
+    return !((s->lmul == 0x3 && widen) || (reg % legal));
+}
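+
+/*
+ * For example (illustration only): with LMUL = 2 (s->lmul == 1), a
+ * non-widening operand must use an even register number, so v2 is legal
+ * while v3 is not; a widening destination at the same setting must be a
+ * multiple of 4.
+ */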
+
+/*
+ * There are two rules checked here.
+ *
+ * 1. The destination vector register group for a masked vector instruction can
+ *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
+ *
+ * 2. In widening instructions and some other instructions, like vslideup.vx,
+ *    there is no need to check whether LMUL=1.
+ */
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
+    bool force)
+{
+    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
+}
+
+/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
+static bool vext_check_nf(DisasContext *s, uint32_t nf)
+{
+    return (1 << s->lmul) * nf <= 8;
+}
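+
+/*
+ * For example (illustration only): with LMUL = 4 (s->lmul == 2), a segment
+ * load or store may use at most NFIELDS = 2, since 4 * 2 <= 8 while
+ * 4 * 3 > 8.
+ */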
+
+/* common translation macro */
+#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
+static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
+{                                                          \
+    if (CHECK(s, a)) {                                     \
+        return OP(s, a, SEQ);                              \
+    }                                                      \
+    return false;                                          \
+}
+
+/*
+ *** unit stride load and store
+ */
+typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
+        TCGv_env, TCGv_i32);
+
+static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+
+    /*
+     * As simd_desc supports at most 256 bytes, and in this implementation,
+     * the max vector group length is 2048 bytes. So split it into two parts.
+     *
+     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
+     * The second part is lmul, encoded in data of simd_desc.
+     */
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
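+
+/*
+ * For example (illustration only): with VLEN = 1024 bits, vlen / 8 = 128
+ * bytes is passed as maxsz above; with LMUL = 8 the helper side
+ * (vext_maxsz() in vector_helper.c) recovers 128 << 3 = 1024 bytes for the
+ * whole register group.
+ */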
+
+static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[2][7][4] = {
+        /* masked unit stride load */
+        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
+            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
+          { NULL,                     gen_helper_vlh_v_h_mask,
+            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
+          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
+            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
+          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
+            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
+          { NULL,                     gen_helper_vlhu_v_h_mask,
+            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
+        /* unmasked unit stride load */
+        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
+            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
+          { NULL,                gen_helper_vlh_v_h,
+            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
+          { NULL,                NULL,
+            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
+          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
+            gen_helper_vle_v_w,  gen_helper_vle_v_d },
+          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
+            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
+          { NULL,                gen_helper_vlhu_v_h,
+            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
+          { NULL,                NULL,
+            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
+
+static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[2][4][4] = {
+        /* masked unit stride load and store */
+        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
+            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
+          { NULL,                     gen_helper_vsh_v_h_mask,
+            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
+          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
+            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
+        /* unmasked unit stride store */
+        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
+            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
+          { NULL,                gen_helper_vsh_v_h,
+            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
+          { NULL,                NULL,
+            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
+          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
+            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
+
+/*
+ *** stride load and store
+ */
+typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
+        TCGv, TCGv_env, TCGv_i32);
+
+static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
+        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base, stride;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    stride = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    gen_get_gpr(stride, rs2);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, stride, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free(stride);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_stride *fn;
+    static gen_helper_ldst_stride * const fns[7][4] = {
+        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
+          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
+        { NULL,                 gen_helper_vlsh_v_h,
+          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
+        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
+          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
+        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
+          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
+        { NULL,                 gen_helper_vlshu_v_h,
+          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
+
+static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_stride *fn;
+    static gen_helper_ldst_stride * const fns[4][4] = {
+        /* masked stride store */
+        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
+          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
+        { NULL,                 gen_helper_vssh_v_h,
+          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
+        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
+          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
+    };
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index af07ac4160..852545b77e 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -61,6 +61,7 @@ typedef struct DisasContext {
     uint8_t lmul;
     uint8_t sew;
     uint16_t vlen;
+    uint16_t mlen;
     bool vl_eq_vlmax;
 } DisasContext;
 
@@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
     }
 }
 
+static int ex_plus_1(DisasContext *ctx, int nf)
+{
+    return nf + 1;
+}
+
 #define EX_SH(amount) \
     static int ex_shift_##amount(DisasContext *ctx, int imm) \
     {                                         \
@@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
     ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
     ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
+    ctx->mlen = 1 << (ctx->sew + 3 - ctx->lmul);
     ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
 }
 
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2afe716f2a..ebfabd2946 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -18,8 +18,10 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "exec/memop.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 #include <math.h>
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     env->vstart = 0;
     return vl;
 }
+
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
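+
+/*
+ * For example (illustration only): on a big-endian host H1(0) == 7, so the
+ * byte for element 0 is placed at host offset 7 of the first uint64_t
+ * chunk, i.e. the chunk's least-significant byte.
+ */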
+
+static inline uint32_t vext_nf(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, NF);
+}
+
+static inline uint32_t vext_mlen(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
+}
+
+static inline uint32_t vext_vm(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VM);
+}
+
+static inline uint32_t vext_lmul(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
+}
+
+/*
+ * Get vector group length in bytes. Its range is [64, 2048].
+ *
+ * As simd_desc supports at most 256, the max vlen is 512 bits.
+ * So vlen in bytes is encoded as maxsz.
+ */
+static inline uint32_t vext_maxsz(uint32_t desc)
+{
+    return simd_maxsz(desc) << vext_lmul(desc);
+}
+
+/*
+ * This function checks watchpoints before the real load or store operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for the watchpoint
+ * check. In user mode, there is no watchpoint support for now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk can't fill the TLB entry. Then the guest
+ * software can return here after processing the exception, or never return.
+ */
+static void probe_pages(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra, MMUAccessType access_type)
+{
+    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+    target_ulong curlen = MIN(pagelen, len);
+
+    probe_access(env, addr, curlen, access_type,
+            cpu_mmu_index(env, false), ra);
+    if (len > curlen) {
+        addr += curlen;
+        curlen = len - curlen;
+        probe_access(env, addr, curlen, access_type,
+                cpu_mmu_index(env, false), ra);
+    }
+}
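+
+/*
+ * For example (illustration only): a 64-byte access that starts 16 bytes
+ * before a page boundary is probed as 16 bytes in the first page and
+ * 48 bytes in the following page.
+ */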
+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    /*
+     * Split the remaining range into two parts.
+     * The first part is in the last uint64_t unit.
+     * The second part starts from the next uint64_t unit.
+     */
+    int part1 = 0, part2 = tot - cnt;
+
+    if (cnt % 8) {
+        uintptr_t aligned = (uintptr_t)tail & ~(uintptr_t)7;
+
+        part1 = 8 - (cnt % 8);
+        part2 = tot - cnt - part1;
+        memset((void *)aligned, 0, part1);
+        memset((void *)(aligned + 8), 0, part2);
+    } else {
+        memset(tail, 0, part2);
+    }
+}
+#else
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    memset(tail, 0, tot - cnt);
+}
+#endif
+
+static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int8_t *cur = ((int8_t *)vd + H1(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int16_t *cur = ((int16_t *)vd + H2(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int32_t *cur = ((int32_t *)vd + H4(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int64_t *cur = (int64_t *)vd + idx;
+    vext_clear(cur, cnt, tot);
+}
+
+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+    int idx = (index * mlen) / 64;
+    int pos = (index * mlen) % 64;
+    return (((uint64_t *)v0)[idx] >> pos) & 1;
+}
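+
+/*
+ * For example (illustration only): with mlen = 8 (e.g. SEW = 8, LMUL = 1),
+ * the mask bit for element 5 is bit 40 of the first uint64_t of v0
+ * (idx = 0, pos = 40).
+ */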
+
+/* elements operations for load and store */
+typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
+        uint32_t cnt, uint32_t tot);
+
+#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
+static void NAME(CPURISCVState *env, abi_ptr addr,         \
+        uint32_t idx, void *vd, uintptr_t retaddr)         \
+{                                                          \
+    MTYPE data;                                            \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
+    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
+    *cur = data;                                           \
+}
+
+GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
+
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
+static void NAME(CPURISCVState *env, abi_ptr addr,       \
+        uint32_t idx, void *vd, uintptr_t retaddr)       \
+{                                                        \
+    ETYPE data = *((ETYPE *)vd + H(idx));                \
+    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
+}
+GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
+
+/*
+ *** stride: access vector element from strided memory
+ */
+static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
+        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
+        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
+{
+    uint32_t i, k;
+    uint32_t nf = vext_nf(desc);
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
+    }
+    /* do real access */
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        while (k < nf) {
+            target_ulong addr = base + stride * i + k * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (clear_elem) {
+        for (k = 0; k < nf; k++) {
+            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+        }
+    }
+}
+
+#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
+void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
+        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
+{                                                                       \
+    uint32_t vm = vext_vm(desc);                                        \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
+        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
+}
+
+GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
+GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
+GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
+GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
+GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
+GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
+GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
+GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
+
+#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
+{                                                                       \
+    uint32_t vm = vext_vm(desc);                                        \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
+        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
+}
+
+GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
+GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
+GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
+GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
+GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
+GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
+GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
+GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
+GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
+GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
+GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
+GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
+GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
+
+/*
+ *** unit-stride: access elements stored contiguously in memory
+ */
+
+/* unmasked unit-stride load and store operation */
+static inline void vext_ldst_us(void *vd, target_ulong base,
+        CPURISCVState *env, uint32_t desc,
+        vext_ldst_elem_fn ldst_elem,
+        vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra,
+        MMUAccessType access_type)
+{
+    uint32_t i, k;
+    uint32_t nf = vext_nf(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
+    /* load bytes from guest memory */
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        while (k < nf) {
+            target_ulong addr = base + (i * nf + k) * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (clear_elem) {
+        for (k = 0; k < nf; k++) {
+            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+        }
+    }
+}
+
+/*
+ * The masked unit-stride load and store operations are a special case of the
+ * strided operations, with stride = NF * sizeof(MTYPE).
+ */
+
+#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
+void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
+        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
+}                                                                       \
+                                                                        \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
+}
+
+GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
+GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
+GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
+GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
+GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
+GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
+GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
+GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
+GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
+GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
+GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
+GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
+GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
+GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
+GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
+GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
+GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
+GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
+GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
+GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
+GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
+GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
+
+#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
+void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
+        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
+}                                                                       \
+                                                                        \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
+}
+
+GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
+GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
+GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
+GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
+GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
+GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
+GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
+GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
+GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
+GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
+GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
+GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
+GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address. It can been seen as a special
case of strided operations.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.h                      |   6 +
 target/riscv/helper.h                   | 105 ++++++
 target/riscv/insn32.decode              |  32 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
 target/riscv/translate.c                |   7 +
 target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
 6 files changed, 896 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 505d1a8515..b6ebb9b0eb 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
 typedef RISCVCPU ArchCPU;
 #include "exec/cpu-all.h"
 
+/* share data between vector helpers and decode code */
+FIELD(VDATA, MLEN, 0, 8)
+FIELD(VDATA, VM, 8, 1)
+FIELD(VDATA, LMUL, 9, 2)
+FIELD(VDATA, NF, 11, 4)
+
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
 FIELD(TB_FLAGS, SEW, 5, 3)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..87dfa90609 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 53340bdbc4..ef521152c5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -25,6 +25,7 @@
 %sh10    20:10
 %csr    20:12
 %rm     12:3
+%nf     29:3                     !function=ex_plus_1
 
 # immediates:
 %imm_i    20:s12
@@ -43,6 +44,8 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&r2nfvm    vm rd rs1 nf
+&rnfvm     vm rd rs1 rs2 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -62,6 +65,8 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
+@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
 
 # *** RV32V Extension ***
+
+# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
+vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
+vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
+vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
+vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
+vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
+
+vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
+vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
+vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
+vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
+
+# *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index da82c72bbf..d85f2aec68 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -15,6 +15,8 @@
  * You should have received a copy of the GNU General Public License along with
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
 
 static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
 {
@@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
     tcg_temp_free(dst);
     return true;
 }
+
+/* vector register offset from env */
+static uint32_t vreg_ofs(DisasContext *s, int reg)
+{
+    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
+}
+
+/* check functions */
+static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
+{
+    return !s->vill && ((s->misa & isa) == isa);
+}
+
+/*
+ * There are two rules check here.
+ *
+ * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
+ *
+ * 2. For all widening instructions, the destination LMUL value must also be
+ *    a supported LMUL value. (Section 11.2)
+ */
+static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
+{
+    /*
+     * The destination vector register group results are arranged as if both
+     * SEW and LMUL were at twice their current settings. (Section 11.2).
+     */
+    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
+
+    return !((s->lmul == 0x3 && widen) || (reg % legal));
+}
+
+/*
+ * There are two rules check here.
+ *
+ * 1. The destination vector register group for a masked vector instruction can
+ *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
+ *
+ * 2. In widen instructions and some other insturctions, like vslideup.vx,
+ *    there is no need to check whether LMUL=1.
+ */
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
+    bool force)
+{
+    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
+}
+
+/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
+static bool vext_check_nf(DisasContext *s, uint32_t nf)
+{
+    return (1 << s->lmul) * nf <= 8;
+}
+
+/* common translation macro */
+#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
+static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
+{                                                          \
+    if (CHECK(s, a)) {                                     \
+        return OP(s, a, SEQ);                              \
+    }                                                      \
+    return false;                                          \
+}
+
+/*
+ *** unit stride load and store
+ */
+typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
+        TCGv_env, TCGv_i32);
+
+static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+
+    /*
+     * As simd_desc supports at most 256 bytes, and in this implementation,
+     * the max vector group length is 2048 bytes. So split it into two parts.
+     *
+     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
+     * The second part is lmul, encoded in data of simd_desc.
+     */
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[2][7][4] = {
+        /* masked unit stride load */
+        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
+            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
+          { NULL,                     gen_helper_vlh_v_h_mask,
+            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
+          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
+            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
+          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
+            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
+          { NULL,                     gen_helper_vlhu_v_h_mask,
+            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
+        /* unmasked unit stride load */
+        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
+            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
+          { NULL,                gen_helper_vlh_v_h,
+            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
+          { NULL,                NULL,
+            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
+          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
+            gen_helper_vle_v_w,  gen_helper_vle_v_d },
+          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
+            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
+          { NULL,                gen_helper_vlhu_v_h,
+            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
+          { NULL,                NULL,
+            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
+GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
+
+static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[2][4][4] = {
+        /* masked unit stride store */
+        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
+            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
+          { NULL,                     gen_helper_vsh_v_h_mask,
+            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
+          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
+            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
+        /* unmasked unit stride store */
+        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
+            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
+          { NULL,                gen_helper_vsh_v_h,
+            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
+          { NULL,                NULL,
+            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
+          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
+            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
+GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
+
+/*
+ *** stride load and store
+ */
+typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
+        TCGv, TCGv_env, TCGv_i32);
+
+static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
+        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base, stride;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    stride = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    gen_get_gpr(stride, rs2);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, stride, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free(stride);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_stride *fn;
+    static gen_helper_ldst_stride * const fns[7][4] = {
+        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
+          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
+        { NULL,                 gen_helper_vlsh_v_h,
+          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
+        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
+          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
+        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
+          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
+        { NULL,                 gen_helper_vlshu_v_h,
+          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
+GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
+
+static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_stride *fn;
+    static gen_helper_ldst_stride * const fns[4][4] = {
+        /* masked stride store */
+        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
+          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
+        { NULL,                 gen_helper_vssh_v_h,
+          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
+        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
+          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
+    };
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
+GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index af07ac4160..852545b77e 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -61,6 +61,7 @@ typedef struct DisasContext {
     uint8_t lmul;
     uint8_t sew;
     uint16_t vlen;
+    uint16_t mlen;
     bool vl_eq_vlmax;
 } DisasContext;
 
@@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
     }
 }
 
+static int ex_plus_1(DisasContext *ctx, int nf)
+{
+    return nf + 1;
+}
+
 #define EX_SH(amount) \
     static int ex_shift_##amount(DisasContext *ctx, int imm) \
     {                                         \
@@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
     ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
     ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
+    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
     ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
 }
 
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2afe716f2a..ebfabd2946 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -18,8 +18,10 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "exec/memop.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 #include <math.h>
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     env->vstart = 0;
     return vl;
 }
+
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
+static inline uint32_t vext_nf(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, NF);
+}
+
+static inline uint32_t vext_mlen(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
+}
+
+static inline uint32_t vext_vm(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VM);
+}
+
+static inline uint32_t vext_lmul(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
+}
+
+/*
+ * Get vector group length in bytes. Its range is [64, 2048].
+ *
+ * As simd_desc supports at most 256 bytes, vlen in bytes is encoded as
+ * maxsz, and the group length is recovered by shifting it by lmul.
+ */
+static inline uint32_t vext_maxsz(uint32_t desc)
+{
+    return simd_maxsz(desc) << vext_lmul(desc);
+}
+
+/*
+ * This function checks watchpoints before the real load or store operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
+ * In user mode, there is no watchpoint support for now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk cannot fill the TLB entry. The guest software
+ * can then return here after processing the exception, or never return.
+ */
+static void probe_pages(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra, MMUAccessType access_type)
+{
+    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+    target_ulong curlen = MIN(pagelen, len);
+
+    probe_access(env, addr, curlen, access_type,
+            cpu_mmu_index(env, false), ra);
+    if (len > curlen) {
+        addr += curlen;
+        curlen = len - curlen;
+        probe_access(env, addr, curlen, access_type,
+                cpu_mmu_index(env, false), ra);
+    }
+}
+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    /*
+     * Split the remaining range into two parts.
+     * The first part is in the last uint64_t unit.
+     * The second part starts from the next uint64_t unit.
+     */
+    int part1 = 0, part2 = tot - cnt;
+    if (cnt % 8) {
+        part1 = 8 - (cnt % 8);
+        part2 = tot - cnt - part1;
+        memset((void *)((uintptr_t)tail & ~7ULL), 0, part1);
+        memset((void *)(((uintptr_t)tail + 8) & ~7ULL), 0, part2);
+    } else {
+        memset(tail, 0, part2);
+    }
+}
+#else
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    memset(tail, 0, tot - cnt);
+}
+#endif
+
+static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int8_t *cur = ((int8_t *)vd + H1(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int16_t *cur = ((int16_t *)vd + H2(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int32_t *cur = ((int32_t *)vd + H4(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int64_t *cur = (int64_t *)vd + idx;
+    vext_clear(cur, cnt, tot);
+}
+
+
+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+    int idx = (index * mlen) / 64;
+    int pos = (index * mlen) % 64;
+    return (((uint64_t *)v0)[idx] >> pos) & 1;
+}
+
+/* elements operations for load and store */
+typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
+        uint32_t cnt, uint32_t tot);
+
+#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
+static void NAME(CPURISCVState *env, abi_ptr addr,         \
+        uint32_t idx, void *vd, uintptr_t retaddr)         \
+{                                                          \
+    MTYPE data;                                            \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
+    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
+    *cur = data;                                           \
+}                                                          \
+
+GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
+
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
+static void NAME(CPURISCVState *env, abi_ptr addr,       \
+        uint32_t idx, void *vd, uintptr_t retaddr)       \
+{                                                        \
+    ETYPE data = *((ETYPE *)vd + H(idx));                \
+    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
+}
+GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
+
+/*
+ *** stride: access vector element from strided memory
+ */
+static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
+        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
+        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
+{
+    uint32_t i, k;
+    uint32_t nf = vext_nf(desc);
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
+    }
+    /* do real access */
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        while (k < nf) {
+            target_ulong addr = base + stride * i + k * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (clear_elem) {
+        for (k = 0; k < nf; k++) {
+            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+        }
+    }
+}
+
+#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
+void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
+        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
+{                                                                       \
+    uint32_t vm = vext_vm(desc);                                        \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
+        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
+}
+
+GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
+GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
+GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
+GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
+GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
+GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
+GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
+GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
+GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
+GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
+GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
+GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
+
+#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
+{                                                                       \
+    uint32_t vm = vext_vm(desc);                                        \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
+        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
+}
+
+GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
+GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
+GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
+GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
+GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
+GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
+GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
+GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
+GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
+GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
+GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
+GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
+GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
+
+/*
+ *** unit-stride: access elements stored contiguously in memory
+ */
+
+/* unmasked unit-stride load and store operation */
+static inline void vext_ldst_us(void *vd, target_ulong base,
+        CPURISCVState *env, uint32_t desc,
+        vext_ldst_elem_fn ldst_elem,
+        vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra,
+        MMUAccessType access_type)
+{
+    uint32_t i, k;
+    uint32_t nf = vext_nf(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
+    /* load bytes from guest memory */
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        while (k < nf) {
+            target_ulong addr = base + (i * nf + k) * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (clear_elem) {
+        for (k = 0; k < nf; k++) {
+            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+        }
+    }
+}
+
+/*
+ * A masked unit-stride load or store is a special case of a strided access
+ * with stride = NF * sizeof(MTYPE).
+ */
+
+#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
+void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
+        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
+}                                                                       \
+                                                                        \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
+}
+
+GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
+GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
+GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
+GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
+GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
+GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
+GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
+GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
+GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
+GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
+GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
+GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
+GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
+GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
+GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
+GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
+GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
+GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
+GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
+GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
+GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
+GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
+
+#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
+void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
+    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
+        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
+}                                                                       \
+                                                                        \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
+        CPURISCVState *env, uint32_t desc)                              \
+{                                                                       \
+    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
+}
+
+GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
+GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
+GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
+GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
+GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
+GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
+GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
+GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
+GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
+GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
+GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
+GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
+GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
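
As a worked example of the desc encoding above (the concrete numbers are
only an illustration, not a requirement of the patch): with VLEN = 512 bits,
LMUL = 4 (s->lmul = 2) and SEW = 32 (s->sew = 2), the translator stores
vlen / 8 = 64 in the maxsz field of simd_desc and lmul in the data field,
and a helper that has desc at hand recovers the group geometry as:

    uint32_t maxsz = simd_maxsz(desc);          /* 64 bytes, one register   */
    uint32_t grpsz = maxsz << vext_lmul(desc);  /* 256 bytes in the group   */
    uint32_t vlmax = grpsz / sizeof(int32_t);   /* 64 elements at SEW = 32  */

Likewise mlen = 1 << (sew + 3 - lmul) is the number of mask bits per
element, here 1 << (2 + 3 - 2) = 8, so element i is governed by mask bit
i * 8 of v0, which is what vext_elem_mask() reads.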
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 06/60] target/riscv: add vector index load and store instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.
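
As a minimal scalar sketch of this addressing rule (assuming SEW=32 offsets;
the function name and parameters are made up for illustration, while the
real helpers below work on whole register groups and honour the mask):

    /* effective address of element i, field k of a segment access */
    static target_ulong indexed_addr(target_ulong base, const int32_t *vs2,
                                     uint32_t i, uint32_t k, uint32_t msz)
    {
        /* the offset element is sign-extended and added to the scalar base */
        return base + vs2[i] + k * msz;
    }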

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  35 +++++++
 target/riscv/insn32.decode              |  13 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 124 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 117 ++++++++++++++++++++++
 4 files changed, 289 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 87dfa90609..f9b3da60ca 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ef521152c5..bc36df33b5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -241,6 +241,19 @@ vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
 vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+vlxb_v     ... 111 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlxh_v     ... 111 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlxw_v     ... 111 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlxe_v     ... 011 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlxbu_v    ... 011 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlxhu_v    ... 011 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlxwu_v    ... 011 . ..... ..... 110 ..... 0000111 @r_nfvm
+# Vector ordered-indexed and unordered-indexed store insns.
+vsxb_v     ... -11 . ..... ..... 000 ..... 0100111 @r_nfvm
+vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
+vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index d85f2aec68..5d1eeef323 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -407,3 +407,127 @@ GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
+
+/*
+ *** index load and store
+ */
+typedef void gen_helper_ldst_index(TCGv_ptr, TCGv_ptr, TCGv,
+        TCGv_ptr, TCGv_env, TCGv_i32);
+
+static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_ldst_index *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ld_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_index *fn;
+    static gen_helper_ldst_index * const fns[7][4] = {
+        { gen_helper_vlxb_v_b,  gen_helper_vlxb_v_h,
+          gen_helper_vlxb_v_w,  gen_helper_vlxb_v_d },
+        { NULL,                 gen_helper_vlxh_v_h,
+          gen_helper_vlxh_v_w,  gen_helper_vlxh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlxw_v_w,  gen_helper_vlxw_v_d },
+        { gen_helper_vlxe_v_b,  gen_helper_vlxe_v_h,
+          gen_helper_vlxe_v_w,  gen_helper_vlxe_v_d },
+        { gen_helper_vlxbu_v_b, gen_helper_vlxbu_v_h,
+          gen_helper_vlxbu_v_w, gen_helper_vlxbu_v_d },
+        { NULL,                 gen_helper_vlxhu_v_h,
+          gen_helper_vlxhu_v_w, gen_helper_vlxhu_v_d },
+        { NULL,                 NULL,
+          gen_helper_vlxwu_v_w, gen_helper_vlxwu_v_d },
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool ld_index_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vlxb_v, 0, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxh_v, 1, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxw_v, 2, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxe_v, 3, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxbu_v, 4, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxhu_v, 5, rnfvm, ld_index_op, ld_index_check)
+GEN_VEXT_TRANS(vlxwu_v, 6, rnfvm, ld_index_op, ld_index_check)
+
+static bool st_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_index *fn;
+    static gen_helper_ldst_index * const fns[4][4] = {
+        { gen_helper_vsxb_v_b,  gen_helper_vsxb_v_h,
+          gen_helper_vsxb_v_w,  gen_helper_vsxb_v_d },
+        { NULL,                 gen_helper_vsxh_v_h,
+          gen_helper_vsxh_v_w,  gen_helper_vsxh_v_d },
+        { NULL,                 NULL,
+          gen_helper_vsxw_v_w,  gen_helper_vsxw_v_d },
+        { gen_helper_vsxe_v_b,  gen_helper_vsxe_v_h,
+          gen_helper_vsxe_v_w,  gen_helper_vsxe_v_d }
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool st_index_check(DisasContext *s, arg_rnfvm* a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_nf(s, a->nf));
+}
+
+GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
+GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
+GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
+GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ebfabd2946..35cb9f09b4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -457,3 +457,120 @@ GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
 GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
 GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
 GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
+
+/*
+ *** index: access vector element from indexed memory
+ */
+typedef target_ulong (*vext_get_index_addr)(target_ulong base,
+        uint32_t idx, void *vs2);
+
+#define GEN_VEXT_GET_INDEX_ADDR(NAME, ETYPE, H)        \
+static target_ulong NAME(target_ulong base,            \
+        uint32_t idx, void *vs2)                       \
+{                                                      \
+    return (base + *((ETYPE *)vs2 + H(idx)));          \
+}
+
+GEN_VEXT_GET_INDEX_ADDR(idx_b, int8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(idx_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(idx_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(idx_d, int64_t, H8)
+
+static inline void vext_ldst_index(void *vd, void *v0, target_ulong base,
+        void *vs2, CPURISCVState *env, uint32_t desc,
+        vext_get_index_addr get_index_addr,
+        vext_ldst_elem_fn ldst_elem,
+        vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra,
+        MMUAccessType access_type)
+{
+    uint32_t i, k;
+    uint32_t nf = vext_nf(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        probe_pages(env, get_index_addr(base, i, vs2), nf * msz, ra,
+                access_type);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        while (k < nf) {
+            abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (clear_elem) {
+        for (k = 0; k < nf; k++) {
+            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+        }
+    }
+}
+
+#define GEN_VEXT_LD_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, LOAD_FN, CLEAR_FN) \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,                   \
+        void *vs2, CPURISCVState *env, uint32_t desc)                      \
+{                                                                          \
+    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,                \
+        LOAD_FN, CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE),                   \
+        GETPC(), MMU_DATA_LOAD);                                           \
+}
+GEN_VEXT_LD_INDEX(vlxb_v_b,  int8_t,   int8_t,   idx_b, ldb_b,  clearb)
+GEN_VEXT_LD_INDEX(vlxb_v_h,  int8_t,   int16_t,  idx_h, ldb_h,  clearh)
+GEN_VEXT_LD_INDEX(vlxb_v_w,  int8_t,   int32_t,  idx_w, ldb_w,  clearl)
+GEN_VEXT_LD_INDEX(vlxb_v_d,  int8_t,   int64_t,  idx_d, ldb_d,  clearq)
+GEN_VEXT_LD_INDEX(vlxh_v_h,  int16_t,  int16_t,  idx_h, ldh_h,  clearh)
+GEN_VEXT_LD_INDEX(vlxh_v_w,  int16_t,  int32_t,  idx_w, ldh_w,  clearl)
+GEN_VEXT_LD_INDEX(vlxh_v_d,  int16_t,  int64_t,  idx_d, ldh_d,  clearq)
+GEN_VEXT_LD_INDEX(vlxw_v_w,  int32_t,  int32_t,  idx_w, ldw_w,  clearl)
+GEN_VEXT_LD_INDEX(vlxw_v_d,  int32_t,  int64_t,  idx_d, ldw_d,  clearq)
+GEN_VEXT_LD_INDEX(vlxe_v_b,  int8_t,   int8_t,   idx_b, lde_b,  clearb)
+GEN_VEXT_LD_INDEX(vlxe_v_h,  int16_t,  int16_t,  idx_h, lde_h,  clearh)
+GEN_VEXT_LD_INDEX(vlxe_v_w,  int32_t,  int32_t,  idx_w, lde_w,  clearl)
+GEN_VEXT_LD_INDEX(vlxe_v_d,  int64_t,  int64_t,  idx_d, lde_d,  clearq)
+GEN_VEXT_LD_INDEX(vlxbu_v_b, uint8_t,  uint8_t,  idx_b, ldbu_b, clearb)
+GEN_VEXT_LD_INDEX(vlxbu_v_h, uint8_t,  uint16_t, idx_h, ldbu_h, clearh)
+GEN_VEXT_LD_INDEX(vlxbu_v_w, uint8_t,  uint32_t, idx_w, ldbu_w, clearl)
+GEN_VEXT_LD_INDEX(vlxbu_v_d, uint8_t,  uint64_t, idx_d, ldbu_d, clearq)
+GEN_VEXT_LD_INDEX(vlxhu_v_h, uint16_t, uint16_t, idx_h, ldhu_h, clearh)
+GEN_VEXT_LD_INDEX(vlxhu_v_w, uint16_t, uint32_t, idx_w, ldhu_w, clearl)
+GEN_VEXT_LD_INDEX(vlxhu_v_d, uint16_t, uint64_t, idx_d, ldhu_d, clearq)
+GEN_VEXT_LD_INDEX(vlxwu_v_w, uint32_t, uint32_t, idx_w, ldwu_w, clearl)
+GEN_VEXT_LD_INDEX(vlxwu_v_d, uint32_t, uint64_t, idx_d, ldwu_d, clearq)
+
+#define GEN_VEXT_ST_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, STORE_FN)\
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,         \
+        void *vs2, CPURISCVState *env, uint32_t desc)            \
+{                                                                \
+    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,      \
+        STORE_FN, NULL, sizeof(ETYPE), sizeof(MTYPE),            \
+        GETPC(), MMU_DATA_STORE);                                \
+}
+
+GEN_VEXT_ST_INDEX(vsxb_v_b, int8_t,  int8_t,  idx_b, stb_b)
+GEN_VEXT_ST_INDEX(vsxb_v_h, int8_t,  int16_t, idx_h, stb_h)
+GEN_VEXT_ST_INDEX(vsxb_v_w, int8_t,  int32_t, idx_w, stb_w)
+GEN_VEXT_ST_INDEX(vsxb_v_d, int8_t,  int64_t, idx_d, stb_d)
+GEN_VEXT_ST_INDEX(vsxh_v_h, int16_t, int16_t, idx_h, sth_h)
+GEN_VEXT_ST_INDEX(vsxh_v_w, int16_t, int32_t, idx_w, sth_w)
+GEN_VEXT_ST_INDEX(vsxh_v_d, int16_t, int64_t, idx_d, sth_d)
+GEN_VEXT_ST_INDEX(vsxw_v_w, int32_t, int32_t, idx_w, stw_w)
+GEN_VEXT_ST_INDEX(vsxw_v_d, int32_t, int64_t, idx_d, stw_d)
+GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
+GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
+GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
+GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions (while loops).
These instructions execute as regular loads except that they
will only take a trap on element 0.
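
As an illustration only (not part of this patch), the kind of scalar loop
these instructions target has a trip count that depends on the data being
loaded, e.g. a strlen-style scan. A vectorized version can use vlbuff.v so
that a fault past element 0 merely shrinks vl instead of trapping:

    /* hypothetical scalar loop with a data-dependent exit condition */
    #include <stddef.h>

    size_t my_strlen(const char *s)
    {
        size_t n = 0;
        while (s[n] != '\0') {  /* exit depends on the byte just loaded */
            n++;
        }
        return n;
    }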

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  22 +++++
 target/riscv/insn32.decode              |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  69 +++++++++++++++
 target/riscv/vector_helper.c            | 111 ++++++++++++++++++++++++
 4 files changed, 209 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f9b3da60ca..72ba4d9bdb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bc36df33b5..b76c09c8c0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -224,6 +224,13 @@ vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
 vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
 vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
 vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vlbff_v    ... 100 . 10000 ..... 000 ..... 0000111 @r2_nfvm
+vlhff_v    ... 100 . 10000 ..... 101 ..... 0000111 @r2_nfvm
+vlwff_v    ... 100 . 10000 ..... 110 ..... 0000111 @r2_nfvm
+vleff_v    ... 000 . 10000 ..... 111 ..... 0000111 @r2_nfvm
+vlbuff_v   ... 000 . 10000 ..... 000 ..... 0000111 @r2_nfvm
+vlhuff_v   ... 000 . 10000 ..... 101 ..... 0000111 @r2_nfvm
+vlwuff_v   ... 000 . 10000 ..... 110 ..... 0000111 @r2_nfvm
 vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
 vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
 vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 5d1eeef323..9d9fc886d6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -531,3 +531,72 @@ GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
+
+/*
+ *** unit stride fault-only-first load
+ */
+static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[7][4] = {
+        { gen_helper_vlbff_v_b,  gen_helper_vlbff_v_h,
+          gen_helper_vlbff_v_w,  gen_helper_vlbff_v_d },
+        { NULL,                  gen_helper_vlhff_v_h,
+          gen_helper_vlhff_v_w,  gen_helper_vlhff_v_d },
+        { NULL,                  NULL,
+          gen_helper_vlwff_v_w,  gen_helper_vlwff_v_d },
+        { gen_helper_vleff_v_b,  gen_helper_vleff_v_h,
+          gen_helper_vleff_v_w,  gen_helper_vleff_v_d },
+        { gen_helper_vlbuff_v_b, gen_helper_vlbuff_v_h,
+          gen_helper_vlbuff_v_w, gen_helper_vlbuff_v_d },
+        { NULL,                  gen_helper_vlhuff_v_h,
+          gen_helper_vlhuff_v_w, gen_helper_vlhuff_v_d },
+        { NULL,                  NULL,
+          gen_helper_vlwuff_v_w, gen_helper_vlwuff_v_d }
+    };
+
+    fn = fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, NF, a->nf);
+    return ldff_trans(a->rd, a->rs1, data, fn, s);
+}
+
+GEN_VEXT_TRANS(vlbff_v, 0, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vlhff_v, 1, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vlwff_v, 2, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
+GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 35cb9f09b4..3841301b74 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -574,3 +574,114 @@ GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
 GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
 GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
 GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
+
+/*
+ *** unit-stride fault-only-first load instructions
+ */
+static inline void vext_ldff(void *vd, void *v0, target_ulong base,
+        CPURISCVState *env, uint32_t desc,
+        vext_ldst_elem_fn ldst_elem,
+        vext_ld_clear_elem clear_elem,
+        int mmuidx, uint32_t esz, uint32_t msz, uintptr_t ra)
+{
+    void *host;
+    uint32_t i, k, vl = 0;
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t nf = vext_nf(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+    target_ulong addr, offset, remain;
+
+    if (env->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        addr = base + nf * i * msz;
+        if (i == 0) {
+            probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
+        } else {
+            /* if it triggers an exception, no need to check watchpoint */
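+            /* -(addr | TARGET_PAGE_MASK) = bytes left on the current page */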
+            offset = -(addr | TARGET_PAGE_MASK);
+            remain = nf * msz;
+            while (remain > 0) {
+                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
+                if (host) {
+#ifdef CONFIG_USER_ONLY
+                    if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
+                        vl = i;
+                        goto ProbeSuccess;
+                    }
+#else
+                    probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
+#endif
+                } else {
+                    vl = i;
+                    goto ProbeSuccess;
+                }
+                if (remain <= offset) {
+                    break;
+                }
+                remain -= offset;
+                addr += offset;
+                offset = -(addr | TARGET_PAGE_MASK);
+            }
+        }
+    }
+ProbeSuccess:
+    /* load bytes from guest memory */
+    if (vl != 0) {
+        env->vl = vl;
+    }
+    for (i = 0; i < env->vl; i++) {
+        k = 0;
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        while (k < nf) {
+            target_ulong addr = base + (i * nf + k) * msz;
+            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (vl != 0) {
+        return;
+    }
+    for (k = 0; k < nf; k++) {
+        clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
+    }
+}
+
+#define GEN_VEXT_LDFF(NAME, MTYPE, ETYPE, MMUIDX, LOAD_FN, CLEAR_FN)  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong base,              \
+        CPURISCVState *env, uint32_t desc)                            \
+{                                                                     \
+    vext_ldff(vd, v0, base, env, desc, LOAD_FN, CLEAR_FN, MMUIDX,     \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC());                       \
+}
+GEN_VEXT_LDFF(vlbff_v_b,  int8_t,   int8_t,   MO_SB,   ldb_b,  clearb)
+GEN_VEXT_LDFF(vlbff_v_h,  int8_t,   int16_t,  MO_SB,   ldb_h,  clearh)
+GEN_VEXT_LDFF(vlbff_v_w,  int8_t,   int32_t,  MO_SB,   ldb_w,  clearl)
+GEN_VEXT_LDFF(vlbff_v_d,  int8_t,   int64_t,  MO_SB,   ldb_d,  clearq)
+GEN_VEXT_LDFF(vlhff_v_h,  int16_t,  int16_t,  MO_LESW, ldh_h,  clearh)
+GEN_VEXT_LDFF(vlhff_v_w,  int16_t,  int32_t,  MO_LESW, ldh_w,  clearl)
+GEN_VEXT_LDFF(vlhff_v_d,  int16_t,  int64_t,  MO_LESW, ldh_d,  clearq)
+GEN_VEXT_LDFF(vlwff_v_w,  int32_t,  int32_t,  MO_LESL, ldw_w,  clearl)
+GEN_VEXT_LDFF(vlwff_v_d,  int32_t,  int64_t,  MO_LESL, ldw_d,  clearq)
+GEN_VEXT_LDFF(vleff_v_b,  int8_t,   int8_t,   MO_SB,   lde_b,  clearb)
+GEN_VEXT_LDFF(vleff_v_h,  int16_t,  int16_t,  MO_LESW, lde_h,  clearh)
+GEN_VEXT_LDFF(vleff_v_w,  int32_t,  int32_t,  MO_LESL, lde_w,  clearl)
+GEN_VEXT_LDFF(vleff_v_d,  int64_t,  int64_t,  MO_LEQ,  lde_d,  clearq)
+GEN_VEXT_LDFF(vlbuff_v_b, uint8_t,  uint8_t,  MO_UB,   ldbu_b, clearb)
+GEN_VEXT_LDFF(vlbuff_v_h, uint8_t,  uint16_t, MO_UB,   ldbu_h, clearh)
+GEN_VEXT_LDFF(vlbuff_v_w, uint8_t,  uint32_t, MO_UB,   ldbu_w, clearl)
+GEN_VEXT_LDFF(vlbuff_v_d, uint8_t,  uint64_t, MO_UB,   ldbu_d, clearq)
+GEN_VEXT_LDFF(vlhuff_v_h, uint16_t, uint16_t, MO_LEUW, ldhu_h, clearh)
+GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
+GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
+GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
+GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 08/60] target/riscv: add vector amo operations
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction.
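
Because no host atomic primitive is used here, each active element is
emulated as a plain read-modify-write (parallel translation blocks bail
out through gen_helper_exit_atomic instead). A minimal C sketch of the
per-element behaviour expanded from GEN_VEXT_AMO_NOATOMIC_OP, with
illustrative names only:

    #include <stdint.h>

    /* hypothetical model of one active element of vamoaddw.v, SEW=32 */
    static void amo_add_elem(int32_t *mem, int32_t *vs3_elem, uint32_t wd)
    {
        int32_t a = *vs3_elem;  /* operand from the vector register */
        int32_t b = *mem;       /* operand loaded from memory */
        a = a + b;              /* DO_ADD */
        *mem = a;               /* combined value is stored back to memory */
        if (wd) {
            *vs3_elem = a;      /* wd=1: the destination element is updated too */
        }
    }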

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.h                      |   1 +
 target/riscv/helper.h                   |  29 +++++
 target/riscv/insn32-64.decode           |  11 ++
 target/riscv/insn32.decode              |  13 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 130 +++++++++++++++++++++
 target/riscv/vector_helper.c            | 143 ++++++++++++++++++++++++
 6 files changed, 327 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index b6ebb9b0eb..e069e55e81 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
 FIELD(VDATA, VM, 8, 1)
 FIELD(VDATA, LMUL, 9, 2)
 FIELD(VDATA, NF, 11, 4)
+FIELD(VDATA, WD, 11, 1)
 
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72ba4d9bdb..70a4b05f75 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
 amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
 amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
 
+#*** Vector AMO operations (in addition to Zvamo) ***
+vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+
 # *** RV64F Standard Extension (in addition to RV32F) ***
 fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
 fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b76c09c8c0..1330703720 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
 
@@ -67,6 +68,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -261,6 +263,17 @@ vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
 vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+#*** Vector AMO operations are encoded under the standard AMO major opcode ***
+vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9d9fc886d6..3c677160c5 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -600,3 +600,133 @@ GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
+
+/*
+ *** vector atomic operations
+ */
+typedef void gen_helper_amo(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_amo *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_amo *fn;
+    static gen_helper_amo *const fnsw[9] = {
+        /* no atomic operation */
+        gen_helper_vamoswapw_v_w,
+        gen_helper_vamoaddw_v_w,
+        gen_helper_vamoxorw_v_w,
+        gen_helper_vamoandw_v_w,
+        gen_helper_vamoorw_v_w,
+        gen_helper_vamominw_v_w,
+        gen_helper_vamomaxw_v_w,
+        gen_helper_vamominuw_v_w,
+        gen_helper_vamomaxuw_v_w
+    };
+#ifdef TARGET_RISCV64
+    static gen_helper_amo *const fnsd[18] = {
+        gen_helper_vamoswapw_v_d,
+        gen_helper_vamoaddw_v_d,
+        gen_helper_vamoxorw_v_d,
+        gen_helper_vamoandw_v_d,
+        gen_helper_vamoorw_v_d,
+        gen_helper_vamominw_v_d,
+        gen_helper_vamomaxw_v_d,
+        gen_helper_vamominuw_v_d,
+        gen_helper_vamomaxuw_v_d,
+        gen_helper_vamoswapd_v_d,
+        gen_helper_vamoaddd_v_d,
+        gen_helper_vamoxord_v_d,
+        gen_helper_vamoandd_v_d,
+        gen_helper_vamoord_v_d,
+        gen_helper_vamomind_v_d,
+        gen_helper_vamomaxd_v_d,
+        gen_helper_vamominud_v_d,
+        gen_helper_vamomaxud_v_d
+    };
+#endif
+
+    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
+        gen_helper_exit_atomic(cpu_env);
+        s->base.is_jmp = DISAS_NORETURN;
+        return true;
+    } else {
+        fn = fnsw[seq];
+#ifdef TARGET_RISCV64
+        if (s->sew == 3) {
+            fn = fnsd[seq];
+        }
+#endif
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, WD, a->wd);
+    return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+/*
+ * There are two rules check here.
+ *
+ * 1. SEW must be at least as wide as the AMO memory element size.
+ *
+ * 2. If SEW is greater than XLEN, an illegal instruction exception is raised.
+ */
+static bool amo_check(DisasContext *s, arg_rwdvm* a)
+{
+    return (vext_check_isa_ill(s, RVV | RVA) &&
+            (!a->wd || vext_check_overlap_mask(s, a->rd, a->vm, false)) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((1 << s->sew) <= sizeof(target_ulong)) &&
+            ((1 << s->sew) >= 4));
+}
+
+GEN_VEXT_TRANS(vamoswapw_v, 0, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoaddw_v, 1, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoxorw_v, 2, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoandw_v, 3, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoorw_v, 4, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominw_v, 5, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxw_v, 6, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominuw_v, 7, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxuw_v, 8, rwdvm, amo_op, amo_check)
+#ifdef TARGET_RISCV64
+GEN_VEXT_TRANS(vamoswapd_v, 9, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoaddd_v, 10, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoxord_v, 11, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoandd_v, 12, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoord_v, 13, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomind_v, 14, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
+#endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3841301b74..f9b409b169 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -94,6 +94,11 @@ static inline uint32_t vext_lmul(uint32_t desc)
     return FIELD_EX32(simd_data(desc), VDATA, LMUL);
 }
 
+static uint32_t vext_wd(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, WD);
+}
+
 /*
  * Get vector group length in bytes. Its range is [64, 2048].
  *
@@ -685,3 +690,141 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
 GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
 GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
 GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
+
+/*
+ *** Vector AMO Operations (Zvamo)
+ */
+typedef void (*vext_amo_noatomic_fn)(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
+
+/* no atomic operation for vector atomic instructions */
+#define DO_SWAP(N, M) (M)
+#define DO_AND(N, M)  (N & M)
+#define DO_XOR(N, M)  (N ^ M)
+#define DO_OR(N, M)   (N | M)
+#define DO_ADD(N, M)  (N + M)
+
+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF) \
+static void vext_##NAME##_noatomic_op(void *vs3,                \
+            target_ulong addr, uint32_t wd, uint32_t idx,       \
+                CPURISCVState *env, uintptr_t retaddr)          \
+{                                                               \
+    typedef int##ESZ##_t ETYPE;                                 \
+    typedef int##MSZ##_t MTYPE;                                 \
+    typedef uint##MSZ##_t UMTYPE __attribute__((unused));       \
+    ETYPE *pe3 = (ETYPE *)vs3 + H(idx);                         \
+    MTYPE a = *pe3, b = cpu_ld##SUF##_data(env, addr);          \
+    a = DO_OP(a, b);                                            \
+    cpu_st##SUF##_data(env, addr, a);                           \
+    if (wd) {                                                   \
+        *pe3 = a;                                               \
+    }                                                           \
+}
+
+/* Signed min/max */
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+
+/* Unsigned min/max */
+#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
+#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
+
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, 32, 32, H4, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  32, 32, H4, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  32, 32, H4, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  32, 32, H4, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   32, 32, H4, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  32, 32, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  32, 32, H4, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, 32, 32, H4, DO_MINU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, 32, 32, H4, DO_MAXU, l)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, 64, 32, H8, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, 64, 64, H8, DO_SWAP, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  64, 32, H8, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  64, 64, H8, DO_ADD,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  64, 32, H8, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  64, 64, H8, DO_XOR,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  64, 32, H8, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  64, 64, H8, DO_AND,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   64, 32, H8, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   64, 64, H8, DO_OR,   q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  64, 32, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  64, 64, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  64, 32, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  64, 64, H8, DO_MAX,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, 64, 32, H8, DO_MINU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, 64, 64, H8, DO_MINU, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
+#endif
+
+static inline void vext_amo_noatomic(void *vs3, void *v0, target_ulong base,
+        void *vs2, CPURISCVState *env, uint32_t desc,
+        vext_get_index_addr get_index_addr,
+        vext_amo_noatomic_fn noatomic_op,
+        vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    uint32_t wd = vext_wd(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
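+        /* an AMO both reads and writes memory, so probe for both access types */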
+        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_LOAD);
+        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_STORE);
+    }
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        addr = get_index_addr(base, i, vs2);
+        noatomic_op(vs3, addr, wd, i, env, ra);
+    }
+    clear_elem(vs3, env->vl, env->vl * esz, vlmax * esz);
+}
+
+#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, INDEX_FN, CLEAR_FN)    \
+void HELPER(NAME)(void *vs3, void *v0, target_ulong base,       \
+        void *vs2, CPURISCVState *env, uint32_t desc)           \
+{                                                               \
+    vext_amo_noatomic(vs3, v0, base, vs2, env, desc,            \
+        INDEX_FN, vext_##NAME##_noatomic_op, CLEAR_FN,          \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC());                 \
+}
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, idx_d, clearq)
+#endif
+GEN_VEXT_AMO(vamoswapw_v_w, int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoaddw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoxorw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoandw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoorw_v_w,   int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
+GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 08/60] target/riscv: add vector amo operations
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.h                      |   1 +
 target/riscv/helper.h                   |  29 +++++
 target/riscv/insn32-64.decode           |  11 ++
 target/riscv/insn32.decode              |  13 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 130 +++++++++++++++++++++
 target/riscv/vector_helper.c            | 143 ++++++++++++++++++++++++
 6 files changed, 327 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index b6ebb9b0eb..e069e55e81 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
 FIELD(VDATA, VM, 8, 1)
 FIELD(VDATA, LMUL, 9, 2)
 FIELD(VDATA, NF, 11, 4)
+FIELD(VDATA, WD, 11, 1)
 
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72ba4d9bdb..70a4b05f75 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
 amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
 amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
 
+#*** Vector AMO operations (in addition to Zvamo) ***
+vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+
 # *** RV64F Standard Extension (in addition to RV32F) ***
 fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
 fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b76c09c8c0..1330703720 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
 
@@ -67,6 +68,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -261,6 +263,17 @@ vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
 vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+#*** Vector AMO operations are encoded under the standard AMO major opcode ***
+vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9d9fc886d6..3c677160c5 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -600,3 +600,133 @@ GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
 GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
+
+/*
+ *** vector atomic operation
+ */
+typedef void gen_helper_amo(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_amo *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_amo *fn;
+    static gen_helper_amo *const fnsw[9] = {
+        /* no atomic operation */
+        gen_helper_vamoswapw_v_w,
+        gen_helper_vamoaddw_v_w,
+        gen_helper_vamoxorw_v_w,
+        gen_helper_vamoandw_v_w,
+        gen_helper_vamoorw_v_w,
+        gen_helper_vamominw_v_w,
+        gen_helper_vamomaxw_v_w,
+        gen_helper_vamominuw_v_w,
+        gen_helper_vamomaxuw_v_w
+    };
+#ifdef TARGET_RISCV64
+    static gen_helper_amo *const fnsd[18] = {
+        gen_helper_vamoswapw_v_d,
+        gen_helper_vamoaddw_v_d,
+        gen_helper_vamoxorw_v_d,
+        gen_helper_vamoandw_v_d,
+        gen_helper_vamoorw_v_d,
+        gen_helper_vamominw_v_d,
+        gen_helper_vamomaxw_v_d,
+        gen_helper_vamominuw_v_d,
+        gen_helper_vamomaxuw_v_d,
+        gen_helper_vamoswapd_v_d,
+        gen_helper_vamoaddd_v_d,
+        gen_helper_vamoxord_v_d,
+        gen_helper_vamoandd_v_d,
+        gen_helper_vamoord_v_d,
+        gen_helper_vamomind_v_d,
+        gen_helper_vamomaxd_v_d,
+        gen_helper_vamominud_v_d,
+        gen_helper_vamomaxud_v_d
+    };
+#endif
+
+    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
+        gen_helper_exit_atomic(cpu_env);
+        s->base.is_jmp = DISAS_NORETURN;
+        return true;
+    } else {
+        fn = fnsw[seq];
+#ifdef TARGET_RISCV64
+        if (s->sew == 3) {
+            fn = fnsd[seq];
+        }
+#endif
+    }
+
+    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+    data = FIELD_DP32(data, VDATA, VM, a->vm);
+    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, WD, a->wd);
+    return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+/*
+ * There are two rules check here.
+ *
+ * 1. SEW must be at least as wide as the AMO memory element size.
+ *
+ * 2. If SEW is greater than XLEN, an illegal instruction exception is raised.
+ */
+static bool amo_check(DisasContext *s, arg_rwdvm* a)
+{
+    return (vext_check_isa_ill(s, RVV | RVA) &&
+            (!a->wd || vext_check_overlap_mask(s, a->rd, a->vm, false)) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((1 << s->sew) <= sizeof(target_ulong)) &&
+            ((1 << s->sew) >= 4));
+}
+
+GEN_VEXT_TRANS(vamoswapw_v, 0, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoaddw_v, 1, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoxorw_v, 2, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoandw_v, 3, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoorw_v, 4, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominw_v, 5, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxw_v, 6, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominuw_v, 7, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxuw_v, 8, rwdvm, amo_op, amo_check)
+#ifdef TARGET_RISCV64
+GEN_VEXT_TRANS(vamoswapd_v, 9, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoaddd_v, 10, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoxord_v, 11, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoandd_v, 12, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamoord_v, 13, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomind_v, 14, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
+GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
+#endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3841301b74..f9b409b169 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -94,6 +94,11 @@ static inline uint32_t vext_lmul(uint32_t desc)
     return FIELD_EX32(simd_data(desc), VDATA, LMUL);
 }
 
+static uint32_t vext_wd(uint32_t desc)
+{
+    return (simd_data(desc) >> 11) & 0x1;
+}
+
 /*
  * Get vector group length in bytes. Its range is [64, 2048].
  *
@@ -685,3 +690,141 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
 GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
 GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
 GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
+
+/*
+ *** Vector AMO Operations (Zvamo)
+ */
+typedef void (*vext_amo_noatomic_fn)(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
+
+/* no atomic opreation for vector atomic insructions */
+#define DO_SWAP(N, M) (M)
+#define DO_AND(N, M)  (N & M)
+#define DO_XOR(N, M)  (N ^ M)
+#define DO_OR(N, M)   (N | M)
+#define DO_ADD(N, M)  (N + M)
+
+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF) \
+static void vext_##NAME##_noatomic_op(void *vs3,                \
+            target_ulong addr, uint32_t wd, uint32_t idx,       \
+                CPURISCVState *env, uintptr_t retaddr)          \
+{                                                               \
+    typedef int##ESZ##_t ETYPE;                                 \
+    typedef int##MSZ##_t MTYPE;                                 \
+    typedef uint##MSZ##_t UMTYPE __attribute__((unused));       \
+    ETYPE *pe3 = (ETYPE *)vs3 + H(idx);                         \
+    MTYPE a = cpu_ld##SUF##_data(env, addr), b = *pe3;          \
+    cpu_st##SUF##_data(env, addr, DO_OP(a, b));                 \
+    if (wd) {                                                   \
+        *pe3 = a;                                               \
+    }                                                           \
+}
+
+/* Signed min/max */
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+
+/* Unsigned min/max */
+#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
+#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
+
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, 32, 32, H4, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  32, 32, H4, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  32, 32, H4, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  32, 32, H4, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   32, 32, H4, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  32, 32, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  32, 32, H4, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, 32, 32, H4, DO_MINU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, 32, 32, H4, DO_MAXU, l)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, 64, 32, H8, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, 64, 64, H8, DO_SWAP, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  64, 32, H8, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  64, 64, H8, DO_ADD,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  64, 32, H8, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  64, 64, H8, DO_XOR,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  64, 32, H8, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  64, 64, H8, DO_AND,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   64, 32, H8, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   64, 64, H8, DO_OR,   q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  64, 32, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  64, 64, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  64, 32, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  64, 64, H8, DO_MAX,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, 64, 32, H8, DO_MINU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, 64, 64, H8, DO_MINU, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
+#endif
+
+static inline void vext_amo_noatomic(void *vs3, void *v0, target_ulong base,
+        void *vs2, CPURISCVState *env, uint32_t desc,
+        vext_get_index_addr get_index_addr,
+        vext_amo_noatomic_fn noatomic_op,
+        vext_ld_clear_elem clear_elem,
+        uint32_t esz, uint32_t msz, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    uint32_t wd = vext_wd(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = vext_maxsz(desc) / esz;
+
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_LOAD);
+        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_STORE);
+    }
+    for (i = 0; i < env->vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        addr = get_index_addr(base, i, vs2);
+        noatomic_op(vs3, addr, wd, i, env, ra);
+    }
+    clear_elem(vs3, env->vl, env->vl * esz, vlmax * esz);
+}
+
+#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, INDEX_FN, CLEAR_FN)    \
+void HELPER(NAME)(void *vs3, void *v0, target_ulong base,       \
+        void *vs2, CPURISCVState *env, uint32_t desc)           \
+{                                                               \
+    vext_amo_noatomic(vs3, v0, base, vs2, env, desc,            \
+        INDEX_FN, vext_##NAME##_noatomic_op, CLEAR_FN,          \
+        sizeof(ETYPE), sizeof(MTYPE), GETPC());                 \
+}
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  idx_d, clearq)
+GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, idx_d, clearq)
+GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, idx_d, clearq)
+#endif
+GEN_VEXT_AMO(vamoswapw_v_w, int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoaddw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoxorw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoandw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamoorw_v_w,   int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
+GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
+GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
-- 
2.23.0
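
The helpers above emulate each element of a vector AMO as a plain load,
combine and store (there is no real atomicity on this path), with the wd
bit selecting whether the old memory value is written back into vd, and
amo_check additionally restricts SEW to 32 <= SEW <= XLEN.  Below is a
minimal stand-alone C sketch of that per-element behaviour for vamoadd.w
with SEW=64; the names and the direct-pointer memory model are
illustrative only, not QEMU APIs:

#include <stdint.h>
#include <stdio.h>

/* One element of a non-atomic vamoadd.w (32-bit memory, 64-bit vd). */
static void vamoaddw_elem(int32_t *mem, int64_t *vd_elem, int wd)
{
    int32_t old = *mem;                  /* old memory value               */
    *mem = old + (int32_t)*vd_elem;      /* memory <- OP(memory, vd value) */
    if (wd) {
        *vd_elem = old;                  /* vd <- sign-extended old value  */
    }
}

int main(void)
{
    int32_t mem = 5;
    int64_t vd = 3;

    vamoaddw_elem(&mem, &vd, 1);
    printf("mem=%d vd=%lld\n", mem, (long long)vd);  /* mem=8 vd=5 */
    return 0;
}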



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  21 +++
 target/riscv/insn32.decode              |  10 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 220 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 122 +++++++++++++
 4 files changed, 373 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 70a4b05f75..e73701d4bb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -269,3 +269,24 @@ DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 1330703720..d1034a0e61 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rmrr      vm rd rs1 rs2
 &rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
@@ -68,6 +69,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r_vm    ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd
 @r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
@@ -275,5 +277,13 @@ vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
 vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
 
 # *** new major opcode OP-V ***
+vadd_vv         000000 . ..... ..... 000 ..... 1010111 @r_vm
+vadd_vx         000000 . ..... ..... 100 ..... 1010111 @r_vm
+vadd_vi         000000 . ..... ..... 011 ..... 1010111 @r_vm
+vsub_vv         000010 . ..... ..... 000 ..... 1010111 @r_vm
+vsub_vx         000010 . ..... ..... 100 ..... 1010111 @r_vm
+vrsub_vx        000011 . ..... ..... 100 ..... 1010111 @r_vm
+vrsub_vi        000011 . ..... ..... 011 ..... 1010111 @r_vm
+
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3c677160c5..00c7ec976f 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -730,3 +730,223 @@ GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
 GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
 GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
 #endif
+
+/*
+ *** Vector Integer Arithmetic Instructions
+ */
+#define MAXSZ(s) (s->vlen >> (3 - s->lmul))
+
+static bool opivv_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false));
+}
+
+/* OPIVV with GVEC IR */
+#define GEN_OPIVV_GVEC_TRANS(NAME, GVSUF)                          \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (!opivv_check(s, a)) {                                      \
+        return false;                                              \
+    }                                                              \
+                                                                   \
+    if (a->vm && s->vl_eq_vlmax) {                                 \
+        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
+            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
+            MAXSZ(s), MAXSZ(s));                                   \
+    } else {                                                       \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[4] = {            \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,          \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,          \
+        };                                                         \
+                                                                   \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+    }                                                              \
+    return true;                                                   \
+}
+GEN_OPIVV_GVEC_TRANS(vadd_vv, add)
+GEN_OPIVV_GVEC_TRANS(vsub_vv, sub)
+
+typedef void (*gen_helper_opivx)(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_opivx fn, DisasContext *s)
+{
+    TCGv_ptr dest, src2, mask;
+    TCGv src1;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    src2 = tcg_temp_new_ptr();
+    src1 = tcg_temp_new();
+    gen_get_gpr(src1, rs1);
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, src1, src2, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(src2);
+    tcg_temp_free(src1);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool opivx_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false));
+}
+/* OPIVX with GVEC IR */
+#define GEN_OPIVX_GVEC_TRANS(NAME, GVSUF)                                     \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                        \
+{                                                                             \
+    if (!opivx_check(s, a)) {                                                 \
+        return false;                                                         \
+    }                                                                         \
+                                                                              \
+    if (a->vm && s->vl_eq_vlmax) {                                            \
+        TCGv_i64 src1 = tcg_temp_new_i64();                                   \
+        TCGv tmp = tcg_temp_new();                                            \
+        gen_get_gpr(tmp, a->rs1);                                             \
+        tcg_gen_ext_tl_i64(src1, tmp);                                        \
+        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),                 \
+            vreg_ofs(s, a->rs2), src1, MAXSZ(s), MAXSZ(s));                   \
+        tcg_temp_free_i64(src1);                                              \
+        tcg_temp_free(tmp);                                                   \
+        return true;                                                          \
+    } else {                                                                  \
+        uint32_t data = 0;                                                    \
+        static gen_helper_opivx const fns[4] = {                              \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                     \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                     \
+        };                                                                    \
+                                                                              \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s);      \
+    }                                                                         \
+    return true;                                                              \
+}
+GEN_OPIVX_GVEC_TRANS(vadd_vx, adds)
+GEN_OPIVX_GVEC_TRANS(vsub_vx, subs)
+
+/* OPIVX without GVEC IR */
+#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
+        };                                                               \
+                                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+
+GEN_OPIVX_TRANS(vrsub_vx, opivx_check)
+
+static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2,
+        uint32_t data, gen_helper_opivx fn, DisasContext *s, int zx)
+{
+    TCGv_ptr dest, src2, mask;
+    TCGv src1;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    src2 = tcg_temp_new_ptr();
+    if (zx) {
+        src1 = tcg_const_tl(imm);
+    } else {
+        src1 = tcg_const_tl(sextract64(imm, 0, 5));
+    }
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, src1, src2, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(src2);
+    tcg_temp_free(src1);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+/* OPIVI with GVEC IR */
+#define GEN_OPIVI_GVEC_TRANS(NAME, ZX, OPIVX, GVSUF)                 \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)               \
+{                                                                    \
+    if (!opivx_check(s, a)) {                                        \
+        return false;                                                \
+    }                                                                \
+                                                                     \
+    if (a->vm && s->vl_eq_vlmax) {                                   \
+        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),        \
+            vreg_ofs(s, a->rs2), sextract64(a->rs1, 0, 5),           \
+            MAXSZ(s), MAXSZ(s));                                     \
+        return true;                                                 \
+    } else {                                                         \
+        uint32_t data = 0;                                           \
+        static gen_helper_opivx const fns[4] = {                     \
+            gen_helper_##OPIVX##_b, gen_helper_##OPIVX##_h,          \
+            gen_helper_##OPIVX##_w, gen_helper_##OPIVX##_d,          \
+        };                                                           \
+                                                                     \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);               \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                   \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);               \
+        return opivi_trans(a->rd, a->rs1, a->rs2, data,              \
+                fns[s->sew], s, ZX);                                 \
+    }                                                                \
+    return true;                                                     \
+}
+GEN_OPIVI_GVEC_TRANS(vadd_vi, 0, vadd_vx, addi)
+
+/* OPIVI without GVEC IR */
+#define GEN_OPIVI_TRANS(NAME, ZX, OPIVX, CHECK)                          \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##OPIVX##_b, gen_helper_##OPIVX##_h,              \
+            gen_helper_##OPIVX##_w, gen_helper_##OPIVX##_d,              \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivi_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew], s, ZX);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVI_TRANS(vrsub_vi, 0, vrsub_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f9b409b169..abdf3b82a8 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -828,3 +828,125 @@ GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
 GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
 GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
+
+/*
+ *** Vector Integer Arithmetic Instructions
+ */
+
+/* expand macro args before macro */
+#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
+
+/* (TD, T1, T2, TX1, TX2) */
+#define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
+#define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
+#define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
+#define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
+
+/* operation of two vector elements */
+#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
+{                                                               \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
+    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
+}
+#define DO_SUB(N, M) (N - M)
+#define DO_RSUB(N, M) (M - N)
+
+RVVCALL(OPIVV2, vadd_vv_b, OP_SSS_B, H1, H1, H1, DO_ADD)
+RVVCALL(OPIVV2, vadd_vv_h, OP_SSS_H, H2, H2, H2, DO_ADD)
+RVVCALL(OPIVV2, vadd_vv_w, OP_SSS_W, H4, H4, H4, DO_ADD)
+RVVCALL(OPIVV2, vadd_vv_d, OP_SSS_D, H8, H8, H8, DO_ADD)
+RVVCALL(OPIVV2, vsub_vv_b, OP_SSS_B, H1, H1, H1, DO_SUB)
+RVVCALL(OPIVV2, vsub_vv_h, OP_SSS_H, H2, H2, H2, DO_SUB)
+RVVCALL(OPIVV2, vsub_vv_w, OP_SSS_W, H4, H4, H4, DO_SUB)
+RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB)
+
+/* generate the helpers for OPIVV */
+#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
+void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        do_##NAME(vd, vs1, vs2, i);                       \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
+    }                                                     \
+}
+
+GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)
+
+/*
+ * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
+ * So (target_long)s1 is needed. (T1)(target_long)s1 gives the real operand
+ * type. (TX1)(T1)(target_long)s1 expands the operand type for widening
+ * or narrowing operations.
+ */
+#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
+static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
+{                                                                   \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
+}
+RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
+RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
+RVVCALL(OPIVX2, vadd_vx_w, OP_SSS_W, H4, H4, DO_ADD)
+RVVCALL(OPIVX2, vadd_vx_d, OP_SSS_D, H8, H8, DO_ADD)
+RVVCALL(OPIVX2, vsub_vx_b, OP_SSS_B, H1, H1, DO_SUB)
+RVVCALL(OPIVX2, vsub_vx_h, OP_SSS_H, H2, H2, DO_SUB)
+RVVCALL(OPIVX2, vsub_vx_w, OP_SSS_W, H4, H4, DO_SUB)
+RVVCALL(OPIVX2, vsub_vx_d, OP_SSS_D, H8, H8, DO_SUB)
+RVVCALL(OPIVX2, vrsub_vx_b, OP_SSS_B, H1, H1, DO_RSUB)
+RVVCALL(OPIVX2, vrsub_vx_h, OP_SSS_H, H2, H2, DO_RSUB)
+RVVCALL(OPIVX2, vrsub_vx_w, OP_SSS_W, H4, H4, DO_RSUB)
+RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB)
+
+/* generate the helpers for instructions with one vector and one scalar */
+#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+                                                          \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        do_##NAME(vd, s1, vs2, i);                        \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
+    }                                                     \
+}
+GEN_VEXT_VX(vadd_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vadd_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vadd_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vadd_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vsub_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vsub_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vsub_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vsub_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vrsub_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vrsub_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vrsub_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vrsub_vx_d, 8, 8, clearq)
-- 
2.23.0
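
When vm=1 and vl equals vlmax, the translation above emits inline gvec
operations; otherwise it falls back to the per-element helpers generated by
GEN_VEXT_VV and GEN_VEXT_VX.  The following stand-alone C sketch (not part
of the patch) shows the loop shape those helpers expand to for vadd.vv with
SEW=32: masked-off elements are left unchanged and tail elements up to
vlmax are cleared.  The one-bit-per-element mask layout and the zeroing
clear policy are simplified stand-ins for vext_elem_mask() and CLEAR_FN:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

static void vadd_vv_w_sketch(int32_t *vd, const int32_t *vs1,
                             const int32_t *vs2, const uint8_t *mask,
                             int vm, uint32_t vl, uint32_t vlmax)
{
    for (uint32_t i = 0; i < vl; i++) {
        if (!vm && !(mask[i / 8] & (1u << (i % 8)))) {
            continue;                     /* masked-off element, unchanged */
        }
        vd[i] = vs2[i] + vs1[i];
    }
    if (vl != 0) {                        /* clear the tail, as CLEAR_FN does */
        memset(&vd[vl], 0, (vlmax - vl) * sizeof(vd[0]));
    }
}

int main(void)
{
    int32_t vd[4] = {0}, vs1[4] = {1, 2, 3, 4}, vs2[4] = {10, 20, 30, 40};
    uint8_t mask[1] = {0x05};             /* elements 0 and 2 are active */

    vadd_vv_w_sketch(vd, vs1, vs2, mask, 0, 3, 4);
    printf("%d %d %d %d\n", vd[0], vd[1], vd[2], vd[3]);  /* 11 0 33 0 */
    return 0;
}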



^ permalink raw reply related	[flat|nested] 336+ messages in thread
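
The cast chain in OPIVX2, (TX1)(T1)(target_long)s1, is what implements the
"sign-extend the scalar to SEW bits" rule described in the comment above.
A tiny stand-alone illustration for vadd.vx with SEW=64 on a 32-bit guest
(XLEN=32, so target_long is 32 bits); the types here are simplified
stand-ins, not the QEMU definitions:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* x[rs1] holds -1; on RV32, (target_long)s1 gives this 32-bit value. */
    int32_t s1_as_target_long = -1;

    /* (TX1)(T1)(target_long)s1 with T1 = TX1 = int64_t: widen to SEW. */
    int64_t scalar = s1_as_target_long;   /* sign-extends to all ones */

    int64_t elem = 10;                    /* one SEW=64 element of vs2 */
    printf("%lld\n", (long long)(elem + scalar));   /* vadd.vx result: 9 */
    return 0;
}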

* [PATCH v5 10/60] target/riscv: vector widening integer add and subtract
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  49 ++++++++
 target/riscv/insn32.decode              |  16 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 154 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 112 +++++++++++++++++
 4 files changed, 331 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e73701d4bb..1256defb6c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -290,3 +290,52 @@ DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwaddu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d1034a0e61..4bdbfd16fa 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -284,6 +284,22 @@ vsub_vv         000010 . ..... ..... 000 ..... 1010111 @r_vm
 vsub_vx         000010 . ..... ..... 100 ..... 1010111 @r_vm
 vrsub_vx        000011 . ..... ..... 100 ..... 1010111 @r_vm
 vrsub_vi        000011 . ..... ..... 011 ..... 1010111 @r_vm
+vwaddu_vv       110000 . ..... ..... 010 ..... 1010111 @r_vm
+vwaddu_vx       110000 . ..... ..... 110 ..... 1010111 @r_vm
+vwadd_vv        110001 . ..... ..... 010 ..... 1010111 @r_vm
+vwadd_vx        110001 . ..... ..... 110 ..... 1010111 @r_vm
+vwsubu_vv       110010 . ..... ..... 010 ..... 1010111 @r_vm
+vwsubu_vx       110010 . ..... ..... 110 ..... 1010111 @r_vm
+vwsub_vv        110011 . ..... ..... 010 ..... 1010111 @r_vm
+vwsub_vx        110011 . ..... ..... 110 ..... 1010111 @r_vm
+vwaddu_wv       110100 . ..... ..... 010 ..... 1010111 @r_vm
+vwaddu_wx       110100 . ..... ..... 110 ..... 1010111 @r_vm
+vwadd_wv        110101 . ..... ..... 010 ..... 1010111 @r_vm
+vwadd_wx        110101 . ..... ..... 110 ..... 1010111 @r_vm
+vwsubu_wv       110110 . ..... ..... 010 ..... 1010111 @r_vm
+vwsubu_wx       110110 . ..... ..... 110 ..... 1010111 @r_vm
+vwsub_wv        110111 . ..... ..... 010 ..... 1010111 @r_vm
+vwsub_wx        110111 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 00c7ec976f..7f6fe82fb3 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -122,6 +122,14 @@ static bool vext_check_nf(DisasContext *s, uint32_t nf)
     return (1 << s->lmul) * nf <= 8;
 }
 
+/*
+ * The destination vector register group cannot overlap a source vector register
+ * group of a different element width. (Section 11.2)
+ */
+static inline bool vext_check_overlap_group(int rd, int dlen, int rs, int slen)
+{
+    return ((rd >= rs + slen) || (rs >= rd + dlen));
+}
 /* common translation macro */
 #define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
 static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
@@ -950,3 +958,149 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
     return false;                                                        \
 }
 GEN_OPIVI_TRANS(vrsub_vi, 0, vrsub_vx, opivx_check)
+
+/* Vector Widening Integer Add/Subtract */
+
+/* OPIVV with WIDEN */
+static bool opivv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+/* OPIVV with WIDEN */
+#define GEN_OPIVV_WIDEN_TRANS(NAME, CHECK)                       \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (CHECK(s, a)) {                                           \
+        uint32_t data = 0;                                       \
+        static gen_helper_gvec_4_ptr * const fns[3] = {          \
+            gen_helper_##NAME##_b,                               \
+            gen_helper_##NAME##_h,                               \
+            gen_helper_##NAME##_w                                \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),   \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),            \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);         \
+        return true;                                             \
+    }                                                            \
+    return false;                                                \
+}
+
+GEN_OPIVV_WIDEN_TRANS(vwaddu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwadd_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsubu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsub_vv, opivv_widen_check)
+
+/* OPIVX with WIDEN */
+static bool opivx_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+#define GEN_OPIVX_WIDEN_TRANS(NAME)                                      \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opivx_widen_check(s, a)) {                                       \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[3] = {                         \
+            gen_helper_##NAME##_b,                                       \
+            gen_helper_##NAME##_h,                                       \
+            gen_helper_##NAME##_w                                        \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVX_WIDEN_TRANS(vwaddu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwadd_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsubu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsub_vx)
+
+/* WIDEN OPIVV with WIDEN */
+static bool opiwv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+
+#define GEN_OPIWV_WIDEN_TRANS(NAME)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (opiwv_widen_check(s, a)) {                                 \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[3] = {            \
+            gen_helper_##NAME##_b,                                 \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w                                  \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPIWV_WIDEN_TRANS(vwaddu_wv)
+GEN_OPIWV_WIDEN_TRANS(vwadd_wv)
+GEN_OPIWV_WIDEN_TRANS(vwsubu_wv)
+GEN_OPIWV_WIDEN_TRANS(vwsub_wv)
+
+/* WIDEN OPIVX with WIDEN */
+static bool opiwx_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+
+#define GEN_OPIWX_WIDEN_TRANS(NAME)                                      \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opiwx_widen_check(s, a)) {                                       \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[3] = {                         \
+            gen_helper_##NAME##_b,                                       \
+            gen_helper_##NAME##_h,                                       \
+            gen_helper_##NAME##_w                                        \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIWX_WIDEN_TRANS(vwaddu_wx)
+GEN_OPIWX_WIDEN_TRANS(vwadd_wx)
+GEN_OPIWX_WIDEN_TRANS(vwsubu_wx)
+GEN_OPIWX_WIDEN_TRANS(vwsub_wx)
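
A standalone sketch (not part of the patch; register numbers and LMUL below are made up) of the range test that vext_check_overlap_group() performs for the widening forms above: the destination group spans 2 << lmul registers while each source group spans 1 << lmul, and the two half-open ranges must not intersect. The extra (s->lmul < 0x3) && (s->sew < 0x3) terms in the check functions leave room for that doubling, since a widening op produces 2 * SEW results in a register group twice as large.

#include <stdbool.h>
#include <stdio.h>

/* Same disjointness test as vext_check_overlap_group(), shown on its own.
 * [rd, rd + dlen) is the destination group, [rs, rs + slen) a source group. */
static bool overlap_group_ok(int rd, int dlen, int rs, int slen)
{
    return (rd >= rs + slen) || (rs >= rd + dlen);
}

int main(void)
{
    /* s->lmul == 1 (LMUL = 2): a widening destination spans 2 << 1 = 4
     * registers, each source spans 1 << 1 = 2 (illustrative numbers only) */
    printf("%d\n", overlap_group_ok(8, 4, 4, 2));   /* 1: v8..v11 vs v4..v5 is legal    */
    printf("%d\n", overlap_group_ok(8, 4, 10, 2));  /* 0: v10..v11 sits inside v8..v11  */
    return 0;
}
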
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index abdf3b82a8..00eaebee9f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -950,3 +950,115 @@ GEN_VEXT_VX(vrsub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vrsub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vrsub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vrsub_vx_d, 8, 8, clearq)
+
+/* Vector Widening Integer Add/Subtract */
+#define WOP_UUU_B uint16_t, uint8_t, uint8_t, uint16_t, uint16_t
+#define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t
+#define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t
+#define WOP_SSS_B int16_t, int8_t, int8_t, int16_t, int16_t
+#define WOP_SSS_H int32_t, int16_t, int16_t, int32_t, int32_t
+#define WOP_SSS_W int64_t, int32_t, int32_t, int64_t, int64_t
+#define WOP_WUUU_B  uint16_t, uint8_t, uint16_t, uint16_t, uint16_t
+#define WOP_WUUU_H  uint32_t, uint16_t, uint32_t, uint32_t, uint32_t
+#define WOP_WUUU_W  uint64_t, uint32_t, uint64_t, uint64_t, uint64_t
+#define WOP_WSSS_B  int16_t, int8_t, int16_t, int16_t, int16_t
+#define WOP_WSSS_H  int32_t, int16_t, int32_t, int32_t, int32_t
+#define WOP_WSSS_W  int64_t, int32_t, int64_t, int64_t, int64_t
+RVVCALL(OPIVV2, vwaddu_vv_b, WOP_UUU_B, H2, H1, H1, DO_ADD)
+RVVCALL(OPIVV2, vwaddu_vv_h, WOP_UUU_H, H4, H2, H2, DO_ADD)
+RVVCALL(OPIVV2, vwaddu_vv_w, WOP_UUU_W, H8, H4, H4, DO_ADD)
+RVVCALL(OPIVV2, vwsubu_vv_b, WOP_UUU_B, H2, H1, H1, DO_SUB)
+RVVCALL(OPIVV2, vwsubu_vv_h, WOP_UUU_H, H4, H2, H2, DO_SUB)
+RVVCALL(OPIVV2, vwsubu_vv_w, WOP_UUU_W, H8, H4, H4, DO_SUB)
+RVVCALL(OPIVV2, vwadd_vv_b, WOP_SSS_B, H2, H1, H1, DO_ADD)
+RVVCALL(OPIVV2, vwadd_vv_h, WOP_SSS_H, H4, H2, H2, DO_ADD)
+RVVCALL(OPIVV2, vwadd_vv_w, WOP_SSS_W, H8, H4, H4, DO_ADD)
+RVVCALL(OPIVV2, vwsub_vv_b, WOP_SSS_B, H2, H1, H1, DO_SUB)
+RVVCALL(OPIVV2, vwsub_vv_h, WOP_SSS_H, H4, H2, H2, DO_SUB)
+RVVCALL(OPIVV2, vwsub_vv_w, WOP_SSS_W, H8, H4, H4, DO_SUB)
+RVVCALL(OPIVV2, vwaddu_wv_b, WOP_WUUU_B, H2, H1, H1, DO_ADD)
+RVVCALL(OPIVV2, vwaddu_wv_h, WOP_WUUU_H, H4, H2, H2, DO_ADD)
+RVVCALL(OPIVV2, vwaddu_wv_w, WOP_WUUU_W, H8, H4, H4, DO_ADD)
+RVVCALL(OPIVV2, vwsubu_wv_b, WOP_WUUU_B, H2, H1, H1, DO_SUB)
+RVVCALL(OPIVV2, vwsubu_wv_h, WOP_WUUU_H, H4, H2, H2, DO_SUB)
+RVVCALL(OPIVV2, vwsubu_wv_w, WOP_WUUU_W, H8, H4, H4, DO_SUB)
+RVVCALL(OPIVV2, vwadd_wv_b, WOP_WSSS_B, H2, H1, H1, DO_ADD)
+RVVCALL(OPIVV2, vwadd_wv_h, WOP_WSSS_H, H4, H2, H2, DO_ADD)
+RVVCALL(OPIVV2, vwadd_wv_w, WOP_WSSS_W, H8, H4, H4, DO_ADD)
+RVVCALL(OPIVV2, vwsub_wv_b, WOP_WSSS_B, H2, H1, H1, DO_SUB)
+RVVCALL(OPIVV2, vwsub_wv_h, WOP_WSSS_H, H4, H2, H2, DO_SUB)
+RVVCALL(OPIVV2, vwsub_wv_w, WOP_WSSS_W, H8, H4, H4, DO_SUB)
+GEN_VEXT_VV(vwaddu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwaddu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwaddu_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwsubu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwsubu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwsubu_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwadd_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwadd_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwadd_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwsub_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwsub_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwsub_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwaddu_wv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwaddu_wv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwaddu_wv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwsubu_wv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwsubu_wv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwsubu_wv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwadd_wv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwadd_wv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwadd_wv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwsub_wv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwsub_wv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwsub_wv_w, 4, 8, clearq)
+
+RVVCALL(OPIVX2, vwaddu_vx_b, WOP_UUU_B, H2, H1, DO_ADD)
+RVVCALL(OPIVX2, vwaddu_vx_h, WOP_UUU_H, H4, H2, DO_ADD)
+RVVCALL(OPIVX2, vwaddu_vx_w, WOP_UUU_W, H8, H4, DO_ADD)
+RVVCALL(OPIVX2, vwsubu_vx_b, WOP_UUU_B, H2, H1, DO_SUB)
+RVVCALL(OPIVX2, vwsubu_vx_h, WOP_UUU_H, H4, H2, DO_SUB)
+RVVCALL(OPIVX2, vwsubu_vx_w, WOP_UUU_W, H8, H4, DO_SUB)
+RVVCALL(OPIVX2, vwadd_vx_b, WOP_SSS_B, H2, H1, DO_ADD)
+RVVCALL(OPIVX2, vwadd_vx_h, WOP_SSS_H, H4, H2, DO_ADD)
+RVVCALL(OPIVX2, vwadd_vx_w, WOP_SSS_W, H8, H4, DO_ADD)
+RVVCALL(OPIVX2, vwsub_vx_b, WOP_SSS_B, H2, H1, DO_SUB)
+RVVCALL(OPIVX2, vwsub_vx_h, WOP_SSS_H, H4, H2, DO_SUB)
+RVVCALL(OPIVX2, vwsub_vx_w, WOP_SSS_W, H8, H4, DO_SUB)
+RVVCALL(OPIVX2, vwaddu_wx_b, WOP_WUUU_B, H2, H1, DO_ADD)
+RVVCALL(OPIVX2, vwaddu_wx_h, WOP_WUUU_H, H4, H2, DO_ADD)
+RVVCALL(OPIVX2, vwaddu_wx_w, WOP_WUUU_W, H8, H4, DO_ADD)
+RVVCALL(OPIVX2, vwsubu_wx_b, WOP_WUUU_B, H2, H1, DO_SUB)
+RVVCALL(OPIVX2, vwsubu_wx_h, WOP_WUUU_H, H4, H2, DO_SUB)
+RVVCALL(OPIVX2, vwsubu_wx_w, WOP_WUUU_W, H8, H4, DO_SUB)
+RVVCALL(OPIVX2, vwadd_wx_b, WOP_WSSS_B, H2, H1, DO_ADD)
+RVVCALL(OPIVX2, vwadd_wx_h, WOP_WSSS_H, H4, H2, DO_ADD)
+RVVCALL(OPIVX2, vwadd_wx_w, WOP_WSSS_W, H8, H4, DO_ADD)
+RVVCALL(OPIVX2, vwsub_wx_b, WOP_WSSS_B, H2, H1, DO_SUB)
+RVVCALL(OPIVX2, vwsub_wx_h, WOP_WSSS_H, H4, H2, DO_SUB)
+RVVCALL(OPIVX2, vwsub_wx_w, WOP_WSSS_W, H8, H4, DO_SUB)
+GEN_VEXT_VX(vwaddu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwaddu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwaddu_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwsubu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwsubu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwsubu_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwadd_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwadd_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwadd_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwsub_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwsub_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwsub_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwaddu_wx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwaddu_wx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwaddu_wx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwsubu_wx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwsubu_wx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwsubu_wx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwadd_wx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwadd_wx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwadd_wx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwsub_wx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwsub_wx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwsub_wx_w, 4, 8, clearq)
+
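
As a hedged illustration of what the WOP_* type lists buy: each list names the destination element type, the two source element types, and the widened types the sources are converted to before DO_ADD/DO_SUB runs, so the signed and unsigned variants differ only in how the narrow operands are extended. A minimal sketch of one element of vwadd.vv versus vwaddu.vv at SEW=8, written in plain C outside the patch's macro machinery:

#include <stdint.h>
#include <stdio.h>

/* Hand-expanded single-element versions of what OPIVV2 generates from
 * WOP_SSS_B (signed) and WOP_UUU_B (unsigned) with DO_ADD. */
static int16_t  wadd_sss_b(int8_t a, int8_t b)   { return (int16_t)a + (int16_t)b; }
static uint16_t wadd_uuu_b(uint8_t a, uint8_t b) { return (uint16_t)a + (uint16_t)b; }

int main(void)
{
    /* 0x80 is -128 as int8_t but 128 as uint8_t, so the widened sums differ */
    printf("%d\n", wadd_sss_b((int8_t)0x80, 1));   /* -127 */
    printf("%d\n", wadd_uuu_b(0x80, 1));           /* 129  */
    return 0;
}
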
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  33 ++++++
 target/riscv/insn32.decode              |  10 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 108 ++++++++++++++++++
 target/riscv/vector_helper.c            | 140 ++++++++++++++++++++++++
 4 files changed, 291 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1256defb6c..72c733bf49 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -339,3 +339,36 @@ DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4bdbfd16fa..e8ddf95d3d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -300,6 +300,16 @@ vwsubu_wv       110110 . ..... ..... 010 ..... 1010111 @r_vm
 vwsubu_wx       110110 . ..... ..... 110 ..... 1010111 @r_vm
 vwsub_wv        110111 . ..... ..... 010 ..... 1010111 @r_vm
 vwsub_wx        110111 . ..... ..... 110 ..... 1010111 @r_vm
+vadc_vvm        010000 1 ..... ..... 000 ..... 1010111 @r
+vadc_vxm        010000 1 ..... ..... 100 ..... 1010111 @r
+vadc_vim        010000 1 ..... ..... 011 ..... 1010111 @r
+vmadc_vvm       010001 1 ..... ..... 000 ..... 1010111 @r
+vmadc_vxm       010001 1 ..... ..... 100 ..... 1010111 @r
+vmadc_vim       010001 1 ..... ..... 011 ..... 1010111 @r
+vsbc_vvm        010010 1 ..... ..... 000 ..... 1010111 @r
+vsbc_vxm        010010 1 ..... ..... 100 ..... 1010111 @r
+vmsbc_vvm       010011 1 ..... ..... 000 ..... 1010111 @r
+vmsbc_vxm       010011 1 ..... ..... 100 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7f6fe82fb3..a1f2e84eb8 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1104,3 +1104,111 @@ GEN_OPIWX_WIDEN_TRANS(vwaddu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwadd_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsubu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsub_wx)
+
+/* OPIVV with UNMASKED */
+#define GEN_OPIVV_R_TRANS(NAME, CHECK)                             \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
+{                                                                  \
+    if (CHECK(s, a)) {                                             \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[4] = {            \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,          \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,          \
+        };                                                         \
+                                                                   \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+/*
+ * For vadc and vsbc, an illegal instruction exception is raised if the
+ * destination vector register is v0 and LMUL > 1. (Section 12.3)
+ */
+static bool opivv_vadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            ((a->rd != 0) || (s->lmul == 0)));
+}
+GEN_OPIVV_R_TRANS(vadc_vvm, opivv_vadc_check)
+GEN_OPIVV_R_TRANS(vsbc_vvm, opivv_vadc_check)
+
+/*
+ * For vmadc and vmsbc, an illegal instruction exception is raised if the
+ * destination vector register overlaps a source vector register group.
+ */
+static bool opivv_vmadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 1, a->rs1, 1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul));
+}
+GEN_OPIVV_R_TRANS(vmadc_vvm, opivv_vmadc_check)
+GEN_OPIVV_R_TRANS(vmsbc_vvm, opivv_vmadc_check)
+
+/* OPIVX with UNMASKED */
+#define GEN_OPIVX_R_TRANS(NAME, CHECK)                                   \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
+        };                                                               \
+                                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+
+static bool opivx_vadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((a->rd != 0) || (s->lmul == 0)));
+}
+GEN_OPIVX_R_TRANS(vadc_vxm, opivx_vadc_check)
+GEN_OPIVX_R_TRANS(vsbc_vxm, opivx_vadc_check)
+
+static bool opivx_vmadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul));
+}
+GEN_OPIVX_R_TRANS(vmadc_vxm, opivx_vmadc_check)
+GEN_OPIVX_R_TRANS(vmsbc_vxm, opivx_vmadc_check)
+
+/* OPIVI without GVEC IR */
+#define GEN_OPIVI_R_TRANS(NAME, ZX, OPIVX, CHECK)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##OPIVX##_b, gen_helper_##OPIVX##_h,              \
+            gen_helper_##OPIVX##_w, gen_helper_##OPIVX##_d,              \
+        };                                                               \
+                                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivi_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew], s, ZX);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVI_R_TRANS(vadc_vim, 0, vadc_vxm, opivx_vadc_check)
+GEN_OPIVI_R_TRANS(vmadc_vim, 0, vmadc_vxm, opivx_vmadc_check)
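
A brief sketch (made-up register numbers, not from the patch) of the two legality predicates above: vadc/vsbc may target v0 only when LMUL is 1, since a larger destination group would overlap v0, which supplies the carry/borrow bits, while vmadc/vmsbc write a single mask register that must not land inside either source group.

#include <stdbool.h>
#include <stdio.h>

/* lmul is the encoded field from DisasContext (0 means LMUL = 1). */
static bool vadc_dest_ok(int rd, int lmul)
{
    return (rd != 0) || (lmul == 0);
}

static bool overlap_group_ok(int rd, int dlen, int rs, int slen)
{
    return (rd >= rs + slen) || (rs >= rd + dlen);
}

int main(void)
{
    printf("%d\n", vadc_dest_ok(0, 0));            /* 1: vadc to v0 is fine at LMUL=1  */
    printf("%d\n", vadc_dest_ok(0, 2));            /* 0: vadc to v0 illegal at LMUL=4  */
    /* vmadc produces one mask register, so its dlen is 1 whatever LMUL is */
    printf("%d\n", overlap_group_ok(3, 1, 4, 4));  /* 1: v3 vs v4..v7       */
    printf("%d\n", overlap_group_ok(5, 1, 4, 4));  /* 0: v5 inside v4..v7   */
    return 0;
}
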
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00eaebee9f..dd85b94fe7 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -185,6 +185,14 @@ static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
     vext_clear(cur, cnt, tot);
 }
 
+static inline void vext_set_elem_mask(void *v0, int mlen, int index,
+        uint8_t value)
+{
+    int idx = (index * mlen) / 64;
+    int pos = (index * mlen) % 64;
+    uint64_t old = ((uint64_t *)v0)[idx];
+    ((uint64_t *)v0)[idx] = deposit64(old, pos, mlen, value);
+}
 
 static inline int vext_elem_mask(void *v0, int mlen, int index)
 {
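
The new vext_set_elem_mask() mirrors the existing vext_elem_mask() read path: with MLEN bits of mask per element (MLEN = SEW/LMUL in v0.7.1), element i lives at bit offset i * mlen inside the packed v0 buffer, and deposit64() rewrites just those bits. A quick standalone sketch of the index arithmetic, with values chosen purely for illustration:

#include <stdio.h>

int main(void)
{
    /* mlen = 8 corresponds to SEW=8, LMUL=1 under the 0.7.1 mask layout */
    int mlen = 8, index = 9;
    int idx = (index * mlen) / 64;   /* which uint64_t word of v0    */
    int pos = (index * mlen) % 64;   /* bit offset inside that word  */
    printf("idx=%d pos=%d\n", idx, pos);   /* prints "idx=1 pos=8" */
    return 0;
}
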
@@ -1062,3 +1070,135 @@ GEN_VEXT_VX(vwsub_wx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwsub_wx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwsub_wx_w, 4, 8, clearq)
 
+#define DO_VADC(N, M, C) (N + M + C)
+#define DO_VSBC(N, M, C) (N - M - C)
+
+#define GEN_VEXT_VADC_VVM(NAME, ETYPE, H, DO_OP, CLEAR_FN)    \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t esz = sizeof(ETYPE);                             \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                  \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        *((ETYPE *)vd + H(i)) = DO_OP(s2, s1, carry);         \
+    }                                                         \
+    if (i != 0) {                                             \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);              \
+    }                                                         \
+}
+GEN_VEXT_VADC_VVM(vadc_vvm_b, uint8_t,  H1, DO_VADC, clearb)
+GEN_VEXT_VADC_VVM(vadc_vvm_h, uint16_t, H2, DO_VADC, clearh)
+GEN_VEXT_VADC_VVM(vadc_vvm_w, uint32_t, H4, DO_VADC, clearl)
+GEN_VEXT_VADC_VVM(vadc_vvm_d, uint64_t, H8, DO_VADC, clearq)
+
+GEN_VEXT_VADC_VVM(vsbc_vvm_b, uint8_t,  H1, DO_VSBC, clearb)
+GEN_VEXT_VADC_VVM(vsbc_vvm_h, uint16_t, H2, DO_VSBC, clearh)
+GEN_VEXT_VADC_VVM(vsbc_vvm_w, uint32_t, H4, DO_VSBC, clearl)
+GEN_VEXT_VADC_VVM(vsbc_vvm_d, uint64_t, H8, DO_VSBC, clearq)
+
+#define GEN_VEXT_VADC_VXM(NAME, ETYPE, H, DO_OP, CLEAR_FN)               \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
+        CPURISCVState *env, uint32_t desc)                               \
+{                                                                        \
+    uint32_t mlen = vext_mlen(desc);                                     \
+    uint32_t vl = env->vl;                                               \
+    uint32_t esz = sizeof(ETYPE);                                        \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                             \
+    uint32_t i;                                                          \
+                                                                         \
+    for (i = 0; i < vl; i++) {                                           \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                               \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);                     \
+                                                                         \
+        *((ETYPE *)vd + H(i)) = DO_OP(s2, (ETYPE)(target_long)s1, carry);\
+    }                                                                    \
+    if (i != 0) {                                                        \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                         \
+    }                                                                    \
+}
+GEN_VEXT_VADC_VXM(vadc_vxm_b, uint8_t,  H1, DO_VADC, clearb)
+GEN_VEXT_VADC_VXM(vadc_vxm_h, uint16_t, H2, DO_VADC, clearh)
+GEN_VEXT_VADC_VXM(vadc_vxm_w, uint32_t, H4, DO_VADC, clearl)
+GEN_VEXT_VADC_VXM(vadc_vxm_d, uint64_t, H8, DO_VADC, clearq)
+
+GEN_VEXT_VADC_VXM(vsbc_vxm_b, uint8_t,  H1, DO_VSBC, clearb)
+GEN_VEXT_VADC_VXM(vsbc_vxm_h, uint16_t, H2, DO_VSBC, clearh)
+GEN_VEXT_VADC_VXM(vsbc_vxm_w, uint32_t, H4, DO_VSBC, clearl)
+GEN_VEXT_VADC_VXM(vsbc_vxm_d, uint64_t, H8, DO_VSBC, clearq)
+
+#define DO_MADC(N, M, C) (C ? (__typeof(N))(N + M + 1) <= N : (__typeof(N))(N + M) < N)
+#define DO_MSBC(N, M, C) (C ? (__typeof(N))(N - M - 1) >= N : (__typeof(N))(N - M) > N)
+
+#define GEN_VEXT_VMADC_VVM(NAME, ETYPE, H, DO_OP)             \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        vext_set_elem_mask(vd, mlen, i, DO_OP(s2, s1, carry));\
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+GEN_VEXT_VMADC_VVM(vmadc_vvm_b, uint8_t,  H1, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_h, uint16_t, H2, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_w, uint32_t, H4, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_d, uint64_t, H8, DO_MADC)
+
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_b, uint8_t,  H1, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_h, uint16_t, H2, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_w, uint32_t, H4, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_d, uint64_t, H8, DO_MSBC)
+
+#define GEN_VEXT_VMADC_VXM(NAME, ETYPE, H, DO_OP)             \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,        \
+        void *vs2, CPURISCVState *env, uint32_t desc)         \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        vext_set_elem_mask(vd, mlen, i,                       \
+                DO_OP(s2, (ETYPE)(target_long)s1, carry));    \
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+GEN_VEXT_VMADC_VXM(vmadc_vxm_b, uint8_t,  H1, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_h, uint16_t, H2, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_w, uint32_t, H4, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_d, uint64_t, H8, DO_MADC)
+
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_b, uint8_t,  H1, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_h, uint16_t, H2, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_w, uint32_t, H4, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_d, uint64_t, H8, DO_MSBC)
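
The mask-producing forms recover the carry/borrow-out purely from the truncated result: an unsigned sum that wrapped is smaller than either addend, and the carry-in case is compared with <= (respectively >=) so that adding the all-ones value plus a carry is still caught. A standalone 8-bit sketch of the same comparison trick (not part of the patch):

#include <stdint.h>
#include <stdio.h>

/* Carry-out of an 8-bit add-with-carry, recovered from the truncated sum. */
static int madc8(uint8_t n, uint8_t m, uint8_t c)
{
    return c ? (uint8_t)(n + m + 1) <= n : (uint8_t)(n + m) < n;
}

int main(void)
{
    printf("%d\n", madc8(200, 100, 0));  /* 1: 300 wraps to 44 and 44 < 200  */
    printf("%d\n", madc8(5, 255, 1));    /* 1: 261 wraps to 5 and 5 <= 5     */
    printf("%d\n", madc8(5, 200, 0));    /* 0: 205 fits, no carry            */
    return 0;
}
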
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  33 ++++++
 target/riscv/insn32.decode              |  10 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 108 ++++++++++++++++++
 target/riscv/vector_helper.c            | 140 ++++++++++++++++++++++++
 4 files changed, 291 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1256defb6c..72c733bf49 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -339,3 +339,36 @@ DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4bdbfd16fa..e8ddf95d3d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -300,6 +300,16 @@ vwsubu_wv       110110 . ..... ..... 010 ..... 1010111 @r_vm
 vwsubu_wx       110110 . ..... ..... 110 ..... 1010111 @r_vm
 vwsub_wv        110111 . ..... ..... 010 ..... 1010111 @r_vm
 vwsub_wx        110111 . ..... ..... 110 ..... 1010111 @r_vm
+vadc_vvm        010000 1 ..... ..... 000 ..... 1010111 @r
+vadc_vxm        010000 1 ..... ..... 100 ..... 1010111 @r
+vadc_vim        010000 1 ..... ..... 011 ..... 1010111 @r
+vmadc_vvm       010001 1 ..... ..... 000 ..... 1010111 @r
+vmadc_vxm       010001 1 ..... ..... 100 ..... 1010111 @r
+vmadc_vim       010001 1 ..... ..... 011 ..... 1010111 @r
+vsbc_vvm        010010 1 ..... ..... 000 ..... 1010111 @r
+vsbc_vxm        010010 1 ..... ..... 100 ..... 1010111 @r
+vmsbc_vvm       010011 1 ..... ..... 000 ..... 1010111 @r
+vmsbc_vxm       010011 1 ..... ..... 100 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7f6fe82fb3..a1f2e84eb8 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1104,3 +1104,111 @@ GEN_OPIWX_WIDEN_TRANS(vwaddu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwadd_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsubu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsub_wx)
+
+/* OPIVV with UNMASKED */
+#define GEN_OPIVV_R_TRANS(NAME, CHECK)                             \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
+{                                                                  \
+    if (CHECK(s, a)) {                                             \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[4] = {            \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,          \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,          \
+        };                                                         \
+                                                                   \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+/*
+ * For vadc and vsbc, an illegal instruction exception is raised if the
+ * destination vector register is v0 and LMUL > 1. (Section 12.3)
+ */
+static bool opivv_vadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            ((a->rd != 0) || (s->lmul == 0)));
+}
+GEN_OPIVV_R_TRANS(vadc_vvm, opivv_vadc_check)
+GEN_OPIVV_R_TRANS(vsbc_vvm, opivv_vadc_check)
+
+/*
+ * For vmadc and vmsbc, an illegal instruction exception is raised if the
+ * destination vector register overlaps a source vector register group.
+ */
+static bool opivv_vmadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 1, a->rs1, 1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul));
+}
+GEN_OPIVV_R_TRANS(vmadc_vvm, opivv_vmadc_check)
+GEN_OPIVV_R_TRANS(vmsbc_vvm, opivv_vmadc_check)
+
+/* OPIVX with UNMASKED */
+#define GEN_OPIVX_R_TRANS(NAME, CHECK)                                   \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
+        };                                                               \
+                                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+
+static bool opivx_vadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((a->rd != 0) || (s->lmul == 0)));
+}
+GEN_OPIVX_R_TRANS(vadc_vxm, opivx_vadc_check)
+GEN_OPIVX_R_TRANS(vsbc_vxm, opivx_vadc_check)
+
+static bool opivx_vmadc_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul));
+}
+GEN_OPIVX_R_TRANS(vmadc_vxm, opivx_vmadc_check)
+GEN_OPIVX_R_TRANS(vmsbc_vxm, opivx_vmadc_check)
+
+/* OPIVI without GVEC IR */
+#define GEN_OPIVI_R_TRANS(NAME, ZX, OPIVX, CHECK)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
+{                                                                        \
+    if (CHECK(s, a)) {                                                   \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[4] = {                         \
+            gen_helper_##OPIVX##_b, gen_helper_##OPIVX##_h,              \
+            gen_helper_##OPIVX##_w, gen_helper_##OPIVX##_d,              \
+        };                                                               \
+                                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivi_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew], s, ZX);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVI_R_TRANS(vadc_vim, 0, vadc_vxm, opivx_vadc_check)
+GEN_OPIVI_R_TRANS(vmadc_vim, 0, vmadc_vxm, opivx_vmadc_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00eaebee9f..dd85b94fe7 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -185,6 +185,14 @@ static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
     vext_clear(cur, cnt, tot);
 }
 
+static inline void vext_set_elem_mask(void *v0, int mlen, int index,
+        uint8_t value)
+{
+    int idx = (index * mlen) / 64;
+    int pos = (index * mlen) % 64;
+    uint64_t old = ((uint64_t *)v0)[idx];
+    ((uint64_t *)v0)[idx] = deposit64(old, pos, mlen, value);
+}
 
 static inline int vext_elem_mask(void *v0, int mlen, int index)
 {
@@ -1062,3 +1070,135 @@ GEN_VEXT_VX(vwsub_wx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwsub_wx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwsub_wx_w, 4, 8, clearq)
 
+#define DO_VADC(N, M, C) (N + M + C)
+#define DO_VSBC(N, M, C) (N - M - C)
+
+#define GEN_VEXT_VADC_VVM(NAME, ETYPE, H, DO_OP, CLEAR_FN)    \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t esz = sizeof(ETYPE);                             \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                  \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        *((ETYPE *)vd + H(i)) = DO_OP(s2, s1, carry);         \
+    }                                                         \
+    if (i != 0) {                                             \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);              \
+    }                                                         \
+}
+GEN_VEXT_VADC_VVM(vadc_vvm_b, uint8_t,  H1, DO_VADC, clearb)
+GEN_VEXT_VADC_VVM(vadc_vvm_h, uint16_t, H2, DO_VADC, clearh)
+GEN_VEXT_VADC_VVM(vadc_vvm_w, uint32_t, H4, DO_VADC, clearl)
+GEN_VEXT_VADC_VVM(vadc_vvm_d, uint64_t, H8, DO_VADC, clearq)
+
+GEN_VEXT_VADC_VVM(vsbc_vvm_b, uint8_t,  H1, DO_VSBC, clearb)
+GEN_VEXT_VADC_VVM(vsbc_vvm_h, uint16_t, H2, DO_VSBC, clearh)
+GEN_VEXT_VADC_VVM(vsbc_vvm_w, uint32_t, H4, DO_VSBC, clearl)
+GEN_VEXT_VADC_VVM(vsbc_vvm_d, uint64_t, H8, DO_VSBC, clearq)
+
+#define GEN_VEXT_VADC_VXM(NAME, ETYPE, H, DO_OP, CLEAR_FN)               \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
+        CPURISCVState *env, uint32_t desc)                               \
+{                                                                        \
+    uint32_t mlen = vext_mlen(desc);                                     \
+    uint32_t vl = env->vl;                                               \
+    uint32_t esz = sizeof(ETYPE);                                        \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                             \
+    uint32_t i;                                                          \
+                                                                         \
+    for (i = 0; i < vl; i++) {                                           \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                               \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);                     \
+                                                                         \
+        *((ETYPE *)vd + H(i)) = DO_OP(s2, (ETYPE)(target_long)s1, carry);\
+    }                                                                    \
+    if (i != 0) {                                                        \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                         \
+    }                                                                    \
+}
+GEN_VEXT_VADC_VXM(vadc_vxm_b, uint8_t,  H1, DO_VADC, clearb)
+GEN_VEXT_VADC_VXM(vadc_vxm_h, uint16_t, H2, DO_VADC, clearh)
+GEN_VEXT_VADC_VXM(vadc_vxm_w, uint32_t, H4, DO_VADC, clearl)
+GEN_VEXT_VADC_VXM(vadc_vxm_d, uint64_t, H8, DO_VADC, clearq)
+
+GEN_VEXT_VADC_VXM(vsbc_vxm_b, uint8_t,  H1, DO_VSBC, clearb)
+GEN_VEXT_VADC_VXM(vsbc_vxm_h, uint16_t, H2, DO_VSBC, clearh)
+GEN_VEXT_VADC_VXM(vsbc_vxm_w, uint32_t, H4, DO_VSBC, clearl)
+GEN_VEXT_VADC_VXM(vsbc_vxm_d, uint64_t, H8, DO_VSBC, clearq)
+
+#define DO_MADC(N, M, C) (C ? (__typeof(N))(N + M + 1) <= N : (__typeof(N))(N + M) < N)
+#define DO_MSBC(N, M, C) (C ? N <= M : N < M)
+
+#define GEN_VEXT_VMADC_VVM(NAME, ETYPE, H, DO_OP)             \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        vext_set_elem_mask(vd, mlen, i, DO_OP(s2, s1, carry));\
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+GEN_VEXT_VMADC_VVM(vmadc_vvm_b, uint8_t,  H1, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_h, uint16_t, H2, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_w, uint32_t, H4, DO_MADC)
+GEN_VEXT_VMADC_VVM(vmadc_vvm_d, uint64_t, H8, DO_MADC)
+
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_b, uint8_t,  H1, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_h, uint16_t, H2, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_w, uint32_t, H4, DO_MSBC)
+GEN_VEXT_VMADC_VVM(vmsbc_vvm_d, uint64_t, H8, DO_MSBC)
+
+#define GEN_VEXT_VMADC_VXM(NAME, ETYPE, H, DO_OP)             \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,        \
+        void *vs2, CPURISCVState *env, uint32_t desc)         \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        uint8_t carry = vext_elem_mask(v0, mlen, i);          \
+                                                              \
+        vext_set_elem_mask(vd, mlen, i,                       \
+                DO_OP(s2, (ETYPE)(target_long)s1, carry));    \
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+GEN_VEXT_VMADC_VXM(vmadc_vxm_b, uint8_t,  H1, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_h, uint16_t, H2, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_w, uint32_t, H4, DO_MADC)
+GEN_VEXT_VMADC_VXM(vmadc_vxm_d, uint64_t, H8, DO_MADC)
+
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_b, uint8_t,  H1, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_h, uint16_t, H2, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_w, uint32_t, H4, DO_MSBC)
+GEN_VEXT_VMADC_VXM(vmsbc_vxm_d, uint64_t, H8, DO_MSBC)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
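
A standalone sketch, not part of the patch series, of how the DO_MADC/DO_MSBC
carry-out and borrow-out comparisons above behave for 8-bit elements. The names
madc8 and msbc8 are illustrative only and do not exist in QEMU:

#include <stdint.h>
#include <stdio.h>

/* Carry-out of a + b + cin, detected through wrap-around as in DO_MADC. */
static int madc8(uint8_t a, uint8_t b, int cin)
{
    return cin ? (uint8_t)(a + b + 1) <= a : (uint8_t)(a + b) < a;
}

/* Borrow-out of a - b - bin, as in DO_MSBC. */
static int msbc8(uint8_t a, uint8_t b, int bin)
{
    return bin ? a <= b : a < b;
}

int main(void)
{
    printf("%d\n", madc8(200, 100, 0)); /* 1: 300 does not fit in 8 bits */
    printf("%d\n", madc8(5, 255, 1));   /* 1: the sum wraps exactly back to 5 */
    printf("%d\n", madc8(5, 250, 0));   /* 0: 255 still fits */
    printf("%d\n", msbc8(3, 5, 0));     /* 1: 3 - 5 needs a borrow */
    return 0;
}

The vmadc/vmsbc helpers store these per-element results with vext_set_elem_mask
and then clear the remaining mask bits up to VLMAX.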

* [PATCH v5 12/60] target/riscv: vector bitwise logical instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 25 ++++++++++++
 target/riscv/insn32.decode              |  9 +++++
 target/riscv/insn_trans/trans_rvv.inc.c | 11 ++++++
 target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
 4 files changed, 96 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72c733bf49..4373e9e8c2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -372,3 +372,28 @@ DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vand_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e8ddf95d3d..29a505cede 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -310,6 +310,15 @@ vsbc_vvm        010010 1 ..... ..... 000 ..... 1010111 @r
 vsbc_vxm        010010 1 ..... ..... 100 ..... 1010111 @r
 vmsbc_vvm       010011 1 ..... ..... 000 ..... 1010111 @r
 vmsbc_vxm       010011 1 ..... ..... 100 ..... 1010111 @r
+vand_vv         001001 . ..... ..... 000 ..... 1010111 @r_vm
+vand_vx         001001 . ..... ..... 100 ..... 1010111 @r_vm
+vand_vi         001001 . ..... ..... 011 ..... 1010111 @r_vm
+vor_vv          001010 . ..... ..... 000 ..... 1010111 @r_vm
+vor_vx          001010 . ..... ..... 100 ..... 1010111 @r_vm
+vor_vi          001010 . ..... ..... 011 ..... 1010111 @r_vm
+vxor_vv         001011 . ..... ..... 000 ..... 1010111 @r_vm
+vxor_vx         001011 . ..... ..... 100 ..... 1010111 @r_vm
+vxor_vi         001011 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a1f2e84eb8..3a4696dbcd 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1212,3 +1212,14 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
 }
 GEN_OPIVI_R_TRANS(vadc_vim, 0, vadc_vxm, opivx_vadc_check)
 GEN_OPIVI_R_TRANS(vmadc_vim, 0, vmadc_vxm, opivx_vmadc_check)
+
+/* Vector Bitwise Logical Instructions */
+GEN_OPIVV_GVEC_TRANS(vand_vv, and)
+GEN_OPIVV_GVEC_TRANS(vor_vv,  or)
+GEN_OPIVV_GVEC_TRANS(vxor_vv, xor)
+GEN_OPIVX_GVEC_TRANS(vand_vx, ands)
+GEN_OPIVX_GVEC_TRANS(vor_vx,  ors)
+GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
+GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
+GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
+GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index dd85b94fe7..532b373f99 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1202,3 +1202,54 @@ GEN_VEXT_VMADC_VXM(vmsbc_vxm_b, uint8_t,  H1, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_h, uint16_t, H2, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_w, uint32_t, H4, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_d, uint64_t, H8, DO_MSBC)
+
+/* Vector Bitwise Logical Instructions */
+RVVCALL(OPIVV2, vand_vv_b, OP_SSS_B, H1, H1, H1, DO_AND)
+RVVCALL(OPIVV2, vand_vv_h, OP_SSS_H, H2, H2, H2, DO_AND)
+RVVCALL(OPIVV2, vand_vv_w, OP_SSS_W, H4, H4, H4, DO_AND)
+RVVCALL(OPIVV2, vand_vv_d, OP_SSS_D, H8, H8, H8, DO_AND)
+RVVCALL(OPIVV2, vor_vv_b, OP_SSS_B, H1, H1, H1, DO_OR)
+RVVCALL(OPIVV2, vor_vv_h, OP_SSS_H, H2, H2, H2, DO_OR)
+RVVCALL(OPIVV2, vor_vv_w, OP_SSS_W, H4, H4, H4, DO_OR)
+RVVCALL(OPIVV2, vor_vv_d, OP_SSS_D, H8, H8, H8, DO_OR)
+RVVCALL(OPIVV2, vxor_vv_b, OP_SSS_B, H1, H1, H1, DO_XOR)
+RVVCALL(OPIVV2, vxor_vv_h, OP_SSS_H, H2, H2, H2, DO_XOR)
+RVVCALL(OPIVV2, vxor_vv_w, OP_SSS_W, H4, H4, H4, DO_XOR)
+RVVCALL(OPIVV2, vxor_vv_d, OP_SSS_D, H8, H8, H8, DO_XOR)
+GEN_VEXT_VV(vand_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vand_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vand_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vand_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vor_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vor_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vor_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vor_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vxor_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vxor_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vxor_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vxor_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2, vand_vx_b, OP_SSS_B, H1, H1, DO_AND)
+RVVCALL(OPIVX2, vand_vx_h, OP_SSS_H, H2, H2, DO_AND)
+RVVCALL(OPIVX2, vand_vx_w, OP_SSS_W, H4, H4, DO_AND)
+RVVCALL(OPIVX2, vand_vx_d, OP_SSS_D, H8, H8, DO_AND)
+RVVCALL(OPIVX2, vor_vx_b, OP_SSS_B, H1, H1, DO_OR)
+RVVCALL(OPIVX2, vor_vx_h, OP_SSS_H, H2, H2, DO_OR)
+RVVCALL(OPIVX2, vor_vx_w, OP_SSS_W, H4, H4, DO_OR)
+RVVCALL(OPIVX2, vor_vx_d, OP_SSS_D, H8, H8, DO_OR)
+RVVCALL(OPIVX2, vxor_vx_b, OP_SSS_B, H1, H1, DO_XOR)
+RVVCALL(OPIVX2, vxor_vx_h, OP_SSS_H, H2, H2, DO_XOR)
+RVVCALL(OPIVX2, vxor_vx_w, OP_SSS_W, H4, H4, DO_XOR)
+RVVCALL(OPIVX2, vxor_vx_d, OP_SSS_D, H8, H8, DO_XOR)
+GEN_VEXT_VX(vand_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vand_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vand_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vand_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vor_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vor_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vor_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vor_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vxor_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vxor_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vxor_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vxor_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
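
For reference, a rough plain-C model, not part of the patch, of what a generated
logical helper such as vand_vv_b computes when the out-of-line path is taken,
assuming SEW=8 and MLEN=1 and ignoring tail clearing. vand8 is an illustrative
name only:

#include <stdint.h>
#include <stdio.h>

static void vand8(uint8_t *vd, const uint8_t *mask, const uint8_t *vs1,
                  const uint8_t *vs2, int vl, int vm)
{
    for (int i = 0; i < vl; i++) {
        /* With vm == 0, inactive elements keep their old value. */
        if (!vm && !((mask[i / 8] >> (i % 8)) & 1)) {
            continue;
        }
        vd[i] = vs2[i] & vs1[i]; /* DO_AND */
    }
}

int main(void)
{
    uint8_t vd[4] = {0}, mask[1] = {0x05}; /* elements 0 and 2 are active */
    uint8_t vs1[4] = {0x0f, 0x0f, 0x0f, 0x0f};
    uint8_t vs2[4] = {0xff, 0xff, 0xff, 0xff};

    vand8(vd, mask, vs1, vs2, 4, 0);
    printf("%02x %02x %02x %02x\n", vd[0], vd[1], vd[2], vd[3]); /* 0f 00 0f 00 */
    return 0;
}

When the operation is unmasked and vl equals VLMAX, the GEN_OPIVV_GVEC_TRANS
macros take the inline tcg_gen_gvec path over the whole register group instead.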

* [PATCH v5 13/60] target/riscv: vector single-width bit shift instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 25 ++++++++
 target/riscv/insn32.decode              |  9 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 44 +++++++++++++
 target/riscv/vector_helper.c            | 82 +++++++++++++++++++++++++
 4 files changed, 160 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4373e9e8c2..47284c7476 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -397,3 +397,28 @@ DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsll_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 29a505cede..dbbfa34b97 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -319,6 +319,15 @@ vor_vi          001010 . ..... ..... 011 ..... 1010111 @r_vm
 vxor_vv         001011 . ..... ..... 000 ..... 1010111 @r_vm
 vxor_vx         001011 . ..... ..... 100 ..... 1010111 @r_vm
 vxor_vi         001011 . ..... ..... 011 ..... 1010111 @r_vm
+vsll_vv         100101 . ..... ..... 000 ..... 1010111 @r_vm
+vsll_vx         100101 . ..... ..... 100 ..... 1010111 @r_vm
+vsll_vi         100101 . ..... ..... 011 ..... 1010111 @r_vm
+vsrl_vv         101000 . ..... ..... 000 ..... 1010111 @r_vm
+vsrl_vx         101000 . ..... ..... 100 ..... 1010111 @r_vm
+vsrl_vi         101000 . ..... ..... 011 ..... 1010111 @r_vm
+vsra_vv         101001 . ..... ..... 000 ..... 1010111 @r_vm
+vsra_vx         101001 . ..... ..... 100 ..... 1010111 @r_vm
+vsra_vi         101001 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3a4696dbcd..a60518e1df 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1223,3 +1223,47 @@ GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
 GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
 GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
 GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
+
+/* Vector Single-Width Bit Shift Instructions */
+GEN_OPIVV_GVEC_TRANS(vsll_vv,  shlv)
+GEN_OPIVV_GVEC_TRANS(vsrl_vv,  shrv)
+GEN_OPIVV_GVEC_TRANS(vsra_vv,  sarv)
+
+#define GEN_OPIVX_GVEC_SHIFT_TRANS(NAME, GVSUF)                               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                        \
+{                                                                             \
+    if (!opivx_check(s, a)) {                                                 \
+        return false;                                                         \
+    }                                                                         \
+                                                                              \
+    if (a->vm && s->vl_eq_vlmax) {                                            \
+        TCGv_i32 src1 = tcg_temp_new_i32();                                   \
+        TCGv tmp = tcg_temp_new();                                            \
+        gen_get_gpr(tmp, a->rs1);                                             \
+        tcg_gen_trunc_tl_i32(src1, tmp);                                      \
+        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),                 \
+            vreg_ofs(s, a->rs2), src1, MAXSZ(s), MAXSZ(s));                   \
+        tcg_temp_free_i32(src1);                                              \
+        tcg_temp_free(tmp);                                                   \
+        return true;                                                          \
+    } else {                                                                  \
+        uint32_t data = 0;                                                    \
+        static gen_helper_opivx const fns[4] = {                              \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                     \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                     \
+        };                                                                    \
+                                                                              \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s);      \
+    }                                                                         \
+    return true;                                                              \
+}
+GEN_OPIVX_GVEC_SHIFT_TRANS(vsll_vx,  shls)
+GEN_OPIVX_GVEC_SHIFT_TRANS(vsrl_vx,  shrs)
+GEN_OPIVX_GVEC_SHIFT_TRANS(vsra_vx,  sars)
+
+GEN_OPIVI_GVEC_TRANS(vsll_vi, 1, vsll_vx,  shli)
+GEN_OPIVI_GVEC_TRANS(vsrl_vi, 1, vsrl_vx,  shri)
+GEN_OPIVI_GVEC_TRANS(vsra_vi, 1, vsra_vx,  sari)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 532b373f99..3772b059b1 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1253,3 +1253,85 @@ GEN_VEXT_VX(vxor_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vxor_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vxor_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vxor_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Bit Shift Instructions */
+#define DO_SLL(N, M)  (N << (M))
+#define DO_SRL(N, M)  (N >> (M))
+
+/* generate the helpers for shift instructions with two vector operands */
+#define GEN_VEXT_SHIFT_VV(NAME, TS1, TS2, HS1, HS2, OP, MASK, CLEAR_FN)   \
+void HELPER(NAME)(void *vd, void *v0, void *vs1,                          \
+        void *vs2, CPURISCVState *env, uint32_t desc)                     \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t esz = sizeof(TS1);                                           \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                              \
+    uint32_t i;                                                           \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        TS1 s1 = *((TS1 *)vs1 + HS1(i));                                  \
+        TS2 s2 = *((TS2 *)vs2 + HS2(i));                                  \
+        *((TS1 *)vd + HS1(i)) = OP(s2, s1 & MASK);                        \
+    }                                                                     \
+    if (i != 0) {                                                         \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                          \
+    }                                                                     \
+}
+GEN_VEXT_SHIFT_VV(vsll_vv_b, uint8_t,  uint8_t, H1, H1, DO_SLL, 0x7, clearb)
+GEN_VEXT_SHIFT_VV(vsll_vv_h, uint16_t, uint16_t, H2, H2, DO_SLL, 0xf, clearh)
+GEN_VEXT_SHIFT_VV(vsll_vv_w, uint32_t, uint32_t, H4, H4, DO_SLL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VV(vsll_vv_d, uint64_t, uint64_t, H8, H8, DO_SLL, 0x3f, clearq)
+
+GEN_VEXT_SHIFT_VV(vsrl_vv_b, uint8_t, uint8_t, H1, H1, DO_SRL, 0x7, clearb)
+GEN_VEXT_SHIFT_VV(vsrl_vv_h, uint16_t, uint16_t, H2, H2, DO_SRL, 0xf, clearh)
+GEN_VEXT_SHIFT_VV(vsrl_vv_w, uint32_t, uint32_t, H4, H4, DO_SRL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VV(vsrl_vv_d, uint64_t, uint64_t, H8, H8, DO_SRL, 0x3f, clearq)
+
+GEN_VEXT_SHIFT_VV(vsra_vv_b, uint8_t,  int8_t, H1, H1, DO_SRL, 0x7, clearb)
+GEN_VEXT_SHIFT_VV(vsra_vv_h, uint16_t, int16_t, H2, H2, DO_SRL, 0xf, clearh)
+GEN_VEXT_SHIFT_VV(vsra_vv_w, uint32_t, int32_t, H4, H4, DO_SRL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VV(vsra_vv_d, uint64_t, int64_t, H8, H8, DO_SRL, 0x3f, clearq)
+
+/* generate the helpers for shift instructions with one vector and one scalar operand */
+#define GEN_VEXT_SHIFT_VX(NAME, TD, TS2, HD, HS2, OP, MASK, CLEAR_FN) \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,                \
+        void *vs2, CPURISCVState *env, uint32_t desc)                 \
+{                                                                     \
+    uint32_t mlen = vext_mlen(desc);                                  \
+    uint32_t vm = vext_vm(desc);                                      \
+    uint32_t vl = env->vl;                                            \
+    uint32_t esz = sizeof(TD);                                        \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                          \
+    uint32_t i;                                                       \
+                                                                      \
+    for (i = 0; i < vl; i++) {                                        \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                    \
+            continue;                                                 \
+        }                                                             \
+        TS2 s2 = *((TS2 *)vs2 + HS2(i));                              \
+        *((TD *)vd + HD(i)) = OP(s2, s1 & MASK);                      \
+    }                                                                 \
+    if (i != 0) {                                                     \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                      \
+    }                                                                 \
+}
+
+GEN_VEXT_SHIFT_VX(vsll_vx_b, uint8_t, int8_t, H1, H1, DO_SLL, 0x7, clearb)
+GEN_VEXT_SHIFT_VX(vsll_vx_h, uint16_t, int16_t, H2, H2, DO_SLL, 0xf, clearh)
+GEN_VEXT_SHIFT_VX(vsll_vx_w, uint32_t, int32_t, H4, H4, DO_SLL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VX(vsll_vx_d, uint64_t, int64_t, H8, H8, DO_SLL, 0x3f, clearq)
+
+GEN_VEXT_SHIFT_VX(vsrl_vx_b, uint8_t, uint8_t, H1, H1, DO_SRL, 0x7, clearb)
+GEN_VEXT_SHIFT_VX(vsrl_vx_h, uint16_t, uint16_t, H2, H2, DO_SRL, 0xf, clearh)
+GEN_VEXT_SHIFT_VX(vsrl_vx_w, uint32_t, uint32_t, H4, H4, DO_SRL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VX(vsrl_vx_d, uint64_t, uint64_t, H8, H8, DO_SRL, 0x3f, clearq)
+
+GEN_VEXT_SHIFT_VX(vsra_vx_b, int8_t, int8_t, H1, H1, DO_SRL, 0x7, clearb)
+GEN_VEXT_SHIFT_VX(vsra_vx_h, int16_t, int16_t, H2, H2, DO_SRL, 0xf, clearh)
+GEN_VEXT_SHIFT_VX(vsra_vx_w, int32_t, int32_t, H4, H4, DO_SRL, 0x1f, clearl)
+GEN_VEXT_SHIFT_VX(vsra_vx_d, int64_t, int64_t, H8, H8, DO_SRL, 0x3f, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
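
A small illustration, not part of the patch, of the shift-amount truncation done
by the "s1 & MASK" term above: only the low log2(SEW) bits of the scalar are
used, so for byte elements a shift amount of 9 behaves like a shift of 1. The
sketch assumes SEW=8; sra8 is an illustrative name only, and like the helpers it
relies on the host compiler implementing signed right shift as an arithmetic
shift:

#include <stdint.h>
#include <stdio.h>

static int8_t sra8(int8_t value, unsigned long shift)
{
    /* 0x7 plays the role of MASK for SEW=8, as in vsra_vx_b. */
    return value >> (shift & 0x7);
}

int main(void)
{
    printf("%d\n", sra8(-128, 9)); /* -64, same as shifting by 1 */
    printf("%d\n", sra8(-128, 1)); /* -64 */
    return 0;
}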

* [PATCH v5 14/60] target/riscv: vector narrowing integer right shift instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 13 ++++
 target/riscv/insn32.decode              |  6 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 91 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 14 ++++
 4 files changed, 124 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 47284c7476..0f36a8ce43 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -422,3 +422,16 @@ DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vnsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dbbfa34b97..e21b3d6b5e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -328,6 +328,12 @@ vsrl_vi         101000 . ..... ..... 011 ..... 1010111 @r_vm
 vsra_vv         101001 . ..... ..... 000 ..... 1010111 @r_vm
 vsra_vx         101001 . ..... ..... 100 ..... 1010111 @r_vm
 vsra_vi         101001 . ..... ..... 011 ..... 1010111 @r_vm
+vnsrl_vv        101100 . ..... ..... 000 ..... 1010111 @r_vm
+vnsrl_vx        101100 . ..... ..... 100 ..... 1010111 @r_vm
+vnsrl_vi        101100 . ..... ..... 011 ..... 1010111 @r_vm
+vnsra_vv        101101 . ..... ..... 000 ..... 1010111 @r_vm
+vnsra_vx        101101 . ..... ..... 100 ..... 1010111 @r_vm
+vnsra_vi        101101 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a60518e1df..7033eeaa4d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1267,3 +1267,94 @@ GEN_OPIVX_GVEC_SHIFT_TRANS(vsra_vx,  sars)
 GEN_OPIVI_GVEC_TRANS(vsll_vi, 1, vsll_vx,  shli)
 GEN_OPIVI_GVEC_TRANS(vsrl_vi, 1, vsrl_vx,  shri)
 GEN_OPIVI_GVEC_TRANS(vsra_vi, 1, vsra_vx,  sari)
+
+/* Vector Narrowing Integer Right Shift Instructions */
+static bool opivv_narrow_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
+                2 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+
+/* OPIVV with NARROW */
+#define GEN_OPIVV_NARROW_TRANS(NAME)                               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (opivv_narrow_check(s, a)) {                                \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[3] = {            \
+            gen_helper_##NAME##_b,                                 \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w,                                 \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPIVV_NARROW_TRANS(vnsra_vv)
+GEN_OPIVV_NARROW_TRANS(vnsrl_vv)
+
+static bool opivx_narrow_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
+                2 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3));
+}
+
+/* OPIVX with NARROW */
+#define GEN_OPIVX_NARROW_TRANS(NAME)                                     \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opivx_narrow_check(s, a)) {                                      \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[3] = {                         \
+            gen_helper_##NAME##_b,                                       \
+            gen_helper_##NAME##_h,                                       \
+            gen_helper_##NAME##_w,                                       \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVX_NARROW_TRANS(vnsra_vx)
+GEN_OPIVX_NARROW_TRANS(vnsrl_vx)
+
+/* OPIVI with NARROW */
+#define GEN_OPIVI_NARROW_TRANS(NAME, ZX, OPIVX)                          \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opivx_narrow_check(s, a)) {                                      \
+        uint32_t data = 0;                                               \
+        static gen_helper_opivx const fns[3] = {                         \
+            gen_helper_##OPIVX##_b,                                      \
+            gen_helper_##OPIVX##_h,                                      \
+            gen_helper_##OPIVX##_w,                                      \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opivi_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew], s, ZX);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPIVI_NARROW_TRANS(vnsra_vi, 1, vnsra_vx)
+GEN_OPIVI_NARROW_TRANS(vnsrl_vi, 1, vnsrl_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3772b059b1..895155576c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1335,3 +1335,17 @@ GEN_VEXT_SHIFT_VX(vsra_vx_b, int8_t, int8_t, H1, H1, DO_SRL, 0x7, clearb)
 GEN_VEXT_SHIFT_VX(vsra_vx_h, int16_t, int16_t, H2, H2, DO_SRL, 0xf, clearh)
 GEN_VEXT_SHIFT_VX(vsra_vx_w, int32_t, int32_t, H4, H4, DO_SRL, 0x1f, clearl)
 GEN_VEXT_SHIFT_VX(vsra_vx_d, int64_t, int64_t, H8, H8, DO_SRL, 0x3f, clearq)
+
+/* Vector Narrowing Integer Right Shift Instructions */
+GEN_VEXT_SHIFT_VV(vnsrl_vv_b, uint8_t,  uint16_t, H1, H2, DO_SRL, 0xf, clearb)
+GEN_VEXT_SHIFT_VV(vnsrl_vv_h, uint16_t, uint32_t, H2, H4, DO_SRL, 0x1f, clearh)
+GEN_VEXT_SHIFT_VV(vnsrl_vv_w, uint32_t, uint64_t, H4, H8, DO_SRL, 0x3f, clearl)
+GEN_VEXT_SHIFT_VV(vnsra_vv_b, uint8_t,  int16_t, H1, H2, DO_SRL, 0xf, clearb)
+GEN_VEXT_SHIFT_VV(vnsra_vv_h, uint16_t, int32_t, H2, H4, DO_SRL, 0x1f, clearh)
+GEN_VEXT_SHIFT_VV(vnsra_vv_w, uint32_t, int64_t, H4, H8, DO_SRL, 0x3f, clearl)
+GEN_VEXT_SHIFT_VX(vnsrl_vx_b, uint8_t, uint16_t, H1, H2, DO_SRL, 0xf, clearb)
+GEN_VEXT_SHIFT_VX(vnsrl_vx_h, uint16_t, uint32_t, H2, H4, DO_SRL, 0x1f, clearh)
+GEN_VEXT_SHIFT_VX(vnsrl_vx_w, uint32_t, uint64_t, H4, H8, DO_SRL, 0x3f, clearl)
+GEN_VEXT_SHIFT_VX(vnsra_vx_b, int8_t, int16_t, H1, H2, DO_SRL, 0xf, clearb)
+GEN_VEXT_SHIFT_VX(vnsra_vx_h, int16_t, int32_t, H2, H4, DO_SRL, 0x1f, clearh)
+GEN_VEXT_SHIFT_VX(vnsra_vx_w, int32_t, int64_t, H4, H8, DO_SRL, 0x3f, clearl)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
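
In the vnsrl/vnsra helpers added above, each source element is 2*SEW wide, the shift amount is masked to log2(2*SEW) bits (for example 0xf when narrowing 16-bit elements to 8-bit results), and only the low SEW bits of the shifted value are stored. A minimal standalone sketch of that narrowing step, assuming plain C arrays and no mask or tail handling; the names are illustrative only:

/*
 * Illustrative sketch only (not QEMU code): narrowing logical right
 * shift, 16-bit source elements to 8-bit results, shift amount masked
 * with 0xf as in the vnsrl_vx_b expansion above.
 */
#include <stdint.h>
#include <stdio.h>

static void sketch_vnsrl_vx_b(uint8_t *vd, const uint16_t *vs2,
                              uint16_t rs1, unsigned vl)
{
    for (unsigned i = 0; i < vl; i++) {
        vd[i] = (uint8_t)(vs2[i] >> (rs1 & 0xf));  /* keep low 8 bits */
    }
}

int main(void)
{
    uint16_t src[3] = { 0xabcd, 0x0100, 0x00ff };
    uint8_t dst[3];

    sketch_vnsrl_vx_b(dst, src, 4, 3);   /* 0xabcd >> 4 = 0x0abc -> 0xbc */
    for (int i = 0; i < 3; i++) {
        printf("0x%02x\n", (unsigned)dst[i]);
    }
    return 0;
}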


* [PATCH v5 15/60] target/riscv: vector integer comparison instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  57 +++++++++++
 target/riscv/insn32.decode              |  20 ++++
 target/riscv/insn_trans/trans_rvv.inc.c |  66 ++++++++++++
 target/riscv/vector_helper.c            | 130 ++++++++++++++++++++++++
 4 files changed, 273 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0f36a8ce43..4e6c47c2d2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -435,3 +435,60 @@ DEF_HELPER_6(vnsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmseq_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e21b3d6b5e..525b2fa442 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -334,6 +334,26 @@ vnsrl_vi        101100 . ..... ..... 011 ..... 1010111 @r_vm
 vnsra_vv        101101 . ..... ..... 000 ..... 1010111 @r_vm
 vnsra_vx        101101 . ..... ..... 100 ..... 1010111 @r_vm
 vnsra_vi        101101 . ..... ..... 011 ..... 1010111 @r_vm
+vmseq_vv        011000 . ..... ..... 000 ..... 1010111 @r_vm
+vmseq_vx        011000 . ..... ..... 100 ..... 1010111 @r_vm
+vmseq_vi        011000 . ..... ..... 011 ..... 1010111 @r_vm
+vmsne_vv        011001 . ..... ..... 000 ..... 1010111 @r_vm
+vmsne_vx        011001 . ..... ..... 100 ..... 1010111 @r_vm
+vmsne_vi        011001 . ..... ..... 011 ..... 1010111 @r_vm
+vmsltu_vv       011010 . ..... ..... 000 ..... 1010111 @r_vm
+vmsltu_vx       011010 . ..... ..... 100 ..... 1010111 @r_vm
+vmslt_vv        011011 . ..... ..... 000 ..... 1010111 @r_vm
+vmslt_vx        011011 . ..... ..... 100 ..... 1010111 @r_vm
+vmsleu_vv       011100 . ..... ..... 000 ..... 1010111 @r_vm
+vmsleu_vx       011100 . ..... ..... 100 ..... 1010111 @r_vm
+vmsleu_vi       011100 . ..... ..... 011 ..... 1010111 @r_vm
+vmsle_vv        011101 . ..... ..... 000 ..... 1010111 @r_vm
+vmsle_vx        011101 . ..... ..... 100 ..... 1010111 @r_vm
+vmsle_vi        011101 . ..... ..... 011 ..... 1010111 @r_vm
+vmsgtu_vx       011110 . ..... ..... 100 ..... 1010111 @r_vm
+vmsgtu_vi       011110 . ..... ..... 011 ..... 1010111 @r_vm
+vmsgt_vx        011111 . ..... ..... 100 ..... 1010111 @r_vm
+vmsgt_vi        011111 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7033eeaa4d..078d275af6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1358,3 +1358,69 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
 }
 GEN_OPIVI_NARROW_TRANS(vnsra_vi, 1, vnsra_vx)
 GEN_OPIVI_NARROW_TRANS(vnsrl_vi, 1, vnsrl_vx)
+
+/* Vector Integer Comparison Instructions */
+
+/* OPIVV without GVEC IR */
+#define GEN_OPIVV_TRANS(NAME, CHECK)                               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (CHECK(s, a)) {                                             \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[4] = {            \
+            gen_helper_##NAME##_b, gen_helper_##NAME##_h,          \
+            gen_helper_##NAME##_w, gen_helper_##NAME##_d,          \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);           \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+/*
+ * For all comparison instructions, an illegal instruction exception is raised
+ * if the destination vector register overlaps a source vector register group
+ * and LMUL > 1.
+ */
+static bool opivv_cmp_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            ((vext_check_overlap_group(a->rd, 1, a->rs1, 1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul)) ||
+            (s->lmul == 0)));
+}
+GEN_OPIVV_TRANS(vmseq_vv, opivv_cmp_check)
+GEN_OPIVV_TRANS(vmsne_vv, opivv_cmp_check)
+GEN_OPIVV_TRANS(vmsltu_vv, opivv_cmp_check)
+GEN_OPIVV_TRANS(vmslt_vv, opivv_cmp_check)
+GEN_OPIVV_TRANS(vmsleu_vv, opivv_cmp_check)
+GEN_OPIVV_TRANS(vmsle_vv, opivv_cmp_check)
+
+static bool opivx_cmp_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul) ||
+            (s->lmul == 0)));
+}
+GEN_OPIVX_TRANS(vmseq_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsne_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsltu_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmslt_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsleu_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsle_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsgtu_vx, opivx_cmp_check)
+GEN_OPIVX_TRANS(vmsgt_vx, opivx_cmp_check)
+
+GEN_OPIVI_TRANS(vmseq_vi, 0, vmseq_vx, opivx_cmp_check)
+GEN_OPIVI_TRANS(vmsne_vi, 0, vmsne_vx, opivx_cmp_check)
+GEN_OPIVI_TRANS(vmsleu_vi, 1, vmsleu_vx, opivx_cmp_check)
+GEN_OPIVI_TRANS(vmsle_vi, 0, vmsle_vx, opivx_cmp_check)
+GEN_OPIVI_TRANS(vmsgtu_vi, 1, vmsgtu_vx, opivx_cmp_check)
+GEN_OPIVI_TRANS(vmsgt_vi, 0, vmsgt_vx, opivx_cmp_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 895155576c..e7a4e99f46 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1349,3 +1349,133 @@ GEN_VEXT_SHIFT_VX(vnsrl_vx_w, uint32_t, uint64_t, H4, H8, DO_SRL, 0x3f, clearl)
 GEN_VEXT_SHIFT_VX(vnsra_vx_b, int8_t, int16_t, H1, H2, DO_SRL, 0xf, clearb)
 GEN_VEXT_SHIFT_VX(vnsra_vx_h, int16_t, int32_t, H2, H4, DO_SRL, 0x1f, clearh)
 GEN_VEXT_SHIFT_VX(vnsra_vx_w, int32_t, int64_t, H4, H8, DO_SRL, 0x3f, clearl)
+
+/* Vector Integer Comparison Instructions */
+#define DO_MSEQ(N, M) ((N == M) ? 1 : 0)
+#define DO_MSNE(N, M) ((N != M) ? 1 : 0)
+#define DO_MSLTU(N, M) ((N < M) ? 1 : 0)
+#define DO_MSLT(N, M) ((N < M) ? 1 : 0)
+#define DO_MSLEU(N, M) ((N <= M) ? 1 : 0)
+#define DO_MSLE(N, M) ((N <= M) ? 1 : 0)
+#define DO_MSGTU(N, M) ((N > M) ? 1 : 0)
+#define DO_MSGT(N, M) ((N > M) ? 1 : 0)
+
+#define GEN_VEXT_CMP_VV(NAME, ETYPE, H, DO_OP)               \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vm = vext_vm(desc);                              \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {            \
+            continue;                                         \
+        }                                                     \
+        vext_set_elem_mask(vd, mlen, i, DO_OP(s2, s1));       \
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+GEN_VEXT_CMP_VV(vmseq_vv_b, uint8_t,  H1, DO_MSEQ)
+GEN_VEXT_CMP_VV(vmseq_vv_h, uint16_t, H2, DO_MSEQ)
+GEN_VEXT_CMP_VV(vmseq_vv_w, uint32_t, H4, DO_MSEQ)
+GEN_VEXT_CMP_VV(vmseq_vv_d, uint64_t, H8, DO_MSEQ)
+
+GEN_VEXT_CMP_VV(vmsne_vv_b, uint8_t,  H1, DO_MSNE)
+GEN_VEXT_CMP_VV(vmsne_vv_h, uint16_t, H2, DO_MSNE)
+GEN_VEXT_CMP_VV(vmsne_vv_w, uint32_t, H4, DO_MSNE)
+GEN_VEXT_CMP_VV(vmsne_vv_d, uint64_t, H8, DO_MSNE)
+
+GEN_VEXT_CMP_VV(vmsltu_vv_b, uint8_t,  H1, DO_MSLTU)
+GEN_VEXT_CMP_VV(vmsltu_vv_h, uint16_t, H2, DO_MSLTU)
+GEN_VEXT_CMP_VV(vmsltu_vv_w, uint32_t, H4, DO_MSLTU)
+GEN_VEXT_CMP_VV(vmsltu_vv_d, uint64_t, H8, DO_MSLTU)
+
+GEN_VEXT_CMP_VV(vmslt_vv_b, int8_t,  H1, DO_MSLT)
+GEN_VEXT_CMP_VV(vmslt_vv_h, int16_t, H2, DO_MSLT)
+GEN_VEXT_CMP_VV(vmslt_vv_w, int32_t, H4, DO_MSLT)
+GEN_VEXT_CMP_VV(vmslt_vv_d, int64_t, H8, DO_MSLT)
+
+GEN_VEXT_CMP_VV(vmsleu_vv_b, uint8_t,  H1, DO_MSLEU)
+GEN_VEXT_CMP_VV(vmsleu_vv_h, uint16_t, H2, DO_MSLEU)
+GEN_VEXT_CMP_VV(vmsleu_vv_w, uint32_t, H4, DO_MSLEU)
+GEN_VEXT_CMP_VV(vmsleu_vv_d, uint64_t, H8, DO_MSLEU)
+
+GEN_VEXT_CMP_VV(vmsle_vv_b, int8_t,  H1, DO_MSLE)
+GEN_VEXT_CMP_VV(vmsle_vv_h, int16_t, H2, DO_MSLE)
+GEN_VEXT_CMP_VV(vmsle_vv_w, int32_t, H4, DO_MSLE)
+GEN_VEXT_CMP_VV(vmsle_vv_d, int64_t, H8, DO_MSLE)
+
+#define GEN_VEXT_CMP_VX(NAME, ETYPE, H, DO_OP)                     \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                          \
+{                                                                   \
+    uint32_t mlen = vext_mlen(desc);                                \
+    uint32_t vm = vext_vm(desc);                                    \
+    uint32_t vl = env->vl;                                          \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);              \
+    uint32_t i;                                                     \
+                                                                    \
+    for (i = 0; i < vl; i++) {                                      \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                          \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                  \
+            continue;                                               \
+        }                                                           \
+        vext_set_elem_mask(vd, mlen, i,                             \
+                DO_OP(s2, (ETYPE)(target_long)s1));                 \
+    }                                                               \
+    if (i == 0) {                                                   \
+        return;                                                     \
+    }                                                               \
+    for (; i < vlmax; i++) {                                        \
+        vext_set_elem_mask(vd, mlen, i, 0);                         \
+    }                                                               \
+}
+GEN_VEXT_CMP_VX(vmseq_vx_b, uint8_t,  H1, DO_MSEQ)
+GEN_VEXT_CMP_VX(vmseq_vx_h, uint16_t, H2, DO_MSEQ)
+GEN_VEXT_CMP_VX(vmseq_vx_w, uint32_t, H4, DO_MSEQ)
+GEN_VEXT_CMP_VX(vmseq_vx_d, uint64_t, H8, DO_MSEQ)
+
+GEN_VEXT_CMP_VX(vmsne_vx_b, uint8_t,  H1, DO_MSNE)
+GEN_VEXT_CMP_VX(vmsne_vx_h, uint16_t, H2, DO_MSNE)
+GEN_VEXT_CMP_VX(vmsne_vx_w, uint32_t, H4, DO_MSNE)
+GEN_VEXT_CMP_VX(vmsne_vx_d, uint64_t, H8, DO_MSNE)
+
+GEN_VEXT_CMP_VX(vmsltu_vx_b, uint8_t,  H1, DO_MSLTU)
+GEN_VEXT_CMP_VX(vmsltu_vx_h, uint16_t, H2, DO_MSLTU)
+GEN_VEXT_CMP_VX(vmsltu_vx_w, uint32_t, H4, DO_MSLTU)
+GEN_VEXT_CMP_VX(vmsltu_vx_d, uint64_t, H8, DO_MSLTU)
+
+GEN_VEXT_CMP_VX(vmslt_vx_b, int8_t,  H1, DO_MSLT)
+GEN_VEXT_CMP_VX(vmslt_vx_h, int16_t, H2, DO_MSLT)
+GEN_VEXT_CMP_VX(vmslt_vx_w, int32_t, H4, DO_MSLT)
+GEN_VEXT_CMP_VX(vmslt_vx_d, int64_t, H8, DO_MSLT)
+
+GEN_VEXT_CMP_VX(vmsleu_vx_b, uint8_t,  H1, DO_MSLEU)
+GEN_VEXT_CMP_VX(vmsleu_vx_h, uint16_t, H2, DO_MSLEU)
+GEN_VEXT_CMP_VX(vmsleu_vx_w, uint32_t, H4, DO_MSLEU)
+GEN_VEXT_CMP_VX(vmsleu_vx_d, uint64_t, H8, DO_MSLEU)
+
+GEN_VEXT_CMP_VX(vmsle_vx_b, int8_t,  H1, DO_MSLE)
+GEN_VEXT_CMP_VX(vmsle_vx_h, int16_t, H2, DO_MSLE)
+GEN_VEXT_CMP_VX(vmsle_vx_w, int32_t, H4, DO_MSLE)
+GEN_VEXT_CMP_VX(vmsle_vx_d, int64_t, H8, DO_MSLE)
+
+GEN_VEXT_CMP_VX(vmsgtu_vx_b, uint8_t,  H1, DO_MSGTU)
+GEN_VEXT_CMP_VX(vmsgtu_vx_h, uint16_t, H2, DO_MSGTU)
+GEN_VEXT_CMP_VX(vmsgtu_vx_w, uint32_t, H4, DO_MSGTU)
+GEN_VEXT_CMP_VX(vmsgtu_vx_d, uint64_t, H8, DO_MSGTU)
+
+GEN_VEXT_CMP_VX(vmsgt_vx_b, int8_t,  H1, DO_MSGT)
+GEN_VEXT_CMP_VX(vmsgt_vx_h, int16_t, H2, DO_MSGT)
+GEN_VEXT_CMP_VX(vmsgt_vx_w, int32_t, H4, DO_MSGT)
+GEN_VEXT_CMP_VX(vmsgt_vx_d, int64_t, H8, DO_MSGT)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
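
Unlike the arithmetic helpers, the GEN_VEXT_CMP_VV/GEN_VEXT_CMP_VX helpers above write one mask bit per active element into vd and then clear the bits from vl up to vlmax. A minimal standalone sketch of that mask-writing pattern, assuming a simple byte-array mask with one bit per element rather than QEMU's mlen-based layout; the names are illustrative only:

/*
 * Illustrative sketch only (not QEMU code): vmseq.vv-style comparison
 * producing a bit mask, with tail bits cleared up to vlmax as in the
 * GEN_VEXT_CMP_VV loop above.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static void sketch_vmseq_vv_w(uint8_t *vd_mask, const uint32_t *vs1,
                              const uint32_t *vs2, unsigned vl,
                              unsigned vlmax)
{
    for (unsigned i = 0; i < vl; i++) {
        bool bit = (vs2[i] == vs1[i]);
        if (bit) {
            vd_mask[i / 8] |= 1u << (i % 8);
        } else {
            vd_mask[i / 8] &= (uint8_t)~(1u << (i % 8));
        }
    }
    for (unsigned i = vl; i < vlmax; i++) {     /* clear tail bits */
        vd_mask[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

int main(void)
{
    uint32_t a[4] = { 1, 2, 3, 4 };
    uint32_t b[4] = { 1, 0, 3, 0 };
    uint8_t mask[1] = { 0xff };

    sketch_vmseq_vv_w(mask, a, b, 4, 8);
    printf("0x%02x\n", (unsigned)mask[0]);   /* bits 0 and 2 set -> 0x05 */
    return 0;
}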


* [PATCH v5 16/60] target/riscv: vector integer min/max instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 33 ++++++++++++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 ++++
 target/riscv/vector_helper.c            | 71 +++++++++++++++++++++++++
 4 files changed, 122 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4e6c47c2d2..c7d4ff185a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -492,3 +492,36 @@ DEF_HELPER_6(vmsgt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vminu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 525b2fa442..a7619f4e3d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -354,6 +354,14 @@ vmsgtu_vx       011110 . ..... ..... 100 ..... 1010111 @r_vm
 vmsgtu_vi       011110 . ..... ..... 011 ..... 1010111 @r_vm
 vmsgt_vx        011111 . ..... ..... 100 ..... 1010111 @r_vm
 vmsgt_vi        011111 . ..... ..... 011 ..... 1010111 @r_vm
+vminu_vv        000100 . ..... ..... 000 ..... 1010111 @r_vm
+vminu_vx        000100 . ..... ..... 100 ..... 1010111 @r_vm
+vmin_vv         000101 . ..... ..... 000 ..... 1010111 @r_vm
+vmin_vx         000101 . ..... ..... 100 ..... 1010111 @r_vm
+vmaxu_vv        000110 . ..... ..... 000 ..... 1010111 @r_vm
+vmaxu_vx        000110 . ..... ..... 100 ..... 1010111 @r_vm
+vmax_vv         000111 . ..... ..... 000 ..... 1010111 @r_vm
+vmax_vx         000111 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 078d275af6..4437a77878 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1424,3 +1424,13 @@ GEN_OPIVI_TRANS(vmsleu_vi, 1, vmsleu_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsle_vi, 0, vmsle_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsgtu_vi, 1, vmsgtu_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsgt_vi, 0, vmsgt_vx, opivx_cmp_check)
+
+/* Vector Integer Min/Max Instructions */
+GEN_OPIVV_GVEC_TRANS(vminu_vv, umin)
+GEN_OPIVV_GVEC_TRANS(vmin_vv,  smin)
+GEN_OPIVV_GVEC_TRANS(vmaxu_vv, umax)
+GEN_OPIVV_GVEC_TRANS(vmax_vv,  smax)
+GEN_OPIVX_TRANS(vminu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
+GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmax_vx,  opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e7a4e99f46..03e001262f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -849,6 +849,10 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
 #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
 #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
+#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
+#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
+#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
+#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
@@ -1479,3 +1483,70 @@ GEN_VEXT_CMP_VX(vmsgt_vx_b, int8_t,  H1, DO_MSGT)
 GEN_VEXT_CMP_VX(vmsgt_vx_h, int16_t, H2, DO_MSGT)
 GEN_VEXT_CMP_VX(vmsgt_vx_w, int32_t, H4, DO_MSGT)
 GEN_VEXT_CMP_VX(vmsgt_vx_d, int64_t, H8, DO_MSGT)
+
+/* Vector Integer Min/Max Instructions */
+RVVCALL(OPIVV2, vminu_vv_b, OP_UUU_B, H1, H1, H1, DO_MIN)
+RVVCALL(OPIVV2, vminu_vv_h, OP_UUU_H, H2, H2, H2, DO_MIN)
+RVVCALL(OPIVV2, vminu_vv_w, OP_UUU_W, H4, H4, H4, DO_MIN)
+RVVCALL(OPIVV2, vminu_vv_d, OP_UUU_D, H8, H8, H8, DO_MIN)
+RVVCALL(OPIVV2, vmin_vv_b, OP_SSS_B, H1, H1, H1, DO_MIN)
+RVVCALL(OPIVV2, vmin_vv_h, OP_SSS_H, H2, H2, H2, DO_MIN)
+RVVCALL(OPIVV2, vmin_vv_w, OP_SSS_W, H4, H4, H4, DO_MIN)
+RVVCALL(OPIVV2, vmin_vv_d, OP_SSS_D, H8, H8, H8, DO_MIN)
+RVVCALL(OPIVV2, vmaxu_vv_b, OP_UUU_B, H1, H1, H1, DO_MAX)
+RVVCALL(OPIVV2, vmaxu_vv_h, OP_UUU_H, H2, H2, H2, DO_MAX)
+RVVCALL(OPIVV2, vmaxu_vv_w, OP_UUU_W, H4, H4, H4, DO_MAX)
+RVVCALL(OPIVV2, vmaxu_vv_d, OP_UUU_D, H8, H8, H8, DO_MAX)
+RVVCALL(OPIVV2, vmax_vv_b, OP_SSS_B, H1, H1, H1, DO_MAX)
+RVVCALL(OPIVV2, vmax_vv_h, OP_SSS_H, H2, H2, H2, DO_MAX)
+RVVCALL(OPIVV2, vmax_vv_w, OP_SSS_W, H4, H4, H4, DO_MAX)
+RVVCALL(OPIVV2, vmax_vv_d, OP_SSS_D, H8, H8, H8, DO_MAX)
+GEN_VEXT_VV(vminu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vminu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vminu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vminu_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmin_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmin_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmin_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmin_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmaxu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmaxu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmaxu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmaxu_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmax_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmax_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmax_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmax_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2, vminu_vx_b, OP_UUU_B, H1, H1, DO_MIN)
+RVVCALL(OPIVX2, vminu_vx_h, OP_UUU_H, H2, H2, DO_MIN)
+RVVCALL(OPIVX2, vminu_vx_w, OP_UUU_W, H4, H4, DO_MIN)
+RVVCALL(OPIVX2, vminu_vx_d, OP_UUU_D, H8, H8, DO_MIN)
+RVVCALL(OPIVX2, vmin_vx_b, OP_SSS_B, H1, H1, DO_MIN)
+RVVCALL(OPIVX2, vmin_vx_h, OP_SSS_H, H2, H2, DO_MIN)
+RVVCALL(OPIVX2, vmin_vx_w, OP_SSS_W, H4, H4, DO_MIN)
+RVVCALL(OPIVX2, vmin_vx_d, OP_SSS_D, H8, H8, DO_MIN)
+RVVCALL(OPIVX2, vmaxu_vx_b, OP_UUU_B, H1, H1, DO_MAX)
+RVVCALL(OPIVX2, vmaxu_vx_h, OP_UUU_H, H2, H2, DO_MAX)
+RVVCALL(OPIVX2, vmaxu_vx_w, OP_UUU_W, H4, H4, DO_MAX)
+RVVCALL(OPIVX2, vmaxu_vx_d, OP_UUU_D, H8, H8, DO_MAX)
+RVVCALL(OPIVX2, vmax_vx_b, OP_SSS_B, H1, H1, DO_MAX)
+RVVCALL(OPIVX2, vmax_vx_h, OP_SSS_H, H2, H2, DO_MAX)
+RVVCALL(OPIVX2, vmax_vx_w, OP_SSS_W, H4, H4, DO_MAX)
+RVVCALL(OPIVX2, vmax_vx_d, OP_SSS_D, H8, H8, DO_MAX)
+GEN_VEXT_VX(vminu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vminu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vminu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vminu_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmin_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmin_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmin_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmin_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmaxu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmaxu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmaxu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmaxu_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmax_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmax_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmax_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmax_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
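
The RVVCALL(OPIVV2, ...) lines above instantiate one per-element operation per
element width from the DO_MIN/DO_MAX macros, and GEN_VEXT_VV / GEN_VEXT_VX turn
those into the helpers declared in helper.h. Neither macro body is visible in
this hunk, so the following is only a rough sketch of the element-wise shape of
such a helper (hypothetical name; masking, vstart/vl handling and tail clearing
omitted), not the generated QEMU code:

    #include <stddef.h>
    #include <stdint.h>

    #define DO_MIN(N, M)  ((N) < (M) ? (N) : (M))

    /* Sketch: unsigned byte minimum over two source vectors. */
    static void sketch_vminu_vv_b(uint8_t *vd, const uint8_t *vs1,
                                  const uint8_t *vs2, size_t vl)
    {
        size_t i;

        for (i = 0; i < vl; i++) {
            vd[i] = DO_MIN(vs2[i], vs1[i]);
        }
    }

The vector-scalar (_vx) forms follow the same pattern, with vs1[i] replaced by
the x-register operand broadcast to every element.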


* [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  33 ++++++
 target/riscv/insn32.decode              |   8 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  10 ++
 target/riscv/vector_helper.c            | 147 ++++++++++++++++++++++++
 4 files changed, 198 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c7d4ff185a..f42a12eef3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -525,3 +525,36 @@ DEF_HELPER_6(vmax_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a7619f4e3d..a8ac4e9e9d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -362,6 +362,14 @@ vmaxu_vv        000110 . ..... ..... 000 ..... 1010111 @r_vm
 vmaxu_vx        000110 . ..... ..... 100 ..... 1010111 @r_vm
 vmax_vv         000111 . ..... ..... 000 ..... 1010111 @r_vm
 vmax_vx         000111 . ..... ..... 100 ..... 1010111 @r_vm
+vmul_vv         100101 . ..... ..... 010 ..... 1010111 @r_vm
+vmul_vx         100101 . ..... ..... 110 ..... 1010111 @r_vm
+vmulh_vv        100111 . ..... ..... 010 ..... 1010111 @r_vm
+vmulh_vx        100111 . ..... ..... 110 ..... 1010111 @r_vm
+vmulhu_vv       100100 . ..... ..... 010 ..... 1010111 @r_vm
+vmulhu_vx       100100 . ..... ..... 110 ..... 1010111 @r_vm
+vmulhsu_vv      100110 . ..... ..... 010 ..... 1010111 @r_vm
+vmulhsu_vx      100110 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 4437a77878..a1ecc9f52d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1434,3 +1434,13 @@ GEN_OPIVX_TRANS(vminu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
 GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmax_vx,  opivx_check)
+
+/* Vector Single-Width Integer Multiply Instructions */
+GEN_OPIVV_GVEC_TRANS(vmul_vv,  mul)
+GEN_OPIVV_TRANS(vmulh_vv, opivv_check)
+GEN_OPIVV_TRANS(vmulhu_vv, opivv_check)
+GEN_OPIVV_TRANS(vmulhsu_vv, opivv_check)
+GEN_OPIVX_GVEC_TRANS(vmul_vx,  muls)
+GEN_OPIVX_TRANS(vmulh_vx, opivx_check)
+GEN_OPIVX_TRANS(vmulhu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmulhsu_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 03e001262f..93daafd5bd 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -853,6 +853,10 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
 #define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
 #define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
+#define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
+#define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
+#define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
+#define OP_SUS_D int64_t, uint64_t, int64_t, uint64_t, int64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
@@ -1550,3 +1554,146 @@ GEN_VEXT_VX(vmax_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vmax_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vmax_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vmax_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Integer Multiply Instructions */
+#define DO_MUL(N, M) (N * M)
+RVVCALL(OPIVV2, vmul_vv_b, OP_SSS_B, H1, H1, H1, DO_MUL)
+RVVCALL(OPIVV2, vmul_vv_h, OP_SSS_H, H2, H2, H2, DO_MUL)
+RVVCALL(OPIVV2, vmul_vv_w, OP_SSS_W, H4, H4, H4, DO_MUL)
+RVVCALL(OPIVV2, vmul_vv_d, OP_SSS_D, H8, H8, H8, DO_MUL)
+GEN_VEXT_VV(vmul_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmul_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmul_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmul_vv_d, 8, 8, clearq)
+
+static int8_t do_mulh_b(int8_t s2, int8_t s1)
+{
+    return (int16_t)s2 * (int16_t)s1 >> 8;
+}
+static int16_t do_mulh_h(int16_t s2, int16_t s1)
+{
+    return (int32_t)s2 * (int32_t)s1 >> 16;
+}
+static int32_t do_mulh_w(int32_t s2, int32_t s1)
+{
+    return (int64_t)s2 * (int64_t)s1 >> 32;
+}
+static int64_t do_mulh_d(int64_t s2, int64_t s1)
+{
+    uint64_t hi_64, lo_64;
+
+    muls64(&lo_64, &hi_64, s1, s2);
+    return hi_64;
+}
+
+static uint8_t do_mulhu_b(uint8_t s2, uint8_t s1)
+{
+    return (uint16_t)s2 * (uint16_t)s1 >> 8;
+}
+static uint16_t do_mulhu_h(uint16_t s2, uint16_t s1)
+{
+    return (uint32_t)s2 * (uint32_t)s1 >> 16;
+}
+static uint32_t do_mulhu_w(uint32_t s2, uint32_t s1)
+{
+    return (uint64_t)s2 * (uint64_t)s1 >> 32;
+}
+static uint64_t do_mulhu_d(uint64_t s2, uint64_t s1)
+{
+    uint64_t hi_64, lo_64;
+
+    mulu64(&lo_64, &hi_64, s2, s1);
+    return hi_64;
+}
+
+static int8_t do_mulhsu_b(int8_t s2, uint8_t s1)
+{
+    return (int16_t)s2 * (uint16_t)s1 >> 8;
+}
+static int16_t do_mulhsu_h(int16_t s2, uint16_t s1)
+{
+    return (int32_t)s2 * (uint32_t)s1 >> 16;
+}
+static int32_t do_mulhsu_w(int32_t s2, uint32_t s1)
+{
+    return (int64_t)s2 * (uint64_t)s1 >> 32;
+}
+static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
+{
+    uint64_t hi_64, lo_64, abs_s2 = s2;
+
+    if (s2 < 0) {
+        abs_s2 = -s2;
+    }
+    mulu64(&lo_64, &hi_64, abs_s2, s1);
+    if (s2 < 0) {
+        lo_64 = ~lo_64;
+        hi_64 = ~hi_64;
+        if (lo_64 == UINT64_MAX) {
+            lo_64 = 0;
+            hi_64 += 1;
+        } else {
+            lo_64 += 1;
+        }
+    }
+
+    return hi_64;
+}
+
+RVVCALL(OPIVV2, vmulh_vv_b, OP_SSS_B, H1, H1, H1, do_mulh_b)
+RVVCALL(OPIVV2, vmulh_vv_h, OP_SSS_H, H2, H2, H2, do_mulh_h)
+RVVCALL(OPIVV2, vmulh_vv_w, OP_SSS_W, H4, H4, H4, do_mulh_w)
+RVVCALL(OPIVV2, vmulh_vv_d, OP_SSS_D, H8, H8, H8, do_mulh_d)
+RVVCALL(OPIVV2, vmulhu_vv_b, OP_UUU_B, H1, H1, H1, do_mulhu_b)
+RVVCALL(OPIVV2, vmulhu_vv_h, OP_UUU_H, H2, H2, H2, do_mulhu_h)
+RVVCALL(OPIVV2, vmulhu_vv_w, OP_UUU_W, H4, H4, H4, do_mulhu_w)
+RVVCALL(OPIVV2, vmulhu_vv_d, OP_UUU_D, H8, H8, H8, do_mulhu_d)
+RVVCALL(OPIVV2, vmulhsu_vv_b, OP_SUS_B, H1, H1, H1, do_mulhsu_b)
+RVVCALL(OPIVV2, vmulhsu_vv_h, OP_SUS_H, H2, H2, H2, do_mulhsu_h)
+RVVCALL(OPIVV2, vmulhsu_vv_w, OP_SUS_W, H4, H4, H4, do_mulhsu_w)
+RVVCALL(OPIVV2, vmulhsu_vv_d, OP_SUS_D, H8, H8, H8, do_mulhsu_d)
+GEN_VEXT_VV(vmulh_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmulh_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmulh_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmulh_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmulhu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmulhu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmulhu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmulhu_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmulhsu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmulhsu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmulhsu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmulhsu_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2, vmul_vx_b, OP_SSS_B, H1, H1, DO_MUL)
+RVVCALL(OPIVX2, vmul_vx_h, OP_SSS_H, H2, H2, DO_MUL)
+RVVCALL(OPIVX2, vmul_vx_w, OP_SSS_W, H4, H4, DO_MUL)
+RVVCALL(OPIVX2, vmul_vx_d, OP_SSS_D, H8, H8, DO_MUL)
+RVVCALL(OPIVX2, vmulh_vx_b, OP_SSS_B, H1, H1, do_mulh_b)
+RVVCALL(OPIVX2, vmulh_vx_h, OP_SSS_H, H2, H2, do_mulh_h)
+RVVCALL(OPIVX2, vmulh_vx_w, OP_SSS_W, H4, H4, do_mulh_w)
+RVVCALL(OPIVX2, vmulh_vx_d, OP_SSS_D, H8, H8, do_mulh_d)
+RVVCALL(OPIVX2, vmulhu_vx_b, OP_UUU_B, H1, H1, do_mulhu_b)
+RVVCALL(OPIVX2, vmulhu_vx_h, OP_UUU_H, H2, H2, do_mulhu_h)
+RVVCALL(OPIVX2, vmulhu_vx_w, OP_UUU_W, H4, H4, do_mulhu_w)
+RVVCALL(OPIVX2, vmulhu_vx_d, OP_UUU_D, H8, H8, do_mulhu_d)
+RVVCALL(OPIVX2, vmulhsu_vx_b, OP_SUS_B, H1, H1, do_mulhsu_b)
+RVVCALL(OPIVX2, vmulhsu_vx_h, OP_SUS_H, H2, H2, do_mulhsu_h)
+RVVCALL(OPIVX2, vmulhsu_vx_w, OP_SUS_W, H4, H4, do_mulhsu_w)
+RVVCALL(OPIVX2, vmulhsu_vx_d, OP_SUS_D, H8, H8, do_mulhsu_d)
+GEN_VEXT_VX(vmul_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmul_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmul_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmul_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmulh_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmulh_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmulh_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmulh_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmulhu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmulhu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmulhu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmulhu_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmulhsu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmulhsu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmulhsu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmulhsu_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
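
Most of the helpers above only need a widening cast and a shift; the delicate
one is do_mulhsu_d, which has to produce the high 64 bits of a signed * unsigned
product without a 128-bit type. A quick host-side cross check for that corner
(a sketch assuming GCC/Clang's __int128 and an arithmetic right shift, not part
of the patch):

    #include <assert.h>
    #include <stdint.h>

    /* Reference: high 64 bits of (signed 64-bit) * (unsigned 64-bit). */
    static int64_t ref_mulhsu_d(int64_t s2, uint64_t s1)
    {
        return (int64_t)(((__int128)s2 * (__int128)s1) >> 64);
    }

    int main(void)
    {
        /* -1 * (2^64 - 1) = -(2^64 - 1): the high half is all ones. */
        assert(ref_mulhsu_d(-1, UINT64_MAX) == -1);
        /* Non-negative s2 stays non-negative even when s1 has its top bit set. */
        assert(ref_mulhsu_d(1, 1ULL << 63) == 0);
        /* INT64_MIN * 2 = -2^64: the high half is again all ones. */
        assert(ref_mulhsu_d(INT64_MIN, 2) == -1);
        return 0;
    }

For the narrower widths the plain widening casts in do_mulhsu_b/h/w already
compute the exact product, so no such care is needed there.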


* [PATCH v5 18/60] target/riscv: vector integer divide instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 33 +++++++++++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 ++++
 target/riscv/vector_helper.c            | 74 +++++++++++++++++++++++++
 4 files changed, 125 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f42a12eef3..357f149198 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -558,3 +558,36 @@ DEF_HELPER_6(vmulhsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vdivu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a8ac4e9e9d..2afe24dd34 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -370,6 +370,14 @@ vmulhu_vv       100100 . ..... ..... 010 ..... 1010111 @r_vm
 vmulhu_vx       100100 . ..... ..... 110 ..... 1010111 @r_vm
 vmulhsu_vv      100110 . ..... ..... 010 ..... 1010111 @r_vm
 vmulhsu_vx      100110 . ..... ..... 110 ..... 1010111 @r_vm
+vdivu_vv        100000 . ..... ..... 010 ..... 1010111 @r_vm
+vdivu_vx        100000 . ..... ..... 110 ..... 1010111 @r_vm
+vdiv_vv         100001 . ..... ..... 010 ..... 1010111 @r_vm
+vdiv_vx         100001 . ..... ..... 110 ..... 1010111 @r_vm
+vremu_vv        100010 . ..... ..... 010 ..... 1010111 @r_vm
+vremu_vx        100010 . ..... ..... 110 ..... 1010111 @r_vm
+vrem_vv         100011 . ..... ..... 010 ..... 1010111 @r_vm
+vrem_vx         100011 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a1ecc9f52d..9f0645a92b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1444,3 +1444,13 @@ GEN_OPIVX_GVEC_TRANS(vmul_vx,  muls)
 GEN_OPIVX_TRANS(vmulh_vx, opivx_check)
 GEN_OPIVX_TRANS(vmulhu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmulhsu_vx, opivx_check)
+
+/* Vector Integer Divide Instructions */
+GEN_OPIVV_TRANS(vdivu_vv, opivv_check)
+GEN_OPIVV_TRANS(vdiv_vv, opivv_check)
+GEN_OPIVV_TRANS(vremu_vv, opivv_check)
+GEN_OPIVV_TRANS(vrem_vv, opivv_check)
+GEN_OPIVX_TRANS(vdivu_vx, opivx_check)
+GEN_OPIVX_TRANS(vdiv_vx, opivx_check)
+GEN_OPIVX_TRANS(vremu_vx, opivx_check)
+GEN_OPIVX_TRANS(vrem_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 93daafd5bd..6330f5882f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1697,3 +1697,77 @@ GEN_VEXT_VX(vmulhsu_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vmulhsu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vmulhsu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vmulhsu_vx_d, 8, 8, clearq)
+
+/* Vector Integer Divide Instructions */
+#define DO_DIVU(N, M) (unlikely(M == 0) ? (__typeof(N))(-1) : N / M)
+#define DO_REMU(N, M) (unlikely(M == 0) ? N : N % M)
+#define DO_DIV(N, M)  (unlikely(M == 0) ? (__typeof(N))(-1) :\
+        unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
+#define DO_REM(N, M)  (unlikely(M == 0) ? N :\
+        unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+
+RVVCALL(OPIVV2, vdivu_vv_b, OP_UUU_B, H1, H1, H1, DO_DIVU)
+RVVCALL(OPIVV2, vdivu_vv_h, OP_UUU_H, H2, H2, H2, DO_DIVU)
+RVVCALL(OPIVV2, vdivu_vv_w, OP_UUU_W, H4, H4, H4, DO_DIVU)
+RVVCALL(OPIVV2, vdivu_vv_d, OP_UUU_D, H8, H8, H8, DO_DIVU)
+RVVCALL(OPIVV2, vdiv_vv_b, OP_SSS_B, H1, H1, H1, DO_DIV)
+RVVCALL(OPIVV2, vdiv_vv_h, OP_SSS_H, H2, H2, H2, DO_DIV)
+RVVCALL(OPIVV2, vdiv_vv_w, OP_SSS_W, H4, H4, H4, DO_DIV)
+RVVCALL(OPIVV2, vdiv_vv_d, OP_SSS_D, H8, H8, H8, DO_DIV)
+RVVCALL(OPIVV2, vremu_vv_b, OP_UUU_B, H1, H1, H1, DO_REMU)
+RVVCALL(OPIVV2, vremu_vv_h, OP_UUU_H, H2, H2, H2, DO_REMU)
+RVVCALL(OPIVV2, vremu_vv_w, OP_UUU_W, H4, H4, H4, DO_REMU)
+RVVCALL(OPIVV2, vremu_vv_d, OP_UUU_D, H8, H8, H8, DO_REMU)
+RVVCALL(OPIVV2, vrem_vv_b, OP_SSS_B, H1, H1, H1, DO_REM)
+RVVCALL(OPIVV2, vrem_vv_h, OP_SSS_H, H2, H2, H2, DO_REM)
+RVVCALL(OPIVV2, vrem_vv_w, OP_SSS_W, H4, H4, H4, DO_REM)
+RVVCALL(OPIVV2, vrem_vv_d, OP_SSS_D, H8, H8, H8, DO_REM)
+GEN_VEXT_VV(vdivu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vdivu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vdivu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vdivu_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vdiv_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vdiv_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vdiv_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vdiv_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vremu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vremu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vremu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vremu_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vrem_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vrem_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vrem_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vrem_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2, vdivu_vx_b, OP_UUU_B, H1, H1, DO_DIVU)
+RVVCALL(OPIVX2, vdivu_vx_h, OP_UUU_H, H2, H2, DO_DIVU)
+RVVCALL(OPIVX2, vdivu_vx_w, OP_UUU_W, H4, H4, DO_DIVU)
+RVVCALL(OPIVX2, vdivu_vx_d, OP_UUU_D, H8, H8, DO_DIVU)
+RVVCALL(OPIVX2, vdiv_vx_b, OP_SSS_B, H1, H1, DO_DIV)
+RVVCALL(OPIVX2, vdiv_vx_h, OP_SSS_H, H2, H2, DO_DIV)
+RVVCALL(OPIVX2, vdiv_vx_w, OP_SSS_W, H4, H4, DO_DIV)
+RVVCALL(OPIVX2, vdiv_vx_d, OP_SSS_D, H8, H8, DO_DIV)
+RVVCALL(OPIVX2, vremu_vx_b, OP_UUU_B, H1, H1, DO_REMU)
+RVVCALL(OPIVX2, vremu_vx_h, OP_UUU_H, H2, H2, DO_REMU)
+RVVCALL(OPIVX2, vremu_vx_w, OP_UUU_W, H4, H4, DO_REMU)
+RVVCALL(OPIVX2, vremu_vx_d, OP_UUU_D, H8, H8, DO_REMU)
+RVVCALL(OPIVX2, vrem_vx_b, OP_SSS_B, H1, H1, DO_REM)
+RVVCALL(OPIVX2, vrem_vx_h, OP_SSS_H, H2, H2, DO_REM)
+RVVCALL(OPIVX2, vrem_vx_w, OP_SSS_W, H4, H4, DO_REM)
+RVVCALL(OPIVX2, vrem_vx_d, OP_SSS_D, H8, H8, DO_REM)
+GEN_VEXT_VX(vdivu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vdivu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vdivu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vdivu_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vdiv_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vdiv_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vdiv_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vdiv_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vremu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vremu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vremu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vremu_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vrem_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vrem_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vrem_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vrem_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
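
The DO_DIVU/DO_DIV/DO_REMU/DO_REM macros above encode RISC-V's no-trap division
rules: dividing by zero yields all ones for DIV[U] and the unchanged dividend
for REM[U], and the signed overflow case (most-negative dividend divided by -1)
yields the dividend for DIV and 0 for REM. The (N == -N) test is how the macros
spot the most-negative value without naming the element type (it also matches
zero, which is harmless since 0 / -1 is 0 anyway). Spelled out for one width,
with hypothetical helper names and the same rules:

    #include <assert.h>
    #include <limits.h>
    #include <stdint.h>

    /* RISC-V signed divide/remainder semantics for 32-bit elements. */
    static int32_t rv_div_w(int32_t n, int32_t m)
    {
        if (m == 0) {
            return -1;                  /* divide by zero: all ones */
        }
        if (n == INT32_MIN && m == -1) {
            return n;                   /* overflow: quotient is the dividend */
        }
        return n / m;
    }

    static int32_t rv_rem_w(int32_t n, int32_t m)
    {
        if (m == 0) {
            return n;                   /* divide by zero: dividend unchanged */
        }
        if (n == INT32_MIN && m == -1) {
            return 0;                   /* overflow: remainder is zero */
        }
        return n % m;
    }

    int main(void)
    {
        assert(rv_div_w(7, 0) == -1);
        assert(rv_rem_w(7, 0) == 7);
        assert(rv_div_w(INT32_MIN, -1) == INT32_MIN);
        assert(rv_rem_w(INT32_MIN, -1) == 0);
        return 0;
    }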


* [PATCH v5 19/60] target/riscv: vector widening integer multiply instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 19 +++++++++
 target/riscv/insn32.decode              |  6 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  8 ++++
 target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
 4 files changed, 84 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 357f149198..1704b8c512 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -591,3 +591,22 @@ DEF_HELPER_6(vrem_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2afe24dd34..ceddfe4b6c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -378,6 +378,12 @@ vremu_vv        100010 . ..... ..... 010 ..... 1010111 @r_vm
 vremu_vx        100010 . ..... ..... 110 ..... 1010111 @r_vm
 vrem_vv         100011 . ..... ..... 010 ..... 1010111 @r_vm
 vrem_vx         100011 . ..... ..... 110 ..... 1010111 @r_vm
+vwmulu_vv       111000 . ..... ..... 010 ..... 1010111 @r_vm
+vwmulu_vx       111000 . ..... ..... 110 ..... 1010111 @r_vm
+vwmulsu_vv      111010 . ..... ..... 010 ..... 1010111 @r_vm
+vwmulsu_vx      111010 . ..... ..... 110 ..... 1010111 @r_vm
+vwmul_vv        111011 . ..... ..... 010 ..... 1010111 @r_vm
+vwmul_vx        111011 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9f0645a92b..990433f866 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1454,3 +1454,11 @@ GEN_OPIVX_TRANS(vdivu_vx, opivx_check)
 GEN_OPIVX_TRANS(vdiv_vx, opivx_check)
 GEN_OPIVX_TRANS(vremu_vx, opivx_check)
 GEN_OPIVX_TRANS(vrem_vx, opivx_check)
+
+/* Vector Widening Integer Multiply Instructions */
+GEN_OPIVV_WIDEN_TRANS(vwmul_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmulu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmulsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwmul_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmulu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 6330f5882f..beb84f9674 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -857,6 +857,18 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
 #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
 #define OP_SUS_D int64_t, uint64_t, int64_t, uint64_t, int64_t
+#define WOP_UUU_B uint16_t, uint8_t, uint8_t, uint16_t, uint16_t
+#define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t
+#define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t
+#define WOP_SSS_B int16_t, int8_t, int8_t, int16_t, int16_t
+#define WOP_SSS_H int32_t, int16_t, int16_t, int32_t, int32_t
+#define WOP_SSS_W int64_t, int32_t, int32_t, int64_t, int64_t
+#define WOP_SUS_B int16_t, uint8_t, int8_t, uint16_t, int16_t
+#define WOP_SUS_H int32_t, uint16_t, int16_t, uint32_t, int32_t
+#define WOP_SUS_W int64_t, uint32_t, int32_t, uint64_t, int64_t
+#define WOP_SSU_B int16_t, int8_t, uint8_t, int16_t, uint16_t
+#define WOP_SSU_H int32_t, int16_t, uint16_t, int32_t, uint32_t
+#define WOP_SSU_W int64_t, int32_t, uint32_t, int64_t, uint64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
@@ -1771,3 +1783,42 @@ GEN_VEXT_VX(vrem_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vrem_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vrem_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vrem_vx_d, 8, 8, clearq)
+
+/* Vector Widening Integer Multiply Instructions */
+RVVCALL(OPIVV2, vwmul_vv_b, WOP_SSS_B, H2, H1, H1, DO_MUL)
+RVVCALL(OPIVV2, vwmul_vv_h, WOP_SSS_H, H4, H2, H2, DO_MUL)
+RVVCALL(OPIVV2, vwmul_vv_w, WOP_SSS_W, H8, H4, H4, DO_MUL)
+RVVCALL(OPIVV2, vwmulu_vv_b, WOP_UUU_B, H2, H1, H1, DO_MUL)
+RVVCALL(OPIVV2, vwmulu_vv_h, WOP_UUU_H, H4, H2, H2, DO_MUL)
+RVVCALL(OPIVV2, vwmulu_vv_w, WOP_UUU_W, H8, H4, H4, DO_MUL)
+RVVCALL(OPIVV2, vwmulsu_vv_b, WOP_SUS_B, H2, H1, H1, DO_MUL)
+RVVCALL(OPIVV2, vwmulsu_vv_h, WOP_SUS_H, H4, H2, H2, DO_MUL)
+RVVCALL(OPIVV2, vwmulsu_vv_w, WOP_SUS_W, H8, H4, H4, DO_MUL)
+GEN_VEXT_VV(vwmul_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmul_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmul_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwmulu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmulu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmulu_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwmulsu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmulsu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmulsu_vv_w, 4, 8, clearq)
+
+RVVCALL(OPIVX2, vwmul_vx_b, WOP_SSS_B, H2, H1, DO_MUL)
+RVVCALL(OPIVX2, vwmul_vx_h, WOP_SSS_H, H4, H2, DO_MUL)
+RVVCALL(OPIVX2, vwmul_vx_w, WOP_SSS_W, H8, H4, DO_MUL)
+RVVCALL(OPIVX2, vwmulu_vx_b, WOP_UUU_B, H2, H1, DO_MUL)
+RVVCALL(OPIVX2, vwmulu_vx_h, WOP_UUU_H, H4, H2, DO_MUL)
+RVVCALL(OPIVX2, vwmulu_vx_w, WOP_UUU_W, H8, H4, DO_MUL)
+RVVCALL(OPIVX2, vwmulsu_vx_b, WOP_SUS_B, H2, H1, DO_MUL)
+RVVCALL(OPIVX2, vwmulsu_vx_h, WOP_SUS_H, H4, H2, DO_MUL)
+RVVCALL(OPIVX2, vwmulsu_vx_w, WOP_SUS_W, H8, H4, DO_MUL)
+GEN_VEXT_VX(vwmul_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmul_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmul_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwmulu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmulu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmulu_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwmulsu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmulsu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmulsu_vx_w, 4, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
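
A standalone sketch (not part of the patch) of the per-element arithmetic encoded by the WOP_SUS_* typedef lists above for vwmulsu: vs2 is read as signed, vs1 as unsigned, and the product is kept at twice the source width, so the sign of vs2 survives in the double-width result. The function and test values below are invented for illustration.

#include <stdint.h>
#include <stdio.h>

/* vwmulsu.vv for SEW=8: signed(vs2) x unsigned(vs1) -> 16-bit result */
static int16_t wmulsu8(int8_t vs2, uint8_t vs1)
{
    return (int16_t)((int16_t)vs2 * (uint16_t)vs1);
}

int main(void)
{
    /* -1 * 255 = -255; an 8-bit (single-width) result would have wrapped */
    printf("%d\n", wmulsu8(-1, 255));
    return 0;
}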

* [PATCH v5 20/60] target/riscv: vector single-width integer multiply-add instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 33 ++++++++++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
 target/riscv/vector_helper.c            | 88 +++++++++++++++++++++++++
 4 files changed, 139 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1704b8c512..098288df76 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -610,3 +610,36 @@ DEF_HELPER_6(vwmulu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ceddfe4b6c..58de888afa 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -384,6 +384,14 @@ vwmulsu_vv      111010 . ..... ..... 010 ..... 1010111 @r_vm
 vwmulsu_vx      111010 . ..... ..... 110 ..... 1010111 @r_vm
 vwmul_vv        111011 . ..... ..... 010 ..... 1010111 @r_vm
 vwmul_vx        111011 . ..... ..... 110 ..... 1010111 @r_vm
+vmacc_vv        101101 . ..... ..... 010 ..... 1010111 @r_vm
+vmacc_vx        101101 . ..... ..... 110 ..... 1010111 @r_vm
+vnmsac_vv       101111 . ..... ..... 010 ..... 1010111 @r_vm
+vnmsac_vx       101111 . ..... ..... 110 ..... 1010111 @r_vm
+vmadd_vv        101001 . ..... ..... 010 ..... 1010111 @r_vm
+vmadd_vx        101001 . ..... ..... 110 ..... 1010111 @r_vm
+vnmsub_vv       101011 . ..... ..... 010 ..... 1010111 @r_vm
+vnmsub_vx       101011 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 990433f866..05f7ae0bc4 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1462,3 +1462,13 @@ GEN_OPIVV_WIDEN_TRANS(vwmulsu_vv, opivv_widen_check)
 GEN_OPIVX_WIDEN_TRANS(vwmul_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmulu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx)
+
+/* Vector Single-Width Integer Multiply-Add Instructions */
+GEN_OPIVV_TRANS(vmacc_vv, opivv_check)
+GEN_OPIVV_TRANS(vnmsac_vv, opivv_check)
+GEN_OPIVV_TRANS(vmadd_vv, opivv_check)
+GEN_OPIVV_TRANS(vnmsub_vv, opivv_check)
+GEN_OPIVX_TRANS(vmacc_vx, opivx_check)
+GEN_OPIVX_TRANS(vnmsac_vx, opivx_check)
+GEN_OPIVX_TRANS(vmadd_vx, opivx_check)
+GEN_OPIVX_TRANS(vnmsub_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index beb84f9674..e5082c8adc 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1822,3 +1822,91 @@ GEN_VEXT_VX(vwmulu_vx_w, 4, 8, clearq)
 GEN_VEXT_VX(vwmulsu_vx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwmulsu_vx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwmulsu_vx_w, 4, 8, clearq)
+
+/* Vector Single-Width Integer Multiply-Add Instructions */
+#define OPIVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)   \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i)       \
+{                                                                  \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                                \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                \
+    TD d = *((TD *)vd + HD(i));                                    \
+    *((TD *)vd + HD(i)) = OP(s2, s1, d);                           \
+}
+
+#define DO_MACC(N, M, D) (M * N + D)
+#define DO_NMSAC(N, M, D) (-(M * N) + D)
+#define DO_MADD(N, M, D) (M * D + N)
+#define DO_NMSUB(N, M, D) (-(M * D) + N)
+RVVCALL(OPIVV3, vmacc_vv_b, OP_SSS_B, H1, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vmacc_vv_h, OP_SSS_H, H2, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vmacc_vv_w, OP_SSS_W, H4, H4, H4, DO_MACC)
+RVVCALL(OPIVV3, vmacc_vv_d, OP_SSS_D, H8, H8, H8, DO_MACC)
+RVVCALL(OPIVV3, vnmsac_vv_b, OP_SSS_B, H1, H1, H1, DO_NMSAC)
+RVVCALL(OPIVV3, vnmsac_vv_h, OP_SSS_H, H2, H2, H2, DO_NMSAC)
+RVVCALL(OPIVV3, vnmsac_vv_w, OP_SSS_W, H4, H4, H4, DO_NMSAC)
+RVVCALL(OPIVV3, vnmsac_vv_d, OP_SSS_D, H8, H8, H8, DO_NMSAC)
+RVVCALL(OPIVV3, vmadd_vv_b, OP_SSS_B, H1, H1, H1, DO_MADD)
+RVVCALL(OPIVV3, vmadd_vv_h, OP_SSS_H, H2, H2, H2, DO_MADD)
+RVVCALL(OPIVV3, vmadd_vv_w, OP_SSS_W, H4, H4, H4, DO_MADD)
+RVVCALL(OPIVV3, vmadd_vv_d, OP_SSS_D, H8, H8, H8, DO_MADD)
+RVVCALL(OPIVV3, vnmsub_vv_b, OP_SSS_B, H1, H1, H1, DO_NMSUB)
+RVVCALL(OPIVV3, vnmsub_vv_h, OP_SSS_H, H2, H2, H2, DO_NMSUB)
+RVVCALL(OPIVV3, vnmsub_vv_w, OP_SSS_W, H4, H4, H4, DO_NMSUB)
+RVVCALL(OPIVV3, vnmsub_vv_d, OP_SSS_D, H8, H8, H8, DO_NMSUB)
+GEN_VEXT_VV(vmacc_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmacc_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmacc_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmacc_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vnmsac_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vnmsac_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vnmsac_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vnmsac_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vmadd_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vmadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vmadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vmadd_vv_d, 8, 8, clearq)
+GEN_VEXT_VV(vnmsub_vv_b, 1, 1, clearb)
+GEN_VEXT_VV(vnmsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV(vnmsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV(vnmsub_vv_d, 8, 8, clearq)
+
+#define OPIVX3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)         \
+static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
+{                                                                   \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    TD d = *((TD *)vd + HD(i));                                     \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1, d);      \
+}
+
+RVVCALL(OPIVX3, vmacc_vx_b, OP_SSS_B, H1, H1, DO_MACC)
+RVVCALL(OPIVX3, vmacc_vx_h, OP_SSS_H, H2, H2, DO_MACC)
+RVVCALL(OPIVX3, vmacc_vx_w, OP_SSS_W, H4, H4, DO_MACC)
+RVVCALL(OPIVX3, vmacc_vx_d, OP_SSS_D, H8, H8, DO_MACC)
+RVVCALL(OPIVX3, vnmsac_vx_b, OP_SSS_B, H1, H1, DO_NMSAC)
+RVVCALL(OPIVX3, vnmsac_vx_h, OP_SSS_H, H2, H2, DO_NMSAC)
+RVVCALL(OPIVX3, vnmsac_vx_w, OP_SSS_W, H4, H4, DO_NMSAC)
+RVVCALL(OPIVX3, vnmsac_vx_d, OP_SSS_D, H8, H8, DO_NMSAC)
+RVVCALL(OPIVX3, vmadd_vx_b, OP_SSS_B, H1, H1, DO_MADD)
+RVVCALL(OPIVX3, vmadd_vx_h, OP_SSS_H, H2, H2, DO_MADD)
+RVVCALL(OPIVX3, vmadd_vx_w, OP_SSS_W, H4, H4, DO_MADD)
+RVVCALL(OPIVX3, vmadd_vx_d, OP_SSS_D, H8, H8, DO_MADD)
+RVVCALL(OPIVX3, vnmsub_vx_b, OP_SSS_B, H1, H1, DO_NMSUB)
+RVVCALL(OPIVX3, vnmsub_vx_h, OP_SSS_H, H2, H2, DO_NMSUB)
+RVVCALL(OPIVX3, vnmsub_vx_w, OP_SSS_W, H4, H4, DO_NMSUB)
+RVVCALL(OPIVX3, vnmsub_vx_d, OP_SSS_D, H8, H8, DO_NMSUB)
+GEN_VEXT_VX(vmacc_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmacc_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmacc_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmacc_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vnmsac_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vnmsac_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vnmsac_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vnmsac_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vmadd_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vmadd_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vmadd_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vmadd_vx_d, 8, 8, clearq)
+GEN_VEXT_VX(vnmsub_vx_b, 1, 1, clearb)
+GEN_VEXT_VX(vnmsub_vx_h, 2, 2, clearh)
+GEN_VEXT_VX(vnmsub_vx_w, 4, 4, clearl)
+GEN_VEXT_VX(vnmsub_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
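
The OP(s2, s1, d) call order in OPIVV3/OPIVX3 makes the operand roles of the four DO_* macros easy to misread, so here is a plain restatement (not QEMU code; names invented): vmacc and vnmsac accumulate the vs1*vs2 product into the old vd, while vmadd and vnmsub multiply vs1 by the old vd and add vs2.

#include <stdint.h>
#include <stdio.h>

/* vd is both the accumulator and the destination */
static int32_t macc (int32_t vd, int32_t vs1, int32_t vs2) { return  (vs1 * vs2) + vd; }
static int32_t nmsac(int32_t vd, int32_t vs1, int32_t vs2) { return -(vs1 * vs2) + vd; }
static int32_t madd (int32_t vd, int32_t vs1, int32_t vs2) { return  (vs1 * vd) + vs2; }
static int32_t nmsub(int32_t vd, int32_t vs1, int32_t vs2) { return -(vs1 * vd) + vs2; }

int main(void)
{
    /* vd = 10, vs1 = 3, vs2 = 4 */
    printf("%d %d %d %d\n",
           macc(10, 3, 4),     /*  3*4   + 10 =  22 */
           nmsac(10, 3, 4),    /* -(3*4) + 10 =  -2 */
           madd(10, 3, 4),     /*  3*10  +  4 =  34 */
           nmsub(10, 3, 4));   /* -(3*10) + 4 = -26 */
    return 0;
}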

* [PATCH v5 21/60] target/riscv: vector widening integer multiply-add instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 22 ++++++++++++
 target/riscv/insn32.decode              |  7 ++++
 target/riscv/insn_trans/trans_rvv.inc.c |  9 +++++
 target/riscv/vector_helper.c            | 45 +++++++++++++++++++++++++
 4 files changed, 83 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 098288df76..1f0d3d60e3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -643,3 +643,25 @@ DEF_HELPER_6(vnmsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwmaccu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 58de888afa..2a5b945139 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -392,6 +392,13 @@ vmadd_vv        101001 . ..... ..... 010 ..... 1010111 @r_vm
 vmadd_vx        101001 . ..... ..... 110 ..... 1010111 @r_vm
 vnmsub_vv       101011 . ..... ..... 010 ..... 1010111 @r_vm
 vnmsub_vx       101011 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccu_vv      111100 . ..... ..... 010 ..... 1010111 @r_vm
+vwmaccu_vx      111100 . ..... ..... 110 ..... 1010111 @r_vm
+vwmacc_vv       111101 . ..... ..... 010 ..... 1010111 @r_vm
+vwmacc_vx       111101 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccsu_vv     111110 . ..... ..... 010 ..... 1010111 @r_vm
+vwmaccsu_vx     111110 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 05f7ae0bc4..958737d097 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1472,3 +1472,12 @@ GEN_OPIVX_TRANS(vmacc_vx, opivx_check)
 GEN_OPIVX_TRANS(vnmsac_vx, opivx_check)
 GEN_OPIVX_TRANS(vmadd_vx, opivx_check)
 GEN_OPIVX_TRANS(vnmsub_vx, opivx_check)
+
+/* Vector Widening Integer Multiply-Add Instructions */
+GEN_OPIVV_WIDEN_TRANS(vwmaccu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmacc_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmaccsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmacc_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e5082c8adc..5109654f9f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1910,3 +1910,48 @@ GEN_VEXT_VX(vnmsub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vnmsub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vnmsub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vnmsub_vx_d, 8, 8, clearq)
+
+/* Vector Widening Integer Multiply-Add Instructions */
+RVVCALL(OPIVV3, vwmaccu_vv_b, WOP_UUU_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmaccu_vv_h, WOP_UUU_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmaccu_vv_w, WOP_UUU_W, H8, H4, H4, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_b, WOP_SSS_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_h, WOP_SSS_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_w, WOP_SSS_W, H8, H4, H4, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_b, WOP_SSU_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_h, WOP_SSU_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_w, WOP_SSU_W, H8, H4, H4, DO_MACC)
+GEN_VEXT_VV(vwmaccu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmaccu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmaccu_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwmacc_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmacc_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmacc_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwmaccsu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmaccsu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmaccsu_vv_w, 4, 8, clearq)
+
+RVVCALL(OPIVX3, vwmaccu_vx_b, WOP_UUU_B, H2, H1, DO_MACC)
+RVVCALL(OPIVX3, vwmaccu_vx_h, WOP_UUU_H, H4, H2, DO_MACC)
+RVVCALL(OPIVX3, vwmaccu_vx_w, WOP_UUU_W, H8, H4, DO_MACC)
+RVVCALL(OPIVX3, vwmacc_vx_b, WOP_SSS_B, H2, H1, DO_MACC)
+RVVCALL(OPIVX3, vwmacc_vx_h, WOP_SSS_H, H4, H2, DO_MACC)
+RVVCALL(OPIVX3, vwmacc_vx_w, WOP_SSS_W, H8, H4, DO_MACC)
+RVVCALL(OPIVX3, vwmaccsu_vx_b, WOP_SSU_B, H2, H1, DO_MACC)
+RVVCALL(OPIVX3, vwmaccsu_vx_h, WOP_SSU_H, H4, H2, DO_MACC)
+RVVCALL(OPIVX3, vwmaccsu_vx_w, WOP_SSU_W, H8, H4, DO_MACC)
+RVVCALL(OPIVX3, vwmaccus_vx_b, WOP_SUS_B, H2, H1, DO_MACC)
+RVVCALL(OPIVX3, vwmaccus_vx_h, WOP_SUS_H, H4, H2, DO_MACC)
+RVVCALL(OPIVX3, vwmaccus_vx_w, WOP_SUS_W, H8, H4, DO_MACC)
+GEN_VEXT_VX(vwmaccu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmaccu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmaccu_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwmacc_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmacc_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmacc_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwmaccsu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmaccsu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmaccsu_vx_w, 4, 8, clearq)
+GEN_VEXT_VX(vwmaccus_vx_b, 1, 2, clearh)
+GEN_VEXT_VX(vwmaccus_vx_h, 2, 4, clearl)
+GEN_VEXT_VX(vwmaccus_vx_w, 4, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
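
A minimal sketch (not from the patch; names and values invented) of why the widening multiply-add helpers reuse OPIVV3/OPIVX3 with the WOP_* typedefs: the accumulation is done in the double-width destination type, so the full single-width product is added without wrapping.

#include <stdint.h>
#include <stdio.h>

/* vwmaccu.vv for SEW=8: 16-bit vd element += unsigned 8x8 product */
static uint16_t wmaccu8(uint16_t vd, uint8_t vs1, uint8_t vs2)
{
    return (uint16_t)(vd + (uint16_t)vs1 * (uint16_t)vs2);
}

int main(void)
{
    /* 1000 + 200*200 = 41000, which only fits because vd is 16 bits wide */
    printf("%u\n", (unsigned)wmaccu8(1000, 200, 200));
    return 0;
}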

* [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 ++++
 target/riscv/insn32.decode              |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 24 ++++++++++
 target/riscv/vector_helper.c            | 58 +++++++++++++++++++++++++
 4 files changed, 94 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1f0d3d60e3..121e9e57e7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -665,3 +665,12 @@ DEF_HELPER_6(vwmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmerge_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2a5b945139..bcb8273bcc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -399,6 +399,9 @@ vwmacc_vx       111101 . ..... ..... 110 ..... 1010111 @r_vm
 vwmaccsu_vv     111110 . ..... ..... 010 ..... 1010111 @r_vm
 vwmaccsu_vx     111110 . ..... ..... 110 ..... 1010111 @r_vm
 vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
+vmerge_vvm      010111 . ..... ..... 000 ..... 1010111 @r_vm
+vmerge_vxm      010111 . ..... ..... 100 ..... 1010111 @r_vm
+vmerge_vim      010111 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 958737d097..aff5ca8663 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1481,3 +1481,27 @@ GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmacc_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx)
+
+/* Vector Integer Merge and Move Instructions */
+static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            ((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
+
+static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
+
+GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 5109654f9f..273b705847 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1955,3 +1955,61 @@ GEN_VEXT_VX(vwmaccsu_vx_w, 4, 8, clearq)
 GEN_VEXT_VX(vwmaccus_vx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwmaccus_vx_w, 4, 8, clearq)
+
+/* Vector Integer Merge and Move Instructions */
+#define GEN_VEXT_VMERGE_VV(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,          \
+        CPURISCVState *env, uint32_t desc)                           \
+{                                                                    \
+    uint32_t mlen = vext_mlen(desc);                                 \
+    uint32_t vm = vext_vm(desc);                                     \
+    uint32_t vl = env->vl;                                           \
+    uint32_t esz = sizeof(ETYPE);                                    \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                         \
+    uint32_t i;                                                      \
+                                                                     \
+    for (i = 0; i < vl; i++) {                                       \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
+            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s2;                              \
+        } else {                                                     \
+            ETYPE s1 = *((ETYPE *)vs1 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s1;                              \
+        }                                                            \
+    }                                                                \
+    if (i != 0) {                                                    \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                     \
+    }                                                                \
+}
+GEN_VEXT_VMERGE_VV(vmerge_vvm_b, int8_t,  H1, clearb)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_h, int16_t, H2, clearh)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_w, int32_t, H4, clearl)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_d, int64_t, H8, clearq)
+
+#define GEN_VEXT_VMERGE_VX(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,               \
+        void *vs2, CPURISCVState *env, uint32_t desc)                \
+{                                                                    \
+    uint32_t mlen = vext_mlen(desc);                                 \
+    uint32_t vm = vext_vm(desc);                                     \
+    uint32_t vl = env->vl;                                           \
+    uint32_t esz = sizeof(ETYPE);                                    \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                         \
+    uint32_t i;                                                      \
+                                                                     \
+    for (i = 0; i < vl; i++) {                                       \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
+            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s2;                              \
+        } else {                                                     \
+            *((ETYPE *)vd + H(i)) = (ETYPE)(target_long)s1;          \
+        }                                                            \
+    }                                                                \
+    if (i != 0) {                                                    \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                     \
+    }                                                                \
+}
+GEN_VEXT_VMERGE_VX(vmerge_vxm_b, int8_t,  H1, clearb)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_h, int16_t, H2, clearh)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_w, int32_t, H4, clearl)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_d, int64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
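
A plain-C restatement (not QEMU code; names and values invented) of the select implemented by GEN_VEXT_VMERGE_VV above: a clear mask bit keeps the vs2 element, while a set mask bit, or the unmasked encoding used for the vmv.v.* forms, takes vs1 (or the scalar/immediate in the _vx/_vi variants).

#include <stdint.h>
#include <stdio.h>

static void merge32(int32_t *vd, const uint8_t *mask, int vm,
                    const int32_t *vs1, const int32_t *vs2, int vl)
{
    for (int i = 0; i < vl; i++) {
        vd[i] = (!vm && !mask[i]) ? vs2[i] : vs1[i];
    }
}

int main(void)
{
    int32_t vs1[4] = {1, 2, 3, 4}, vs2[4] = {10, 20, 30, 40}, vd[4];
    uint8_t mask[4] = {1, 0, 1, 0};

    merge32(vd, mask, 0, vs1, vs2, 4);
    printf("%d %d %d %d\n", vd[0], vd[1], vd[2], vd[3]);   /* 1 20 3 40 */
    return 0;
}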

* [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 ++++
 target/riscv/insn32.decode              |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 24 ++++++++++
 target/riscv/vector_helper.c            | 58 +++++++++++++++++++++++++
 4 files changed, 94 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1f0d3d60e3..121e9e57e7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -665,3 +665,12 @@ DEF_HELPER_6(vwmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmerge_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2a5b945139..bcb8273bcc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -399,6 +399,9 @@ vwmacc_vx       111101 . ..... ..... 110 ..... 1010111 @r_vm
 vwmaccsu_vv     111110 . ..... ..... 010 ..... 1010111 @r_vm
 vwmaccsu_vx     111110 . ..... ..... 110 ..... 1010111 @r_vm
 vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
+vmerge_vvm      010111 . ..... ..... 000 ..... 1010111 @r_vm
+vmerge_vxm      010111 . ..... ..... 100 ..... 1010111 @r_vm
+vmerge_vim      010111 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 958737d097..aff5ca8663 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1481,3 +1481,27 @@ GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmacc_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx)
+
+/* Vector Integer Merge and Move Instructions */
+static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            ((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
+
+static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
+
+GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 5109654f9f..273b705847 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1955,3 +1955,61 @@ GEN_VEXT_VX(vwmaccsu_vx_w, 4, 8, clearq)
 GEN_VEXT_VX(vwmaccus_vx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwmaccus_vx_w, 4, 8, clearq)
+
+/* Vector Integer Merge and Move Instructions */
+#define GEN_VEXT_VMERGE_VV(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,          \
+        CPURISCVState *env, uint32_t desc)                           \
+{                                                                    \
+    uint32_t mlen = vext_mlen(desc);                                 \
+    uint32_t vm = vext_vm(desc);                                     \
+    uint32_t vl = env->vl;                                           \
+    uint32_t esz = sizeof(ETYPE);                                    \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                         \
+    uint32_t i;                                                      \
+                                                                     \
+    for (i = 0; i < vl; i++) {                                       \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
+            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s2;                              \
+        } else {                                                     \
+            ETYPE s1 = *((ETYPE *)vs1 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s1;                              \
+        }                                                            \
+    }                                                                \
+    if (i != 0) {                                                    \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                     \
+    }                                                                \
+}
+GEN_VEXT_VMERGE_VV(vmerge_vvm_b, int8_t,  H1, clearb)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_h, int16_t, H2, clearh)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_w, int32_t, H4, clearl)
+GEN_VEXT_VMERGE_VV(vmerge_vvm_d, int64_t, H8, clearq)
+
+#define GEN_VEXT_VMERGE_VX(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,               \
+        void *vs2, CPURISCVState *env, uint32_t desc)                \
+{                                                                    \
+    uint32_t mlen = vext_mlen(desc);                                 \
+    uint32_t vm = vext_vm(desc);                                     \
+    uint32_t vl = env->vl;                                           \
+    uint32_t esz = sizeof(ETYPE);                                    \
+    uint32_t vlmax = vext_maxsz(desc) / esz;                         \
+    uint32_t i;                                                      \
+                                                                     \
+    for (i = 0; i < vl; i++) {                                       \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
+            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
+            *((ETYPE *)vd + H(i)) = s2;                              \
+        } else {                                                     \
+            *((ETYPE *)vd + H(i)) = (ETYPE)(target_long)s1;          \
+        }                                                            \
+    }                                                                \
+    if (i != 0) {                                                    \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);                     \
+    }                                                                \
+}
+GEN_VEXT_VMERGE_VX(vmerge_vxm_b, int8_t,  H1, clearb)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_h, int16_t, H2, clearh)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_w, int32_t, H4, clearl)
+GEN_VEXT_VMERGE_VX(vmerge_vxm_d, int64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
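
For readers following the series, the merge semantics implemented by the helpers above reduce to vd[i] = mask[i] ? vs1[i] : vs2[i] for every element index below vl. The host-side C sketch below restates that behaviour outside the QEMU helper macros and the H*() endianness wrappers; the function and array names are illustrative only and are not part of the patch.

#include <stdint.h>
#include <stdio.h>

/* Illustrative scalar model of vmerge.vvm for 32-bit elements: where the
 * mask bit is set the destination takes vs1[i], otherwise it carries
 * vs2[i] through unchanged (the patch's tail-clearing policy is omitted). */
static void vmerge_vvm_w_ref(int32_t *vd, const uint8_t *mask,
                             const int32_t *vs1, const int32_t *vs2,
                             uint32_t vl)
{
    for (uint32_t i = 0; i < vl; i++) {
        vd[i] = mask[i] ? vs1[i] : vs2[i];
    }
}

int main(void)
{
    int32_t vs1[4] = {10, 20, 30, 40};
    int32_t vs2[4] = {-1, -2, -3, -4};
    uint8_t v0[4]  = {1, 0, 1, 0};
    int32_t vd[4];

    vmerge_vvm_w_ref(vd, v0, vs1, vs2, 4);
    for (int i = 0; i < 4; i++) {
        printf("%d ", vd[i]);   /* expected output: 10 -2 30 -4 */
    }
    printf("\n");
    return 0;
}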

* [PATCH v5 23/60] target/riscv: vector single-width saturating add and subtract
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  33 +++
 target/riscv/insn32.decode              |  10 +
 target/riscv/insn_trans/trans_rvv.inc.c |  16 ++
 target/riscv/vector_helper.c            | 278 ++++++++++++++++++++++++
 4 files changed, 337 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 121e9e57e7..95da00d365 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -674,3 +674,36 @@ DEF_HELPER_6(vmerge_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmerge_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmerge_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmerge_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsaddu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bcb8273bcc..44baadf582 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -402,6 +402,16 @@ vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
 vmerge_vvm      010111 . ..... ..... 000 ..... 1010111 @r_vm
 vmerge_vxm      010111 . ..... ..... 100 ..... 1010111 @r_vm
 vmerge_vim      010111 . ..... ..... 011 ..... 1010111 @r_vm
+vsaddu_vv       100000 . ..... ..... 000 ..... 1010111 @r_vm
+vsaddu_vx       100000 . ..... ..... 100 ..... 1010111 @r_vm
+vsaddu_vi       100000 . ..... ..... 011 ..... 1010111 @r_vm
+vsadd_vv        100001 . ..... ..... 000 ..... 1010111 @r_vm
+vsadd_vx        100001 . ..... ..... 100 ..... 1010111 @r_vm
+vsadd_vi        100001 . ..... ..... 011 ..... 1010111 @r_vm
+vssubu_vv       100010 . ..... ..... 000 ..... 1010111 @r_vm
+vssubu_vx       100010 . ..... ..... 100 ..... 1010111 @r_vm
+vssub_vv        100011 . ..... ..... 000 ..... 1010111 @r_vm
+vssub_vx        100011 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index aff5ca8663..ad55766b98 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1505,3 +1505,19 @@ static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
 GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
 
 GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
+
+/*
+ *** Vector Fixed-Point Arithmetic Instructions
+ */
+
+/* Vector Single-Width Saturating Add and Subtract */
+GEN_OPIVV_GVEC_TRANS(vsaddu_vv, usadd)
+GEN_OPIVV_GVEC_TRANS(vsadd_vv,  ssadd)
+GEN_OPIVV_GVEC_TRANS(vssubu_vv, ussub)
+GEN_OPIVV_GVEC_TRANS(vssub_vv,  sssub)
+GEN_OPIVX_TRANS(vsaddu_vx,  opivx_check)
+GEN_OPIVX_TRANS(vsadd_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
+GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
+GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 273b705847..c7b8c1bff4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2013,3 +2013,281 @@ GEN_VEXT_VMERGE_VX(vmerge_vxm_b, int8_t,  H1, clearb)
 GEN_VEXT_VMERGE_VX(vmerge_vxm_h, int16_t, H2, clearh)
 GEN_VEXT_VMERGE_VX(vmerge_vxm_w, int32_t, H4, clearl)
 GEN_VEXT_VMERGE_VX(vmerge_vxm_d, int64_t, H8, clearq)
+
+/*
+ *** Vector Fixed-Point Arithmetic Instructions
+ */
+
+/* Vector Single-Width Saturating Add and Subtract */
+#define OPIVV2_ENV(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i,        \
+        CPURISCVState *env)                                         \
+{                                                                   \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                                 \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    *((TD *)vd + HD(i)) = OP(env, s2, s1);                          \
+}
+
+#define GEN_VEXT_VV_ENV(NAME, ESZ, DSZ, CLEAR_FN)         \
+void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        do_##NAME(vd, vs1, vs2, i, env);                  \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
+    }                                                     \
+}
+
+static inline uint8_t saddu8(CPURISCVState *env, uint8_t a, uint8_t b)
+{
+    uint8_t res = a + b;
+    if (res < a) {
+        res = UINT8_MAX;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint16_t saddu16(CPURISCVState *env, uint16_t a, uint16_t b)
+{
+    uint16_t res = a + b;
+    if (res < a) {
+        res = UINT16_MAX;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint32_t saddu32(CPURISCVState *env, uint32_t a, uint32_t b)
+{
+    uint32_t res = a + b;
+    if (res < a) {
+        res = UINT32_MAX;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint64_t saddu64(CPURISCVState *env, uint64_t a, uint64_t b)
+{
+    uint64_t res = a + b;
+    if (res < a) {
+        res = UINT64_MAX;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vsaddu_vv_b, OP_UUU_B, H1, H1, H1, saddu8)
+RVVCALL(OPIVV2_ENV, vsaddu_vv_h, OP_UUU_H, H2, H2, H2, saddu16)
+RVVCALL(OPIVV2_ENV, vsaddu_vv_w, OP_UUU_W, H4, H4, H4, saddu32)
+RVVCALL(OPIVV2_ENV, vsaddu_vv_d, OP_UUU_D, H8, H8, H8, saddu64)
+GEN_VEXT_VV_ENV(vsaddu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vsaddu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vsaddu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vsaddu_vv_d, 8, 8, clearq)
+
+#define OPIVX2_ENV(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)         \
+static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i,  \
+        CPURISCVState *env)                                         \
+{                                                                   \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    *((TD *)vd + HD(i)) = OP(env, s2, (TX1)(T1)(target_long)s1);    \
+}
+
+#define GEN_VEXT_VX_ENV(NAME, ESZ, DSZ, CLEAR_FN)         \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+                                                          \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        do_##NAME(vd, s1, vs2, i, env);                   \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
+    }                                                     \
+}
+RVVCALL(OPIVX2_ENV, vsaddu_vx_b, OP_UUU_B, H1, H1, saddu8)
+RVVCALL(OPIVX2_ENV, vsaddu_vx_h, OP_UUU_H, H2, H2, saddu16)
+RVVCALL(OPIVX2_ENV, vsaddu_vx_w, OP_UUU_W, H4, H4, saddu32)
+RVVCALL(OPIVX2_ENV, vsaddu_vx_d, OP_UUU_D, H8, H8, saddu64)
+GEN_VEXT_VX_ENV(vsaddu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vsaddu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vsaddu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vsaddu_vx_d, 8, 8, clearq)
+
+static inline int8_t sadd8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    int8_t res = a + b;
+    if (((res ^ a) & (res ^ b)) >> 7 == -1LL) {
+        res = a > 0 ? INT8_MAX : INT8_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int16_t sadd16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    int16_t res = a + b;
+    if (((res ^ a) & (res ^ b)) >> 15 == -1LL) {
+        res = a > 0 ? INT16_MAX : INT16_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int32_t sadd32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    int32_t res = a + b;
+    if (((res ^ a) & (res ^ b)) >> 31 == -1LL) {
+        res = a > 0 ? INT32_MAX : INT32_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int64_t sadd64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    int64_t res = a + b;
+    if (((res ^ a) & (res ^ b)) >> 63 == -1LL) {
+        res = a > 0 ? INT64_MAX : INT64_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vsadd_vv_b, OP_SSS_B, H1, H1, H1, sadd8)
+RVVCALL(OPIVV2_ENV, vsadd_vv_h, OP_SSS_H, H2, H2, H2, sadd16)
+RVVCALL(OPIVV2_ENV, vsadd_vv_w, OP_SSS_W, H4, H4, H4, sadd32)
+RVVCALL(OPIVV2_ENV, vsadd_vv_d, OP_SSS_D, H8, H8, H8, sadd64)
+GEN_VEXT_VV_ENV(vsadd_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vsadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vsadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vsadd_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vsadd_vx_b, OP_SSS_B, H1, H1, sadd8)
+RVVCALL(OPIVX2_ENV, vsadd_vx_h, OP_SSS_H, H2, H2, sadd16)
+RVVCALL(OPIVX2_ENV, vsadd_vx_w, OP_SSS_W, H4, H4, sadd32)
+RVVCALL(OPIVX2_ENV, vsadd_vx_d, OP_SSS_D, H8, H8, sadd64)
+GEN_VEXT_VX_ENV(vsadd_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vsadd_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vsadd_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vsadd_vx_d, 8, 8, clearq)
+
+static inline uint8_t ssubu8(CPURISCVState *env, uint8_t a, uint8_t b)
+{
+    uint8_t res = a - b;
+    if (res > a) {
+        res = 0;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint16_t ssubu16(CPURISCVState *env, uint16_t a, uint16_t b)
+{
+    uint16_t res = a - b;
+    if (res > a) {
+        res = 0;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint32_t ssubu32(CPURISCVState *env, uint32_t a, uint32_t b)
+{
+    uint32_t res = a - b;
+    if (res > a) {
+        res = 0;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline uint64_t ssubu64(CPURISCVState *env, uint64_t a, uint64_t b)
+{
+    uint64_t res = a - b;
+    if (res > a) {
+        res = 0;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vssubu_vv_b, OP_UUU_B, H1, H1, H1, ssubu8)
+RVVCALL(OPIVV2_ENV, vssubu_vv_h, OP_UUU_H, H2, H2, H2, ssubu16)
+RVVCALL(OPIVV2_ENV, vssubu_vv_w, OP_UUU_W, H4, H4, H4, ssubu32)
+RVVCALL(OPIVV2_ENV, vssubu_vv_d, OP_UUU_D, H8, H8, H8, ssubu64)
+GEN_VEXT_VV_ENV(vssubu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vssubu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vssubu_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vssubu_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vssubu_vx_b, OP_UUU_B, H1, H1, ssubu8)
+RVVCALL(OPIVX2_ENV, vssubu_vx_h, OP_UUU_H, H2, H2, ssubu16)
+RVVCALL(OPIVX2_ENV, vssubu_vx_w, OP_UUU_W, H4, H4, ssubu32)
+RVVCALL(OPIVX2_ENV, vssubu_vx_d, OP_UUU_D, H8, H8, ssubu64)
+GEN_VEXT_VX_ENV(vssubu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vssubu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vssubu_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vssubu_vx_d, 8, 8, clearq)
+
+static inline int8_t ssub8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    int8_t res = a - b;
+    if (((res ^ a) & (a ^ b)) >> 7 == -1LL) {
+        res = a > 0 ? INT8_MAX : INT8_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int16_t ssub16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    int16_t res = a - b;
+    if (((res ^ a) & (a ^ b)) >> 15 == -1LL) {
+        res = a > 0 ? INT16_MAX : INT16_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int32_t ssub32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    int32_t res = a - b;
+    if (((res ^ a) & (a ^ b)) >> 31 == -1LL) {
+        res = a > 0 ? INT32_MAX : INT32_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+static inline int64_t ssub64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    int64_t res = a - b;
+    if (((res ^ a) & (a ^ b)) >> 63 == -1LL) {
+        res = a > 0 ? INT64_MAX : INT64_MIN;
+        env->vxsat = 0x1;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vssub_vv_b, OP_SSS_B, H1, H1, H1, ssub8)
+RVVCALL(OPIVV2_ENV, vssub_vv_h, OP_SSS_H, H2, H2, H2, ssub16)
+RVVCALL(OPIVV2_ENV, vssub_vv_w, OP_SSS_W, H4, H4, H4, ssub32)
+RVVCALL(OPIVV2_ENV, vssub_vv_d, OP_SSS_D, H8, H8, H8, ssub64)
+GEN_VEXT_VV_ENV(vssub_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vssub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vssub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vssub_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vssub_vx_b, OP_SSS_B, H1, H1, ssub8)
+RVVCALL(OPIVX2_ENV, vssub_vx_h, OP_SSS_H, H2, H2, ssub16)
+RVVCALL(OPIVX2_ENV, vssub_vx_w, OP_SSS_W, H4, H4, ssub32)
+RVVCALL(OPIVX2_ENV, vssub_vx_d, OP_SSS_D, H8, H8, ssub64)
+GEN_VEXT_VX_ENV(vssub_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vssub_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vssub_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vssub_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
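
A brief note on the saturation test used by sadd8/ssub8 above: signed addition overflows exactly when the two operands share a sign and the result's sign differs, which is what the sign bit of (res ^ a) & (res ^ b) detects; the unsigned variants only need the simpler wrap test (res < a) used by saddu8. The standalone sketch below shows the same test on the 8-bit case; its names are illustrative and it is not part of the patch.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Signed saturating 8-bit add using the same overflow test as the patch:
 * the sign bit of (res ^ a) & (res ^ b) is set iff a and b have the same
 * sign and res has the opposite sign, i.e. the addition wrapped around. */
static int8_t sadd8_ref(int8_t a, int8_t b, bool *sat)
{
    int8_t res = (int8_t)((uint8_t)a + (uint8_t)b);   /* wrap-around add */

    if (((res ^ a) & (res ^ b)) < 0) {
        *sat = true;
        return a > 0 ? INT8_MAX : INT8_MIN;
    }
    return res;
}

int main(void)
{
    bool sat = false;

    printf("%d\n", sadd8_ref(100, 100, &sat));    /* 127  (saturated) */
    printf("%d\n", sadd8_ref(-100, -100, &sat));  /* -128 (saturated) */
    printf("%d\n", sadd8_ref(100, -50, &sat));    /* 50   (exact)     */
    printf("sat=%d\n", sat);                      /* sat=1            */
    return 0;
}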

* [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 ++++
 target/riscv/insn32.decode              |   5 +
 target/riscv/insn_trans/trans_rvv.inc.c |   7 ++
 target/riscv/vector_helper.c            | 129 ++++++++++++++++++++++++
 4 files changed, 158 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 95da00d365..d3837d2ca4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -707,3 +707,20 @@ DEF_HELPER_6(vssub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vaadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 44baadf582..0227a16b16 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -412,6 +412,11 @@ vssubu_vv       100010 . ..... ..... 000 ..... 1010111 @r_vm
 vssubu_vx       100010 . ..... ..... 100 ..... 1010111 @r_vm
 vssub_vv        100011 . ..... ..... 000 ..... 1010111 @r_vm
 vssub_vx        100011 . ..... ..... 100 ..... 1010111 @r_vm
+vaadd_vv        100100 . ..... ..... 000 ..... 1010111 @r_vm
+vaadd_vx        100100 . ..... ..... 100 ..... 1010111 @r_vm
+vaadd_vi        100100 . ..... ..... 011 ..... 1010111 @r_vm
+vasub_vv        100110 . ..... ..... 000 ..... 1010111 @r_vm
+vasub_vx        100110 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index ad55766b98..9988fad2fe 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1521,3 +1521,10 @@ GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
 GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
 GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
 GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)
+
+/* Vector Single-Width Averaging Add and Subtract */
+GEN_OPIVV_TRANS(vaadd_vv, opivv_check)
+GEN_OPIVV_TRANS(vasub_vv, opivv_check)
+GEN_OPIVX_TRANS(vaadd_vx,  opivx_check)
+GEN_OPIVX_TRANS(vasub_vx,  opivx_check)
+GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index c7b8c1bff4..b0a7a3b6e4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2291,3 +2291,132 @@ GEN_VEXT_VX_ENV(vssub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vssub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vssub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vssub_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Averaging Add and Subtract */
+static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
+{
+    uint8_t d = extract64(v, shift, 1);
+    uint8_t d1;
+    uint64_t D1, D2;
+    int mod = env->vxrm;
+
+    if (shift == 0 || shift > 64) {
+        return 0;
+    }
+
+    d1 = extract64(v, shift - 1, 1);
+    D1 = extract64(v, 0, shift);
+    if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
+        return d1;
+    } else if (mod == 1) { /* round-to-nearest-even */
+        if (shift > 1) {
+            D2 = extract64(v, 0, shift - 1);
+            return d1 & ((D2 != 0) | d);
+        } else {
+            return d1 & d;
+        }
+    } else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
+        return !d & (D1 != 0);
+    }
+    return 0; /* round-down (truncate) */
+}
+
+static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    int16_t res = (int16_t)a + (int16_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int16_t aadd16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    int32_t res = (int32_t)a + (int32_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int32_t aadd32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    int64_t res = (int64_t)a + (int64_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int64_t aadd64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    int64_t res = (int64_t)a + (int64_t)b;
+    uint8_t round = get_round(env, res, 1); /* get_round only needs v[d : 0] */
+    if (((res ^ a) & (res ^ b)) >> 63 == -1LL) { /* overflow */
+        res = ((res >> 1) ^ INT64_MIN) + round;
+    } else {
+        res   = (res >> 1) + round;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)
+RVVCALL(OPIVV2_ENV, vaadd_vv_h, OP_SSS_H, H2, H2, H2, aadd16)
+RVVCALL(OPIVV2_ENV, vaadd_vv_w, OP_SSS_W, H4, H4, H4, aadd32)
+RVVCALL(OPIVV2_ENV, vaadd_vv_d, OP_SSS_D, H8, H8, H8, aadd64)
+GEN_VEXT_VV_ENV(vaadd_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vaadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vaadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vaadd_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vaadd_vx_b, OP_SSS_B, H1, H1, aadd8)
+RVVCALL(OPIVX2_ENV, vaadd_vx_h, OP_SSS_H, H2, H2, aadd16)
+RVVCALL(OPIVX2_ENV, vaadd_vx_w, OP_SSS_W, H4, H4, aadd32)
+RVVCALL(OPIVX2_ENV, vaadd_vx_d, OP_SSS_D, H8, H8, aadd64)
+GEN_VEXT_VX_ENV(vaadd_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vaadd_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vaadd_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vaadd_vx_d, 8, 8, clearq)
+
+static inline int8_t asub8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    int16_t res = (int16_t)a - (int16_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int16_t asub16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    int32_t res = (int32_t)a - (int32_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int32_t asub32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    int64_t res = (int64_t)a - (int64_t)b;
+    uint8_t round = get_round(env, res, 1);
+    res   = (res >> 1) + round;
+    return res;
+}
+static inline int64_t asub64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    int64_t res = (int64_t)a - (int64_t)b;
+    uint8_t round = get_round(env, res, 1); /* get_round only needs v[d : 0] */
+    if (((res ^ a) & (a ^ b)) >> 63 == -1LL) { /* overflow */
+        res = ((res >> 1) ^ INT64_MIN) + round;
+    } else {
+        res   = (res >> 1) + round;
+    }
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vasub_vv_b, OP_SSS_B, H1, H1, H1, asub8)
+RVVCALL(OPIVV2_ENV, vasub_vv_h, OP_SSS_H, H2, H2, H2, asub16)
+RVVCALL(OPIVV2_ENV, vasub_vv_w, OP_SSS_W, H4, H4, H4, asub32)
+RVVCALL(OPIVV2_ENV, vasub_vv_d, OP_SSS_D, H8, H8, H8, asub64)
+GEN_VEXT_VV_ENV(vasub_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vasub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vasub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vasub_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vasub_vx_b, OP_SSS_B, H1, H1, asub8)
+RVVCALL(OPIVX2_ENV, vasub_vx_h, OP_SSS_H, H2, H2, asub16)
+RVVCALL(OPIVX2_ENV, vasub_vx_w, OP_SSS_W, H4, H4, asub32)
+RVVCALL(OPIVX2_ENV, vasub_vx_d, OP_SSS_D, H8, H8, asub64)
+GEN_VEXT_VX_ENV(vasub_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vasub_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vasub_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vasub_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
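
The fixed-point patches from this point on all funnel their rounding through get_round() above, which returns the increment to add after an arithmetic right shift under the vxrm rounding mode (0 = round-to-nearest-up, 1 = round-to-nearest-even, 2 = round-down, 3 = round-to-odd). The sketch below mirrors that logic in standalone form and shows how the average of 3 and 4 (i.e. 7 >> 1) comes out under each mode; it takes vxrm as a plain argument instead of reading it from env, and its names are illustrative, not part of the patch.

#include <stdint.h>
#include <stdio.h>

/* Standalone mirror of the patch's get_round(): returns the rounding
 * increment to add after shifting v right by 'shift' bits, under the
 * vxrm rounding mode (0=rnu, 1=rne, 2=rdn, 3=rod). Illustrative only. */
static uint8_t get_round_ref(int vxrm, uint64_t v, unsigned shift)
{
    uint64_t d, d1, lowbits;

    if (shift == 0 || shift > 63) {
        return 0;
    }
    d  = (v >> shift) & 1;          /* bit that becomes the new LSB   */
    d1 = (v >> (shift - 1)) & 1;    /* most significant discarded bit */
    lowbits = v & ((UINT64_C(1) << shift) - 1); /* all discarded bits */

    switch (vxrm) {
    case 0: /* round-to-nearest-up */
        return d1;
    case 1: /* round-to-nearest-even: break ties toward an even result */
        return d1 & ((lowbits != (UINT64_C(1) << (shift - 1))) | d);
    case 3: /* round-to-odd: force the LSB to 1 if anything was discarded */
        return !d & (lowbits != 0);
    default: /* 2: round-down (truncate) */
        return 0;
    }
}

int main(void)
{
    /* the average of 3 and 4 is 3.5; watch how each mode rounds 7 >> 1 */
    for (int vxrm = 0; vxrm < 4; vxrm++) {
        int64_t sum = 3 + 4;
        int64_t avg = (sum >> 1) + get_round_ref(vxrm, (uint64_t)sum, 1);
        printf("vxrm=%d -> %lld\n", vxrm, (long long)avg);
    }
    return 0;   /* prints 4, 4, 3, 3 for modes 0..3 */
}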

* [PATCH v5 25/60] target/riscv: vector single-width fractional multiply with rounding and saturation
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 +++
 target/riscv/insn32.decode              |   2 +
 target/riscv/insn_trans/trans_rvv.inc.c |   4 +
 target/riscv/vector_helper.c            | 103 ++++++++++++++++++++++++
 4 files changed, 118 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d3837d2ca4..333eccca57 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -724,3 +724,12 @@ DEF_HELPER_6(vasub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0227a16b16..99f70924d6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -417,6 +417,8 @@ vaadd_vx        100100 . ..... ..... 100 ..... 1010111 @r_vm
 vaadd_vi        100100 . ..... ..... 011 ..... 1010111 @r_vm
 vasub_vv        100110 . ..... ..... 000 ..... 1010111 @r_vm
 vasub_vx        100110 . ..... ..... 100 ..... 1010111 @r_vm
+vsmul_vv        100111 . ..... ..... 000 ..... 1010111 @r_vm
+vsmul_vx        100111 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9988fad2fe..60e1e63b7b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1528,3 +1528,7 @@ GEN_OPIVV_TRANS(vasub_vv, opivv_check)
 GEN_OPIVX_TRANS(vaadd_vx,  opivx_check)
 GEN_OPIVX_TRANS(vasub_vx,  opivx_check)
 GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
+
+/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
+GEN_OPIVV_TRANS(vsmul_vv, opivv_check)
+GEN_OPIVX_TRANS(vsmul_vx,  opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index b0a7a3b6e4..74ad07743c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2420,3 +2420,106 @@ GEN_VEXT_VX_ENV(vasub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vasub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vasub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vasub_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
+static inline int8_t vsmul8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    uint8_t round;
+    int16_t res;
+
+    res = (int16_t)a * (int16_t)b;
+    round = get_round(env, res, 7);
+    res   = (res >> 7) + round;
+
+    if (res > INT8_MAX) {
+        env->vxsat = 0x1;
+        return INT8_MAX;
+    } else if (res < INT8_MIN) {
+        env->vxsat = 0x1;
+        return INT8_MIN;
+    } else {
+        return res;
+    }
+}
+static int16_t vsmul16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    uint8_t round;
+    int32_t res;
+
+    res = (int32_t)a * (int32_t)b;
+    round = get_round(env, res, 15);
+    res   = (res >> 15) + round;
+
+    if (res > INT16_MAX) {
+        env->vxsat = 0x1;
+        return INT16_MAX;
+    } else if (res < INT16_MIN) {
+        env->vxsat = 0x1;
+        return INT16_MIN;
+    } else {
+        return res;
+    }
+}
+static int32_t vsmul32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    uint8_t round;
+    int64_t res;
+
+    res = (int64_t)a * (int64_t)b;
+    round = get_round(env, res, 31);
+    res   = (res >> 31) + round;
+
+    if (res > INT32_MAX) {
+        env->vxsat = 0x1;
+        return INT32_MAX;
+    } else if (res < INT32_MIN) {
+        env->vxsat = 0x1;
+        return INT32_MIN;
+    } else {
+        return res;
+    }
+}
+static int64_t vsmul64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    uint8_t round;
+    uint64_t hi_64, lo_64, Hi62;
+    uint8_t hi62, hi63, lo63;
+
+    muls64(&lo_64, &hi_64, a, b);
+    hi62 = extract64(hi_64, 62, 1);
+    lo63 = extract64(lo_64, 63, 1);
+    hi63 = extract64(hi_64, 63, 1);
+    Hi62 = extract64(hi_64, 0, 62);
+    if (hi62 != hi63) {
+        env->vxsat = 0x1;
+        return INT64_MAX;
+    }
+    round = get_round(env, lo_64, 63);
+    if (round && (Hi62 == 0x3fffffff) && lo63) {
+        env->vxsat = 0x1;
+        return hi62 ? INT64_MIN : INT64_MAX;
+    } else {
+        if (lo63 && round) {
+            return (hi_64 + 1) << 1;
+        } else {
+            return (hi_64 << 1) | lo63 | round;
+        }
+    }
+}
+RVVCALL(OPIVV2_ENV, vsmul_vv_b, OP_SSS_B, H1, H1, H1, vsmul8)
+RVVCALL(OPIVV2_ENV, vsmul_vv_h, OP_SSS_H, H2, H2, H2, vsmul16)
+RVVCALL(OPIVV2_ENV, vsmul_vv_w, OP_SSS_W, H4, H4, H4, vsmul32)
+RVVCALL(OPIVV2_ENV, vsmul_vv_d, OP_SSS_D, H8, H8, H8, vsmul64)
+GEN_VEXT_VV_ENV(vsmul_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vsmul_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vsmul_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vsmul_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vsmul_vx_b, OP_SSS_B, H1, H1, vsmul8)
+RVVCALL(OPIVX2_ENV, vsmul_vx_h, OP_SSS_H, H2, H2, vsmul16)
+RVVCALL(OPIVX2_ENV, vsmul_vx_w, OP_SSS_W, H4, H4, vsmul32)
+RVVCALL(OPIVX2_ENV, vsmul_vx_d, OP_SSS_D, H8, H8, vsmul64)
+GEN_VEXT_VX_ENV(vsmul_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vsmul_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vsmul_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vsmul_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
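
For readers following the fixed-point arithmetic in the helpers above, here is a minimal standalone sketch of the SEW=8 case with the rounding mode hard-wired to vxrm = 0 (round-to-nearest-up). The name vsmul8_rnu and the *sat flag are illustrative only; the patch itself obtains the rounding increment from get_round() and records overflow in env->vxsat.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/*
 * Sketch of the SEW=8 signed fractional multiply, assuming
 * round-to-nearest-up (vxrm = 0).  Saturation is reported through
 * *sat instead of env->vxsat.
 */
static int8_t vsmul8_rnu(int8_t a, int8_t b, bool *sat)
{
    int16_t prod = (int16_t)a * (int16_t)b;   /* full 16-bit product       */
    int16_t round = (prod >> 6) & 1;          /* most significant lost bit */
    int16_t res = (prod >> 7) + round;        /* rescale to Q7             */

    if (res > INT8_MAX) {                     /* only (-1.0) * (-1.0)      */
        *sat = true;
        return INT8_MAX;
    }
    return (int8_t)res;
}

int main(void)
{
    bool sat = false;

    /* 0.5 * 0.5 = 0.25 in Q7: 0x40 * 0x40 -> 0x20 */
    printf("0x%02x\n", (uint8_t)vsmul8_rnu(0x40, 0x40, &sat));
    /* -1.0 * -1.0 overflows Q7 and saturates to 0x7f */
    printf("0x%02x sat=%d\n",
           (uint8_t)vsmul8_rnu(INT8_MIN, INT8_MIN, &sat), sat);
    return 0;
}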

* [PATCH v5 26/60] target/riscv: vector widening saturating scaled multiply-add
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  22 +++
 target/riscv/insn32.decode              |   7 +
 target/riscv/insn_trans/trans_rvv.inc.c |   9 ++
 target/riscv/vector_helper.c            | 180 ++++++++++++++++++++++++
 4 files changed, 218 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 333eccca57..74c1c695e0 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -733,3 +733,25 @@ DEF_HELPER_6(vsmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwsmaccu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 99f70924d6..8798919d3e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -419,6 +419,13 @@ vasub_vv        100110 . ..... ..... 000 ..... 1010111 @r_vm
 vasub_vx        100110 . ..... ..... 100 ..... 1010111 @r_vm
 vsmul_vv        100111 . ..... ..... 000 ..... 1010111 @r_vm
 vsmul_vx        100111 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccu_vv     111100 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmaccu_vx     111100 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmacc_vv      111101 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmacc_vx      111101 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccsu_vv    111110 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmaccsu_vx    111110 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccus_vx    111111 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 60e1e63b7b..68bebd3c37 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1532,3 +1532,12 @@ GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
 /* Vector Single-Width Fractional Multiply with Rounding and Saturation */
 GEN_OPIVV_TRANS(vsmul_vv, opivv_check)
 GEN_OPIVX_TRANS(vsmul_vx,  opivx_check)
+
+/* Vector Widening Saturating Scaled Multiply-Add */
+GEN_OPIVV_WIDEN_TRANS(vwsmaccu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsmacc_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsmaccsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmacc_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccsu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccus_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 74ad07743c..90c19577fa 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2523,3 +2523,183 @@ GEN_VEXT_VX_ENV(vsmul_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vsmul_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vsmul_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vsmul_vx_d, 8, 8, clearq)
+
+/* Vector Widening Saturating Scaled Multiply-Add */
+static uint16_t vwsmaccu8(CPURISCVState *env, uint8_t a, uint8_t b,
+    uint16_t c)
+{
+    uint8_t round;
+    uint16_t res = (uint16_t)a * (uint16_t)b;
+
+    round = get_round(env, res, 4);
+    res   = (res >> 4) + round;
+    return saddu16(env, c, res);
+}
+static uint32_t vwsmaccu16(CPURISCVState *env, uint16_t a, uint16_t b,
+    uint32_t c)
+{
+    uint8_t round;
+    uint32_t res = (uint32_t)a * (uint32_t)b;
+
+    round = get_round(env, res, 8);
+    res   = (res >> 8) + round;
+    return saddu32(env, c, res);
+}
+static uint64_t vwsmaccu32(CPURISCVState *env, uint32_t a, uint32_t b,
+    uint64_t c)
+{
+    uint8_t round;
+    uint64_t res = (uint64_t)a * (uint64_t)b;
+
+    round = get_round(env, res, 16);
+    res   = (res >> 16) + round;
+    return saddu64(env, c, res);
+}
+
+#define OPIVV3_ENV(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)   \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i,       \
+        CPURISCVState *env)                                        \
+{                                                                  \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                                \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                \
+    TD d = *((TD *)vd + HD(i));                                    \
+    *((TD *)vd + HD(i)) = OP(env, s2, s1, d);                      \
+}
+RVVCALL(OPIVV3_ENV, vwsmaccu_vv_b, WOP_UUU_B, H2, H1, H1, vwsmaccu8)
+RVVCALL(OPIVV3_ENV, vwsmaccu_vv_h, WOP_UUU_H, H4, H2, H2, vwsmaccu16)
+RVVCALL(OPIVV3_ENV, vwsmaccu_vv_w, WOP_UUU_W, H8, H4, H4, vwsmaccu32)
+GEN_VEXT_VV_ENV(vwsmaccu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV_ENV(vwsmaccu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vwsmaccu_vv_w, 4, 8, clearq)
+
+#define OPIVX3_ENV(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)            \
+static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i,     \
+        CPURISCVState *env)                                            \
+{                                                                      \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                    \
+    TD d = *((TD *)vd + HD(i));                                        \
+    *((TD *)vd + HD(i)) = OP(env, s2, (TX1)(T1)(target_long)s1, d);    \
+}
+RVVCALL(OPIVX3_ENV, vwsmaccu_vx_b, WOP_UUU_B, H2, H1, vwsmaccu8)
+RVVCALL(OPIVX3_ENV, vwsmaccu_vx_h, WOP_UUU_H, H4, H2, vwsmaccu16)
+RVVCALL(OPIVX3_ENV, vwsmaccu_vx_w, WOP_UUU_W, H8, H4, vwsmaccu32)
+GEN_VEXT_VX_ENV(vwsmaccu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX_ENV(vwsmaccu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX_ENV(vwsmaccu_vx_w, 4, 8, clearq)
+
+static int16_t vwsmacc8(CPURISCVState *env, int8_t a, int8_t b, int16_t c)
+{
+    uint8_t round;
+    int16_t res = (int16_t)a * (int16_t)b;
+
+    round = get_round(env, res, 4);
+    res   = (res >> 4) + round;
+    return sadd16(env, c, res);
+}
+static int32_t vwsmacc16(CPURISCVState *env, int16_t a, int16_t b, int32_t c)
+{
+    uint8_t round;
+    int32_t res = (int32_t)a * (int32_t)b;
+
+    round = get_round(env, res, 8);
+    res   = (res >> 8) + round;
+    return sadd32(env, c, res);
+
+}
+static int64_t vwsmacc32(CPURISCVState *env, int32_t a, int32_t b, int64_t c)
+{
+    uint8_t round;
+    int64_t res = (int64_t)a * (int64_t)b;
+
+    round = get_round(env, res, 16);
+    res   = (res >> 16) + round;
+    return sadd64(env, c, res);
+}
+RVVCALL(OPIVV3_ENV, vwsmacc_vv_b, WOP_SSS_B, H2, H1, H1, vwsmacc8)
+RVVCALL(OPIVV3_ENV, vwsmacc_vv_h, WOP_SSS_H, H4, H2, H2, vwsmacc16)
+RVVCALL(OPIVV3_ENV, vwsmacc_vv_w, WOP_SSS_W, H8, H4, H4, vwsmacc32)
+GEN_VEXT_VV_ENV(vwsmacc_vv_b, 1, 2, clearh)
+GEN_VEXT_VV_ENV(vwsmacc_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vwsmacc_vv_w, 4, 8, clearq)
+RVVCALL(OPIVX3_ENV, vwsmacc_vx_b, WOP_SSS_B, H2, H1, vwsmacc8)
+RVVCALL(OPIVX3_ENV, vwsmacc_vx_h, WOP_SSS_H, H4, H2, vwsmacc16)
+RVVCALL(OPIVX3_ENV, vwsmacc_vx_w, WOP_SSS_W, H8, H4, vwsmacc32)
+GEN_VEXT_VX_ENV(vwsmacc_vx_b, 1, 2, clearh)
+GEN_VEXT_VX_ENV(vwsmacc_vx_h, 2, 4, clearl)
+GEN_VEXT_VX_ENV(vwsmacc_vx_w, 4, 8, clearq)
+
+static int16_t vwsmaccsu8(CPURISCVState *env, uint8_t a, int8_t b, int16_t c)
+{
+    uint8_t round;
+    int16_t res = (uint16_t)a * (int16_t)b;
+
+    round = get_round(env, res, 4);
+    res   = (res >> 4) + round;
+    return ssub16(env, c, res);
+}
+static int32_t vwsmaccsu16(CPURISCVState *env, uint16_t a, int16_t b,
+    uint32_t c)
+{
+    uint8_t round;
+    int32_t res = (uint32_t)a * (int32_t)b;
+
+    round = get_round(env, res, 8);
+    res   = (res >> 8) + round;
+    return ssub32(env, c, res);
+}
+static int64_t vwsmaccsu32(CPURISCVState *env, uint32_t a, int32_t b,
+    int64_t c)
+{
+    uint8_t round;
+    int64_t res = (uint64_t)a * (int64_t)b;
+
+    round = get_round(env, res, 16);
+    res   = (res >> 16) + round;
+    return ssub64(env, c, res);
+}
+RVVCALL(OPIVV3_ENV, vwsmaccsu_vv_b, WOP_SSU_B, H2, H1, H1, vwsmaccsu8)
+RVVCALL(OPIVV3_ENV, vwsmaccsu_vv_h, WOP_SSU_H, H4, H2, H2, vwsmaccsu16)
+RVVCALL(OPIVV3_ENV, vwsmaccsu_vv_w, WOP_SSU_W, H8, H4, H4, vwsmaccsu32)
+GEN_VEXT_VV_ENV(vwsmaccsu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV_ENV(vwsmaccsu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vwsmaccsu_vv_w, 4, 8, clearq)
+RVVCALL(OPIVX3_ENV, vwsmaccsu_vx_b, WOP_SSU_B, H2, H1, vwsmaccsu8)
+RVVCALL(OPIVX3_ENV, vwsmaccsu_vx_h, WOP_SSU_H, H4, H2, vwsmaccsu16)
+RVVCALL(OPIVX3_ENV, vwsmaccsu_vx_w, WOP_SSU_W, H8, H4, vwsmaccsu32)
+GEN_VEXT_VX_ENV(vwsmaccsu_vx_b, 1, 2, clearh)
+GEN_VEXT_VX_ENV(vwsmaccsu_vx_h, 2, 4, clearl)
+GEN_VEXT_VX_ENV(vwsmaccsu_vx_w, 4, 8, clearq)
+
+static int16_t vwsmaccus8(CPURISCVState *env, int8_t a, uint8_t b, int16_t c)
+{
+    uint8_t round;
+    int16_t res = (int16_t)a * (uint16_t)b;
+
+    round = get_round(env, res, 4);
+    res   = (res >> 4) + round;
+    return ssub16(env, c, res);
+}
+static int32_t vwsmaccus16(CPURISCVState *env, int16_t a, uint16_t b, int32_t c)
+{
+    uint8_t round;
+    int32_t res = (int32_t)a * (uint32_t)b;
+
+    round = get_round(env, res, 8);
+    res   = (res >> 8) + round;
+    return ssub32(env, c, res);
+}
+static int64_t vwsmaccus32(CPURISCVState *env, int32_t a, uint32_t b, int64_t c)
+{
+    uint8_t round;
+    int64_t res = (int64_t)a * (uint64_t)b;
+
+    round = get_round(env, res, 16);
+    res   = (res >> 16) + round;
+    return ssub64(env, c, res);
+}
+RVVCALL(OPIVX3_ENV, vwsmaccus_vx_b, WOP_SUS_B, H2, H1, vwsmaccus8)
+RVVCALL(OPIVX3_ENV, vwsmaccus_vx_h, WOP_SUS_H, H4, H2, vwsmaccus16)
+RVVCALL(OPIVX3_ENV, vwsmaccus_vx_w, WOP_SUS_W, H8, H4, vwsmaccus32)
+GEN_VEXT_VX_ENV(vwsmaccus_vx_b, 1, 2, clearh)
+GEN_VEXT_VX_ENV(vwsmaccus_vx_h, 2, 4, clearl)
+GEN_VEXT_VX_ENV(vwsmaccus_vx_w, 4, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
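
The widening forms above scale the double-width product down by SEW/2 bits before the saturating accumulate. Below is a standalone sketch of the unsigned byte variant, again assuming vxrm = 0 (round-to-nearest-up); saddu16_sketch stands in for the saddu16() helper from the earlier saturating add/subtract patch, and vxsat is modelled as a flag.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* Stand-in for the saddu16() saturating-add helper from the earlier patch. */
static uint16_t saddu16_sketch(uint16_t a, uint16_t b, bool *sat)
{
    uint16_t res = a + b;

    if (res < a) {               /* unsigned overflow wrapped, so clamp */
        *sat = true;
        return UINT16_MAX;
    }
    return res;
}

/*
 * Sketch of the unsigned widening saturating scaled multiply-add for
 * SEW=8, assuming round-to-nearest-up (vxrm = 0).
 */
static uint16_t vwsmaccu8_rnu(uint8_t a, uint8_t b, uint16_t c, bool *sat)
{
    uint16_t prod = (uint16_t)a * (uint16_t)b;   /* 16-bit product            */
    uint16_t round = (prod >> 3) & 1;            /* most significant lost bit */
    uint16_t scaled = (prod >> 4) + round;       /* scale by SEW/2 = 4 bits   */

    return saddu16_sketch(c, scaled, sat);       /* saturating accumulate     */
}

int main(void)
{
    bool sat = false;

    /* 200 * 200 = 40000; 40000 >> 4 = 2500; 2500 + 100 = 2600 */
    printf("%u sat=%d\n", vwsmaccu8_rnu(200, 200, 100, &sat), sat);
    /* 255 * 255 scaled is 4064; 4064 + 65000 clips to 65535 */
    printf("%u sat=%d\n", vwsmaccu8_rnu(255, 255, 65000, &sat), sat);
    return 0;
}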

* [PATCH v5 27/60] target/riscv: vector single-width scaling shift instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 ++++
 target/riscv/insn32.decode              |   6 ++
 target/riscv/insn_trans/trans_rvv.inc.c |   8 ++
 target/riscv/vector_helper.c            | 109 ++++++++++++++++++++++++
 4 files changed, 140 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 74c1c695e0..efc84fbd79 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -755,3 +755,20 @@ DEF_HELPER_6(vwsmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vssrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8798919d3e..d6d111e04a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -426,6 +426,12 @@ vwsmacc_vx      111101 . ..... ..... 100 ..... 1010111 @r_vm
 vwsmaccsu_vv    111110 . ..... ..... 000 ..... 1010111 @r_vm
 vwsmaccsu_vx    111110 . ..... ..... 100 ..... 1010111 @r_vm
 vwsmaccus_vx    111111 . ..... ..... 100 ..... 1010111 @r_vm
+vssrl_vv        101010 . ..... ..... 000 ..... 1010111 @r_vm
+vssrl_vx        101010 . ..... ..... 100 ..... 1010111 @r_vm
+vssrl_vi        101010 . ..... ..... 011 ..... 1010111 @r_vm
+vssra_vv        101011 . ..... ..... 000 ..... 1010111 @r_vm
+vssra_vx        101011 . ..... ..... 100 ..... 1010111 @r_vm
+vssra_vi        101011 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 68bebd3c37..21f896ea26 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1541,3 +1541,11 @@ GEN_OPIVX_WIDEN_TRANS(vwsmaccu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmacc_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmaccsu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmaccus_vx)
+
+/* Vector Single-Width Scaling Shift Instructions */
+GEN_OPIVV_TRANS(vssrl_vv, opivv_check)
+GEN_OPIVV_TRANS(vssra_vv, opivv_check)
+GEN_OPIVX_TRANS(vssrl_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssra_vx,  opivx_check)
+GEN_OPIVI_TRANS(vssrl_vi, 1, vssrl_vx, opivx_check)
+GEN_OPIVI_TRANS(vssra_vi, 0, vssra_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 90c19577fa..ec0f822fcf 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2703,3 +2703,112 @@ RVVCALL(OPIVX3_ENV, vwsmaccus_vx_w, WOP_SUS_W, H8, H4, vwsmaccus32)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_b, 1, 2, clearh)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_w, 4, 8, clearq)
+
+/* Vector Single-Width Scaling Shift Instructions */
+static uint8_t vssrl8(CPURISCVState *env, uint8_t a, uint8_t b)
+{
+    uint8_t round, shift = b & 0x7;
+    uint8_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static uint16_t vssrl16(CPURISCVState *env, uint16_t a, uint16_t b)
+{
+    uint8_t round, shift = b & 0xf;
+    uint16_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static uint32_t vssrl32(CPURISCVState *env, uint32_t a, uint32_t b)
+{
+    uint8_t round, shift = b & 0x1f;
+    uint32_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static uint64_t vssrl64(CPURISCVState *env, uint64_t a, uint64_t b)
+{
+    uint8_t round, shift = b & 0x3f;
+    uint64_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vssrl_vv_b, OP_UUU_B, H1, H1, H1, vssrl8)
+RVVCALL(OPIVV2_ENV, vssrl_vv_h, OP_UUU_H, H2, H2, H2, vssrl16)
+RVVCALL(OPIVV2_ENV, vssrl_vv_w, OP_UUU_W, H4, H4, H4, vssrl32)
+RVVCALL(OPIVV2_ENV, vssrl_vv_d, OP_UUU_D, H8, H8, H8, vssrl64)
+GEN_VEXT_VV_ENV(vssrl_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vssrl_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vssrl_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vssrl_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vssrl_vx_b, OP_UUU_B, H1, H1, vssrl8)
+RVVCALL(OPIVX2_ENV, vssrl_vx_h, OP_UUU_H, H2, H2, vssrl16)
+RVVCALL(OPIVX2_ENV, vssrl_vx_w, OP_UUU_W, H4, H4, vssrl32)
+RVVCALL(OPIVX2_ENV, vssrl_vx_d, OP_UUU_D, H8, H8, vssrl64)
+GEN_VEXT_VX_ENV(vssrl_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vssrl_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vssrl_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vssrl_vx_d, 8, 8, clearq)
+
+static int8_t vssra8(CPURISCVState *env, int8_t a, int8_t b)
+{
+    uint8_t round, shift = b & 0x7;
+    int8_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static int16_t vssra16(CPURISCVState *env, int16_t a, int16_t b)
+{
+    uint8_t round, shift = b & 0xf;
+    int16_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static int32_t vssra32(CPURISCVState *env, int32_t a, int32_t b)
+{
+    uint8_t round, shift = b & 0x1f;
+    int32_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+static int64_t vssra64(CPURISCVState *env, int64_t a, int64_t b)
+{
+    uint8_t round, shift = b & 0x3f;
+    int64_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    return res;
+}
+RVVCALL(OPIVV2_ENV, vssra_vv_b, OP_SSS_B, H1, H1, H1, vssra8)
+RVVCALL(OPIVV2_ENV, vssra_vv_h, OP_SSS_H, H2, H2, H2, vssra16)
+RVVCALL(OPIVV2_ENV, vssra_vv_w, OP_SSS_W, H4, H4, H4, vssra32)
+RVVCALL(OPIVV2_ENV, vssra_vv_d, OP_SSS_D, H8, H8, H8, vssra64)
+GEN_VEXT_VV_ENV(vssra_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vssra_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vssra_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vssra_vv_d, 8, 8, clearq)
+
+RVVCALL(OPIVX2_ENV, vssra_vx_b, OP_SSS_B, H1, H1, vssra8)
+RVVCALL(OPIVX2_ENV, vssra_vx_h, OP_SSS_H, H2, H2, vssra16)
+RVVCALL(OPIVX2_ENV, vssra_vx_w, OP_SSS_W, H4, H4, vssra32)
+RVVCALL(OPIVX2_ENV, vssra_vx_d, OP_SSS_D, H8, H8, vssra64)
+GEN_VEXT_VX_ENV(vssra_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vssra_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vssra_vx_w, 4, 4, clearl)
+GEN_VEXT_VX_ENV(vssra_vx_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
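
A minimal sketch of the signed scaling right shift for SEW=8 under the same assumption of vxrm = 0 (round-to-nearest-up); vssra8_rnu is an illustrative name, not the in-tree helper, which takes its rounding increment from get_round().

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the signed scaling right shift for SEW=8, assuming
 * round-to-nearest-up (vxrm = 0).
 */
static int8_t vssra8_rnu(int8_t a, uint8_t b)
{
    uint8_t shift = b & 0x7;                          /* shift amount mod SEW      */
    int round = shift ? (a >> (shift - 1)) & 1 : 0;   /* most significant lost bit */

    return (a >> shift) + round;
}

int main(void)
{
    /* 87 >> 3 = 10.875 -> rounds up to 11 */
    printf("%d\n", vssra8_rnu(87, 3));
    /* -9 >> 1 = -4.5 -> rounds to -4 */
    printf("%d\n", vssra8_rnu(-9, 1));
    return 0;
}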

* [PATCH v5 28/60] target/riscv: vector narrowing fixed-point clip instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  13 +++
 target/riscv/insn32.decode              |   6 ++
 target/riscv/insn_trans/trans_rvv.inc.c |   8 ++
 target/riscv/vector_helper.c            | 128 ++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index efc84fbd79..4cad8679ec 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -772,3 +772,16 @@ DEF_HELPER_6(vssra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vnclip_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclip_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclip_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d6d111e04a..c7d589566f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -432,6 +432,12 @@ vssrl_vi        101010 . ..... ..... 011 ..... 1010111 @r_vm
 vssra_vv        101011 . ..... ..... 000 ..... 1010111 @r_vm
 vssra_vx        101011 . ..... ..... 100 ..... 1010111 @r_vm
 vssra_vi        101011 . ..... ..... 011 ..... 1010111 @r_vm
+vnclipu_vv      101110 . ..... ..... 000 ..... 1010111 @r_vm
+vnclipu_vx      101110 . ..... ..... 100 ..... 1010111 @r_vm
+vnclipu_vi      101110 . ..... ..... 011 ..... 1010111 @r_vm
+vnclip_vv       101111 . ..... ..... 000 ..... 1010111 @r_vm
+vnclip_vx       101111 . ..... ..... 100 ..... 1010111 @r_vm
+vnclip_vi       101111 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 21f896ea26..11b4887275 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1549,3 +1549,11 @@ GEN_OPIVX_TRANS(vssrl_vx,  opivx_check)
 GEN_OPIVX_TRANS(vssra_vx,  opivx_check)
 GEN_OPIVI_TRANS(vssrl_vi, 1, vssrl_vx, opivx_check)
 GEN_OPIVI_TRANS(vssra_vi, 0, vssra_vx, opivx_check)
+
+/* Vector Narrowing Fixed-Point Clip Instructions */
+GEN_OPIVV_NARROW_TRANS(vnclipu_vv)
+GEN_OPIVV_NARROW_TRANS(vnclip_vv)
+GEN_OPIVX_NARROW_TRANS(vnclipu_vx)
+GEN_OPIVX_NARROW_TRANS(vnclip_vx)
+GEN_OPIVI_NARROW_TRANS(vnclipu_vi, 1, vnclipu_vx)
+GEN_OPIVI_NARROW_TRANS(vnclip_vi, 1, vnclip_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ec0f822fcf..7f61d4c0c4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -869,6 +869,12 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define WOP_SSU_B int16_t, int8_t, uint8_t, int16_t, uint16_t
 #define WOP_SSU_H int32_t, int16_t, uint16_t, int32_t, uint32_t
 #define WOP_SSU_W int64_t, int32_t, uint32_t, int64_t, uint64_t
+#define NOP_SSS_B int8_t, int8_t, int16_t, int8_t, int16_t
+#define NOP_SSS_H int16_t, int16_t, int32_t, int16_t, int32_t
+#define NOP_SSS_W int32_t, int32_t, int64_t, int32_t, int64_t
+#define NOP_UUU_B uint8_t, uint8_t, uint16_t, uint8_t, uint16_t
+#define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
+#define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
@@ -2812,3 +2818,125 @@ GEN_VEXT_VX_ENV(vssra_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vssra_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vssra_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vssra_vx_d, 8, 8, clearq)
+
+/* Vector Narrowing Fixed-Point Clip Instructions */
+static int8_t vnclip8(CPURISCVState *env, int16_t a, int8_t b)
+{
+    uint8_t round, shift = b & 0xf;
+    int16_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > INT8_MAX) {
+        env->vxsat = 0x1;
+        return INT8_MAX;
+    } else if (res < INT8_MIN) {
+        env->vxsat = 0x1;
+        return INT8_MIN;
+    } else {
+        return res;
+    }
+}
+static int16_t vnclip16(CPURISCVState *env, int32_t a, int16_t b)
+{
+    uint8_t round, shift = b & 0x1f;
+    int32_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > INT16_MAX) {
+        env->vxsat = 0x1;
+        return INT16_MAX;
+    } else if (res < INT16_MIN) {
+        env->vxsat = 0x1;
+        return INT16_MIN;
+    } else {
+        return res;
+    }
+}
+static int32_t vnclip32(CPURISCVState *env, int64_t a, int32_t b)
+{
+    uint8_t round, shift = b & 0x3f;
+    int64_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > INT32_MAX) {
+        env->vxsat = 0x1;
+        return INT32_MAX;
+    } else if (res < INT32_MIN) {
+        env->vxsat = 0x1;
+        return INT32_MIN;
+    } else {
+        return res;
+    }
+}
+RVVCALL(OPIVV2_ENV, vnclip_vv_b, NOP_SSS_B, H1, H2, H1, vnclip8)
+RVVCALL(OPIVV2_ENV, vnclip_vv_h, NOP_SSS_H, H2, H4, H2, vnclip16)
+RVVCALL(OPIVV2_ENV, vnclip_vv_w, NOP_SSS_W, H4, H8, H4, vnclip32)
+GEN_VEXT_VV_ENV(vnclip_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vnclip_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vnclip_vv_w, 4, 4, clearl)
+
+RVVCALL(OPIVX2_ENV, vnclip_vx_b, NOP_SSS_B, H1, H2, vnclip8)
+RVVCALL(OPIVX2_ENV, vnclip_vx_h, NOP_SSS_H, H2, H4, vnclip16)
+RVVCALL(OPIVX2_ENV, vnclip_vx_w, NOP_SSS_W, H4, H8, vnclip32)
+GEN_VEXT_VX_ENV(vnclip_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vnclip_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vnclip_vx_w, 4, 4, clearl)
+
+static uint8_t vnclipu8(CPURISCVState *env, uint16_t a, uint8_t b)
+{
+    uint8_t round, shift = b & 0xf;
+    uint16_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > UINT8_MAX) {
+        env->vxsat = 0x1;
+        return UINT8_MAX;
+    } else {
+        return res;
+    }
+}
+static uint16_t vnclipu16(CPURISCVState *env, uint32_t a, uint16_t b)
+{
+    uint8_t round, shift = b & 0x1f;
+    uint32_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > UINT16_MAX) {
+        env->vxsat = 0x1;
+        return UINT16_MAX;
+    } else {
+        return res;
+    }
+}
+static uint32_t vnclipu32(CPURISCVState *env, uint64_t a, uint32_t b)
+{
+    uint8_t round, shift = b & 0x3f;
+    uint64_t res;
+
+    round = get_round(env, a, shift);
+    res   = (a >> shift)  + round;
+    if (res > UINT32_MAX) {
+        env->vxsat = 0x1;
+        return UINT32_MAX;
+    } else {
+        return res;
+    }
+}
+RVVCALL(OPIVV2_ENV, vnclipu_vv_b, NOP_UUU_B, H1, H2, H1, vnclipu8)
+RVVCALL(OPIVV2_ENV, vnclipu_vv_h, NOP_UUU_H, H2, H4, H2, vnclipu16)
+RVVCALL(OPIVV2_ENV, vnclipu_vv_w, NOP_UUU_W, H4, H8, H4, vnclipu32)
+GEN_VEXT_VV_ENV(vnclipu_vv_b, 1, 1, clearb)
+GEN_VEXT_VV_ENV(vnclipu_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vnclipu_vv_w, 4, 4, clearl)
+
+RVVCALL(OPIVX2_ENV, vnclipu_vx_b, NOP_UUU_B, H1, H2, vnclipu8)
+RVVCALL(OPIVX2_ENV, vnclipu_vx_h, NOP_UUU_H, H2, H4, vnclipu16)
+RVVCALL(OPIVX2_ENV, vnclipu_vx_w, NOP_UUU_W, H4, H8, vnclipu32)
+GEN_VEXT_VX_ENV(vnclipu_vx_b, 1, 1, clearb)
+GEN_VEXT_VX_ENV(vnclipu_vx_h, 2, 2, clearh)
+GEN_VEXT_VX_ENV(vnclipu_vx_w, 4, 4, clearl)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
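
A standalone sketch of the signed narrowing clip from a 16-bit source down to an 8-bit result, again with vxrm = 0 (round-to-nearest-up) assumed and vxsat modelled as a flag.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/*
 * Sketch of the signed narrowing clip from SEW=16 to SEW=8, assuming
 * round-to-nearest-up (vxrm = 0).  Saturation is reported via *sat.
 */
static int8_t vnclip8_rnu(int16_t a, uint8_t b, bool *sat)
{
    uint8_t shift = b & 0xf;                          /* shift amount mod 2*SEW */
    int round = shift ? (a >> (shift - 1)) & 1 : 0;   /* most significant lost bit */
    int res = (a >> shift) + round;                   /* still 16-bit range     */

    if (res > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    } else if (res < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)res;
}

int main(void)
{
    bool sat = false;

    /* 1000 >> 2 = 250 does not fit in int8_t -> clips to 127 */
    printf("%d sat=%d\n", vnclip8_rnu(1000, 2, &sat), sat);
    /* 1000 >> 4 = 62.5 -> rounds to 63, fits */
    sat = false;
    printf("%d sat=%d\n", vnclip8_rnu(1000, 4, &sat), sat);
    return 0;
}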

* [PATCH v5 29/60] target/riscv: vector single-width floating-point add/subtract instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  16 ++++
 target/riscv/insn32.decode              |   5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 107 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            |  89 ++++++++++++++++++++
 4 files changed, 217 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4cad8679ec..6b46677eeb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -785,3 +785,19 @@ DEF_HELPER_6(vnclipu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vfadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
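
Aside, not from the patch: each DEF_HELPER_6 line declares the out-of-line C
helper together with its TCG wrapper; roughly, the vfadd_vf_h entry above
corresponds to a prototype like the one below, matching the HELPER() body that
GEN_VEXT_VF provides in vector_helper.c later in this patch.

/* Sketch of what DEF_HELPER_6(vfadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 * declares for the out-of-line helper: */
void helper_vfadd_vf_h(void *vd, void *v0, uint64_t s1,
                       void *vs2, CPURISCVState *env, uint32_t desc);
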
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index c7d589566f..32918c4d11 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -438,6 +438,11 @@ vnclipu_vi      101110 . ..... ..... 011 ..... 1010111 @r_vm
 vnclip_vv       101111 . ..... ..... 000 ..... 1010111 @r_vm
 vnclip_vx       101111 . ..... ..... 100 ..... 1010111 @r_vm
 vnclip_vi       101111 . ..... ..... 011 ..... 1010111 @r_vm
+vfadd_vv        000000 . ..... ..... 001 ..... 1010111 @r_vm
+vfadd_vf        000000 . ..... ..... 101 ..... 1010111 @r_vm
+vfsub_vv        000010 . ..... ..... 001 ..... 1010111 @r_vm
+vfsub_vf        000010 . ..... ..... 101 ..... 1010111 @r_vm
+vfrsub_vf       100111 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
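
For orientation (an aside, not from the patch): each pattern above is a 32-bit
OP-V encoding read MSB-first as funct6, vm, vs2, vs1, funct3 (001 for the .vv
forms, 101 for the .vf forms), vd and the 1010111 major opcode. Worked by hand
for illustration, e.g. vfadd.vv v1, v2, v3 with vm = 1 (unmasked):

    funct6   vm  vs2     vs1     funct3  vd      opcode
    000000 | 1 | 00010 | 00011 | 001   | 00001 | 1010111  =  0x022190d7
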
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 11b4887275..af4dcb96c6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1557,3 +1557,110 @@ GEN_OPIVX_NARROW_TRANS(vnclipu_vx)
 GEN_OPIVX_NARROW_TRANS(vnclip_vx)
 GEN_OPIVI_NARROW_TRANS(vnclipu_vi, 1, vnclipu_vx)
 GEN_OPIVI_NARROW_TRANS(vnclip_vi, 1, vnclip_vx)
+
+/*
+ *** Vector Floating-Point Arithmetic Instructions
+ */
+/* Vector Single-Width Floating-Point Add/Subtract Instructions */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised.
+ */
+static bool opfvv_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            (s->sew != 0));
+}
+
+/* OPFVV without GVEC IR */
+#define GEN_OPFVV_TRANS(NAME, CHECK)                               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (CHECK(s, a)) {                                             \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[3] = {            \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w,                                 \
+            gen_helper_##NAME##_d,                                 \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);       \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFVV_TRANS(vfadd_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfsub_vv, opfvv_check)
+
+typedef void (*gen_helper_opfvf)(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_opfvf fn, DisasContext *s)
+{
+    TCGv_ptr dest, src2, mask;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    src2 = tcg_temp_new_ptr();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, cpu_fpr[rs1], src2, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(src2);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool opfvf_check(DisasContext *s, arg_rmrr *a)
+{
+    /*
+     * If the current SEW does not correspond to a supported IEEE
+     * floating-point type, an illegal instruction exception is raised.
+     */
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (s->sew != 0));
+}
+
+/* OPFVF without GVEC IR */
+#define GEN_OPFVF_TRANS(NAME, CHECK)                              \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)            \
+{                                                                 \
+    if (CHECK(s, a)) {                                            \
+        uint32_t data = 0;                                        \
+        static gen_helper_opfvf const fns[3] = {                  \
+            gen_helper_##NAME##_h,                                \
+            gen_helper_##NAME##_w,                                \
+            gen_helper_##NAME##_d,                                \
+        };                                                        \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);            \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);            \
+        return opfvf_trans(a->rd, a->rs1, a->rs2, data,           \
+                fns[s->sew - 1], s);                              \
+    }                                                             \
+    return false;                                                 \
+}
+
+GEN_OPFVF_TRANS(vfadd_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfsub_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfrsub_vf,  opfvf_check)
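
Illustrative sketch, not from the patch: here s->sew holds vsew, i.e.
SEW = 8 << s->sew, so the (s->sew != 0) checks reject SEW = 8 (there is no
8-bit IEEE type) and fns[s->sew - 1] selects the 16/32/64-bit helper. Spelled
out for vfadd.vf with a hypothetical helper-picker:

static gen_helper_opfvf pick_vfadd_vf_fn(DisasContext *s)  /* hypothetical */
{
    static gen_helper_opfvf const fns[3] = {
        gen_helper_vfadd_vf_h,   /* s->sew == 1, SEW = 16 */
        gen_helper_vfadd_vf_w,   /* s->sew == 2, SEW = 32 */
        gen_helper_vfadd_vf_d,   /* s->sew == 3, SEW = 64 */
    };
    return fns[s->sew - 1];      /* s->sew == 0 (SEW = 8) already rejected */
}
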
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 7f61d4c0c4..d49a7194f7 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -21,6 +21,7 @@
 #include "exec/memop.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
 #include "tcg/tcg-gvec-desc.h"
 #include <math.h>
 
@@ -2940,3 +2941,91 @@ RVVCALL(OPIVX2_ENV, vnclipu_vx_w, NOP_UUU_W, H4, H8, vnclipu32)
 GEN_VEXT_VX_ENV(vnclipu_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vnclipu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vnclipu_vx_w, 4, 4, clearl)
+
+/*
+ *** Vector Floating-Point Arithmetic Instructions
+ */
+/* Vector Single-Width Floating-Point Add/Subtract Instructions */
+#define OPFVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)   \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i,   \
+        CPURISCVState *env)                                    \
+{                                                              \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                            \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                            \
+    *((TD *)vd + HD(i)) = OP(s2, s1, &env->fp_status);         \
+}
+RVVCALL(OPFVV2, vfadd_vv_h, OP_UUU_H, H2, H2, H2, float16_add)
+RVVCALL(OPFVV2, vfadd_vv_w, OP_UUU_W, H4, H4, H4, float32_add)
+RVVCALL(OPFVV2, vfadd_vv_d, OP_UUU_D, H8, H8, H8, float64_add)
+GEN_VEXT_VV_ENV(vfadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfadd_vv_d, 8, 8, clearq)
+
+#define OPFVF2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)        \
+static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \
+        CPURISCVState *env)                                    \
+{                                                              \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                            \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1, &env->fp_status);\
+}
+
+#define GEN_VEXT_VF(NAME, ESZ, DSZ, CLEAR_FN)             \
+void HELPER(NAME)(void *vd, void *v0, uint64_t s1,        \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+                                                          \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        do_##NAME(vd, s1, vs2, i, env);                   \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
+    }                                                     \
+}
+RVVCALL(OPFVF2, vfadd_vf_h, OP_UUU_H, H2, H2, float16_add)
+RVVCALL(OPFVF2, vfadd_vf_w, OP_UUU_W, H4, H4, float32_add)
+RVVCALL(OPFVF2, vfadd_vf_d, OP_UUU_D, H8, H8, float64_add)
+GEN_VEXT_VF(vfadd_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfadd_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfadd_vf_d, 8, 8, clearq)
+
+RVVCALL(OPFVV2, vfsub_vv_h, OP_UUU_H, H2, H2, H2, float16_sub)
+RVVCALL(OPFVV2, vfsub_vv_w, OP_UUU_W, H4, H4, H4, float32_sub)
+RVVCALL(OPFVV2, vfsub_vv_d, OP_UUU_D, H8, H8, H8, float64_sub)
+GEN_VEXT_VV_ENV(vfsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfsub_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfsub_vf_h, OP_UUU_H, H2, H2, float16_sub)
+RVVCALL(OPFVF2, vfsub_vf_w, OP_UUU_W, H4, H4, float32_sub)
+RVVCALL(OPFVF2, vfsub_vf_d, OP_UUU_D, H8, H8, float64_sub)
+GEN_VEXT_VF(vfsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfsub_vf_d, 8, 8, clearq)
+
+static uint16_t float16_rsub(uint16_t a, uint16_t b, float_status *s)
+{
+    return float16_sub(b, a, s);
+}
+
+static uint32_t float32_rsub(uint32_t a, uint32_t b, float_status *s)
+{
+    return float32_sub(b, a, s);
+}
+
+static uint64_t float64_rsub(uint64_t a, uint64_t b, float_status *s)
+{
+    return float64_sub(b, a, s);
+}
+RVVCALL(OPFVF2, vfrsub_vf_h, OP_UUU_H, H2, H2, float16_rsub)
+RVVCALL(OPFVF2, vfrsub_vf_w, OP_UUU_W, H4, H4, float32_rsub)
+RVVCALL(OPFVF2, vfrsub_vf_d, OP_UUU_D, H8, H8, float64_rsub)
+GEN_VEXT_VF(vfrsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfrsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfrsub_vf_d, 8, 8, clearq)
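
Sketch, not from the patch: assuming OP_UUU_H maps every element type to
uint16_t, as its name suggests, OPFVF2 plus GEN_VEXT_VF expand for vfadd_vf_h
to roughly the helper below -- a masked per-element loop followed by the tail
clear.

void helper_vfadd_vf_h(void *vd, void *v0, uint64_t s1,
                       void *vs2, CPURISCVState *env, uint32_t desc)
{
    uint32_t vlmax = vext_maxsz(desc) / 2;
    uint32_t mlen = vext_mlen(desc);
    uint32_t vm = vext_vm(desc);
    uint32_t vl = env->vl;
    uint32_t i;

    for (i = 0; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;                            /* element masked off */
        }
        uint16_t s2 = *((uint16_t *)vs2 + H2(i));
        *((uint16_t *)vd + H2(i)) =
            float16_add(s2, (uint16_t)s1, &env->fp_status);
    }
    if (i != 0) {
        clearh(vd, vl, vl * 2, vlmax * 2);       /* clear the tail elements */
    }
}
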
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 30/60] target/riscv: vector widening floating-point add/subtract instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 +++
 target/riscv/insn32.decode              |   8 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 131 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            |  77 ++++++++++++++
 4 files changed, 233 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6b46677eeb..f242fa4e4b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -801,3 +801,20 @@ DEF_HELPER_6(vfsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 32918c4d11..5ec95541c6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -443,6 +443,14 @@ vfadd_vf        000000 . ..... ..... 101 ..... 1010111 @r_vm
 vfsub_vv        000010 . ..... ..... 001 ..... 1010111 @r_vm
 vfsub_vf        000010 . ..... ..... 101 ..... 1010111 @r_vm
 vfrsub_vf       100111 . ..... ..... 101 ..... 1010111 @r_vm
+vfwadd_vv       110000 . ..... ..... 001 ..... 1010111 @r_vm
+vfwadd_vf       110000 . ..... ..... 101 ..... 1010111 @r_vm
+vfwadd_wv       110100 . ..... ..... 001 ..... 1010111 @r_vm
+vfwadd_wf       110100 . ..... ..... 101 ..... 1010111 @r_vm
+vfwsub_vv       110010 . ..... ..... 001 ..... 1010111 @r_vm
+vfwsub_vf       110010 . ..... ..... 101 ..... 1010111 @r_vm
+vfwsub_wv       110110 . ..... ..... 001 ..... 1010111 @r_vm
+vfwsub_wf       110110 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index af4dcb96c6..ab04f469af 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1664,3 +1664,134 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)            \
 GEN_OPFVF_TRANS(vfadd_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfsub_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfrsub_vf,  opfvf_check)
+
+/* Vector Widening Floating-Point Add/Subtract Instructions */
+static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* OPFVV with WIDEN */
+#define GEN_OPFVV_WIDEN_TRANS(NAME, CHECK)                       \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (CHECK(s, a)) {                                           \
+        uint32_t data = 0;                                       \
+        static gen_helper_gvec_4_ptr * const fns[2] = {          \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,        \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),   \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),            \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);     \
+        return true;                                             \
+    }                                                            \
+    return false;                                                \
+}
+
+GEN_OPFVV_WIDEN_TRANS(vfwadd_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwsub_vv, opfvv_widen_check)
+
+static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+/* OPFVF with WIDEN */
+#define GEN_OPFVF_WIDEN_TRANS(NAME)                              \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (opfvf_widen_check(s, a)) {                               \
+        uint32_t data = 0;                                       \
+        static gen_helper_opfvf const fns[2] = {                 \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,        \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        return opfvf_trans(a->rd, a->rs1, a->rs2, data,          \
+                fns[s->sew - 1], s);                             \
+    }                                                            \
+    return false;                                                \
+}
+GEN_OPFVF_WIDEN_TRANS(vfwadd_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwsub_vf)
+
+static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* WIDEN OPFVV with WIDEN */
+#define GEN_OPFWV_WIDEN_TRANS(NAME)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (opfwv_widen_check(s, a)) {                                 \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[2] = {            \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,          \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);       \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFWV_WIDEN_TRANS(vfwadd_wv)
+GEN_OPFWV_WIDEN_TRANS(vfwsub_wv)
+
+static bool opfwf_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* WIDEN OPFVF with WIDEN */
+#define GEN_OPFWF_WIDEN_TRANS(NAME)                                      \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opfwf_widen_check(s, a)) {                                       \
+        uint32_t data = 0;                                               \
+        static gen_helper_opfvf const fns[2] = {                         \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,                \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opfvf_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew - 1], s);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPFWF_WIDEN_TRANS(vfwadd_wf)
+GEN_OPFWF_WIDEN_TRANS(vfwsub_wf)
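
Worked example, not from the patch: a widening op writes a destination group
of 2 << lmul registers from source groups of 1 << lmul registers, hence the
lmul < 3 / sew < 3 limits and the overlap checks above. A hypothetical
predicate for the overlap rule, for illustration only:

/* True when the register groups [rd, rd + rd_regs) and [rs, rs + rs_regs)
 * intersect; the widen checks require this to be false for rd against each
 * narrow source group. */
static bool vgroups_overlap(int rd, int rd_regs, int rs, int rs_regs)
{
    return rd < rs + rs_regs && rs < rd + rd_regs;
}

/* With s->lmul == 1 (LMUL = 2): vfwadd.vv with vd = v8 uses v8..v11 while
 * vs2 = v4 uses v4..v5, so the check passes; vd = v4 would overlap vs2 and
 * the instruction would be rejected. */
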
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index d49a7194f7..0840c5d662 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3029,3 +3029,80 @@ RVVCALL(OPFVF2, vfrsub_vf_d, OP_UUU_D, H8, H8, float64_rsub)
 GEN_VEXT_VF(vfrsub_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfrsub_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfrsub_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Add/Subtract Instructions */
+static uint32_t vfwadd16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_add(float16_to_float32(a, true, s),
+            float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwadd32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_add(float32_to_float64(a, s),
+            float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwadd_vv_h, WOP_UUU_H, H4, H2, H2, vfwadd16)
+RVVCALL(OPFVV2, vfwadd_vv_w, WOP_UUU_W, H8, H4, H4, vfwadd32)
+GEN_VEXT_VV_ENV(vfwadd_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwadd_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwadd_vf_h, WOP_UUU_H, H4, H2, vfwadd16)
+RVVCALL(OPFVF2, vfwadd_vf_w, WOP_UUU_W, H8, H4, vfwadd32)
+GEN_VEXT_VF(vfwadd_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwadd_vf_w, 4, 8, clearq)
+
+static uint32_t vfwsub16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_sub(float16_to_float32(a, true, s),
+            float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwsub32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_sub(float32_to_float64(a, s),
+            float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwsub_vv_h, WOP_UUU_H, H4, H2, H2, vfwsub16)
+RVVCALL(OPFVV2, vfwsub_vv_w, WOP_UUU_W, H8, H4, H4, vfwsub32)
+GEN_VEXT_VV_ENV(vfwsub_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwsub_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwsub_vf_h, WOP_UUU_H, H4, H2, vfwsub16)
+RVVCALL(OPFVF2, vfwsub_vf_w, WOP_UUU_W, H8, H4, vfwsub32)
+GEN_VEXT_VF(vfwsub_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwsub_vf_w, 4, 8, clearq)
+
+static uint32_t vfwaddw16(uint32_t a, uint16_t b, float_status *s)
+{
+    return float32_add(a, float16_to_float32(b, true, s), s);
+}
+static uint64_t vfwaddw32(uint64_t a, uint32_t b, float_status *s)
+{
+    return float64_add(a, float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwadd_wv_h, WOP_WUUU_H, H4, H2, H2, vfwaddw16)
+RVVCALL(OPFVV2, vfwadd_wv_w, WOP_WUUU_W, H8, H4, H4, vfwaddw32)
+GEN_VEXT_VV_ENV(vfwadd_wv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwadd_wv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwadd_wf_h, WOP_WUUU_H, H4, H2, vfwaddw16)
+RVVCALL(OPFVF2, vfwadd_wf_w, WOP_WUUU_W, H8, H4, vfwaddw32)
+GEN_VEXT_VF(vfwadd_wf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwadd_wf_w, 4, 8, clearq)
+
+static uint32_t vfwsubw16(uint32_t a, uint16_t b, float_status *s)
+{
+    return float32_sub(a, float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwsubw32(uint64_t a, uint32_t b, float_status *s)
+{
+    return float64_sub(a, float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwsub_wv_h, WOP_WUUU_H, H4, H2, H2, vfwsubw16)
+RVVCALL(OPFVV2, vfwsub_wv_w, WOP_WUUU_W, H8, H4, H4, vfwsubw32)
+GEN_VEXT_VV_ENV(vfwsub_wv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwsub_wv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, vfwsubw16)
+RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32)
+GEN_VEXT_VF(vfwsub_wf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwsub_wf_w, 4, 8, clearq)
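
Sketch, not from the patch: each widening helper converts its narrow operands
up and performs a single operation at the doubled width; because the
up-conversion is exact, rounding happens only once, at the wider precision.
A native-type analogy, ignoring softfloat and env->fp_status:

static double vfwadd_vv_analogy(float a, float b)
{
    return (double)a + (double)b;   /* convert up, add once at double width */
}

static double vfwadd_wv_analogy(double wide_a, float b)
{
    return wide_a + (double)b;      /* .wv/.wf: first operand already wide */
}
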
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 30/60] target/riscv: vector widening floating-point add/subtract instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 +++
 target/riscv/insn32.decode              |   8 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 131 ++++++++++++++++++++++++
 target/riscv/vector_helper.c            |  77 ++++++++++++++
 4 files changed, 233 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6b46677eeb..f242fa4e4b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -801,3 +801,20 @@ DEF_HELPER_6(vfsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 32918c4d11..5ec95541c6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -443,6 +443,14 @@ vfadd_vf        000000 . ..... ..... 101 ..... 1010111 @r_vm
 vfsub_vv        000010 . ..... ..... 001 ..... 1010111 @r_vm
 vfsub_vf        000010 . ..... ..... 101 ..... 1010111 @r_vm
 vfrsub_vf       100111 . ..... ..... 101 ..... 1010111 @r_vm
+vfwadd_vv       110000 . ..... ..... 001 ..... 1010111 @r_vm
+vfwadd_vf       110000 . ..... ..... 101 ..... 1010111 @r_vm
+vfwadd_wv       110100 . ..... ..... 001 ..... 1010111 @r_vm
+vfwadd_wf       110100 . ..... ..... 101 ..... 1010111 @r_vm
+vfwsub_vv       110010 . ..... ..... 001 ..... 1010111 @r_vm
+vfwsub_vf       110010 . ..... ..... 101 ..... 1010111 @r_vm
+vfwsub_wv       110110 . ..... ..... 001 ..... 1010111 @r_vm
+vfwsub_wf       110110 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index af4dcb96c6..ab04f469af 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1664,3 +1664,134 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)            \
 GEN_OPFVF_TRANS(vfadd_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfsub_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfrsub_vf,  opfvf_check)
+
+/* Vector Widening Floating-Point Add/Subtract Instructions */
+static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* OPFVV with WIDEN */
+#define GEN_OPFVV_WIDEN_TRANS(NAME, CHECK)                       \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (CHECK(s, a)) {                                           \
+        uint32_t data = 0;                                       \
+        static gen_helper_gvec_4_ptr * const fns[2] = {          \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,        \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),   \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),            \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);     \
+        return true;                                             \
+    }                                                            \
+    return false;                                                \
+}
+
+GEN_OPFVV_WIDEN_TRANS(vfwadd_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwsub_vv, opfvv_widen_check)
+
+static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+/* OPFVF with WIDEN */
+#define GEN_OPFVF_WIDEN_TRANS(NAME)                              \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (opfvf_widen_check(s, a)) {                               \
+        uint32_t data = 0;                                       \
+        static gen_helper_opfvf const fns[2] = {                 \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,        \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        return opfvf_trans(a->rd, a->rs1, a->rs2, data,          \
+                fns[s->sew - 1], s);                             \
+    }                                                            \
+    return false;                                                \
+}
+GEN_OPFVF_WIDEN_TRANS(vfwadd_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwsub_vf)
+
+static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* WIDEN OPFVV with WIDEN */
+#define GEN_OPFWV_WIDEN_TRANS(NAME)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
+{                                                                  \
+    if (opfwv_widen_check(s, a)) {                                 \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_4_ptr * const fns[2] = {            \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,          \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);       \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFWV_WIDEN_TRANS(vfwadd_wv)
+GEN_OPFWV_WIDEN_TRANS(vfwsub_wv)
+
+static bool opfwf_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, true) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* WIDEN OPFVF with WIDEN */
+#define GEN_OPFWF_WIDEN_TRANS(NAME)                                      \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
+{                                                                        \
+    if (opfwf_widen_check(s, a)) {                                       \
+        uint32_t data = 0;                                               \
+        static gen_helper_opfvf const fns[2] = {                         \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,                \
+        };                                                               \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
+        return opfvf_trans(a->rd, a->rs1, a->rs2, data,                  \
+                fns[s->sew - 1], s);                                     \
+    }                                                                    \
+    return false;                                                        \
+}
+GEN_OPFWF_WIDEN_TRANS(vfwadd_wf)
+GEN_OPFWF_WIDEN_TRANS(vfwsub_wf)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index d49a7194f7..0840c5d662 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3029,3 +3029,80 @@ RVVCALL(OPFVF2, vfrsub_vf_d, OP_UUU_D, H8, H8, float64_rsub)
 GEN_VEXT_VF(vfrsub_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfrsub_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfrsub_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Add/Subtract Instructions */
+static uint32_t vfwadd16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_add(float16_to_float32(a, true, s),
+            float16_to_float32(b, true, s), s);
+}
+static uint64_t vfwadd32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_add(float32_to_float64(a, s),
+            float32_to_float64(b, s), s);
+
+}
+RVVCALL(OPFVV2, vfwadd_vv_h, WOP_UUU_H, H4, H2, H2, vfwadd16)
+RVVCALL(OPFVV2, vfwadd_vv_w, WOP_UUU_W, H8, H4, H4, vfwadd32)
+GEN_VEXT_VV_ENV(vfwadd_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwadd_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwadd_vf_h, WOP_UUU_H, H4, H2, vfwadd16)
+RVVCALL(OPFVF2, vfwadd_vf_w, WOP_UUU_W, H8, H4, vfwadd32)
+GEN_VEXT_VF(vfwadd_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwadd_vf_w, 4, 8, clearq)
+
+static uint32_t vfwsub16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_sub(float16_to_float32(a, true, s),
+            float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwsub32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_sub(float32_to_float64(a, s),
+            float32_to_float64(b, s), s);
+
+}
+RVVCALL(OPFVV2, vfwsub_vv_h, WOP_UUU_H, H4, H2, H2, vfwsub16)
+RVVCALL(OPFVV2, vfwsub_vv_w, WOP_UUU_W, H8, H4, H4, vfwsub32)
+GEN_VEXT_VV_ENV(vfwsub_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwsub_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwsub_vf_h, WOP_UUU_H, H4, H2, vfwsub16)
+RVVCALL(OPFVF2, vfwsub_vf_w, WOP_UUU_W, H8, H4, vfwsub32)
+GEN_VEXT_VF(vfwsub_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwsub_vf_w, 4, 8, clearq)
+
+static uint32_t vfwaddw16(uint32_t a, uint16_t b, float_status *s)
+{
+    return float32_add(a, float16_to_float32(b, true, s), s);
+}
+static uint64_t vfwaddw32(uint64_t a, uint32_t b, float_status *s)
+{
+    return float64_add(a, float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwadd_wv_h, WOP_WUUU_H, H4, H2, H2, vfwaddw16)
+RVVCALL(OPFVV2, vfwadd_wv_w, WOP_WUUU_W, H8, H4, H4, vfwaddw32)
+GEN_VEXT_VV_ENV(vfwadd_wv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwadd_wv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwadd_wf_h, WOP_WUUU_H, H4, H2, vfwaddw16)
+RVVCALL(OPFVF2, vfwadd_wf_w, WOP_WUUU_W, H8, H4, vfwaddw32)
+GEN_VEXT_VF(vfwadd_wf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwadd_wf_w, 4, 8, clearq)
+
+static uint32_t vfwsubw16(uint32_t a, uint16_t b, float_status *s)
+{
+    return float32_sub(a, float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwsubw32(uint64_t a, uint32_t b, float_status *s)
+{
+    return float64_sub(a, float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwsub_wv_h, WOP_WUUU_H, H4, H2, H2, vfwsubw16)
+RVVCALL(OPFVV2, vfwsub_wv_w, WOP_WUUU_W, H8, H4, H4, vfwsubw32)
+GEN_VEXT_VV_ENV(vfwsub_wv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwsub_wv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, vfwsubw16)
+RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32)
+GEN_VEXT_VF(vfwsub_wf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwsub_wf_w, 4, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
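
For orientation, a plain-C sketch of the element-wise semantics the helpers above implement; this is not part of the patch, and native float/double stand in for the softfloat calls, so rounding-mode and exception-flag behaviour is only approximated. The .vv/.vf forms widen both source operands before the operation, while the .wv/.wf forms (WOP_WUUU_*) take a vs2 operand that is already at the doubled width and only widen the other operand. (In the real helpers, the `true` passed to float16_to_float32() selects the IEEE half-precision format.)

    /* Illustrative sketch only -- not part of the patch. */
    #include <stdio.h>

    /* vfwadd.vv: vd[i] (2*SEW) = widen(vs2[i]) + widen(vs1[i]) */
    static double wadd_vv(float vs2, float vs1)
    {
        return (double)vs2 + (double)vs1;
    }

    /* vfwadd.wv: vd[i] (2*SEW) = vs2[i] (already 2*SEW) + widen(vs1[i]) */
    static double wadd_wv(double vs2, float vs1)
    {
        return vs2 + (double)vs1;
    }

    int main(void)
    {
        printf("%g %g\n", wadd_vv(1.5f, 2.25f), wadd_wv(3.75, 2.25f));
        return 0;
    }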

* [PATCH v5 31/60] target/riscv: vector single-width floating-point multiply/divide instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 16 +++++++++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  7 ++++
 target/riscv/vector_helper.c            | 48 +++++++++++++++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f242fa4e4b..a2d7ed19a8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -818,3 +818,19 @@ DEF_HELPER_6(vfwadd_wf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwadd_wf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwsub_wf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwsub_wf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ec95541c6..050b2fd467 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -451,6 +451,11 @@ vfwsub_vv       110010 . ..... ..... 001 ..... 1010111 @r_vm
 vfwsub_vf       110010 . ..... ..... 101 ..... 1010111 @r_vm
 vfwsub_wv       110110 . ..... ..... 001 ..... 1010111 @r_vm
 vfwsub_wf       110110 . ..... ..... 101 ..... 1010111 @r_vm
+vfmul_vv        100100 . ..... ..... 001 ..... 1010111 @r_vm
+vfmul_vf        100100 . ..... ..... 101 ..... 1010111 @r_vm
+vfdiv_vv        100000 . ..... ..... 001 ..... 1010111 @r_vm
+vfdiv_vf        100000 . ..... ..... 101 ..... 1010111 @r_vm
+vfrdiv_vf       100001 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index ab04f469af..8dcbff6c64 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1795,3 +1795,10 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
 }
 GEN_OPFWF_WIDEN_TRANS(vfwadd_wf)
 GEN_OPFWF_WIDEN_TRANS(vfwsub_wf)
+
+/* Vector Single-Width Floating-Point Multiply/Divide Instructions */
+GEN_OPFVV_TRANS(vfmul_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfdiv_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfmul_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfdiv_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 0840c5d662..bd7ee4de18 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3106,3 +3106,51 @@ RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, vfwsubw16)
 RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32)
 GEN_VEXT_VF(vfwsub_wf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwsub_wf_w, 4, 8, clearq)
+
+/* Vector Single-Width Floating-Point Multiply/Divide Instructions */
+RVVCALL(OPFVV2, vfmul_vv_h, OP_UUU_H, H2, H2, H2, float16_mul)
+RVVCALL(OPFVV2, vfmul_vv_w, OP_UUU_W, H4, H4, H4, float32_mul)
+RVVCALL(OPFVV2, vfmul_vv_d, OP_UUU_D, H8, H8, H8, float64_mul)
+GEN_VEXT_VV_ENV(vfmul_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmul_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmul_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfmul_vf_h, OP_UUU_H, H2, H2, float16_mul)
+RVVCALL(OPFVF2, vfmul_vf_w, OP_UUU_W, H4, H4, float32_mul)
+RVVCALL(OPFVF2, vfmul_vf_d, OP_UUU_D, H8, H8, float64_mul)
+GEN_VEXT_VF(vfmul_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmul_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmul_vf_d, 8, 8, clearq)
+
+RVVCALL(OPFVV2, vfdiv_vv_h, OP_UUU_H, H2, H2, H2, float16_div)
+RVVCALL(OPFVV2, vfdiv_vv_w, OP_UUU_W, H4, H4, H4, float32_div)
+RVVCALL(OPFVV2, vfdiv_vv_d, OP_UUU_D, H8, H8, H8, float64_div)
+GEN_VEXT_VV_ENV(vfdiv_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfdiv_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfdiv_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfdiv_vf_h, OP_UUU_H, H2, H2, float16_div)
+RVVCALL(OPFVF2, vfdiv_vf_w, OP_UUU_W, H4, H4, float32_div)
+RVVCALL(OPFVF2, vfdiv_vf_d, OP_UUU_D, H8, H8, float64_div)
+GEN_VEXT_VF(vfdiv_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfdiv_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfdiv_vf_d, 8, 8, clearq)
+
+static uint16_t float16_rdiv(uint16_t a, uint16_t b, float_status *s)
+{
+    return float16_div(b, a, s);
+}
+
+static uint32_t float32_rdiv(uint32_t a, uint32_t b, float_status *s)
+{
+    return float32_div(b, a, s);
+}
+
+static uint64_t float64_rdiv(uint64_t a, uint64_t b, float_status *s)
+{
+    return float64_div(b, a, s);
+}
+RVVCALL(OPFVF2, vfrdiv_vf_h, OP_UUU_H, H2, H2, float16_rdiv)
+RVVCALL(OPFVF2, vfrdiv_vf_w, OP_UUU_W, H4, H4, float32_rdiv)
+RVVCALL(OPFVF2, vfrdiv_vf_d, OP_UUU_D, H8, H8, float64_rdiv)
+GEN_VEXT_VF(vfrdiv_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfrdiv_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfrdiv_vf_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
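
A quick reading aid, not part of the patch: vfdiv.vf divides each vector element by the scalar in rs1, while vfrdiv.vf divides the scalar by each element, which is exactly why float16_rdiv() and friends above simply swap their operands before calling the ordinary softfloat divide. A native-float sketch of the two .vf forms:

    /* Illustrative sketch only -- native float in place of softfloat. */
    #include <stdio.h>

    static float fdiv_vf(float vs2_i, float rs1)  { return vs2_i / rs1; }  /* vfdiv.vf  */
    static float frdiv_vf(float vs2_i, float rs1) { return rs1 / vs2_i; }  /* vfrdiv.vf */

    int main(void)
    {
        /* 8.0 / 2.0 = 4.0 for vfdiv.vf, 2.0 / 8.0 = 0.25 for vfrdiv.vf */
        printf("%g %g\n", fdiv_vf(8.0f, 2.0f), frdiv_vf(8.0f, 2.0f));
        return 0;
    }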

* [PATCH v5 32/60] target/riscv: vector widening floating-point multiply
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++++
 target/riscv/insn32.decode              |  2 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  4 ++++
 target/riscv/vector_helper.c            | 22 ++++++++++++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a2d7ed19a8..3ec2dcadd4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -834,3 +834,8 @@ DEF_HELPER_6(vfdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 050b2fd467..e0ee8f5a7c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -456,6 +456,8 @@ vfmul_vf        100100 . ..... ..... 101 ..... 1010111 @r_vm
 vfdiv_vv        100000 . ..... ..... 001 ..... 1010111 @r_vm
 vfdiv_vf        100000 . ..... ..... 101 ..... 1010111 @r_vm
 vfrdiv_vf       100001 . ..... ..... 101 ..... 1010111 @r_vm
+vfwmul_vv       111000 . ..... ..... 001 ..... 1010111 @r_vm
+vfwmul_vf       111000 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 8dcbff6c64..b4d3797685 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1802,3 +1802,7 @@ GEN_OPFVV_TRANS(vfdiv_vv, opfvv_check)
 GEN_OPFVF_TRANS(vfmul_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfdiv_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
+
+/* Vector Widening Floating-Point Multiply */
+GEN_OPFVV_WIDEN_TRANS(vfwmul_vv, opfvv_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwmul_vf)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index bd7ee4de18..8bb6ac158f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3154,3 +3154,25 @@ RVVCALL(OPFVF2, vfrdiv_vf_d, OP_UUU_D, H8, H8, float64_rdiv)
 GEN_VEXT_VF(vfrdiv_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfrdiv_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfrdiv_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Multiply */
+static uint32_t vfwmul16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_mul(float16_to_float32(a, true, s),
+            float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwmul32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_mul(float32_to_float64(a, s),
+            float32_to_float64(b, s), s);
+
+}
+RVVCALL(OPFVV2, vfwmul_vv_h, WOP_UUU_H, H4, H2, H2, vfwmul16)
+RVVCALL(OPFVV2, vfwmul_vv_w, WOP_UUU_W, H8, H4, H4, vfwmul32)
+GEN_VEXT_VV_ENV(vfwmul_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwmul_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16)
+RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32)
+GEN_VEXT_VF(vfwmul_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwmul_vf_w, 4, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
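
One property worth noting (an observation, not something stated in the patch): because vfwmul16/vfwmul32 convert both operands up before multiplying, the wide product is computed without any intermediate rounding -- two 24-bit float32 significands always fit in float64's 53 bits, and two 11-bit float16 significands fit in float32's 24. A single-width multiply followed by a convert would round first, as this native-C sketch shows:

    /* Illustrative sketch only: widening multiply vs. multiply-then-widen. */
    #include <stdio.h>

    int main(void)
    {
        float a = 1.0f + 0x1p-23f;                /* 1 + one ulp at 1.0 */
        float b = 1.0f + 0x1p-23f;
        double widened = (double)a * (double)b;   /* exact: 1 + 2^-22 + 2^-46 */
        double narrow  = (double)(a * b);         /* float multiply rounds the 2^-46 away */
        printf("%.20g\n%.20g\n", widened, narrow);
        return 0;
    }

The two size arguments to GEN_VEXT_VV_ENV (2, 4 for the _h forms, 4, 8 for the _w forms) appear to be the source and destination element sizes in bytes, matching the clearl/clearq tail-clear callbacks for 32- and 64-bit destination elements.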

* [PATCH v5 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  49 +++++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  18 ++
 target/riscv/vector_helper.c            | 228 ++++++++++++++++++++++++
 4 files changed, 311 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3ec2dcadd4..3b6dd96918 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -839,3 +839,52 @@ DEF_HELPER_6(vfwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e0ee8f5a7c..9834091a86 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -458,6 +458,22 @@ vfdiv_vf        100000 . ..... ..... 101 ..... 1010111 @r_vm
 vfrdiv_vf       100001 . ..... ..... 101 ..... 1010111 @r_vm
 vfwmul_vv       111000 . ..... ..... 001 ..... 1010111 @r_vm
 vfwmul_vf       111000 . ..... ..... 101 ..... 1010111 @r_vm
+vfmacc_vv       101100 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmacc_vv      101101 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmacc_vf      101101 . ..... ..... 101 ..... 1010111 @r_vm
+vfmacc_vf       101100 . ..... ..... 101 ..... 1010111 @r_vm
+vfmsac_vv       101110 . ..... ..... 001 ..... 1010111 @r_vm
+vfmsac_vf       101110 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmsac_vv      101111 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmsac_vf      101111 . ..... ..... 101 ..... 1010111 @r_vm
+vfmadd_vv       101000 . ..... ..... 001 ..... 1010111 @r_vm
+vfmadd_vf       101000 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmadd_vv      101001 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmadd_vf      101001 . ..... ..... 101 ..... 1010111 @r_vm
+vfmsub_vv       101010 . ..... ..... 001 ..... 1010111 @r_vm
+vfmsub_vf       101010 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmsub_vv      101011 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmsub_vf      101011 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index b4d3797685..172de867ea 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1806,3 +1806,21 @@ GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
 /* Vector Widening Floating-Point Multiply */
 GEN_OPFVV_WIDEN_TRANS(vfwmul_vv, opfvv_widen_check)
 GEN_OPFVF_WIDEN_TRANS(vfwmul_vf)
+
+/* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */
+GEN_OPFVV_TRANS(vfmacc_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmacc_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmsac_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmsac_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmadd_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmadd_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmsub_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmsub_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfmacc_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmacc_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmsac_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmsac_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmadd_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmadd_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmsub_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmsub_vf, opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8bb6ac158f..2e0341adb0 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3176,3 +3176,231 @@ RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16)
 RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32)
 GEN_VEXT_VF(vfwmul_vf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwmul_vf_w, 4, 8, clearq)
+
+/* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */
+#define OPFVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)       \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i,       \
+        CPURISCVState *env)                                        \
+{                                                                  \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                                \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                \
+    TD d = *((TD *)vd + HD(i));                                    \
+    *((TD *)vd + HD(i)) = OP(s2, s1, d, &env->fp_status);          \
+}
+static uint16_t fmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, 0, s);
+}
+
+static uint32_t fmacc32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, 0, s);
+}
+
+static uint64_t fmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, 0, s);
+}
+RVVCALL(OPFVV3, vfmacc_vv_h, OP_UUU_H, H2, H2, H2, fmacc16)
+RVVCALL(OPFVV3, vfmacc_vv_w, OP_UUU_W, H4, H4, H4, fmacc32)
+RVVCALL(OPFVV3, vfmacc_vv_d, OP_UUU_D, H8, H8, H8, fmacc64)
+GEN_VEXT_VV_ENV(vfmacc_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmacc_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmacc_vv_d, 8, 8, clearq)
+
+#define OPFVF3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)           \
+static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i,    \
+        CPURISCVState *env)                                       \
+{                                                                 \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                               \
+    TD d = *((TD *)vd + HD(i));                                   \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1, d, &env->fp_status);\
+}
+RVVCALL(OPFVF3, vfmacc_vf_h, OP_UUU_H, H2, H2, fmacc16)
+RVVCALL(OPFVF3, vfmacc_vf_w, OP_UUU_W, H4, H4, fmacc32)
+RVVCALL(OPFVF3, vfmacc_vf_d, OP_UUU_D, H8, H8, fmacc64)
+GEN_VEXT_VF(vfmacc_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmacc_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmacc_vf_d, 8, 8, clearq)
+
+static uint16_t fnmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint32_t fnmacc32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint64_t fnmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmacc_vv_h, OP_UUU_H, H2, H2, H2, fnmacc16)
+RVVCALL(OPFVV3, vfnmacc_vv_w, OP_UUU_W, H4, H4, H4, fnmacc32)
+RVVCALL(OPFVV3, vfnmacc_vv_d, OP_UUU_D, H8, H8, H8, fnmacc64)
+GEN_VEXT_VV_ENV(vfnmacc_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmacc_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmacc_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmacc_vf_h, OP_UUU_H, H2, H2, fnmacc16)
+RVVCALL(OPFVF3, vfnmacc_vf_w, OP_UUU_W, H4, H4, fnmacc32)
+RVVCALL(OPFVF3, vfnmacc_vf_d, OP_UUU_D, H8, H8, fnmacc64)
+GEN_VEXT_VF(vfnmacc_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmacc_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmacc_vf_d, 8, 8, clearq)
+
+static uint16_t fmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, float_muladd_negate_c, s);
+}
+static uint32_t fmsac32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, float_muladd_negate_c, s);
+}
+static uint64_t fmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, float_muladd_negate_c, s);
+}
+RVVCALL(OPFVV3, vfmsac_vv_h, OP_UUU_H, H2, H2, H2, fmsac16)
+RVVCALL(OPFVV3, vfmsac_vv_w, OP_UUU_W, H4, H4, H4, fmsac32)
+RVVCALL(OPFVV3, vfmsac_vv_d, OP_UUU_D, H8, H8, H8, fmsac64)
+GEN_VEXT_VV_ENV(vfmsac_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmsac_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmsac_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmsac_vf_h, OP_UUU_H, H2, H2, fmsac16)
+RVVCALL(OPFVF3, vfmsac_vf_w, OP_UUU_W, H4, H4, fmsac32)
+RVVCALL(OPFVF3, vfmsac_vf_d, OP_UUU_D, H8, H8, fmsac64)
+GEN_VEXT_VF(vfmsac_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmsac_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmsac_vf_d, 8, 8, clearq)
+
+static uint16_t fnmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, float_muladd_negate_product, s);
+}
+static uint32_t fnmsac32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, float_muladd_negate_product, s);
+}
+static uint64_t fnmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmsac_vv_h, OP_UUU_H, H2, H2, H2, fnmsac16)
+RVVCALL(OPFVV3, vfnmsac_vv_w, OP_UUU_W, H4, H4, H4, fnmsac32)
+RVVCALL(OPFVV3, vfnmsac_vv_d, OP_UUU_D, H8, H8, H8, fnmsac64)
+GEN_VEXT_VV_ENV(vfnmsac_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmsac_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmsac_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmsac_vf_h, OP_UUU_H, H2, H2, fnmsac16)
+RVVCALL(OPFVF3, vfnmsac_vf_w, OP_UUU_W, H4, H4, fnmsac32)
+RVVCALL(OPFVF3, vfnmsac_vf_d, OP_UUU_D, H8, H8, fnmsac64)
+GEN_VEXT_VF(vfnmsac_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmsac_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmsac_vf_d, 8, 8, clearq)
+
+static uint16_t fmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, 0, s);
+}
+static uint32_t fmadd32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, 0, s);
+}
+static uint64_t fmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, 0, s);
+}
+RVVCALL(OPFVV3, vfmadd_vv_h, OP_UUU_H, H2, H2, H2, fmadd16)
+RVVCALL(OPFVV3, vfmadd_vv_w, OP_UUU_W, H4, H4, H4, fmadd32)
+RVVCALL(OPFVV3, vfmadd_vv_d, OP_UUU_D, H8, H8, H8, fmadd64)
+GEN_VEXT_VV_ENV(vfmadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmadd_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmadd_vf_h, OP_UUU_H, H2, H2, fmadd16)
+RVVCALL(OPFVF3, vfmadd_vf_w, OP_UUU_W, H4, H4, fmadd32)
+RVVCALL(OPFVF3, vfmadd_vf_d, OP_UUU_D, H8, H8, fmadd64)
+GEN_VEXT_VF(vfmadd_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmadd_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmadd_vf_d, 8, 8, clearq)
+
+static uint16_t fnmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint32_t fnmadd32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint64_t fnmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+
+RVVCALL(OPFVV3, vfnmadd_vv_h, OP_UUU_H, H2, H2, H2, fnmadd16)
+RVVCALL(OPFVV3, vfnmadd_vv_w, OP_UUU_W, H4, H4, H4, fnmadd32)
+RVVCALL(OPFVV3, vfnmadd_vv_d, OP_UUU_D, H8, H8, H8, fnmadd64)
+GEN_VEXT_VV_ENV(vfnmadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmadd_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmadd_vf_h, OP_UUU_H, H2, H2, fnmadd16)
+RVVCALL(OPFVF3, vfnmadd_vf_w, OP_UUU_W, H4, H4, fnmadd32)
+RVVCALL(OPFVF3, vfnmadd_vf_d, OP_UUU_D, H8, H8, fnmadd64)
+GEN_VEXT_VF(vfnmadd_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmadd_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmadd_vf_d, 8, 8, clearq)
+
+static uint16_t fmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, float_muladd_negate_c, s);
+}
+static uint32_t fmsub32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, float_muladd_negate_c, s);
+}
+static uint64_t fmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, float_muladd_negate_c, s);
+}
+RVVCALL(OPFVV3, vfmsub_vv_h, OP_UUU_H, H2, H2, H2, fmsub16)
+RVVCALL(OPFVV3, vfmsub_vv_w, OP_UUU_W, H4, H4, H4, fmsub32)
+RVVCALL(OPFVV3, vfmsub_vv_d, OP_UUU_D, H8, H8, H8, fmsub64)
+GEN_VEXT_VV_ENV(vfmsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmsub_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmsub_vf_h, OP_UUU_H, H2, H2, fmsub16)
+RVVCALL(OPFVF3, vfmsub_vf_w, OP_UUU_W, H4, H4, fmsub32)
+RVVCALL(OPFVF3, vfmsub_vf_d, OP_UUU_D, H8, H8, fmsub64)
+GEN_VEXT_VF(vfmsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmsub_vf_d, 8, 8, clearq)
+
+static uint16_t fnmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, float_muladd_negate_product, s);
+}
+static uint32_t fnmsub32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, float_muladd_negate_product, s);
+}
+static uint64_t fnmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmsub_vv_h, OP_UUU_H, H2, H2, H2, fnmsub16)
+RVVCALL(OPFVV3, vfnmsub_vv_w, OP_UUU_W, H4, H4, H4, fnmsub32)
+RVVCALL(OPFVV3, vfnmsub_vv_d, OP_UUU_D, H8, H8, H8, fnmsub64)
+GEN_VEXT_VV_ENV(vfnmsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmsub_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmsub_vf_h, OP_UUU_H, H2, H2, fnmsub16)
+RVVCALL(OPFVF3, vfnmsub_vf_w, OP_UUU_W, H4, H4, fnmsub32)
+RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64)
+GEN_VEXT_VF(vfnmsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmsub_vf_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
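
For readers untangling the eight fused forms above (an aside, not part of the patch): vfmacc computes (vs1*vs2)+vd, vfnmacc computes -(vs1*vs2)-vd, vfmsac computes (vs1*vs2)-vd and vfnmsac computes -(vs1*vs2)+vd; the madd/nmadd/msub/nmsub group applies the same four sign patterns but multiplies vs1 by vd and uses vs2 as the addend. In the helpers this is expressed with float_muladd_negate_product (negate the product) and float_muladd_negate_c (negate the addend). A native-double sketch with fma() standing in for the softfloat muladd calls:

    /* Illustrative sketch only -- link with -lm. */
    #include <stdio.h>
    #include <math.h>

    static double vfmacc(double vs1, double vs2, double vd)  { return fma(vs1, vs2, vd);   } /*  (vs1*vs2) + vd */
    static double vfnmacc(double vs1, double vs2, double vd) { return fma(-vs1, vs2, -vd); } /* -(vs1*vs2) - vd */
    static double vfmsac(double vs1, double vs2, double vd)  { return fma(vs1, vs2, -vd);  } /*  (vs1*vs2) - vd */
    static double vfnmsac(double vs1, double vs2, double vd) { return fma(-vs1, vs2, vd);  } /* -(vs1*vs2) + vd */
    static double vfmadd(double vs1, double vs2, double vd)  { return fma(vs1, vd, vs2);   } /*  (vs1*vd) + vs2 */

    int main(void)
    {
        printf("%g %g %g %g %g\n",
               vfmacc(2, 3, 10), vfnmacc(2, 3, 10), vfmsac(2, 3, 10),
               vfnmsac(2, 3, 10), vfmadd(2, 3, 10));   /* 16 -16 -4 4 23 */
        return 0;
    }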

* [PATCH v5 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  49 +++++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  18 ++
 target/riscv/vector_helper.c            | 228 ++++++++++++++++++++++++
 4 files changed, 311 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3ec2dcadd4..3b6dd96918 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -839,3 +839,52 @@ DEF_HELPER_6(vfwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e0ee8f5a7c..9834091a86 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -458,6 +458,22 @@ vfdiv_vf        100000 . ..... ..... 101 ..... 1010111 @r_vm
 vfrdiv_vf       100001 . ..... ..... 101 ..... 1010111 @r_vm
 vfwmul_vv       111000 . ..... ..... 001 ..... 1010111 @r_vm
 vfwmul_vf       111000 . ..... ..... 101 ..... 1010111 @r_vm
+vfmacc_vv       101100 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmacc_vv      101101 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmacc_vf      101101 . ..... ..... 101 ..... 1010111 @r_vm
+vfmacc_vf       101100 . ..... ..... 101 ..... 1010111 @r_vm
+vfmsac_vv       101110 . ..... ..... 001 ..... 1010111 @r_vm
+vfmsac_vf       101110 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmsac_vv      101111 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmsac_vf      101111 . ..... ..... 101 ..... 1010111 @r_vm
+vfmadd_vv       101000 . ..... ..... 001 ..... 1010111 @r_vm
+vfmadd_vf       101000 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmadd_vv      101001 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmadd_vf      101001 . ..... ..... 101 ..... 1010111 @r_vm
+vfmsub_vv       101010 . ..... ..... 001 ..... 1010111 @r_vm
+vfmsub_vf       101010 . ..... ..... 101 ..... 1010111 @r_vm
+vfnmsub_vv      101011 . ..... ..... 001 ..... 1010111 @r_vm
+vfnmsub_vf      101011 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index b4d3797685..172de867ea 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1806,3 +1806,21 @@ GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
 /* Vector Widening Floating-Point Multiply */
 GEN_OPFVV_WIDEN_TRANS(vfwmul_vv, opfvv_widen_check)
 GEN_OPFVF_WIDEN_TRANS(vfwmul_vf)
+
+/* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */
+GEN_OPFVV_TRANS(vfmacc_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmacc_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmsac_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmsac_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmadd_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmadd_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmsub_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfnmsub_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfmacc_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmacc_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmsac_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmsac_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmadd_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmadd_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmsub_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfnmsub_vf, opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8bb6ac158f..2e0341adb0 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3176,3 +3176,231 @@ RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16)
 RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32)
 GEN_VEXT_VF(vfwmul_vf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwmul_vf_w, 4, 8, clearq)
+
+/* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */
+#define OPFVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)       \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i,       \
+        CPURISCVState *env)                                        \
+{                                                                  \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                                \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                \
+    TD d = *((TD *)vd + HD(i));                                    \
+    *((TD *)vd + HD(i)) = OP(s2, s1, d, &env->fp_status);          \
+}
+static uint16_t fmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, 0, s);
+}
+
+static uint32_t fmacc32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, 0, s);
+}
+
+static uint64_t fmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, 0, s);
+}
+RVVCALL(OPFVV3, vfmacc_vv_h, OP_UUU_H, H2, H2, H2, fmacc16)
+RVVCALL(OPFVV3, vfmacc_vv_w, OP_UUU_W, H4, H4, H4, fmacc32)
+RVVCALL(OPFVV3, vfmacc_vv_d, OP_UUU_D, H8, H8, H8, fmacc64)
+GEN_VEXT_VV_ENV(vfmacc_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmacc_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmacc_vv_d, 8, 8, clearq)
+
+#define OPFVF3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)           \
+static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i,    \
+        CPURISCVState *env)                                       \
+{                                                                 \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                               \
+    TD d = *((TD *)vd + HD(i));                                   \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1, d, &env->fp_status);\
+}
+RVVCALL(OPFVF3, vfmacc_vf_h, OP_UUU_H, H2, H2, fmacc16)
+RVVCALL(OPFVF3, vfmacc_vf_w, OP_UUU_W, H4, H4, fmacc32)
+RVVCALL(OPFVF3, vfmacc_vf_d, OP_UUU_D, H8, H8, fmacc64)
+GEN_VEXT_VF(vfmacc_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmacc_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmacc_vf_d, 8, 8, clearq)
+
+static uint16_t fnmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint32_t fnmacc32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint64_t fnmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmacc_vv_h, OP_UUU_H, H2, H2, H2, fnmacc16)
+RVVCALL(OPFVV3, vfnmacc_vv_w, OP_UUU_W, H4, H4, H4, fnmacc32)
+RVVCALL(OPFVV3, vfnmacc_vv_d, OP_UUU_D, H8, H8, H8, fnmacc64)
+GEN_VEXT_VV_ENV(vfnmacc_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmacc_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmacc_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmacc_vf_h, OP_UUU_H, H2, H2, fnmacc16)
+RVVCALL(OPFVF3, vfnmacc_vf_w, OP_UUU_W, H4, H4, fnmacc32)
+RVVCALL(OPFVF3, vfnmacc_vf_d, OP_UUU_D, H8, H8, fnmacc64)
+GEN_VEXT_VF(vfnmacc_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmacc_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmacc_vf_d, 8, 8, clearq)
+
+static uint16_t fmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, float_muladd_negate_c, s);
+}
+static uint32_t fmsac32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, float_muladd_negate_c, s);
+}
+static uint64_t fmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, float_muladd_negate_c, s);
+}
+RVVCALL(OPFVV3, vfmsac_vv_h, OP_UUU_H, H2, H2, H2, fmsac16)
+RVVCALL(OPFVV3, vfmsac_vv_w, OP_UUU_W, H4, H4, H4, fmsac32)
+RVVCALL(OPFVV3, vfmsac_vv_d, OP_UUU_D, H8, H8, H8, fmsac64)
+GEN_VEXT_VV_ENV(vfmsac_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmsac_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmsac_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmsac_vf_h, OP_UUU_H, H2, H2, fmsac16)
+RVVCALL(OPFVF3, vfmsac_vf_w, OP_UUU_W, H4, H4, fmsac32)
+RVVCALL(OPFVF3, vfmsac_vf_d, OP_UUU_D, H8, H8, fmsac64)
+GEN_VEXT_VF(vfmsac_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmsac_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmsac_vf_d, 8, 8, clearq)
+
+static uint16_t fnmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(a, b, d, float_muladd_negate_product, s);
+}
+static uint32_t fnmsac32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(a, b, d, float_muladd_negate_product, s);
+}
+static uint64_t fnmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(a, b, d, float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmsac_vv_h, OP_UUU_H, H2, H2, H2, fnmsac16)
+RVVCALL(OPFVV3, vfnmsac_vv_w, OP_UUU_W, H4, H4, H4, fnmsac32)
+RVVCALL(OPFVV3, vfnmsac_vv_d, OP_UUU_D, H8, H8, H8, fnmsac64)
+GEN_VEXT_VV_ENV(vfnmsac_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmsac_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmsac_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmsac_vf_h, OP_UUU_H, H2, H2, fnmsac16)
+RVVCALL(OPFVF3, vfnmsac_vf_w, OP_UUU_W, H4, H4, fnmsac32)
+RVVCALL(OPFVF3, vfnmsac_vf_d, OP_UUU_D, H8, H8, fnmsac64)
+GEN_VEXT_VF(vfnmsac_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmsac_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmsac_vf_d, 8, 8, clearq)
+
+static uint16_t fmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, 0, s);
+}
+static uint32_t fmadd32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, 0, s);
+}
+static uint64_t fmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, 0, s);
+}
+RVVCALL(OPFVV3, vfmadd_vv_h, OP_UUU_H, H2, H2, H2, fmadd16)
+RVVCALL(OPFVV3, vfmadd_vv_w, OP_UUU_W, H4, H4, H4, fmadd32)
+RVVCALL(OPFVV3, vfmadd_vv_d, OP_UUU_D, H8, H8, H8, fmadd64)
+GEN_VEXT_VV_ENV(vfmadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmadd_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmadd_vf_h, OP_UUU_H, H2, H2, fmadd16)
+RVVCALL(OPFVF3, vfmadd_vf_w, OP_UUU_W, H4, H4, fmadd32)
+RVVCALL(OPFVF3, vfmadd_vf_d, OP_UUU_D, H8, H8, fmadd64)
+GEN_VEXT_VF(vfmadd_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmadd_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmadd_vf_d, 8, 8, clearq)
+
+static uint16_t fnmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint32_t fnmadd32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint64_t fnmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a,
+            float_muladd_negate_c | float_muladd_negate_product, s);
+}
+
+RVVCALL(OPFVV3, vfnmadd_vv_h, OP_UUU_H, H2, H2, H2, fnmadd16)
+RVVCALL(OPFVV3, vfnmadd_vv_w, OP_UUU_W, H4, H4, H4, fnmadd32)
+RVVCALL(OPFVV3, vfnmadd_vv_d, OP_UUU_D, H8, H8, H8, fnmadd64)
+GEN_VEXT_VV_ENV(vfnmadd_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmadd_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmadd_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmadd_vf_h, OP_UUU_H, H2, H2, fnmadd16)
+RVVCALL(OPFVF3, vfnmadd_vf_w, OP_UUU_W, H4, H4, fnmadd32)
+RVVCALL(OPFVF3, vfnmadd_vf_d, OP_UUU_D, H8, H8, fnmadd64)
+GEN_VEXT_VF(vfnmadd_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmadd_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmadd_vf_d, 8, 8, clearq)
+
+static uint16_t fmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, float_muladd_negate_c, s);
+}
+static uint32_t fmsub32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, float_muladd_negate_c, s);
+}
+static uint64_t fmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, float_muladd_negate_c, s);
+}
+RVVCALL(OPFVV3, vfmsub_vv_h, OP_UUU_H, H2, H2, H2, fmsub16)
+RVVCALL(OPFVV3, vfmsub_vv_w, OP_UUU_W, H4, H4, H4, fmsub32)
+RVVCALL(OPFVV3, vfmsub_vv_d, OP_UUU_D, H8, H8, H8, fmsub64)
+GEN_VEXT_VV_ENV(vfmsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmsub_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfmsub_vf_h, OP_UUU_H, H2, H2, fmsub16)
+RVVCALL(OPFVF3, vfmsub_vf_w, OP_UUU_W, H4, H4, fmsub32)
+RVVCALL(OPFVF3, vfmsub_vf_d, OP_UUU_D, H8, H8, fmsub64)
+GEN_VEXT_VF(vfmsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmsub_vf_d, 8, 8, clearq)
+
+static uint16_t fnmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s)
+{
+    return float16_muladd(d, b, a, float_muladd_negate_product, s);
+}
+static uint32_t fnmsub32(uint32_t a, uint32_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(d, b, a, float_muladd_negate_product, s);
+}
+static uint64_t fnmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(d, b, a, float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfnmsub_vv_h, OP_UUU_H, H2, H2, H2, fnmsub16)
+RVVCALL(OPFVV3, vfnmsub_vv_w, OP_UUU_W, H4, H4, H4, fnmsub32)
+RVVCALL(OPFVV3, vfnmsub_vv_d, OP_UUU_D, H8, H8, H8, fnmsub64)
+GEN_VEXT_VV_ENV(vfnmsub_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfnmsub_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfnmsub_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF3, vfnmsub_vf_h, OP_UUU_H, H2, H2, fnmsub16)
+RVVCALL(OPFVF3, vfnmsub_vf_w, OP_UUU_W, H4, H4, fnmsub32)
+RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64)
+GEN_VEXT_VF(vfnmsub_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfnmsub_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfnmsub_vf_d, 8, 8, clearq)
-- 
2.23.0




* [PATCH v5 34/60] target/riscv: vector widening floating-point fused multiply-add instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 17 +++++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
 target/riscv/vector_helper.c            | 84 +++++++++++++++++++++++++
 4 files changed, 119 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3b6dd96918..57e0fee929 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -888,3 +888,20 @@ DEF_HELPER_6(vfmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9834091a86..b7cb116cf4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -474,6 +474,14 @@ vfmsub_vv       101010 . ..... ..... 001 ..... 1010111 @r_vm
 vfmsub_vf       101010 . ..... ..... 101 ..... 1010111 @r_vm
 vfnmsub_vv      101011 . ..... ..... 001 ..... 1010111 @r_vm
 vfnmsub_vf      101011 . ..... ..... 101 ..... 1010111 @r_vm
+vfwmacc_vv      111100 . ..... ..... 001 ..... 1010111 @r_vm
+vfwmacc_vf      111100 . ..... ..... 101 ..... 1010111 @r_vm
+vfwnmacc_vv     111101 . ..... ..... 001 ..... 1010111 @r_vm
+vfwnmacc_vf     111101 . ..... ..... 101 ..... 1010111 @r_vm
+vfwmsac_vv      111110 . ..... ..... 001 ..... 1010111 @r_vm
+vfwmsac_vf      111110 . ..... ..... 101 ..... 1010111 @r_vm
+vfwnmsac_vv     111111 . ..... ..... 001 ..... 1010111 @r_vm
+vfwnmsac_vf     111111 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 172de867ea..06d6e2625b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1824,3 +1824,13 @@ GEN_OPFVF_TRANS(vfmadd_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfnmadd_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfmsub_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfnmsub_vf, opfvf_check)
+
+/* Vector Widening Floating-Point Fused Multiply-Add Instructions */
+GEN_OPFVV_WIDEN_TRANS(vfwmacc_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwnmacc_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwmsac_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwnmsac_vv, opfvv_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2e0341adb0..9bff516a15 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3404,3 +3404,87 @@ RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64)
 GEN_VEXT_VF(vfnmsub_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfnmsub_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfnmsub_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Fused Multiply-Add Instructions */
+static uint32_t fwmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(float16_to_float32(a, true, s),
+                        float16_to_float32(b, true, s), d, 0, s);
+}
+static uint64_t fwmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(float32_to_float64(a, s),
+                        float32_to_float64(b, s), d, 0, s);
+}
+RVVCALL(OPFVV3, vfwmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwmacc16)
+RVVCALL(OPFVV3, vfwmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwmacc32)
+GEN_VEXT_VV_ENV(vfwmacc_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwmacc_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF3, vfwmacc_vf_h, WOP_UUU_H, H4, H2, fwmacc16)
+RVVCALL(OPFVF3, vfwmacc_vf_w, WOP_UUU_W, H8, H4, fwmacc32)
+GEN_VEXT_VF(vfwmacc_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwmacc_vf_w, 4, 8, clearq)
+
+static uint32_t fwnmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(float16_to_float32(a, true, s),
+                        float16_to_float32(b, true, s), d,
+                        float_muladd_negate_c | float_muladd_negate_product, s);
+}
+static uint64_t fwnmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(float32_to_float64(a, s),
+                        float32_to_float64(b, s), d,
+                        float_muladd_negate_c | float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfwnmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwnmacc16)
+RVVCALL(OPFVV3, vfwnmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwnmacc32)
+GEN_VEXT_VV_ENV(vfwnmacc_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwnmacc_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF3, vfwnmacc_vf_h, WOP_UUU_H, H4, H2, fwnmacc16)
+RVVCALL(OPFVF3, vfwnmacc_vf_w, WOP_UUU_W, H8, H4, fwnmacc32)
+GEN_VEXT_VF(vfwnmacc_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwnmacc_vf_w, 4, 8, clearq)
+
+static uint32_t fwmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(float16_to_float32(a, true, s),
+                        float16_to_float32(b, true, s), d,
+                        float_muladd_negate_c, s);
+}
+static uint64_t fwmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(float32_to_float64(a, s),
+                        float32_to_float64(b, s), d,
+                        float_muladd_negate_c, s);
+}
+
+RVVCALL(OPFVV3, vfwmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwmsac16)
+RVVCALL(OPFVV3, vfwmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwmsac32)
+GEN_VEXT_VV_ENV(vfwmsac_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwmsac_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF3, vfwmsac_vf_h, WOP_UUU_H, H4, H2, fwmsac16)
+RVVCALL(OPFVF3, vfwmsac_vf_w, WOP_UUU_W, H8, H4, fwmsac32)
+GEN_VEXT_VF(vfwmsac_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwmsac_vf_w, 4, 8, clearq)
+
+static uint32_t fwnmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s)
+{
+    return float32_muladd(float16_to_float32(a, true, s),
+                        float16_to_float32(b, true, s), d,
+                        float_muladd_negate_product, s);
+}
+static uint64_t fwnmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s)
+{
+    return float64_muladd(float32_to_float64(a, s),
+                        float32_to_float64(b, s), d,
+                        float_muladd_negate_product, s);
+}
+RVVCALL(OPFVV3, vfwnmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwnmsac16)
+RVVCALL(OPFVV3, vfwnmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwnmsac32)
+GEN_VEXT_VV_ENV(vfwnmsac_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwnmsac_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF3, vfwnmsac_vf_h, WOP_UUU_H, H4, H2, fwnmsac16)
+RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32)
+GEN_VEXT_VF(vfwnmsac_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwnmsac_vf_w, 4, 8, clearq)
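+
+[Note: in the widening forms the two SEW-wide sources are first promoted to
+2*SEW and the accumulator vd is read and written at the wider width, so the
+whole multiply-add is performed in the destination format with a single
+rounding.  A per-element sketch for SEW=16 (illustrative only, not part of
+the patch; masking, tail handling and the generator macros are omitted, the
+function name is made up):
+
+    /* sketch: one element of vfwmacc.vv at SEW=16 -> 32-bit result */
+    uint32_t vfwmacc16_elem(uint16_t vs1, uint16_t vs2, uint32_t old_vd,
+                            float_status *s)
+    {
+        float32 f1 = float16_to_float32(vs1, true, s);   /* promote sources */
+        float32 f2 = float16_to_float32(vs2, true, s);
+        return float32_muladd(f2, f1, old_vd, 0, s);     /* one fused op */
+    }
+]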
-- 
2.23.0




* [PATCH v5 35/60] target/riscv: vector floating-point square-root instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  4 +++
 target/riscv/insn32.decode              |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 37 +++++++++++++++++++++++
 target/riscv/vector_helper.c            | 40 +++++++++++++++++++++++++
 4 files changed, 84 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 57e0fee929..c2f9871490 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -905,3 +905,7 @@ DEF_HELPER_6(vfwmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_5(vfsqrt_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfsqrt_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfsqrt_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7cb116cf4..fc9aebc6d6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -45,6 +45,7 @@
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
 &rmrr      vm rd rs1 rs2
+&rmr       vm rd rs2
 &rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
@@ -68,6 +69,7 @@
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
+@r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
 @r_vm    ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd
 @r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
@@ -482,6 +484,7 @@ vfwmsac_vv      111110 . ..... ..... 001 ..... 1010111 @r_vm
 vfwmsac_vf      111110 . ..... ..... 101 ..... 1010111 @r_vm
 vfwnmsac_vv     111111 . ..... ..... 001 ..... 1010111 @r_vm
 vfwnmsac_vf     111111 . ..... ..... 101 ..... 1010111 @r_vm
+vfsqrt_v        100011 . ..... 00000 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 06d6e2625b..3e4f7de240 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1834,3 +1834,40 @@ GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf)
+
+/* Vector Floating-Point Square-Root Instruction */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised
+ */
+static bool opfv_check(DisasContext *s, arg_rmr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (s->sew != 0));
+}
+
+#define GEN_OPFV_TRANS(NAME, CHECK)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
+{                                                                  \
+    if (CHECK(s, a)) {                                             \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_3_ptr * const fns[3] = {            \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w,                                 \
+            gen_helper_##NAME##_d,                                 \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs2), cpu_env, 0,                       \
+            s->vlen / 8, data, fns[s->sew - 1]);                   \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFV_TRANS(vfsqrt_v, opfv_check)
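+
+[Note: the "s->sew != 0" test and the fns[s->sew - 1] lookup above rely on
+the encoded SEW field being log2(SEW / 8): value 0 selects 8-bit elements,
+for which there is no IEEE format, so opfv_check rejects it, while values
+1/2/3 select the 16/32/64-bit helpers.  A small sketch of that mapping
+(illustrative, not part of the patch; the function name is made up):
+
+    /* sketch: SEW in bits from the encoded vsew field used above */
+    static inline int sew_bits(int vsew)
+    {
+        return 8 << vsew;   /* 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64 */
+    }
+]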
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 9bff516a15..088bb51af0 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3488,3 +3488,43 @@ RVVCALL(OPFVF3, vfwnmsac_vf_h, WOP_UUU_H, H4, H2, fwnmsac16)
 RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32)
 GEN_VEXT_VF(vfwnmsac_vf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwnmsac_vf_w, 4, 8, clearq)
+
+/* Vector Floating-Point Square-Root Instruction */
+/* (TD, T2, TX2) */
+#define OP_UU_H uint16_t, uint16_t, uint16_t
+#define OP_UU_W uint32_t, uint32_t, uint32_t
+#define OP_UU_D uint64_t, uint64_t, uint64_t
+
+#define OPFVV1(NAME, TD, T2, TX2, HD, HS2, OP)        \
+static void do_##NAME(void *vd, void *vs2, int i,      \
+        CPURISCVState *env)                            \
+{                                                      \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                    \
+    *((TD *)vd + HD(i)) = OP(s2, &env->fp_status);     \
+}
+
+#define GEN_VEXT_V_ENV(NAME, ESZ, DSZ, CLEAR_FN)       \
+void HELPER(NAME)(void *vd, void *v0, void *vs2,       \
+        CPURISCVState *env, uint32_t desc)             \
+{                                                      \
+    uint32_t vlmax = vext_maxsz(desc) / ESZ;           \
+    uint32_t mlen = vext_mlen(desc);                   \
+    uint32_t vm = vext_vm(desc);                       \
+    uint32_t vl = env->vl;                             \
+    uint32_t i;                                        \
+    for (i = 0; i < vl; i++) {                         \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {     \
+            continue;                                  \
+        }                                              \
+        do_##NAME(vd, vs2, i, env);                    \
+    }                                                  \
+    if (i != 0) {                                      \
+        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);      \
+    }                                                  \
+}
+RVVCALL(OPFVV1, vfsqrt_v_h, OP_UU_H, H2, H2, float16_sqrt)
+RVVCALL(OPFVV1, vfsqrt_v_w, OP_UU_W, H4, H4, float32_sqrt)
+RVVCALL(OPFVV1, vfsqrt_v_d, OP_UU_D, H8, H8, float64_sqrt)
+GEN_VEXT_V_ENV(vfsqrt_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfsqrt_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfsqrt_v_d, 8, 8, clearq)
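+
+[Note: GEN_VEXT_V_ENV generates one helper per element width.  Each
+iteration skips elements whose mask register bit is clear (unless the
+instruction is unmasked, vm=1) and, once the vl active elements have been
+processed, CLEAR_FN zeroes the tail elements from vl up to VLMAX, as the
+v0.7.1 tail-zeroing rule requires.  For reference,
+GEN_VEXT_V_ENV(vfsqrt_v_h, 2, 2, clearh) expands to roughly:
+
+    void HELPER(vfsqrt_v_h)(void *vd, void *v0, void *vs2,
+                            CPURISCVState *env, uint32_t desc)
+    {
+        uint32_t vlmax = vext_maxsz(desc) / 2;   /* max elements in the group */
+        uint32_t mlen = vext_mlen(desc);
+        uint32_t vm = vext_vm(desc);
+        uint32_t vl = env->vl;
+        uint32_t i;
+
+        for (i = 0; i < vl; i++) {
+            if (!vm && !vext_elem_mask(v0, mlen, i)) {
+                continue;                        /* masked-off: vd[i] unchanged */
+            }
+            do_vfsqrt_v_h(vd, vs2, i, env);      /* vd[i] = sqrt(vs2[i]) */
+        }
+        if (i != 0) {                            /* i.e. vl != 0 */
+            clearh(vd, vl, vl * 2, vlmax * 2);   /* zero the tail elements */
+        }
+    }
+]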
-- 
2.23.0




* [PATCH v5 36/60] target/riscv: vector floating-point min/max instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 13 ++++++++++++
 target/riscv/insn32.decode              |  4 ++++
 target/riscv/insn_trans/trans_rvv.inc.c |  6 ++++++
 target/riscv/vector_helper.c            | 27 +++++++++++++++++++++++++
 4 files changed, 50 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c2f9871490..9d1f443d02 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -909,3 +909,16 @@ DEF_HELPER_6(vfwnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_5(vfsqrt_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfsqrt_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfsqrt_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vfmin_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmin_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmin_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmax_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmax_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmax_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmin_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmin_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmin_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmax_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmax_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmax_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fc9aebc6d6..e42cbcebc0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -485,6 +485,10 @@ vfwmsac_vf      111110 . ..... ..... 101 ..... 1010111 @r_vm
 vfwnmsac_vv     111111 . ..... ..... 001 ..... 1010111 @r_vm
 vfwnmsac_vf     111111 . ..... ..... 101 ..... 1010111 @r_vm
 vfsqrt_v        100011 . ..... 00000 001 ..... 1010111 @r2_vm
+vfmin_vv        000100 . ..... ..... 001 ..... 1010111 @r_vm
+vfmin_vf        000100 . ..... ..... 101 ..... 1010111 @r_vm
+vfmax_vv        000110 . ..... ..... 001 ..... 1010111 @r_vm
+vfmax_vf        000110 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3e4f7de240..7e7e59c0d6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1871,3 +1871,9 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
     return false;                                                  \
 }
 GEN_OPFV_TRANS(vfsqrt_v, opfv_check)
+
+/* Vector Floating-Point MIN/MAX Instructions */
+GEN_OPFVV_TRANS(vfmin_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfmax_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfmin_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfmax_vf, opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 088bb51af0..3bcfce10aa 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3528,3 +3528,30 @@ RVVCALL(OPFVV1, vfsqrt_v_d, OP_UU_D, H8, H8, float64_sqrt)
 GEN_VEXT_V_ENV(vfsqrt_v_h, 2, 2, clearh)
 GEN_VEXT_V_ENV(vfsqrt_v_w, 4, 4, clearl)
 GEN_VEXT_V_ENV(vfsqrt_v_d, 8, 8, clearq)
+
+/* Vector Floating-Point MIN/MAX Instructions */
+RVVCALL(OPFVV2, vfmin_vv_h, OP_UUU_H, H2, H2, H2, float16_minnum)
+RVVCALL(OPFVV2, vfmin_vv_w, OP_UUU_W, H4, H4, H4, float32_minnum)
+RVVCALL(OPFVV2, vfmin_vv_d, OP_UUU_D, H8, H8, H8, float64_minnum)
+GEN_VEXT_VV_ENV(vfmin_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmin_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmin_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfmin_vf_h, OP_UUU_H, H2, H2, float16_minnum)
+RVVCALL(OPFVF2, vfmin_vf_w, OP_UUU_W, H4, H4, float32_minnum)
+RVVCALL(OPFVF2, vfmin_vf_d, OP_UUU_D, H8, H8, float64_minnum)
+GEN_VEXT_VF(vfmin_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmin_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmin_vf_d, 8, 8, clearq)
+
+RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_maxnum)
+RVVCALL(OPFVV2, vfmax_vv_w, OP_UUU_W, H4, H4, H4, float32_maxnum)
+RVVCALL(OPFVV2, vfmax_vv_d, OP_UUU_D, H8, H8, H8, float64_maxnum)
+GEN_VEXT_VV_ENV(vfmax_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmax_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmax_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfmax_vf_h, OP_UUU_H, H2, H2, float16_maxnum)
+RVVCALL(OPFVF2, vfmax_vf_w, OP_UUU_W, H4, H4, float32_maxnum)
+RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_maxnum)
+GEN_VEXT_VF(vfmax_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmax_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmax_vf_d, 8, 8, clearq)
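+
+[Note: vfmin/vfmax follow the scalar fmin/fmax rules, which the softfloat
+minnum/maxnum helpers implement: when exactly one operand is a quiet NaN the
+numeric operand is returned, and only when both operands are NaN is a NaN
+produced.  A small worked illustration (not part of the patch; values
+written as ordinary literals for readability):
+
+    /* illustrative behaviour of the *_minnum / *_maxnum calls above */
+    /*   maxnum(3.0, qNaN)  == 3.0    one quiet NaN: the number wins  */
+    /*   minnum(qNaN, 3.0)  == 3.0                                    */
+    /*   maxnum(qNaN, qNaN) == NaN    both NaN: result is NaN         */
+]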
-- 
2.23.0




* [PATCH v5 37/60] target/riscv: vector floating-point sign-injection instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 19 +++++++
 target/riscv/insn32.decode              |  6 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  8 +++
 target/riscv/vector_helper.c            | 76 +++++++++++++++++++++++++
 4 files changed, 109 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9d1f443d02..efbd5d306d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -922,3 +922,22 @@ DEF_HELPER_6(vfmin_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmax_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmax_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmax_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfsgnj_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnj_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnj_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsgnj_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnj_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnj_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjn_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsgnjx_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e42cbcebc0..d149eccd5c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -489,6 +489,12 @@ vfmin_vv        000100 . ..... ..... 001 ..... 1010111 @r_vm
 vfmin_vf        000100 . ..... ..... 101 ..... 1010111 @r_vm
 vfmax_vv        000110 . ..... ..... 001 ..... 1010111 @r_vm
 vfmax_vf        000110 . ..... ..... 101 ..... 1010111 @r_vm
+vfsgnj_vv       001000 . ..... ..... 001 ..... 1010111 @r_vm
+vfsgnj_vf       001000 . ..... ..... 101 ..... 1010111 @r_vm
+vfsgnjn_vv      001001 . ..... ..... 001 ..... 1010111 @r_vm
+vfsgnjn_vf      001001 . ..... ..... 101 ..... 1010111 @r_vm
+vfsgnjx_vv      001010 . ..... ..... 001 ..... 1010111 @r_vm
+vfsgnjx_vf      001010 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7e7e59c0d6..ff4ebd3f8c 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1877,3 +1877,11 @@ GEN_OPFVV_TRANS(vfmin_vv, opfvv_check)
 GEN_OPFVV_TRANS(vfmax_vv, opfvv_check)
 GEN_OPFVF_TRANS(vfmin_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfmax_vf, opfvf_check)
+
+/* Vector Floating-Point Sign-Injection Instructions */
+GEN_OPFVV_TRANS(vfsgnj_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfsgnjn_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfsgnjx_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfsgnj_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfsgnjn_vf, opfvf_check)
+GEN_OPFVF_TRANS(vfsgnjx_vf, opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3bcfce10aa..79e261ff1a 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3555,3 +3555,79 @@ RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_max)
 GEN_VEXT_VF(vfmax_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfmax_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfmax_vf_d, 8, 8, clearq)
+
+/* Vector Floating-Point Sign-Injection Instructions */
+static uint16_t fsgnj16(uint16_t a, uint16_t b, float_status *s)
+{
+    return deposit64(b, 0, 15, a);
+}
+static uint32_t fsgnj32(uint32_t a, uint32_t b, float_status *s)
+{
+    return deposit64(b, 0, 31, a);
+}
+static uint64_t fsgnj64(uint64_t a, uint64_t b, float_status *s)
+{
+    return deposit64(b, 0, 63, a);
+}
+RVVCALL(OPFVV2, vfsgnj_vv_h, OP_UUU_H, H2, H2, H2, fsgnj16)
+RVVCALL(OPFVV2, vfsgnj_vv_w, OP_UUU_W, H4, H4, H4, fsgnj32)
+RVVCALL(OPFVV2, vfsgnj_vv_d, OP_UUU_D, H8, H8, H8, fsgnj64)
+GEN_VEXT_VV_ENV(vfsgnj_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfsgnj_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfsgnj_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfsgnj_vf_h, OP_UUU_H, H2, H2, fsgnj16)
+RVVCALL(OPFVF2, vfsgnj_vf_w, OP_UUU_W, H4, H4, fsgnj32)
+RVVCALL(OPFVF2, vfsgnj_vf_d, OP_UUU_D, H8, H8, fsgnj64)
+GEN_VEXT_VF(vfsgnj_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfsgnj_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfsgnj_vf_d, 8, 8, clearq)
+
+static uint16_t fsgnjn16(uint16_t a, uint16_t b, float_status *s)
+{
+    return deposit64(~b, 0, 15, a);
+}
+static uint32_t fsgnjn32(uint32_t a, uint32_t b, float_status *s)
+{
+    return deposit64(~b, 0, 31, a);
+}
+static uint64_t fsgnjn64(uint64_t a, uint64_t b, float_status *s)
+{
+    return deposit64(~b, 0, 63, a);
+}
+RVVCALL(OPFVV2, vfsgnjn_vv_h, OP_UUU_H, H2, H2, H2, fsgnjn16)
+RVVCALL(OPFVV2, vfsgnjn_vv_w, OP_UUU_W, H4, H4, H4, fsgnjn32)
+RVVCALL(OPFVV2, vfsgnjn_vv_d, OP_UUU_D, H8, H8, H8, fsgnjn64)
+GEN_VEXT_VV_ENV(vfsgnjn_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfsgnjn_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfsgnjn_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfsgnjn_vf_h, OP_UUU_H, H2, H2, fsgnjn16)
+RVVCALL(OPFVF2, vfsgnjn_vf_w, OP_UUU_W, H4, H4, fsgnjn32)
+RVVCALL(OPFVF2, vfsgnjn_vf_d, OP_UUU_D, H8, H8, fsgnjn64)
+GEN_VEXT_VF(vfsgnjn_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfsgnjn_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfsgnjn_vf_d, 8, 8, clearq)
+
+static uint16_t fsgnjx16(uint16_t a, uint16_t b, float_status *s)
+{
+    return deposit64(b ^ a, 0, 15, a);
+}
+static uint32_t fsgnjx32(uint32_t a, uint32_t b, float_status *s)
+{
+    return deposit64(b ^ a, 0, 31, a);
+}
+static uint64_t fsgnjx64(uint64_t a, uint64_t b, float_status *s)
+{
+    return deposit64(b ^ a, 0, 63, a);
+}
+RVVCALL(OPFVV2, vfsgnjx_vv_h, OP_UUU_H, H2, H2, H2, fsgnjx16)
+RVVCALL(OPFVV2, vfsgnjx_vv_w, OP_UUU_W, H4, H4, H4, fsgnjx32)
+RVVCALL(OPFVV2, vfsgnjx_vv_d, OP_UUU_D, H8, H8, H8, fsgnjx64)
+GEN_VEXT_VV_ENV(vfsgnjx_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfsgnjx_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfsgnjx_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfsgnjx_vf_h, OP_UUU_H, H2, H2, fsgnjx16)
+RVVCALL(OPFVF2, vfsgnjx_vf_w, OP_UUU_W, H4, H4, fsgnjx32)
+RVVCALL(OPFVF2, vfsgnjx_vf_d, OP_UUU_D, H8, H8, fsgnjx64)
+GEN_VEXT_VF(vfsgnjx_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfsgnjx_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfsgnjx_vf_d, 8, 8, clearq)
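+
+[Note: the sign-injection helpers are pure bit manipulation:
+deposit64(x, 0, SEW - 1, a) keeps the exponent and mantissa bits of the
+vector element a and takes the sign from x, where x is b for vfsgnj, ~b for
+vfsgnjn and a ^ b for vfsgnjx, so no floating-point flags are raised.
+Bit-mask equivalents for the 16-bit case (illustrative only, not part of the
+patch; the ref_* names are made up):
+
+    /* sketch: bit-level equivalents of fsgnj16 / fsgnjn16 / fsgnjx16 */
+    uint16_t ref_fsgnj16(uint16_t a, uint16_t b)  { return (a & 0x7fff) | ( b & 0x8000); }
+    uint16_t ref_fsgnjn16(uint16_t a, uint16_t b) { return (a & 0x7fff) | (~b & 0x8000); }
+    uint16_t ref_fsgnjx16(uint16_t a, uint16_t b) { return a ^ (b & 0x8000); }
+]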
-- 
2.23.0




* [PATCH v5 38/60] target/riscv: vector floating-point compare instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  37 ++++
 target/riscv/insn32.decode              |  12 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  33 ++++
 target/riscv/vector_helper.c            | 221 ++++++++++++++++++++++++
 4 files changed, 303 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index efbd5d306d..323bed038e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -941,3 +941,40 @@ DEF_HELPER_6(vfsgnjn_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfsgnjx_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfsgnjx_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfsgnjx_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vmfeq_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfeq_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfeq_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfne_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfne_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfne_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmflt_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmflt_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmflt_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfle_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfle_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfle_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmfeq_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfeq_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfeq_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfne_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfne_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfne_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmflt_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmflt_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmflt_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfle_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfle_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfle_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfgt_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfgt_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfgt_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfge_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfge_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmfge_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmford_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmford_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmford_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmford_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmford_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vmford_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d149eccd5c..2d61256981 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -495,6 +495,18 @@ vfsgnjn_vv      001001 . ..... ..... 001 ..... 1010111 @r_vm
 vfsgnjn_vf      001001 . ..... ..... 101 ..... 1010111 @r_vm
 vfsgnjx_vv      001010 . ..... ..... 001 ..... 1010111 @r_vm
 vfsgnjx_vf      001010 . ..... ..... 101 ..... 1010111 @r_vm
+vmfeq_vv        011000 . ..... ..... 001 ..... 1010111 @r_vm
+vmfeq_vf        011000 . ..... ..... 101 ..... 1010111 @r_vm
+vmfne_vv        011100 . ..... ..... 001 ..... 1010111 @r_vm
+vmfne_vf        011100 . ..... ..... 101 ..... 1010111 @r_vm
+vmflt_vv        011011 . ..... ..... 001 ..... 1010111 @r_vm
+vmflt_vf        011011 . ..... ..... 101 ..... 1010111 @r_vm
+vmfle_vv        011001 . ..... ..... 001 ..... 1010111 @r_vm
+vmfle_vf        011001 . ..... ..... 101 ..... 1010111 @r_vm
+vmfgt_vf        011101 . ..... ..... 101 ..... 1010111 @r_vm
+vmfge_vf        011111 . ..... ..... 101 ..... 1010111 @r_vm
+vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
+vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index ff4ebd3f8c..9d9653e605 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1885,3 +1885,36 @@ GEN_OPFVV_TRANS(vfsgnjx_vv, opfvv_check)
 GEN_OPFVF_TRANS(vfsgnj_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfsgnjn_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfsgnjx_vf, opfvf_check)
+
+/* Vector Floating-Point Compare Instructions */
+static bool opfvv_cmp_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            (s->sew != 0) &&
+            ((vext_check_overlap_group(a->rd, 1, a->rs1, 1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul)) ||
+            (s->lmul == 0)));
+}
+GEN_OPFVV_TRANS(vmfeq_vv, opfvv_cmp_check)
+GEN_OPFVV_TRANS(vmfne_vv, opfvv_cmp_check)
+GEN_OPFVV_TRANS(vmflt_vv, opfvv_cmp_check)
+GEN_OPFVV_TRANS(vmfle_vv, opfvv_cmp_check)
+GEN_OPFVV_TRANS(vmford_vv, opfvv_cmp_check)
+
+static bool opfvf_cmp_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (s->sew != 0) &&
+            (vext_check_overlap_group(a->rd, 1, a->rs2, 1 << s->lmul) ||
+            (s->lmul == 0)));
+}
+GEN_OPFVF_TRANS(vmfeq_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmfne_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmflt_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmfle_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmfgt_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmfge_vf, opfvf_cmp_check)
+GEN_OPFVF_TRANS(vmford_vf, opfvf_cmp_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 79e261ff1a..dd44cc57a3 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3631,3 +3631,224 @@ RVVCALL(OPFVF2, vfsgnjx_vf_d, OP_UUU_D, H8, H8, fsgnjx64)
 GEN_VEXT_VF(vfsgnjx_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfsgnjx_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfsgnjx_vf_d, 8, 8, clearq)
+
+/* Vector Floating-Point Compare Instructions */
+#define GEN_VEXT_CMP_VV_ENV(NAME, ETYPE, H, DO_OP)            \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,   \
+        CPURISCVState *env, uint32_t desc)                    \
+{                                                             \
+    uint32_t mlen = vext_mlen(desc);                          \
+    uint32_t vm = vext_vm(desc);                              \
+    uint32_t vl = env->vl;                                    \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);        \
+    uint32_t i;                                               \
+                                                              \
+    for (i = 0; i < vl; i++) {                                \
+        ETYPE s1 = *((ETYPE *)vs1 + H(i));                    \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                    \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {            \
+            continue;                                         \
+        }                                                     \
+        vext_set_elem_mask(vd, mlen, i, DO_OP(s2, s1,         \
+            &env->fp_status));                                \
+    }                                                         \
+    if (i == 0) {                                             \
+        return;                                               \
+    }                                                         \
+    for (; i < vlmax; i++) {                                  \
+        vext_set_elem_mask(vd, mlen, i, 0);                   \
+    }                                                         \
+}
+
+static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare_quiet(a, b, s);
+    if (compare == float_relation_equal) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VV_ENV(vmfeq_vv_h, uint16_t, H2, float16_eq_quiet)
+GEN_VEXT_CMP_VV_ENV(vmfeq_vv_w, uint32_t, H4, float32_eq_quiet)
+GEN_VEXT_CMP_VV_ENV(vmfeq_vv_d, uint64_t, H8, float64_eq_quiet)
+
+#define GEN_VEXT_CMP_VF(NAME, ETYPE, H, DO_OP)                      \
+void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2,       \
+        CPURISCVState *env, uint32_t desc)                          \
+{                                                                   \
+    uint32_t mlen = vext_mlen(desc);                                \
+    uint32_t vm = vext_vm(desc);                                    \
+    uint32_t vl = env->vl;                                          \
+    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);              \
+    uint32_t i;                                                     \
+                                                                    \
+    for (i = 0; i < vl; i++) {                                      \
+        ETYPE s2 = *((ETYPE *)vs2 + H(i));                          \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                  \
+            continue;                                               \
+        }                                                           \
+        vext_set_elem_mask(vd, mlen, i, DO_OP(s2, (ETYPE)s1,        \
+                &env->fp_status));                                  \
+    }                                                               \
+    if (i == 0) {                                                   \
+        return;                                                     \
+    }                                                               \
+    for (; i < vlmax; i++) {                                        \
+        vext_set_elem_mask(vd, mlen, i, 0);                         \
+    }                                                               \
+}
+GEN_VEXT_CMP_VF(vmfeq_vf_h, uint16_t, H2, float16_eq_quiet)
+GEN_VEXT_CMP_VF(vmfeq_vf_w, uint32_t, H4, float32_eq_quiet)
+GEN_VEXT_CMP_VF(vmfeq_vf_d, uint64_t, H8, float64_eq_quiet)
+
+static uint8_t vmfne16(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare_quiet(a, b, s);
+    if (compare != float_relation_equal &&
+            compare != float_relation_unordered) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfne32(uint32_t a, uint32_t b, float_status *s)
+{
+    int compare = float32_compare_quiet(a, b, s);
+    if (compare != float_relation_equal &&
+            compare != float_relation_unordered) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfne64(uint64_t a, uint64_t b, float_status *s)
+{
+    int compare = float64_compare_quiet(a, b, s);
+    if (compare != float_relation_equal &&
+            compare != float_relation_unordered) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VV_ENV(vmfne_vv_h, uint16_t, H2, vmfne16)
+GEN_VEXT_CMP_VV_ENV(vmfne_vv_w, uint32_t, H4, vmfne32)
+GEN_VEXT_CMP_VV_ENV(vmfne_vv_d, uint64_t, H8, vmfne64)
+GEN_VEXT_CMP_VF(vmfne_vf_h, uint16_t, H2, vmfne16)
+GEN_VEXT_CMP_VF(vmfne_vf_w, uint32_t, H4, vmfne32)
+GEN_VEXT_CMP_VF(vmfne_vf_d, uint64_t, H8, vmfne64)
+
+static uint8_t float16_lt(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare(a, b, s);
+    if (compare == float_relation_less) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VV_ENV(vmflt_vv_h, uint16_t, H2, float16_lt)
+GEN_VEXT_CMP_VV_ENV(vmflt_vv_w, uint32_t, H4, float32_lt)
+GEN_VEXT_CMP_VV_ENV(vmflt_vv_d, uint64_t, H8, float64_lt)
+GEN_VEXT_CMP_VF(vmflt_vf_h, uint16_t, H2, float16_lt)
+GEN_VEXT_CMP_VF(vmflt_vf_w, uint32_t, H4, float32_lt)
+GEN_VEXT_CMP_VF(vmflt_vf_d, uint64_t, H8, float64_lt)
+
+static uint8_t float16_le(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare(a, b, s);
+    if (compare == float_relation_less ||
+            compare == float_relation_equal) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VV_ENV(vmfle_vv_h, uint16_t, H2, float16_le)
+GEN_VEXT_CMP_VV_ENV(vmfle_vv_w, uint32_t, H4, float32_le)
+GEN_VEXT_CMP_VV_ENV(vmfle_vv_d, uint64_t, H8, float64_le)
+GEN_VEXT_CMP_VF(vmfle_vf_h, uint16_t, H2, float16_le)
+GEN_VEXT_CMP_VF(vmfle_vf_w, uint32_t, H4, float32_le)
+GEN_VEXT_CMP_VF(vmfle_vf_d, uint64_t, H8, float64_le)
+
+static uint8_t vmfgt16(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare(a, b, s);
+    if (compare == float_relation_greater) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfgt32(uint32_t a, uint32_t b, float_status *s)
+{
+    int compare = float32_compare(a, b, s);
+    if (compare == float_relation_greater) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfgt64(uint64_t a, uint64_t b, float_status *s)
+{
+    int compare = float64_compare(a, b, s);
+    if (compare == float_relation_greater) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VF(vmfgt_vf_h, uint16_t, H2, vmfgt16)
+GEN_VEXT_CMP_VF(vmfgt_vf_w, uint32_t, H4, vmfgt32)
+GEN_VEXT_CMP_VF(vmfgt_vf_d, uint64_t, H8, vmfgt64)
+
+static uint8_t vmfge16(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare(a, b, s);
+    if (compare == float_relation_greater ||
+            compare == float_relation_equal) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfge32(uint32_t a, uint32_t b, float_status *s)
+{
+    int compare = float32_compare(a, b, s);
+    if (compare == float_relation_greater ||
+            compare == float_relation_equal) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+static uint8_t vmfge64(uint64_t a, uint64_t b, float_status *s)
+{
+    int compare = float64_compare(a, b, s);
+    if (compare == float_relation_greater ||
+            compare == float_relation_equal) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VF(vmfge_vf_h, uint16_t, H2, vmfge16)
+GEN_VEXT_CMP_VF(vmfge_vf_w, uint32_t, H4, vmfge32)
+GEN_VEXT_CMP_VF(vmfge_vf_d, uint64_t, H8, vmfge64)
+
+static uint8_t float16_unordered_quiet(uint16_t a, uint16_t b, float_status *s)
+{
+    int compare = float16_compare_quiet(a, b, s);
+    if (compare == float_relation_unordered) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+GEN_VEXT_CMP_VV_ENV(vmford_vv_h, uint16_t, H2, !float16_unordered_quiet)
+GEN_VEXT_CMP_VV_ENV(vmford_vv_w, uint32_t, H4, !float32_unordered_quiet)
+GEN_VEXT_CMP_VV_ENV(vmford_vv_d, uint64_t, H8, !float64_unordered_quiet)
+GEN_VEXT_CMP_VF(vmford_vf_h, uint16_t, H2, !float16_unordered_quiet)
+GEN_VEXT_CMP_VF(vmford_vf_w, uint32_t, H4, !float32_unordered_quiet)
+GEN_VEXT_CMP_VF(vmford_vf_d, uint64_t, H8, !float64_unordered_quiet)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
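
As a plain-C reference for what the GEN_VEXT_CMP_VV_ENV expansion above
computes: elements below vl that are active (unmasked instruction, or v0
bit set) each produce one mask bit in vd, inactive elements keep their
old mask bit, and everything from vl up to VLMAX is cleared to zero,
matching the tail-clearing loop in the macro. A simplified sketch using
one bool per element instead of the packed, mlen-spaced layout the real
helpers use (ref_vmfeq_vv_h and the integer compare standing in for
float16_eq_quiet are illustrative only):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static void ref_vmfeq_vv_h(bool *vd, const bool *v0, const uint16_t *vs1,
                           const uint16_t *vs2, size_t vl, size_t vlmax,
                           bool vm)
{
    size_t i;

    if (vl == 0) {
        return;                      /* vl == 0 leaves vd untouched */
    }
    for (i = 0; i < vl; i++) {
        if (!vm && !v0[i]) {
            continue;                /* inactive: keep the old mask bit */
        }
        vd[i] = (vs1[i] == vs2[i]);  /* stand-in for float16_eq_quiet() */
    }
    for (; i < vlmax; i++) {
        vd[i] = false;               /* tail of the mask register is zeroed */
    }
}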

* [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  4 ++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c |  3 ++
 target/riscv/vector_helper.c            | 62 +++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 323bed038e..86f1498c06 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -978,3 +978,7 @@ DEF_HELPER_6(vmford_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vmford_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vmford_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vmford_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_5(vfclass_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfclass_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfclass_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2d61256981..18b78ed82d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -507,6 +507,7 @@ vmfgt_vf        011101 . ..... ..... 101 ..... 1010111 @r_vm
 vmfge_vf        011111 . ..... ..... 101 ..... 1010111 @r_vm
 vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
 vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
+vfclass_v       100011 . ..... 10000 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9d9653e605..3971c3ebdb 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1918,3 +1918,6 @@ GEN_OPFVF_TRANS(vmfle_vf, opfvf_cmp_check)
 GEN_OPFVF_TRANS(vmfgt_vf, opfvf_cmp_check)
 GEN_OPFVF_TRANS(vmfge_vf, opfvf_cmp_check)
 GEN_OPFVF_TRANS(vmford_vf, opfvf_cmp_check)
+
+/* Vector Floating-Point Classify Instruction */
+GEN_OPFV_TRANS(vfclass_v, opfv_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index dd44cc57a3..e9f278643f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3852,3 +3852,65 @@ GEN_VEXT_CMP_VV_ENV(vmford_vv_d, uint64_t, H8, !float64_unordered_quiet)
 GEN_VEXT_CMP_VF(vmford_vf_h, uint16_t, H2, !float16_unordered_quiet)
 GEN_VEXT_CMP_VF(vmford_vf_w, uint32_t, H4, !float32_unordered_quiet)
 GEN_VEXT_CMP_VF(vmford_vf_d, uint64_t, H8, !float64_unordered_quiet)
+
+/* Vector Floating-Point Classify Instruction */
+static uint16_t fclass_f16(uint16_t frs1, float_status *s)
+{
+    float16 f = frs1;
+    bool sign = float16_is_neg(f);
+
+    if (float16_is_infinity(f)) {
+        return sign ? 1 << 0 : 1 << 7;
+    } else if (float16_is_zero(f)) {
+        return sign ? 1 << 3 : 1 << 4;
+    } else if (float16_is_zero_or_denormal(f)) {
+        return sign ? 1 << 2 : 1 << 5;
+    } else if (float16_is_any_nan(f)) {
+        float_status s = { }; /* for snan_bit_is_one */
+        return float16_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
+    } else {
+        return sign ? 1 << 1 : 1 << 6;
+    }
+}
+static uint32_t fclass_s(uint32_t frs1, float_status *s)
+{
+    float32 f = frs1;
+    bool sign = float32_is_neg(f);
+
+    if (float32_is_infinity(f)) {
+        return sign ? 1 << 0 : 1 << 7;
+    } else if (float32_is_zero(f)) {
+        return sign ? 1 << 3 : 1 << 4;
+    } else if (float32_is_zero_or_denormal(f)) {
+        return sign ? 1 << 2 : 1 << 5;
+    } else if (float32_is_any_nan(f)) {
+        float_status s = { }; /* for snan_bit_is_one */
+        return float32_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
+    } else {
+        return sign ? 1 << 1 : 1 << 6;
+    }
+}
+static uint64_t fclass_d(uint64_t frs1, float_status *s)
+{
+    float64 f = frs1;
+    bool sign = float64_is_neg(f);
+
+    if (float64_is_infinity(f)) {
+        return sign ? 1 << 0 : 1 << 7;
+    } else if (float64_is_zero(f)) {
+        return sign ? 1 << 3 : 1 << 4;
+    } else if (float64_is_zero_or_denormal(f)) {
+        return sign ? 1 << 2 : 1 << 5;
+    } else if (float64_is_any_nan(f)) {
+        float_status s = { }; /* for snan_bit_is_one */
+        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
+    } else {
+        return sign ? 1 << 1 : 1 << 6;
+    }
+}
+RVVCALL(OPFVV1, vfclass_v_h, OP_UU_H, H2, H2, fclass_f16)
+RVVCALL(OPFVV1, vfclass_v_w, OP_UU_W, H4, H4, fclass_s)
+RVVCALL(OPFVV1, vfclass_v_d, OP_UU_D, H8, H8, fclass_d)
+GEN_VEXT_V_ENV(vfclass_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfclass_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfclass_v_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
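
The fclass_* helpers above return the same ten-bit class mask as the
scalar fclass.{s,d} instructions, so the shifted constants follow the
standard RISC-V encoding (bit 0 = negative infinity, ..., bit 9 = quiet
NaN). A short reference with the bit positions spelled out and a
hand-rolled binary32 classifier that mirrors fclass_s() (the FCLASS_*
and ref_* names are illustrative, not part of the patch):

#include <stdint.h>

enum {
    FCLASS_NEG_INF       = 1 << 0,
    FCLASS_NEG_NORMAL    = 1 << 1,
    FCLASS_NEG_SUBNORMAL = 1 << 2,
    FCLASS_NEG_ZERO      = 1 << 3,
    FCLASS_POS_ZERO      = 1 << 4,
    FCLASS_POS_SUBNORMAL = 1 << 5,
    FCLASS_POS_NORMAL    = 1 << 6,
    FCLASS_POS_INF       = 1 << 7,
    FCLASS_SIGNALING_NAN = 1 << 8,
    FCLASS_QUIET_NAN     = 1 << 9,
};

/* Classify a raw IEEE binary32 pattern the same way fclass_s() does. */
static uint32_t ref_fclass32(uint32_t f)
{
    uint32_t exp  = (f >> 23) & 0xff;
    uint32_t frac = f & 0x7fffff;
    int sign = f >> 31;

    if (exp == 0xff) {
        if (frac == 0) {
            return sign ? FCLASS_NEG_INF : FCLASS_POS_INF;
        }
        /* MSB of the fraction distinguishes quiet from signaling NaNs */
        return (frac & 0x400000) ? FCLASS_QUIET_NAN : FCLASS_SIGNALING_NAN;
    }
    if (exp == 0) {
        if (frac == 0) {
            return sign ? FCLASS_NEG_ZERO : FCLASS_POS_ZERO;
        }
        return sign ? FCLASS_NEG_SUBNORMAL : FCLASS_POS_SUBNORMAL;
    }
    return sign ? FCLASS_NEG_NORMAL : FCLASS_POS_NORMAL;
}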

* [PATCH v5 40/60] target/riscv: vector floating-point merge instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  4 ++++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 12 +++++++++++
 target/riscv/vector_helper.c            | 28 +++++++++++++++++++++++++
 4 files changed, 45 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 86f1498c06..c02b207b44 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -982,3 +982,7 @@ DEF_HELPER_6(vmford_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_5(vfclass_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfclass_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfclass_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vfmerge_vfm_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmerge_vfm_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmerge_vfm_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 18b78ed82d..41074f314a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -508,6 +508,7 @@ vmfge_vf        011111 . ..... ..... 101 ..... 1010111 @r_vm
 vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
 vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
 vfclass_v       100011 . ..... 10000 001 ..... 1010111 @r2_vm
+vfmerge_vfm     010111 . ..... ..... 101 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3971c3ebdb..1ddaee6dab 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1921,3 +1921,15 @@ GEN_OPFVF_TRANS(vmford_vf, opfvf_cmp_check)
 
 /* Vector Floating-Point Classify Instruction */
 GEN_OPFV_TRANS(vfclass_v, opfv_check)
+
+/* Vector Floating-Point Merge Instruction */
+static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            ((a->vm == 0) || (a->rs2 == 0)) &&
+            (s->sew != 0));
+}
+GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e9f278643f..00f8d9344f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3914,3 +3914,31 @@ RVVCALL(OPFVV1, vfclass_v_d, OP_UU_D, H8, H8, fclass_d)
 GEN_VEXT_V_ENV(vfclass_v_h, 2, 2, clearh)
 GEN_VEXT_V_ENV(vfclass_v_w, 4, 4, clearl)
 GEN_VEXT_V_ENV(vfclass_v_d, 8, 8, clearq)
+
+/* Vector Floating-Point Merge Instruction */
+#define GEN_VFMERGE_VF(NAME, ETYPE, H, CLEAR_FN)          \
+void HELPER(NAME)(void *vd, void *v0, uint64_t s1,        \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t esz = sizeof(ETYPE);                         \
+    uint32_t vlmax = vext_maxsz(desc) / esz;              \
+    uint32_t i;                                           \
+                                                          \
+    for (i = 0; i < vl; i++) {                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            ETYPE s2 = *((ETYPE *)vs2 + H(i));            \
+            *((ETYPE *)vd + H(i)) = s2;                   \
+        } else {                                          \
+            *((ETYPE *)vd + H(i)) = (ETYPE)s1;            \
+        }                                                 \
+    }                                                     \
+    if (i != 0) {                                         \
+        CLEAR_FN(vd, vl, vl * esz, vlmax * esz);          \
+    }                                                     \
+}
+GEN_VFMERGE_VF(vfmerge_vfm_h, int16_t, H2, clearh)
+GEN_VFMERGE_VF(vfmerge_vfm_w, int32_t, H4, clearl)
+GEN_VFMERGE_VF(vfmerge_vfm_d, int64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
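
The merge helper above computes vd[i] = mask[i] ? rs1 : vs2[i] for the
first vl elements (the vm == 1 encoding degenerates to a broadcast of
rs1, i.e. vfmv.v.f), then zeroes the tail via the clear function, as the
rest of the series does. A minimal sketch under the same simplified
one-bool-per-element mask as earlier (ref_vfmerge_vfm_h is illustrative,
not part of the patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static void ref_vfmerge_vfm_h(uint16_t *vd, const bool *v0, uint16_t s1,
                              const uint16_t *vs2, size_t vl, size_t vlmax)
{
    if (vl == 0) {
        return;                      /* nothing written when vl == 0 */
    }
    for (size_t i = 0; i < vl; i++) {
        vd[i] = v0[i] ? s1 : vs2[i]; /* scalar where masked on, vs2 where off */
    }
    for (size_t i = vl; i < vlmax; i++) {
        vd[i] = 0;                   /* tail elements are zeroed */
    }
}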

* [PATCH v5 41/60] target/riscv: vector floating-point/integer type-convert instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 13 ++++++++++
 target/riscv/insn32.decode              |  4 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  6 +++++
 target/riscv/vector_helper.c            | 33 +++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c02b207b44..a34157e277 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -986,3 +986,16 @@ DEF_HELPER_5(vfclass_v_d, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_5(vfcvt_xu_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_xu_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_xu_f_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 41074f314a..6ca6f97323 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -509,6 +509,10 @@ vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
 vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
 vfclass_v       100011 . ..... 10000 001 ..... 1010111 @r2_vm
 vfmerge_vfm     010111 . ..... ..... 101 ..... 1010111 @r_vm
+vfcvt_xu_f_v    100010 . ..... 00000 001 ..... 1010111 @r2_vm
+vfcvt_x_f_v     100010 . ..... 00001 001 ..... 1010111 @r2_vm
+vfcvt_f_xu_v    100010 . ..... 00010 001 ..... 1010111 @r2_vm
+vfcvt_f_x_v     100010 . ..... 00011 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 1ddaee6dab..a61c7fdf32 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1933,3 +1933,9 @@ static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
             (s->sew != 0));
 }
 GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)
+
+/* Single-Width Floating-Point/Integer Type-Convert Instructions */
+GEN_OPFV_TRANS(vfcvt_xu_f_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_x_f_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_f_xu_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_f_x_v, opfv_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00f8d9344f..d032745e94 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3942,3 +3942,36 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1,        \
 GEN_VFMERGE_VF(vfmerge_vfm_h, int16_t, H2, clearh)
 GEN_VFMERGE_VF(vfmerge_vfm_w, int32_t, H4, clearl)
 GEN_VFMERGE_VF(vfmerge_vfm_d, int64_t, H8, clearq)
+
+/* Single-Width Floating-Point/Integer Type-Convert Instructions */
+/* vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. */
+RVVCALL(OPFVV1, vfcvt_xu_f_v_h, OP_UU_H, H2, H2, float16_to_uint16)
+RVVCALL(OPFVV1, vfcvt_xu_f_v_w, OP_UU_W, H4, H4, float32_to_uint32)
+RVVCALL(OPFVV1, vfcvt_xu_f_v_d, OP_UU_D, H8, H8, float64_to_uint64)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_d, 8, 8, clearq)
+
+/* vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. */
+RVVCALL(OPFVV1, vfcvt_x_f_v_h, OP_UU_H, H2, H2, float16_to_int16)
+RVVCALL(OPFVV1, vfcvt_x_f_v_w, OP_UU_W, H4, H4, float32_to_int32)
+RVVCALL(OPFVV1, vfcvt_x_f_v_d, OP_UU_D, H8, H8, float64_to_int64)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_d, 8, 8, clearq)
+
+/* vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. */
+RVVCALL(OPFVV1, vfcvt_f_xu_v_h, OP_UU_H, H2, H2, uint16_to_float16)
+RVVCALL(OPFVV1, vfcvt_f_xu_v_w, OP_UU_W, H4, H4, uint32_to_float32)
+RVVCALL(OPFVV1, vfcvt_f_xu_v_d, OP_UU_D, H8, H8, uint64_to_float64)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_d, 8, 8, clearq)
+
+/* vfcvt.f.x.v vd, vs2, vm # Convert integer to float. */
+RVVCALL(OPFVV1, vfcvt_f_x_v_h, OP_UU_H, H2, H2, int16_to_float16)
+RVVCALL(OPFVV1, vfcvt_f_x_v_w, OP_UU_W, H4, H4, int32_to_float32)
+RVVCALL(OPFVV1, vfcvt_f_x_v_d, OP_UU_D, H8, H8, int64_to_float64)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
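
The conversions above all route through the softfloat
float*_to_(u)int* / (u)int*_to_float* helpers, so they honor the current
rounding mode and accumulate exception flags in env->fp_status rather
than behaving like C casts. A simplified per-element model of the
GEN_VEXT_V_ENV pattern they are plugged into, with a callback standing
in for e.g. float16_to_uint16(x, &env->fp_status) (ref_vfcvt_v_h and
cvt16_fn are illustrative, not part of the patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*cvt16_fn)(uint16_t x);

static void ref_vfcvt_v_h(uint16_t *vd, const bool *v0, const uint16_t *vs2,
                          size_t vl, size_t vlmax, bool vm, cvt16_fn conv)
{
    if (vl == 0) {
        return;
    }
    for (size_t i = 0; i < vl; i++) {
        if (!vm && !v0[i]) {
            continue;                /* inactive: destination unchanged */
        }
        vd[i] = conv(vs2[i]);        /* one conversion call per active element */
    }
    for (size_t i = vl; i < vlmax; i++) {
        vd[i] = 0;                   /* tail zeroing, as in the clearh/l/q calls */
    }
}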

* [PATCH v5 41/60] target/riscv: vector floating-point/integer type-convert instructions
@ 2020-03-12 14:58   ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 13 ++++++++++
 target/riscv/insn32.decode              |  4 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  6 +++++
 target/riscv/vector_helper.c            | 33 +++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c02b207b44..a34157e277 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -986,3 +986,16 @@ DEF_HELPER_5(vfclass_v_d, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfmerge_vfm_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_5(vfcvt_xu_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_xu_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_xu_f_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_x_f_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_xu_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfcvt_f_x_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 41074f314a..6ca6f97323 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -509,6 +509,10 @@ vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
 vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
 vfclass_v       100011 . ..... 10000 001 ..... 1010111 @r2_vm
 vfmerge_vfm     010111 . ..... ..... 101 ..... 1010111 @r_vm
+vfcvt_xu_f_v    100010 . ..... 00000 001 ..... 1010111 @r2_vm
+vfcvt_x_f_v     100010 . ..... 00001 001 ..... 1010111 @r2_vm
+vfcvt_f_xu_v    100010 . ..... 00010 001 ..... 1010111 @r2_vm
+vfcvt_f_x_v     100010 . ..... 00011 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 1ddaee6dab..a61c7fdf32 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1933,3 +1933,9 @@ static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
             (s->sew != 0));
 }
 GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)
+
+/* Single-Width Floating-Point/Integer Type-Convert Instructions */
+GEN_OPFV_TRANS(vfcvt_xu_f_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_x_f_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_f_xu_v, opfv_check)
+GEN_OPFV_TRANS(vfcvt_f_x_v, opfv_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00f8d9344f..d032745e94 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3942,3 +3942,36 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1,        \
 GEN_VFMERGE_VF(vfmerge_vfm_h, int16_t, H2, clearh)
 GEN_VFMERGE_VF(vfmerge_vfm_w, int32_t, H4, clearl)
 GEN_VFMERGE_VF(vfmerge_vfm_d, int64_t, H8, clearq)
+
+/* Single-Width Floating-Point/Integer Type-Convert Instructions */
+/* vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. */
+RVVCALL(OPFVV1, vfcvt_xu_f_v_h, OP_UU_H, H2, H2, float16_to_uint16)
+RVVCALL(OPFVV1, vfcvt_xu_f_v_w, OP_UU_W, H4, H4, float32_to_uint32)
+RVVCALL(OPFVV1, vfcvt_xu_f_v_d, OP_UU_D, H8, H8, float64_to_uint64)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_xu_f_v_d, 8, 8, clearq)
+
+/* vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. */
+RVVCALL(OPFVV1, vfcvt_x_f_v_h, OP_UU_H, H2, H2, float16_to_int16)
+RVVCALL(OPFVV1, vfcvt_x_f_v_w, OP_UU_W, H4, H4, float32_to_int32)
+RVVCALL(OPFVV1, vfcvt_x_f_v_d, OP_UU_D, H8, H8, float64_to_int64)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_x_f_v_d, 8, 8, clearq)
+
+/* vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. */
+RVVCALL(OPFVV1, vfcvt_f_xu_v_h, OP_UU_H, H2, H2, uint16_to_float16)
+RVVCALL(OPFVV1, vfcvt_f_xu_v_w, OP_UU_W, H4, H4, uint32_to_float32)
+RVVCALL(OPFVV1, vfcvt_f_xu_v_d, OP_UU_D, H8, H8, uint64_to_float64)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_f_xu_v_d, 8, 8, clearq)
+
+/* vfcvt.f.x.v vd, vs2, vm # Convert integer to float. */
+RVVCALL(OPFVV1, vfcvt_f_x_v_h, OP_UU_H, H2, H2, int16_to_float16)
+RVVCALL(OPFVV1, vfcvt_f_x_v_w, OP_UU_W, H4, H4, int32_to_float32)
+RVVCALL(OPFVV1, vfcvt_f_x_v_d, OP_UU_D, H8, H8, int64_to_float64)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_w, 4, 4, clearl)
+GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8, 8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 42/60] target/riscv: widening floating-point/integer type-convert instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 11 ++++++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 42 +++++++++++++++++++++++
 target/riscv/vector_helper.c            | 45 +++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a34157e277..90b4d50c62 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -999,3 +999,14 @@ DEF_HELPER_5(vfcvt_f_xu_v_d, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfcvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfcvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfcvt_f_x_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vfwcvt_xu_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_xu_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_x_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_x_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_xu_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_xu_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfwcvt_f_f_v_w, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6ca6f97323..247419937e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -513,6 +513,11 @@ vfcvt_xu_f_v    100010 . ..... 00000 001 ..... 1010111 @r2_vm
 vfcvt_x_f_v     100010 . ..... 00001 001 ..... 1010111 @r2_vm
 vfcvt_f_xu_v    100010 . ..... 00010 001 ..... 1010111 @r2_vm
 vfcvt_f_x_v     100010 . ..... 00011 001 ..... 1010111 @r2_vm
+vfwcvt_xu_f_v   100010 . ..... 01000 001 ..... 1010111 @r2_vm
+vfwcvt_x_f_v    100010 . ..... 01001 001 ..... 1010111 @r2_vm
+vfwcvt_f_xu_v   100010 . ..... 01010 001 ..... 1010111 @r2_vm
+vfwcvt_f_x_v    100010 . ..... 01011 001 ..... 1010111 @r2_vm
+vfwcvt_f_f_v    100010 . ..... 01100 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a61c7fdf32..0566132d4e 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1939,3 +1939,45 @@ GEN_OPFV_TRANS(vfcvt_xu_f_v, opfv_check)
 GEN_OPFV_TRANS(vfcvt_x_f_v, opfv_check)
 GEN_OPFV_TRANS(vfcvt_f_xu_v, opfv_check)
 GEN_OPFV_TRANS(vfcvt_f_x_v, opfv_check)
+
+/* Widening Floating-Point/Integer Type-Convert Instructions */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised
+ */
+static bool opfv_widen_check(DisasContext *s, arg_rmr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+#define GEN_OPFV_WIDEN_TRANS(NAME)                                 \
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
+{                                                                  \
+    if (opfv_widen_check(s, a)) {                                  \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_3_ptr * const fns[2] = {            \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w,                                 \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs2), cpu_env, 0,                       \
+            s->vlen / 8, data, fns[s->sew - 1]);                   \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFV_WIDEN_TRANS(vfwcvt_xu_f_v)
+GEN_OPFV_WIDEN_TRANS(vfwcvt_x_f_v)
+GEN_OPFV_WIDEN_TRANS(vfwcvt_f_xu_v)
+GEN_OPFV_WIDEN_TRANS(vfwcvt_f_x_v)
+GEN_OPFV_WIDEN_TRANS(vfwcvt_f_f_v)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index d032745e94..6454c54500 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3975,3 +3975,48 @@ RVVCALL(OPFVV1, vfcvt_f_x_v_d, OP_UU_D, H8, H8, int64_to_float64)
 GEN_VEXT_V_ENV(vfcvt_f_x_v_h, 2, 2, clearh)
 GEN_VEXT_V_ENV(vfcvt_f_x_v_w, 4, 4, clearl)
 GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8, 8, clearq)
+
+/* Widening Floating-Point/Integer Type-Convert Instructions */
+/* (TD, T2, TX2) */
+#define WOP_UU_H uint32_t, uint16_t, uint16_t
+#define WOP_UU_W uint64_t, uint32_t, uint32_t
+/* vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. */
+RVVCALL(OPFVV1, vfwcvt_xu_f_v_h, WOP_UU_H, H4, H2, float16_to_uint32)
+RVVCALL(OPFVV1, vfwcvt_xu_f_v_w, WOP_UU_W, H8, H4, float32_to_uint64)
+GEN_VEXT_V_ENV(vfwcvt_xu_f_v_h, 2, 4, clearl)
+GEN_VEXT_V_ENV(vfwcvt_xu_f_v_w, 4, 8, clearq)
+
+/* vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. */
+RVVCALL(OPFVV1, vfwcvt_x_f_v_h, WOP_UU_H, H4, H2, float16_to_int32)
+RVVCALL(OPFVV1, vfwcvt_x_f_v_w, WOP_UU_W, H8, H4, float32_to_int64)
+GEN_VEXT_V_ENV(vfwcvt_x_f_v_h, 2, 4, clearl)
+GEN_VEXT_V_ENV(vfwcvt_x_f_v_w, 4, 8, clearq)
+
+/* vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float */
+RVVCALL(OPFVV1, vfwcvt_f_xu_v_h, WOP_UU_H, H4, H2, uint16_to_float32)
+RVVCALL(OPFVV1, vfwcvt_f_xu_v_w, WOP_UU_W, H8, H4, uint32_to_float64)
+GEN_VEXT_V_ENV(vfwcvt_f_xu_v_h, 2, 4, clearl)
+GEN_VEXT_V_ENV(vfwcvt_f_xu_v_w, 4, 8, clearq)
+
+/* vfwcvt.f.x.v vd, vs2, vm # Convert integer to double-width float. */
+RVVCALL(OPFVV1, vfwcvt_f_x_v_h, WOP_UU_H, H4, H2, int16_to_float32)
+RVVCALL(OPFVV1, vfwcvt_f_x_v_w, WOP_UU_W, H8, H4, int32_to_float64)
+GEN_VEXT_V_ENV(vfwcvt_f_x_v_h, 2, 4, clearl)
+GEN_VEXT_V_ENV(vfwcvt_f_x_v_w, 4, 8, clearq)
+
+/*
+ * vfwcvt.f.f.v vd, vs2, vm #
+ * Convert single-width float to double-width float.
+ */
+static uint32_t vfwcvtffv16(uint16_t a, float_status *s)
+{
+    return float16_to_float32(a, true, s);
+}
+static uint64_t vfwcvtffv32(uint32_t a, float_status *s)
+{
+    return float32_to_float64(a, s);
+}
+RVVCALL(OPFVV1, vfwcvt_f_f_v_h, WOP_UU_H, H4, H2, vfwcvtffv16)
+RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, vfwcvtffv32)
+GEN_VEXT_V_ENV(vfwcvt_f_f_v_h, 2, 4, clearl)
+GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 4, 8, clearq)
-- 
2.23.0
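
The widening variants read an SEW-wide source element and write a 2*SEW result,
which is why opfv_widen_check() only admits sew values 1 and 2 (16- and 32-bit
elements) and why the trans function indexes fns[s->sew - 1]. A rough
illustration of one element of vfwcvt.f.xu.v at SEW=16 in plain C (a sketch, not
the QEMU code: the real helper goes through uint16_to_float32() with
env->fp_status and adds the usual masking and tail clearing):

    #include <stdint.h>

    /* A 16-bit unsigned source element becomes a 32-bit (2*SEW) float
     * result, so the destination register group is twice the size of
     * the source group. */
    void vfwcvt_f_xu_v_h_like(float *vd, const uint16_t *vs2, int vl)
    {
        for (int i = 0; i < vl; i++) {
            vd[i] = (float)vs2[i];
        }
    }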




* [PATCH v5 43/60] target/riscv: narrowing floating-point/integer type-convert instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 11 +++++++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 42 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 42 +++++++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 90b4d50c62..008c5b9868 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1010,3 +1010,14 @@ DEF_HELPER_5(vfwcvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfwcvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfwcvt_f_f_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfwcvt_f_f_v_w, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vfncvt_xu_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_xu_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_x_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_x_f_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_xu_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_xu_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_f_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfncvt_f_f_v_w, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 247419937e..ffc58698c6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -518,6 +518,11 @@ vfwcvt_x_f_v    100010 . ..... 01001 001 ..... 1010111 @r2_vm
 vfwcvt_f_xu_v   100010 . ..... 01010 001 ..... 1010111 @r2_vm
 vfwcvt_f_x_v    100010 . ..... 01011 001 ..... 1010111 @r2_vm
 vfwcvt_f_f_v    100010 . ..... 01100 001 ..... 1010111 @r2_vm
+vfncvt_xu_f_v   100010 . ..... 10000 001 ..... 1010111 @r2_vm
+vfncvt_x_f_v    100010 . ..... 10001 001 ..... 1010111 @r2_vm
+vfncvt_f_xu_v   100010 . ..... 10010 001 ..... 1010111 @r2_vm
+vfncvt_f_x_v    100010 . ..... 10011 001 ..... 1010111 @r2_vm
+vfncvt_f_f_v    100010 . ..... 10100 001 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 0566132d4e..bdb765bf13 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1981,3 +1981,45 @@ GEN_OPFV_WIDEN_TRANS(vfwcvt_x_f_v)
 GEN_OPFV_WIDEN_TRANS(vfwcvt_f_xu_v)
 GEN_OPFV_WIDEN_TRANS(vfwcvt_f_x_v)
 GEN_OPFV_WIDEN_TRANS(vfwcvt_f_f_v)
+
+/* Narrowing Floating-Point/Integer Type-Convert Instructions */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised
+ */
+static bool opfv_narrow_check(DisasContext *s, arg_rmr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, true) &&
+            vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
+                2 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+#define GEN_OPFV_NARROW_TRANS(NAME)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
+{                                                                  \
+    if (opfv_narrow_check(s, a)) {                                 \
+        uint32_t data = 0;                                         \
+        static gen_helper_gvec_3_ptr * const fns[2] = {            \
+            gen_helper_##NAME##_h,                                 \
+            gen_helper_##NAME##_w,                                 \
+        };                                                         \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs2), cpu_env, 0,                       \
+            s->vlen / 8, data, fns[s->sew - 1]);                   \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_OPFV_NARROW_TRANS(vfncvt_xu_f_v)
+GEN_OPFV_NARROW_TRANS(vfncvt_x_f_v)
+GEN_OPFV_NARROW_TRANS(vfncvt_f_xu_v)
+GEN_OPFV_NARROW_TRANS(vfncvt_f_x_v)
+GEN_OPFV_NARROW_TRANS(vfncvt_f_f_v)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 6454c54500..bb143b9216 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4020,3 +4020,45 @@ RVVCALL(OPFVV1, vfwcvt_f_f_v_h, WOP_UU_H, H4, H2, vfwcvtffv16)
 RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, vfwcvtffv32)
 GEN_VEXT_V_ENV(vfwcvt_f_f_v_h, 2, 4, clearl)
 GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 4, 8, clearq)
+
+/* Narrowing Floating-Point/Integer Type-Convert Instructions */
+/* (TD, T2, TX2) */
+#define NOP_UU_H uint16_t, uint32_t, uint32_t
+#define NOP_UU_W uint32_t, uint64_t, uint64_t
+/* vfncvt.xu.f.v vd, vs2, vm # Convert double-width float to unsigned integer. */
+RVVCALL(OPFVV1, vfncvt_xu_f_v_h, NOP_UU_H, H2, H4, float32_to_uint16)
+RVVCALL(OPFVV1, vfncvt_xu_f_v_w, NOP_UU_W, H4, H8, float64_to_uint32)
+GEN_VEXT_V_ENV(vfncvt_xu_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfncvt_xu_f_v_w, 4, 4, clearl)
+
+/* vfncvt.x.f.v vd, vs2, vm # Convert double-width float to signed integer. */
+RVVCALL(OPFVV1, vfncvt_x_f_v_h, NOP_UU_H, H2, H4, float32_to_int16)
+RVVCALL(OPFVV1, vfncvt_x_f_v_w, NOP_UU_W, H4, H8, float64_to_int32)
+GEN_VEXT_V_ENV(vfncvt_x_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfncvt_x_f_v_w, 4, 4, clearl)
+
+/* vfncvt.f.xu.v vd, vs2, vm # Convert double-width unsigned integer to float */
+RVVCALL(OPFVV1, vfncvt_f_xu_v_h, NOP_UU_H, H2, H4, uint32_to_float16)
+RVVCALL(OPFVV1, vfncvt_f_xu_v_w, NOP_UU_W, H4, H8, uint64_to_float32)
+GEN_VEXT_V_ENV(vfncvt_f_xu_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfncvt_f_xu_v_w, 4, 4, clearl)
+
+/* vfncvt.f.x.v vd, vs2, vm # Convert double-width integer to float. */
+RVVCALL(OPFVV1, vfncvt_f_x_v_h, NOP_UU_H, H2, H4, int32_to_float16)
+RVVCALL(OPFVV1, vfncvt_f_x_v_w, NOP_UU_W, H4, H8, int64_to_float32)
+GEN_VEXT_V_ENV(vfncvt_f_x_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfncvt_f_x_v_w, 4, 4, clearl)
+
+/* vfncvt.f.f.v vd, vs2, vm # Convert double-width float to single-width float. */
+static uint16_t vfncvtffv16(uint32_t a, float_status *s)
+{
+    return float32_to_float16(a, true, s);
+}
+static uint32_t vfncvtffv32(uint64_t a, float_status *s)
+{
+    return float64_to_float32(a, s);
+}
+RVVCALL(OPFVV1, vfncvt_f_f_v_h, NOP_UU_H, H2, H4, vfncvtffv16)
+RVVCALL(OPFVV1, vfncvt_f_f_v_w, NOP_UU_W, H4, H8, vfncvtffv32)
+GEN_VEXT_V_ENV(vfncvt_f_f_v_h, 2, 2, clearh)
+GEN_VEXT_V_ENV(vfncvt_f_f_v_w, 4, 4, clearl)
-- 
2.23.0
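
A concrete example of the opfv_narrow_check() overlap rule above, assuming
vext_check_overlap_group() rejects any intersection of the two register groups
(as its earlier definition in the series suggests): with s->lmul = 1 (LMUL = 2),
the double-width source vs2 spans 2 << 1 = 4 registers and the single-width
destination vd spans 1 << 1 = 2, so vd = v2 with vs2 = v4 passes, while vd = v4
with vs2 = v4 is rejected because the destination group sits inside the source
group.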




* [PATCH v5 44/60] target/riscv: vector single-width integer reduction instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 33 +++++++++++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 17 ++++++
 target/riscv/vector_helper.c            | 76 +++++++++++++++++++++++++
 4 files changed, 134 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 008c5b9868..cc1eb55404 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1021,3 +1021,36 @@ DEF_HELPER_5(vfncvt_f_x_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfncvt_f_x_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfncvt_f_f_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vfncvt_f_f_v_w, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vredsum_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredsum_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmaxu_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmaxu_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmaxu_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmaxu_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmax_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmax_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmax_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmax_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredminu_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredminu_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredminu_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredminu_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmin_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmin_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmin_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredmin_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredand_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredand_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredand_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredand_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredor_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredor_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredor_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredor_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredxor_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredxor_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredxor_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vredxor_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ffc58698c6..2419ef97e7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -523,6 +523,14 @@ vfncvt_x_f_v    100010 . ..... 10001 001 ..... 1010111 @r2_vm
 vfncvt_f_xu_v   100010 . ..... 10010 001 ..... 1010111 @r2_vm
 vfncvt_f_x_v    100010 . ..... 10011 001 ..... 1010111 @r2_vm
 vfncvt_f_f_v    100010 . ..... 10100 001 ..... 1010111 @r2_vm
+vredsum_vs      000000 . ..... ..... 010 ..... 1010111 @r_vm
+vredand_vs      000001 . ..... ..... 010 ..... 1010111 @r_vm
+vredor_vs       000010 . ..... ..... 010 ..... 1010111 @r_vm
+vredxor_vs      000011 . ..... ..... 010 ..... 1010111 @r_vm
+vredminu_vs     000100 . ..... ..... 010 ..... 1010111 @r_vm
+vredmin_vs      000101 . ..... ..... 010 ..... 1010111 @r_vm
+vredmaxu_vs     000110 . ..... ..... 010 ..... 1010111 @r_vm
+vredmax_vs      000111 . ..... ..... 010 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index bdb765bf13..3f6951abd5 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2023,3 +2023,20 @@ GEN_OPFV_NARROW_TRANS(vfncvt_x_f_v)
 GEN_OPFV_NARROW_TRANS(vfncvt_f_xu_v)
 GEN_OPFV_NARROW_TRANS(vfncvt_f_x_v)
 GEN_OPFV_NARROW_TRANS(vfncvt_f_f_v)
+
+/*
+ *** Vector Reduction Operations
+ */
+/* Vector Single-Width Integer Reduction Instructions */
+static bool reduction_check(DisasContext *s, arg_rmrr *a)
+{
+    return vext_check_isa_ill(s, RVV) && vext_check_reg(s, a->rs2, false);
+}
+GEN_OPIVV_TRANS(vredsum_vs, reduction_check)
+GEN_OPIVV_TRANS(vredmaxu_vs, reduction_check)
+GEN_OPIVV_TRANS(vredmax_vs, reduction_check)
+GEN_OPIVV_TRANS(vredminu_vs, reduction_check)
+GEN_OPIVV_TRANS(vredmin_vs, reduction_check)
+GEN_OPIVV_TRANS(vredand_vs, reduction_check)
+GEN_OPIVV_TRANS(vredor_vs, reduction_check)
+GEN_OPIVV_TRANS(vredxor_vs, reduction_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index bb143b9216..789be79b5a 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4062,3 +4062,79 @@ RVVCALL(OPFVV1, vfncvt_f_f_v_h, NOP_UU_H, H2, H4, vfncvtffv16)
 RVVCALL(OPFVV1, vfncvt_f_f_v_w, NOP_UU_W, H4, H8, vfncvtffv32)
 GEN_VEXT_V_ENV(vfncvt_f_f_v_h, 2, 2, clearh)
 GEN_VEXT_V_ENV(vfncvt_f_f_v_w, 4, 4, clearl)
+
+/*
+ *** Vector Reduction Operations
+ */
+/* Vector Single-Width Integer Reduction Instructions */
+#define GEN_VEXT_RED(NAME, TD, TS2, HD, HS2, OP, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vm = vext_vm(desc);                          \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+    uint32_t tot = env_archcpu(env)->cfg.vlen / 8;        \
+                                                          \
+    TD s1 =  *((TD *)vs1 + HD(0));                        \
+    for (i = 0; i < vl; i++) {                            \
+        TS2 s2 = *((TS2 *)vs2 + HS2(i));                  \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
+            continue;                                     \
+        }                                                 \
+        s1 = OP(s1, (TD)s2);                              \
+    }                                                     \
+    if (i != 0) {                                         \
+        *((TD *)vd + HD(0)) = s1;                         \
+        CLEAR_FN(vd, 1, sizeof(TD), tot);                 \
+    }                                                     \
+}
+
+/* vd[0] = sum(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredsum_vs_b, int8_t, int8_t, H1, H1, DO_ADD, clearb)
+GEN_VEXT_RED(vredsum_vs_h, int16_t, int16_t, H2, H2, DO_ADD, clearh)
+GEN_VEXT_RED(vredsum_vs_w, int32_t, int32_t, H4, H4, DO_ADD, clearl)
+GEN_VEXT_RED(vredsum_vs_d, int64_t, int64_t, H8, H8, DO_ADD, clearq)
+
+/* vd[0] = maxu(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredmaxu_vs_b, uint8_t, uint8_t, H1, H1, DO_MAX, clearb)
+GEN_VEXT_RED(vredmaxu_vs_h, uint16_t, uint16_t, H2, H2, DO_MAX, clearh)
+GEN_VEXT_RED(vredmaxu_vs_w, uint32_t, uint32_t, H4, H4, DO_MAX, clearl)
+GEN_VEXT_RED(vredmaxu_vs_d, uint64_t, uint64_t, H8, H8, DO_MAX, clearq)
+
+/* vd[0] = max(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredmax_vs_b, int8_t, int8_t, H1, H1, DO_MAX, clearb)
+GEN_VEXT_RED(vredmax_vs_h, int16_t, int16_t, H2, H2, DO_MAX, clearh)
+GEN_VEXT_RED(vredmax_vs_w, int32_t, int32_t, H4, H4, DO_MAX, clearl)
+GEN_VEXT_RED(vredmax_vs_d, int64_t, int64_t, H8, H8, DO_MAX, clearq)
+
+/* vd[0] = minu(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredminu_vs_b, uint8_t, uint8_t, H1, H1, DO_MIN, clearb)
+GEN_VEXT_RED(vredminu_vs_h, uint16_t, uint16_t, H2, H2, DO_MIN, clearh)
+GEN_VEXT_RED(vredminu_vs_w, uint32_t, uint32_t, H4, H4, DO_MIN, clearl)
+GEN_VEXT_RED(vredminu_vs_d, uint64_t, uint64_t, H8, H8, DO_MIN, clearq)
+
+/* vd[0] = min(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredmin_vs_b, int8_t, int8_t, H1, H1, DO_MIN, clearb)
+GEN_VEXT_RED(vredmin_vs_h, int16_t, int16_t, H2, H2, DO_MIN, clearh)
+GEN_VEXT_RED(vredmin_vs_w, int32_t, int32_t, H4, H4, DO_MIN, clearl)
+GEN_VEXT_RED(vredmin_vs_d, int64_t, int64_t, H8, H8, DO_MIN, clearq)
+
+/* vd[0] = and(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredand_vs_b, int8_t, int8_t, H1, H1, DO_AND, clearb)
+GEN_VEXT_RED(vredand_vs_h, int16_t, int16_t, H2, H2, DO_AND, clearh)
+GEN_VEXT_RED(vredand_vs_w, int32_t, int32_t, H4, H4, DO_AND, clearl)
+GEN_VEXT_RED(vredand_vs_d, int64_t, int64_t, H8, H8, DO_AND, clearq)
+
+/* vd[0] = or(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredor_vs_b, int8_t, int8_t, H1, H1, DO_OR, clearb)
+GEN_VEXT_RED(vredor_vs_h, int16_t, int16_t, H2, H2, DO_OR, clearh)
+GEN_VEXT_RED(vredor_vs_w, int32_t, int32_t, H4, H4, DO_OR, clearl)
+GEN_VEXT_RED(vredor_vs_d, int64_t, int64_t, H8, H8, DO_OR, clearq)
+
+/* vd[0] = xor(vs1[0], vs2[*]) */
+GEN_VEXT_RED(vredxor_vs_b, int8_t, int8_t, H1, H1, DO_XOR, clearb)
+GEN_VEXT_RED(vredxor_vs_h, int16_t, int16_t, H2, H2, DO_XOR, clearh)
+GEN_VEXT_RED(vredxor_vs_w, int32_t, int32_t, H4, H4, DO_XOR, clearl)
+GEN_VEXT_RED(vredxor_vs_d, int64_t, int64_t, H8, H8, DO_XOR, clearq)
-- 
2.23.0
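
The GEN_VEXT_RED pattern is easier to see outside the macro: the accumulator
starts from element 0 of vs1, folds in every active element of vs2, and only
element 0 of vd is written back, with the rest of the destination cleared (the
real helper clears a vlen-derived tail via CLEAR_FN and skips the write-back
entirely when vl == 0). A self-contained plain-C illustration of a masked
vredsum.vs-style reduction (the packed mask bits and mlen handling of the real
code are simplified to a flat byte array here):

    #include <stdint.h>
    #include <stdio.h>

    /* vd[0] = vs1[0] + sum of active vs2[i]; tail of vd cleared. */
    static void vredsum_vs_w_like(int32_t *vd, const int32_t *vs1,
                                  const int32_t *vs2, const uint8_t *mask,
                                  int vm, int vl, int vlmax)
    {
        int32_t s1 = vs1[0];

        for (int i = 0; i < vl; i++) {
            if (!vm && !mask[i]) {
                continue;             /* masked-off element is skipped */
            }
            s1 += vs2[i];
        }
        if (vl != 0) {
            vd[0] = s1;
            for (int i = 1; i < vlmax; i++) {
                vd[i] = 0;            /* tail zeroing, as CLEAR_FN does */
            }
        }
    }

    int main(void)
    {
        int32_t vs1[4] = { 10, 0, 0, 0 };
        int32_t vs2[4] = { 1, 2, 3, 4 };
        int32_t vd[4] = { -1, -1, -1, -1 };
        uint8_t mask[4] = { 1, 1, 0, 1 };   /* element 2 inactive */

        vredsum_vs_w_like(vd, vs1, vs2, mask, 0, 4, 4);
        printf("vd[0] = %d\n", vd[0]);      /* 10 + 1 + 2 + 4 = 17 */
        return 0;
    }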




* [PATCH v5 45/60] target/riscv: vector widening integer reduction instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  7 +++++++
 target/riscv/insn32.decode              |  2 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  4 ++++
 target/riscv/vector_helper.c            | 11 +++++++++++
 4 files changed, 24 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index cc1eb55404..76435f90a9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1054,3 +1054,10 @@ DEF_HELPER_6(vredxor_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vredxor_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vredxor_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vredxor_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vwredsumu_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwredsumu_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwredsumu_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwredsum_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2419ef97e7..e6a354c134 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -531,6 +531,8 @@ vredminu_vs     000100 . ..... ..... 010 ..... 1010111 @r_vm
 vredmin_vs      000101 . ..... ..... 010 ..... 1010111 @r_vm
 vredmaxu_vs     000110 . ..... ..... 010 ..... 1010111 @r_vm
 vredmax_vs      000111 . ..... ..... 010 ..... 1010111 @r_vm
+vwredsumu_vs    110000 . ..... ..... 000 ..... 1010111 @r_vm
+vwredsum_vs     110001 . ..... ..... 000 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3f6951abd5..195c460cb8 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2040,3 +2040,7 @@ GEN_OPIVV_TRANS(vredmin_vs, reduction_check)
 GEN_OPIVV_TRANS(vredand_vs, reduction_check)
 GEN_OPIVV_TRANS(vredor_vs, reduction_check)
 GEN_OPIVV_TRANS(vredxor_vs, reduction_check)
+
+/* Vector Widening Integer Reduction Instructions */
+GEN_OPIVV_WIDEN_TRANS(vwredsum_vs, reduction_check)
+GEN_OPIVV_WIDEN_TRANS(vwredsumu_vs, reduction_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 789be79b5a..f2ded5adc6 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4138,3 +4138,14 @@ GEN_VEXT_RED(vredxor_vs_b, int8_t, int8_t, H1, H1, DO_XOR, clearb)
 GEN_VEXT_RED(vredxor_vs_h, int16_t, int16_t, H2, H2, DO_XOR, clearh)
 GEN_VEXT_RED(vredxor_vs_w, int32_t, int32_t, H4, H4, DO_XOR, clearl)
 GEN_VEXT_RED(vredxor_vs_d, int64_t, int64_t, H8, H8, DO_XOR, clearq)
+
+/* Vector Widening Integer Reduction Instructions */
+/* Signed sum reduction into double-width accumulator */
+GEN_VEXT_RED(vwredsum_vs_b, int16_t, int8_t, H2, H1, DO_ADD, clearh)
+GEN_VEXT_RED(vwredsum_vs_h, int32_t, int16_t, H4, H2, DO_ADD, clearl)
+GEN_VEXT_RED(vwredsum_vs_w, int64_t, int32_t, H8, H4, DO_ADD, clearq)
+
+/* Unsigned sum reduction into double-width accumulator */
+GEN_VEXT_RED(vwredsumu_vs_b, uint16_t, uint8_t, H2, H1, DO_ADD, clearh)
+GEN_VEXT_RED(vwredsumu_vs_h, uint32_t, uint16_t, H4, H2, DO_ADD, clearl)
+GEN_VEXT_RED(vwredsumu_vs_w, uint64_t, uint32_t, H8, H4, DO_ADD, clearq)
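
The widening variants follow the same loop, but the accumulator is twice the source element width, so every vs2 element is sign- or zero-extended before it is added. A minimal sketch for vwredsum.vs with SEW=8 (invented names, not part of the patch):

#include <stdint.h>
#include <stddef.h>

/* Sketch: signed widening sum reduction, SEW=8 -> 16-bit accumulator. */
static int16_t wredsum_sketch(int16_t vs1_0 /* already double width */,
                              const int8_t *vs2, size_t vl)
{
    int16_t acc = vs1_0;
    for (size_t i = 0; i < vl; i++) {
        acc += (int16_t)vs2[i];     /* widen first, then accumulate */
    }
    return acc;                     /* stored to vd[0] as a 16-bit element */
}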
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 46/60] target/riscv: vector single-width floating-point reduction instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   | 10 +++++++
 target/riscv/insn32.decode              |  4 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  5 ++++
 target/riscv/vector_helper.c            | 39 +++++++++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 76435f90a9..0a1aa30514 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1061,3 +1061,13 @@ DEF_HELPER_6(vwredsumu_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vwredsum_vs_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vwredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vwredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vfredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredsum_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmax_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmax_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmax_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmin_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmin_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfredmin_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e6a354c134..294e55b7ae 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -533,6 +533,10 @@ vredmaxu_vs     000110 . ..... ..... 010 ..... 1010111 @r_vm
 vredmax_vs      000111 . ..... ..... 010 ..... 1010111 @r_vm
 vwredsumu_vs    110000 . ..... ..... 000 ..... 1010111 @r_vm
 vwredsum_vs     110001 . ..... ..... 000 ..... 1010111 @r_vm
+# Vector ordered and unordered reduction sum
+vfredsum_vs     0000-1 . ..... ..... 001 ..... 1010111 @r_vm
+vfredmin_vs     000101 . ..... ..... 001 ..... 1010111 @r_vm
+vfredmax_vs     000111 . ..... ..... 001 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 195c460cb8..d66ec4a1e4 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2044,3 +2044,8 @@ GEN_OPIVV_TRANS(vredxor_vs, reduction_check)
 /* Vector Widening Integer Reduction Instructions */
 GEN_OPIVV_WIDEN_TRANS(vwredsum_vs, reduction_check)
 GEN_OPIVV_WIDEN_TRANS(vwredsumu_vs, reduction_check)
+
+/* Vector Single-Width Floating-Point Reduction Instructions */
+GEN_OPFVV_TRANS(vfredsum_vs, reduction_check)
+GEN_OPFVV_TRANS(vfredmax_vs, reduction_check)
+GEN_OPFVV_TRANS(vfredmin_vs, reduction_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f2ded5adc6..948135f60b 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4149,3 +4149,42 @@ GEN_VEXT_RED(vwredsum_vs_w, int64_t, int32_t, H8, H4, DO_ADD, clearq)
 GEN_VEXT_RED(vwredsumu_vs_b, uint16_t, uint8_t, H2, H1, DO_ADD, clearh)
 GEN_VEXT_RED(vwredsumu_vs_h, uint32_t, uint16_t, H4, H2, DO_ADD, clearl)
 GEN_VEXT_RED(vwredsumu_vs_w, uint64_t, uint32_t, H8, H4, DO_ADD, clearq)
+
+/* Vector Single-Width Floating-Point Reduction Instructions */
+#define GEN_VEXT_FRED(NAME, TD, TS2, HD, HS2, OP, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, void *vs1,           \
+        void *vs2, CPURISCVState *env, uint32_t desc)      \
+{                                                          \
+    uint32_t mlen = vext_mlen(desc);                       \
+    uint32_t vm = vext_vm(desc);                           \
+    uint32_t vl = env->vl;                                 \
+    uint32_t i;                                            \
+    uint32_t tot = env_archcpu(env)->cfg.vlen / 8;         \
+                                                           \
+    TD s1 =  *((TD *)vs1 + HD(0));                         \
+    for (i = 0; i < vl; i++) {                             \
+        TS2 s2 = *((TS2 *)vs2 + HS2(i));                   \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {         \
+            continue;                                      \
+        }                                                  \
+        s1 = OP(s1, (TD)s2, &env->fp_status);              \
+    }                                                      \
+    if (i != 0) {                                          \
+        *((TD *)vd + HD(0)) = s1;                          \
+        CLEAR_FN(vd, 1, sizeof(TD), tot);                  \
+    }                                                      \
+}
+/* Unordered sum */
+GEN_VEXT_FRED(vfredsum_vs_h, uint16_t, uint16_t, H2, H2, float16_add, clearh)
+GEN_VEXT_FRED(vfredsum_vs_w, uint32_t, uint32_t, H4, H4, float32_add, clearl)
+GEN_VEXT_FRED(vfredsum_vs_d, uint64_t, uint64_t, H8, H8, float64_add, clearq)
+
+/* Maximum value */
+GEN_VEXT_FRED(vfredmax_vs_h, uint16_t, uint16_t, H2, H2, float16_maxnum, clearh)
+GEN_VEXT_FRED(vfredmax_vs_w, uint32_t, uint32_t, H4, H4, float32_maxnum, clearl)
+GEN_VEXT_FRED(vfredmax_vs_d, uint64_t, uint64_t, H8, H8, float64_maxnum, clearq)
+
+/* Minimum value */
+GEN_VEXT_FRED(vfredmin_vs_h, uint16_t, uint16_t, H2, H2, float16_minnum, clearh)
+GEN_VEXT_FRED(vfredmin_vs_w, uint32_t, uint32_t, H4, H4, float32_minnum, clearl)
+GEN_VEXT_FRED(vfredmin_vs_d, uint64_t, uint64_t, H8, H8, float64_minnum, clearq)
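
For reference, the vfredsum_vs_h instantiation above expands to roughly the following (hand-expanded here purely for illustration, not an additional hunk):

void HELPER(vfredsum_vs_h)(void *vd, void *v0, void *vs1,
        void *vs2, CPURISCVState *env, uint32_t desc)
{
    uint32_t mlen = vext_mlen(desc);
    uint32_t vm = vext_vm(desc);
    uint32_t vl = env->vl;
    uint32_t i;
    uint32_t tot = env_archcpu(env)->cfg.vlen / 8;

    uint16_t s1 = *((uint16_t *)vs1 + H2(0));       /* scalar operand vs1[0] */
    for (i = 0; i < vl; i++) {
        uint16_t s2 = *((uint16_t *)vs2 + H2(i));
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;                                /* skip inactive elements */
        }
        s1 = float16_add(s1, s2, &env->fp_status);   /* accumulate under FP flags */
    }
    if (i != 0) {                                    /* no writeback when vl == 0 */
        *((uint16_t *)vd + H2(0)) = s1;
        clearh(vd, 1, sizeof(uint16_t), tot);        /* zero the tail */
    }
}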
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 47/60] target/riscv: vector widening floating-point reduction instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  3 ++
 target/riscv/insn32.decode              |  2 +
 target/riscv/insn_trans/trans_rvv.inc.c |  3 ++
 target/riscv/vector_helper.c            | 50 +++++++++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0a1aa30514..b0bb617b42 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1071,3 +1071,6 @@ DEF_HELPER_6(vfredmax_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfredmin_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfredmin_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfredmin_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vfwredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 294e55b7ae..f1efc8886d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -537,6 +537,8 @@ vwredsum_vs     110001 . ..... ..... 000 ..... 1010111 @r_vm
 vfredsum_vs     0000-1 . ..... ..... 001 ..... 1010111 @r_vm
 vfredmin_vs     000101 . ..... ..... 001 ..... 1010111 @r_vm
 vfredmax_vs     000111 . ..... ..... 001 ..... 1010111 @r_vm
+# Vector widening ordered and unordered float reduction sum
+vfwredsum_vs    1100-1 . ..... ..... 001 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index d66ec4a1e4..ad864c9742 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2049,3 +2049,6 @@ GEN_OPIVV_WIDEN_TRANS(vwredsumu_vs, reduction_check)
 GEN_OPFVV_TRANS(vfredsum_vs, reduction_check)
 GEN_OPFVV_TRANS(vfredmax_vs, reduction_check)
 GEN_OPFVV_TRANS(vfredmin_vs, reduction_check)
+
+/* Vector Widening Floating-Point Reduction Instructions */
+GEN_OPFVV_WIDEN_TRANS(vfwredsum_vs, reduction_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 948135f60b..d325fe5e2e 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4188,3 +4188,53 @@ GEN_VEXT_FRED(vfredmax_vs_d, uint64_t, uint64_t, H8, H8, float64_maxnum, clearq)
 GEN_VEXT_FRED(vfredmin_vs_h, uint16_t, uint16_t, H2, H2, float16_minnum, clearh)
 GEN_VEXT_FRED(vfredmin_vs_w, uint32_t, uint32_t, H4, H4, float32_minnum, clearl)
 GEN_VEXT_FRED(vfredmin_vs_d, uint64_t, uint64_t, H8, H8, float64_minnum, clearq)
+
+/* Vector Widening Floating-Point Reduction Instructions */
+/* Unordered reduce 2*SEW = 2*SEW + sum(promote(SEW)) */
+void HELPER(vfwredsum_vs_h)(void *vd, void *v0, void *vs1,
+        void *vs2, CPURISCVState *env, uint32_t desc)
+{
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    uint32_t i;
+    uint32_t tot = env_archcpu(env)->cfg.vlen / 8;
+
+    uint32_t s1 =  *((uint32_t *)vs1 + H4(0));
+    for (i = 0; i < vl; i++) {
+        uint16_t s2 = *((uint16_t *)vs2 + H2(i));
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        s1 = float32_add(s1, float16_to_float32(s2, true, &env->fp_status),
+                &env->fp_status);
+    }
+    if (i != 0) {
+        *((uint32_t *)vd + H4(0)) = s1;
+        clearl(vd, 1, sizeof(uint32_t), tot);
+    }
+}
+
+void HELPER(vfwredsum_vs_w)(void *vd, void *v0, void *vs1,
+        void *vs2, CPURISCVState *env, uint32_t desc)
+{
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    uint32_t i;
+    uint32_t tot = env_archcpu(env)->cfg.vlen / 8;
+
+    uint64_t s1 =  *((uint64_t *)vs1);
+    for (i = 0; i < vl; i++) {
+        uint32_t s2 = *((uint32_t *)vs2 + H4(i));
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        s1 = float64_add(s1, float32_to_float64(s2, &env->fp_status),
+                &env->fp_status);
+    }
+    if (i != 0) {
+        *((uint64_t *)vd) = s1;
+        clearq(vd, 1, sizeof(uint64_t), tot);
+    }
+}
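
The promote-then-accumulate shape of these two helpers is the same as the plain-C pattern below; only the softfloat calls and masking differ (sketch only, names invented):

#include <stddef.h>

/* Sketch: widening FP reduction -- promote each SEW element to 2*SEW
 * before adding it into the 2*SEW accumulator taken from vs1[0]. */
static double fwredsum_sketch(double vs1_0, const float *vs2, size_t vl)
{
    double acc = vs1_0;
    for (size_t i = 0; i < vl; i++) {
        acc += (double)vs2[i];  /* float16->float32 / float32->float64 in the helpers */
    }
    return acc;
}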
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 48/60] target/riscv: vector mask-register logical instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 ++++++
 target/riscv/insn32.decode              |  8 +++++
 target/riscv/insn_trans/trans_rvv.inc.c | 28 +++++++++++++++++
 target/riscv/vector_helper.c            | 40 +++++++++++++++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b0bb617b42..9301ce0e00 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1074,3 +1074,12 @@ DEF_HELPER_6(vfredmin_vs_d, void, ptr, ptr, ptr, ptr, env, i32)
 
 DEF_HELPER_6(vfwredsum_vs_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwredsum_vs_w, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vmand_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmnand_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmandnot_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmxor_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmor_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmornot_mm, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmxnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f1efc8886d..76a9bae8bb 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -539,6 +539,14 @@ vfredmin_vs     000101 . ..... ..... 001 ..... 1010111 @r_vm
 vfredmax_vs     000111 . ..... ..... 001 ..... 1010111 @r_vm
 # Vector widening ordered and unordered float reduction sum
 vfwredsum_vs    1100-1 . ..... ..... 001 ..... 1010111 @r_vm
+vmand_mm        011001 - ..... ..... 010 ..... 1010111 @r
+vmnand_mm       011101 - ..... ..... 010 ..... 1010111 @r
+vmandnot_mm     011000 - ..... ..... 010 ..... 1010111 @r
+vmxor_mm        011011 - ..... ..... 010 ..... 1010111 @r
+vmor_mm         011010 - ..... ..... 010 ..... 1010111 @r
+vmnor_mm        011110 - ..... ..... 010 ..... 1010111 @r
+vmornot_mm      011100 - ..... ..... 010 ..... 1010111 @r
+vmxnor_mm       011111 - ..... ..... 010 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index ad864c9742..065b415abb 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2052,3 +2052,31 @@ GEN_OPFVV_TRANS(vfredmin_vs, reduction_check)
 
 /* Vector Widening Floating-Point Reduction Instructions */
 GEN_OPFVV_WIDEN_TRANS(vfwredsum_vs, reduction_check)
+
+/*
+ *** Vector Mask Operations
+ */
+/* Vector Mask-Register Logical Instructions */
+#define GEN_MM_TRANS(NAME)                                         \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
+{                                                                  \
+    if (vext_check_isa_ill(s, RVV)) {                              \
+        uint32_t data = 0;                                         \
+        gen_helper_gvec_4_ptr * fn = gen_helper_##NAME;            \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),     \
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),              \
+            cpu_env, 0, s->vlen / 8, data, fn);                    \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_MM_TRANS(vmand_mm)
+GEN_MM_TRANS(vmnand_mm)
+GEN_MM_TRANS(vmandnot_mm)
+GEN_MM_TRANS(vmxor_mm)
+GEN_MM_TRANS(vmor_mm)
+GEN_MM_TRANS(vmnor_mm)
+GEN_MM_TRANS(vmornot_mm)
+GEN_MM_TRANS(vmxnor_mm)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index d325fe5e2e..9e9d172cda 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4238,3 +4238,43 @@ void HELPER(vfwredsum_vs_w)(void *vd, void *v0, void *vs1,
         clearq(vd, 1, sizeof(uint64_t), tot);
     }
 }
+
+/*
+ *** Vector Mask Operations
+ */
+/* Vector Mask-Register Logical Instructions */
+#define GEN_VEXT_MASK_VV(NAME, OP)                        \
+void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
+        void *vs2, CPURISCVState *env, uint32_t desc)     \
+{                                                         \
+    uint32_t mlen = vext_mlen(desc);                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+    uint32_t vl = env->vl;                                \
+    uint32_t i;                                           \
+    int a, b;                                             \
+    for (i = 0; i < vl; i++) {                            \
+        a = vext_elem_mask(vs1, mlen, i);                 \
+        b = vext_elem_mask(vs2, mlen, i);                 \
+        vext_set_elem_mask(vd, mlen, i, OP(b, a));        \
+    }                                                     \
+    if (i == 0) {                                         \
+        return;                                           \
+    }                                                     \
+    for (; i < vlmax; i++) {                              \
+        vext_set_elem_mask(vd, mlen, i, 0);               \
+    }                                                     \
+}
+#define DO_NAND(N, M)  (!(N & M))
+#define DO_ANDNOT(N, M)  (N & !M)
+#define DO_NOR(N, M)  (!(N | M))
+#define DO_ORNOT(N, M)  (N | !M)
+#define DO_XNOR(N, M)  (!(N ^ M))
+
+GEN_VEXT_MASK_VV(vmand_mm, DO_AND)
+GEN_VEXT_MASK_VV(vmnand_mm, DO_NAND)
+GEN_VEXT_MASK_VV(vmandnot_mm, DO_ANDNOT)
+GEN_VEXT_MASK_VV(vmxor_mm, DO_XOR)
+GEN_VEXT_MASK_VV(vmor_mm, DO_OR)
+GEN_VEXT_MASK_VV(vmnor_mm, DO_NOR)
+GEN_VEXT_MASK_VV(vmornot_mm, DO_ORNOT)
+GEN_VEXT_MASK_VV(vmxnor_mm, DO_XNOR)
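
A note on the DO_* helpers: vext_elem_mask() returns a single mask bit as a 0/1 int, so logical '!' is the right negation here; a bitwise '~' on an int would not stay within {0, 1}. A small self-check of that, purely illustrative and not part of the patch:

#include <assert.h>

#define DO_NAND(N, M)   (!(N & M))
#define DO_ANDNOT(N, M) (N & !M)

int main(void)
{
    /* operands are single mask bits, i.e. 0 or 1 */
    assert(DO_NAND(1, 1) == 0);
    assert(DO_NAND(0, 1) == 1);
    assert(DO_ANDNOT(1, 0) == 1);   /* vs2-bit AND NOT vs1-bit */
    assert(DO_ANDNOT(1, 1) == 0);
    return 0;
}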
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 49/60] target/riscv: vector mask population count vmpopc
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  2 ++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 32 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 20 ++++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9301ce0e00..3f6b8ab451 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1083,3 +1083,5 @@ DEF_HELPER_6(vmor_mm, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vmnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vmornot_mm, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vmxnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_4(vmpopc_m, tl, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 76a9bae8bb..eac767ad82 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -547,6 +547,7 @@ vmor_mm         011010 - ..... ..... 010 ..... 1010111 @r
 vmnor_mm        011110 - ..... ..... 010 ..... 1010111 @r
 vmornot_mm      011100 - ..... ..... 010 ..... 1010111 @r
 vmxnor_mm       011111 - ..... ..... 010 ..... 1010111 @r
+vmpopc_m        010100 . ..... ----- 010 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 065b415abb..c56f30a257 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2080,3 +2080,35 @@ GEN_MM_TRANS(vmor_mm)
 GEN_MM_TRANS(vmnor_mm)
 GEN_MM_TRANS(vmornot_mm)
 GEN_MM_TRANS(vmxnor_mm)
+
+/* Vector mask population count vmpopc */
+static bool trans_vmpopc_m(DisasContext *s, arg_rmr *a)
+{
+    if (vext_check_isa_ill(s, RVV)) {
+        TCGv_ptr src2, mask;
+        TCGv dst;
+        TCGv_i32 desc;
+        uint32_t data = 0;
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+        data = FIELD_DP32(data, VDATA, VM, a->vm);
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+
+        mask = tcg_temp_new_ptr();
+        src2 = tcg_temp_new_ptr();
+        dst = tcg_temp_new();
+        desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
+        tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+        gen_helper_vmpopc_m(dst, mask, src2, cpu_env, desc);
+        gen_set_gpr(a->rd, dst);
+
+        tcg_temp_free_ptr(mask);
+        tcg_temp_free_ptr(src2);
+        tcg_temp_free(dst);
+        tcg_temp_free_i32(desc);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 9e9d172cda..4bd901e826 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4278,3 +4278,23 @@ GEN_VEXT_MASK_VV(vmor_mm, DO_OR)
 GEN_VEXT_MASK_VV(vmnor_mm, DO_NOR)
 GEN_VEXT_MASK_VV(vmornot_mm, DO_ORNOT)
 GEN_VEXT_MASK_VV(vmxnor_mm, DO_XNOR)
+
+/* Vector mask population count vmpopc */
+target_ulong HELPER(vmpopc_m)(void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc)
+{
+    target_ulong cnt = 0;
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    int i;
+
+    for (i = 0; i < vl; i++) {
+        if (vm || vext_elem_mask(v0, mlen, i)) {
+            if (vext_elem_mask(vs2, mlen, i)) {
+                cnt++;
+            }
+        }
+    }
+    return cnt;
+}
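
In scalar terms the helper computes the following (sketch only; the 0/1 arrays stand in for vext_elem_mask() lookups):

#include <stddef.h>
#include <stdint.h>

/* Sketch: vmpopc.m -- count the vs2 mask bits that are set among the
 * active elements; the count goes to an integer register. */
static uint64_t vmpopc_sketch(const unsigned char *v0_bits,
                              const unsigned char *vs2_bits,
                              size_t vl, int vm)
{
    uint64_t cnt = 0;
    for (size_t i = 0; i < vl; i++) {
        if ((vm || v0_bits[i]) && vs2_bits[i]) {
            cnt++;
        }
    }
    return cnt;
}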
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 50/60] target/riscv: vmfirst find-first-set mask bit
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  2 ++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 32 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 19 +++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3f6b8ab451..363bc52dc4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1085,3 +1085,5 @@ DEF_HELPER_6(vmornot_mm, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vmxnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
 
 DEF_HELPER_4(vmpopc_m, tl, ptr, ptr, env, i32)
+
+DEF_HELPER_4(vmfirst_m, tl, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index eac767ad82..328a6c75bb 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -548,6 +548,7 @@ vmnor_mm        011110 - ..... ..... 010 ..... 1010111 @r
 vmornot_mm      011100 - ..... ..... 010 ..... 1010111 @r
 vmxnor_mm       011111 - ..... ..... 010 ..... 1010111 @r
 vmpopc_m        010100 . ..... ----- 010 ..... 1010111 @r2_vm
+vmfirst_m       010101 . ..... ----- 010 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index c56f30a257..265d94245f 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2112,3 +2112,35 @@ static bool trans_vmpopc_m(DisasContext *s, arg_rmr *a)
     }
     return false;
 }
+
+/* vmfirst find-first-set mask bit */
+static bool trans_vmfirst_m(DisasContext *s, arg_rmr *a)
+{
+    if (vext_check_isa_ill(s, RVV)) {
+        TCGv_ptr src2, mask;
+        TCGv dst;
+        TCGv_i32 desc;
+        uint32_t data = 0;
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+        data = FIELD_DP32(data, VDATA, VM, a->vm);
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+
+        mask = tcg_temp_new_ptr();
+        src2 = tcg_temp_new_ptr();
+        dst = tcg_temp_new();
+        desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
+        tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+        gen_helper_vmfirst_m(dst, mask, src2, cpu_env, desc);
+        gen_set_gpr(a->rd, dst);
+
+        tcg_temp_free_ptr(mask);
+        tcg_temp_free_ptr(src2);
+        tcg_temp_free(dst);
+        tcg_temp_free_i32(desc);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 4bd901e826..8a3f8ccdec 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4298,3 +4298,22 @@ target_ulong HELPER(vmpopc_m)(void *v0, void *vs2, CPURISCVState *env,
     }
     return cnt;
 }
+
+/* vmfirst find-first-set mask bit */
+target_ulong HELPER(vmfirst_m)(void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc)
+{
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    int i;
+
+    for (i = 0; i < vl; i++) {
+        if (vm || vext_elem_mask(v0, mlen, i)) {
+            if (vext_elem_mask(vs2, mlen, i)) {
+               return i;
+            }
+        }
+    }
+    return -1LL;
+}
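
The same picture for vmfirst.m, again with 0/1 arrays standing in for the mask accessors (illustrative only):

#include <stddef.h>

/* Sketch: vmfirst.m -- index of the first active element whose vs2 mask
 * bit is set, or -1 when no such element exists (also when vl == 0). */
static long vmfirst_sketch(const unsigned char *v0_bits,
                           const unsigned char *vs2_bits,
                           size_t vl, int vm)
{
    for (size_t i = 0; i < vl; i++) {
        if ((vm || v0_bits[i]) && vs2_bits[i]) {
            return (long)i;
        }
    }
    return -1;
}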
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* [PATCH v5 51/60] target/riscv: set-X-first mask bit
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  4 ++
 target/riscv/insn32.decode              |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 23 +++++++++
 target/riscv/vector_helper.c            | 66 +++++++++++++++++++++++++
 4 files changed, 96 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 363bc52dc4..2da967b33b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1087,3 +1087,7 @@ DEF_HELPER_6(vmxnor_mm, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_4(vmpopc_m, tl, ptr, ptr, env, i32)
 
 DEF_HELPER_4(vmfirst_m, tl, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vmsbf_m, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vmsif_m, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vmsof_m, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 328a6c75bb..b2a11441c8 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -549,6 +549,9 @@ vmornot_mm      011100 - ..... ..... 010 ..... 1010111 @r
 vmxnor_mm       011111 - ..... ..... 010 ..... 1010111 @r
 vmpopc_m        010100 . ..... ----- 010 ..... 1010111 @r2_vm
 vmfirst_m       010101 . ..... ----- 010 ..... 1010111 @r2_vm
+vmsbf_m         010110 . ..... 00001 010 ..... 1010111 @r2_vm
+vmsif_m         010110 . ..... 00011 010 ..... 1010111 @r2_vm
+vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 265d94245f..c1f4e27743 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2144,3 +2144,26 @@ static bool trans_vmfirst_m(DisasContext *s, arg_rmr *a)
     }
     return false;
 }
+
+/* vmsbf.m set-before-first mask bit */
+/* vmsif.m set-include-first mask bit */
+/* vmsof.m set-only-first mask bit */
+#define GEN_M_TRANS(NAME)                                          \
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
+{                                                                  \
+    if (vext_check_isa_ill(s, RVV)) {                              \
+        uint32_t data = 0;                                         \
+        gen_helper_gvec_3_ptr * fn = gen_helper_##NAME;            \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);             \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                 \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);             \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd),                     \
+            vreg_ofs(s, 0), vreg_ofs(s, a->rs2),                   \
+            cpu_env, 0, s->vlen / 8, data, fn);                    \
+        return true;                                               \
+    }                                                              \
+    return false;                                                  \
+}
+GEN_M_TRANS(vmsbf_m)
+GEN_M_TRANS(vmsif_m)
+GEN_M_TRANS(vmsof_m)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8a3f8ccdec..073f5dea6a 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4317,3 +4317,69 @@ target_ulong HELPER(vmfirst_m)(void *v0, void *vs2, CPURISCVState *env,
     }
     return -1LL;
 }
+
+enum set_mask_type {
+    ONLY_FIRST = 1,
+    INCLUDE_FIRST,
+    BEFORE_FIRST,
+};
+
+static void vmsetm(void *vd, void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc, enum set_mask_type type)
+{
+    uint32_t mlen = vext_mlen(desc);
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    int i;
+    bool first_mask_bit = false;
+
+    for (i = 0; i < vl; i++) {
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {
+            continue;
+        }
+        /* write a zero to all following active elements */
+        if (first_mask_bit) {
+            vext_set_elem_mask(vd, mlen, i, 0);
+            continue;
+        }
+        if (vext_elem_mask(vs2, mlen, i)) {
+            first_mask_bit = true;
+            if (type == BEFORE_FIRST) {
+                vext_set_elem_mask(vd, mlen, i, 0);
+            } else {
+                vext_set_elem_mask(vd, mlen, i, 1);
+            }
+        } else {
+            if (type == ONLY_FIRST) {
+                vext_set_elem_mask(vd, mlen, i, 0);
+            } else {
+                vext_set_elem_mask(vd, mlen, i, 1);
+            }
+        }
+    }
+    if (i == 0) { /* vector register writeback is cancelled when vl == 0 */
+        return;
+    }
+    for (; i < vlmax; i++) {
+        vext_set_elem_mask(vd, mlen, i, 0);
+    }
+}
+
+void HELPER(vmsbf_m)(void *vd, void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc)
+{
+    vmsetm(vd, v0, vs2, env, desc, BEFORE_FIRST);
+}
+
+void HELPER(vmsif_m)(void *vd, void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc)
+{
+    vmsetm(vd, v0, vs2, env, desc, INCLUDE_FIRST);
+}
+
+void HELPER(vmsof_m)(void *vd, void *v0, void *vs2, CPURISCVState *env,
+        uint32_t desc)
+{
+    vmsetm(vd, v0, vs2, env, desc, ONLY_FIRST);
+}
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
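
For reference, a minimal standalone sketch of the vmsbf/vmsif/vmsof
semantics on a plain bit array (unmasked case only); the function and
variable names are illustrative and independent of the QEMU helpers in
the patch above:

    #include <stdio.h>
    #include <stdbool.h>

    /* type 0: set-before-first, 1: set-including-first, 2: set-only-first */
    static void set_x_first(const bool *src, bool *dst, int vl, int type)
    {
        bool seen = false;
        for (int i = 0; i < vl; i++) {
            if (!seen && src[i]) {
                seen = true;
                dst[i] = (type != 0);          /* vmsbf clears the first set bit */
            } else {
                dst[i] = !seen && (type != 2); /* ones before the first set bit
                                                  (except vmsof), zeros after it */
            }
        }
    }

    int main(void)
    {
        bool src[8] = {0, 0, 1, 0, 1, 0, 0, 1};
        bool dst[8];
        const char *name[3] = {"vmsbf", "vmsif", "vmsof"};

        for (int t = 0; t < 3; t++) {
            set_x_first(src, dst, 8, t);
            printf("%s: ", name[t]);
            for (int i = 0; i < 8; i++) {
                printf("%d", dst[i]);
            }
            printf("\n"); /* vmsbf: 11000000, vmsif: 11100000, vmsof: 00100000 */
        }
        return 0;
    }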

* [PATCH v5 52/60] target/riscv: vector iota instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 ++++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 22 ++++++++++++++++++
 target/riscv/vector_helper.c            | 31 +++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 2da967b33b..1a7653a431 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1091,3 +1091,8 @@ DEF_HELPER_4(vmfirst_m, tl, ptr, ptr, env, i32)
 DEF_HELPER_5(vmsbf_m, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vmsif_m, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vmsof_m, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(viota_m_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(viota_m_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(viota_m_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(viota_m_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b2a11441c8..00b8d2df9d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -552,6 +552,7 @@ vmfirst_m       010101 . ..... ----- 010 ..... 1010111 @r2_vm
 vmsbf_m         010110 . ..... 00001 010 ..... 1010111 @r2_vm
 vmsif_m         010110 . ..... 00011 010 ..... 1010111 @r2_vm
 vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
+viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index c1f4e27743..c7be689553 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2167,3 +2167,25 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
 GEN_M_TRANS(vmsbf_m)
 GEN_M_TRANS(vmsif_m)
 GEN_M_TRANS(vmsof_m)
+
+/* Vector Iota Instruction */
+static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
+{
+    if (vext_check_isa_ill(s, RVV) &&
+        vext_check_reg(s, a->rd, false) &&
+        vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2, 1) &&
+        (a->vm != 0 || a->rd != 0)) {
+        uint32_t data = 0;
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+        data = FIELD_DP32(data, VDATA, VM, a->vm);
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+        static gen_helper_gvec_3_ptr * const fns[4] = {
+            gen_helper_viota_m_b, gen_helper_viota_m_h,
+            gen_helper_viota_m_w, gen_helper_viota_m_d,
+        };
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),
+            vreg_ofs(s, a->rs2), cpu_env, 0, s->vlen / 8, data, fns[s->sew]);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 073f5dea6a..6089be92fa 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4383,3 +4383,34 @@ void HELPER(vmsof_m)(void *vd, void *v0, void *vs2, CPURISCVState *env,
 {
     vmsetm(vd, v0, vs2, env, desc, ONLY_FIRST);
 }
+
+/* Vector Iota Instruction */
+#define GEN_VEXT_VIOTA_M(NAME, ETYPE, H, CLEAR_FN)                        \
+void HELPER(NAME)(void *vd, void *v0, void *vs2, CPURISCVState *env,      \
+        uint32_t desc)                                                    \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t sum = 0;                                                     \
+    int i;                                                                \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        *((ETYPE *)vd + H(i)) = sum;                                      \
+        if (vext_elem_mask(vs2, mlen, i)) {                               \
+            sum++;                                                        \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));          \
+}
+GEN_VEXT_VIOTA_M(viota_m_b, uint8_t, H1, clearb)
+GEN_VEXT_VIOTA_M(viota_m_h, uint16_t, H2, clearh)
+GEN_VEXT_VIOTA_M(viota_m_w, uint32_t, H4, clearl)
+GEN_VEXT_VIOTA_M(viota_m_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
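
As a quick illustration of the iota semantics implemented above (an
exclusive prefix count of the set bits in vs2), here is a minimal
standalone sketch with illustrative names, ignoring masking and tail
handling:

    #include <stdio.h>
    #include <stdint.h>

    /* vd[i] = number of set mask bits in vs2[0..i-1] */
    static void viota(const uint8_t *vs2_mask, uint32_t *vd, int vl)
    {
        uint32_t sum = 0;
        for (int i = 0; i < vl; i++) {
            vd[i] = sum;
            sum += vs2_mask[i] & 1;
        }
    }

    int main(void)
    {
        uint8_t mask[6] = {1, 0, 1, 1, 0, 1};
        uint32_t out[6];

        viota(mask, out, 6);
        for (int i = 0; i < 6; i++) {
            printf("%u ", out[i]); /* prints: 0 1 1 2 3 3 */
        }
        printf("\n");
        return 0;
    }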

* [PATCH v5 53/60] target/riscv: vector element index instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++++
 target/riscv/insn32.decode              |  2 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 21 ++++++++++++++++++++
 target/riscv/vector_helper.c            | 26 +++++++++++++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1a7653a431..e3f2970221 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1096,3 +1096,8 @@ DEF_HELPER_5(viota_m_b, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(viota_m_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(viota_m_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(viota_m_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_4(vid_v_b, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vid_v_h, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vid_v_w, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vid_v_d, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 00b8d2df9d..1504059415 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -70,6 +70,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
+@r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
 @r_vm    ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd
 @r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
@@ -553,6 +554,7 @@ vmsbf_m         010110 . ..... 00001 010 ..... 1010111 @r2_vm
 vmsif_m         010110 . ..... 00011 010 ..... 1010111 @r2_vm
 vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
 viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
+vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index c7be689553..1ff72a6406 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2189,3 +2189,24 @@ static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
     }
     return false;
 }
+
+/* Vector Element Index Instruction */
+static bool trans_vid_v(DisasContext *s, arg_vid_v *a)
+{
+    if (vext_check_isa_ill(s, RVV) &&
+        vext_check_reg(s, a->rd, false) &&
+        vext_check_overlap_mask(s, a->rd, a->vm, false)) {
+        uint32_t data = 0;
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+        data = FIELD_DP32(data, VDATA, VM, a->vm);
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+        static gen_helper_gvec_2_ptr * const fns[4] = {
+            gen_helper_vid_v_b, gen_helper_vid_v_h,
+            gen_helper_vid_v_w, gen_helper_vid_v_d,
+        };
+        tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 6089be92fa..ff3b60e9c8 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4414,3 +4414,29 @@ GEN_VEXT_VIOTA_M(viota_m_b, uint8_t, H1, clearb)
 GEN_VEXT_VIOTA_M(viota_m_h, uint16_t, H2, clearh)
 GEN_VEXT_VIOTA_M(viota_m_w, uint32_t, H4, clearl)
 GEN_VEXT_VIOTA_M(viota_m_d, uint64_t, H8, clearq)
+
+/* Vector Element Index Instruction */
+#define GEN_VEXT_VID_V(NAME, ETYPE, H, CLEAR_FN)                          \
+void HELPER(NAME)(void *vd, void *v0, CPURISCVState *env, uint32_t desc)  \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    int i;                                                                \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        *((ETYPE *)vd + H(i)) = i;                                        \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));          \
+}
+GEN_VEXT_VID_V(vid_v_b, uint8_t, H1, clearb)
+GEN_VEXT_VID_V(vid_v_h, uint16_t, H2, clearh)
+GEN_VEXT_VID_V(vid_v_w, uint32_t, H4, clearl)
+GEN_VEXT_VID_V(vid_v_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
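
The vid.v helper above is essentially "vd[i] = i" for active elements;
the part specific to v0.7.1 is that masked-off elements are left
undisturbed and the tail from vl to VLMAX is zeroed by CLEAR_FN. A
minimal standalone sketch with illustrative names, fixed to 32-bit
elements:

    #include <stdint.h>
    #include <string.h>

    static void vid(uint32_t *vd, const uint8_t *v0_mask, int vm,
                    int vl, int vlmax)
    {
        for (int i = 0; i < vl; i++) {
            if (vm || (v0_mask[i] & 1)) {
                vd[i] = (uint32_t)i;   /* active element gets its index */
            }                          /* inactive element left undisturbed */
        }
        if (vl != 0) {                 /* no writeback at all when vl == 0 */
            memset(&vd[vl], 0, (size_t)(vlmax - vl) * sizeof(uint32_t));
        }
    }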

* [PATCH v5 54/60] target/riscv: integer extract instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 ++++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 33 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 20 +++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e3f2970221..d94347a9a5 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1101,3 +1101,8 @@ DEF_HELPER_4(vid_v_b, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vid_v_h, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vid_v_w, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vid_v_d, void, ptr, ptr, env, i32)
+
+DEF_HELPER_3(vext_x_v_b, tl, ptr, tl, env)
+DEF_HELPER_3(vext_x_v_h, tl, ptr, tl, env)
+DEF_HELPER_3(vext_x_v_w, tl, ptr, tl, env)
+DEF_HELPER_3(vext_x_v_d, tl, ptr, tl, env)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 1504059415..c26a186d6a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -555,6 +555,7 @@ vmsif_m         010110 . ..... 00011 010 ..... 1010111 @r2_vm
 vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
 viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
 vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
+vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 1ff72a6406..46651dfb10 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2210,3 +2210,36 @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a)
     }
     return false;
 }
+
+/*
+ *** Vector Permutation Instructions
+ */
+/* Integer Extract Instruction */
+typedef void (* gen_helper_vext_x_v)(TCGv, TCGv_ptr, TCGv, TCGv_env);
+static bool trans_vext_x_v(DisasContext *s, arg_r *a)
+{
+    if (vext_check_isa_ill(s, RVV)) {
+        TCGv_ptr src2;
+        TCGv dest, src1;
+        gen_helper_vext_x_v fns[4] = {
+            gen_helper_vext_x_v_b, gen_helper_vext_x_v_h,
+            gen_helper_vext_x_v_w, gen_helper_vext_x_v_d
+        };
+
+        dest = tcg_temp_new();
+        src1 = tcg_temp_new();
+        src2 = tcg_temp_new_ptr();
+
+        gen_get_gpr(src1, a->rs1);
+        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
+
+        fns[s->sew](dest, src2, src1, cpu_env);
+        gen_set_gpr(a->rd, dest);
+
+        tcg_temp_free(dest);
+        tcg_temp_free(src1);
+        tcg_temp_free_ptr(src2);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ff3b60e9c8..8704ee120f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4440,3 +4440,23 @@ GEN_VEXT_VID_V(vid_v_b, uint8_t, H1, clearb)
 GEN_VEXT_VID_V(vid_v_h, uint16_t, H2, clearh)
 GEN_VEXT_VID_V(vid_v_w, uint32_t, H4, clearl)
 GEN_VEXT_VID_V(vid_v_d, uint64_t, H8, clearq)
+
+/*
+ *** Vector Permutation Instructions
+ */
+/* Integer Extract Instruction */
+#define GEN_VEXT_X_V(NAME, ETYPE, H)                    \
+target_ulong HELPER(NAME)(void *vs2, target_ulong s1,   \
+        CPURISCVState *env)                             \
+{                                                       \
+    uint32_t vlen = env_archcpu(env)->cfg.vlen / 8;     \
+                                                        \
+    if (s1 >= vlen / sizeof(ETYPE)) {                   \
+        return 0;                                       \
+    }                                                   \
+    return *((ETYPE *)vs2 + s1);                        \
+}
+GEN_VEXT_X_V(vext_x_v_b, uint8_t, H1)
+GEN_VEXT_X_V(vext_x_v_h, uint16_t, H2)
+GEN_VEXT_X_V(vext_x_v_w, uint32_t, H4)
+GEN_VEXT_X_V(vext_x_v_d, uint64_t, H8)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
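
The extract helper reduces to an indexed read with an out-of-range
guard; a minimal sketch with illustrative names, fixed to 32-bit
elements (the index s1 is taken from the rs1 general-purpose register):

    #include <stdint.h>

    static uint64_t vext_x(const uint32_t *vs2, uint64_t s1, uint64_t vlmax)
    {
        /* elements past the end of the register group read as zero */
        return (s1 >= vlmax) ? 0 : vs2[s1];
    }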

* [PATCH v5 55/60] target/riscv: integer scalar move instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 26 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 15 ++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d94347a9a5..41cecd266c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1106,3 +1106,8 @@ DEF_HELPER_3(vext_x_v_b, tl, ptr, tl, env)
 DEF_HELPER_3(vext_x_v_h, tl, ptr, tl, env)
 DEF_HELPER_3(vext_x_v_w, tl, ptr, tl, env)
 DEF_HELPER_3(vext_x_v_d, tl, ptr, tl, env)
+
+DEF_HELPER_3(vmv_s_x_b, void, ptr, tl, env)
+DEF_HELPER_3(vmv_s_x_h, void, ptr, tl, env)
+DEF_HELPER_3(vmv_s_x_w, void, ptr, tl, env)
+DEF_HELPER_3(vmv_s_x_d, void, ptr, tl, env)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index c26a186d6a..7e1efeec05 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -556,6 +556,7 @@ vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
 viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
 vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
 vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
+vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 46651dfb10..7720ffecde 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2243,3 +2243,29 @@ static bool trans_vext_x_v(DisasContext *s, arg_r *a)
     }
     return false;
 }
+
+/* Integer Scalar Move Instruction */
+typedef void (* gen_helper_vmv_s_x)(TCGv_ptr, TCGv, TCGv_env);
+static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
+{
+    if (vext_check_isa_ill(s, RVV)) {
+        TCGv_ptr dest;
+        TCGv src1;
+        gen_helper_vmv_s_x fns[4] = {
+            gen_helper_vmv_s_x_b, gen_helper_vmv_s_x_h,
+            gen_helper_vmv_s_x_w, gen_helper_vmv_s_x_d
+        };
+
+        src1 = tcg_temp_new();
+        dest = tcg_temp_new_ptr();
+        gen_get_gpr(src1, a->rs1);
+        tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, a->rd));
+
+        fns[s->sew](dest, src1, cpu_env);
+
+        tcg_temp_free(src1);
+        tcg_temp_free_ptr(dest);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8704ee120f..66ee69da99 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4460,3 +4460,18 @@ GEN_VEXT_X_V(vext_x_v_b, uint8_t, H1)
 GEN_VEXT_X_V(vext_x_v_h, uint16_t, H2)
 GEN_VEXT_X_V(vext_x_v_w, uint32_t, H4)
 GEN_VEXT_X_V(vext_x_v_d, uint64_t, H8)
+
+/* Integer Scalar Move Instruction */
+#define GEN_VEXT_VMV_S_X(NAME, ETYPE, H, CLEAR_FN)                       \
+void HELPER(NAME)(void *vd, target_ulong s1, CPURISCVState *env)         \
+{                                                                        \
+    if (env->vl == 0) {                                                  \
+        return;                                                          \
+    }                                                                    \
+    *((ETYPE *)vd + H(0)) = s1;                                          \
+    CLEAR_FN(vd, 1, sizeof(ETYPE), env_archcpu(env)->cfg.vlen / 8);      \
+}
+GEN_VEXT_VMV_S_X(vmv_s_x_b, uint8_t, H1, clearb)
+GEN_VEXT_VMV_S_X(vmv_s_x_h, uint16_t, H2, clearh)
+GEN_VEXT_VMV_S_X(vmv_s_x_w, uint32_t, H4, clearl)
+GEN_VEXT_VMV_S_X(vmv_s_x_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
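
A minimal sketch of the vmv.s.x semantics implemented above, with
illustrative names and 32-bit elements: element 0 of vd is written with
the scalar, the rest of the destination register is zeroed under
v0.7.1, and the instruction is a no-op when vl == 0:

    #include <stdint.h>
    #include <string.h>

    static void vmv_s_x(uint32_t *vd, uint64_t rs1_val, int vl, int vlmax)
    {
        if (vl == 0) {
            return;                    /* no writeback when vl == 0 */
        }
        vd[0] = (uint32_t)rs1_val;     /* scalar is truncated to SEW */
        memset(&vd[1], 0, (size_t)(vlmax - 1) * sizeof(uint32_t));
    }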

* [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 +++++
 target/riscv/insn32.decode              |  2 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 47 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 36 +++++++++++++++++++
 4 files changed, 94 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 41cecd266c..7a689a5c07 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1111,3 +1111,12 @@ DEF_HELPER_3(vmv_s_x_b, void, ptr, tl, env)
 DEF_HELPER_3(vmv_s_x_h, void, ptr, tl, env)
 DEF_HELPER_3(vmv_s_x_w, void, ptr, tl, env)
 DEF_HELPER_3(vmv_s_x_d, void, ptr, tl, env)
+
+DEF_HELPER_2(vfmv_f_s_b, i64, ptr, env)
+DEF_HELPER_2(vfmv_f_s_h, i64, ptr, env)
+DEF_HELPER_2(vfmv_f_s_w, i64, ptr, env)
+DEF_HELPER_2(vfmv_f_s_d, i64, ptr, env)
+DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
+DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
+DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
+DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 7e1efeec05..bfdce0979c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -557,6 +557,8 @@ viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
 vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
 vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
 vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
+vfmv_f_s        001100 1 ..... 00000 001 ..... 1010111 @r2rd
+vfmv_s_f        001101 1 00000 ..... 101 ..... 1010111 @r2
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7720ffecde..99cd45b0aa 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2269,3 +2269,50 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
     }
     return false;
 }
+
+/* Floating-Point Scalar Move Instructions */
+typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
+static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
+{
+    if (vext_check_isa_ill(s, RVV)) {
+        TCGv_ptr src2;
+        gen_helper_vfmv_f_s fns[4] = {
+            gen_helper_vfmv_f_s_b, gen_helper_vfmv_f_s_h,
+            gen_helper_vfmv_f_s_w, gen_helper_vfmv_f_s_d
+        };
+
+        src2 = tcg_temp_new_ptr();
+        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
+
+        fns[s->sew](cpu_fpr[a->rd], src2, cpu_env);
+
+        tcg_temp_free_ptr(src2);
+        return true;
+    }
+    return false;
+}
+
+typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
+static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
+{
+    if (vext_check_isa_ill(s, RVV | RVF) ||
+        vext_check_isa_ill(s, RVV | RVD)) {
+        TCGv_ptr dest;
+        TCGv_i64 src1;
+        gen_helper_vfmv_s_f fns[4] = {
+            gen_helper_vfmv_s_f_b, gen_helper_vfmv_s_f_h,
+            gen_helper_vfmv_s_f_w, gen_helper_vfmv_s_f_d
+        };
+
+        src1 = tcg_temp_new_i64();
+        dest = tcg_temp_new_ptr();
+        tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, a->rd));
+        tcg_gen_mov_i64(src1, cpu_fpr[a->rs1]);
+        fns[s->sew](dest, src1, cpu_env);
+
+        tcg_temp_free_i64(src1);
+        tcg_temp_free_ptr(dest);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 66ee69da99..3235c3fbe1 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4475,3 +4475,39 @@ GEN_VEXT_VMV_S_X(vmv_s_x_b, uint8_t, H1, clearb)
 GEN_VEXT_VMV_S_X(vmv_s_x_h, uint16_t, H2, clearh)
 GEN_VEXT_VMV_S_X(vmv_s_x_w, uint32_t, H4, clearl)
 GEN_VEXT_VMV_S_X(vmv_s_x_d, uint64_t, H8, clearq)
+
+/* Floating-Point Scalar Move Instructions */
+#define GEN_VEXT_VFMV_S_F(NAME, ETYPE, H, CLEAR_FN)                     \
+void HELPER(NAME)(void *vd, uint64_t s1, CPURISCVState *env)            \
+{                                                                       \
+    if (env->vl == 0) {                                                 \
+        return;                                                         \
+    }                                                                   \
+    *((ETYPE *)vd + H(0)) = s1;                                         \
+    CLEAR_FN(vd, 1, sizeof(ETYPE), env_archcpu(env)->cfg.vlen / 8);     \
+}
+GEN_VEXT_VFMV_S_F(vfmv_s_f_b, uint8_t, H1, clearb)
+GEN_VEXT_VFMV_S_F(vfmv_s_f_h, uint16_t, H2, clearh)
+GEN_VEXT_VFMV_S_F(vfmv_s_f_w, uint32_t, H4, clearl)
+GEN_VEXT_VFMV_S_F(vfmv_s_f_d, uint64_t, H8, clearq)
+
+uint64_t HELPER(vfmv_f_s_b)(void *vs2, CPURISCVState *env)
+{
+    return deposit64(-1ULL, 0, 8, *((uint8_t *)vs2 + H1(0)));
+}
+uint64_t HELPER(vfmv_f_s_h)(void *vs2, CPURISCVState *env)
+{
+    return deposit64(-1ULL, 0, 16, *((uint16_t *)vs2 + H2(0)));
+}
+uint64_t HELPER(vfmv_f_s_w)(void *vs2, CPURISCVState *env)
+{
+    return deposit64(-1ULL, 0, 32, *((uint32_t *)vs2 + H4(0)));
+}
+uint64_t HELPER(vfmv_f_s_d)(void *vs2, CPURISCVState *env)
+{
+    if (env->misa & RVD) {
+        return *((uint64_t *)vs2);
+    } else {
+        return deposit64(*((uint64_t *)vs2), 32, 32, 0xffffffff);
+    }
+}
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
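
The interesting part of the vfmv.f.s helpers above is the deposit64()
calls: when the element is narrower than the 64-bit FP register, the
upper bits are filled with ones (NaN-boxing). A minimal sketch of that
boxing step for a 32-bit element, with illustrative names:

    #include <stdint.h>

    /* box a 32-bit value into a 64-bit FP register image */
    static uint64_t nanbox32(uint32_t v)
    {
        return 0xffffffff00000000ULL | v;
    }

    /* e.g. nanbox32(0x3f800000) == 0xffffffff3f800000 (1.0f, boxed) */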

* [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 +++
 target/riscv/insn32.decode              |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  17 +++
 target/riscv/vector_helper.c            | 136 ++++++++++++++++++++++++
 4 files changed, 177 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7a689a5c07..e86df5b9e4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1120,3 +1120,20 @@ DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
+
+DEF_HELPER_6(vslideup_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bfdce0979c..e6ade9c68e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -72,6 +72,7 @@
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r2rd    .......   ..... ..... ... ..... ....... %rs2 %rd
 @r_vm    ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd
 @r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
@@ -559,6 +560,12 @@ vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
 vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
 vfmv_f_s        001100 1 ..... 00000 001 ..... 1010111 @r2rd
 vfmv_s_f        001101 1 00000 ..... 101 ..... 1010111 @r2
+vslideup_vx     001110 . ..... ..... 100 ..... 1010111 @r_vm
+vslideup_vi     001110 . ..... ..... 011 ..... 1010111 @r_vm
+vslide1up_vx    001110 . ..... ..... 110 ..... 1010111 @r_vm
+vslidedown_vx   001111 . ..... ..... 100 ..... 1010111 @r_vm
+vslidedown_vi   001111 . ..... ..... 011 ..... 1010111 @r_vm
+vslide1down_vx  001111 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 99cd45b0aa..ef5960ba39 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2316,3 +2316,20 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
     }
     return false;
 }
+
+/* Vector Slide Instructions */
+static bool slideup_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (a->rd != a->rs2));
+}
+GEN_OPIVX_TRANS(vslideup_vx, slideup_check)
+GEN_OPIVX_TRANS(vslide1up_vx, slideup_check)
+GEN_OPIVI_TRANS(vslideup_vi, 1, vslideup_vx, slideup_check)
+
+GEN_OPIVX_TRANS(vslidedown_vx, opivx_check)
+GEN_OPIVX_TRANS(vslide1down_vx, opivx_check)
+GEN_OPIVI_TRANS(vslidedown_vi, 1, vslidedown_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3235c3fbe1..2219fdd6c5 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4511,3 +4511,139 @@ uint64_t HELPER(vfmv_f_s_d)(void *vs2, CPURISCVState *env)
         return deposit64(*((uint64_t *)vs2), 32, 32, 0xffffffff);
     }
 }
+
+/* Vector Slide Instructions */
+/*
+ * The spec doesn't specify the behavior when the offset is larger than vl,
+ * so just truncate the offset to vl here.
+ */
+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t offset = s1, i;                                              \
+                                                                          \
+    if (offset > vl) {                                                    \
+        offset = vl;                                                      \
+    }                                                                     \
+    for (i = 0; i < vl; i++) {                                            \
+        if ((i < offset) || (!vm && !vext_elem_mask(v0, mlen, i))) {      \
+            continue;                                                     \
+        }                                                                 \
+        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vslideup.vx vd, vs2, rs1, vm # vd[i+rs1] = vs2[i] */
+GEN_VEXT_VSLIDEUP_VX(vslideup_vx_b, uint8_t, H1, clearb)
+GEN_VEXT_VSLIDEUP_VX(vslideup_vx_h, uint16_t, H2, clearh)
+GEN_VEXT_VSLIDEUP_VX(vslideup_vx_w, uint32_t, H4, clearl)
+GEN_VEXT_VSLIDEUP_VX(vslideup_vx_d, uint64_t, H8, clearq)
+
+#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)                  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t offset = s1, i;                                              \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (i + offset < vlmax) {                                         \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));      \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = 0;                                    \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1] */
+GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_b, uint8_t, H1, clearb)
+GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_h, uint16_t, H2, clearh)
+GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_w, uint32_t, H4, clearl)
+GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_d, uint64_t, H8, clearq)
+
+#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)                   \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t i;                                                           \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (i == 0) {                                                     \
+            *((ETYPE *)vd + H(i)) = s1;                                   \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - 1));           \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] */
+GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_b, uint8_t, H1, clearb)
+GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_h, uint16_t, H2, clearh)
+GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_w, uint32_t, H4, clearl)
+GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_d, uint64_t, H8, clearq)
+
+#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t i;                                                           \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (i == vl - 1) {                                                \
+            *((ETYPE *)vd + H(i)) = s1;                                   \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + 1));           \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] */
+GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_b, uint8_t, H1, clearb)
+GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_h, uint16_t, H2, clearh)
+GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_w, uint32_t, H4, clearl)
+GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
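
The four slide helpers above all follow the same pattern: derive vlmax from
the descriptor, skip masked-off elements, move the surviving elements by the
requested offset, and zero the tail elements. As a minimal unmasked reference
model of two of the variants on plain arrays (function names invented for the
example, masking, LMUL and tail zeroing omitted):

    #include <stdint.h>
    #include <stdio.h>

    /* vslideup.vx, SEW=32, unmasked: vd[i + offset] = vs2[i];
     * elements below the offset are left untouched. */
    static void model_vslideup(uint32_t *vd, const uint32_t *vs2,
                               uint32_t offset, uint32_t vl)
    {
        for (uint32_t i = offset; i < vl; i++) {
            vd[i] = vs2[i - offset];
        }
    }

    /* vslide1down.vx, SEW=32, unmasked: shift every element down by one
     * and place the scalar into the last body element. */
    static void model_vslide1down(uint32_t *vd, const uint32_t *vs2,
                                  uint32_t scalar, uint32_t vl)
    {
        for (uint32_t i = 0; i + 1 < vl; i++) {
            vd[i] = vs2[i + 1];
        }
        if (vl > 0) {
            vd[vl - 1] = scalar;
        }
    }

    int main(void)
    {
        uint32_t src[4] = {10, 11, 12, 13}, dst[4] = {0, 0, 0, 0};

        model_vslideup(dst, src, 2, 4);      /* dst = {0, 0, 10, 11} */
        model_vslide1down(dst, src, 99, 4);  /* dst = {11, 12, 13, 99} */
        printf("%u %u %u %u\n", dst[0], dst[1], dst[2], dst[3]);
        return 0;
    }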

* [PATCH v5 58/60] target/riscv: vector register gather instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 ++++
 target/riscv/insn32.decode              |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 23 +++++++++
 target/riscv/vector_helper.c            | 68 +++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e86df5b9e4..b9ec0a4efc 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1137,3 +1137,12 @@ DEF_HELPER_6(vslide1down_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vslide1down_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vslide1down_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vslide1down_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vrgather_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrgather_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrgather_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrgather_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrgather_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrgather_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrgather_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrgather_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e6ade9c68e..d92861a334 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -566,6 +566,9 @@ vslide1up_vx    001110 . ..... ..... 110 ..... 1010111 @r_vm
 vslidedown_vx   001111 . ..... ..... 100 ..... 1010111 @r_vm
 vslidedown_vi   001111 . ..... ..... 011 ..... 1010111 @r_vm
 vslide1down_vx  001111 . ..... ..... 110 ..... 1010111 @r_vm
+vrgather_vv     001100 . ..... ..... 000 ..... 1010111 @r_vm
+vrgather_vx     001100 . ..... ..... 100 ..... 1010111 @r_vm
+vrgather_vi     001100 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index ef5960ba39..f3b08919b9 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2333,3 +2333,26 @@ GEN_OPIVI_TRANS(vslideup_vi, 1, vslideup_vx, slideup_check)
 GEN_OPIVX_TRANS(vslidedown_vx, opivx_check)
 GEN_OPIVX_TRANS(vslide1down_vx, opivx_check)
 GEN_OPIVI_TRANS(vslidedown_vi, 1, vslidedown_vx, opivx_check)
+
+/* Vector Register Gather Instruction */
+static bool vrgather_vv_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (a->rd != a->rs2) && (a->rd != a->rs1));
+}
+GEN_OPIVV_TRANS(vrgather_vv, vrgather_vv_check)
+
+static bool vrgather_vx_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            (a->rd != a->rs2));
+}
+GEN_OPIVX_TRANS(vrgather_vx, vrgather_vx_check)
+GEN_OPIVI_TRANS(vrgather_vi, 1, vrgather_vx, vrgather_vx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2219fdd6c5..5788e46dcf 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4647,3 +4647,71 @@ GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_b, uint8_t, H1, clearb)
 GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_h, uint16_t, H2, clearh)
 GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_w, uint32_t, H4, clearl)
 GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_d, uint64_t, H8, clearq)
+
+/* Vector Register Gather Instruction */
+#define GEN_VEXT_VRGATHER_VV(NAME, ETYPE, H, CLEAR_FN)                    \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,               \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t index, i;                                                    \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        index = *((ETYPE *)vs1 + H(i));                                   \
+        if (index >= vlmax) {                                             \
+            *((ETYPE *)vd + H(i)) = 0;                                    \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(index));           \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; */
+GEN_VEXT_VRGATHER_VV(vrgather_vv_b, uint8_t, H1, clearb)
+GEN_VEXT_VRGATHER_VV(vrgather_vv_h, uint16_t, H2, clearh)
+GEN_VEXT_VRGATHER_VV(vrgather_vv_w, uint32_t, H4, clearl)
+GEN_VEXT_VRGATHER_VV(vrgather_vv_d, uint64_t, H8, clearq)
+
+#define GEN_VEXT_VRGATHER_VX(NAME, ETYPE, H, CLEAR_FN)                    \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t index = s1, i;                                               \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (index >= vlmax) {                                             \
+            *((ETYPE *)vd + H(i)) = 0;                                    \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(index));           \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
+/* vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]] */
+GEN_VEXT_VRGATHER_VX(vrgather_vx_b, uint8_t, H1, clearb)
+GEN_VEXT_VRGATHER_VX(vrgather_vx_h, uint16_t, H2, clearh)
+GEN_VEXT_VRGATHER_VX(vrgather_vx_w, uint32_t, H4, clearl)
+GEN_VEXT_VRGATHER_VX(vrgather_vx_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
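
Once masking and tail zeroing are stripped away, the gather helpers reduce to
the one-line comment in the code: each destination element is either zero or
an arbitrarily indexed source element. A minimal unmasked sketch of the
vrgather.vv case on plain arrays (names invented for the example):

    #include <stdint.h>
    #include <stdio.h>

    /* vrgather.vv, SEW=32, unmasked:
     * vd[i] = (vs1[i] >= vlmax) ? 0 : vs2[vs1[i]] */
    static void model_vrgather_vv(uint32_t *vd, const uint32_t *vs1,
                                  const uint32_t *vs2,
                                  uint32_t vl, uint32_t vlmax)
    {
        for (uint32_t i = 0; i < vl; i++) {
            vd[i] = (vs1[i] >= vlmax) ? 0 : vs2[vs1[i]];
        }
    }

    int main(void)
    {
        uint32_t idx[4] = {3, 0, 7, 1};      /* 7 is out of range for vlmax=4 */
        uint32_t src[4] = {100, 101, 102, 103};
        uint32_t dst[4];

        model_vrgather_vv(dst, idx, src, 4, 4);
        printf("%u %u %u %u\n", dst[0], dst[1], dst[2], dst[3]); /* 103 100 0 101 */
        return 0;
    }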

* [PATCH v5 59/60] target/riscv: vector compress instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:58 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++++
 target/riscv/insn32.decode              |  1 +
 target/riscv/insn_trans/trans_rvv.inc.c | 28 +++++++++++++++++++++++++
 target/riscv/vector_helper.c            | 28 +++++++++++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b9ec0a4efc..3e223ed7d7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1146,3 +1146,8 @@ DEF_HELPER_6(vrgather_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrgather_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrgather_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrgather_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d92861a334..8eab175a74 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -569,6 +569,7 @@ vslide1down_vx  001111 . ..... ..... 110 ..... 1010111 @r_vm
 vrgather_vv     001100 . ..... ..... 000 ..... 1010111 @r_vm
 vrgather_vx     001100 . ..... ..... 100 ..... 1010111 @r_vm
 vrgather_vi     001100 . ..... ..... 011 ..... 1010111 @r_vm
+vcompress_vm    010111 - ..... ..... 010 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index f3b08919b9..b7959ad417 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2356,3 +2356,31 @@ static bool vrgather_vx_check(DisasContext *s, arg_rmrr *a)
 }
 GEN_OPIVX_TRANS(vrgather_vx, vrgather_vx_check)
 GEN_OPIVI_TRANS(vrgather_vi, 1, vrgather_vx, vrgather_vx_check)
+
+/* Vector Compress Instruction */
+static bool vcompress_vm_check(DisasContext *s, arg_r *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_reg(s, a->rd, false) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs1, 1) &&
+            (a->rd != a->rs2));
+}
+
+static bool trans_vcompress_vm(DisasContext *s, arg_r *a)
+{
+    if (vcompress_vm_check(s, a)) {
+        uint32_t data = 0;
+        static gen_helper_gvec_4_ptr * const fns[4] = {
+            gen_helper_vcompress_vm_b, gen_helper_vcompress_vm_h,
+            gen_helper_vcompress_vm_w, gen_helper_vcompress_vm_d,
+        };
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),
+            vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),
+            cpu_env, 0, s->vlen / 8, data, fns[s->sew]);
+        return true;
+    }
+    return false;
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 5788e46dcf..8ab68bcdd1 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4715,3 +4715,31 @@ GEN_VEXT_VRGATHER_VX(vrgather_vx_b, uint8_t, H1, clearb)
 GEN_VEXT_VRGATHER_VX(vrgather_vx_h, uint16_t, H2, clearh)
 GEN_VEXT_VRGATHER_VX(vrgather_vx_w, uint32_t, H4, clearl)
 GEN_VEXT_VRGATHER_VX(vrgather_vx_d, uint64_t, H8, clearq)
+
+/* Vector Compress Instruction */
+#define GEN_VEXT_VCOMPRESS_VM(NAME, ETYPE, H, CLEAR_FN)                   \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,               \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vl = env->vl;                                                \
+    uint32_t num = 0, i;                                                  \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vext_elem_mask(vs1, mlen, i)) {                              \
+            continue;                                                     \
+        }                                                                 \
+        *((ETYPE *)vd + H(num)) = *((ETYPE *)vs2 + H(i));                 \
+        num++;                                                            \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    CLEAR_FN(vd, num, num * sizeof(ETYPE), vlmax * sizeof(ETYPE));        \
+}
+/* Compress into vd elements of vs2 where vs1 is enabled */
+GEN_VEXT_VCOMPRESS_VM(vcompress_vm_b, uint8_t, H1, clearb)
+GEN_VEXT_VCOMPRESS_VM(vcompress_vm_h, uint16_t, H2, clearh)
+GEN_VEXT_VCOMPRESS_VM(vcompress_vm_w, uint32_t, H4, clearl)
+GEN_VEXT_VCOMPRESS_VM(vcompress_vm_d, uint64_t, H8, clearq)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread
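
vcompress.vm differs from the other permutation helpers in that the mask
register vs1 selects which source elements survive and the destination index
advances only for selected elements, hence the separate num counter in the
helper above. A minimal sketch of that behaviour on plain arrays (names
invented for the example, tail zeroing omitted):

    #include <stdint.h>
    #include <stdio.h>

    /* vcompress.vm, SEW=32: pack the elements of vs2 whose mask bit in vs1
     * is set into the low elements of vd; returns how many were written. */
    static uint32_t model_vcompress(uint32_t *vd, const uint8_t *vs1_mask,
                                    const uint32_t *vs2, uint32_t vl)
    {
        uint32_t num = 0;

        for (uint32_t i = 0; i < vl; i++) {
            if (vs1_mask[i]) {
                vd[num++] = vs2[i];
            }
        }
        return num;   /* the helper zeroes the tail from here to vlmax */
    }

    int main(void)
    {
        uint8_t mask[4] = {1, 0, 0, 1};
        uint32_t src[4] = {5, 6, 7, 8};
        uint32_t dst[4] = {0, 0, 0, 0};

        uint32_t n = model_vcompress(dst, mask, src, 4);
        printf("num=%u: %u %u\n", n, dst[0], dst[1]); /* num=2: 5 8 */
        return 0;
    }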

* [PATCH v5 60/60] target/riscv: configure and turn on vector extension from command line
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-12 14:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 14:59 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, qemu-riscv, qemu-devel, wxy194768, wenmeng_zhang, LIU Zhiwei

The vector extension is off by default. The only way to use it is to
1. use the rv32 or rv64 cpu, and
2. turn it on from the command line, e.g.
"-cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1".

vlen is the vector register length in bits; the default value is 128.
elen is the maximum element width in bits; the default value is 64.
vext_spec is the vector specification version; the default value is v0.7.1.
These properties may also be set to other values.
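
For example, an illustrative linux-user invocation (the qemu-riscv64 binary
and the ./vector_test program name are placeholders, not taken from this
patch) might be:

  qemu-riscv64 -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v0.7.1 ./vector_test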

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 target/riscv/cpu.h |  2 ++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6e4135583d..5f1cdd4f2b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -395,7 +395,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     }
 
     set_priv_version(env, priv_version);
-    set_vext_version(env, vext_version);
     set_resetvec(env, DEFAULT_RSTVEC);
 
     if (cpu->cfg.mmu) {
@@ -463,6 +462,45 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
         if (cpu->cfg.ext_h) {
             target_misa |= RVH;
         }
+        if (cpu->cfg.ext_v) {
+            target_misa |= RVV;
+            if (!is_power_of_2(cpu->cfg.vlen)) {
+                error_setg(errp,
+                        "Vector extension VLEN must be power of 2");
+                return;
+            }
+            if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
+                error_setg(errp,
+                        "Vector extension implementation only supports VLEN "
+                        "in the range [128, %d]", RV_VLEN_MAX);
+                return;
+            }
+            if (!is_power_of_2(cpu->cfg.elen)) {
+                error_setg(errp,
+                        "Vector extension ELEN must be power of 2");
+                return;
+            }
+            if (cpu->cfg.elen > 64 || cpu->cfg.elen < 8) {
+                error_setg(errp,
+                        "Vector extension implementation only supports ELEN "
+                        "in the range [8, 64]");
+                return;
+            }
+            if (cpu->cfg.vext_spec) {
+                if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
+                    vext_version = VEXT_VERSION_0_07_1;
+                } else {
+                    error_setg(errp,
+                           "Unsupported vector spec version '%s'",
+                           cpu->cfg.vext_spec);
+                    return;
+                }
+            } else {
+                qemu_log("vector verison is not specified, "
+                        "use the default value v0.7.1\n");
+            }
+            set_vext_version(env, vext_version);
+        }
 
         set_misa(env, RVXLEN | target_misa);
     }
@@ -500,10 +538,14 @@ static Property riscv_cpu_properties[] = {
     DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
     /* This is experimental so mark with 'x-' */
     DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
+    DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
     DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
     DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
     DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
     DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
+    DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
+    DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
+    DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
     DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
     DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index e069e55e81..36ead8d6d5 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -285,12 +285,14 @@ typedef struct RISCVCPU {
         bool ext_s;
         bool ext_u;
         bool ext_h;
+        bool ext_v;
         bool ext_counters;
         bool ext_ifencei;
         bool ext_icsr;
 
         char *priv_spec;
         char *user_spec;
+        char *vext_spec;
         uint16_t vlen;
         uint16_t elen;
         bool mmu;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 03/60] target/riscv: support vector extension csr
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-12 20:54     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-12 20:54 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:05 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The v0.7.1 specification does not define vector status within mstatus.
> A future revision will define the privileged portion of the vector status.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/cpu_bits.h | 15 +++++++++
>  target/riscv/csr.c      | 75 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 7f64ee1174..8117e8b5a7 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -29,6 +29,14 @@
>  #define FSR_NXA             (FPEXC_NX << FSR_AEXC_SHIFT)
>  #define FSR_AEXC            (FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
>
> +/* Vector Fixed-Point round model */
> +#define FSR_VXRM_SHIFT      9
> +#define FSR_VXRM            (0x3 << FSR_VXRM_SHIFT)
> +
> +/* Vector Fixed-Point saturation flag */
> +#define FSR_VXSAT_SHIFT     8
> +#define FSR_VXSAT           (0x1 << FSR_VXSAT_SHIFT)
> +
>  /* Control and Status Registers */
>
>  /* User Trap Setup */
> @@ -48,6 +56,13 @@
>  #define CSR_FRM             0x002
>  #define CSR_FCSR            0x003
>
> +/* User Vector CSRs */
> +#define CSR_VSTART          0x008
> +#define CSR_VXSAT           0x009
> +#define CSR_VXRM            0x00a
> +#define CSR_VL              0xc20
> +#define CSR_VTYPE           0xc21
> +
>  /* User Timers and Counters */
>  #define CSR_CYCLE           0xc00
>  #define CSR_TIME            0xc01
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 11d184cd16..d71c49dfff 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
>  static int fs(CPURISCVState *env, int csrno)
>  {
>  #if !defined(CONFIG_USER_ONLY)
> +    /* loose check condition for fcsr in vector extension */
> +    if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
> +        return 0;
> +    }
>      if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
>          return -1;
>      }
> @@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
>      return 0;
>  }
>
> +static int vs(CPURISCVState *env, int csrno)
> +{
> +    if (env->misa & RVV) {
> +        return 0;
> +    }
> +    return -1;
> +}
> +
>  static int ctr(CPURISCVState *env, int csrno)
>  {
>  #if !defined(CONFIG_USER_ONLY)
> @@ -174,6 +186,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
>  #endif
>      *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
>          | (env->frm << FSR_RD_SHIFT);
> +    if (vs(env, csrno) >= 0) {
> +        *val |= (env->vxrm << FSR_VXRM_SHIFT)
> +                | (env->vxsat << FSR_VXSAT_SHIFT);
> +    }
>      return 0;
>  }
>
> @@ -186,10 +202,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
>      env->mstatus |= MSTATUS_FS;
>  #endif
>      env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
> +    if (vs(env, csrno) >= 0) {
> +        env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
> +        env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
> +    }
>      riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
>      return 0;
>  }
>
> +static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vtype;
> +    return 0;
> +}
> +
> +static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vl;
> +    return 0;
> +}
> +
> +static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vxrm;
> +    return 0;
> +}
> +
> +static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vxrm = val;
> +    return 0;
> +}
> +
> +static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vxsat;
> +    return 0;
> +}
> +
> +static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vxsat = val;
> +    return 0;
> +}
> +
> +static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vstart;
> +    return 0;
> +}
> +
> +static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vstart = val;
> +    return 0;
> +}
> +
>  /* User Timers and Counters */
>  static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>  {
> @@ -1269,7 +1337,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>      [CSR_FFLAGS] =              { fs,   read_fflags,      write_fflags      },
>      [CSR_FRM] =                 { fs,   read_frm,         write_frm         },
>      [CSR_FCSR] =                { fs,   read_fcsr,        write_fcsr        },
> -
> +    /* Vector CSRs */
> +    [CSR_VSTART] =              { vs,   read_vstart,      write_vstart      },
> +    [CSR_VXSAT] =               { vs,   read_vxsat,       write_vxsat       },
> +    [CSR_VXRM] =                { vs,   read_vxrm,        write_vxrm        },
> +    [CSR_VL] =                  { vs,   read_vl                             },
> +    [CSR_VTYPE] =               { vs,   read_vtype                          },
>      /* User Timers and Counters */
>      [CSR_CYCLE] =               { ctr,  read_instret                        },
>      [CSR_INSTRET] =             { ctr,  read_instret                        },
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 04/60] target/riscv: add vector configure instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-12 21:23     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-12 21:23 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:07 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
> should update after configure instructions. The (ill, lmul, sew ) of vtype
> and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/Makefile.objs              |  2 +-
>  target/riscv/cpu.h                      | 63 ++++++++++++++++++----
>  target/riscv/helper.h                   |  2 +
>  target/riscv/insn32.decode              |  5 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 69 +++++++++++++++++++++++++
>  target/riscv/translate.c                | 17 +++++-
>  target/riscv/vector_helper.c            | 53 +++++++++++++++++++
>  7 files changed, 199 insertions(+), 12 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
>
> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
> index ff651f69f6..ff38df6219 100644
> --- a/target/riscv/Makefile.objs
> +++ b/target/riscv/Makefile.objs
> @@ -1,4 +1,4 @@
> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o
> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o
>  obj-$(CONFIG_SOFTMMU) += pmp.o
>
>  ifeq ($(CONFIG_SOFTMMU),y)
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 603715f849..505d1a8515 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -21,6 +21,7 @@
>  #define RISCV_CPU_H
>
>  #include "hw/core/cpu.h"
> +#include "hw/registerfields.h"
>  #include "exec/cpu-defs.h"
>  #include "fpu/softfloat-types.h"
>
> @@ -99,6 +100,12 @@ typedef struct CPURISCVState CPURISCVState;
>
>  #define RV_VLEN_MAX 512
>
> +FIELD(VTYPE, VLMUL, 0, 2)
> +FIELD(VTYPE, VSEW, 2, 3)
> +FIELD(VTYPE, VEDIV, 5, 2)
> +FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
> +FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
> +
>  struct CPURISCVState {
>      target_ulong gpr[32];
>      uint64_t fpr[32]; /* assume both F and D extensions */
> @@ -358,19 +365,62 @@ void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
>  #define TB_FLAGS_MMU_MASK   3
>  #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
>
> +typedef CPURISCVState CPUArchState;
> +typedef RISCVCPU ArchCPU;
> +#include "exec/cpu-all.h"
> +
> +FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
> +FIELD(TB_FLAGS, LMUL, 3, 2)
> +FIELD(TB_FLAGS, SEW, 5, 3)
> +FIELD(TB_FLAGS, VILL, 8, 1)
> +
> +/*
> + * A simplification for VLMAX
> + * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
> + * = (VLEN << LMUL) / (8 << SEW)
> + * = (VLEN << LMUL) >> (SEW + 3)
> + * = VLEN >> (SEW + 3 - LMUL)
> + */
> +static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
> +{
> +    uint8_t sew, lmul;
> +
> +    sew = FIELD_EX64(vtype, VTYPE, VSEW);
> +    lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
> +    return cpu->cfg.vlen >> (sew + 3 - lmul);
> +}
> +
>  static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
> -                                        target_ulong *cs_base, uint32_t *flags)
> +                                        target_ulong *cs_base, uint32_t *pflags)
>  {
> +    uint32_t flags = 0;
> +
>      *pc = env->pc;
>      *cs_base = 0;
> +
> +    if (env->misa & RVV) {

Can you use: riscv_has_ext(env, RVV) instead?

> +        uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
> +        bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
> +        flags = FIELD_DP32(flags, TB_FLAGS, VILL,
> +                    FIELD_EX64(env->vtype, VTYPE, VILL));
> +        flags = FIELD_DP32(flags, TB_FLAGS, SEW,
> +                    FIELD_EX64(env->vtype, VTYPE, VSEW));
> +        flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
> +                    FIELD_EX64(env->vtype, VTYPE, VLMUL));
> +        flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
> +    } else {
> +        flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
> +    }
> +
>  #ifdef CONFIG_USER_ONLY
> -    *flags = TB_FLAGS_MSTATUS_FS;
> +    flags |= TB_FLAGS_MSTATUS_FS;
>  #else
> -    *flags = cpu_mmu_index(env, 0);
> +    flags |= cpu_mmu_index(env, 0);
>      if (riscv_cpu_fp_enabled(env)) {
> -        *flags |= env->mstatus & MSTATUS_FS;
> +        flags |= env->mstatus & MSTATUS_FS;
>      }
>  #endif
> +    *pflags = flags;
>  }
>
>  int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
> @@ -411,9 +461,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
>
>  void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
>
> -typedef CPURISCVState CPUArchState;
> -typedef RISCVCPU ArchCPU;
> -
> -#include "exec/cpu-all.h"
> -
>  #endif /* RISCV_CPU_H */
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index debb22a480..3c28c7e407 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
>  DEF_HELPER_1(wfi, void, env)
>  DEF_HELPER_1(tlb_flush, void, env)
>  #endif
> +/* Vector functions */
> +DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b883672e63..53340bdbc4 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -62,6 +62,7 @@
>  @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>  @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>  @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
> +@r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>
>  @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>  @hfence_bvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> @@ -207,3 +208,7 @@ fcvt_w_d   1100001  00000 ..... ... ..... 1010011 @r2_rm
>  fcvt_wu_d  1100001  00001 ..... ... ..... 1010011 @r2_rm
>  fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>  fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
> +
> +# *** RV32V Extension ***
> +vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
> +vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> new file mode 100644
> index 0000000000..da82c72bbf
> --- /dev/null
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -0,0 +1,69 @@
> +/*
> + * RISC-V translation routines for the RVV Standard Extension.
> + *
> + * Copyright (c) 2020 C-SKY Limited. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> +{
> +    TCGv s1, s2, dst;
> +    s2 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
> +    if (a->rs1 == 0) {
> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> +        s1 = tcg_const_tl(RV_VLEN_MAX);
> +    } else {
> +        s1 = tcg_temp_new();
> +        gen_get_gpr(s1, a->rs1);
> +    }
> +    gen_get_gpr(s2, a->rs2);
> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> +    gen_set_gpr(a->rd, dst);
> +    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
> +    exit_tb(ctx);

Why does this

> +    ctx->base.is_jmp = DISAS_NORETURN;
> +
> +    tcg_temp_free(s1);
> +    tcg_temp_free(s2);
> +    tcg_temp_free(dst);
> +    return true;
> +}
> +
> +static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
> +{
> +    TCGv s1, s2, dst;
> +    s2 = tcg_const_tl(a->zimm);
> +    dst = tcg_temp_new();
> +
> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
> +    if (a->rs1 == 0) {
> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> +        s1 = tcg_const_tl(RV_VLEN_MAX);
> +    } else {
> +        s1 = tcg_temp_new();
> +        gen_get_gpr(s1, a->rs1);
> +    }
> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> +    gen_set_gpr(a->rd, dst);
> +    gen_goto_tb(ctx, 0, ctx->pc_succ_insn);

Need to be different to this?

Alistair

> +    ctx->base.is_jmp = DISAS_NORETURN;
> +
> +    tcg_temp_free(s1);
> +    tcg_temp_free(s2);
> +    tcg_temp_free(dst);
> +    return true;
> +}
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 43bf7e39a6..af07ac4160 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -56,6 +56,12 @@ typedef struct DisasContext {
>         to reset this known value.  */
>      int frm;
>      bool ext_ifencei;
> +    /* vector extension */
> +    bool vill;
> +    uint8_t lmul;
> +    uint8_t sew;
> +    uint16_t vlen;
> +    bool vl_eq_vlmax;
>  } DisasContext;
>
>  #ifdef TARGET_RISCV64
> @@ -711,6 +717,7 @@ static bool gen_shift(DisasContext *ctx, arg_r *a,
>  #include "insn_trans/trans_rva.inc.c"
>  #include "insn_trans/trans_rvf.inc.c"
>  #include "insn_trans/trans_rvd.inc.c"
> +#include "insn_trans/trans_rvv.inc.c"
>  #include "insn_trans/trans_privileged.inc.c"
>
>  /* Include the auto-generated decoder for 16 bit insn */
> @@ -745,10 +752,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>      DisasContext *ctx = container_of(dcbase, DisasContext, base);
>      CPURISCVState *env = cs->env_ptr;
>      RISCVCPU *cpu = RISCV_CPU(cs);
> +    uint32_t tb_flags = ctx->base.tb->flags;
>
>      ctx->pc_succ_insn = ctx->base.pc_first;
> -    ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK;
> -    ctx->mstatus_fs = ctx->base.tb->flags & TB_FLAGS_MSTATUS_FS;
> +    ctx->mem_idx = tb_flags & TB_FLAGS_MMU_MASK;
> +    ctx->mstatus_fs = tb_flags & TB_FLAGS_MSTATUS_FS;
>      ctx->priv_ver = env->priv_ver;
>  #if !defined(CONFIG_USER_ONLY)
>      if (riscv_has_ext(env, RVH)) {
> @@ -772,6 +780,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>      ctx->misa = env->misa;
>      ctx->frm = -1;  /* unknown rounding mode */
>      ctx->ext_ifencei = cpu->cfg.ext_ifencei;
> +    ctx->vlen = cpu->cfg.vlen;
> +    ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
> +    ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
> +    ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
> +    ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>  }
>
>  static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> new file mode 100644
> index 0000000000..2afe716f2a
> --- /dev/null
> +++ b/target/riscv/vector_helper.c
> @@ -0,0 +1,53 @@
> +/*
> + * RISC-V Vector Extension Helpers for QEMU.
> + *
> + * Copyright (c) 2020 C-SKY Limited. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include <math.h>
> +
> +target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> +    target_ulong s2)
> +{
> +    int vlmax, vl;
> +    RISCVCPU *cpu = env_archcpu(env);
> +    uint16_t sew = 8 << FIELD_EX64(s2, VTYPE, VSEW);
> +    uint8_t ediv = FIELD_EX64(s2, VTYPE, VEDIV);
> +    bool vill = FIELD_EX64(s2, VTYPE, VILL);
> +    target_ulong reserved = FIELD_EX64(s2, VTYPE, RESERVED);
> +
> +    if ((sew > cpu->cfg.elen) || vill || (ediv != 0) || (reserved != 0)) {
> +        /* only set vill bit. */
> +        env->vtype = FIELD_DP64(0, VTYPE, VILL, 1);
> +        env->vl = 0;
> +        env->vstart = 0;
> +        return 0;
> +    }
> +
> +    vlmax = vext_get_vlmax(cpu, s2);
> +    if (s1 <= vlmax) {
> +        vl = s1;
> +    } else {
> +        vl = vlmax;
> +    }
> +    env->vl = vl;
> +    env->vtype = s2;
> +    env->vstart = 0;
> +    return vl;
> +}
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 04/60] target/riscv: add vector configure instruction
  2020-03-12 21:23     ` Alistair Francis
@ 2020-03-12 22:00       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-12 22:00 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/13 5:23, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:07 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
>> should update after configure instructions. The (ill, lmul, sew ) of vtype
>> and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/Makefile.objs              |  2 +-
>>   target/riscv/cpu.h                      | 63 ++++++++++++++++++----
>>   target/riscv/helper.h                   |  2 +
>>   target/riscv/insn32.decode              |  5 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 69 +++++++++++++++++++++++++
>>   target/riscv/translate.c                | 17 +++++-
>>   target/riscv/vector_helper.c            | 53 +++++++++++++++++++
>>   7 files changed, 199 insertions(+), 12 deletions(-)
>>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>   create mode 100644 target/riscv/vector_helper.c
>>
>> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
>> index ff651f69f6..ff38df6219 100644
>> --- a/target/riscv/Makefile.objs
>> +++ b/target/riscv/Makefile.objs
>> @@ -1,4 +1,4 @@
>> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o
>> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o
>>   obj-$(CONFIG_SOFTMMU) += pmp.o
>>
>>   ifeq ($(CONFIG_SOFTMMU),y)
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index 603715f849..505d1a8515 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -21,6 +21,7 @@
>>   #define RISCV_CPU_H
>>
>>   #include "hw/core/cpu.h"
>> +#include "hw/registerfields.h"
>>   #include "exec/cpu-defs.h"
>>   #include "fpu/softfloat-types.h"
>>
>> @@ -99,6 +100,12 @@ typedef struct CPURISCVState CPURISCVState;
>>
>>   #define RV_VLEN_MAX 512
>>
>> +FIELD(VTYPE, VLMUL, 0, 2)
>> +FIELD(VTYPE, VSEW, 2, 3)
>> +FIELD(VTYPE, VEDIV, 5, 2)
>> +FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
>> +FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
>> +
>>   struct CPURISCVState {
>>       target_ulong gpr[32];
>>       uint64_t fpr[32]; /* assume both F and D extensions */
>> @@ -358,19 +365,62 @@ void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
>>   #define TB_FLAGS_MMU_MASK   3
>>   #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
>>
>> +typedef CPURISCVState CPUArchState;
>> +typedef RISCVCPU ArchCPU;
>> +#include "exec/cpu-all.h"
>> +
>> +FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>> +FIELD(TB_FLAGS, LMUL, 3, 2)
>> +FIELD(TB_FLAGS, SEW, 5, 3)
>> +FIELD(TB_FLAGS, VILL, 8, 1)
>> +
>> +/*
>> + * A simplification for VLMAX
>> + * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
>> + * = (VLEN << LMUL) / (8 << SEW)
>> + * = (VLEN << LMUL) >> (SEW + 3)
>> + * = VLEN >> (SEW + 3 - LMUL)
>> + */
>> +static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
>> +{
>> +    uint8_t sew, lmul;
>> +
>> +    sew = FIELD_EX64(vtype, VTYPE, VSEW);
>> +    lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
>> +    return cpu->cfg.vlen >> (sew + 3 - lmul);
>> +}
>> +
>>   static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
>> -                                        target_ulong *cs_base, uint32_t *flags)
>> +                                        target_ulong *cs_base, uint32_t *pflags)
>>   {
>> +    uint32_t flags = 0;
>> +
>>       *pc = env->pc;
>>       *cs_base = 0;
>> +
>> +    if (env->misa & RVV) {
> Can you use: riscv_has_ext(env, RVV) instead?
Yes. It will be clearer.
>
>> +        uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
>> +        bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
>> +        flags = FIELD_DP32(flags, TB_FLAGS, VILL,
>> +                    FIELD_EX64(env->vtype, VTYPE, VILL));
>> +        flags = FIELD_DP32(flags, TB_FLAGS, SEW,
>> +                    FIELD_EX64(env->vtype, VTYPE, VSEW));
>> +        flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
>> +                    FIELD_EX64(env->vtype, VTYPE, VLMUL));
>> +        flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
>> +    } else {
>> +        flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
>> +    }
>> +
>>   #ifdef CONFIG_USER_ONLY
>> -    *flags = TB_FLAGS_MSTATUS_FS;
>> +    flags |= TB_FLAGS_MSTATUS_FS;
>>   #else
>> -    *flags = cpu_mmu_index(env, 0);
>> +    flags |= cpu_mmu_index(env, 0);
>>       if (riscv_cpu_fp_enabled(env)) {
>> -        *flags |= env->mstatus & MSTATUS_FS;
>> +        flags |= env->mstatus & MSTATUS_FS;
>>       }
>>   #endif
>> +    *pflags = flags;
>>   }
>>
>>   int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
>> @@ -411,9 +461,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
>>
>>   void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
>>
>> -typedef CPURISCVState CPUArchState;
>> -typedef RISCVCPU ArchCPU;
>> -
>> -#include "exec/cpu-all.h"
>> -
>>   #endif /* RISCV_CPU_H */
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index debb22a480..3c28c7e407 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
>>   DEF_HELPER_1(wfi, void, env)
>>   DEF_HELPER_1(tlb_flush, void, env)
>>   #endif
>> +/* Vector functions */
>> +DEF_HELPER_3(vsetvl, tl, env, tl, tl)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index b883672e63..53340bdbc4 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -62,6 +62,7 @@
>>   @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>   @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>> +@r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>
>>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>>   @hfence_bvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>> @@ -207,3 +208,7 @@ fcvt_w_d   1100001  00000 ..... ... ..... 1010011 @r2_rm
>>   fcvt_wu_d  1100001  00001 ..... ... ..... 1010011 @r2_rm
>>   fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>   fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>> +
>> +# *** RV32V Extension ***
>> +vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>> +vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> new file mode 100644
>> index 0000000000..da82c72bbf
>> --- /dev/null
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -0,0 +1,69 @@
>> +/*
>> + * RISC-V translation routines for the RVV Standard Extension.
>> + *
>> + * Copyright (c) 2020 C-SKY Limited. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>> +{
>> +    TCGv s1, s2, dst;
>> +    s2 = tcg_temp_new();
>> +    dst = tcg_temp_new();
>> +
>> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
>> +    if (a->rs1 == 0) {
>> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
>> +        s1 = tcg_const_tl(RV_VLEN_MAX);
>> +    } else {
>> +        s1 = tcg_temp_new();
>> +        gen_get_gpr(s1, a->rs1);
>> +    }
>> +    gen_get_gpr(s2, a->rs2);
>> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
>> +    gen_set_gpr(a->rd, dst);
>> +    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
>> +    exit_tb(ctx);
> Why does this
Because vsetvl changes vtype, the TB flags of the instructions
following the vsetvl will change (some TB flags, such as LMUL, are
derived from vtype).
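For reference, a minimal standalone sketch (plain C, not QEMU code; the
helper name flags_from_vtype is made up) of how LMUL and SEW move from
vtype into the TB flags with the bit layout used in this patch. It only
illustrates that the flags of the next TB depend on the run-time vtype
written by vsetvl, which is why the TB chain is broken with exit_tb():

    #include <stdint.h>
    #include <stdio.h>

    /* Layout as in the patch: VTYPE.VLMUL = bits [1:0], VTYPE.VSEW = bits [4:2];
     * TB_FLAGS.LMUL = bits [4:3], TB_FLAGS.SEW = bits [7:5]. */
    static uint32_t flags_from_vtype(uint64_t vtype)
    {
        uint32_t lmul = vtype & 0x3;
        uint32_t sew  = (vtype >> 2) & 0x7;
        return (lmul << 3) | (sew << 5);
    }

    int main(void)
    {
        /* SEW = 32-bit (VSEW = 2) with LMUL x1 vs. LMUL x4 yields different
         * flags, so a TB translated with the old vtype cannot be chained to
         * directly. */
        printf("0x%02x vs 0x%02x\n",
               (unsigned)flags_from_vtype(2u << 2),
               (unsigned)flags_from_vtype((2u << 2) | 2u));
        return 0;
    }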
>
>> +    ctx->base.is_jmp = DISAS_NORETURN;
>> +
>> +    tcg_temp_free(s1);
>> +    tcg_temp_free(s2);
>> +    tcg_temp_free(dst);
>> +    return true;
>> +}
>> +
>> +static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>> +{
>> +    TCGv s1, s2, dst;
>> +    s2 = tcg_const_tl(a->zimm);
>> +    dst = tcg_temp_new();
>> +
>> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
>> +    if (a->rs1 == 0) {
>> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
>> +        s1 = tcg_const_tl(RV_VLEN_MAX);
>> +    } else {
>> +        s1 = tcg_temp_new();
>> +        gen_get_gpr(s1, a->rs1);
>> +    }
>> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
>> +    gen_set_gpr(a->rd, dst);
>> +    gen_goto_tb(ctx, 0, ctx->pc_succ_insn);
> Need to be different to this?
Although vsetvli also changes vtype, the new vtype is a constant
taken from an immediate. So the TB flags of the instruction (A)
following it will always be the same as the TB flags at the first
translation of A. That's why gen_goto_tb is enough.

Zhiwei
>
> Alistair
>
>> +    ctx->base.is_jmp = DISAS_NORETURN;
>> +
>> +    tcg_temp_free(s1);
>> +    tcg_temp_free(s2);
>> +    tcg_temp_free(dst);
>> +    return true;
>> +}
>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>> index 43bf7e39a6..af07ac4160 100644
>> --- a/target/riscv/translate.c
>> +++ b/target/riscv/translate.c
>> @@ -56,6 +56,12 @@ typedef struct DisasContext {
>>          to reset this known value.  */
>>       int frm;
>>       bool ext_ifencei;
>> +    /* vector extension */
>> +    bool vill;
>> +    uint8_t lmul;
>> +    uint8_t sew;
>> +    uint16_t vlen;
>> +    bool vl_eq_vlmax;
>>   } DisasContext;
>>
>>   #ifdef TARGET_RISCV64
>> @@ -711,6 +717,7 @@ static bool gen_shift(DisasContext *ctx, arg_r *a,
>>   #include "insn_trans/trans_rva.inc.c"
>>   #include "insn_trans/trans_rvf.inc.c"
>>   #include "insn_trans/trans_rvd.inc.c"
>> +#include "insn_trans/trans_rvv.inc.c"
>>   #include "insn_trans/trans_privileged.inc.c"
>>
>>   /* Include the auto-generated decoder for 16 bit insn */
>> @@ -745,10 +752,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>       DisasContext *ctx = container_of(dcbase, DisasContext, base);
>>       CPURISCVState *env = cs->env_ptr;
>>       RISCVCPU *cpu = RISCV_CPU(cs);
>> +    uint32_t tb_flags = ctx->base.tb->flags;
>>
>>       ctx->pc_succ_insn = ctx->base.pc_first;
>> -    ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK;
>> -    ctx->mstatus_fs = ctx->base.tb->flags & TB_FLAGS_MSTATUS_FS;
>> +    ctx->mem_idx = tb_flags & TB_FLAGS_MMU_MASK;
>> +    ctx->mstatus_fs = tb_flags & TB_FLAGS_MSTATUS_FS;
>>       ctx->priv_ver = env->priv_ver;
>>   #if !defined(CONFIG_USER_ONLY)
>>       if (riscv_has_ext(env, RVH)) {
>> @@ -772,6 +780,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>       ctx->misa = env->misa;
>>       ctx->frm = -1;  /* unknown rounding mode */
>>       ctx->ext_ifencei = cpu->cfg.ext_ifencei;
>> +    ctx->vlen = cpu->cfg.vlen;
>> +    ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>> +    ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>> +    ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>> +    ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>>   }
>>
>>   static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> new file mode 100644
>> index 0000000000..2afe716f2a
>> --- /dev/null
>> +++ b/target/riscv/vector_helper.c
>> @@ -0,0 +1,53 @@
>> +/*
>> + * RISC-V Vector Extension Helpers for QEMU.
>> + *
>> + * Copyright (c) 2020 C-SKY Limited. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "cpu.h"
>> +#include "exec/exec-all.h"
>> +#include "exec/helper-proto.h"
>> +#include <math.h>
>> +
>> +target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>> +    target_ulong s2)
>> +{
>> +    int vlmax, vl;
>> +    RISCVCPU *cpu = env_archcpu(env);
>> +    uint16_t sew = 8 << FIELD_EX64(s2, VTYPE, VSEW);
>> +    uint8_t ediv = FIELD_EX64(s2, VTYPE, VEDIV);
>> +    bool vill = FIELD_EX64(s2, VTYPE, VILL);
>> +    target_ulong reserved = FIELD_EX64(s2, VTYPE, RESERVED);
>> +
>> +    if ((sew > cpu->cfg.elen) || vill || (ediv != 0) || (reserved != 0)) {
>> +        /* only set vill bit. */
>> +        env->vtype = FIELD_DP64(0, VTYPE, VILL, 1);
>> +        env->vl = 0;
>> +        env->vstart = 0;
>> +        return 0;
>> +    }
>> +
>> +    vlmax = vext_get_vlmax(cpu, s2);
>> +    if (s1 <= vlmax) {
>> +        vl = s1;
>> +    } else {
>> +        vl = vlmax;
>> +    }
>> +    env->vl = vl;
>> +    env->vtype = s2;
>> +    env->vstart = 0;
>> +    return vl;
>> +}
>> --
>> 2.23.0
>>
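For a concrete sense of what the vsetvl helper quoted above computes, here
is a standalone model (hypothetical C, not QEMU code; model_vsetvl is a
made-up name), assuming an example configuration of cfg.vlen = 128 and
cfg.elen = 64:

    #include <assert.h>
    #include <stdint.h>

    /* Simplified model of HELPER(vsetvl): clamp AVL to VLMAX, or set vill
     * when the requested SEW exceeds ELEN. */
    static uint32_t model_vsetvl(uint32_t vlen, uint32_t elen,
                                 uint32_t avl, uint32_t vsew, uint32_t vlmul,
                                 int *vill)
    {
        uint32_t sew = 8u << vsew;
        if (sew > elen) {
            *vill = 1;           /* illegal: only the vill bit is kept */
            return 0;
        }
        *vill = 0;
        uint32_t vlmax = vlen >> (vsew + 3 - vlmul);
        return avl <= vlmax ? avl : vlmax;
    }

    int main(void)
    {
        int vill;
        /* AVL = 100, SEW = 32-bit, LMUL x2: VLMAX = 8, so vl is clamped to 8 */
        assert(model_vsetvl(128, 64, 100, 2, 1, &vill) == 8 && !vill);
        /* AVL = 3 fits within VLMAX, so vl = 3 */
        assert(model_vsetvl(128, 64, 3, 2, 1, &vill) == 3 && !vill);
        /* SEW = 128 > ELEN = 64: vill is set and vl = 0 */
        assert(model_vsetvl(128, 64, 16, 4, 0, &vill) == 0 && vill);
        return 0;
    }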



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 04/60] target/riscv: add vector configure instruction
  2020-03-12 22:00       ` LIU Zhiwei
@ 2020-03-12 22:07         ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-12 22:07 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 3:00 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2020/3/13 5:23, Alistair Francis wrote:
> > On Thu, Mar 12, 2020 at 8:07 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >> vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
> >> should update after configure instructions. The (ill, lmul, sew ) of vtype
> >> and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.
> >>
> >> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >> ---
> >>   target/riscv/Makefile.objs              |  2 +-
> >>   target/riscv/cpu.h                      | 63 ++++++++++++++++++----
> >>   target/riscv/helper.h                   |  2 +
> >>   target/riscv/insn32.decode              |  5 ++
> >>   target/riscv/insn_trans/trans_rvv.inc.c | 69 +++++++++++++++++++++++++
> >>   target/riscv/translate.c                | 17 +++++-
> >>   target/riscv/vector_helper.c            | 53 +++++++++++++++++++
> >>   7 files changed, 199 insertions(+), 12 deletions(-)
> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >>   create mode 100644 target/riscv/vector_helper.c
> >>

...

> >> +    gen_get_gpr(s2, a->rs2);
> >> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> >> +    gen_set_gpr(a->rd, dst);
> >> +    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
> >> +    exit_tb(ctx);
> > Why does this
> As the vsetvl will change vtype, the tb flags of the instructions next
> to the vsetvl
> will change(some tb flags  are from vtype, like LMUL).
> >
> >> +    ctx->base.is_jmp = DISAS_NORETURN;
> >> +
> >> +    tcg_temp_free(s1);
> >> +    tcg_temp_free(s2);
> >> +    tcg_temp_free(dst);
> >> +    return true;
> >> +}
> >> +
> >> +static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
> >> +{
> >> +    TCGv s1, s2, dst;
> >> +    s2 = tcg_const_tl(a->zimm);
> >> +    dst = tcg_temp_new();
> >> +
> >> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
> >> +    if (a->rs1 == 0) {
> >> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> >> +        s1 = tcg_const_tl(RV_VLEN_MAX);
> >> +    } else {
> >> +        s1 = tcg_temp_new();
> >> +        gen_get_gpr(s1, a->rs1);
> >> +    }
> >> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> >> +    gen_set_gpr(a->rd, dst);
> >> +    gen_goto_tb(ctx, 0, ctx->pc_succ_insn);
> > Need to be different to this?
> Although vsetvli will also change vtype, the vtype will be a constant.
> So the tb flags of  the instruction(A) next to
> it will always be same with the tb flags at first translation of A.
> That's why gen_goto_tb is enough.

Ah ok. Makes sense.

Once you fix the one nit pick I had you can add my reviewed by:

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

>
> Zhiwei


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 00/60] target/riscv: support vector extension v0.7.1
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-13  0:41   ` no-reply
  -1 siblings, 0 replies; 336+ messages in thread
From: no-reply @ 2020-03-13  0:41 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: guoren, qemu-riscv, richard.henderson, qemu-devel, wxy194768,
	chihmin.chao, wenmeng_zhang, palmer, alistair23, zhiwei_liu

Patchew URL: https://patchew.org/QEMU/20200312145900.2054-1-zhiwei_liu@c-sky.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v5 00/60] target/riscv: support vector extension v0.7.1
Message-id: 20200312145900.2054-1-zhiwei_liu@c-sky.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
375b545 target/riscv: configure and turn on vector extension from command line
1b21cbe target/riscv: vector compress instruction
f991525 target/riscv: vector register gather instruction
1e14e32 target/riscv: vector slide instructions
39f1497 target/riscv: floating-point scalar move instructions
2e66424 target/riscv: integer scalar move instruction
72404d7 target/riscv: integer extract instruction
3e441a9 target/riscv: vector element index instruction
0e8d18f target/riscv: vector iota instruction
5941891 target/riscv: set-X-first mask bit
1403c7e target/riscv: vmfirst find-first-set mask bit
3eba22e target/riscv: vector mask population count vmpopc
10809a8 target/riscv: vector mask-register logical instructions
b279b81 target/riscv: vector widening floating-point reduction instructions
6b1e85b target/riscv: vector single-width floating-point reduction instructions
ae44adc target/riscv: vector wideing integer reduction instructions
2f73f58 target/riscv: vector single-width integer reduction instructions
4ddb4e3 target/riscv: narrowing floating-point/integer type-convert instructions
ec3b1de target/riscv: widening floating-point/integer type-convert instructions
fc9abf9 target/riscv: vector floating-point/integer type-convert instructions
b3ae6d1 target/riscv: vector floating-point merge instructions
cb59558 target/riscv: vector floating-point classify instructions
1aa8c5b target/riscv: vector floating-point compare instructions
4b71902 target/riscv: vector floating-point sign-injection instructions
9a4bcd8 target/riscv: vector floating-point min/max instructions
8cefa5a target/riscv: vector floating-point square-root instruction
1dca724 target/riscv: vector widening floating-point fused multiply-add instructions
d730445 target/riscv: vector single-width floating-point fused multiply-add instructions
59e9d00 target/riscv: vector widening floating-point multiply
7728ab1 target/riscv: vector single-width floating-point multiply/divide instructions
db7a3eb target/riscv: vector widening floating-point add/subtract instructions
b74ee11 target/riscv: vector single-width floating-point add/subtract instructions
a6aed98 target/riscv: vector narrowing fixed-point clip instructions
41bff4f target/riscv: vector single-width scaling shift instructions
4e0735b target/riscv: vector widening saturating scaled multiply-add
7175350 target/riscv: vector single-width fractional multiply with rounding and saturation
866ade9 target/riscv: vector single-width averaging add and subtract
a10f893 target/riscv: vector single-width saturating add and subtract
b1968d2 target/riscv: vector integer merge and move instructions
b9a7f44 target/riscv: vector widening integer multiply-add instructions
9a490e5 target/riscv: vector single-width integer multiply-add instructions
24d1513 target/riscv: vector widening integer multiply instructions
4080b57 target/riscv: vector integer divide instructions
aafca3f target/riscv: vector single-width integer multiply instructions
386c472 target/riscv: vector integer min/max instructions
9586428 target/riscv: vector integer comparison instructions
615ad80 target/riscv: vector narrowing integer right shift instructions
2eb1e18 target/riscv: vector single-width bit shift instructions
047a1fa target/riscv: vector bitwise logical instructions
b403895 target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
8f2bc0b target/riscv: vector widening integer add and subtract
8f204ca target/riscv: vector single-width integer add and subtract
d5f58d7 target/riscv: add vector amo operations
29a0e0d target/riscv: add fault-only-first unit stride load
8166bfc target/riscv: add vector index load and store instructions
72f9f39 target/riscv: add vector stride load and store instructions
392ca2c target/riscv: add vector configure instruction
472b5e6 target/riscv: support vector extension csr
d172c56 target/riscv: implementation-defined constant parameters
73ee7eb target/riscv: add vector extension field in CPURISCVState

=== OUTPUT BEGIN ===
1/60 Checking commit 73ee7eb553fc (target/riscv: add vector extension field in CPURISCVState)
2/60 Checking commit d172c5624ac8 (target/riscv: implementation-defined constant parameters)
3/60 Checking commit 472b5e62cd77 (target/riscv: support vector extension csr)
4/60 Checking commit 392ca2c42910 (target/riscv: add vector configure instruction)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#158: 
new file mode 100644

total: 0 errors, 1 warnings, 284 lines checked

Patch 4/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/60 Checking commit 72f9f398f938 (target/riscv: add vector stride load and store instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#283: FILE: target/riscv/insn_trans/trans_rvv.inc.c:127:
+static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
                                                         ^

total: 1 errors, 0 warnings, 966 lines checked

Patch 5/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

6/60 Checking commit 8166bfc954b6 (target/riscv: add vector index load and store instructions)
7/60 Checking commit 29a0e0d6fbc3 (target/riscv: add fault-only-first unit stride load)
8/60 Checking commit d5f58d7a2231 (target/riscv: add vector amo operations)
9/60 Checking commit 8f204cad4cdc (target/riscv: vector single-width integer add and subtract)
ERROR: spaces required around that '*' (ctx:WxV)
#87: FILE: target/riscv/insn_trans/trans_rvv.inc.c:739:
+static bool opivv_check(DisasContext *s, arg_rmrr *a)
                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: target/riscv/insn_trans/trans_rvv.inc.c:750:
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)             \
                                                    ^

total: 2 errors, 0 warnings, 399 lines checked

Patch 9/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

10/60 Checking commit 8f2bc0b5ecdc (target/riscv: vector widening integer add and subtract)
11/60 Checking commit b403895b2425 (target/riscv: vector integer add-with-carry / subtract-with-borrow instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#83: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1110:
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#105: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1132:
+static bool opivv_vadc_check(DisasContext *s, arg_r *a)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#120: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1147:
+static bool opivv_vmadc_check(DisasContext *s, arg_r *a)
                                                      ^

ERROR: spaces required around that '*' (ctx:WxV)
#133: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1160:
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#149: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1176:
+static bool opivx_vadc_check(DisasContext *s, arg_r *a)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#159: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1186:
+static bool opivx_vmadc_check(DisasContext *s, arg_r *a)
                                                      ^

ERROR: spaces required around that '*' (ctx:WxV)
#170: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1197:
+static bool trans_##NAME(DisasContext *s, arg_r *a)                      \
                                                 ^

total: 7 errors, 0 warnings, 312 lines checked

Patch 11/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/60 Checking commit 047a1fa32f47 (target/riscv: vector bitwise logical instructions)
13/60 Checking commit 2eb1e18e754d (target/riscv: vector single-width bit shift instructions)
14/60 Checking commit 615ad80f3f71 (target/riscv: vector narrowing integer right shift instructions)
15/60 Checking commit 9586428c962a (target/riscv: vector integer comparison instructions)
16/60 Checking commit 386c472368f0 (target/riscv: vector integer min/max instructions)
17/60 Checking commit aafca3f152fa (target/riscv: vector single-width integer multiply instructions)
18/60 Checking commit 4080b573cb2c (target/riscv: vector integer divide instructions)
19/60 Checking commit 24d15131325b (target/riscv: vector widening integer multiply instructions)
20/60 Checking commit 9a490e5f02a8 (target/riscv: vector single-width integer multiply-add instructions)
21/60 Checking commit b9a7f44e4b10 (target/riscv: vector widening integer multiply-add instructions)
22/60 Checking commit b1968d225be8 (target/riscv: vector integer merge and move instructions)
23/60 Checking commit a10f89334975 (target/riscv: vector single-width saturating add and subtract)
24/60 Checking commit 866ade991718 (target/riscv: vector single-width averaging add and subtract)
25/60 Checking commit 71753503b7bd (target/riscv: vector single-width fractional multiply with rounding and saturation)
26/60 Checking commit 4e0735b24caa (target/riscv: vector widening saturating scaled multiply-add)
27/60 Checking commit 41bff4f7fb9c (target/riscv: vector single-width scaling shift instructions)
28/60 Checking commit a6aed98d4490 (target/riscv: vector narrowing fixed-point clip instructions)
29/60 Checking commit b74ee1196c93 (target/riscv: vector single-width floating-point add/subtract instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#249: FILE: target/riscv/vector_helper.c:3012:
+static uint16_t float16_rsub(uint16_t a, uint16_t b, float_status *s)
                                                                   ^

total: 1 errors, 0 warnings, 238 lines checked

Patch 29/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

30/60 Checking commit db7a3eb8d90a (target/riscv: vector widening floating-point add/subtract instructions)
31/60 Checking commit 7728ab1858dc (target/riscv: vector single-width floating-point multiply/divide instructions)
32/60 Checking commit 59e9d00e6557 (target/riscv: vector widening floating-point multiply)
33/60 Checking commit d7304457e529 (target/riscv: vector single-width floating-point fused multiply-add instructions)
34/60 Checking commit 1dca7240a1c4 (target/riscv: vector widening floating-point fused multiply-add instructions)
35/60 Checking commit 8cefa5a85791 (target/riscv: vector floating-point square-root instruction)
ERROR: spaces required around that '*' (ctx:WxV)
#65: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1844:
+static bool opfv_check(DisasContext *s, arg_rmr *a)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#75: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1854:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 111 lines checked

Patch 35/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

36/60 Checking commit 9a4bcd893d2b (target/riscv: vector floating-point min/max instructions)
37/60 Checking commit 4b7190263384 (target/riscv: vector floating-point sign-injection instructions)
38/60 Checking commit 1aa8c5ba7f17 (target/riscv: vector floating-point compare instructions)
39/60 Checking commit cb595588a5aa (target/riscv: vector floating-point classify instructions)
40/60 Checking commit b3ae6d1de7d2 (target/riscv: vector floating-point merge instructions)
41/60 Checking commit fc9abf966c74 (target/riscv: vector floating-point/integer type-convert instructions)
42/60 Checking commit ec3b1dee93bb (target/riscv: widening floating-point/integer type-convert instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1949:
+static bool opfv_widen_check(DisasContext *s, arg_rmr *a)
                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#72: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1961:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 118 lines checked

Patch 42/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

43/60 Checking commit 4ddb4e380116 (target/riscv: narrowing floating-point/integer type-convert instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1991:
+static bool opfv_narrow_check(DisasContext *s, arg_rmr *a)
                                                        ^

ERROR: spaces required around that '*' (ctx:WxV)
#72: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2003:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 115 lines checked

Patch 43/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

44/60 Checking commit 2f73f581cd8f (target/riscv: vector single-width integer reduction instructions)
45/60 Checking commit ae44adc808a4 (target/riscv: vector wideing integer reduction instructions)
46/60 Checking commit 6b1e85bd5bf9 (target/riscv: vector single-width floating-point reduction instructions)
47/60 Checking commit b279b81d260f (target/riscv: vector widening floating-point reduction instructions)
48/60 Checking commit 10809a8f183e (target/riscv: vector mask-register logical instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2061:
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
                                                 ^

ERROR: "foo * bar" should be "foo *bar"
#64: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2065:
+        gen_helper_gvec_4_ptr * fn = gen_helper_##NAME;            \

total: 2 errors, 0 warnings, 100 lines checked

Patch 48/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

49/60 Checking commit 3eba22ed5b2f (target/riscv: vector mask population count vmpopc)
ERROR: spaces required around that '*' (ctx:WxV)
#42: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2085:
+static bool trans_vmpopc_m(DisasContext *s, arg_rmr *a)
                                                     ^

total: 1 errors, 0 warnings, 70 lines checked

Patch 49/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

50/60 Checking commit 1403c7eebe82 (target/riscv: vmfirst find-first-set mask bit)
ERROR: suspect code indent for conditional statements (12, 15)
#92: FILE: target/riscv/vector_helper.c:4313:
+            if (vext_elem_mask(vs2, mlen, i)) {
+               return i;

total: 1 errors, 0 warnings, 69 lines checked

Patch 50/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

51/60 Checking commit 5941891e02bf (target/riscv: set-X-first mask bit)
ERROR: "foo * bar" should be "foo *bar"
#53: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2156:
+        gen_helper_gvec_3_ptr * fn = gen_helper_##NAME;            \

total: 1 errors, 0 warnings, 111 lines checked

Patch 51/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

52/60 Checking commit 0e8d18f00c74 (target/riscv: vector iota instruction)
ERROR: spaces required around that '*' (ctx:WxV)
#45: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2172:
+static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
                                                        ^

total: 1 errors, 0 warnings, 74 lines checked

Patch 52/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

53/60 Checking commit 3e441a993c10 (target/riscv: vector element index instruction)
54/60 Checking commit 72404d73d06a (target/riscv: integer extract instruction)
ERROR: space prohibited after that '*' (ctx:BxW)
#48: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2218:
+typedef void (* gen_helper_vext_x_v)(TCGv, TCGv_ptr, TCGv, TCGv_env);
               ^

total: 1 errors, 0 warnings, 74 lines checked

Patch 54/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

55/60 Checking commit 2e66424902bf (target/riscv: integer scalar move instruction)
ERROR: space prohibited after that '*' (ctx:BxW)
#45: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2248:
+typedef void (* gen_helper_vmv_s_x)(TCGv_ptr, TCGv, TCGv_env);
               ^

total: 1 errors, 0 warnings, 62 lines checked

Patch 55/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

56/60 Checking commit 39f1497df9c9 (target/riscv: floating-point scalar move instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#50: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2274:
+typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
               ^

ERROR: space prohibited after that '*' (ctx:BxW)
#71: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2295:
+typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
               ^

total: 2 errors, 0 warnings, 109 lines checked

Patch 56/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

57/60 Checking commit 1e14e32ecd73 (target/riscv: vector slide instructions)
58/60 Checking commit f991525dafb5 (target/riscv: vector register gather instruction)
59/60 Checking commit 1b21cbeba68f (target/riscv: vector compress instruction)
60/60 Checking commit 375b5454f423 (target/riscv: configure and turn on vector extension from command line)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200312145900.2054-1-zhiwei_liu@c-sky.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 336+ messages in thread

13/60 Checking commit 2eb1e18e754d (target/riscv: vector single-width bit shift instructions)
14/60 Checking commit 615ad80f3f71 (target/riscv: vector narrowing integer right shift instructions)
15/60 Checking commit 9586428c962a (target/riscv: vector integer comparison instructions)
16/60 Checking commit 386c472368f0 (target/riscv: vector integer min/max instructions)
17/60 Checking commit aafca3f152fa (target/riscv: vector single-width integer multiply instructions)
18/60 Checking commit 4080b573cb2c (target/riscv: vector integer divide instructions)
19/60 Checking commit 24d15131325b (target/riscv: vector widening integer multiply instructions)
20/60 Checking commit 9a490e5f02a8 (target/riscv: vector single-width integer multiply-add instructions)
21/60 Checking commit b9a7f44e4b10 (target/riscv: vector widening integer multiply-add instructions)
22/60 Checking commit b1968d225be8 (target/riscv: vector integer merge and move instructions)
23/60 Checking commit a10f89334975 (target/riscv: vector single-width saturating add and subtract)
24/60 Checking commit 866ade991718 (target/riscv: vector single-width averaging add and subtract)
25/60 Checking commit 71753503b7bd (target/riscv: vector single-width fractional multiply with rounding and saturation)
26/60 Checking commit 4e0735b24caa (target/riscv: vector widening saturating scaled multiply-add)
27/60 Checking commit 41bff4f7fb9c (target/riscv: vector single-width scaling shift instructions)
28/60 Checking commit a6aed98d4490 (target/riscv: vector narrowing fixed-point clip instructions)
29/60 Checking commit b74ee1196c93 (target/riscv: vector single-width floating-point add/subtract instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#249: FILE: target/riscv/vector_helper.c:3012:
+static uint16_t float16_rsub(uint16_t a, uint16_t b, float_status *s)
                                                                   ^

total: 1 errors, 0 warnings, 238 lines checked

Patch 29/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

30/60 Checking commit db7a3eb8d90a (target/riscv: vector widening floating-point add/subtract instructions)
31/60 Checking commit 7728ab1858dc (target/riscv: vector single-width floating-point multiply/divide instructions)
32/60 Checking commit 59e9d00e6557 (target/riscv: vector widening floating-point multiply)
33/60 Checking commit d7304457e529 (target/riscv: vector single-width floating-point fused multiply-add instructions)
34/60 Checking commit 1dca7240a1c4 (target/riscv: vector widening floating-point fused multiply-add instructions)
35/60 Checking commit 8cefa5a85791 (target/riscv: vector floating-point square-root instruction)
ERROR: spaces required around that '*' (ctx:WxV)
#65: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1844:
+static bool opfv_check(DisasContext *s, arg_rmr *a)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#75: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1854:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 111 lines checked

Patch 35/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

36/60 Checking commit 9a4bcd893d2b (target/riscv: vector floating-point min/max instructions)
37/60 Checking commit 4b7190263384 (target/riscv: vector floating-point sign-injection instructions)
38/60 Checking commit 1aa8c5ba7f17 (target/riscv: vector floating-point compare instructions)
39/60 Checking commit cb595588a5aa (target/riscv: vector floating-point classify instructions)
40/60 Checking commit b3ae6d1de7d2 (target/riscv: vector floating-point merge instructions)
41/60 Checking commit fc9abf966c74 (target/riscv: vector floating-point/integer type-convert instructions)
42/60 Checking commit ec3b1dee93bb (target/riscv: widening floating-point/integer type-convert instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1949:
+static bool opfv_widen_check(DisasContext *s, arg_rmr *a)
                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#72: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1961:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 118 lines checked

Patch 42/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

43/60 Checking commit 4ddb4e380116 (target/riscv: narrowing floating-point/integer type-convert instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:1991:
+static bool opfv_narrow_check(DisasContext *s, arg_rmr *a)
                                                        ^

ERROR: spaces required around that '*' (ctx:WxV)
#72: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2003:
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)              \
                                                   ^

total: 2 errors, 0 warnings, 115 lines checked

Patch 43/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

44/60 Checking commit 2f73f581cd8f (target/riscv: vector single-width integer reduction instructions)
45/60 Checking commit ae44adc808a4 (target/riscv: vector wideing integer reduction instructions)
46/60 Checking commit 6b1e85bd5bf9 (target/riscv: vector single-width floating-point reduction instructions)
47/60 Checking commit b279b81d260f (target/riscv: vector widening floating-point reduction instructions)
48/60 Checking commit 10809a8f183e (target/riscv: vector mask-register logical instructions)
ERROR: spaces required around that '*' (ctx:WxV)
#60: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2061:
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
                                                 ^

ERROR: "foo * bar" should be "foo *bar"
#64: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2065:
+        gen_helper_gvec_4_ptr * fn = gen_helper_##NAME;            \

total: 2 errors, 0 warnings, 100 lines checked

Patch 48/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

49/60 Checking commit 3eba22ed5b2f (target/riscv: vector mask population count vmpopc)
ERROR: spaces required around that '*' (ctx:WxV)
#42: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2085:
+static bool trans_vmpopc_m(DisasContext *s, arg_rmr *a)
                                                     ^

total: 1 errors, 0 warnings, 70 lines checked

Patch 49/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

50/60 Checking commit 1403c7eebe82 (target/riscv: vmfirst find-first-set mask bit)
ERROR: suspect code indent for conditional statements (12, 15)
#92: FILE: target/riscv/vector_helper.c:4313:
+            if (vext_elem_mask(vs2, mlen, i)) {
+               return i;

total: 1 errors, 0 warnings, 69 lines checked

Patch 50/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

51/60 Checking commit 5941891e02bf (target/riscv: set-X-first mask bit)
ERROR: "foo * bar" should be "foo *bar"
#53: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2156:
+        gen_helper_gvec_3_ptr * fn = gen_helper_##NAME;            \

total: 1 errors, 0 warnings, 111 lines checked

Patch 51/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

52/60 Checking commit 0e8d18f00c74 (target/riscv: vector iota instruction)
ERROR: spaces required around that '*' (ctx:WxV)
#45: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2172:
+static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
                                                        ^

total: 1 errors, 0 warnings, 74 lines checked

Patch 52/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

53/60 Checking commit 3e441a993c10 (target/riscv: vector element index instruction)
54/60 Checking commit 72404d73d06a (target/riscv: integer extract instruction)
ERROR: space prohibited after that '*' (ctx:BxW)
#48: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2218:
+typedef void (* gen_helper_vext_x_v)(TCGv, TCGv_ptr, TCGv, TCGv_env);
               ^

total: 1 errors, 0 warnings, 74 lines checked

Patch 54/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

55/60 Checking commit 2e66424902bf (target/riscv: integer scalar move instruction)
ERROR: space prohibited after that '*' (ctx:BxW)
#45: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2248:
+typedef void (* gen_helper_vmv_s_x)(TCGv_ptr, TCGv, TCGv_env);
               ^

total: 1 errors, 0 warnings, 62 lines checked

Patch 55/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

56/60 Checking commit 39f1497df9c9 (target/riscv: floating-point scalar move instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#50: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2274:
+typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
               ^

ERROR: space prohibited after that '*' (ctx:BxW)
#71: FILE: target/riscv/insn_trans/trans_rvv.inc.c:2295:
+typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
               ^

total: 2 errors, 0 warnings, 109 lines checked

Patch 56/60 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

57/60 Checking commit 1e14e32ecd73 (target/riscv: vector slide instructions)
58/60 Checking commit f991525dafb5 (target/riscv: vector register gather instruction)
59/60 Checking commit 1b21cbeba68f (target/riscv: vector compress instruction)
60/60 Checking commit 375b5454f423 (target/riscv: configure and turn on vector extension from command line)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200312145900.2054-1-zhiwei_liu@c-sky.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-13 20:38     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 20:38 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Vector strided operations access the first memory element at the base address,
> and then access subsequent elements at address increments given by the byte
> offset contained in the x register specified by rs2.
>
> Vector unit-stride operations access elements stored contiguously in memory
> starting from the base effective address. It can be seen as a special
> case of strided operations.
>
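A rough, self-contained model of the access pattern described above (a sketch only, not code from the patch; the names are illustrative):

    #include <stdint.h>

    /*
     * Each of the vl active elements consists of nf fields of msz bytes,
     * and consecutive elements sit stride bytes apart.  The unit-stride
     * form is the same loop with stride == nf * msz.
     */
    static void strided_addresses(uint64_t base, uint64_t stride,
                                  uint32_t vl, uint32_t nf, uint32_t msz,
                                  void (*access)(uint64_t addr))
    {
        for (uint32_t i = 0; i < vl; i++) {
            for (uint32_t k = 0; k < nf; k++) {
                access(base + stride * i + k * msz);
            }
        }
    }
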
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.h                      |   6 +
>  target/riscv/helper.h                   | 105 ++++++
>  target/riscv/insn32.decode              |  32 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>  target/riscv/translate.c                |   7 +
>  target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>  6 files changed, 896 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 505d1a8515..b6ebb9b0eb 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>  typedef RISCVCPU ArchCPU;
>  #include "exec/cpu-all.h"
>
> +/* share data between vector helpers and decode code */
> +FIELD(VDATA, MLEN, 0, 8)
> +FIELD(VDATA, VM, 8, 1)
> +FIELD(VDATA, LMUL, 9, 2)
> +FIELD(VDATA, NF, 11, 4)
> +
>  FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>  FIELD(TB_FLAGS, LMUL, 3, 2)
>  FIELD(TB_FLAGS, SEW, 5, 3)
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 3c28c7e407..87dfa90609 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>  #endif
>  /* Vector functions */
>  DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)

Do you mind explaining why we have *_mask versions? I'm struggling to
understand this.

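For orientation while reading the rest of the patch: the two flavours declared here are generated together further down, and the shape of what GEN_VEXT_LD_US(vlb_v_b, ...) produces is roughly the following (reconstructed from the macro later in this patch, so treat it as a sketch):

    /* The _mask variant reuses the strided path with stride = nf * msz;
     * the unmasked variant takes the contiguous unit-stride path. */
    void HELPER(vlb_v_b_mask)(void *vd, void *v0, target_ulong base,
                              CPURISCVState *env, uint32_t desc)
    {
        uint32_t stride = vext_nf(desc) * sizeof(int8_t);
        vext_ldst_stride(vd, v0, base, stride, env, desc, false, ldb_b,
                         clearb, sizeof(int8_t), sizeof(int8_t), GETPC(),
                         MMU_DATA_LOAD);
    }

    void HELPER(vlb_v_b)(void *vd, void *v0, target_ulong base,
                         CPURISCVState *env, uint32_t desc)
    {
        vext_ldst_us(vd, base, env, desc, ldb_b, clearb,
                     sizeof(int8_t), sizeof(int8_t), GETPC(), MMU_DATA_LOAD);
    }
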
> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 53340bdbc4..ef521152c5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -25,6 +25,7 @@
>  %sh10    20:10
>  %csr    20:12
>  %rm     12:3
> +%nf     29:3                     !function=ex_plus_1
>
>  # immediates:
>  %imm_i    20:s12
> @@ -43,6 +44,8 @@
>  &u    imm rd
>  &shift     shamt rs1 rd
>  &atomic    aq rl rs2 rs1 rd
> +&r2nfvm    vm rd rs1 nf
> +&rnfvm     vm rd rs1 rs2 nf
>
>  # Formats 32:
>  @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
> @@ -62,6 +65,8 @@
>  @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>  @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>  @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>
>  @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>  fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>
>  # *** RV32V Extension ***
> +
> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
> +
> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
> +
> +# *** new major opcode OP-V ***
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index da82c72bbf..d85f2aec68 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -15,6 +15,8 @@
>   * You should have received a copy of the GNU General Public License along with
>   * this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
> +#include "tcg/tcg-op-gvec.h"
> +#include "tcg/tcg-gvec-desc.h"
>
>  static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>  {
> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>      tcg_temp_free(dst);
>      return true;
>  }
> +
> +/* vector register offset from env */
> +static uint32_t vreg_ofs(DisasContext *s, int reg)
> +{
> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
> +}
> +
> +/* check functions */
> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
> +{
> +    return !s->vill && ((s->misa & isa) == isa);
> +}

I don't think we need a new function to check ISA.

> +
> +/*
> + * There are two rules checked here.
> + *
> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
> + *
> + * 2. For all widening instructions, the destination LMUL value must also be
> + *    a supported LMUL value. (Section 11.2)
> + */
> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
> +{
> +    /*
> +     * The destination vector register group results are arranged as if both
> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
> +     */
> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
> +
> +    return !((s->lmul == 0x3 && widen) || (reg % legal));

Where does this 3 come from?


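A minimal restatement of the check above, assuming s->lmul holds the 2-bit vtype.vlmul encoding from the 0.7.1 spec, where values 0..3 select LMUL = 1, 2, 4, 8 (sketch only, not the patch code):

    /* reg must be aligned to the effective register-group size; a widening
     * op doubles that size, and the encoding 0x3 (LMUL = 8) can never
     * widen because LMUL = 16 does not exist. */
    static bool check_reg_sketch(unsigned lmul_enc, unsigned reg, bool widen)
    {
        unsigned group = (widen ? 2u : 1u) << lmul_enc;

        if (widen && lmul_enc == 0x3) {
            return false;
        }
        return (reg % group) == 0;
    }
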
> +}
> +
> +/*
> + * There are two rules checked here.
> + *
> + * 1. The destination vector register group for a masked vector instruction can
> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
> + *
> + * 2. In widening instructions and some other instructions, like vslideup.vx,
> + *    there is no need to check whether LMUL=1.
> + */
> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
> +    bool force)
> +{
> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
> +}
> +
> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
> +{
> +    return (1 << s->lmul) * nf <= 8;
> +}
> +
> +/* common translation macro */
> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
> +{                                                          \
> +    if (CHECK(s, a)) {                                     \
> +        return OP(s, a, SEQ);                              \
> +    }                                                      \
> +    return false;                                          \
> +}
> +
> +/*
> + *** unit stride load and store
> + */
> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv_env, TCGv_i32);
> +
> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
> +        gen_helper_ldst_us *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +
> +    /*
> +     * simd_desc supports at most 256 bytes, but in this implementation the
> +     * max vector group length is 2048 bytes, so split it into two parts.
> +     *
> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
> +     * The second part is lmul, encoded in data of simd_desc.
> +     */
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_us *fn;
> +    static gen_helper_ldst_us * const fns[2][7][4] = {
> +        /* masked unit stride load */
> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
> +          { NULL,                     gen_helper_vlh_v_h_mask,
> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
> +          { NULL,                     gen_helper_vlhu_v_h_mask,
> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
> +        /* unmasked unit stride load */
> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
> +          { NULL,                gen_helper_vlh_v_h,
> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
> +          { NULL,                gen_helper_vlhu_v_h,
> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
> +    };
> +
> +    fn =  fns[a->vm][seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> +}
> +
> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
> +
> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_us *fn;
> +    static gen_helper_ldst_us * const fns[2][4][4] = {
> +        /* masked unit stride load and store */
> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
> +          { NULL,                     gen_helper_vsh_v_h_mask,
> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
> +        /* unmasked unit stride store */
> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
> +          { NULL,                gen_helper_vsh_v_h,
> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
> +    };
> +
> +    fn =  fns[a->vm][seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> +}
> +
> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
> +
> +/*
> + *** stride load and store
> + */
> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv, TCGv_env, TCGv_i32);
> +
> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask;
> +    TCGv base, stride;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    stride = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    gen_get_gpr(stride, rs2);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, stride, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free(base);
> +    tcg_temp_free(stride);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_stride *fn;
> +    static gen_helper_ldst_stride * const fns[7][4] = {
> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
> +        { NULL,                 gen_helper_vlsh_v_h,
> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
> +        { NULL,                 gen_helper_vlshu_v_h,
> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
> +
> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_stride *fn;
> +    static gen_helper_ldst_stride * const fns[4][4] = {
> +        /* masked stride store */
> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
> +        { NULL,                 gen_helper_vssh_v_h,
> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
> +    };
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)

Looks good

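As a reading aid, the last of those GEN_VEXT_TRANS lines expands, via the macro earlier in this patch, to roughly:

    /* Expansion of GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op,
     * st_stride_check): decodetree calls trans_vsse_v(), which runs the
     * check and, if it passes, emits the helper call with seq == 3
     * selecting the vsse row of fns[][]. */
    static bool trans_vsse_v(DisasContext *s, arg_rnfvm *a)
    {
        if (st_stride_check(s, a)) {
            return st_stride_op(s, a, 3);
        }
        return false;
    }
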
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index af07ac4160..852545b77e 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>      uint8_t lmul;
>      uint8_t sew;
>      uint16_t vlen;
> +    uint16_t mlen;
>      bool vl_eq_vlmax;
>  } DisasContext;
>
> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>      }
>  }
>
> +static int ex_plus_1(DisasContext *ctx, int nf)
> +{
> +    return nf + 1;
> +}
> +
>  #define EX_SH(amount) \
>      static int ex_shift_##amount(DisasContext *ctx, int imm) \
>      {                                         \
> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>      ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>      ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>      ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>      ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>  }
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 2afe716f2a..ebfabd2946 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -18,8 +18,10 @@
>
>  #include "qemu/osdep.h"
>  #include "cpu.h"
> +#include "exec/memop.h"
>  #include "exec/exec-all.h"
>  #include "exec/helper-proto.h"
> +#include "tcg/tcg-gvec-desc.h"
>  #include <math.h>
>
>  target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      env->vstart = 0;
>      return vl;
>  }
> +
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that needs a host-endian fixup.
> + */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif

Looks good. Overall this looks good. Do you mind splitting this patch
up a little bit more? It's difficult to review such a long and complex
patch.

Alistair

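A quick stand-alone illustration of the byte-level fixup (sketch only, not part of the patch), using just the big-endian H1 definition:

    #include <stdint.h>
    #include <stdio.h>

    #define H1_BE(x) ((x) ^ 7)   /* the big-endian H1 from the patch */

    int main(void)
    {
        uint64_t chunk = 0x0807060504030201ULL;   /* element 0 == 0x01 */
        uint8_t *p = (uint8_t *)&chunk;

        for (int i = 0; i < 8; i++) {
            /* On a big-endian host this prints 01 02 ... 08, i.e. the same
             * element order a little-endian host sees with no fixup at all. */
            printf("%02x ", p[H1_BE(i)]);
        }
        printf("\n");
        return 0;
    }
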
> +
> +static inline uint32_t vext_nf(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
> +}
> +
> +static inline uint32_t vext_mlen(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
> +}
> +
> +static inline uint32_t vext_vm(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
> +}
> +
> +static inline uint32_t vext_lmul(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
> +}
> +
> +/*
> + * Get vector group length in bytes. Its range is [64, 2048].
> + *
> + * As simd_desc supports at most 256, the max vlen is 512 bits.
> + * So vlen in bytes is encoded as maxsz.
> + */
> +static inline uint32_t vext_maxsz(uint32_t desc)
> +{
> +    return simd_maxsz(desc) << vext_lmul(desc);
> +}
> +
> +/*
> + * This function checks watchpoints before the real load operation.
> + *
> + * In softmmu mode, the TLB API probe_access is enough for the watchpoint check.
> + * In user mode, there is no watchpoint support for now.
> + *
> + * It will trigger an exception if there is no mapping in the TLB
> + * and the page table walk can't fill the TLB entry. Then the guest
> + * software can return here after processing the exception, or never return.
> + */
> +static void probe_pages(CPURISCVState *env, target_ulong addr,
> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
> +{
> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> +    target_ulong curlen = MIN(pagelen, len);
> +
> +    probe_access(env, addr, curlen, access_type,
> +            cpu_mmu_index(env, false), ra);
> +    if (len > curlen) {
> +        addr += curlen;
> +        curlen = len - curlen;
> +        probe_access(env, addr, curlen, access_type,
> +                cpu_mmu_index(env, false), ra);
> +    }
> +}
> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +{
> +    /*
> +     * Split the remaining range into two parts.
> +     * The first part is in the last uint64_t unit.
> +     * The second part starts from the next uint64_t unit.
> +     */
> +    int part1 = 0, part2 = tot - cnt;
> +    if (cnt % 8) {
> +        part1 = 8 - (cnt % 8);
> +        part2 = tot - cnt - part1;
> +        memset(tail & ~(7ULL), 0, part1);
> +        memset((tail + 8) & ~(7ULL), 0, part2);
> +    } else {
> +        memset(tail, 0, part2);
> +    }
> +}
> +#else
> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +{
> +    memset(tail, 0, tot - cnt);
> +}
> +#endif
> +
> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int8_t *cur = ((int8_t *)vd + H1(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int16_t *cur = ((int16_t *)vd + H2(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int32_t *cur = ((int32_t *)vd + H4(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int64_t *cur = (int64_t *)vd + idx;
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +
> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> +{
> +    int idx = (index * mlen) / 64;
> +    int pos = (index * mlen) % 64;
> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
> +}
> +
> +/* elements operations for load and store */
> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
> +        uint32_t idx, void *vd, uintptr_t retaddr);
> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
> +        uint32_t cnt, uint32_t tot);
> +
> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
> +{                                                          \
> +    MTYPE data;                                            \
> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
> +    *cur = data;                                           \
> +}                                                          \
> +
> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
> +
> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
> +{                                                        \
> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
> +}
> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
> +
> +/*
> + *** stride: access vector element from strided memory
> + */
> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access */
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
> +    }
> +    /* do real access */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            target_ulong addr = base + stride * i + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> +{                                                                       \
> +    uint32_t vm = vext_vm(desc);                                        \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> +}
> +
> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> +{                                                                       \
> +    uint32_t vm = vext_vm(desc);                                        \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> +}
> +
> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
> +
> +/*
> + *** unit-stride: access elements stored contiguously in memory
> + */
> +
> +/* unmasked unit-stride load and store operation */
> +static inline void vext_ldst_us(void *vd, target_ulong base,
> +        CPURISCVState *env, uint32_t desc,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access */
> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
> +    /* load bytes from guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        while (k < nf) {
> +            target_ulong addr = base + (i * nf + k) * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +/*
> + * masked unit-stride load and store operations are a special case of strided
> + * access, with stride = NF * sizeof(MTYPE)
> + */
> +
> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> +}                                                                       \
> +                                                                        \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
> +}
> +
> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> +}                                                                       \
> +                                                                        \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
> +}
> +
> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-13 20:38     ` Alistair Francis
  0 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 20:38 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V

On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Vector strided operations access the first memory element at the base address,
> and then access subsequent elements at address increments given by the byte
> offset contained in the x register specified by rs2.
>
> Vector unit-stride operations access elements stored contiguously in memory
> starting from the base effective address. They can be seen as a special
> case of strided operations.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.h                      |   6 +
>  target/riscv/helper.h                   | 105 ++++++
>  target/riscv/insn32.decode              |  32 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>  target/riscv/translate.c                |   7 +
>  target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>  6 files changed, 896 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 505d1a8515..b6ebb9b0eb 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>  typedef RISCVCPU ArchCPU;
>  #include "exec/cpu-all.h"
>
> +/* share data between vector helpers and decode code */
> +FIELD(VDATA, MLEN, 0, 8)
> +FIELD(VDATA, VM, 8, 1)
> +FIELD(VDATA, LMUL, 9, 2)
> +FIELD(VDATA, NF, 11, 4)
> +
>  FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>  FIELD(TB_FLAGS, LMUL, 3, 2)
>  FIELD(TB_FLAGS, SEW, 5, 3)
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 3c28c7e407..87dfa90609 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>  #endif
>  /* Vector functions */
>  DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)

Do you mind explaining why we have *_mask versions? I'm struggling to
understand this.
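If I'm reading the rest of the patch right, the split is: the plain helpers
take the unmasked fast path through vext_ldst_us(), while the *_mask helpers
reuse vext_ldst_stride() with stride = NF * sizeof(MTYPE) so that every
element is gated on vext_elem_mask(). A minimal standalone sketch of the
difference (simplified, mlen = 1, illustrative names only, not the patch
code):

    #include <stdint.h>

    static void us_load_sketch(int8_t *vd, const int8_t *mem, uint32_t vl)
    {
        for (uint32_t i = 0; i < vl; i++) {
            vd[i] = mem[i];                  /* unmasked: no per-element test */
        }
    }

    static void us_load_mask_sketch(int8_t *vd, const int8_t *mem,
                                    const uint64_t *v0, uint32_t vl)
    {
        for (uint32_t i = 0; i < vl; i++) {
            if (!((v0[i / 64] >> (i % 64)) & 1)) {
                continue;                    /* bit i of v0 clear: skip element */
            }
            vd[i] = mem[i];
        }
    }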

> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 53340bdbc4..ef521152c5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -25,6 +25,7 @@
>  %sh10    20:10
>  %csr    20:12
>  %rm     12:3
> +%nf     29:3                     !function=ex_plus_1
>
>  # immediates:
>  %imm_i    20:s12
> @@ -43,6 +44,8 @@
>  &u    imm rd
>  &shift     shamt rs1 rd
>  &atomic    aq rl rs2 rs1 rd
> +&r2nfvm    vm rd rs1 nf
> +&rnfvm     vm rd rs1 rs2 nf
>
>  # Formats 32:
>  @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
> @@ -62,6 +65,8 @@
>  @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>  @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>  @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>
>  @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>  fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>
>  # *** RV32V Extension ***
> +
> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
> +
> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
> +
> +# *** new major opcode OP-V ***
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index da82c72bbf..d85f2aec68 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -15,6 +15,8 @@
>   * You should have received a copy of the GNU General Public License along with
>   * this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
> +#include "tcg/tcg-op-gvec.h"
> +#include "tcg/tcg-gvec-desc.h"
>
>  static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>  {
> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>      tcg_temp_free(dst);
>      return true;
>  }
> +
> +/* vector register offset from env */
> +static uint32_t vreg_ofs(DisasContext *s, int reg)
> +{
> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
> +}
> +
> +/* check functions */
> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
> +{
> +    return !s->vill && ((s->misa & isa) == isa);
> +}

I don't think we need a new function to check ISA.
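Presumably the vill test plus the existing has_ext() helper in translate.c
would be enough, e.g. something along these lines (illustrative only, assuming
has_ext() is visible from trans_rvv.inc.c, which is included into
translate.c):

    static bool vext_check_isa_ill(DisasContext *s)
    {
        return !s->vill && has_ext(s, RVV);
    }

or the two tests could simply be open-coded at each call site.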

> +
> +/*
> + * There are two rules checked here.
> + *
> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
> + *
> + * 2. For all widening instructions, the destination LMUL value must also be
> + *    a supported LMUL value. (Section 11.2)
> + */
> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
> +{
> +    /*
> +     * The destination vector register group results are arranged as if both
> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
> +     */
> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
> +
> +    return !((s->lmul == 0x3 && widen) || (reg % legal));

Where does this 3 come from?
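If I read the vtype layout right, the 2-bit LMUL field encodes LMUL = 1 <<
lmul, i.e.:

    lmul = 0 -> LMUL = 1
    lmul = 1 -> LMUL = 2
    lmul = 2 -> LMUL = 4
    lmul = 3 -> LMUL = 8

so lmul == 0x3 is presumably the LMUL = 8 case, where a widening destination
would need an unsupported effective LMUL of 16 and is therefore rejected. A
comment (or a named constant instead of the bare 0x3) would make that clearer.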


> +}
> +
> +/*
> + * There are two rules checked here.
> + *
> + * 1. The destination vector register group for a masked vector instruction can
> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
> + *
> + * 2. In widening instructions and some other instructions, like vslideup.vx,
> + *    there is no need to check whether LMUL=1.
> + */
> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
> +    bool force)
> +{
> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
> +}
> +
> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
> +{
> +    return (1 << s->lmul) * nf <= 8;
> +}
> +
> +/* common translation macro */
> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
> +{                                                          \
> +    if (CHECK(s, a)) {                                     \
> +        return OP(s, a, SEQ);                              \
> +    }                                                      \
> +    return false;                                          \
> +}
> +
> +/*
> + *** unit stride load and store
> + */
> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv_env, TCGv_i32);
> +
> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
> +        gen_helper_ldst_us *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +
> +    /*
> +     * As simd_desc supports at most 256 bytes and, in this implementation,
> +     * the max vector group length is 2048 bytes, split it into two parts.
> +     *
> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
> +     * The second part is lmul, encoded in data of simd_desc.
> +     */
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_us *fn;
> +    static gen_helper_ldst_us * const fns[2][7][4] = {
> +        /* masked unit stride load */
> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
> +          { NULL,                     gen_helper_vlh_v_h_mask,
> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
> +          { NULL,                     gen_helper_vlhu_v_h_mask,
> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
> +        /* unmasked unit stride load */
> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
> +          { NULL,                gen_helper_vlh_v_h,
> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
> +          { NULL,                gen_helper_vlhu_v_h,
> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
> +    };
> +
> +    fn =  fns[a->vm][seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> +}
> +
> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
> +
> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_us *fn;
> +    static gen_helper_ldst_us * const fns[2][4][4] = {
> +        /* masked unit stride load and store */
> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
> +          { NULL,                     gen_helper_vsh_v_h_mask,
> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
> +          { NULL,                     NULL,
> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
> +        /* unmasked unit stride store */
> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
> +          { NULL,                gen_helper_vsh_v_h,
> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
> +          { NULL,                NULL,
> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
> +    };
> +
> +    fn =  fns[a->vm][seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> +}
> +
> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
> +
> +/*
> + *** stride load and store
> + */
> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv, TCGv_env, TCGv_i32);
> +
> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask;
> +    TCGv base, stride;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    stride = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    gen_get_gpr(stride, rs2);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, stride, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free(base);
> +    tcg_temp_free(stride);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_stride *fn;
> +    static gen_helper_ldst_stride * const fns[7][4] = {
> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
> +        { NULL,                 gen_helper_vlsh_v_h,
> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
> +        { NULL,                 gen_helper_vlshu_v_h,
> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
> +
> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_stride *fn;
> +    static gen_helper_ldst_stride * const fns[4][4] = {
> +        /* masked stride store */
> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
> +        { NULL,                 gen_helper_vssh_v_h,
> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
> +    };
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)

Looks good
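The strided form the helpers further down implement boils down to
addr = base + i * stride (plus k * msz for segment fields); a standalone
sketch of the basic pattern, ignoring masking and segments (illustrative only,
not the patch code):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void vlse_sketch(int32_t *vd, const uint8_t *base,
                            ptrdiff_t stride, uint32_t vl)
    {
        for (uint32_t i = 0; i < vl; i++) {
            /* element i lives at base + i * stride (stride is in bytes) */
            memcpy(&vd[i], base + (ptrdiff_t)i * stride, sizeof(vd[i]));
        }
    }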

> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index af07ac4160..852545b77e 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>      uint8_t lmul;
>      uint8_t sew;
>      uint16_t vlen;
> +    uint16_t mlen;
>      bool vl_eq_vlmax;
>  } DisasContext;
>
> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>      }
>  }
>
> +static int ex_plus_1(DisasContext *ctx, int nf)
> +{
> +    return nf + 1;
> +}
> +
>  #define EX_SH(amount) \
>      static int ex_shift_##amount(DisasContext *ctx, int imm) \
>      {                                         \
> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>      ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>      ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>      ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>      ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>  }
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 2afe716f2a..ebfabd2946 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -18,8 +18,10 @@
>
>  #include "qemu/osdep.h"
>  #include "cpu.h"
> +#include "exec/memop.h"
>  #include "exec/exec-all.h"
>  #include "exec/helper-proto.h"
> +#include "tcg/tcg-gvec-desc.h"
>  #include <math.h>
>
>  target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      env->vstart = 0;
>      return vl;
>  }
> +
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that need a host-endian fixup.
> + */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif

Looks good. Overall this looks good. Do you mind splitting this patch
up a little bit more? It's difficult to review such a long and complex
patch.

Alistair

> +
> +static inline uint32_t vext_nf(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
> +}
> +
> +static inline uint32_t vext_mlen(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
> +}
> +
> +static inline uint32_t vext_vm(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
> +}
> +
> +static inline uint32_t vext_lmul(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
> +}
> +
> +/*
> + * Get vector group length in bytes. Its range is [64, 2048].
> + *
> + * As simd_desc supports at most 256, the max vlen is 512 bits.
> + * So vlen in bytes is encoded as maxsz.
> + */
> +static inline uint32_t vext_maxsz(uint32_t desc)
> +{
> +    return simd_maxsz(desc) << vext_lmul(desc);
> +}
> +
> +/*
> + * This function checks watchpoints before the real load or store operation.
> + *
> + * In softmmu mode, the TLB API probe_access is enough for the watchpoint check.
> + * In user mode, there is no watchpoint support now.
> + *
> + * It will trigger an exception if there is no mapping in the TLB
> + * and the page table walk can't fill the TLB entry. Then the guest
> + * software can return here after processing the exception, or never return.
> + */
> +static void probe_pages(CPURISCVState *env, target_ulong addr,
> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
> +{
> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> +    target_ulong curlen = MIN(pagelen, len);
> +
> +    probe_access(env, addr, curlen, access_type,
> +            cpu_mmu_index(env, false), ra);
> +    if (len > curlen) {
> +        addr += curlen;
> +        curlen = len - curlen;
> +        probe_access(env, addr, curlen, access_type,
> +                cpu_mmu_index(env, false), ra);
> +    }
> +}
> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +{
> +    /*
> +     * Split the remaining range into two parts.
> +     * The first part is in the last uint64_t unit.
> +     * The second part starts from the next uint64_t unit.
> +     */
> +    int part1 = 0, part2 = tot - cnt;
> +    if (cnt % 8) {
> +        part1 = 8 - (cnt % 8);
> +        part2 = tot - cnt - part1;
> +        memset(tail & ~(7ULL), 0, part1);
> +        memset((tail + 8) & ~(7ULL), 0, part2);
> +    } else {
> +        memset(tail, 0, part2);
> +    }
> +}
> +#else
> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +{
> +    memset(tail, 0, tot - cnt);
> +}
> +#endif
> +
> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int8_t *cur = ((int8_t *)vd + H1(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int16_t *cur = ((int16_t *)vd + H2(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int32_t *cur = ((int32_t *)vd + H4(idx));
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> +{
> +    int64_t *cur = (int64_t *)vd + idx;
> +    vext_clear(cur, cnt, tot);
> +}
> +
> +
> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> +{
> +    int idx = (index * mlen) / 64;
> +    int pos = (index * mlen) % 64;
> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
> +}
> +
> +/* elements operations for load and store */
> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
> +        uint32_t idx, void *vd, uintptr_t retaddr);
> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
> +        uint32_t cnt, uint32_t tot);
> +
> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
> +{                                                          \
> +    MTYPE data;                                            \
> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
> +    *cur = data;                                           \
> +}                                                          \
> +
> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
> +
> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
> +{                                                        \
> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
> +}
> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
> +
> +/*
> + *** stride: access vector element from strided memory
> + */
> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access */
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
> +    }
> +    /* do real access */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            target_ulong addr = base + stride * i + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> +{                                                                       \
> +    uint32_t vm = vext_vm(desc);                                        \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> +}
> +
> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> +{                                                                       \
> +    uint32_t vm = vext_vm(desc);                                        \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> +}
> +
> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
> +
> +/*
> + *** unit-stride: access elements stored contiguously in memory
> + */
> +
> +/* unmasked unit-stride load and store operation */
> +static inline void vext_ldst_us(void *vd, target_ulong base,
> +        CPURISCVState *env, uint32_t desc,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access */
> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
> +    /* load or store bytes from/to guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        while (k < nf) {
> +            target_ulong addr = base + (i * nf + k) * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +/*
> + * masked unit-stride load and store operations are a special case of strided
> + * operations, with stride = NF * sizeof(MTYPE)
> + */
> +
> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> +}                                                                       \
> +                                                                        \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
> +}
> +
> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> +}                                                                       \
> +                                                                        \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> +        CPURISCVState *env, uint32_t desc)                              \
> +{                                                                       \
> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
> +}
> +
> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 06/60] target/riscv: add vector index load and store instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-13 21:21     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 21:21 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:11 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Vector indexed operations add the contents of each element of the
> vector offset operand specified by vs2 to the base effective address
> to give the effective address of each element.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair
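For anyone skimming the patch below: the core of the indexed (gather/scatter)
form is just base + vs2[i] per element; a simplified sketch of the load side
(illustrative only, byte elements, no masking or segments):

    #include <stdint.h>

    static void vlxb_sketch(int8_t *vd, const int8_t *base,
                            const int64_t *vs2, uint32_t vl)
    {
        for (uint32_t i = 0; i < vl; i++) {
            vd[i] = base[vs2[i]];        /* base + per-element offset from vs2 */
        }
    }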

> ---
>  target/riscv/helper.h                   |  35 +++++++
>  target/riscv/insn32.decode              |  13 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 124 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 117 ++++++++++++++++++++++
>  4 files changed, 289 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 87dfa90609..f9b3da60ca 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxwu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxwu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index ef521152c5..bc36df33b5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -241,6 +241,19 @@ vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>  vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>  vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>
> +vlxb_v     ... 111 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlxh_v     ... 111 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlxw_v     ... 111 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vlxe_v     ... 011 . ..... ..... 111 ..... 0000111 @r_nfvm
> +vlxbu_v    ... 011 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlxhu_v    ... 011 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlxwu_v    ... 011 . ..... ..... 110 ..... 0000111 @r_nfvm
> +# Vector ordered-indexed and unordered-indexed store insns.
> +vsxb_v     ... -11 . ..... ..... 000 ..... 0100111 @r_nfvm
> +vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
> +vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
> +vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
> +
>  # *** new major opcode OP-V ***
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index d85f2aec68..5d1eeef323 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -407,3 +407,127 @@ GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> +
> +/*
> + *** index load and store
> + */
> +typedef void gen_helper_ldst_index(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv_ptr, TCGv_env, TCGv_i32);
> +
> +static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
> +        uint32_t data, gen_helper_ldst_index *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask, index;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    index = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, index, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free_ptr(index);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_index *fn;
> +    static gen_helper_ldst_index * const fns[7][4] = {
> +        { gen_helper_vlxb_v_b,  gen_helper_vlxb_v_h,
> +          gen_helper_vlxb_v_w,  gen_helper_vlxb_v_d },
> +        { NULL,                 gen_helper_vlxh_v_h,
> +          gen_helper_vlxh_v_w,  gen_helper_vlxh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlxw_v_w,  gen_helper_vlxw_v_d },
> +        { gen_helper_vlxe_v_b,  gen_helper_vlxe_v_h,
> +          gen_helper_vlxe_v_w,  gen_helper_vlxe_v_d },
> +        { gen_helper_vlxbu_v_b, gen_helper_vlxbu_v_h,
> +          gen_helper_vlxbu_v_w, gen_helper_vlxbu_v_d },
> +        { NULL,                 gen_helper_vlxhu_v_h,
> +          gen_helper_vlxhu_v_w, gen_helper_vlxhu_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlxwu_v_w, gen_helper_vlxwu_v_d },
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool ld_index_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlxb_v, 0, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxh_v, 1, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxw_v, 2, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxe_v, 3, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxbu_v, 4, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxhu_v, 5, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxwu_v, 6, rnfvm, ld_index_op, ld_index_check)
> +
> +static bool st_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_index *fn;
> +    static gen_helper_ldst_index * const fns[4][4] = {
> +        { gen_helper_vsxb_v_b,  gen_helper_vsxb_v_h,
> +          gen_helper_vsxb_v_w,  gen_helper_vsxb_v_d },
> +        { NULL,                 gen_helper_vsxh_v_h,
> +          gen_helper_vsxh_v_w,  gen_helper_vsxh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vsxw_v_w,  gen_helper_vsxw_v_d },
> +        { gen_helper_vsxe_v_b,  gen_helper_vsxe_v_h,
> +          gen_helper_vsxe_v_w,  gen_helper_vsxe_v_d }
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool st_index_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ebfabd2946..35cb9f09b4 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -457,3 +457,120 @@ GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>  GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>  GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>  GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> +
> +/*
> + *** index: access vector element from indexed memory
> + */
> +typedef target_ulong (*vext_get_index_addr)(target_ulong base,
> +        uint32_t idx, void *vs2);
> +
> +#define GEN_VEXT_GET_INDEX_ADDR(NAME, ETYPE, H)        \
> +static target_ulong NAME(target_ulong base,            \
> +        uint32_t idx, void *vs2)                       \
> +{                                                      \
> +    return (base + *((ETYPE *)vs2 + H(idx)));          \
> +}
> +
> +GEN_VEXT_GET_INDEX_ADDR(idx_b, int8_t,  H1)
> +GEN_VEXT_GET_INDEX_ADDR(idx_h, int16_t, H2)
> +GEN_VEXT_GET_INDEX_ADDR(idx_w, int32_t, H4)
> +GEN_VEXT_GET_INDEX_ADDR(idx_d, int64_t, H8)
> +
> +static inline void vext_ldst_index(void *vd, void *v0, target_ulong base,
> +        void *vs2, CPURISCVState *env, uint32_t desc,
> +        vext_get_index_addr get_index_addr,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access*/
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, get_index_addr(base, i, vs2), nf * msz, ra,
> +                access_type);
> +    }
> +    /* load bytes from guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +#define GEN_VEXT_LD_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, LOAD_FN, CLEAR_FN) \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                   \
> +        void *vs2, CPURISCVState *env, uint32_t desc)                      \
> +{                                                                          \
> +    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,                \
> +        LOAD_FN, CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE),                   \
> +        GETPC(), MMU_DATA_LOAD);                                           \
> +}
> +GEN_VEXT_LD_INDEX(vlxb_v_b,  int8_t,   int8_t,   idx_b, ldb_b,  clearb)
> +GEN_VEXT_LD_INDEX(vlxb_v_h,  int8_t,   int16_t,  idx_h, ldb_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxb_v_w,  int8_t,   int32_t,  idx_w, ldb_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxb_v_d,  int8_t,   int64_t,  idx_d, ldb_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxh_v_h,  int16_t,  int16_t,  idx_h, ldh_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxh_v_w,  int16_t,  int32_t,  idx_w, ldh_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxh_v_d,  int16_t,  int64_t,  idx_d, ldh_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxw_v_w,  int32_t,  int32_t,  idx_w, ldw_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxw_v_d,  int32_t,  int64_t,  idx_d, ldw_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxe_v_b,  int8_t,   int8_t,   idx_b, lde_b,  clearb)
> +GEN_VEXT_LD_INDEX(vlxe_v_h,  int16_t,  int16_t,  idx_h, lde_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxe_v_w,  int32_t,  int32_t,  idx_w, lde_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxe_v_d,  int64_t,  int64_t,  idx_d, lde_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxbu_v_b, uint8_t,  uint8_t,  idx_b, ldbu_b, clearb)
> +GEN_VEXT_LD_INDEX(vlxbu_v_h, uint8_t,  uint16_t, idx_h, ldbu_h, clearh)
> +GEN_VEXT_LD_INDEX(vlxbu_v_w, uint8_t,  uint32_t, idx_w, ldbu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxbu_v_d, uint8_t,  uint64_t, idx_d, ldbu_d, clearq)
> +GEN_VEXT_LD_INDEX(vlxhu_v_h, uint16_t, uint16_t, idx_h, ldhu_h, clearh)
> +GEN_VEXT_LD_INDEX(vlxhu_v_w, uint16_t, uint32_t, idx_w, ldhu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxhu_v_d, uint16_t, uint64_t, idx_d, ldhu_d, clearq)
> +GEN_VEXT_LD_INDEX(vlxwu_v_w, uint32_t, uint32_t, idx_w, ldwu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxwu_v_d, uint32_t, uint64_t, idx_d, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, STORE_FN)\
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,         \
> +        void *vs2, CPURISCVState *env, uint32_t desc)            \
> +{                                                                \
> +    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,      \
> +        STORE_FN, NULL, sizeof(ETYPE), sizeof(MTYPE),            \
> +        GETPC(), MMU_DATA_STORE);                                \
> +}
> +
> +GEN_VEXT_ST_INDEX(vsxb_v_b, int8_t,  int8_t,  idx_b, stb_b)
> +GEN_VEXT_ST_INDEX(vsxb_v_h, int8_t,  int16_t, idx_h, stb_h)
> +GEN_VEXT_ST_INDEX(vsxb_v_w, int8_t,  int32_t, idx_w, stb_w)
> +GEN_VEXT_ST_INDEX(vsxb_v_d, int8_t,  int64_t, idx_d, stb_d)
> +GEN_VEXT_ST_INDEX(vsxh_v_h, int16_t, int16_t, idx_h, sth_h)
> +GEN_VEXT_ST_INDEX(vsxh_v_w, int16_t, int32_t, idx_w, sth_w)
> +GEN_VEXT_ST_INDEX(vsxh_v_d, int16_t, int64_t, idx_d, sth_d)
> +GEN_VEXT_ST_INDEX(vsxw_v_w, int32_t, int32_t, idx_w, stw_w)
> +GEN_VEXT_ST_INDEX(vsxw_v_d, int32_t, int64_t, idx_d, stw_d)
> +GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
> +GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
> +GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
> +GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 06/60] target/riscv: add vector index load and store instructions
@ 2020-03-13 21:21     ` Alistair Francis
  0 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 21:21 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V

On Thu, Mar 12, 2020 at 8:11 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Vector indexed operations add the contents of each element of the
> vector offset operand specified by vs2 to the base effective address
> to give the effective address of each element.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  35 +++++++
>  target/riscv/insn32.decode              |  13 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 124 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 117 ++++++++++++++++++++++
>  4 files changed, 289 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 87dfa90609..f9b3da60ca 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> +DEF_HELPER_6(vlxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxbu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxhu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxwu_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vlxwu_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index ef521152c5..bc36df33b5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -241,6 +241,19 @@ vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>  vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>  vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>
> +vlxb_v     ... 111 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlxh_v     ... 111 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlxw_v     ... 111 . ..... ..... 110 ..... 0000111 @r_nfvm
> +vlxe_v     ... 011 . ..... ..... 111 ..... 0000111 @r_nfvm
> +vlxbu_v    ... 011 . ..... ..... 000 ..... 0000111 @r_nfvm
> +vlxhu_v    ... 011 . ..... ..... 101 ..... 0000111 @r_nfvm
> +vlxwu_v    ... 011 . ..... ..... 110 ..... 0000111 @r_nfvm
> +# Vector ordered-indexed and unordered-indexed store insns.
> +vsxb_v     ... -11 . ..... ..... 000 ..... 0100111 @r_nfvm
> +vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
> +vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
> +vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
> +
>  # *** new major opcode OP-V ***
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index d85f2aec68..5d1eeef323 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -407,3 +407,127 @@ GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>  GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> +
> +/*
> + *** index load and store
> + */
> +typedef void gen_helper_ldst_index(TCGv_ptr, TCGv_ptr, TCGv,
> +        TCGv_ptr, TCGv_env, TCGv_i32);
> +
> +static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
> +        uint32_t data, gen_helper_ldst_index *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask, index;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    index = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, index, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free_ptr(index);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ld_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_index *fn;
> +    static gen_helper_ldst_index * const fns[7][4] = {
> +        { gen_helper_vlxb_v_b,  gen_helper_vlxb_v_h,
> +          gen_helper_vlxb_v_w,  gen_helper_vlxb_v_d },
> +        { NULL,                 gen_helper_vlxh_v_h,
> +          gen_helper_vlxh_v_w,  gen_helper_vlxh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlxw_v_w,  gen_helper_vlxw_v_d },
> +        { gen_helper_vlxe_v_b,  gen_helper_vlxe_v_h,
> +          gen_helper_vlxe_v_w,  gen_helper_vlxe_v_d },
> +        { gen_helper_vlxbu_v_b, gen_helper_vlxbu_v_h,
> +          gen_helper_vlxbu_v_w, gen_helper_vlxbu_v_d },
> +        { NULL,                 gen_helper_vlxhu_v_h,
> +          gen_helper_vlxhu_v_w, gen_helper_vlxhu_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vlxwu_v_w, gen_helper_vlxwu_v_d },
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool ld_index_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vlxb_v, 0, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxh_v, 1, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxw_v, 2, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxe_v, 3, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxbu_v, 4, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxhu_v, 5, rnfvm, ld_index_op, ld_index_check)
> +GEN_VEXT_TRANS(vlxwu_v, 6, rnfvm, ld_index_op, ld_index_check)
> +
> +static bool st_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_index *fn;
> +    static gen_helper_ldst_index * const fns[4][4] = {
> +        { gen_helper_vsxb_v_b,  gen_helper_vsxb_v_h,
> +          gen_helper_vsxb_v_w,  gen_helper_vsxb_v_d },
> +        { NULL,                 gen_helper_vsxh_v_h,
> +          gen_helper_vsxh_v_w,  gen_helper_vsxh_v_d },
> +        { NULL,                 NULL,
> +          gen_helper_vsxw_v_w,  gen_helper_vsxw_v_d },
> +        { gen_helper_vsxe_v_b,  gen_helper_vsxe_v_h,
> +          gen_helper_vsxe_v_w,  gen_helper_vsxe_v_d }
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +
> +static bool st_index_check(DisasContext *s, arg_rnfvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_nf(s, a->nf));
> +}
> +
> +GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
> +GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ebfabd2946..35cb9f09b4 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -457,3 +457,120 @@ GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>  GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>  GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>  GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> +
> +/*
> + *** index: access vector element from indexed memory
> + */
> +typedef target_ulong (*vext_get_index_addr)(target_ulong base,
> +        uint32_t idx, void *vs2);
> +
> +#define GEN_VEXT_GET_INDEX_ADDR(NAME, ETYPE, H)        \
> +static target_ulong NAME(target_ulong base,            \
> +        uint32_t idx, void *vs2)                       \
> +{                                                      \
> +    return (base + *((ETYPE *)vs2 + H(idx)));          \
> +}
> +
> +GEN_VEXT_GET_INDEX_ADDR(idx_b, int8_t,  H1)
> +GEN_VEXT_GET_INDEX_ADDR(idx_h, int16_t, H2)
> +GEN_VEXT_GET_INDEX_ADDR(idx_w, int32_t, H4)
> +GEN_VEXT_GET_INDEX_ADDR(idx_d, int64_t, H8)
> +
> +static inline void vext_ldst_index(void *vd, void *v0, target_ulong base,
> +        void *vs2, CPURISCVState *env, uint32_t desc,
> +        vext_get_index_addr get_index_addr,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access*/
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, get_index_addr(base, i, vs2), nf * msz, ra,
> +                access_type);
> +    }
> +    /* load bytes from guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (clear_elem) {
> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }
> +    }
> +}
> +
> +#define GEN_VEXT_LD_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, LOAD_FN, CLEAR_FN) \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                   \
> +        void *vs2, CPURISCVState *env, uint32_t desc)                      \
> +{                                                                          \
> +    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,                \
> +        LOAD_FN, CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE),                   \
> +        GETPC(), MMU_DATA_LOAD);                                           \
> +}
> +GEN_VEXT_LD_INDEX(vlxb_v_b,  int8_t,   int8_t,   idx_b, ldb_b,  clearb)
> +GEN_VEXT_LD_INDEX(vlxb_v_h,  int8_t,   int16_t,  idx_h, ldb_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxb_v_w,  int8_t,   int32_t,  idx_w, ldb_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxb_v_d,  int8_t,   int64_t,  idx_d, ldb_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxh_v_h,  int16_t,  int16_t,  idx_h, ldh_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxh_v_w,  int16_t,  int32_t,  idx_w, ldh_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxh_v_d,  int16_t,  int64_t,  idx_d, ldh_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxw_v_w,  int32_t,  int32_t,  idx_w, ldw_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxw_v_d,  int32_t,  int64_t,  idx_d, ldw_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxe_v_b,  int8_t,   int8_t,   idx_b, lde_b,  clearb)
> +GEN_VEXT_LD_INDEX(vlxe_v_h,  int16_t,  int16_t,  idx_h, lde_h,  clearh)
> +GEN_VEXT_LD_INDEX(vlxe_v_w,  int32_t,  int32_t,  idx_w, lde_w,  clearl)
> +GEN_VEXT_LD_INDEX(vlxe_v_d,  int64_t,  int64_t,  idx_d, lde_d,  clearq)
> +GEN_VEXT_LD_INDEX(vlxbu_v_b, uint8_t,  uint8_t,  idx_b, ldbu_b, clearb)
> +GEN_VEXT_LD_INDEX(vlxbu_v_h, uint8_t,  uint16_t, idx_h, ldbu_h, clearh)
> +GEN_VEXT_LD_INDEX(vlxbu_v_w, uint8_t,  uint32_t, idx_w, ldbu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxbu_v_d, uint8_t,  uint64_t, idx_d, ldbu_d, clearq)
> +GEN_VEXT_LD_INDEX(vlxhu_v_h, uint16_t, uint16_t, idx_h, ldhu_h, clearh)
> +GEN_VEXT_LD_INDEX(vlxhu_v_w, uint16_t, uint32_t, idx_w, ldhu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxhu_v_d, uint16_t, uint64_t, idx_d, ldhu_d, clearq)
> +GEN_VEXT_LD_INDEX(vlxwu_v_w, uint32_t, uint32_t, idx_w, ldwu_w, clearl)
> +GEN_VEXT_LD_INDEX(vlxwu_v_d, uint32_t, uint64_t, idx_d, ldwu_d, clearq)
> +
> +#define GEN_VEXT_ST_INDEX(NAME, MTYPE, ETYPE, INDEX_FN, STORE_FN)\
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,         \
> +        void *vs2, CPURISCVState *env, uint32_t desc)            \
> +{                                                                \
> +    vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN,      \
> +        STORE_FN, NULL, sizeof(ETYPE), sizeof(MTYPE),            \
> +        GETPC(), MMU_DATA_STORE);                                \
> +}
> +
> +GEN_VEXT_ST_INDEX(vsxb_v_b, int8_t,  int8_t,  idx_b, stb_b)
> +GEN_VEXT_ST_INDEX(vsxb_v_h, int8_t,  int16_t, idx_h, stb_h)
> +GEN_VEXT_ST_INDEX(vsxb_v_w, int8_t,  int32_t, idx_w, stb_w)
> +GEN_VEXT_ST_INDEX(vsxb_v_d, int8_t,  int64_t, idx_d, stb_d)
> +GEN_VEXT_ST_INDEX(vsxh_v_h, int16_t, int16_t, idx_h, sth_h)
> +GEN_VEXT_ST_INDEX(vsxh_v_w, int16_t, int32_t, idx_w, sth_w)
> +GEN_VEXT_ST_INDEX(vsxh_v_d, int16_t, int64_t, idx_d, sth_d)
> +GEN_VEXT_ST_INDEX(vsxw_v_w, int32_t, int32_t, idx_w, stw_w)
> +GEN_VEXT_ST_INDEX(vsxw_v_d, int32_t, int64_t, idx_d, stw_d)
> +GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
> +GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
> +GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
> +GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-13 20:38     ` Alistair Francis
@ 2020-03-13 21:32       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 21:32 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 4:38, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> Vector strided operations access the first memory element at the base address,
>> and then access subsequent elements at address increments given by the byte
>> offset contained in the x register specified by rs2.
>>
>> Vector unit-stride operations access elements stored contiguously in memory
>> starting from the base effective address. It can been seen as a special
>> case of strided operations.
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/cpu.h                      |   6 +
>>   target/riscv/helper.h                   | 105 ++++++
>>   target/riscv/insn32.decode              |  32 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>>   target/riscv/translate.c                |   7 +
>>   target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>>   6 files changed, 896 insertions(+)
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index 505d1a8515..b6ebb9b0eb 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>>   typedef RISCVCPU ArchCPU;
>>   #include "exec/cpu-all.h"
>>
>> +/* share data between vector helpers and decode code */
>> +FIELD(VDATA, MLEN, 0, 8)
>> +FIELD(VDATA, VM, 8, 1)
>> +FIELD(VDATA, LMUL, 9, 2)
>> +FIELD(VDATA, NF, 11, 4)
>> +
>>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>   FIELD(TB_FLAGS, LMUL, 3, 2)
>>   FIELD(TB_FLAGS, SEW, 5, 3)
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 3c28c7e407..87dfa90609 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>>   #endif
>>   /* Vector functions */
>>   DEF_HELPER_3(vsetvl, tl, env, tl, tl)
>> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> Do you mind explaining why we have *_mask versions? I'm struggling to
> understand this.
When an instruction executes with a mask, it only operates on the
active elements of the vector. Whether an element is active or
inactive is predicated by the mask register v0.

Without a mask, it operates on every element in the vector body.
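
A rough, standalone sketch of the difference (illustration only, not
the QEMU helper; it uses a toy elem_active() with one bit per element
instead of the mlen-scaled layout): the *_mask variants take the
predicated path below with vm = 0, while the plain variants behave as
if vm = 1 and touch every body element.

    #include <stdint.h>
    #include <stdio.h>

    /* One mask bit per element (the real code scales this by mlen). */
    static int elem_active(const uint64_t *v0, int i)
    {
        return (v0[i / 64] >> (i % 64)) & 1;
    }

    int main(void)
    {
        uint64_t v0[1] = { 0x5 };   /* elements 0 and 2 are active        */
        int vd[4] = { 10, 20, 30, 40 };
        int vl = 4;
        int vm = 0;                 /* vm == 0 selects the masked variant */

        for (int i = 0; i < vl; i++) {
            if (!vm && !elem_active(v0, i)) {
                continue;           /* inactive element is left untouched */
            }
            vd[i] += 1;             /* "operate" on the active element    */
        }
        for (int i = 0; i < vl; i++) {
            printf("%d ", vd[i]);   /* prints: 11 20 31 40                */
        }
        printf("\n");
        return 0;
    }
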
>> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 53340bdbc4..ef521152c5 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -25,6 +25,7 @@
>>   %sh10    20:10
>>   %csr    20:12
>>   %rm     12:3
>> +%nf     29:3                     !function=ex_plus_1
>>
>>   # immediates:
>>   %imm_i    20:s12
>> @@ -43,6 +44,8 @@
>>   &u    imm rd
>>   &shift     shamt rs1 rd
>>   &atomic    aq rl rs2 rs1 rd
>> +&r2nfvm    vm rd rs1 nf
>> +&rnfvm     vm rd rs1 rs2 nf
>>
>>   # Formats 32:
>>   @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
>> @@ -62,6 +65,8 @@
>>   @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>   @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>
>>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>   fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>>
>>   # *** RV32V Extension ***
>> +
>> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
>> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
>> +
>> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
>> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
>> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
>> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
>> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
>> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
>> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
>> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
>> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>> +
>> +# *** new major opcode OP-V ***
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index da82c72bbf..d85f2aec68 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -15,6 +15,8 @@
>>    * You should have received a copy of the GNU General Public License along with
>>    * this program.  If not, see <http://www.gnu.org/licenses/>.
>>    */
>> +#include "tcg/tcg-op-gvec.h"
>> +#include "tcg/tcg-gvec-desc.h"
>>
>>   static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>>   {
>> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>>       tcg_temp_free(dst);
>>       return true;
>>   }
>> +
>> +/* vector register offset from env */
>> +static uint32_t vreg_ofs(DisasContext *s, int reg)
>> +{
>> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
>> +}
>> +
>> +/* check functions */
>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>> +{
>> +    return !s->vill && ((s->misa & isa) == isa);
>> +}
> I don't think we need a new function to check ISA.
I don't quite agree.

Although there is riscv_has_ext(env, isa) in cpu.h, it is not suitable
in this file, because this code runs at translation time, where the
DisasContext is used rather than CPURISCVState.

VILL and the ISA bits have to be checked by every vector instruction,
so I just put both checks in one function.
>
>> +
>> +/*
>> + * There are two rules check here.
>> + *
>> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
>> + *
>> + * 2. For all widening instructions, the destination LMUL value must also be
>> + *    a supported LMUL value. (Section 11.2)
>> + */
>> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
>> +{
>> +    /*
>> +     * The destination vector register group results are arranged as if both
>> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
>> +     */
>> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
>> +
>> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
> Where does this 3 come from?
LMUL is a 2-bit field in VTYPE, so the largest encoding is 0x3.
An encoding of 0x3 means a group of 8 vector registers is used for
each operand.

For a widening operation, LMUL equal to 0x3 is illegal, as

     "The destination vector register group results are arranged as if both
      SEW and LMUL were at twice their current settings. (Section 11.2)."

If LMUL is 0x3, the source vector register group is 8 registers, so the
destination vector register group would need 16 registers, which is
illegal.
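
A rough, standalone sketch of that arithmetic (illustration only; it
mirrors the vext_check_reg() logic quoted above rather than
reproducing the patch):

    #include <stdbool.h>
    #include <stdio.h>

    /* toy version of the register-number / LMUL legality rule above */
    static bool reg_ok(int lmul, int reg, bool widen)
    {
        int group = widen ? 2 << lmul : 1 << lmul;  /* registers per group */
        return !((lmul == 0x3 && widen) || (reg % group));
    }

    int main(void)
    {
        printf("%d\n", reg_ok(0x3, 8, false)); /* 1: v8 with LMUL=8 is aligned */
        printf("%d\n", reg_ok(0x3, 8, true));  /* 0: widening at LMUL=8 would  */
                                               /*    need a 16-register group  */
        printf("%d\n", reg_ok(0x1, 3, false)); /* 0: v3 is not a multiple of 2 */
        return 0;
    }
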
>
>> +}
>> +
>> +/*
>> + * There are two rules check here.
>> + *
>> + * 1. The destination vector register group for a masked vector instruction can
>> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
>> + *
>> + * 2. In widen instructions and some other insturctions, like vslideup.vx,
>> + *    there is no need to check whether LMUL=1.
>> + */
>> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
>> +    bool force)
>> +{
>> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
>> +}
>> +
>> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
>> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
>> +{
>> +    return (1 << s->lmul) * nf <= 8;
>> +}
>> +
>> +/* common translation macro */
>> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
>> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
>> +{                                                          \
>> +    if (CHECK(s, a)) {                                     \
>> +        return OP(s, a, SEQ);                              \
>> +    }                                                      \
>> +    return false;                                          \
>> +}
>> +
>> +/*
>> + *** unit stride load and store
>> + */
>> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
>> +        TCGv_env, TCGv_i32);
>> +
>> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
>> +        gen_helper_ldst_us *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask;
>> +    TCGv base;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +
>> +    /*
>> +     * As simd_desc supports at most 256 bytes, and in this implementation,
>> +     * the max vector group length is 2048 bytes. So split it into two parts.
>> +     *
>> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
>> +     * The second part is lmul, encoded in data of simd_desc.
>> +     */
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_us *fn;
>> +    static gen_helper_ldst_us * const fns[2][7][4] = {
>> +        /* masked unit stride load */
>> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
>> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
>> +          { NULL,                     gen_helper_vlh_v_h_mask,
>> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
>> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
>> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
>> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
>> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
>> +          { NULL,                     gen_helper_vlhu_v_h_mask,
>> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
>> +        /* unmasked unit stride load */
>> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
>> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
>> +          { NULL,                gen_helper_vlh_v_h,
>> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
>> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
>> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
>> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
>> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
>> +          { NULL,                gen_helper_vlhu_v_h,
>> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
>> +    };
>> +
>> +    fn =  fns[a->vm][seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>> +}
>> +
>> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
>> +
>> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_us *fn;
>> +    static gen_helper_ldst_us * const fns[2][4][4] = {
>> +        /* masked unit stride load and store */
>> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
>> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
>> +          { NULL,                     gen_helper_vsh_v_h_mask,
>> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
>> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
>> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
>> +        /* unmasked unit stride store */
>> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
>> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
>> +          { NULL,                gen_helper_vsh_v_h,
>> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
>> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
>> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
>> +    };
>> +
>> +    fn =  fns[a->vm][seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>> +}
>> +
>> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
>> +
>> +/*
>> + *** stride load and store
>> + */
>> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
>> +        TCGv, TCGv_env, TCGv_i32);
>> +
>> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
>> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask;
>> +    TCGv base, stride;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +    stride = tcg_temp_new();
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    gen_get_gpr(stride, rs2);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, stride, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free(stride);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_stride *fn;
>> +    static gen_helper_ldst_stride * const fns[7][4] = {
>> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
>> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
>> +        { NULL,                 gen_helper_vlsh_v_h,
>> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
>> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
>> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
>> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
>> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
>> +        { NULL,                 gen_helper_vlshu_v_h,
>> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
>> +    };
>> +
>> +    fn =  fns[seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +
>> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
>> +
>> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_stride *fn;
>> +    static gen_helper_ldst_stride * const fns[4][4] = {
>> +        /* masked stride store */
>> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
>> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
>> +        { NULL,                 gen_helper_vssh_v_h,
>> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
>> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
>> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
>> +    };
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    fn =  fns[seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +
>> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> Looks good
>
>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>> index af07ac4160..852545b77e 100644
>> --- a/target/riscv/translate.c
>> +++ b/target/riscv/translate.c
>> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>>       uint8_t lmul;
>>       uint8_t sew;
>>       uint16_t vlen;
>> +    uint16_t mlen;
>>       bool vl_eq_vlmax;
>>   } DisasContext;
>>
>> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>>       }
>>   }
>>
>> +static int ex_plus_1(DisasContext *ctx, int nf)
>> +{
>> +    return nf + 1;
>> +}
>> +
>>   #define EX_SH(amount) \
>>       static int ex_shift_##amount(DisasContext *ctx, int imm) \
>>       {                                         \
>> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>       ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>>       ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>>       ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>>       ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>>   }
>>
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 2afe716f2a..ebfabd2946 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -18,8 +18,10 @@
>>
>>   #include "qemu/osdep.h"
>>   #include "cpu.h"
>> +#include "exec/memop.h"
>>   #include "exec/exec-all.h"
>>   #include "exec/helper-proto.h"
>> +#include "tcg/tcg-gvec-desc.h"
>>   #include <math.h>
>>
>>   target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>       env->vstart = 0;
>>       return vl;
>>   }
>> +
>> +/*
>> + * Note that vector data is stored in host-endian 64-bit chunks,
>> + * so addressing units smaller than that needs a host-endian fixup.
>> + */
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +#define H1(x)   ((x) ^ 7)
>> +#define H1_2(x) ((x) ^ 6)
>> +#define H1_4(x) ((x) ^ 4)
>> +#define H2(x)   ((x) ^ 3)
>> +#define H4(x)   ((x) ^ 1)
>> +#define H8(x)   ((x))
>> +#else
>> +#define H1(x)   (x)
>> +#define H1_2(x) (x)
>> +#define H1_4(x) (x)
>> +#define H2(x)   (x)
>> +#define H4(x)   (x)
>> +#define H8(x)   (x)
>> +#endif
> Looks good. Overall this looks good. Do you mind splitting this patch
> up a little bit more? It's difficult to review such a long and complex
> patch.
>
> Alistair
As unit-stride can be seen as a special case of the strided mode, I put
them together.
I will split the strided and unit-stride modes in the next patch set.

Even then I think it will still be somewhat long and complex: a lot of
corner cases must be considered for vector loads and stores, and a lot
of common code is defined here.
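
A rough, standalone sketch of why the two modes can share code
(illustration only, not the patch itself): for msz-byte elements with
nf fields per segment, the unit-stride address of element i, field k
is the same as a strided access whose stride is fixed at nf * msz:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t base = 0x1000;
        unsigned nf = 2, msz = 4;   /* 2 fields per segment, 4-byte elements */

        for (unsigned i = 0; i < 8; i++) {         /* element (segment) index */
            for (unsigned k = 0; k < nf; k++) {    /* field index             */
                uint64_t unit    = base + (i * nf + k) * msz;
                uint64_t strided = base + i * (nf * msz) + k * msz;
                assert(unit == strided);
            }
        }
        return 0;
    }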

Zhiwei
>> +
>> +static inline uint32_t vext_nf(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
>> +}
>> +
>> +static inline uint32_t vext_mlen(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
>> +}
>> +
>> +static inline uint32_t vext_vm(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
>> +}
>> +
>> +static inline uint32_t vext_lmul(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>> +}
>> +
>> +/*
>> + * Get vector group length in bytes. Its range is [64, 2048].
>> + *
>> + * As simd_desc support at most 256, the max vlen is 512 bits.
>> + * So vlen in bytes is encoded as maxsz.
>> + */
>> +static inline uint32_t vext_maxsz(uint32_t desc)
>> +{
>> +    return simd_maxsz(desc) << vext_lmul(desc);
>> +}
>> +
>> +/*
>> + * This function checks watchpoint before real load operation.
>> + *
>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>> + * In user mode, there is no watchpoint support now.
>> + *
>> + * It will trigger an exception if there is no mapping in TLB
>> + * and page table walk can't fill the TLB entry. Then the guest
>> + * software can return here after process the exception or never return.
>> + */
>> +static void probe_pages(CPURISCVState *env, target_ulong addr,
>> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
>> +{
>> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>> +    target_ulong curlen = MIN(pagelen, len);
>> +
>> +    probe_access(env, addr, curlen, access_type,
>> +            cpu_mmu_index(env, false), ra);
>> +    if (len > curlen) {
>> +        addr += curlen;
>> +        curlen = len - curlen;
>> +        probe_access(env, addr, curlen, access_type,
>> +                cpu_mmu_index(env, false), ra);
>> +    }
>> +}
>> +
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>> +{
>> +    /*
>> +     * Split the remaining range to two parts.
>> +     * The first part is in the last uint64_t unit.
>> +     * The second part start from the next uint64_t unit.
>> +     */
>> +    int part1 = 0, part2 = tot - cnt;
>> +    if (cnt % 8) {
>> +        part1 = 8 - (cnt % 8);
>> +        part2 = tot - cnt - part1;
>> +        memset(tail & ~(7ULL), 0, part1);
>> +        memset((tail + 8) & ~(7ULL), 0, part2);
>> +    } else {
>> +        memset(tail, 0, part2);
>> +    }
>> +}
>> +#else
>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>> +{
>> +    memset(tail, 0, tot - cnt);
>> +}
>> +#endif
>> +
>> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int8_t *cur = ((int8_t *)vd + H1(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int16_t *cur = ((int16_t *)vd + H2(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int32_t *cur = ((int32_t *)vd + H4(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int64_t *cur = (int64_t *)vd + idx;
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +
>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>> +{
>> +    int idx = (index * mlen) / 64;
>> +    int pos = (index * mlen) % 64;
>> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
>> +}
>> +
>> +/* elements operations for load and store */
>> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
>> +        uint32_t cnt, uint32_t tot);
>> +
>> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
>> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
>> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
>> +{                                                          \
>> +    MTYPE data;                                            \
>> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
>> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
>> +    *cur = data;                                           \
>> +}                                                          \
>> +
>> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
>> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
>> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
>> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
>> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
>> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
>> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
>> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
>> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
>> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
>> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
>> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
>> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
>> +
>> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
>> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
>> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
>> +{                                                        \
>> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
>> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
>> +}
>> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
>> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
>> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
>> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
>> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
>> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
>> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
>> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
>> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
>> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
>> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
>> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
>> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
>> +
>> +/*
>> + *** stride: access vector element from strided memory
>> + */
>> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
>> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
>> +{
>> +    uint32_t i, k;
>> +    uint32_t nf = vext_nf(desc);
>> +    uint32_t mlen = vext_mlen(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    if (env->vl == 0) {
>> +        return;
>> +    }
>> +    /* probe every access*/
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
>> +    }
>> +    /* do real access */
>> +    for (i = 0; i < env->vl; i++) {
>> +        k = 0;
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        while (k < nf) {
>> +            target_ulong addr = base + stride * i + k * msz;
>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>> +            k++;
>> +        }
>> +    }
>> +    /* clear tail elements */
>> +    if (clear_elem) {
>> +        for (k = 0; k < nf; k++) {
>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>> +        }
>> +    }
>> +}
>> +
>> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
>> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>> +{                                                                       \
>> +    uint32_t vm = vext_vm(desc);                                        \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>> +}
>> +
>> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>> +
>> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>> +{                                                                       \
>> +    uint32_t vm = vext_vm(desc);                                        \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>> +}
>> +
>> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
>> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
>> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
>> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
>> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
>> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
>> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
>> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
>> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
>> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
>> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
>> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
>> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
>> +
>> +/*
>> + *** unit-stride: access elements stored contiguously in memory
>> + */
>> +
>> +/* unmasked unit-stride load and store operation*/
>> +static inline void vext_ldst_us(void *vd, target_ulong base,
>> +        CPURISCVState *env, uint32_t desc,
>> +        vext_ldst_elem_fn ldst_elem,
>> +        vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra,
>> +        MMUAccessType access_type)
>> +{
>> +    uint32_t i, k;
>> +    uint32_t nf = vext_nf(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    if (env->vl == 0) {
>> +        return;
>> +    }
>> +    /* probe every access */
>> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
>> +    /* load bytes from guest memory */
>> +    for (i = 0; i < env->vl; i++) {
>> +        k = 0;
>> +        while (k < nf) {
>> +            target_ulong addr = base + (i * nf + k) * msz;
>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>> +            k++;
>> +        }
>> +    }
>> +    /* clear tail elements */
>> +    if (clear_elem) {
>> +        for (k = 0; k < nf; k++) {
>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>> +        }
>> +    }
>> +}
>> +
>> +/*
>> + * masked unit-stride load and store operation will be a special case of stride,
>> + * stride = NF * sizeof (MTYPE)
>> + */
>> +
>> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>> +}                                                                       \
>> +                                                                        \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
>> +}
>> +
>> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>> +
>> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>> +}                                                                       \
>> +                                                                        \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
>> +}
>> +
>> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
>> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
>> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
>> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
>> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
>> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
>> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
>> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
>> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
>> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
>> --
>> 2.23.0
>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-13 21:32       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 21:32 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V



On 2020/3/14 4:38, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> Vector strided operations access the first memory element at the base address,
>> and then access subsequent elements at address increments given by the byte
>> offset contained in the x register specified by rs2.
>>
>> Vector unit-stride operations access elements stored contiguously in memory
>> starting from the base effective address. It can be seen as a special
>> case of strided operations.
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/cpu.h                      |   6 +
>>   target/riscv/helper.h                   | 105 ++++++
>>   target/riscv/insn32.decode              |  32 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>>   target/riscv/translate.c                |   7 +
>>   target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>>   6 files changed, 896 insertions(+)
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index 505d1a8515..b6ebb9b0eb 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>>   typedef RISCVCPU ArchCPU;
>>   #include "exec/cpu-all.h"
>>
>> +/* share data between vector helpers and decode code */
>> +FIELD(VDATA, MLEN, 0, 8)
>> +FIELD(VDATA, VM, 8, 1)
>> +FIELD(VDATA, LMUL, 9, 2)
>> +FIELD(VDATA, NF, 11, 4)
>> +
>>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>   FIELD(TB_FLAGS, LMUL, 3, 2)
>>   FIELD(TB_FLAGS, SEW, 5, 3)
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 3c28c7e407..87dfa90609 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>>   #endif
>>   /* Vector functions */
>>   DEF_HELPER_3(vsetvl, tl, env, tl, tl)
>> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> Do you mind explaining why we have *_mask versions? I'm struggling to
> understand this.
When an instruction is executed with a mask, it only operates on the
active elements of the vector. Whether an element is active or inactive
is predicated by the mask register v0.

Without a mask, it operates on every element in the vector body.
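
For example (sketch only, not part of the patch; the helper name is made
up). It mirrors vext_elem_mask below: each element owns mlen bits in v0
and only the lowest of them is used.

static inline bool element_is_active(const uint64_t *v0, int mlen,
                                     int index, bool vm)
{
    if (vm) {
        return true;            /* unmasked: every body element is active */
    }
    int bit = index * mlen;     /* bit position of this element's mask bit */
    return (v0[bit / 64] >> (bit % 64)) & 1;
}
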
>> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 53340bdbc4..ef521152c5 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -25,6 +25,7 @@
>>   %sh10    20:10
>>   %csr    20:12
>>   %rm     12:3
>> +%nf     29:3                     !function=ex_plus_1
>>
>>   # immediates:
>>   %imm_i    20:s12
>> @@ -43,6 +44,8 @@
>>   &u    imm rd
>>   &shift     shamt rs1 rd
>>   &atomic    aq rl rs2 rs1 rd
>> +&r2nfvm    vm rd rs1 nf
>> +&rnfvm     vm rd rs1 rs2 nf
>>
>>   # Formats 32:
>>   @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
>> @@ -62,6 +65,8 @@
>>   @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>   @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>
>>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>   fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>>
>>   # *** RV32V Extension ***
>> +
>> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
>> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
>> +
>> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
>> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
>> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
>> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
>> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
>> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
>> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
>> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
>> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>> +
>> +# *** new major opcode OP-V ***
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index da82c72bbf..d85f2aec68 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -15,6 +15,8 @@
>>    * You should have received a copy of the GNU General Public License along with
>>    * this program.  If not, see <http://www.gnu.org/licenses/>.
>>    */
>> +#include "tcg/tcg-op-gvec.h"
>> +#include "tcg/tcg-gvec-desc.h"
>>
>>   static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>>   {
>> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>>       tcg_temp_free(dst);
>>       return true;
>>   }
>> +
>> +/* vector register offset from env */
>> +static uint32_t vreg_ofs(DisasContext *s, int reg)
>> +{
>> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
>> +}
>> +
>> +/* check functions */
>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>> +{
>> +    return !s->vill && ((s->misa & isa) == isa);
>> +}
> I don't think we need a new function to check ISA.
I think we do need it.

Although there is a riscv_has_ext(env, isa) in cpu.h, it is not suitable
in this file: this code runs at translation time, where DisasContext is
used rather than CPURISCVState.

VILL and the ISA bit have to be checked by every vector instruction, so I
put both checks in one function.
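
For context, a sketch of how this translation-time check composes with
the other checks in this patch (the body mirrors ld_us_check quoted
further down, with comments added):

static bool ld_us_check(DisasContext *s, arg_r2nfvm *a)
{
    return (vext_check_isa_ill(s, RVV) &&              /* !vill and V set in MISA */
            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
                                                       /* vd may equal v0 only if
                                                          unmasked or LMUL == 1 */
            vext_check_reg(s, a->rd, false) &&         /* vd aligned to LMUL group */
            vext_check_nf(s, a->nf));                  /* LMUL * NFIELDS <= 8 */
}
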
>
>> +
>> +/*
>> + * There are two rules check here.
>> + *
>> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
>> + *
>> + * 2. For all widening instructions, the destination LMUL value must also be
>> + *    a supported LMUL value. (Section 11.2)
>> + */
>> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
>> +{
>> +    /*
>> +     * The destination vector register group results are arranged as if both
>> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
>> +     */
>> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
>> +
>> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
> Where does this 3 come from?
LMUL is a 2-bit field in VTYPE, so the largest LMUL value is 0x3.
A value of 0x3 means a register group of 8 vector registers is used for
each operand.

For a widening operation, LMUL equal to 0x3 is illegal, as

     "The destination vector register group results are arranged as if both
      SEW and LMUL were at twice their current settings. (Section 11.2)."

If LMUL is 0x3, the source vector register group is 8 vector registers, and
the destination vector register group would be 16 vector registers, which
is illegal.
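
Written out as arithmetic, the rule looks like this (illustrative sketch
only, mirroring vext_check_reg above; the function name is made up):

static bool reg_group_aligned(int lmul, uint32_t reg, bool widen)
{
    if (widen && lmul == 0x3) {
        return false;                       /* 8 regs would widen to 16 */
    }
    int group = (widen ? 2 : 1) << lmul;    /* e.g. lmul = 2, widen: group = 8 */
    return (reg % group) == 0;              /* v0, v8, v16, v24 are legal bases */
}
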
>
>> +}
>> +
>> +/*
>> + * There are two rules check here.
>> + *
>> + * 1. The destination vector register group for a masked vector instruction can
>> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
>> + *
>> + * 2. In widen instructions and some other insturctions, like vslideup.vx,
>> + *    there is no need to check whether LMUL=1.
>> + */
>> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
>> +    bool force)
>> +{
>> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
>> +}
>> +
>> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
>> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
>> +{
>> +    return (1 << s->lmul) * nf <= 8;
>> +}
>> +
>> +/* common translation macro */
>> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
>> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
>> +{                                                          \
>> +    if (CHECK(s, a)) {                                     \
>> +        return OP(s, a, SEQ);                              \
>> +    }                                                      \
>> +    return false;                                          \
>> +}
>> +
>> +/*
>> + *** unit stride load and store
>> + */
>> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
>> +        TCGv_env, TCGv_i32);
>> +
>> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
>> +        gen_helper_ldst_us *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask;
>> +    TCGv base;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +
>> +    /*
>> +     * As simd_desc supports at most 256 bytes, and in this implementation,
>> +     * the max vector group length is 2048 bytes. So split it into two parts.
>> +     *
>> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
>> +     * The second part is lmul, encoded in data of simd_desc.
>> +     */
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_us *fn;
>> +    static gen_helper_ldst_us * const fns[2][7][4] = {
>> +        /* masked unit stride load */
>> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
>> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
>> +          { NULL,                     gen_helper_vlh_v_h_mask,
>> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
>> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
>> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
>> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
>> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
>> +          { NULL,                     gen_helper_vlhu_v_h_mask,
>> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
>> +        /* unmasked unit stride load */
>> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
>> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
>> +          { NULL,                gen_helper_vlh_v_h,
>> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
>> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
>> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
>> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
>> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
>> +          { NULL,                gen_helper_vlhu_v_h,
>> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
>> +    };
>> +
>> +    fn =  fns[a->vm][seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>> +}
>> +
>> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
>> +
>> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_us *fn;
>> +    static gen_helper_ldst_us * const fns[2][4][4] = {
>> +        /* masked unit stride load and store */
>> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
>> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
>> +          { NULL,                     gen_helper_vsh_v_h_mask,
>> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
>> +          { NULL,                     NULL,
>> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
>> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
>> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
>> +        /* unmasked unit stride store */
>> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
>> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
>> +          { NULL,                gen_helper_vsh_v_h,
>> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
>> +          { NULL,                NULL,
>> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
>> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
>> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
>> +    };
>> +
>> +    fn =  fns[a->vm][seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>> +}
>> +
>> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
>> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
>> +
>> +/*
>> + *** stride load and store
>> + */
>> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
>> +        TCGv, TCGv_env, TCGv_i32);
>> +
>> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
>> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask;
>> +    TCGv base, stride;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +    stride = tcg_temp_new();
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    gen_get_gpr(stride, rs2);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, stride, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free(stride);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_stride *fn;
>> +    static gen_helper_ldst_stride * const fns[7][4] = {
>> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
>> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
>> +        { NULL,                 gen_helper_vlsh_v_h,
>> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
>> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
>> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
>> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
>> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
>> +        { NULL,                 gen_helper_vlshu_v_h,
>> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
>> +    };
>> +
>> +    fn =  fns[seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +
>> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
>> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
>> +
>> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_stride *fn;
>> +    static gen_helper_ldst_stride * const fns[4][4] = {
>> +        /* masked stride store */
>> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
>> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
>> +        { NULL,                 gen_helper_vssh_v_h,
>> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
>> +        { NULL,                 NULL,
>> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
>> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
>> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
>> +    };
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    fn =  fns[seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +
>> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_nf(s, a->nf));
>> +}
>> +
>> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> Looks good
>
>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>> index af07ac4160..852545b77e 100644
>> --- a/target/riscv/translate.c
>> +++ b/target/riscv/translate.c
>> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>>       uint8_t lmul;
>>       uint8_t sew;
>>       uint16_t vlen;
>> +    uint16_t mlen;
>>       bool vl_eq_vlmax;
>>   } DisasContext;
>>
>> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>>       }
>>   }
>>
>> +static int ex_plus_1(DisasContext *ctx, int nf)
>> +{
>> +    return nf + 1;
>> +}
>> +
>>   #define EX_SH(amount) \
>>       static int ex_shift_##amount(DisasContext *ctx, int imm) \
>>       {                                         \
>> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>       ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>>       ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>>       ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>>       ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>>   }
>>
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 2afe716f2a..ebfabd2946 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -18,8 +18,10 @@
>>
>>   #include "qemu/osdep.h"
>>   #include "cpu.h"
>> +#include "exec/memop.h"
>>   #include "exec/exec-all.h"
>>   #include "exec/helper-proto.h"
>> +#include "tcg/tcg-gvec-desc.h"
>>   #include <math.h>
>>
>>   target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>       env->vstart = 0;
>>       return vl;
>>   }
>> +
>> +/*
>> + * Note that vector data is stored in host-endian 64-bit chunks,
>> + * so addressing units smaller than that needs a host-endian fixup.
>> + */
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +#define H1(x)   ((x) ^ 7)
>> +#define H1_2(x) ((x) ^ 6)
>> +#define H1_4(x) ((x) ^ 4)
>> +#define H2(x)   ((x) ^ 3)
>> +#define H4(x)   ((x) ^ 1)
>> +#define H8(x)   ((x))
>> +#else
>> +#define H1(x)   (x)
>> +#define H1_2(x) (x)
>> +#define H1_4(x) (x)
>> +#define H2(x)   (x)
>> +#define H4(x)   (x)
>> +#define H8(x)   (x)
>> +#endif
> Looks good. Overall this looks good. Do you mind splitting this patch
> up a little bit more? It's difficult to review such a long and complex
> patch.
>
> Alistair
As unit-stride can be seen as a special case of the strided mode, I put
them together. I will split the strided and unit-stride modes in the next
patch set.

Even then I think the patch will remain somewhat long and complex: a lot
of corner cases must be considered for vector loads and stores, and a lot
of common code is defined here.
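
A small sketch of why the masked unit-stride form can simply reuse the
strided helper with stride = NF * sizeof(MTYPE) (illustration only, not
part of the patch; the function name is made up):

static inline target_ulong seg_elem_addr(target_ulong base,
                                         target_ulong stride,
                                         uint32_t msz,
                                         uint32_t i, uint32_t k)
{
    /*
     * Strided form: element i, field k.  With stride == nf * msz this
     * equals base + (i * nf + k) * msz, the address computed by the
     * dedicated unit-stride path (vext_ldst_us).
     */
    return base + stride * i + k * msz;
}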

Zhiwei
>> +
>> +static inline uint32_t vext_nf(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
>> +}
>> +
>> +static inline uint32_t vext_mlen(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
>> +}
>> +
>> +static inline uint32_t vext_vm(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
>> +}
>> +
>> +static inline uint32_t vext_lmul(uint32_t desc)
>> +{
>> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>> +}
>> +
>> +/*
>> + * Get vector group length in bytes. Its range is [64, 2048].
>> + *
>> + * As simd_desc support at most 256, the max vlen is 512 bits.
>> + * So vlen in bytes is encoded as maxsz.
>> + */
>> +static inline uint32_t vext_maxsz(uint32_t desc)
>> +{
>> +    return simd_maxsz(desc) << vext_lmul(desc);
>> +}
>> +
>> +/*
>> + * This function checks watchpoint before real load operation.
>> + *
>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>> + * In user mode, there is no watchpoint support now.
>> + *
>> + * It will trigger an exception if there is no mapping in the TLB
>> + * and the page table walk can't fill the TLB entry. Then the guest
>> + * software can return here after processing the exception, or never return.
>> + */
>> +static void probe_pages(CPURISCVState *env, target_ulong addr,
>> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
>> +{
>> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>> +    target_ulong curlen = MIN(pagelen, len);
>> +
>> +    probe_access(env, addr, curlen, access_type,
>> +            cpu_mmu_index(env, false), ra);
>> +    if (len > curlen) {
>> +        addr += curlen;
>> +        curlen = len - curlen;
>> +        probe_access(env, addr, curlen, access_type,
>> +                cpu_mmu_index(env, false), ra);
>> +    }
>> +}
>> +
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>> +{
>> +    /*
>> +     * Split the remaining range to two parts.
>> +     * The first part is in the last uint64_t unit.
>> +     * The second part start from the next uint64_t unit.
>> +     */
>> +    int part1 = 0, part2 = tot - cnt;
>> +    if (cnt % 8) {
>> +        part1 = 8 - (cnt % 8);
>> +        part2 = tot - cnt - part1;
>> +        memset((void *)((uintptr_t)tail & ~(uintptr_t)7), 0, part1);
>> +        memset((void *)(((uintptr_t)tail + 8) & ~(uintptr_t)7), 0, part2);
>> +    } else {
>> +        memset(tail, 0, part2);
>> +    }
>> +}
>> +#else
>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>> +{
>> +    memset(tail, 0, tot - cnt);
>> +}
>> +#endif
>> +
>> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int8_t *cur = ((int8_t *)vd + H1(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int16_t *cur = ((int16_t *)vd + H2(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int32_t *cur = ((int32_t *)vd + H4(idx));
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>> +{
>> +    int64_t *cur = (int64_t *)vd + idx;
>> +    vext_clear(cur, cnt, tot);
>> +}
>> +
>> +
>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>> +{
>> +    int idx = (index * mlen) / 64;
>> +    int pos = (index * mlen) % 64;
>> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
>> +}
>> +
>> +/* elements operations for load and store */
>> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
>> +        uint32_t cnt, uint32_t tot);
>> +
>> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
>> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
>> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
>> +{                                                          \
>> +    MTYPE data;                                            \
>> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
>> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
>> +    *cur = data;                                           \
>> +}                                                          \
>> +
>> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
>> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
>> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
>> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
>> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
>> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
>> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
>> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
>> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
>> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
>> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
>> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
>> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
>> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
>> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
>> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
>> +
>> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
>> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
>> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
>> +{                                                        \
>> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
>> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
>> +}
>> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
>> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
>> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
>> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
>> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
>> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
>> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
>> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
>> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
>> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
>> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
>> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
>> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
>> +
>> +/*
>> + *** stride: access vector element from strided memory
>> + */
>> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
>> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
>> +{
>> +    uint32_t i, k;
>> +    uint32_t nf = vext_nf(desc);
>> +    uint32_t mlen = vext_mlen(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    if (env->vl == 0) {
>> +        return;
>> +    }
>> +    /* probe every access*/
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
>> +    }
>> +    /* do real access */
>> +    for (i = 0; i < env->vl; i++) {
>> +        k = 0;
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        while (k < nf) {
>> +            target_ulong addr = base + stride * i + k * msz;
>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>> +            k++;
>> +        }
>> +    }
>> +    /* clear tail elements */
>> +    if (clear_elem) {
>> +        for (k = 0; k < nf; k++) {
>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>> +        }
>> +    }
>> +}
>> +
>> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
>> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>> +{                                                                       \
>> +    uint32_t vm = vext_vm(desc);                                        \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>> +}
>> +
>> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>> +
>> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>> +{                                                                       \
>> +    uint32_t vm = vext_vm(desc);                                        \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>> +}
>> +
>> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
>> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
>> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
>> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
>> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
>> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
>> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
>> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
>> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
>> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
>> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
>> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
>> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
>> +
>> +/*
>> + *** unit-stride: access elements stored contiguously in memory
>> + */
>> +
>> +/* unmasked unit-stride load and store operation*/
>> +static inline void vext_ldst_us(void *vd, target_ulong base,
>> +        CPURISCVState *env, uint32_t desc,
>> +        vext_ldst_elem_fn ldst_elem,
>> +        vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra,
>> +        MMUAccessType access_type)
>> +{
>> +    uint32_t i, k;
>> +    uint32_t nf = vext_nf(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    if (env->vl == 0) {
>> +        return;
>> +    }
>> +    /* probe every access */
>> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
>> +    /* load bytes from guest memory */
>> +    for (i = 0; i < env->vl; i++) {
>> +        k = 0;
>> +        while (k < nf) {
>> +            target_ulong addr = base + (i * nf + k) * msz;
>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>> +            k++;
>> +        }
>> +    }
>> +    /* clear tail elements */
>> +    if (clear_elem) {
>> +        for (k = 0; k < nf; k++) {
>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>> +        }
>> +    }
>> +}
>> +
>> +/*
>> + * masked unit-stride load and store operation will be a special case of stride,
>> + * stride = NF * sizeof (MTYPE)
>> + */
>> +
>> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>> +}                                                                       \
>> +                                                                        \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
>> +}
>> +
>> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>> +
>> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>> +}                                                                       \
>> +                                                                        \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>> +        CPURISCVState *env, uint32_t desc)                              \
>> +{                                                                       \
>> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
>> +}
>> +
>> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
>> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
>> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
>> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
>> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
>> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
>> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
>> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
>> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
>> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
>> --
>> 2.23.0
>>
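
(A quick sanity check of the "stride = NF * sizeof (MTYPE)" relationship
above, as a sketch only, reusing the names from this patch: with NF = 2
fields of MTYPE = int16_t, the masked path calls vext_ldst_stride() with
stride = 2 * sizeof(int16_t) = 4, so field k of element i is accessed at
base + 4 * i + 2 * k, which is exactly the base + (i * nf + k) * msz
address that the unmasked vext_ldst_us() above computes.)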



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 60/60] target/riscv: configure and turn on vector extension from command line
  2020-03-12 14:59   ` LIU Zhiwei
@ 2020-03-13 21:41     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 21:41 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 10:00 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The vector extension is off by default. The only way to use the vector
> extension is to:
> 1. use cpu rv32 or rv64
> 2. turn it on from the command line:
> "-cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1".
>
> vlen is the vector register length; the default value is 128 bits.
> elen is the maximum element width in bits; the default value is 64 bits.
> vext_spec is the vector specification version; the default value is v0.7.1.
> These properties can also be set to other values.
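
(In linux-user mode this could be exercised with, for example,

    qemu-riscv64 -cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1 ./some_vector_test

where ./some_vector_test is only a placeholder for a test binary, not a
file from this series.)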
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
>  target/riscv/cpu.h |  2 ++
>  2 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 6e4135583d..5f1cdd4f2b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -395,7 +395,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>      }
>
>      set_priv_version(env, priv_version);
> -    set_vext_version(env, vext_version);
>      set_resetvec(env, DEFAULT_RSTVEC);
>
>      if (cpu->cfg.mmu) {
> @@ -463,6 +462,45 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>          if (cpu->cfg.ext_h) {
>              target_misa |= RVH;
>          }
> +        if (cpu->cfg.ext_v) {
> +            target_misa |= RVV;
> +            if (!is_power_of_2(cpu->cfg.vlen)) {
> +                error_setg(errp,
> +                        "Vector extension VLEN must be power of 2");
> +                return;
> +            }
> +            if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
> +                error_setg(errp,
> +                        "Vector extension implementation only supports VLEN "
> +                        "in the range [128, %d]", RV_VLEN_MAX);
> +                return;
> +            }
> +            if (!is_power_of_2(cpu->cfg.elen)) {
> +                error_setg(errp,
> +                        "Vector extension ELEN must be power of 2");
> +                return;
> +            }
> +            if (cpu->cfg.elen > 64 || cpu->cfg.vlen < 8) {
> +                error_setg(errp,
> +                        "Vector extension implementation only supports ELEN "
> +                        "in the range [8, 64]");
> +                return;
> +            }
> +            if (cpu->cfg.vext_spec) {
> +                if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
> +                    vext_version = VEXT_VERSION_0_07_1;
> +                } else {
> +                    error_setg(errp,
> +                           "Unsupported vector spec version '%s'",
> +                           cpu->cfg.vext_spec);
> +                    return;
> +                }
> +            } else {
> +                qemu_log("vector version is not specified, "
> +                        "use the default value v0.7.1\n");
> +            }
> +            set_vext_version(env, vext_version);
> +        }
>
>          set_misa(env, RVXLEN | target_misa);
>      }
> @@ -500,10 +538,14 @@ static Property riscv_cpu_properties[] = {
>      DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
>      /* This is experimental so mark with 'x-' */
>      DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
> +    DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),

This should be x-v as it's experimental.
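
(With that rename, the property definition would presumably become
DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false), and the example
command line from the commit message would become
"-cpu rv64,x-v=true,vlen=128,elen=64,vext_spec=v0.7.1".)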

Alistair

>      DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
>      DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
>      DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
>      DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
> +    DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
> +    DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
> +    DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
>      DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
>      DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
>      DEFINE_PROP_END_OF_LIST(),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index e069e55e81..36ead8d6d5 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -285,12 +285,14 @@ typedef struct RISCVCPU {
>          bool ext_s;
>          bool ext_u;
>          bool ext_h;
> +        bool ext_v;
>          bool ext_counters;
>          bool ext_ifencei;
>          bool ext_icsr;
>
>          char *priv_spec;
>          char *user_spec;
> +        char *vext_spec;
>          uint16_t vlen;
>          uint16_t elen;
>          bool mmu;
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 60/60] target/riscv: configure and turn on vector extension from command line
  2020-03-13 21:41     ` Alistair Francis
@ 2020-03-13 21:52       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 21:52 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 5:41, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 10:00 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> The vector extension is off by default. The only way to use the vector
>> extension is to:
>> 1. use cpu rv32 or rv64
>> 2. turn it on from the command line:
>> "-cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1".
>>
>> vlen is the vector register length; the default value is 128 bits.
>> elen is the maximum element width in bits; the default value is 64 bits.
>> vext_spec is the vector specification version; the default value is v0.7.1.
>> These properties can also be set to other values.
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/cpu.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
>>   target/riscv/cpu.h |  2 ++
>>   2 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 6e4135583d..5f1cdd4f2b 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -395,7 +395,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>>       }
>>
>>       set_priv_version(env, priv_version);
>> -    set_vext_version(env, vext_version);
>>       set_resetvec(env, DEFAULT_RSTVEC);
>>
>>       if (cpu->cfg.mmu) {
>> @@ -463,6 +462,45 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>>           if (cpu->cfg.ext_h) {
>>               target_misa |= RVH;
>>           }
>> +        if (cpu->cfg.ext_v) {
>> +            target_misa |= RVV;
>> +            if (!is_power_of_2(cpu->cfg.vlen)) {
>> +                error_setg(errp,
>> +                        "Vector extension VLEN must be power of 2");
>> +                return;
>> +            }
>> +            if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
>> +                error_setg(errp,
>> +                        "Vector extension implementation only supports VLEN "
>> +                        "in the range [128, %d]", RV_VLEN_MAX);
>> +                return;
>> +            }
>> +            if (!is_power_of_2(cpu->cfg.elen)) {
>> +                error_setg(errp,
>> +                        "Vector extension ELEN must be power of 2");
>> +                return;
>> +            }
>> +            if (cpu->cfg.elen > 64 || cpu->cfg.vlen < 8) {
>> +                error_setg(errp,
>> +                        "Vector extension implementation only supports ELEN "
>> +                        "in the range [8, 64]");
>> +                return;
>> +            }
>> +            if (cpu->cfg.vext_spec) {
>> +                if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
>> +                    vext_version = VEXT_VERSION_0_07_1;
>> +                } else {
>> +                    error_setg(errp,
>> +                           "Unsupported vector spec version '%s'",
>> +                           cpu->cfg.vext_spec);
>> +                    return;
>> +                }
>> +            } else {
>> +                qemu_log("vector version is not specified, "
>> +                        "use the default value v0.7.1\n");
>> +            }
>> +            set_vext_version(env, vext_version);
>> +        }
>>
>>           set_misa(env, RVXLEN | target_misa);
>>       }
>> @@ -500,10 +538,14 @@ static Property riscv_cpu_properties[] = {
>>       DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
>>       /* This is experimental so mark with 'x-' */
>>       DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
>> +    DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
> This should be x-v as it's experimental.
>
> Alistair
Yes. I will fix it in the next patch set.

Zhiwei
>
>>       DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
>>       DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
>>       DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
>>       DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
>> +    DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
>> +    DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
>> +    DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
>>       DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
>>       DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
>>       DEFINE_PROP_END_OF_LIST(),
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index e069e55e81..36ead8d6d5 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -285,12 +285,14 @@ typedef struct RISCVCPU {
>>           bool ext_s;
>>           bool ext_u;
>>           bool ext_h;
>> +        bool ext_v;
>>           bool ext_counters;
>>           bool ext_ifencei;
>>           bool ext_icsr;
>>
>>           char *priv_spec;
>>           char *user_spec;
>> +        char *vext_spec;
>>           uint16_t vlen;
>>           uint16_t elen;
>>           bool mmu;
>> --
>> 2.23.0
>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-13 21:32       ` LIU Zhiwei
@ 2020-03-13 22:05         ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 22:05 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Fri, Mar 13, 2020 at 2:32 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2020/3/14 4:38, Alistair Francis wrote:
> > On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >> Vector strided operations access the first memory element at the base address,
> >> and then access subsequent elements at address increments given by the byte
> >> offset contained in the x register specified by rs2.
> >>
> >> Vector unit-stride operations access elements stored contiguously in memory
> >> starting from the base effective address. They can be seen as a special
> >> case of strided operations.
> >>
> >> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >> ---
> >>   target/riscv/cpu.h                      |   6 +
> >>   target/riscv/helper.h                   | 105 ++++++
> >>   target/riscv/insn32.decode              |  32 ++
> >>   target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
> >>   target/riscv/translate.c                |   7 +
> >>   target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
> >>   6 files changed, 896 insertions(+)
> >>
> >> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> >> index 505d1a8515..b6ebb9b0eb 100644
> >> --- a/target/riscv/cpu.h
> >> +++ b/target/riscv/cpu.h
> >> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
> >>   typedef RISCVCPU ArchCPU;
> >>   #include "exec/cpu-all.h"
> >>
> >> +/* share data between vector helpers and decode code */
> >> +FIELD(VDATA, MLEN, 0, 8)
> >> +FIELD(VDATA, VM, 8, 1)
> >> +FIELD(VDATA, LMUL, 9, 2)
> >> +FIELD(VDATA, NF, 11, 4)
> >> +
> >>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
> >>   FIELD(TB_FLAGS, LMUL, 3, 2)
> >>   FIELD(TB_FLAGS, SEW, 5, 3)
> >> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >> index 3c28c7e407..87dfa90609 100644
> >> --- a/target/riscv/helper.h
> >> +++ b/target/riscv/helper.h
> >> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
> >>   #endif
> >>   /* Vector functions */
> >>   DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> >> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> > Do you mind explaining why we have *_mask versions? I'm struggling to
> > understand this.
> When an instruction has a mask, it only operates on the active elements
> of the vector. Whether an element is active or inactive is determined by
> the mask register v0.
>
> Without a mask, it operates on every element in the vector body.
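>
> Roughly, as a sketch only (following the shape of the vext_ldst_stride()
> helper that the masked variants in this patch call into; vext_elem_mask(),
> mlen and vlmax are the names used elsewhere in the series):
>
>     for (i = 0; i < env->vl; i++) {
>         if (!vm && !vext_elem_mask(v0, mlen, i)) {
>             continue;               /* inactive element: no memory access */
>         }
>         for (k = 0; k < nf; k++) {
>             target_ulong addr = base + stride * i + k * msz;
>             ldst_elem(env, addr, i + k * vlmax, vd, ra);
>         }
>     }
>
> The unmasked helpers run the same loop without the vext_elem_mask() test.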

Doesn't the mask always apply though? Why do we need an extra helper?

> >> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> >> index 53340bdbc4..ef521152c5 100644
> >> --- a/target/riscv/insn32.decode
> >> +++ b/target/riscv/insn32.decode
> >> @@ -25,6 +25,7 @@
> >>   %sh10    20:10
> >>   %csr    20:12
> >>   %rm     12:3
> >> +%nf     29:3                     !function=ex_plus_1
> >>
> >>   # immediates:
> >>   %imm_i    20:s12
> >> @@ -43,6 +44,8 @@
> >>   &u    imm rd
> >>   &shift     shamt rs1 rd
> >>   &atomic    aq rl rs2 rs1 rd
> >> +&r2nfvm    vm rd rs1 nf
> >> +&rnfvm     vm rd rs1 rs2 nf
> >>
> >>   # Formats 32:
> >>   @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
> >> @@ -62,6 +65,8 @@
> >>   @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
> >>   @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
> >>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
> >> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
> >> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
> >>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
> >>
> >>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> >> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
> >>   fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
> >>
> >>   # *** RV32V Extension ***
> >> +
> >> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
> >> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> >> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> >> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> >> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
> >> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> >> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> >> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> >> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
> >> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
> >> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
> >> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
> >> +
> >> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
> >> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
> >> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
> >> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
> >> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
> >> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
> >> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
> >> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
> >> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
> >> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
> >> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
> >> +
> >> +# *** new major opcode OP-V ***
> >>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
> >>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> >> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> >> index da82c72bbf..d85f2aec68 100644
> >> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> >> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> >> @@ -15,6 +15,8 @@
> >>    * You should have received a copy of the GNU General Public License along with
> >>    * this program.  If not, see <http://www.gnu.org/licenses/>.
> >>    */
> >> +#include "tcg/tcg-op-gvec.h"
> >> +#include "tcg/tcg-gvec-desc.h"
> >>
> >>   static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> >>   {
> >> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
> >>       tcg_temp_free(dst);
> >>       return true;
> >>   }
> >> +
> >> +/* vector register offset from env */
> >> +static uint32_t vreg_ofs(DisasContext *s, int reg)
> >> +{
> >> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
> >> +}
> >> +
> >> +/* check functions */
> >> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
> >> +{
> >> +    return !s->vill && ((s->misa & isa) == isa);
> >> +}
> > I don't think we need a new function to check ISA.
> I don't think so.
>
> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not suitable
> in this file, as this code runs at translation time and DisasContext is
> used here instead of CPURISCVState.

Ah good point. This is fine then.

>
> VILL and the ISA bits will be checked in every vector instruction, so I
> just put the two checks in one function.
> >
> >> +
> >> +/*
> >> + * There are two rules check here.
> >> + *
> >> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
> >> + *
> >> + * 2. For all widening instructions, the destination LMUL value must also be
> >> + *    a supported LMUL value. (Section 11.2)
> >> + */
> >> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
> >> +{
> >> +    /*
> >> +     * The destination vector register group results are arranged as if both
> >> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
> >> +     */
> >> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
> >> +
> >> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
> > Where does this 3 come from?
> LMUL is a 2-bit field in VTYPE, so the largest encoding is 0x3.
> An encoding of 0x3 means a group of 8 vector registers is used for each
> operand.
>
> For a widening operation, LMUL equal to 0x3 is illegal, as
>
>      "The destination vector register group results are arranged as if both
>       SEW and LMUL were at twice their current settings. (Section 11.2)."
>
> If LMUL is 0x3, the source vector register group is 8 vector registers, so
> the destination vector register group would need 16 vector registers,
> which is illegal.
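>
> Concretely (a worked example, not text from the spec): with s->lmul == 3,
> vext_check_reg(s, reg, true) above returns false straight away; with
> s->lmul == 1 (LMUL=2), legal is 2 << 1 = 4 for a widening destination but
> 1 << 1 = 2 otherwise, so a widening destination register number must be a
> multiple of 4.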

Ah ok.

> >
> >> +}
> >> +
> >> +/*
> >> + * There are two rules check here.
> >> + *
> >> + * 1. The destination vector register group for a masked vector instruction can
> >> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
> >> + *
> >> + * 2. In widen instructions and some other insturctions, like vslideup.vx,
> >> + *    there is no need to check whether LMUL=1.
> >> + */
> >> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
> >> +    bool force)
> >> +{
> >> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
> >> +}
> >> +
> >> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
> >> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
> >> +{
> >> +    return (1 << s->lmul) * nf <= 8;
> >> +}
> >> +
> >> +/* common translation macro */
> >> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
> >> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
> >> +{                                                          \
> >> +    if (CHECK(s, a)) {                                     \
> >> +        return OP(s, a, SEQ);                              \
> >> +    }                                                      \
> >> +    return false;                                          \
> >> +}
> >> +
> >> +/*
> >> + *** unit stride load and store
> >> + */
> >> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
> >> +        TCGv_env, TCGv_i32);
> >> +
> >> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
> >> +        gen_helper_ldst_us *fn, DisasContext *s)
> >> +{
> >> +    TCGv_ptr dest, mask;
> >> +    TCGv base;
> >> +    TCGv_i32 desc;
> >> +
> >> +    dest = tcg_temp_new_ptr();
> >> +    mask = tcg_temp_new_ptr();
> >> +    base = tcg_temp_new();
> >> +
> >> +    /*
> >> +     * As simd_desc supports at most 256 bytes, and in this implementation,
> >> +     * the max vector group length is 2048 bytes. So split it into two parts.
> >> +     *
> >> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
> >> +     * The second part is lmul, encoded in data of simd_desc.
> >> +     */
> >> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> >> +
> >> +    gen_get_gpr(base, rs1);
> >> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> >> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> >> +
> >> +    fn(dest, mask, base, cpu_env, desc);
> >> +
> >> +    tcg_temp_free_ptr(dest);
> >> +    tcg_temp_free_ptr(mask);
> >> +    tcg_temp_free(base);
> >> +    tcg_temp_free_i32(desc);
> >> +    return true;
> >> +}
> >> +
> >> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_us *fn;
> >> +    static gen_helper_ldst_us * const fns[2][7][4] = {
> >> +        /* masked unit stride load */
> >> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
> >> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
> >> +          { NULL,                     gen_helper_vlh_v_h_mask,
> >> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
> >> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
> >> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
> >> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
> >> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
> >> +          { NULL,                     gen_helper_vlhu_v_h_mask,
> >> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
> >> +        /* unmasked unit stride load */
> >> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
> >> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
> >> +          { NULL,                gen_helper_vlh_v_h,
> >> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
> >> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
> >> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
> >> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
> >> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
> >> +          { NULL,                gen_helper_vlhu_v_h,
> >> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
> >> +    };
> >> +
> >> +    fn =  fns[a->vm][seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> >> +}
> >> +
> >> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
> >> +
> >> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_us *fn;
> >> +    static gen_helper_ldst_us * const fns[2][4][4] = {
> >> +        /* masked unit stride load and store */
> >> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
> >> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
> >> +          { NULL,                     gen_helper_vsh_v_h_mask,
> >> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
> >> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
> >> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
> >> +        /* unmasked unit stride store */
> >> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
> >> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
> >> +          { NULL,                gen_helper_vsh_v_h,
> >> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
> >> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
> >> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
> >> +    };
> >> +
> >> +    fn =  fns[a->vm][seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> >> +}
> >> +
> >> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
> >> +
> >> +/*
> >> + *** stride load and store
> >> + */
> >> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
> >> +        TCGv, TCGv_env, TCGv_i32);
> >> +
> >> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
> >> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
> >> +{
> >> +    TCGv_ptr dest, mask;
> >> +    TCGv base, stride;
> >> +    TCGv_i32 desc;
> >> +
> >> +    dest = tcg_temp_new_ptr();
> >> +    mask = tcg_temp_new_ptr();
> >> +    base = tcg_temp_new();
> >> +    stride = tcg_temp_new();
> >> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> >> +
> >> +    gen_get_gpr(base, rs1);
> >> +    gen_get_gpr(stride, rs2);
> >> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> >> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> >> +
> >> +    fn(dest, mask, base, stride, cpu_env, desc);
> >> +
> >> +    tcg_temp_free_ptr(dest);
> >> +    tcg_temp_free_ptr(mask);
> >> +    tcg_temp_free(base);
> >> +    tcg_temp_free(stride);
> >> +    tcg_temp_free_i32(desc);
> >> +    return true;
> >> +}
> >> +
> >> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_stride *fn;
> >> +    static gen_helper_ldst_stride * const fns[7][4] = {
> >> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
> >> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
> >> +        { NULL,                 gen_helper_vlsh_v_h,
> >> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
> >> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
> >> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
> >> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
> >> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
> >> +        { NULL,                 gen_helper_vlshu_v_h,
> >> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
> >> +    };
> >> +
> >> +    fn =  fns[seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> >> +}
> >> +
> >> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
> >> +
> >> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_stride *fn;
> >> +    static gen_helper_ldst_stride * const fns[4][4] = {
> >> +        /* masked stride store */
> >> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
> >> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
> >> +        { NULL,                 gen_helper_vssh_v_h,
> >> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
> >> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
> >> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
> >> +    };
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    fn =  fns[seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> >> +}
> >> +
> >> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> > Looks good
> >
> >> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> >> index af07ac4160..852545b77e 100644
> >> --- a/target/riscv/translate.c
> >> +++ b/target/riscv/translate.c
> >> @@ -61,6 +61,7 @@ typedef struct DisasContext {
> >>       uint8_t lmul;
> >>       uint8_t sew;
> >>       uint16_t vlen;
> >> +    uint16_t mlen;
> >>       bool vl_eq_vlmax;
> >>   } DisasContext;
> >>
> >> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
> >>       }
> >>   }
> >>
> >> +static int ex_plus_1(DisasContext *ctx, int nf)
> >> +{
> >> +    return nf + 1;
> >> +}
> >> +
> >>   #define EX_SH(amount) \
> >>       static int ex_shift_##amount(DisasContext *ctx, int imm) \
> >>       {                                         \
> >> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
> >>       ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
> >>       ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
> >>       ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
> >> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
> >>       ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
> >>   }
> >>
> >> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> >> index 2afe716f2a..ebfabd2946 100644
> >> --- a/target/riscv/vector_helper.c
> >> +++ b/target/riscv/vector_helper.c
> >> @@ -18,8 +18,10 @@
> >>
> >>   #include "qemu/osdep.h"
> >>   #include "cpu.h"
> >> +#include "exec/memop.h"
> >>   #include "exec/exec-all.h"
> >>   #include "exec/helper-proto.h"
> >> +#include "tcg/tcg-gvec-desc.h"
> >>   #include <math.h>
> >>
> >>   target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >>       env->vstart = 0;
> >>       return vl;
> >>   }
> >> +
> >> +/*
> >> + * Note that vector data is stored in host-endian 64-bit chunks,
> >> + * so addressing units smaller than that needs a host-endian fixup.
> >> + */
> >> +#ifdef HOST_WORDS_BIGENDIAN
> >> +#define H1(x)   ((x) ^ 7)
> >> +#define H1_2(x) ((x) ^ 6)
> >> +#define H1_4(x) ((x) ^ 4)
> >> +#define H2(x)   ((x) ^ 3)
> >> +#define H4(x)   ((x) ^ 1)
> >> +#define H8(x)   ((x))
> >> +#else
> >> +#define H1(x)   (x)
> >> +#define H1_2(x) (x)
> >> +#define H1_4(x) (x)
> >> +#define H2(x)   (x)
> >> +#define H4(x)   (x)
> >> +#define H8(x)   (x)
> >> +#endif
> > Looks good. Overall this looks good. Do you mind splitting this patch
> > up a little bit more? It's difficult to review such a long and complex
> > patch.
> >
> > Alistair
> As unit-stride can be seen as a special case of the strided mode, I just
> put them together.
> I will split the strided and unit-stride modes in the next patch set.

Thank you.

>
> Even so, I think it will still be somewhat long and complex: a lot of
> corner cases must be considered for vector load and store, and a lot of
> common code is defined here.

That's fine

Alistair

>
> Zhiwei
> >> +
> >> +static inline uint32_t vext_nf(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
> >> +}
> >> +
> >> +static inline uint32_t vext_mlen(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
> >> +}
> >> +
> >> +static inline uint32_t vext_vm(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
> >> +}
> >> +
> >> +static inline uint32_t vext_lmul(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
> >> +}
> >> +
> >> +/*
> >> + * Get vector group length in bytes. Its range is [64, 2048].
> >> + *
> >> + * As simd_desc support at most 256, the max vlen is 512 bits.
> >> + * So vlen in bytes is encoded as maxsz.
> >> + */
> >> +static inline uint32_t vext_maxsz(uint32_t desc)
> >> +{
> >> +    return simd_maxsz(desc) << vext_lmul(desc);
> >> +}
> >> +
> >> +/*
> >> + * This function checks watchpoint before real load operation.
> >> + *
> >> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
> >> + * In user mode, there is no watchpoint support now.
> >> + *
> >> + * It will trigger an exception if there is no mapping in TLB
> >> + * and page table walk can't fill the TLB entry. Then the guest
> >> + * software can return here after process the exception or never return.
> >> + */
> >> +static void probe_pages(CPURISCVState *env, target_ulong addr,
> >> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
> >> +{
> >> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> >> +    target_ulong curlen = MIN(pagelen, len);
> >> +
> >> +    probe_access(env, addr, curlen, access_type,
> >> +            cpu_mmu_index(env, false), ra);
> >> +    if (len > curlen) {
> >> +        addr += curlen;
> >> +        curlen = len - curlen;
> >> +        probe_access(env, addr, curlen, access_type,
> >> +                cpu_mmu_index(env, false), ra);
> >> +    }
> >> +}
> >> +
> >> +#ifdef HOST_WORDS_BIGENDIAN
> >> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    /*
> >> +     * Split the remaining range into two parts.
> >> +     * The first part is in the last uint64_t unit.
> >> +     * The second part starts from the next uint64_t unit.
> >> +     */
> >> +    int part1 = 0, part2 = tot - cnt;
> >> +    if (cnt % 8) {
> >> +        part1 = 8 - (cnt % 8);
> >> +        part2 = tot - cnt - part1;
> >> +        memset((void *)((uintptr_t)tail & ~7ULL), 0, part1);
> >> +        memset((void *)(((uintptr_t)tail + 8) & ~7ULL), 0, part2);
> >> +    } else {
> >> +        memset(tail, 0, part2);
> >> +    }
> >> +}
> >> +#else
> >> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    memset(tail, 0, tot - cnt);
> >> +}
> >> +#endif
> >> +
> >> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int8_t *cur = ((int8_t *)vd + H1(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int16_t *cur = ((int16_t *)vd + H2(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int32_t *cur = ((int32_t *)vd + H4(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int64_t *cur = (int64_t *)vd + idx;
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +
> >> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> >> +{
> >> +    int idx = (index * mlen) / 64;
> >> +    int pos = (index * mlen) % 64;
> >> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
> >> +}
> >> +
> >> +/* elements operations for load and store */
> >> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
> >> +        uint32_t idx, void *vd, uintptr_t retaddr);
> >> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
> >> +        uint32_t cnt, uint32_t tot);
> >> +
> >> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
> >> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
> >> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
> >> +{                                                          \
> >> +    MTYPE data;                                            \
> >> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
> >> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
> >> +    *cur = data;                                           \
> >> +}                                                          \
> >> +
> >> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
> >> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
> >> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
> >> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
> >> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
> >> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
> >> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
> >> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
> >> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
> >> +
> >> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
> >> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
> >> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
> >> +{                                                        \
> >> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
> >> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
> >> +}
> >> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
> >> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
> >> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
> >> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
> >> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
> >> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
> >> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
> >> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
> >> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
> >> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
> >> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
> >> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
> >> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
> >> +
> >> +/*
> >> + *** stride: access vector element from strided memory
> >> + */
> >> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
> >> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
> >> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
> >> +{
> >> +    uint32_t i, k;
> >> +    uint32_t nf = vext_nf(desc);
> >> +    uint32_t mlen = vext_mlen(desc);
> >> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> >> +
> >> +    if (env->vl == 0) {
> >> +        return;
> >> +    }
> >> +    /* probe every access */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> >> +            continue;
> >> +        }
> >> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
> >> +    }
> >> +    /* do real access */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        k = 0;
> >> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> >> +            continue;
> >> +        }
> >> +        while (k < nf) {
> >> +            target_ulong addr = base + stride * i + k * msz;
> >> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> >> +            k++;
> >> +        }
> >> +    }
> >> +    /* clear tail elements */
> >> +    if (clear_elem) {
> >> +        for (k = 0; k < nf; k++) {
> >> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
> >> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> >> +{                                                                       \
> >> +    uint32_t vm = vext_vm(desc);                                        \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
> >> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> >> +}
> >> +
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> >> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> >> +
> >> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> >> +{                                                                       \
> >> +    uint32_t vm = vext_vm(desc);                                        \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
> >> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> >> +}
> >> +
> >> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
> >> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
> >> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
> >> +
> >> +/*
> >> + *** unit-stride: access elements stored contiguously in memory
> >> + */
> >> +
> >> +/* unmasked unit-stride load and store operation */
> >> +static inline void vext_ldst_us(void *vd, target_ulong base,
> >> +        CPURISCVState *env, uint32_t desc,
> >> +        vext_ldst_elem_fn ldst_elem,
> >> +        vext_ld_clear_elem clear_elem,
> >> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> >> +        MMUAccessType access_type)
> >> +{
> >> +    uint32_t i, k;
> >> +    uint32_t nf = vext_nf(desc);
> >> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> >> +
> >> +    if (env->vl == 0) {
> >> +        return;
> >> +    }
> >> +    /* probe every access */
> >> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
> >> +    /* load bytes from guest memory */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        k = 0;
> >> +        while (k < nf) {
> >> +            target_ulong addr = base + (i * nf + k) * msz;
> >> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> >> +            k++;
> >> +        }
> >> +    }
> >> +    /* clear tail elements */
> >> +    if (clear_elem) {
> >> +        for (k = 0; k < nf; k++) {
> >> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +/*
> >> + * A masked unit-stride load or store operation is treated as a special case
> >> + * of the strided operation, with stride = NF * sizeof(MTYPE).
> >> + */
> >> +
> >> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
> >> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
> >> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> >> +}                                                                       \
> >> +                                                                        \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
> >> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
> >> +}
> >> +
> >> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> >> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> >> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> >> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> >> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> >> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> >> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> >> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> >> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> >> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> >> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> >> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> >> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> >> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> >> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> >> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> >> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> >> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> >> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> >> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> >> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> >> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> >> +
> >> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
> >> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
> >> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> >> +}                                                                       \
> >> +                                                                        \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
> >> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
> >> +}
> >> +
> >> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
> >> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
> >> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
> >> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
> >> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
> >> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
> >> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
> >> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
> >> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
> >> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
> >> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
> >> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
> >> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> >> --
> >> 2.23.0
> >>
>



* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-13 22:05         ` Alistair Francis
  0 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 22:05 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V

On Fri, Mar 13, 2020 at 2:32 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2020/3/14 4:38, Alistair Francis wrote:
> > On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >> Vector strided operations access the first memory element at the base address,
> >> and then access subsequent elements at address increments given by the byte
> >> offset contained in the x register specified by rs2.
> >>
> >> Vector unit-stride operations access elements stored contiguously in memory
> >> starting from the base effective address. It can be seen as a special
> >> case of strided operations.
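
For reference, the effective address each element is accessed at (with element
index i and segment field index k as used by the helpers later in this patch;
msz is the memory element size):

    strided:      addr = base + stride * i + k * msz     (stride taken from rs2)
    unit-stride:  addr = base + (i * nf + k) * msz       (i.e. stride == nf * msz)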
> >>
> >> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >> ---
> >>   target/riscv/cpu.h                      |   6 +
> >>   target/riscv/helper.h                   | 105 ++++++
> >>   target/riscv/insn32.decode              |  32 ++
> >>   target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
> >>   target/riscv/translate.c                |   7 +
> >>   target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
> >>   6 files changed, 896 insertions(+)
> >>
> >> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> >> index 505d1a8515..b6ebb9b0eb 100644
> >> --- a/target/riscv/cpu.h
> >> +++ b/target/riscv/cpu.h
> >> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
> >>   typedef RISCVCPU ArchCPU;
> >>   #include "exec/cpu-all.h"
> >>
> >> +/* share data between vector helpers and decode code */
> >> +FIELD(VDATA, MLEN, 0, 8)
> >> +FIELD(VDATA, VM, 8, 1)
> >> +FIELD(VDATA, LMUL, 9, 2)
> >> +FIELD(VDATA, NF, 11, 4)
> >> +
> >>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
> >>   FIELD(TB_FLAGS, LMUL, 3, 2)
> >>   FIELD(TB_FLAGS, SEW, 5, 3)
> >> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >> index 3c28c7e407..87dfa90609 100644
> >> --- a/target/riscv/helper.h
> >> +++ b/target/riscv/helper.h
> >> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
> >>   #endif
> >>   /* Vector functions */
> >>   DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> >> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> > Do you mind explaining why we have *_mask versions? I'm struggling to
> > understand this.
> When an instruction has a mask, it only operates on the active
> elements in the vector.
> Whether an element is active or inactive is predicated by the mask
> register v0.
>
> Without a mask, it operates on every element in the vector body.
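
A minimal sketch of the difference (vext_elem_mask() is the helper from the
quoted hunk; do_elem() is a hypothetical per-element operation used only for
illustration):

    /* unmasked: operate on every element up to vl */
    for (i = 0; i < env->vl; i++) {
        do_elem(vd, i);                      /* hypothetical per-element op */
    }

    /* masked: operate only on elements whose predicate bit in v0 is set */
    for (i = 0; i < env->vl; i++) {
        if (!vext_elem_mask(v0, mlen, i)) {
            continue;                        /* inactive element: left untouched */
        }
        do_elem(vd, i);
    }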

Doesn't the mask always apply though? Why do we need an extra helper?

> >> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
> >> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
> >> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> >> index 53340bdbc4..ef521152c5 100644
> >> --- a/target/riscv/insn32.decode
> >> +++ b/target/riscv/insn32.decode
> >> @@ -25,6 +25,7 @@
> >>   %sh10    20:10
> >>   %csr    20:12
> >>   %rm     12:3
> >> +%nf     29:3                     !function=ex_plus_1
> >>
> >>   # immediates:
> >>   %imm_i    20:s12
> >> @@ -43,6 +44,8 @@
> >>   &u    imm rd
> >>   &shift     shamt rs1 rd
> >>   &atomic    aq rl rs2 rs1 rd
> >> +&r2nfvm    vm rd rs1 nf
> >> +&rnfvm     vm rd rs1 rs2 nf
> >>
> >>   # Formats 32:
> >>   @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
> >> @@ -62,6 +65,8 @@
> >>   @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
> >>   @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
> >>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
> >> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
> >> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
> >>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
> >>
> >>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> >> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
> >>   fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
> >>
> >>   # *** RV32V Extension ***
> >> +
> >> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
> >> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> >> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> >> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> >> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
> >> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
> >> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
> >> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> >> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
> >> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
> >> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
> >> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
> >> +
> >> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
> >> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
> >> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
> >> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
> >> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
> >> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
> >> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
> >> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
> >> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
> >> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
> >> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
> >> +
> >> +# *** new major opcode OP-V ***
> >>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
> >>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> >> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> >> index da82c72bbf..d85f2aec68 100644
> >> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> >> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> >> @@ -15,6 +15,8 @@
> >>    * You should have received a copy of the GNU General Public License along with
> >>    * this program.  If not, see <http://www.gnu.org/licenses/>.
> >>    */
> >> +#include "tcg/tcg-op-gvec.h"
> >> +#include "tcg/tcg-gvec-desc.h"
> >>
> >>   static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> >>   {
> >> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
> >>       tcg_temp_free(dst);
> >>       return true;
> >>   }
> >> +
> >> +/* vector register offset from env */
> >> +static uint32_t vreg_ofs(DisasContext *s, int reg)
> >> +{
> >> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
> >> +}
> >> +
> >> +/* check functions */
> >> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
> >> +{
> >> +    return !s->vill && ((s->misa & isa) == isa);
> >> +}
> > I don't think we need a new function to check ISA.
> I don't think so.
>
> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not suitable
> in this file, because this code runs at translation time and uses
> DisasContext rather than CPURISCVState.

Ah good point. This is fine then.

>
> VILL and the ISA bit are checked in every vector instruction, so I just put
> both checks in one function.
> >
> >> +
> >> +/*
> >> + * There are two rules check here.
> >> + *
> >> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
> >> + *
> >> + * 2. For all widening instructions, the destination LMUL value must also be
> >> + *    a supported LMUL value. (Section 11.2)
> >> + */
> >> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
> >> +{
> >> +    /*
> >> +     * The destination vector register group results are arranged as if both
> >> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
> >> +     */
> >> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
> >> +
> >> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
> > Where does this 3 come from?
> LMUL is a 2-bit field in VTYPE, so the largest encoding is 0x3.
> An encoding of 0x3 means a group of 8 vector registers is used as an
> operand.
>
> For a widening operation, an LMUL of 0x3 is illegal, because
>
>      "The destination vector register group results are arranged as if both
>       SEW and LMUL were at twice their current settings. (Section 11.2)."
>
> If LMUL is 0x3, the source vector register group is 8 vector registers, so
> the destination vector register group would be 16 vector registers,
> which is illegal.
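
A stand-alone restatement of this rule with a few worked values (illustrative
only; the QEMU helper takes a DisasContext instead of explicit arguments):

    #include <stdbool.h>
    #include <stdint.h>

    /* lmul is the 2-bit vtype encoding: 0..3 meaning LMUL = 1, 2, 4, 8 */
    static bool check_reg(uint32_t reg, uint32_t lmul, bool widen)
    {
        uint32_t legal = widen ? 2u << lmul : 1u << lmul;
        return !((lmul == 0x3 && widen) || (reg % legal));
    }

    /*
     * check_reg(8, 2, true)  -> true   (LMUL=4 widening: 8-register group, v8 aligned)
     * check_reg(4, 2, true)  -> false  (v4 is not a multiple of 8)
     * check_reg(0, 3, true)  -> false  (LMUL=8 widening would need 16 registers)
     */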

Ah ok.

> >
> >> +}
> >> +
> >> +/*
> >> + * There are two rules check here.
> >> + *
> >> + * 1. The destination vector register group for a masked vector instruction can
> >> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
> >> + *
> >> + * 2. In widening instructions and some other instructions, like vslideup.vx,
> >> + *    there is no need to check whether LMUL=1.
> >> + */
> >> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
> >> +    bool force)
> >> +{
> >> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
> >> +}
> >> +
> >> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
> >> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
> >> +{
> >> +    return (1 << s->lmul) * nf <= 8;
> >> +}
> >> +
> >> +/* common translation macro */
> >> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
> >> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
> >> +{                                                          \
> >> +    if (CHECK(s, a)) {                                     \
> >> +        return OP(s, a, SEQ);                              \
> >> +    }                                                      \
> >> +    return false;                                          \
> >> +}
> >> +
> >> +/*
> >> + *** unit stride load and store
> >> + */
> >> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
> >> +        TCGv_env, TCGv_i32);
> >> +
> >> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
> >> +        gen_helper_ldst_us *fn, DisasContext *s)
> >> +{
> >> +    TCGv_ptr dest, mask;
> >> +    TCGv base;
> >> +    TCGv_i32 desc;
> >> +
> >> +    dest = tcg_temp_new_ptr();
> >> +    mask = tcg_temp_new_ptr();
> >> +    base = tcg_temp_new();
> >> +
> >> +    /*
> >> +     * simd_desc supports at most 256 bytes, while in this implementation the
> >> +     * max vector group length is 2048 bytes, so split it into two parts.
> >> +     *
> >> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
> >> +     * The second part is lmul, encoded in data of simd_desc.
> >> +     */
> >> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> >> +
> >> +    gen_get_gpr(base, rs1);
> >> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> >> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> >> +
> >> +    fn(dest, mask, base, cpu_env, desc);
> >> +
> >> +    tcg_temp_free_ptr(dest);
> >> +    tcg_temp_free_ptr(mask);
> >> +    tcg_temp_free(base);
> >> +    tcg_temp_free_i32(desc);
> >> +    return true;
> >> +}
> >> +
> >> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_us *fn;
> >> +    static gen_helper_ldst_us * const fns[2][7][4] = {
> >> +        /* masked unit stride load */
> >> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
> >> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
> >> +          { NULL,                     gen_helper_vlh_v_h_mask,
> >> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
> >> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
> >> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
> >> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
> >> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
> >> +          { NULL,                     gen_helper_vlhu_v_h_mask,
> >> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
> >> +        /* unmasked unit stride load */
> >> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
> >> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
> >> +          { NULL,                gen_helper_vlh_v_h,
> >> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
> >> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
> >> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
> >> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
> >> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
> >> +          { NULL,                gen_helper_vlhu_v_h,
> >> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
> >> +    };
> >> +
> >> +    fn =  fns[a->vm][seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> >> +}
> >> +
> >> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
> >> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
> >> +
> >> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_us *fn;
> >> +    static gen_helper_ldst_us * const fns[2][4][4] = {
> >> +        /* masked unit stride load and store */
> >> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
> >> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
> >> +          { NULL,                     gen_helper_vsh_v_h_mask,
> >> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
> >> +          { NULL,                     NULL,
> >> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
> >> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
> >> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
> >> +        /* unmasked unit stride store */
> >> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
> >> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
> >> +          { NULL,                gen_helper_vsh_v_h,
> >> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
> >> +          { NULL,                NULL,
> >> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
> >> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
> >> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
> >> +    };
> >> +
> >> +    fn =  fns[a->vm][seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
> >> +}
> >> +
> >> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
> >> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
> >> +
> >> +/*
> >> + *** stride load and store
> >> + */
> >> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
> >> +        TCGv, TCGv_env, TCGv_i32);
> >> +
> >> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
> >> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
> >> +{
> >> +    TCGv_ptr dest, mask;
> >> +    TCGv base, stride;
> >> +    TCGv_i32 desc;
> >> +
> >> +    dest = tcg_temp_new_ptr();
> >> +    mask = tcg_temp_new_ptr();
> >> +    base = tcg_temp_new();
> >> +    stride = tcg_temp_new();
> >> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> >> +
> >> +    gen_get_gpr(base, rs1);
> >> +    gen_get_gpr(stride, rs2);
> >> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> >> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> >> +
> >> +    fn(dest, mask, base, stride, cpu_env, desc);
> >> +
> >> +    tcg_temp_free_ptr(dest);
> >> +    tcg_temp_free_ptr(mask);
> >> +    tcg_temp_free(base);
> >> +    tcg_temp_free(stride);
> >> +    tcg_temp_free_i32(desc);
> >> +    return true;
> >> +}
> >> +
> >> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_stride *fn;
> >> +    static gen_helper_ldst_stride * const fns[7][4] = {
> >> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
> >> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
> >> +        { NULL,                 gen_helper_vlsh_v_h,
> >> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
> >> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
> >> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
> >> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
> >> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
> >> +        { NULL,                 gen_helper_vlshu_v_h,
> >> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
> >> +    };
> >> +
> >> +    fn =  fns[seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> >> +}
> >> +
> >> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
> >> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
> >> +
> >> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
> >> +{
> >> +    uint32_t data = 0;
> >> +    gen_helper_ldst_stride *fn;
> >> +    static gen_helper_ldst_stride * const fns[4][4] = {
> >> +        /* masked stride store */
> >> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
> >> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
> >> +        { NULL,                 gen_helper_vssh_v_h,
> >> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
> >> +        { NULL,                 NULL,
> >> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
> >> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
> >> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
> >> +    };
> >> +
> >> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> >> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> >> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> >> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> >> +    fn =  fns[seq][s->sew];
> >> +    if (fn == NULL) {
> >> +        return false;
> >> +    }
> >> +
> >> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> >> +}
> >> +
> >> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
> >> +{
> >> +    return (vext_check_isa_ill(s, RVV) &&
> >> +            vext_check_reg(s, a->rd, false) &&
> >> +            vext_check_nf(s, a->nf));
> >> +}
> >> +
> >> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
> >> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
> > Looks good
> >
> >> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> >> index af07ac4160..852545b77e 100644
> >> --- a/target/riscv/translate.c
> >> +++ b/target/riscv/translate.c
> >> @@ -61,6 +61,7 @@ typedef struct DisasContext {
> >>       uint8_t lmul;
> >>       uint8_t sew;
> >>       uint16_t vlen;
> >> +    uint16_t mlen;
> >>       bool vl_eq_vlmax;
> >>   } DisasContext;
> >>
> >> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
> >>       }
> >>   }
> >>
> >> +static int ex_plus_1(DisasContext *ctx, int nf)
> >> +{
> >> +    return nf + 1;
> >> +}
> >> +
> >>   #define EX_SH(amount) \
> >>       static int ex_shift_##amount(DisasContext *ctx, int imm) \
> >>       {                                         \
> >> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
> >>       ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
> >>       ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
> >>       ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
> >> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
> >>       ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
> >>   }
> >>
> >> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> >> index 2afe716f2a..ebfabd2946 100644
> >> --- a/target/riscv/vector_helper.c
> >> +++ b/target/riscv/vector_helper.c
> >> @@ -18,8 +18,10 @@
> >>
> >>   #include "qemu/osdep.h"
> >>   #include "cpu.h"
> >> +#include "exec/memop.h"
> >>   #include "exec/exec-all.h"
> >>   #include "exec/helper-proto.h"
> >> +#include "tcg/tcg-gvec-desc.h"
> >>   #include <math.h>
> >>
> >>   target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >>       env->vstart = 0;
> >>       return vl;
> >>   }
> >> +
> >> +/*
> >> + * Note that vector data is stored in host-endian 64-bit chunks,
> >> + * so addressing units smaller than that needs a host-endian fixup.
> >> + */
> >> +#ifdef HOST_WORDS_BIGENDIAN
> >> +#define H1(x)   ((x) ^ 7)
> >> +#define H1_2(x) ((x) ^ 6)
> >> +#define H1_4(x) ((x) ^ 4)
> >> +#define H2(x)   ((x) ^ 3)
> >> +#define H4(x)   ((x) ^ 1)
> >> +#define H8(x)   ((x))
> >> +#else
> >> +#define H1(x)   (x)
> >> +#define H1_2(x) (x)
> >> +#define H1_4(x) (x)
> >> +#define H2(x)   (x)
> >> +#define H4(x)   (x)
> >> +#define H8(x)   (x)
> >> +#endif
> > Looks good. Overall this looks good. Do you mind splitting this patch
> > up a little bit more? It's difficult to review such a long and complex
> > patch.
> >
> > Alistair
> As unit stride can be seen as a special case of the stride mode, I just put
> them together.
> I will split the stride and unit-stride modes in the next patch set.

Thank you.

>
> Even so, I think it will still be somewhat long and complex: a lot of corner
> cases must be considered for vector load and store, and a lot of common code
> will be defined here.

That's fine

Alistair

>
> Zhiwei
> >> +
> >> +static inline uint32_t vext_nf(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
> >> +}
> >> +
> >> +static inline uint32_t vext_mlen(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
> >> +}
> >> +
> >> +static inline uint32_t vext_vm(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
> >> +}
> >> +
> >> +static inline uint32_t vext_lmul(uint32_t desc)
> >> +{
> >> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
> >> +}
> >> +
> >> +/*
> >> + * Get vector group length in bytes. Its range is [64, 2048].
> >> + *
> >> + * As simd_desc supports at most 256, the max vlen is 512 bits.
> >> + * So vlen in bytes is encoded as maxsz.
> >> + */
> >> +static inline uint32_t vext_maxsz(uint32_t desc)
> >> +{
> >> +    return simd_maxsz(desc) << vext_lmul(desc);
> >> +}
> >> +
> >> +/*
> >> + * This function checks the watchpoints before the real load operation.
> >> + *
> >> + * In softmmu mode, the TLB API probe_access is enough for the watchpoint check.
> >> + * In user mode, there is no watchpoint support now.
> >> + *
> >> + * It will trigger an exception if there is no mapping in the TLB
> >> + * and the page table walk can't fill the TLB entry. The guest software
> >> + * can then return here after processing the exception, or never return.
> >> + */
> >> +static void probe_pages(CPURISCVState *env, target_ulong addr,
> >> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
> >> +{
> >> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> >> +    target_ulong curlen = MIN(pagelen, len);
> >> +
> >> +    probe_access(env, addr, curlen, access_type,
> >> +            cpu_mmu_index(env, false), ra);
> >> +    if (len > curlen) {
> >> +        addr += curlen;
> >> +        curlen = len - curlen;
> >> +        probe_access(env, addr, curlen, access_type,
> >> +                cpu_mmu_index(env, false), ra);
> >> +    }
> >> +}
> >> +
> >> +#ifdef HOST_WORDS_BIGENDIAN
> >> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    /*
> >> +     * Split the remaining range into two parts.
> >> +     * The first part is in the last uint64_t unit.
> >> +     * The second part starts from the next uint64_t unit.
> >> +     */
> >> +    int part1 = 0, part2 = tot - cnt;
> >> +    if (cnt % 8) {
> >> +        part1 = 8 - (cnt % 8);
> >> +        part2 = tot - cnt - part1;
> >> +        memset((void *)((uintptr_t)tail & ~7ULL), 0, part1);
> >> +        memset((void *)(((uintptr_t)tail + 8) & ~7ULL), 0, part2);
> >> +    } else {
> >> +        memset(tail, 0, part2);
> >> +    }
> >> +}
> >> +#else
> >> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    memset(tail, 0, tot - cnt);
> >> +}
> >> +#endif
> >> +
> >> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int8_t *cur = ((int8_t *)vd + H1(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int16_t *cur = ((int16_t *)vd + H2(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int32_t *cur = ((int32_t *)vd + H4(idx));
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
> >> +{
> >> +    int64_t *cur = (int64_t *)vd + idx;
> >> +    vext_clear(cur, cnt, tot);
> >> +}
> >> +
> >> +
> >> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> >> +{
> >> +    int idx = (index * mlen) / 64;
> >> +    int pos = (index * mlen) % 64;
> >> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
> >> +}
> >> +
> >> +/* elements operations for load and store */
> >> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
> >> +        uint32_t idx, void *vd, uintptr_t retaddr);
> >> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
> >> +        uint32_t cnt, uint32_t tot);
> >> +
> >> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
> >> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
> >> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
> >> +{                                                          \
> >> +    MTYPE data;                                            \
> >> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
> >> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
> >> +    *cur = data;                                           \
> >> +}                                                          \
> >> +
> >> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
> >> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
> >> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
> >> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
> >> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
> >> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
> >> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
> >> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
> >> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
> >> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
> >> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
> >> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
> >> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
> >> +
> >> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
> >> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
> >> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
> >> +{                                                        \
> >> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
> >> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
> >> +}
> >> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
> >> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
> >> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
> >> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
> >> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
> >> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
> >> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
> >> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
> >> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
> >> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
> >> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
> >> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
> >> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
> >> +
> >> +/*
> >> + *** stride: access vector element from strided memory
> >> + */
> >> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
> >> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
> >> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
> >> +{
> >> +    uint32_t i, k;
> >> +    uint32_t nf = vext_nf(desc);
> >> +    uint32_t mlen = vext_mlen(desc);
> >> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> >> +
> >> +    if (env->vl == 0) {
> >> +        return;
> >> +    }
> >> +    /* probe every access */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> >> +            continue;
> >> +        }
> >> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
> >> +    }
> >> +    /* do real access */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        k = 0;
> >> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> >> +            continue;
> >> +        }
> >> +        while (k < nf) {
> >> +            target_ulong addr = base + stride * i + k * msz;
> >> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> >> +            k++;
> >> +        }
> >> +    }
> >> +    /* clear tail elements */
> >> +    if (clear_elem) {
> >> +        for (k = 0; k < nf; k++) {
> >> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
> >> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> >> +{                                                                       \
> >> +    uint32_t vm = vext_vm(desc);                                        \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
> >> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> >> +}
> >> +
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> >> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> >> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> >> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> >> +
> >> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
> >> +{                                                                       \
> >> +    uint32_t vm = vext_vm(desc);                                        \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
> >> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> >> +}
> >> +
> >> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
> >> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
> >> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
> >> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
> >> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
> >> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
> >> +
> >> +/*
> >> + *** unit-stride: access elements stored contiguously in memory
> >> + */
> >> +
> >> +/* unmasked unit-stride load and store operation*/
> >> +static inline void vext_ldst_us(void *vd, target_ulong base,
> >> +        CPURISCVState *env, uint32_t desc,
> >> +        vext_ldst_elem_fn ldst_elem,
> >> +        vext_ld_clear_elem clear_elem,
> >> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> >> +        MMUAccessType access_type)
> >> +{
> >> +    uint32_t i, k;
> >> +    uint32_t nf = vext_nf(desc);
> >> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> >> +
> >> +    if (env->vl == 0) {
> >> +        return;
> >> +    }
> >> +    /* probe every access */
> >> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
> >> +    /* load bytes from guest memory */
> >> +    for (i = 0; i < env->vl; i++) {
> >> +        k = 0;
> >> +        while (k < nf) {
> >> +            target_ulong addr = base + (i * nf + k) * msz;
> >> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> >> +            k++;
> >> +        }
> >> +    }
> >> +    /* clear tail elements */
> >> +    if (clear_elem) {
> >> +        for (k = 0; k < nf; k++) {
> >> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +/*
> >> + * masked unit-stride load and store operation will be a special case of stride,
> >> + * stride = NF * sizeof (MTYPE)
> >> + */
> >> +
> >> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
> >> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
> >> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
> >> +}                                                                       \
> >> +                                                                        \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
> >> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
> >> +}
> >> +
> >> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
> >> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
> >> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
> >> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
> >> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
> >> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
> >> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
> >> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
> >> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
> >> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
> >> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
> >> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
> >> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
> >> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
> >> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
> >> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
> >> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
> >> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
> >> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
> >> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
> >> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
> >> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
> >> +
> >> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
> >> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
> >> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
> >> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
> >> +}                                                                       \
> >> +                                                                        \
> >> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
> >> +        CPURISCVState *env, uint32_t desc)                              \
> >> +{                                                                       \
> >> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
> >> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
> >> +}
> >> +
> >> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
> >> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
> >> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
> >> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
> >> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
> >> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
> >> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
> >> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
> >> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
> >> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
> >> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
> >> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
> >> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
> >> --
> >> 2.23.0
> >>
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-13 22:05         ` Alistair Francis
@ 2020-03-13 22:17           ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 22:17 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 6:05, Alistair Francis wrote:
> On Fri, Mar 13, 2020 at 2:32 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>>
>> On 2020/3/14 4:38, Alistair Francis wrote:
>>> On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>>> Vector strided operations access the first memory element at the base address,
>>>> and then access subsequent elements at address increments given by the byte
>>>> offset contained in the x register specified by rs2.
>>>>
>>>> Vector unit-stride operations access elements stored contiguously in memory
>>>> starting from the base effective address. It can be seen as a special
>>>> case of strided operations.
>>>>
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>> ---
>>>>    target/riscv/cpu.h                      |   6 +
>>>>    target/riscv/helper.h                   | 105 ++++++
>>>>    target/riscv/insn32.decode              |  32 ++
>>>>    target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>>>>    target/riscv/translate.c                |   7 +
>>>>    target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>>>>    6 files changed, 896 insertions(+)
>>>>
>>>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>>>> index 505d1a8515..b6ebb9b0eb 100644
>>>> --- a/target/riscv/cpu.h
>>>> +++ b/target/riscv/cpu.h
>>>> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>>>>    typedef RISCVCPU ArchCPU;
>>>>    #include "exec/cpu-all.h"
>>>>
>>>> +/* share data between vector helpers and decode code */
>>>> +FIELD(VDATA, MLEN, 0, 8)
>>>> +FIELD(VDATA, VM, 8, 1)
>>>> +FIELD(VDATA, LMUL, 9, 2)
>>>> +FIELD(VDATA, NF, 11, 4)
>>>> +
>>>>    FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>>>    FIELD(TB_FLAGS, LMUL, 3, 2)
>>>>    FIELD(TB_FLAGS, SEW, 5, 3)
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index 3c28c7e407..87dfa90609 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>>>>    #endif
>>>>    /* Vector functions */
>>>>    DEF_HELPER_3(vsetvl, tl, env, tl, tl)
>>>> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
>>> Do you mind explaining why we have *_mask versions? I'm struggling to
>>> understand this.
>> When an instruction has a mask, it will only operate on the active
>> elements of the vector.
>> Whether an element is active or inactive is predicated by the mask
>> register v0.
>>
>> Without a mask, it will operate on every element of the vector in the body.
> Doesn't the mask always apply though? Why do we need an extra helper?
Yes, the mask is always applied.

As you can see, the extra helper is specific to unit-stride mode; other
instructions do not have extra helpers.

That's because a more efficient implementation is possible for unit-stride
load/store with vm==1 (always unmasked).

It operates on a contiguous memory block, so I can probe the memory accesses
and clear the tail elements more efficiently.
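
To make that concrete, here is a minimal sketch of the probe step on the two
paths (illustration only, adapted from the helpers quoted above; it is not
additional patch code):

  /* strided / masked path: probe each active element separately */
  for (i = 0; i < env->vl; i++) {
      if (!vm && !vext_elem_mask(v0, mlen, i)) {
          continue;                 /* inactive element, nothing to probe */
      }
      probe_pages(env, base + stride * i, nf * msz, ra, access_type);
  }

  /* unmasked unit-stride path: the accessed bytes are contiguous, so a
   * single probe of vl * nf * msz bytes is enough, and the per-element
   * mask check disappears from the access loop as well. */
  probe_pages(env, base, env->vl * nf * msz, ra, access_type);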

Zhiwei

>
>>>> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 53340bdbc4..ef521152c5 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -25,6 +25,7 @@
>>>>    %sh10    20:10
>>>>    %csr    20:12
>>>>    %rm     12:3
>>>> +%nf     29:3                     !function=ex_plus_1
>>>>
>>>>    # immediates:
>>>>    %imm_i    20:s12
>>>> @@ -43,6 +44,8 @@
>>>>    &u    imm rd
>>>>    &shift     shamt rs1 rd
>>>>    &atomic    aq rl rs2 rs1 rd
>>>> +&r2nfvm    vm rd rs1 nf
>>>> +&rnfvm     vm rd rs1 rs2 nf
>>>>
>>>>    # Formats 32:
>>>>    @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
>>>> @@ -62,6 +65,8 @@
>>>>    @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>>>    @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>>>    @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>>>> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>>>> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>>>>    @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>>>
>>>>    @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>>>> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>>>    fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>>>>
>>>>    # *** RV32V Extension ***
>>>> +
>>>> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
>>>> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>>> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>>> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>>> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>>>> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>>> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>>> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>>> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>>>> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>>>> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>>>> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
>>>> +
>>>> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
>>>> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
>>>> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
>>>> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
>>>> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
>>>> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
>>>> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
>>>> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
>>>> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>>>> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>>>> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>>>> +
>>>> +# *** new major opcode OP-V ***
>>>>    vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>>>    vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>>>> index da82c72bbf..d85f2aec68 100644
>>>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>>>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>>>> @@ -15,6 +15,8 @@
>>>>     * You should have received a copy of the GNU General Public License along with
>>>>     * this program.  If not, see <http://www.gnu.org/licenses/>.
>>>>     */
>>>> +#include "tcg/tcg-op-gvec.h"
>>>> +#include "tcg/tcg-gvec-desc.h"
>>>>
>>>>    static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>>>>    {
>>>> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>>>>        tcg_temp_free(dst);
>>>>        return true;
>>>>    }
>>>> +
>>>> +/* vector register offset from env */
>>>> +static uint32_t vreg_ofs(DisasContext *s, int reg)
>>>> +{
>>>> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
>>>> +}
>>>> +
>>>> +/* check functions */
>>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>>> +{
>>>> +    return !s->vill && ((s->misa & isa) == isa);
>>>> +}
>>> I don't think we need a new function to check ISA.
>> I don't think so.
>>
>> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not suitable
>> in this file, as it runs at translation time and DisasContext is usually
>> used here instead of CPURISCVState.
> Ah good point. This is fine then.
>
>> VILL and the ISA bits will be checked in every vector instruction, so I
>> just put them in one function.
>>>> +
>>>> +/*
>>>> + * There are two rules check here.
>>>> + *
>>>> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
>>>> + *
>>>> + * 2. For all widening instructions, the destination LMUL value must also be
>>>> + *    a supported LMUL value. (Section 11.2)
>>>> + */
>>>> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
>>>> +{
>>>> +    /*
>>>> +     * The destination vector register group results are arranged as if both
>>>> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
>>>> +     */
>>>> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
>>>> +
>>>> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
>>> Where does this 3 come from?
>> LMUL is a 2-bit field in VTYPE, so the largest LMUL encoding is 0x3.
>> An encoding of 0x3 means eight vector registers are grouped together as
>> the operands.
>>
>> For a widening operation, LMUL equal to 0x3 is illegal, because
>>
>>       "The destination vector register group results are arranged as if both
>>        SEW and LMUL were at twice their current settings. (Section 11.2)."
>>
>> If LMUL is 0x3, the source vector register group is 8 vector registers, and
>> the destination vector register group would have to be 16 vector registers,
>> which is illegal.
> Ah ok.
>
>>>> +}
>>>> +
>>>> +/*
>>>> + * There are two rules check here.
>>>> + *
>>>> + * 1. The destination vector register group for a masked vector instruction can
>>>> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
>>>> + *
>>>> + * 2. In widening instructions and some other instructions, like vslideup.vx,
>>>> + *    there is no need to check whether LMUL=1.
>>>> + */
>>>> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
>>>> +    bool force)
>>>> +{
>>>> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
>>>> +}
>>>> +
>>>> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
>>>> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
>>>> +{
>>>> +    return (1 << s->lmul) * nf <= 8;
>>>> +}
>>>> +
>>>> +/* common translation macro */
>>>> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
>>>> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
>>>> +{                                                          \
>>>> +    if (CHECK(s, a)) {                                     \
>>>> +        return OP(s, a, SEQ);                              \
>>>> +    }                                                      \
>>>> +    return false;                                          \
>>>> +}
>>>> +
>>>> +/*
>>>> + *** unit stride load and store
>>>> + */
>>>> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
>>>> +        TCGv_env, TCGv_i32);
>>>> +
>>>> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
>>>> +        gen_helper_ldst_us *fn, DisasContext *s)
>>>> +{
>>>> +    TCGv_ptr dest, mask;
>>>> +    TCGv base;
>>>> +    TCGv_i32 desc;
>>>> +
>>>> +    dest = tcg_temp_new_ptr();
>>>> +    mask = tcg_temp_new_ptr();
>>>> +    base = tcg_temp_new();
>>>> +
>>>> +    /*
>>>> +     * As simd_desc supports at most 256 bytes, and in this implementation,
>>>> +     * the max vector group length is 2048 bytes. So split it into two parts.
>>>> +     *
>>>> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
>>>> +     * The second part is lmul, encoded in data of simd_desc.
>>>> +     */
>>>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>>>> +
>>>> +    gen_get_gpr(base, rs1);
>>>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>>>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>>>> +
>>>> +    fn(dest, mask, base, cpu_env, desc);
>>>> +
>>>> +    tcg_temp_free_ptr(dest);
>>>> +    tcg_temp_free_ptr(mask);
>>>> +    tcg_temp_free(base);
>>>> +    tcg_temp_free_i32(desc);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_us *fn;
>>>> +    static gen_helper_ldst_us * const fns[2][7][4] = {
>>>> +        /* masked unit stride load */
>>>> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
>>>> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
>>>> +          { NULL,                     gen_helper_vlh_v_h_mask,
>>>> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
>>>> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
>>>> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
>>>> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
>>>> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
>>>> +          { NULL,                     gen_helper_vlhu_v_h_mask,
>>>> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
>>>> +        /* unmasked unit stride load */
>>>> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
>>>> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
>>>> +          { NULL,                gen_helper_vlh_v_h,
>>>> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
>>>> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
>>>> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
>>>> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
>>>> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
>>>> +          { NULL,                gen_helper_vlhu_v_h,
>>>> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
>>>> +    };
>>>> +
>>>> +    fn =  fns[a->vm][seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
>>>> +
>>>> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_us *fn;
>>>> +    static gen_helper_ldst_us * const fns[2][4][4] = {
>>>> +        /* masked unit stride load and store */
>>>> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
>>>> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
>>>> +          { NULL,                     gen_helper_vsh_v_h_mask,
>>>> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
>>>> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
>>>> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
>>>> +        /* unmasked unit stride store */
>>>> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
>>>> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
>>>> +          { NULL,                gen_helper_vsh_v_h,
>>>> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
>>>> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
>>>> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
>>>> +    };
>>>> +
>>>> +    fn =  fns[a->vm][seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
>>>> +
>>>> +/*
>>>> + *** stride load and store
>>>> + */
>>>> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
>>>> +        TCGv, TCGv_env, TCGv_i32);
>>>> +
>>>> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
>>>> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
>>>> +{
>>>> +    TCGv_ptr dest, mask;
>>>> +    TCGv base, stride;
>>>> +    TCGv_i32 desc;
>>>> +
>>>> +    dest = tcg_temp_new_ptr();
>>>> +    mask = tcg_temp_new_ptr();
>>>> +    base = tcg_temp_new();
>>>> +    stride = tcg_temp_new();
>>>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>>>> +
>>>> +    gen_get_gpr(base, rs1);
>>>> +    gen_get_gpr(stride, rs2);
>>>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>>>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>>>> +
>>>> +    fn(dest, mask, base, stride, cpu_env, desc);
>>>> +
>>>> +    tcg_temp_free_ptr(dest);
>>>> +    tcg_temp_free_ptr(mask);
>>>> +    tcg_temp_free(base);
>>>> +    tcg_temp_free(stride);
>>>> +    tcg_temp_free_i32(desc);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_stride *fn;
>>>> +    static gen_helper_ldst_stride * const fns[7][4] = {
>>>> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
>>>> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
>>>> +        { NULL,                 gen_helper_vlsh_v_h,
>>>> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
>>>> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
>>>> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
>>>> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
>>>> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
>>>> +        { NULL,                 gen_helper_vlshu_v_h,
>>>> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
>>>> +    };
>>>> +
>>>> +    fn =  fns[seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
>>>> +
>>>> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_stride *fn;
>>>> +    static gen_helper_ldst_stride * const fns[4][4] = {
>>>> +        /* masked stride store */
>>>> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
>>>> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
>>>> +        { NULL,                 gen_helper_vssh_v_h,
>>>> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
>>>> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
>>>> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
>>>> +    };
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    fn =  fns[seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
>>> Looks good
>>>
>>>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>>>> index af07ac4160..852545b77e 100644
>>>> --- a/target/riscv/translate.c
>>>> +++ b/target/riscv/translate.c
>>>> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>>>>        uint8_t lmul;
>>>>        uint8_t sew;
>>>>        uint16_t vlen;
>>>> +    uint16_t mlen;
>>>>        bool vl_eq_vlmax;
>>>>    } DisasContext;
>>>>
>>>> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>>>>        }
>>>>    }
>>>>
>>>> +static int ex_plus_1(DisasContext *ctx, int nf)
>>>> +{
>>>> +    return nf + 1;
>>>> +}
>>>> +
>>>>    #define EX_SH(amount) \
>>>>        static int ex_shift_##amount(DisasContext *ctx, int imm) \
>>>>        {                                         \
>>>> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>>>        ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>>>>        ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>>>>        ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>>>> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>>>>        ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>>>>    }
>>>>
>>>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>>>> index 2afe716f2a..ebfabd2946 100644
>>>> --- a/target/riscv/vector_helper.c
>>>> +++ b/target/riscv/vector_helper.c
>>>> @@ -18,8 +18,10 @@
>>>>
>>>>    #include "qemu/osdep.h"
>>>>    #include "cpu.h"
>>>> +#include "exec/memop.h"
>>>>    #include "exec/exec-all.h"
>>>>    #include "exec/helper-proto.h"
>>>> +#include "tcg/tcg-gvec-desc.h"
>>>>    #include <math.h>
>>>>
>>>>    target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>>> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>>>        env->vstart = 0;
>>>>        return vl;
>>>>    }
>>>> +
>>>> +/*
>>>> + * Note that vector data is stored in host-endian 64-bit chunks,
>>>> + * so addressing units smaller than that needs a host-endian fixup.
>>>> + */
>>>> +#ifdef HOST_WORDS_BIGENDIAN
>>>> +#define H1(x)   ((x) ^ 7)
>>>> +#define H1_2(x) ((x) ^ 6)
>>>> +#define H1_4(x) ((x) ^ 4)
>>>> +#define H2(x)   ((x) ^ 3)
>>>> +#define H4(x)   ((x) ^ 1)
>>>> +#define H8(x)   ((x))
>>>> +#else
>>>> +#define H1(x)   (x)
>>>> +#define H1_2(x) (x)
>>>> +#define H1_4(x) (x)
>>>> +#define H2(x)   (x)
>>>> +#define H4(x)   (x)
>>>> +#define H8(x)   (x)
>>>> +#endif
>>> Looks good. Overall this looks good. Do you mind splitting this patch
>>> up a little bit more? It's difficult to review such a long and complex
>>> patch.
>>>
>>> Alistair
>> As unit-stride can be seen as a special case of stride mode, I just put
>> them together.
>> I will split the stride and unit-stride modes in the next patch set.
> Thank you.
>
>> Even so, I think it will still be somewhat long and complex; a lot of
>> corner cases must be considered for vector load and store, and a lot of
>> common code will be defined here.
> That's fine
>
> Alistair
>
>> Zhiwei
>>>> +
>>>> +static inline uint32_t vext_nf(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_mlen(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_vm(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_lmul(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get vector group length in bytes. Its range is [64, 2048].
>>>> + *
>>>> + * As simd_desc supports at most 256, the max vlen is 512 bits.
>>>> + * So vlen in bytes is encoded as maxsz.
>>>> + */
>>>> +static inline uint32_t vext_maxsz(uint32_t desc)
>>>> +{
>>>> +    return simd_maxsz(desc) << vext_lmul(desc);
>>>> +}
>>>> +
>>>> +/*
>>>> + * This function checks watchpoint before real load operation.
>>>> + *
>>>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>>>> + * In user mode, there is no watchpoint support now.
>>>> + *
>>>> + * It will trigger an exception if there is no mapping in TLB
>>>> + * and page table walk can't fill the TLB entry. Then the guest
>>>> + * software can return here after processing the exception or never return.
>>>> + */
>>>> +static void probe_pages(CPURISCVState *env, target_ulong addr,
>>>> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
>>>> +{
>>>> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>>>> +    target_ulong curlen = MIN(pagelen, len);
>>>> +
>>>> +    probe_access(env, addr, curlen, access_type,
>>>> +            cpu_mmu_index(env, false), ra);
>>>> +    if (len > curlen) {
>>>> +        addr += curlen;
>>>> +        curlen = len - curlen;
>>>> +        probe_access(env, addr, curlen, access_type,
>>>> +                cpu_mmu_index(env, false), ra);
>>>> +    }
>>>> +}
>>>> +
>>>> +#ifdef HOST_WORDS_BIGENDIAN
>>>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    /*
>>>> +     * Split the remaining range to two parts.
>>>> +     * The first part is in the last uint64_t unit.
>>>> +     * The second part start from the next uint64_t unit.
>>>> +     */
>>>> +    int part1 = 0, part2 = tot - cnt;
>>>> +    if (cnt % 8) {
>>>> +        part1 = 8 - (cnt % 8);
>>>> +        part2 = tot - cnt - part1;
>>>> +        memset(tail & ~(7ULL), 0, part1);
>>>> +        memset((tail + 8) & ~(7ULL), 0, part2);
>>>> +    } else {
>>>> +        memset(tail, 0, part2);
>>>> +    }
>>>> +}
>>>> +#else
>>>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    memset(tail, 0, tot - cnt);
>>>> +}
>>>> +#endif
>>>> +
>>>> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int8_t *cur = ((int8_t *)vd + H1(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int16_t *cur = ((int16_t *)vd + H2(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int32_t *cur = ((int32_t *)vd + H4(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int64_t *cur = (int64_t *)vd + idx;
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +
>>>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>>>> +{
>>>> +    int idx = (index * mlen) / 64;
>>>> +    int pos = (index * mlen) % 64;
>>>> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
>>>> +}
>>>> +
>>>> +/* elements operations for load and store */
>>>> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>>>> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
>>>> +        uint32_t cnt, uint32_t tot);
>>>> +
>>>> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
>>>> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
>>>> +{                                                          \
>>>> +    MTYPE data;                                            \
>>>> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
>>>> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
>>>> +    *cur = data;                                           \
>>>> +}                                                          \
>>>> +
>>>> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
>>>> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
>>>> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
>>>> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
>>>> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
>>>> +
>>>> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
>>>> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
>>>> +{                                                        \
>>>> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
>>>> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
>>>> +}
>>>> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
>>>> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
>>>> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
>>>> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
>>>> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
>>>> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
>>>> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
>>>> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
>>>> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
>>>> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
>>>> +
>>>> +/*
>>>> + *** stride: access vector element from strided memory
>>>> + */
>>>> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
>>>> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
>>>> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
>>>> +{
>>>> +    uint32_t i, k;
>>>> +    uint32_t nf = vext_nf(desc);
>>>> +    uint32_t mlen = vext_mlen(desc);
>>>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>>>> +
>>>> +    if (env->vl == 0) {
>>>> +        return;
>>>> +    }
>>>> +    /* probe every access*/
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>>>> +            continue;
>>>> +        }
>>>> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
>>>> +    }
>>>> +    /* do real access */
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        k = 0;
>>>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>>>> +            continue;
>>>> +        }
>>>> +        while (k < nf) {
>>>> +            target_ulong addr = base + stride * i + k * msz;
>>>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>>>> +            k++;
>>>> +        }
>>>> +    }
>>>> +    /* clear tail elements */
>>>> +    if (clear_elem) {
>>>> +        for (k = 0; k < nf; k++) {
>>>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
>>>> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>>>> +{                                                                       \
>>>> +    uint32_t vm = vext_vm(desc);                                        \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
>>>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>>>> +}
>>>> +
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>>>> +
>>>> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>>>> +{                                                                       \
>>>> +    uint32_t vm = vext_vm(desc);                                        \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
>>>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>>>> +}
>>>> +
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
>>>> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
>>>> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
>>>> +
>>>> +/*
>>>> + *** unit-stride: access elements stored contiguously in memory
>>>> + */
>>>> +
>>>> +/* unmasked unit-stride load and store operation*/
>>>> +static inline void vext_ldst_us(void *vd, target_ulong base,
>>>> +        CPURISCVState *env, uint32_t desc,
>>>> +        vext_ldst_elem_fn ldst_elem,
>>>> +        vext_ld_clear_elem clear_elem,
>>>> +        uint32_t esz, uint32_t msz, uintptr_t ra,
>>>> +        MMUAccessType access_type)
>>>> +{
>>>> +    uint32_t i, k;
>>>> +    uint32_t nf = vext_nf(desc);
>>>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>>>> +
>>>> +    if (env->vl == 0) {
>>>> +        return;
>>>> +    }
>>>> +    /* probe every access */
>>>> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
>>>> +    /* load bytes from guest memory */
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        k = 0;
>>>> +        while (k < nf) {
>>>> +            target_ulong addr = base + (i * nf + k) * msz;
>>>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>>>> +            k++;
>>>> +        }
>>>> +    }
>>>> +    /* clear tail elements */
>>>> +    if (clear_elem) {
>>>> +        for (k = 0; k < nf; k++) {
>>>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/*
>>>> + * masked unit-stride load and store operation will be a special case of stride,
>>>> + * stride = NF * sizeof (MTYPE)
>>>> + */
>>>> +
>>>> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
>>>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
>>>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>>>> +}                                                                       \
>>>> +                                                                        \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
>>>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
>>>> +}
>>>> +
>>>> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>>>> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>>>> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>>>> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>>>> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>>>> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>>>> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>>>> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>>>> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>>>> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>>>> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>>>> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>>>> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>>>> +
>>>> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
>>>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
>>>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>>>> +}                                                                       \
>>>> +                                                                        \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
>>>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
>>>> +}
>>>> +
>>>> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
>>>> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
>>>> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
>>>> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
>>>> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
>>>> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
>>>> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
>>>> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
>>>> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
>>>> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>>>> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>>>> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>>>> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
>>>> --
>>>> 2.23.0
>>>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-13 22:17           ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 22:17 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V



On 2020/3/14 6:05, Alistair Francis wrote:
> On Fri, Mar 13, 2020 at 2:32 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>>
>> On 2020/3/14 4:38, Alistair Francis wrote:
>>> On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>>> Vector strided operations access the first memory element at the base address,
>>>> and then access subsequent elements at address increments given by the byte
>>>> offset contained in the x register specified by rs2.
>>>>
>>>> Vector unit-stride operations access elements stored contiguously in memory
>>>> starting from the base effective address. It can be seen as a special
>>>> case of strided operations.
>>>>
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>> ---
>>>>    target/riscv/cpu.h                      |   6 +
>>>>    target/riscv/helper.h                   | 105 ++++++
>>>>    target/riscv/insn32.decode              |  32 ++
>>>>    target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>>>>    target/riscv/translate.c                |   7 +
>>>>    target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>>>>    6 files changed, 896 insertions(+)
>>>>
>>>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>>>> index 505d1a8515..b6ebb9b0eb 100644
>>>> --- a/target/riscv/cpu.h
>>>> +++ b/target/riscv/cpu.h
>>>> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
>>>>    typedef RISCVCPU ArchCPU;
>>>>    #include "exec/cpu-all.h"
>>>>
>>>> +/* share data between vector helpers and decode code */
>>>> +FIELD(VDATA, MLEN, 0, 8)
>>>> +FIELD(VDATA, VM, 8, 1)
>>>> +FIELD(VDATA, LMUL, 9, 2)
>>>> +FIELD(VDATA, NF, 11, 4)
>>>> +
>>>>    FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>>>    FIELD(TB_FLAGS, LMUL, 3, 2)
>>>>    FIELD(TB_FLAGS, SEW, 5, 3)
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index 3c28c7e407..87dfa90609 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>>>>    #endif
>>>>    /* Vector functions */
>>>>    DEF_HELPER_3(vsetvl, tl, env, tl, tl)
>>>> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
>>> Do you mind explaining why we have *_mask versions? I'm struggling to
>>> understand this.
>> When an instruction has a mask, it only operates on the active
>> elements of the vector.
>> Whether an element is active or inactive is predicated by the mask
>> register v0.
>>
>> Without a mask, it operates on every element in the vector body.
> Doesn't the mask always apply though? Why do we need an extra helper?
Yes, the mask is always applied.

As you can see, the extra helper is specific to unit-stride mode; other
instructions do not have extra helpers.

That's because a more efficient implementation is possible for unit-stride
load/store with vm==1 (always unmasked).

It operates on a contiguous memory block, so I can probe the memory access
and clear the tail elements more efficiently (see the sketch below).

Zhiwei
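
To make the contiguity argument concrete, here is a minimal standalone C
sketch (not from the patch or the thread; base, vl, nf, msz and stride are
made-up example values) that prints the guest addresses touched by a strided
segment access versus a unit-stride segment access. The unit-stride pattern
covers one contiguous block of vl * nf * msz bytes, which is why a single
probe_pages() call and a single tail clear suffice when vm == 1:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t base = 0x1000;                /* assumed base address */
    uint32_t vl = 4, nf = 2, msz = 4;      /* assumed vl, fields, element size */
    uint64_t stride = 32;                  /* assumed byte stride */
    uint32_t i, k;

    printf("strided accesses (may touch scattered pages):\n");
    for (i = 0; i < vl; i++) {
        for (k = 0; k < nf; k++) {
            printf("  elem %u field %u -> 0x%" PRIx64 "\n",
                   i, k, base + stride * i + k * msz);
        }
    }

    printf("unit-stride accesses (one block of %u bytes):\n", vl * nf * msz);
    for (i = 0; i < vl; i++) {
        for (k = 0; k < nf; k++) {
            printf("  elem %u field %u -> 0x%" PRIx64 "\n",
                   i, k, base + (uint64_t)(i * nf + k) * msz);
        }
    }
    return 0;
}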

>
>>>> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
>>>> +DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 53340bdbc4..ef521152c5 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -25,6 +25,7 @@
>>>>    %sh10    20:10
>>>>    %csr    20:12
>>>>    %rm     12:3
>>>> +%nf     29:3                     !function=ex_plus_1
>>>>
>>>>    # immediates:
>>>>    %imm_i    20:s12
>>>> @@ -43,6 +44,8 @@
>>>>    &u    imm rd
>>>>    &shift     shamt rs1 rd
>>>>    &atomic    aq rl rs2 rs1 rd
>>>> +&r2nfvm    vm rd rs1 nf
>>>> +&rnfvm     vm rd rs1 rs2 nf
>>>>
>>>>    # Formats 32:
>>>>    @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
>>>> @@ -62,6 +65,8 @@
>>>>    @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>>>    @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>>>    @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>>>> +@r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>>>> +@r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>>>>    @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>>>
>>>>    @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>>>> @@ -210,5 +215,32 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>>>    fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>>>>
>>>>    # *** RV32V Extension ***
>>>> +
>>>> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
>>>> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>>> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>>> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>>> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>>>> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>>> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>>> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>>> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>>>> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>>>> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>>>> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
>>>> +
>>>> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
>>>> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
>>>> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
>>>> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
>>>> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
>>>> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
>>>> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
>>>> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
>>>> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>>>> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>>>> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>>>> +
>>>> +# *** new major opcode OP-V ***
>>>>    vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>>>    vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>>>> index da82c72bbf..d85f2aec68 100644
>>>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>>>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>>>> @@ -15,6 +15,8 @@
>>>>     * You should have received a copy of the GNU General Public License along with
>>>>     * this program.  If not, see <http://www.gnu.org/licenses/>.
>>>>     */
>>>> +#include "tcg/tcg-op-gvec.h"
>>>> +#include "tcg/tcg-gvec-desc.h"
>>>>
>>>>    static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
>>>>    {
>>>> @@ -67,3 +69,341 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
>>>>        tcg_temp_free(dst);
>>>>        return true;
>>>>    }
>>>> +
>>>> +/* vector register offset from env */
>>>> +static uint32_t vreg_ofs(DisasContext *s, int reg)
>>>> +{
>>>> +    return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
>>>> +}
>>>> +
>>>> +/* check functions */
>>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>>> +{
>>>> +    return !s->vill && ((s->misa & isa) == isa);
>>>> +}
>>> I don't think we need a new function to check ISA.
>> I don't think we can avoid it.
>>
>> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not suitable
>> in this file, because we are at translation time and DisasContext is
>> usually used here instead of CPURISCVState.
> Ah good point. This is fine then.
>
>> VILL and the ISA bit will be checked in every vector instruction, so I just
>> put them in one function.
>>>> +
>>>> +/*
>>>> + * There are two rules checked here.
>>>> + *
>>>> + * 1. Vector register numbers are multiples of LMUL. (Section 3.2)
>>>> + *
>>>> + * 2. For all widening instructions, the destination LMUL value must also be
>>>> + *    a supported LMUL value. (Section 11.2)
>>>> + */
>>>> +static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
>>>> +{
>>>> +    /*
>>>> +     * The destination vector register group results are arranged as if both
>>>> +     * SEW and LMUL were at twice their current settings. (Section 11.2).
>>>> +     */
>>>> +    int legal = widen ? 2 << s->lmul : 1 << s->lmul;
>>>> +
>>>> +    return !((s->lmul == 0x3 && widen) || (reg % legal));
>>> Where does this 3 come from?
>> LMUL is a 2-bit field in VTYPE, so the biggest encoded LMUL is 0x3.
>> An LMUL of 0x3 means a group of 8 vector registers is used for the
>> operands.
>>
>> For a widening operation, LMUL equal to 0x3 is illegal, because
>>
>>       "The destination vector register group results are arranged as if both
>>        SEW and LMUL were at twice their current settings. (Section 11.2)."
>>
>> If LMUL is 0x3, the source vector register group is 8 vector registers, so
>> the destination would be a group of 16 vector registers, which is illegal.
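
As a small illustration of this rule, the following standalone sketch (an
editorial addition, not from the patch; the register numbers and LMUL values
are arbitrary examples) reimplements the check above and prints the legality
of a few combinations:

#include <stdbool.h>
#include <stdio.h>

/* Same rule as vext_check_reg() above, with lmul encoded as 0..3. */
static bool check_reg(int lmul, unsigned reg, bool widen)
{
    int legal = widen ? 2 << lmul : 1 << lmul;

    return !((lmul == 0x3 && widen) || (reg % legal));
}

int main(void)
{
    printf("lmul=1 v2 normal   -> %d\n", check_reg(1, 2, false)); /* legal */
    printf("lmul=1 v2 widening -> %d\n", check_reg(1, 2, true));  /* illegal: needs 4-reg alignment */
    printf("lmul=1 v4 widening -> %d\n", check_reg(1, 4, true));  /* legal */
    printf("lmul=3 v8 widening -> %d\n", check_reg(3, 8, true));  /* illegal: would need 16 registers */
    return 0;
}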
> Ah ok.
>
>>>> +}
>>>> +
>>>> +/*
>>>> + * There are two rules checked here.
>>>> + *
>>>> + * 1. The destination vector register group for a masked vector instruction can
>>>> + *    only overlap the source mask register (v0) when LMUL=1. (Section 5.3)
>>>> + *
>>>> + * 2. In widening instructions and some other instructions, like vslideup.vx,
>>>> + *    there is no need to check whether LMUL=1.
>>>> + */
>>>> +static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
>>>> +    bool force)
>>>> +{
>>>> +    return (vm != 0 || vd != 0) || (!force && (s->lmul == 0));
>>>> +}
>>>> +
>>>> +/* The LMUL setting must be such that LMUL * NFIELDS <= 8. (Section 7.8) */
>>>> +static bool vext_check_nf(DisasContext *s, uint32_t nf)
>>>> +{
>>>> +    return (1 << s->lmul) * nf <= 8;
>>>> +}
>>>> +
>>>> +/* common translation macro */
>>>> +#define GEN_VEXT_TRANS(NAME, SEQ, ARGTYPE, OP, CHECK)      \
>>>> +static bool trans_##NAME(DisasContext *s, arg_##ARGTYPE *a)\
>>>> +{                                                          \
>>>> +    if (CHECK(s, a)) {                                     \
>>>> +        return OP(s, a, SEQ);                              \
>>>> +    }                                                      \
>>>> +    return false;                                          \
>>>> +}
>>>> +
>>>> +/*
>>>> + *** unit stride load and store
>>>> + */
>>>> +typedef void gen_helper_ldst_us(TCGv_ptr, TCGv_ptr, TCGv,
>>>> +        TCGv_env, TCGv_i32);
>>>> +
>>>> +static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
>>>> +        gen_helper_ldst_us *fn, DisasContext *s)
>>>> +{
>>>> +    TCGv_ptr dest, mask;
>>>> +    TCGv base;
>>>> +    TCGv_i32 desc;
>>>> +
>>>> +    dest = tcg_temp_new_ptr();
>>>> +    mask = tcg_temp_new_ptr();
>>>> +    base = tcg_temp_new();
>>>> +
>>>> +    /*
>>>> +     * As simd_desc supports at most 256 bytes, and in this implementation,
>>>> +     * the max vector group length is 2048 bytes. So split it into two parts.
>>>> +     *
>>>> +     * The first part is vlen in bytes, encoded in maxsz of simd_desc.
>>>> +     * The second part is lmul, encoded in data of simd_desc.
>>>> +     */
>>>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>>>> +
>>>> +    gen_get_gpr(base, rs1);
>>>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>>>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>>>> +
>>>> +    fn(dest, mask, base, cpu_env, desc);
>>>> +
>>>> +    tcg_temp_free_ptr(dest);
>>>> +    tcg_temp_free_ptr(mask);
>>>> +    tcg_temp_free(base);
>>>> +    tcg_temp_free_i32(desc);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_us *fn;
>>>> +    static gen_helper_ldst_us * const fns[2][7][4] = {
>>>> +        /* masked unit stride load */
>>>> +        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
>>>> +            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
>>>> +          { NULL,                     gen_helper_vlh_v_h_mask,
>>>> +            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
>>>> +          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
>>>> +            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
>>>> +          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
>>>> +            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
>>>> +          { NULL,                     gen_helper_vlhu_v_h_mask,
>>>> +            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
>>>> +        /* unmasked unit stride load */
>>>> +        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
>>>> +            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
>>>> +          { NULL,                gen_helper_vlh_v_h,
>>>> +            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
>>>> +          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
>>>> +            gen_helper_vle_v_w,  gen_helper_vle_v_d },
>>>> +          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
>>>> +            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
>>>> +          { NULL,                gen_helper_vlhu_v_h,
>>>> +            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
>>>> +    };
>>>> +
>>>> +    fn =  fns[a->vm][seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool ld_us_check(DisasContext *s, arg_r2nfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vlb_v, 0, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlh_v, 1, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlw_v, 2, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vle_v, 3, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlbu_v, 4, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlhu_v, 5, r2nfvm, ld_us_op, ld_us_check)
>>>> +GEN_VEXT_TRANS(vlwu_v, 6, r2nfvm, ld_us_op, ld_us_check)
>>>> +
>>>> +static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_us *fn;
>>>> +    static gen_helper_ldst_us * const fns[2][4][4] = {
>>>> +        /* masked unit stride load and store */
>>>> +        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
>>>> +            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
>>>> +          { NULL,                     gen_helper_vsh_v_h_mask,
>>>> +            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
>>>> +          { NULL,                     NULL,
>>>> +            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
>>>> +          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
>>>> +            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
>>>> +        /* unmasked unit stride store */
>>>> +        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
>>>> +            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
>>>> +          { NULL,                gen_helper_vsh_v_h,
>>>> +            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
>>>> +          { NULL,                NULL,
>>>> +            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
>>>> +          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
>>>> +            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
>>>> +    };
>>>> +
>>>> +    fn =  fns[a->vm][seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_us_trans(a->rd, a->rs1, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool st_us_check(DisasContext *s, arg_r2nfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vsb_v, 0, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vsh_v, 1, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vsw_v, 2, r2nfvm, st_us_op, st_us_check)
>>>> +GEN_VEXT_TRANS(vse_v, 3, r2nfvm, st_us_op, st_us_check)
>>>> +
>>>> +/*
>>>> + *** stride load and store
>>>> + */
>>>> +typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
>>>> +        TCGv, TCGv_env, TCGv_i32);
>>>> +
>>>> +static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
>>>> +        uint32_t data, gen_helper_ldst_stride *fn, DisasContext *s)
>>>> +{
>>>> +    TCGv_ptr dest, mask;
>>>> +    TCGv base, stride;
>>>> +    TCGv_i32 desc;
>>>> +
>>>> +    dest = tcg_temp_new_ptr();
>>>> +    mask = tcg_temp_new_ptr();
>>>> +    base = tcg_temp_new();
>>>> +    stride = tcg_temp_new();
>>>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>>>> +
>>>> +    gen_get_gpr(base, rs1);
>>>> +    gen_get_gpr(stride, rs2);
>>>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>>>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>>>> +
>>>> +    fn(dest, mask, base, stride, cpu_env, desc);
>>>> +
>>>> +    tcg_temp_free_ptr(dest);
>>>> +    tcg_temp_free_ptr(mask);
>>>> +    tcg_temp_free(base);
>>>> +    tcg_temp_free(stride);
>>>> +    tcg_temp_free_i32(desc);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_stride *fn;
>>>> +    static gen_helper_ldst_stride * const fns[7][4] = {
>>>> +        { gen_helper_vlsb_v_b,  gen_helper_vlsb_v_h,
>>>> +          gen_helper_vlsb_v_w,  gen_helper_vlsb_v_d },
>>>> +        { NULL,                 gen_helper_vlsh_v_h,
>>>> +          gen_helper_vlsh_v_w,  gen_helper_vlsh_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vlsw_v_w,  gen_helper_vlsw_v_d },
>>>> +        { gen_helper_vlse_v_b,  gen_helper_vlse_v_h,
>>>> +          gen_helper_vlse_v_w,  gen_helper_vlse_v_d },
>>>> +        { gen_helper_vlsbu_v_b, gen_helper_vlsbu_v_h,
>>>> +          gen_helper_vlsbu_v_w, gen_helper_vlsbu_v_d },
>>>> +        { NULL,                 gen_helper_vlshu_v_h,
>>>> +          gen_helper_vlshu_v_w, gen_helper_vlshu_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vlswu_v_w, gen_helper_vlswu_v_d },
>>>> +    };
>>>> +
>>>> +    fn =  fns[seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool ld_stride_check(DisasContext *s, arg_rnfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vlsb_v, 0, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsh_v, 1, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsw_v, 2, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlse_v, 3, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlsbu_v, 4, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlshu_v, 5, rnfvm, ld_stride_op, ld_stride_check)
>>>> +GEN_VEXT_TRANS(vlswu_v, 6, rnfvm, ld_stride_op, ld_stride_check)
>>>> +
>>>> +static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
>>>> +{
>>>> +    uint32_t data = 0;
>>>> +    gen_helper_ldst_stride *fn;
>>>> +    static gen_helper_ldst_stride * const fns[4][4] = {
>>>> +        /* masked stride store */
>>>> +        { gen_helper_vssb_v_b,  gen_helper_vssb_v_h,
>>>> +          gen_helper_vssb_v_w,  gen_helper_vssb_v_d },
>>>> +        { NULL,                 gen_helper_vssh_v_h,
>>>> +          gen_helper_vssh_v_w,  gen_helper_vssh_v_d },
>>>> +        { NULL,                 NULL,
>>>> +          gen_helper_vssw_v_w,  gen_helper_vssw_v_d },
>>>> +        { gen_helper_vsse_v_b,  gen_helper_vsse_v_h,
>>>> +          gen_helper_vsse_v_w,  gen_helper_vsse_v_d }
>>>> +    };
>>>> +
>>>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>>>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>>>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>>>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>>>> +    fn =  fns[seq][s->sew];
>>>> +    if (fn == NULL) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>>>> +}
>>>> +
>>>> +static bool st_stride_check(DisasContext *s, arg_rnfvm* a)
>>>> +{
>>>> +    return (vext_check_isa_ill(s, RVV) &&
>>>> +            vext_check_reg(s, a->rd, false) &&
>>>> +            vext_check_nf(s, a->nf));
>>>> +}
>>>> +
>>>> +GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
>>>> +GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
>>> Looks good
>>>
>>>> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>>>> index af07ac4160..852545b77e 100644
>>>> --- a/target/riscv/translate.c
>>>> +++ b/target/riscv/translate.c
>>>> @@ -61,6 +61,7 @@ typedef struct DisasContext {
>>>>        uint8_t lmul;
>>>>        uint8_t sew;
>>>>        uint16_t vlen;
>>>> +    uint16_t mlen;
>>>>        bool vl_eq_vlmax;
>>>>    } DisasContext;
>>>>
>>>> @@ -548,6 +549,11 @@ static void decode_RV32_64C(DisasContext *ctx, uint16_t opcode)
>>>>        }
>>>>    }
>>>>
>>>> +static int ex_plus_1(DisasContext *ctx, int nf)
>>>> +{
>>>> +    return nf + 1;
>>>> +}
>>>> +
>>>>    #define EX_SH(amount) \
>>>>        static int ex_shift_##amount(DisasContext *ctx, int imm) \
>>>>        {                                         \
>>>> @@ -784,6 +790,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>>>>        ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL);
>>>>        ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW);
>>>>        ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>>>> +    ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>>>>        ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
>>>>    }
>>>>
>>>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>>>> index 2afe716f2a..ebfabd2946 100644
>>>> --- a/target/riscv/vector_helper.c
>>>> +++ b/target/riscv/vector_helper.c
>>>> @@ -18,8 +18,10 @@
>>>>
>>>>    #include "qemu/osdep.h"
>>>>    #include "cpu.h"
>>>> +#include "exec/memop.h"
>>>>    #include "exec/exec-all.h"
>>>>    #include "exec/helper-proto.h"
>>>> +#include "tcg/tcg-gvec-desc.h"
>>>>    #include <math.h>
>>>>
>>>>    target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>>> @@ -51,3 +53,407 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>>>>        env->vstart = 0;
>>>>        return vl;
>>>>    }
>>>> +
>>>> +/*
>>>> + * Note that vector data is stored in host-endian 64-bit chunks,
>>>> + * so addressing units smaller than that needs a host-endian fixup.
>>>> + */
>>>> +#ifdef HOST_WORDS_BIGENDIAN
>>>> +#define H1(x)   ((x) ^ 7)
>>>> +#define H1_2(x) ((x) ^ 6)
>>>> +#define H1_4(x) ((x) ^ 4)
>>>> +#define H2(x)   ((x) ^ 3)
>>>> +#define H4(x)   ((x) ^ 1)
>>>> +#define H8(x)   ((x))
>>>> +#else
>>>> +#define H1(x)   (x)
>>>> +#define H1_2(x) (x)
>>>> +#define H1_4(x) (x)
>>>> +#define H2(x)   (x)
>>>> +#define H4(x)   (x)
>>>> +#define H8(x)   (x)
>>>> +#endif
>>> Looks good. Overall this looks good. Do you mind splitting this patch
>>> up a little bit more? It's difficult to review such a long and complex
>>> patch.
>>>
>>> Alistair
>> As unit-stride can be seen as a special case of stride mode, I just put
>> them together.
>> I will split the stride and unit-stride modes in the next patch set.
> Thank you.
>
>> Even though it will still be somewhat long and complex, a lot of corner
>> cases must be considered for vector load and store, and a lot of common
>> code will be defined here.
> That's fine
>
> Alistair
>
>> Zhiwei
>>>> +
>>>> +static inline uint32_t vext_nf(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_mlen(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, MLEN);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_vm(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
>>>> +}
>>>> +
>>>> +static inline uint32_t vext_lmul(uint32_t desc)
>>>> +{
>>>> +    return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get vector group length in bytes. Its range is [64, 2048].
>>>> + *
>>>> + * As simd_desc supports at most 256 bytes, the max vlen is 512 bits.
>>>> + * So vlen in bytes is encoded as maxsz.
>>>> + */
>>>> +static inline uint32_t vext_maxsz(uint32_t desc)
>>>> +{
>>>> +    return simd_maxsz(desc) << vext_lmul(desc);
>>>> +}
>>>> +
>>>> +/*
>>>> + * This function checks watchpoint before real load operation.
>>>> + *
>>>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>>>> + * In user mode, there is no watchpoint support now.
>>>> + *
>>>> + * It will trigger an exception if there is no mapping in TLB
>>>> + * and page table walk can't fill the TLB entry. Then the guest
>>>> + * software can return here after processing the exception, or never return.
>>>> + */
>>>> +static void probe_pages(CPURISCVState *env, target_ulong addr,
>>>> +        target_ulong len, uintptr_t ra, MMUAccessType access_type)
>>>> +{
>>>> +    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>>>> +    target_ulong curlen = MIN(pagelen, len);
>>>> +
>>>> +    probe_access(env, addr, curlen, access_type,
>>>> +            cpu_mmu_index(env, false), ra);
>>>> +    if (len > curlen) {
>>>> +        addr += curlen;
>>>> +        curlen = len - curlen;
>>>> +        probe_access(env, addr, curlen, access_type,
>>>> +                cpu_mmu_index(env, false), ra);
>>>> +    }
>>>> +}
>>>> +
>>>> +#ifdef HOST_WORDS_BIGENDIAN
>>>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    /*
>>>> +     * Split the remaining range to two parts.
>>>> +     * The first part is in the last uint64_t unit.
>>>> +     * The second part start from the next uint64_t unit.
>>>> +     */
>>>> +    int part1 = 0, part2 = tot - cnt;
>>>> +    if (cnt % 8) {
>>>> +        part1 = 8 - (cnt % 8);
>>>> +        part2 = tot - cnt - part1;
>>>> +        memset(tail & ~(7ULL), 0, part1);
>>>> +        memset((tail + 8) & ~(7ULL), 0, part2);
>>>> +    } else {
>>>> +        memset(tail, 0, part2);
>>>> +    }
>>>> +}
>>>> +#else
>>>> +static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    memset(tail, 0, tot - cnt);
>>>> +}
>>>> +#endif
>>>> +
>>>> +static void clearb(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int8_t *cur = ((int8_t *)vd + H1(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearh(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int16_t *cur = ((int16_t *)vd + H2(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int32_t *cur = ((int32_t *)vd + H4(idx));
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +static void clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
>>>> +{
>>>> +    int64_t *cur = (int64_t *)vd + idx;
>>>> +    vext_clear(cur, cnt, tot);
>>>> +}
>>>> +
>>>> +
>>>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>>>> +{
>>>> +    int idx = (index * mlen) / 64;
>>>> +    int pos = (index * mlen) % 64;
>>>> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
>>>> +}
>>>> +
>>>> +/* elements operations for load and store */
>>>> +typedef void (*vext_ldst_elem_fn)(CPURISCVState *env, target_ulong addr,
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>>>> +typedef void (*vext_ld_clear_elem)(void *vd, uint32_t idx,
>>>> +        uint32_t cnt, uint32_t tot);
>>>> +
>>>> +#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)     \
>>>> +static void NAME(CPURISCVState *env, abi_ptr addr,         \
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr)         \
>>>> +{                                                          \
>>>> +    MTYPE data;                                            \
>>>> +    ETYPE *cur = ((ETYPE *)vd + H(idx));                   \
>>>> +    data = cpu_##LDSUF##_data_ra(env, addr, retaddr);      \
>>>> +    *cur = data;                                           \
>>>> +}                                                          \
>>>> +
>>>> +GEN_VEXT_LD_ELEM(ldb_b, int8_t,  int8_t,  H1, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_h, int8_t,  int16_t, H2, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_w, int8_t,  int32_t, H4, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldb_d, int8_t,  int64_t, H8, ldsb)
>>>> +GEN_VEXT_LD_ELEM(ldh_h, int16_t, int16_t, H2, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldh_w, int16_t, int32_t, H4, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldh_d, int16_t, int64_t, H8, ldsw)
>>>> +GEN_VEXT_LD_ELEM(ldw_w, int32_t, int32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(ldw_d, int32_t, int64_t, H8, ldl)
>>>> +GEN_VEXT_LD_ELEM(lde_b, int8_t,  int8_t,  H1, ldsb)
>>>> +GEN_VEXT_LD_ELEM(lde_h, int16_t, int16_t, H2, ldsw)
>>>> +GEN_VEXT_LD_ELEM(lde_w, int32_t, int32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(lde_d, int64_t, int64_t, H8, ldq)
>>>> +GEN_VEXT_LD_ELEM(ldbu_b, uint8_t,  uint8_t,  H1, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_h, uint8_t,  uint16_t, H2, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_w, uint8_t,  uint32_t, H4, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldbu_d, uint8_t,  uint64_t, H8, ldub)
>>>> +GEN_VEXT_LD_ELEM(ldhu_h, uint16_t, uint16_t, H2, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldhu_w, uint16_t, uint32_t, H4, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldhu_d, uint16_t, uint64_t, H8, lduw)
>>>> +GEN_VEXT_LD_ELEM(ldwu_w, uint32_t, uint32_t, H4, ldl)
>>>> +GEN_VEXT_LD_ELEM(ldwu_d, uint32_t, uint64_t, H8, ldl)
>>>> +
>>>> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)          \
>>>> +static void NAME(CPURISCVState *env, abi_ptr addr,       \
>>>> +        uint32_t idx, void *vd, uintptr_t retaddr)       \
>>>> +{                                                        \
>>>> +    ETYPE data = *((ETYPE *)vd + H(idx));                \
>>>> +    cpu_##STSUF##_data_ra(env, addr, data, retaddr);     \
>>>> +}
>>>> +GEN_VEXT_ST_ELEM(stb_b, int8_t,  H1, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_h, int16_t, H2, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_w, int32_t, H4, stb)
>>>> +GEN_VEXT_ST_ELEM(stb_d, int64_t, H8, stb)
>>>> +GEN_VEXT_ST_ELEM(sth_h, int16_t, H2, stw)
>>>> +GEN_VEXT_ST_ELEM(sth_w, int32_t, H4, stw)
>>>> +GEN_VEXT_ST_ELEM(sth_d, int64_t, H8, stw)
>>>> +GEN_VEXT_ST_ELEM(stw_w, int32_t, H4, stl)
>>>> +GEN_VEXT_ST_ELEM(stw_d, int64_t, H8, stl)
>>>> +GEN_VEXT_ST_ELEM(ste_b, int8_t,  H1, stb)
>>>> +GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
>>>> +GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
>>>> +GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
>>>> +
>>>> +/*
>>>> + *** stride: access vector element from strided memory
>>>> + */
>>>> +static void vext_ldst_stride(void *vd, void *v0, target_ulong base,
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc, uint32_t vm,
>>>> +        vext_ldst_elem_fn ldst_elem, vext_ld_clear_elem clear_elem,
>>>> +        uint32_t esz, uint32_t msz, uintptr_t ra, MMUAccessType access_type)
>>>> +{
>>>> +    uint32_t i, k;
>>>> +    uint32_t nf = vext_nf(desc);
>>>> +    uint32_t mlen = vext_mlen(desc);
>>>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>>>> +
>>>> +    if (env->vl == 0) {
>>>> +        return;
>>>> +    }
>>>> +    /* probe every access*/
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>>>> +            continue;
>>>> +        }
>>>> +        probe_pages(env, base + stride * i, nf * msz, ra, access_type);
>>>> +    }
>>>> +    /* do real access */
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        k = 0;
>>>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>>>> +            continue;
>>>> +        }
>>>> +        while (k < nf) {
>>>> +            target_ulong addr = base + stride * i + k * msz;
>>>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>>>> +            k++;
>>>> +        }
>>>> +    }
>>>> +    /* clear tail elements */
>>>> +    if (clear_elem) {
>>>> +        for (k = 0; k < nf; k++) {
>>>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)       \
>>>> +void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>>>> +{                                                                       \
>>>> +    uint32_t vm = vext_vm(desc);                                        \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
>>>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>>>> +}
>>>> +
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlse_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>>>> +GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>>>> +GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>>>> +
>>>> +#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE, STORE_FN)                \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        target_ulong stride, CPURISCVState *env, uint32_t desc)         \
>>>> +{                                                                       \
>>>> +    uint32_t vm = vext_vm(desc);                                        \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN,     \
>>>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>>>> +}
>>>> +
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t,  stb_b)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t, stb_h)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t, stb_w)
>>>> +GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t, stb_d)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t, sth_h)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t, sth_w)
>>>> +GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t, sth_d)
>>>> +GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t, stw_w)
>>>> +GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t, stw_d)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t,  ste_b)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t, ste_h)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t, ste_w)
>>>> +GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t, ste_d)
>>>> +
>>>> +/*
>>>> + *** unit-stride: access elements stored contiguously in memory
>>>> + */
>>>> +
>>>> +/* unmasked unit-stride load and store operation*/
>>>> +static inline void vext_ldst_us(void *vd, target_ulong base,
>>>> +        CPURISCVState *env, uint32_t desc,
>>>> +        vext_ldst_elem_fn ldst_elem,
>>>> +        vext_ld_clear_elem clear_elem,
>>>> +        uint32_t esz, uint32_t msz, uintptr_t ra,
>>>> +        MMUAccessType access_type)
>>>> +{
>>>> +    uint32_t i, k;
>>>> +    uint32_t nf = vext_nf(desc);
>>>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>>>> +
>>>> +    if (env->vl == 0) {
>>>> +        return;
>>>> +    }
>>>> +    /* probe every access */
>>>> +    probe_pages(env, base, env->vl * nf * msz, ra, access_type);
>>>> +    /* load bytes from guest memory */
>>>> +    for (i = 0; i < env->vl; i++) {
>>>> +        k = 0;
>>>> +        while (k < nf) {
>>>> +            target_ulong addr = base + (i * nf + k) * msz;
>>>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>>>> +            k++;
>>>> +        }
>>>> +    }
>>>> +    /* clear tail elements */
>>>> +    if (clear_elem) {
>>>> +        for (k = 0; k < nf; k++) {
>>>> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/*
>>>> + * masked unit-stride load and store operation will be a special case of stride,
>>>> + * stride = NF * sizeof (MTYPE)
>>>> + */
>>>> +
>>>> +#define GEN_VEXT_LD_US(NAME, MTYPE, ETYPE, LOAD_FN, CLEAR_FN)           \
>>>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN,   \
>>>> +        CLEAR_FN, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);\
>>>> +}                                                                       \
>>>> +                                                                        \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    vext_ldst_us(vd, base, env, desc, LOAD_FN, CLEAR_FN,                \
>>>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_LOAD);          \
>>>> +}
>>>> +
>>>> +GEN_VEXT_LD_US(vlb_v_b,  int8_t,   int8_t,   ldb_b,  clearb)
>>>> +GEN_VEXT_LD_US(vlb_v_h,  int8_t,   int16_t,  ldb_h,  clearh)
>>>> +GEN_VEXT_LD_US(vlb_v_w,  int8_t,   int32_t,  ldb_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlb_v_d,  int8_t,   int64_t,  ldb_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlh_v_h,  int16_t,  int16_t,  ldh_h,  clearh)
>>>> +GEN_VEXT_LD_US(vlh_v_w,  int16_t,  int32_t,  ldh_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlh_v_d,  int16_t,  int64_t,  ldh_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlw_v_w,  int32_t,  int32_t,  ldw_w,  clearl)
>>>> +GEN_VEXT_LD_US(vlw_v_d,  int32_t,  int64_t,  ldw_d,  clearq)
>>>> +GEN_VEXT_LD_US(vle_v_b,  int8_t,   int8_t,   lde_b,  clearb)
>>>> +GEN_VEXT_LD_US(vle_v_h,  int16_t,  int16_t,  lde_h,  clearh)
>>>> +GEN_VEXT_LD_US(vle_v_w,  int32_t,  int32_t,  lde_w,  clearl)
>>>> +GEN_VEXT_LD_US(vle_v_d,  int64_t,  int64_t,  lde_d,  clearq)
>>>> +GEN_VEXT_LD_US(vlbu_v_b, uint8_t,  uint8_t,  ldbu_b, clearb)
>>>> +GEN_VEXT_LD_US(vlbu_v_h, uint8_t,  uint16_t, ldbu_h, clearh)
>>>> +GEN_VEXT_LD_US(vlbu_v_w, uint8_t,  uint32_t, ldbu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlbu_v_d, uint8_t,  uint64_t, ldbu_d, clearq)
>>>> +GEN_VEXT_LD_US(vlhu_v_h, uint16_t, uint16_t, ldhu_h, clearh)
>>>> +GEN_VEXT_LD_US(vlhu_v_w, uint16_t, uint32_t, ldhu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlhu_v_d, uint16_t, uint64_t, ldhu_d, clearq)
>>>> +GEN_VEXT_LD_US(vlwu_v_w, uint32_t, uint32_t, ldwu_w, clearl)
>>>> +GEN_VEXT_LD_US(vlwu_v_d, uint32_t, uint64_t, ldwu_d, clearq)
>>>> +
>>>> +#define GEN_VEXT_ST_US(NAME, MTYPE, ETYPE, STORE_FN)                    \
>>>> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,         \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    uint32_t stride = vext_nf(desc) * sizeof(MTYPE);                    \
>>>> +    vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
>>>> +        NULL, sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);   \
>>>> +}                                                                       \
>>>> +                                                                        \
>>>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,                \
>>>> +        CPURISCVState *env, uint32_t desc)                              \
>>>> +{                                                                       \
>>>> +    vext_ldst_us(vd, base, env, desc, STORE_FN, NULL,                   \
>>>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC(), MMU_DATA_STORE);         \
>>>> +}
>>>> +
>>>> +GEN_VEXT_ST_US(vsb_v_b, int8_t,  int8_t , stb_b)
>>>> +GEN_VEXT_ST_US(vsb_v_h, int8_t,  int16_t, stb_h)
>>>> +GEN_VEXT_ST_US(vsb_v_w, int8_t,  int32_t, stb_w)
>>>> +GEN_VEXT_ST_US(vsb_v_d, int8_t,  int64_t, stb_d)
>>>> +GEN_VEXT_ST_US(vsh_v_h, int16_t, int16_t, sth_h)
>>>> +GEN_VEXT_ST_US(vsh_v_w, int16_t, int32_t, sth_w)
>>>> +GEN_VEXT_ST_US(vsh_v_d, int16_t, int64_t, sth_d)
>>>> +GEN_VEXT_ST_US(vsw_v_w, int32_t, int32_t, stw_w)
>>>> +GEN_VEXT_ST_US(vsw_v_d, int32_t, int64_t, stw_d)
>>>> +GEN_VEXT_ST_US(vse_v_b, int8_t,  int8_t , ste_b)
>>>> +GEN_VEXT_ST_US(vse_v_h, int16_t, int16_t, ste_h)
>>>> +GEN_VEXT_ST_US(vse_v_w, int32_t, int32_t, ste_w)
>>>> +GEN_VEXT_ST_US(vse_v_d, int64_t, int64_t, ste_d)
>>>> --
>>>> 2.23.0
>>>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-13 22:24     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 22:24 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:13 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The unit-stride fault-only-first load instructions are used to
> vectorize loops with data-dependent exit conditions (while loops).
> These instructions execute as a regular load except that they
> will only take a trap on element 0.
>
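
As a rough illustration of these semantics, here is a minimal standalone
sketch (an editorial addition, not part of the patch; try_load(), FAULT_AT
and the element values are invented stand-ins for a guest load that may
fault): a fault on element 0 raises the exception, while a fault on a later
element only truncates vl.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FAULT_AT 6   /* assumed: any load at index >= 6 would fault */

static bool try_load(uint32_t idx, uint8_t *out)
{
    if (idx >= FAULT_AT) {
        return false;            /* simulated access fault */
    }
    *out = (uint8_t)idx;         /* pretend this is the loaded byte */
    return true;
}

int main(void)
{
    uint32_t vl = 8;             /* requested vector length */
    uint8_t vd[8] = { 0 };
    uint32_t i;

    for (i = 0; i < vl; i++) {
        if (!try_load(i, &vd[i])) {
            if (i == 0) {
                printf("trap taken on element 0\n");
                return 1;        /* only element 0 raises the exception */
            }
            vl = i;              /* a later fault just shortens vl */
            break;
        }
    }
    printf("completed with vl = %u\n", vl);
    return 0;
}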
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  22 +++++
>  target/riscv/insn32.decode              |   7 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  69 +++++++++++++++
>  target/riscv/vector_helper.c            | 111 ++++++++++++++++++++++++
>  4 files changed, 209 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index f9b3da60ca..72ba4d9bdb 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_5(vlbff_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbff_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhff_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vleff_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vleff_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vleff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vleff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbuff_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbuff_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbuff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbuff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhuff_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index bc36df33b5..b76c09c8c0 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -224,6 +224,13 @@ vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>  vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>  vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>  vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
> +vlbff_v    ... 100 . 10000 ..... 000 ..... 0000111 @r2_nfvm
> +vlhff_v    ... 100 . 10000 ..... 101 ..... 0000111 @r2_nfvm
> +vlwff_v    ... 100 . 10000 ..... 110 ..... 0000111 @r2_nfvm
> +vleff_v    ... 000 . 10000 ..... 111 ..... 0000111 @r2_nfvm
> +vlbuff_v   ... 000 . 10000 ..... 000 ..... 0000111 @r2_nfvm
> +vlhuff_v   ... 000 . 10000 ..... 101 ..... 0000111 @r2_nfvm
> +vlwuff_v   ... 000 . 10000 ..... 110 ..... 0000111 @r2_nfvm
>  vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>  vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>  vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index 5d1eeef323..9d9fc886d6 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -531,3 +531,72 @@ GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
>  GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
>  GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
>  GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
> +
> +/*
> + *** unit stride fault-only-first load
> + */
> +static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
> +        gen_helper_ldst_us *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_ldst_us *fn;
> +    static gen_helper_ldst_us * const fns[7][4] = {
> +        { gen_helper_vlbff_v_b,  gen_helper_vlbff_v_h,
> +          gen_helper_vlbff_v_w,  gen_helper_vlbff_v_d },
> +        { NULL,                  gen_helper_vlhff_v_h,
> +          gen_helper_vlhff_v_w,  gen_helper_vlhff_v_d },
> +        { NULL,                  NULL,
> +          gen_helper_vlwff_v_w,  gen_helper_vlwff_v_d },
> +        { gen_helper_vleff_v_b,  gen_helper_vleff_v_h,
> +          gen_helper_vleff_v_w,  gen_helper_vleff_v_d },
> +        { gen_helper_vlbuff_v_b, gen_helper_vlbuff_v_h,
> +          gen_helper_vlbuff_v_w, gen_helper_vlbuff_v_d },
> +        { NULL,                  gen_helper_vlhuff_v_h,
> +          gen_helper_vlhuff_v_w, gen_helper_vlhuff_v_d },
> +        { NULL,                  NULL,
> +          gen_helper_vlwuff_v_w, gen_helper_vlwuff_v_d }
> +    };
> +
> +    fn =  fns[seq][s->sew];
> +    if (fn == NULL) {
> +        return false;
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
> +    return ldff_trans(a->rd, a->rs1, data, fn, s);
> +}
> +
> +GEN_VEXT_TRANS(vlbff_v, 0, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vlhff_v, 1, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vlwff_v, 2, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
> +GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 35cb9f09b4..3841301b74 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -574,3 +574,114 @@ GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
>  GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
>  GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
>  GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
> +
> +/*
> + *** unit-stride fault-only-first load instructions
> + */
> +static inline void vext_ldff(void *vd, void *v0, target_ulong base,
> +        CPURISCVState *env, uint32_t desc,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        int mmuidx, uint32_t esz, uint32_t msz, uintptr_t ra)
> +{
> +    void *host;
> +    uint32_t i, k, vl = 0;
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +    target_ulong addr, offset, remain;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access */
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        addr = base + nf * i * msz;
> +        if (i == 0) {
> +            probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
> +        } else {
> +            /* if it triggers an exception, no need to check watchpoint */
> +            offset = -(addr | TARGET_PAGE_MASK);

You can move this assignment into the while loop below

> +            remain = nf * msz;
> +            while (remain > 0) {
> +                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
> +                if (host) {
> +#ifdef CONFIG_USER_ONLY
> +                    if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
> +                        vl = i;
> +                        goto ProbeSuccess;
> +                    }
> +#else
> +                    probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
> +#endif
> +                } else {
> +                    vl = i;
> +                    goto ProbeSuccess;
> +                }
> +                if (remain <=  offset) {
> +                    break;
> +                }
> +                remain -= offset;
> +                addr += offset;
> +                offset = -(addr | TARGET_PAGE_MASK);

and then remove this
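
i.e. roughly this shape (just a sketch reusing the names from the patch
above, not tested and not the actual reworked code):

    remain = nf * msz;
    while (remain > 0) {
        /* bytes left on the current guest page */
        offset = -(addr | TARGET_PAGE_MASK);
        host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
        if (host) {
    #ifdef CONFIG_USER_ONLY
            if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
                vl = i;
                goto ProbeSuccess;
            }
    #else
            probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
    #endif
        } else {
            vl = i;
            goto ProbeSuccess;
        }
        if (remain <= offset) {
            break;
        }
        remain -= offset;
        addr += offset;
    }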

> +            }
> +        }
> +    }
> +ProbeSuccess:
> +    /* load bytes from guest memory */
> +    if (vl != 0) {
> +        env->vl = vl;
> +    }
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            target_ulong addr = base + (i * nf + k) * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }
> +    }
> +    /* clear tail elements */
> +    if (vl != 0) {
> +        return;
> +    }
> +    for (k = 0; k < nf; k++) {
> +        clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +    }
> +}
> +
> +#define GEN_VEXT_LDFF(NAME, MTYPE, ETYPE, MMUIDX, LOAD_FN, CLEAR_FN)  \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,              \
> +        CPURISCVState *env, uint32_t desc)                            \
> +{                                                                     \
> +    vext_ldff(vd, v0, base, env, desc, LOAD_FN, CLEAR_FN, MMUIDX,     \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC());                       \
> +}
> +GEN_VEXT_LDFF(vlbff_v_b,  int8_t,   int8_t,   MO_SB,   ldb_b,  clearb)
> +GEN_VEXT_LDFF(vlbff_v_h,  int8_t,   int16_t,  MO_SB,   ldb_h,  clearh)
> +GEN_VEXT_LDFF(vlbff_v_w,  int8_t,   int32_t,  MO_SB,   ldb_w,  clearl)
> +GEN_VEXT_LDFF(vlbff_v_d,  int8_t,   int64_t,  MO_SB,   ldb_d,  clearq)
> +GEN_VEXT_LDFF(vlhff_v_h,  int16_t,  int16_t,  MO_LESW, ldh_h,  clearh)
> +GEN_VEXT_LDFF(vlhff_v_w,  int16_t,  int32_t,  MO_LESW, ldh_w,  clearl)
> +GEN_VEXT_LDFF(vlhff_v_d,  int16_t,  int64_t,  MO_LESW, ldh_d,  clearq)
> +GEN_VEXT_LDFF(vlwff_v_w,  int32_t,  int32_t,  MO_LESL, ldw_w,  clearl)
> +GEN_VEXT_LDFF(vlwff_v_d,  int32_t,  int64_t,  MO_LESL, ldw_d,  clearq)
> +GEN_VEXT_LDFF(vleff_v_b,  int8_t,   int8_t,   MO_SB,   lde_b,  clearb)
> +GEN_VEXT_LDFF(vleff_v_h,  int16_t,  int16_t,  MO_LESW, lde_h,  clearh)
> +GEN_VEXT_LDFF(vleff_v_w,  int32_t,  int32_t,  MO_LESL, lde_w,  clearl)
> +GEN_VEXT_LDFF(vleff_v_d,  int64_t,  int64_t,  MO_LEQ,  lde_d,  clearq)
> +GEN_VEXT_LDFF(vlbuff_v_b, uint8_t,  uint8_t,  MO_UB,   ldbu_b, clearb)
> +GEN_VEXT_LDFF(vlbuff_v_h, uint8_t,  uint16_t, MO_UB,   ldbu_h, clearh)
> +GEN_VEXT_LDFF(vlbuff_v_w, uint8_t,  uint32_t, MO_UB,   ldbu_w, clearl)
> +GEN_VEXT_LDFF(vlbuff_v_d, uint8_t,  uint64_t, MO_UB,   ldbu_d, clearq)
> +GEN_VEXT_LDFF(vlhuff_v_h, uint16_t, uint16_t, MO_LEUW, ldhu_h, clearh)
> +GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
> +GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
> +GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
> +GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)

Otherwise looks good.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load
  2020-03-13 22:24     ` Alistair Francis
@ 2020-03-13 22:41       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-13 22:41 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 6:24, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:13 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> The unit-stride fault-only-first load instructions are used to
>> vectorize loops with data-dependent exit conditions (while loops).
>> These instructions execute as a regular load except that they
>> will only take a trap on element 0.
>>
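(For reference, the canonical loop of this kind is a strlen-style scan.
A minimal C illustration, not part of the patch, with scan_len as a made-up
name: the trip count depends on the data being loaded, so a vectorized
version has to read past the terminator without trapping on the later
elements, which is exactly what the fault-only-first forms allow.)

    #include <stddef.h>

    /* illustrative only: a loop with a data-dependent exit condition */
    static size_t scan_len(const char *p)
    {
        size_t len = 0;
        while (p[len] != '\0') {   /* exit depends on the value just loaded */
            len++;
        }
        return len;
    }
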
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/helper.h                   |  22 +++++
>>   target/riscv/insn32.decode              |   7 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c |  69 +++++++++++++++
>>   target/riscv/vector_helper.c            | 111 ++++++++++++++++++++++++
>>   4 files changed, 209 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index f9b3da60ca..72ba4d9bdb 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
>>   DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
>>   DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
>>   DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_5(vlbff_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbff_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhff_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vleff_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vleff_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vleff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vleff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbuff_v_b, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbuff_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbuff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlbuff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhuff_v_h, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
>> +DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index bc36df33b5..b76c09c8c0 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -224,6 +224,13 @@ vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>>   vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>   vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>   vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>> +vlbff_v    ... 100 . 10000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlhff_v    ... 100 . 10000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlwff_v    ... 100 . 10000 ..... 110 ..... 0000111 @r2_nfvm
>> +vleff_v    ... 000 . 10000 ..... 111 ..... 0000111 @r2_nfvm
>> +vlbuff_v   ... 000 . 10000 ..... 000 ..... 0000111 @r2_nfvm
>> +vlhuff_v   ... 000 . 10000 ..... 101 ..... 0000111 @r2_nfvm
>> +vlwuff_v   ... 000 . 10000 ..... 110 ..... 0000111 @r2_nfvm
>>   vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>>   vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>>   vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index 5d1eeef323..9d9fc886d6 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -531,3 +531,72 @@ GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
>>   GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
>>   GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
>>   GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
>> +
>> +/*
>> + *** unit stride fault-only-first load
>> + */
>> +static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
>> +        gen_helper_ldst_us *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask;
>> +    TCGv base;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_ldst_us *fn;
>> +    static gen_helper_ldst_us * const fns[7][4] = {
>> +        { gen_helper_vlbff_v_b,  gen_helper_vlbff_v_h,
>> +          gen_helper_vlbff_v_w,  gen_helper_vlbff_v_d },
>> +        { NULL,                  gen_helper_vlhff_v_h,
>> +          gen_helper_vlhff_v_w,  gen_helper_vlhff_v_d },
>> +        { NULL,                  NULL,
>> +          gen_helper_vlwff_v_w,  gen_helper_vlwff_v_d },
>> +        { gen_helper_vleff_v_b,  gen_helper_vleff_v_h,
>> +          gen_helper_vleff_v_w,  gen_helper_vleff_v_d },
>> +        { gen_helper_vlbuff_v_b, gen_helper_vlbuff_v_h,
>> +          gen_helper_vlbuff_v_w, gen_helper_vlbuff_v_d },
>> +        { NULL,                  gen_helper_vlhuff_v_h,
>> +          gen_helper_vlhuff_v_w, gen_helper_vlhuff_v_d },
>> +        { NULL,                  NULL,
>> +          gen_helper_vlwuff_v_w, gen_helper_vlwuff_v_d }
>> +    };
>> +
>> +    fn =  fns[seq][s->sew];
>> +    if (fn == NULL) {
>> +        return false;
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, NF, a->nf);
>> +    return ldff_trans(a->rd, a->rs1, data, fn, s);
>> +}
>> +
>> +GEN_VEXT_TRANS(vlbff_v, 0, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlhff_v, 1, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlwff_v, 2, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
>> +GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 35cb9f09b4..3841301b74 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -574,3 +574,114 @@ GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t,  idx_b, ste_b)
>>   GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t, idx_h, ste_h)
>>   GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t, idx_w, ste_w)
>>   GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t, idx_d, ste_d)
>> +
>> +/*
>> + *** unit-stride fault-only-first load instructions
>> + */
>> +static inline void vext_ldff(void *vd, void *v0, target_ulong base,
>> +        CPURISCVState *env, uint32_t desc,
>> +        vext_ldst_elem_fn ldst_elem,
>> +        vext_ld_clear_elem clear_elem,
>> +        int mmuidx, uint32_t esz, uint32_t msz, uintptr_t ra)
>> +{
>> +    void *host;
>> +    uint32_t i, k, vl = 0;
>> +    uint32_t mlen = vext_mlen(desc);
>> +    uint32_t nf = vext_nf(desc);
>> +    uint32_t vm = vext_vm(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +    target_ulong addr, offset, remain;
>> +
>> +    if (env->vl == 0) {
>> +        return;
>> +    }
>> +    /* probe every access */
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        addr = base + nf * i * msz;
>> +        if (i == 0) {
>> +            probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
>> +        } else {
>> +            /* if it triggers an exception, no need to check watchpoint */
>> +            offset = -(addr | TARGET_PAGE_MASK);
> You can move this assignment into the while loop below
>
>> +            remain = nf * msz;
>> +            while (remain > 0) {
>> +                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
>> +                if (host) {
>> +#ifdef CONFIG_USER_ONLY
>> +                    if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
>> +                        vl = i;
>> +                        goto ProbeSuccess;
>> +                    }
>> +#else
>> +                    probe_pages(env, addr, nf * msz, ra, MMU_DATA_LOAD);
>> +#endif
>> +                } else {
>> +                    vl = i;
>> +                    goto ProbeSuccess;
>> +                }
>> +                if (remain <=  offset) {
>> +                    break;
>> +                }
>> +                remain -= offset;
>> +                addr += offset;
>> +                offset = -(addr | TARGET_PAGE_MASK);
> and then remove this
Good idea. Thanks.
I will move it in the next patch set.

Zhiwei
>
>> +            }
>> +        }
>> +    }
>> +ProbeSuccess:
>> +    /* load bytes from guest memory */
>> +    if (vl != 0) {
>> +        env->vl = vl;
>> +    }
>> +    for (i = 0; i < env->vl; i++) {
>> +        k = 0;
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        while (k < nf) {
>> +            target_ulong addr = base + (i * nf + k) * msz;
>> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
>> +            k++;
>> +        }
>> +    }
>> +    /* clear tail elements */
>> +    if (vl != 0) {
>> +        return;
>> +    }
>> +    for (k = 0; k < nf; k++) {
>> +        clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
>> +    }
>> +}
>> +
>> +#define GEN_VEXT_LDFF(NAME, MTYPE, ETYPE, MMUIDX, LOAD_FN, CLEAR_FN)  \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong base,              \
>> +        CPURISCVState *env, uint32_t desc)                            \
>> +{                                                                     \
>> +    vext_ldff(vd, v0, base, env, desc, LOAD_FN, CLEAR_FN, MMUIDX,     \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC());                       \
>> +}
>> +GEN_VEXT_LDFF(vlbff_v_b,  int8_t,   int8_t,   MO_SB,   ldb_b,  clearb)
>> +GEN_VEXT_LDFF(vlbff_v_h,  int8_t,   int16_t,  MO_SB,   ldb_h,  clearh)
>> +GEN_VEXT_LDFF(vlbff_v_w,  int8_t,   int32_t,  MO_SB,   ldb_w,  clearl)
>> +GEN_VEXT_LDFF(vlbff_v_d,  int8_t,   int64_t,  MO_SB,   ldb_d,  clearq)
>> +GEN_VEXT_LDFF(vlhff_v_h,  int16_t,  int16_t,  MO_LESW, ldh_h,  clearh)
>> +GEN_VEXT_LDFF(vlhff_v_w,  int16_t,  int32_t,  MO_LESW, ldh_w,  clearl)
>> +GEN_VEXT_LDFF(vlhff_v_d,  int16_t,  int64_t,  MO_LESW, ldh_d,  clearq)
>> +GEN_VEXT_LDFF(vlwff_v_w,  int32_t,  int32_t,  MO_LESL, ldw_w,  clearl)
>> +GEN_VEXT_LDFF(vlwff_v_d,  int32_t,  int64_t,  MO_LESL, ldw_d,  clearq)
>> +GEN_VEXT_LDFF(vleff_v_b,  int8_t,   int8_t,   MO_SB,   lde_b,  clearb)
>> +GEN_VEXT_LDFF(vleff_v_h,  int16_t,  int16_t,  MO_LESW, lde_h,  clearh)
>> +GEN_VEXT_LDFF(vleff_v_w,  int32_t,  int32_t,  MO_LESL, lde_w,  clearl)
>> +GEN_VEXT_LDFF(vleff_v_d,  int64_t,  int64_t,  MO_LEQ,  lde_d,  clearq)
>> +GEN_VEXT_LDFF(vlbuff_v_b, uint8_t,  uint8_t,  MO_UB,   ldbu_b, clearb)
>> +GEN_VEXT_LDFF(vlbuff_v_h, uint8_t,  uint16_t, MO_UB,   ldbu_h, clearh)
>> +GEN_VEXT_LDFF(vlbuff_v_w, uint8_t,  uint32_t, MO_UB,   ldbu_w, clearl)
>> +GEN_VEXT_LDFF(vlbuff_v_d, uint8_t,  uint64_t, MO_UB,   ldbu_d, clearq)
>> +GEN_VEXT_LDFF(vlhuff_v_h, uint16_t, uint16_t, MO_LEUW, ldhu_h, clearh)
>> +GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
>> +GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
>> +GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
>> +GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
> Otherwise looks good.
>
> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
>
> Alistair
>
>> --
>> 2.23.0
>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-13 22:17           ` LIU Zhiwei
@ 2020-03-13 23:38             ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-13 23:38 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Fri, Mar 13, 2020 at 3:17 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2020/3/14 6:05, Alistair Francis wrote:
> > On Fri, Mar 13, 2020 at 2:32 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>
> >>
> >> On 2020/3/14 4:38, Alistair Francis wrote:
> >>> On Thu, Mar 12, 2020 at 8:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>>> Vector strided operations access the first memory element at the base address,
> >>>> and then access subsequent elements at address increments given by the byte
> >>>> offset contained in the x register specified by rs2.
> >>>>
> >>>> Vector unit-stride operations access elements stored contiguously in memory
> >>>> starting from the base effective address. It can be seen as a special
> >>>> case of strided operations.
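
(Reading that concretely, and purely as an illustration rather than patch
code: element i of a strided access is fetched from base + i * stride, with
the byte stride read from rs2, so base 0x1000 with a stride of 16 touches
0x1000, 0x1010, 0x1020, ...; the unit-stride forms are the special case
where the elements sit back to back. The helper name below is made up.)

    /* illustrative only: the effective address of element i of a strided
     * load, with the byte stride taken from the register named by rs2 */
    static target_ulong strided_elem_addr(target_ulong base,
                                          target_ulong stride, uint32_t i)
    {
        return base + i * stride;
    }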
> >>>>
> >>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >>>> ---
> >>>>    target/riscv/cpu.h                      |   6 +
> >>>>    target/riscv/helper.h                   | 105 ++++++
> >>>>    target/riscv/insn32.decode              |  32 ++
> >>>>    target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
> >>>>    target/riscv/translate.c                |   7 +
> >>>>    target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
> >>>>    6 files changed, 896 insertions(+)
> >>>>
> >>>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> >>>> index 505d1a8515..b6ebb9b0eb 100644
> >>>> --- a/target/riscv/cpu.h
> >>>> +++ b/target/riscv/cpu.h
> >>>> @@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
> >>>>    typedef RISCVCPU ArchCPU;
> >>>>    #include "exec/cpu-all.h"
> >>>>
> >>>> +/* share data between vector helpers and decode code */
> >>>> +FIELD(VDATA, MLEN, 0, 8)
> >>>> +FIELD(VDATA, VM, 8, 1)
> >>>> +FIELD(VDATA, LMUL, 9, 2)
> >>>> +FIELD(VDATA, NF, 11, 4)
> >>>> +
> >>>>    FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
> >>>>    FIELD(TB_FLAGS, LMUL, 3, 2)
> >>>>    FIELD(TB_FLAGS, SEW, 5, 3)
> >>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >>>> index 3c28c7e407..87dfa90609 100644
> >>>> --- a/target/riscv/helper.h
> >>>> +++ b/target/riscv/helper.h
> >>>> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
> >>>>    #endif
> >>>>    /* Vector functions */
> >>>>    DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> >>>> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> >>>> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> >>> Do you mind explaining why we have *_mask versions? I'm struggling to
> >>> understand this.
> >> When an instruction has a mask, it will only operate on the active
> >> elements in the vector.
> >> Whether an element is active or inactive is predicated by the mask
> >> register v0.
> >>
> >> Without a mask, it will operate on every element in the vector body.
> > Doesn't the mask always apply though? Why do we need an extra helper?
> Yes, the mask is always applied.
>
> As you can see, the extra helper is specific to unit-stride mode. Other
> instructions do not have the extra helpers.
>
> That's because a more efficient implementation is possible for unit-stride
> load/store with vm == 1 (always unmasked).
>
> It operates on a contiguous memory block, so I can probe the memory access
> and clear the tail elements more efficiently.
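
That is, the two paths look roughly like this (an illustrative sketch only,
not the patch code: ld_elem stands in for the per-element load callback, the
function names are made up, and segments (nf) and tail clearing are omitted):

    /* masked path: test the mask bit before touching each element */
    static void ld_us_masked(void *vd, void *v0, target_ulong base,
                             CPURISCVState *env, uint32_t mlen,
                             uint32_t msz, uintptr_t ra)
    {
        for (uint32_t i = 0; i < env->vl; i++) {
            if (!vext_elem_mask(v0, mlen, i)) {
                continue;                   /* inactive element, skip it */
            }
            ld_elem(env, base + i * msz, i, vd, ra);
        }
    }

    /* unmasked (vm == 1) path: every element is active, so the range is one
     * contiguous block that can be probed once and then loaded without any
     * per-element mask tests */
    static void ld_us_unmasked(void *vd, target_ulong base,
                               CPURISCVState *env, uint32_t msz, uintptr_t ra)
    {
        probe_pages(env, base, env->vl * msz, ra, MMU_DATA_LOAD);
        for (uint32_t i = 0; i < env->vl; i++) {
            ld_elem(env, base + i * msz, i, vd, ra);
        }
    }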

Ah ok. I think I get what you are saying. I think this is all ok then.
I'll review the next version (after you have split it).

Alistair

>
> Zhiwei


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  0:02     ` Alistair Francis
  -1 siblings, 0 replies; 336+ messages in thread
From: Alistair Francis @ 2020-03-14  0:02 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt

On Thu, Mar 12, 2020 at 8:15 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Vector AMOs operate as if aq and rl bits were zero on each element
> with regard to ordering relative to other instructions in the same hart.
> Vector AMOs provide no ordering guarantee between element operations
> in the same vector AMO instruction.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.h                      |   1 +
>  target/riscv/helper.h                   |  29 +++++
>  target/riscv/insn32-64.decode           |  11 ++
>  target/riscv/insn32.decode              |  13 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 130 +++++++++++++++++++++
>  target/riscv/vector_helper.c            | 143 ++++++++++++++++++++++++
>  6 files changed, 327 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index b6ebb9b0eb..e069e55e81 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
>  FIELD(VDATA, VM, 8, 1)
>  FIELD(VDATA, LMUL, 9, 2)
>  FIELD(VDATA, NF, 11, 4)
> +FIELD(VDATA, WD, 11, 1)
>
>  FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>  FIELD(TB_FLAGS, LMUL, 3, 2)
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 72ba4d9bdb..70a4b05f75 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
> +#ifdef TARGET_RISCV64
> +DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +#endif
> +DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
> index 380bf791bc..86153d93fa 100644
> --- a/target/riscv/insn32-64.decode
> +++ b/target/riscv/insn32-64.decode
> @@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
>  amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
>  amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
>
> +#*** Vector AMO operations (in addition to Zvamo) ***
> +vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
> +
>  # *** RV64F Standard Extension (in addition to RV32F) ***
>  fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
>  fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b76c09c8c0..1330703720 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -44,6 +44,7 @@
>  &u    imm rd
>  &shift     shamt rs1 rd
>  &atomic    aq rl rs2 rs1 rd
> +&rwdvm     vm wd rd rs1 rs2
>  &r2nfvm    vm rd rs1 nf
>  &rnfvm     vm rd rs1 rs2 nf
>
> @@ -67,6 +68,7 @@
>  @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>  @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>  @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
> +@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>
>  @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
> @@ -261,6 +263,17 @@ vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
>  vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
>  vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
>
> +#*** Vector AMO operations are encoded under the standard AMO major opcode ***
> +vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
> +
>  # *** new major opcode OP-V ***
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index 9d9fc886d6..3c677160c5 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -600,3 +600,133 @@ GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
>  GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
>  GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
>  GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
> +
> +/*
> + *** vector atomic operation
> + */
> +typedef void gen_helper_amo(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
> +        TCGv_env, TCGv_i32);
> +
> +static bool amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
> +        uint32_t data, gen_helper_amo *fn, DisasContext *s)
> +{
> +    TCGv_ptr dest, mask, index;
> +    TCGv base;
> +    TCGv_i32 desc;
> +
> +    dest = tcg_temp_new_ptr();
> +    mask = tcg_temp_new_ptr();
> +    index = tcg_temp_new_ptr();
> +    base = tcg_temp_new();
> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
> +
> +    gen_get_gpr(base, rs1);
> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
> +    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
> +
> +    fn(dest, mask, base, index, cpu_env, desc);
> +
> +    tcg_temp_free_ptr(dest);
> +    tcg_temp_free_ptr(mask);
> +    tcg_temp_free_ptr(index);
> +    tcg_temp_free(base);
> +    tcg_temp_free_i32(desc);
> +    return true;
> +}
> +
> +static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
> +{
> +    uint32_t data = 0;
> +    gen_helper_amo *fn;
> +    static gen_helper_amo *const fnsw[9] = {
> +        /* no atomic operation */
> +        gen_helper_vamoswapw_v_w,
> +        gen_helper_vamoaddw_v_w,
> +        gen_helper_vamoxorw_v_w,
> +        gen_helper_vamoandw_v_w,
> +        gen_helper_vamoorw_v_w,
> +        gen_helper_vamominw_v_w,
> +        gen_helper_vamomaxw_v_w,
> +        gen_helper_vamominuw_v_w,
> +        gen_helper_vamomaxuw_v_w
> +    };
> +#ifdef TARGET_RISCV64
> +    static gen_helper_amo *const fnsd[18] = {
> +        gen_helper_vamoswapw_v_d,
> +        gen_helper_vamoaddw_v_d,
> +        gen_helper_vamoxorw_v_d,
> +        gen_helper_vamoandw_v_d,
> +        gen_helper_vamoorw_v_d,
> +        gen_helper_vamominw_v_d,
> +        gen_helper_vamomaxw_v_d,
> +        gen_helper_vamominuw_v_d,
> +        gen_helper_vamomaxuw_v_d,
> +        gen_helper_vamoswapd_v_d,
> +        gen_helper_vamoaddd_v_d,
> +        gen_helper_vamoxord_v_d,
> +        gen_helper_vamoandd_v_d,
> +        gen_helper_vamoord_v_d,
> +        gen_helper_vamomind_v_d,
> +        gen_helper_vamomaxd_v_d,
> +        gen_helper_vamominud_v_d,
> +        gen_helper_vamomaxud_v_d
> +    };
> +#endif
> +
> +    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
> +        gen_helper_exit_atomic(cpu_env);
> +        s->base.is_jmp = DISAS_NORETURN;
> +        return true;
> +    } else {
> +        fn = fnsw[seq];
> +#ifdef TARGET_RISCV64
> +        if (s->sew == 3) {
> +            fn = fnsd[seq];
> +        }
> +#endif
> +    }
> +
> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
> +    data = FIELD_DP32(data, VDATA, WD, a->wd);
> +    return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
> +}
> +/*
> + * There are two rules checked here.
> + *
> + * 1. SEW must be at least as wide as the AMO memory element size.
> + *
> + * 2. If SEW is greater than XLEN, an illegal instruction exception is raised.
> + */
> +static bool amo_check(DisasContext *s, arg_rwdvm* a)
> +{
> +    return (vext_check_isa_ill(s, RVV | RVA) &&
> +            (!a->wd || vext_check_overlap_mask(s, a->rd, a->vm, false)) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            ((1 << s->sew) <= sizeof(target_ulong)) &&
> +            ((1 << s->sew) >= 4));
> +}
> +
> +GEN_VEXT_TRANS(vamoswapw_v, 0, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoaddw_v, 1, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoxorw_v, 2, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoandw_v, 3, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoorw_v, 4, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamominw_v, 5, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamomaxw_v, 6, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamominuw_v, 7, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamomaxuw_v, 8, rwdvm, amo_op, amo_check)
> +#ifdef TARGET_RISCV64
> +GEN_VEXT_TRANS(vamoswapd_v, 9, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoaddd_v, 10, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoxord_v, 11, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoandd_v, 12, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamoord_v, 13, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamomind_v, 14, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
> +GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
> +#endif
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 3841301b74..f9b409b169 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -94,6 +94,11 @@ static inline uint32_t vext_lmul(uint32_t desc)
>      return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>  }
>
> +static uint32_t vext_wd(uint32_t desc)
> +{
> +    return (simd_data(desc) >> 11) & 0x1;
> +}
> +
>  /*
>   * Get vector group length in bytes. Its range is [64, 2048].
>   *
> @@ -685,3 +690,141 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
>  GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
>  GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
>  GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
> +
> +/*
> + *** Vector AMO Operations (Zvamo)
> + */
> +typedef void (*vext_amo_noatomic_fn)(void *vs3, target_ulong addr,
> +        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
> +
> +/* no atomic operation for vector atomic instructions */
> +#define DO_SWAP(N, M) (M)
> +#define DO_AND(N, M)  (N & M)
> +#define DO_XOR(N, M)  (N ^ M)
> +#define DO_OR(N, M)   (N | M)
> +#define DO_ADD(N, M)  (N + M)

Why don't these need to be atomic?

> +
> +#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF) \
> +static void vext_##NAME##_noatomic_op(void *vs3,                \
> +            target_ulong addr, uint32_t wd, uint32_t idx,       \
> +                CPURISCVState *env, uintptr_t retaddr)          \
> +{                                                               \
> +    typedef int##ESZ##_t ETYPE;                                 \
> +    typedef int##MSZ##_t MTYPE;                                 \
> +    typedef uint##MSZ##_t UMTYPE __attribute__((unused));       \
> +    ETYPE *pe3 = (ETYPE *)vs3 + H(idx);                         \
> +    MTYPE a = *pe3, b = cpu_ld##SUF##_data(env, addr);          \
> +    a = DO_OP(a, b);                                            \
> +    cpu_st##SUF##_data(env, addr, a);                           \
> +    if (wd) {                                                   \
> +        *pe3 = a;                                               \
> +    }                                                           \
> +}
> +
> +/* Signed min/max */
> +#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
> +#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
> +
> +/* Unsigned min/max */
> +#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
> +#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
> +
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, 32, 32, H4, DO_SWAP, l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  32, 32, H4, DO_ADD,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  32, 32, H4, DO_XOR,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  32, 32, H4, DO_AND,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   32, 32, H4, DO_OR,   l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  32, 32, H4, DO_MIN,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  32, 32, H4, DO_MAX,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, 32, 32, H4, DO_MINU, l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, 32, 32, H4, DO_MAXU, l)
> +#ifdef TARGET_RISCV64
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, 64, 32, H8, DO_SWAP, l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, 64, 64, H8, DO_SWAP, q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  64, 32, H8, DO_ADD,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  64, 64, H8, DO_ADD,  q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  64, 32, H8, DO_XOR,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  64, 64, H8, DO_XOR,  q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  64, 32, H8, DO_AND,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  64, 64, H8, DO_AND,  q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   64, 32, H8, DO_OR,   l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   64, 64, H8, DO_OR,   q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  64, 32, H8, DO_MIN,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  64, 64, H8, DO_MIN,  q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  64, 32, H8, DO_MAX,  l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  64, 64, H8, DO_MAX,  q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, 64, 32, H8, DO_MINU, l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, 64, 64, H8, DO_MINU, q)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
> +#endif

I'm confused why these are different to

> +
> +static inline void vext_amo_noatomic(void *vs3, void *v0, target_ulong base,
> +        void *vs2, CPURISCVState *env, uint32_t desc,
> +        vext_get_index_addr get_index_addr,
> +        vext_amo_noatomic_fn noatomic_op,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra)
> +{
> +    uint32_t i;
> +    target_long addr;
> +    uint32_t wd = vext_wd(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_LOAD);
> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_STORE);
> +    }
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        addr = get_index_addr(base, i, vs2);
> +        noatomic_op(vs3, addr, wd, i, env, ra);
> +    }
> +    clear_elem(vs3, env->vl, env->vl * esz, vlmax * esz);
> +}
> +
> +#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, INDEX_FN, CLEAR_FN)    \
> +void HELPER(NAME)(void *vs3, void *v0, target_ulong base,       \
> +        void *vs2, CPURISCVState *env, uint32_t desc)           \
> +{                                                               \
> +    vext_amo_noatomic(vs3, v0, base, vs2, env, desc,            \
> +        INDEX_FN, vext_##NAME##_noatomic_op, CLEAR_FN,          \
> +        sizeof(ETYPE), sizeof(MTYPE), GETPC());                 \
> +}
> +
> +#ifdef TARGET_RISCV64
> +GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  idx_d, clearq)
> +GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, idx_d, clearq)
> +GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, idx_d, clearq)
> +GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, idx_d, clearq)
> +GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, idx_d, clearq)
> +#endif
> +GEN_VEXT_AMO(vamoswapw_v_w, int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamoaddw_v_w,  int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamoxorw_v_w,  int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamoandw_v_w,  int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamoorw_v_w,   int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
> +GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
> +GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)

These?

Alistair

> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
  2020-03-14  0:02     ` Alistair Francis
@ 2020-03-14  0:36       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  0:36 UTC (permalink / raw)
  To: Alistair Francis
  Cc: guoren, open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, Chih-Min Chao,
	wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 8:02, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:15 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> Vector AMOs operate as if aq and rl bits were zero on each element
>> with regard to ordering relative to other instructions in the same hart.
>> Vector AMOs provide no ordering guarantee between element operations
>> in the same vector AMO instruction.
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/cpu.h                      |   1 +
>>   target/riscv/helper.h                   |  29 +++++
>>   target/riscv/insn32-64.decode           |  11 ++
>>   target/riscv/insn32.decode              |  13 +++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 130 +++++++++++++++++++++
>>   target/riscv/vector_helper.c            | 143 ++++++++++++++++++++++++
>>   6 files changed, 327 insertions(+)
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index b6ebb9b0eb..e069e55e81 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
>>   FIELD(VDATA, VM, 8, 1)
>>   FIELD(VDATA, LMUL, 9, 2)
>>   FIELD(VDATA, NF, 11, 4)
>> +FIELD(VDATA, WD, 11, 1)
>>
>>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>   FIELD(TB_FLAGS, LMUL, 3, 2)
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 72ba4d9bdb..70a4b05f75 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
>> +#ifdef TARGET_RISCV64
>> +DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +#endif
>> +DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
>> index 380bf791bc..86153d93fa 100644
>> --- a/target/riscv/insn32-64.decode
>> +++ b/target/riscv/insn32-64.decode
>> @@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
>>   amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
>>   amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
>>
>> +#*** Vector AMO operations (in addition to Zvamo) ***
>> +vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +
>>   # *** RV64F Standard Extension (in addition to RV32F) ***
>>   fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
>>   fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index b76c09c8c0..1330703720 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -44,6 +44,7 @@
>>   &u    imm rd
>>   &shift     shamt rs1 rd
>>   &atomic    aq rl rs2 rs1 rd
>> +&rwdvm     vm wd rd rs1 rs2
>>   &r2nfvm    vm rd rs1 nf
>>   &rnfvm     vm rd rs1 rs2 nf
>>
>> @@ -67,6 +68,7 @@
>>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>>   @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>>   @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>> +@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
>>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>
>>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>> @@ -261,6 +263,17 @@ vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
>>   vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
>>   vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
>>
>> +#*** Vector AMO operations are encoded under the standard AMO major opcode ***
>> +vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +
>>   # *** new major opcode OP-V ***
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index 9d9fc886d6..3c677160c5 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -600,3 +600,133 @@ GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
>> +
>> +/*
>> + *** vector atomic operation
>> + */
>> +typedef void gen_helper_amo(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
>> +        TCGv_env, TCGv_i32);
>> +
>> +static bool amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
>> +        uint32_t data, gen_helper_amo *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask, index;
>> +    TCGv base;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    index = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, index, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free_ptr(index);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_amo *fn;
>> +    static gen_helper_amo *const fnsw[9] = {
>> +        /* no atomic operation */
>> +        gen_helper_vamoswapw_v_w,
>> +        gen_helper_vamoaddw_v_w,
>> +        gen_helper_vamoxorw_v_w,
>> +        gen_helper_vamoandw_v_w,
>> +        gen_helper_vamoorw_v_w,
>> +        gen_helper_vamominw_v_w,
>> +        gen_helper_vamomaxw_v_w,
>> +        gen_helper_vamominuw_v_w,
>> +        gen_helper_vamomaxuw_v_w
>> +    };
>> +#ifdef TARGET_RISCV64
>> +    static gen_helper_amo *const fnsd[18] = {
>> +        gen_helper_vamoswapw_v_d,
>> +        gen_helper_vamoaddw_v_d,
>> +        gen_helper_vamoxorw_v_d,
>> +        gen_helper_vamoandw_v_d,
>> +        gen_helper_vamoorw_v_d,
>> +        gen_helper_vamominw_v_d,
>> +        gen_helper_vamomaxw_v_d,
>> +        gen_helper_vamominuw_v_d,
>> +        gen_helper_vamomaxuw_v_d,
>> +        gen_helper_vamoswapd_v_d,
>> +        gen_helper_vamoaddd_v_d,
>> +        gen_helper_vamoxord_v_d,
>> +        gen_helper_vamoandd_v_d,
>> +        gen_helper_vamoord_v_d,
>> +        gen_helper_vamomind_v_d,
>> +        gen_helper_vamomaxd_v_d,
>> +        gen_helper_vamominud_v_d,
>> +        gen_helper_vamomaxud_v_d
>> +    };
>> +#endif
>> +
>> +    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
>> +        gen_helper_exit_atomic(cpu_env);
>> +        s->base.is_jmp = DISAS_NORETURN;
>> +        return true;
>> +    } else {
>> +        fn = fnsw[seq];
>> +#ifdef TARGET_RISCV64
>> +        if (s->sew == 3) {
>> +            fn = fnsd[seq];
>> +        }
>> +#endif
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, WD, a->wd);
>> +    return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +/*
>> + * There are two rules checked here.
>> + *
>> + * 1. SEW must be at least as wide as the AMO memory element size.
>> + *
>> + * 2. If SEW is greater than XLEN, an illegal instruction exception is raised.
>> + */
>> +static bool amo_check(DisasContext *s, arg_rwdvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV | RVA) &&
>> +            (!a->wd || vext_check_overlap_mask(s, a->rd, a->vm, false)) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_reg(s, a->rs2, false) &&
>> +            ((1 << s->sew) <= sizeof(target_ulong)) &&
>> +            ((1 << s->sew) >= 4));
>> +}
>> +
>> +GEN_VEXT_TRANS(vamoswapw_v, 0, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoaddw_v, 1, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoxorw_v, 2, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoandw_v, 3, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoorw_v, 4, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominw_v, 5, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxw_v, 6, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominuw_v, 7, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxuw_v, 8, rwdvm, amo_op, amo_check)
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_TRANS(vamoswapd_v, 9, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoaddd_v, 10, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoxord_v, 11, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoandd_v, 12, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoord_v, 13, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomind_v, 14, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
>> +#endif
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 3841301b74..f9b409b169 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -94,6 +94,11 @@ static inline uint32_t vext_lmul(uint32_t desc)
>>       return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>>   }
>>
>> +static uint32_t vext_wd(uint32_t desc)
>> +{
>> +    return (simd_data(desc) >> 11) & 0x1;
>> +}
>> +
>>   /*
>>    * Get vector group length in bytes. Its range is [64, 2048].
>>    *
>> @@ -685,3 +690,141 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
>>   GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
>>   GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
>>   GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
>> +
>> +/*
>> + *** Vector AMO Operations (Zvamo)
>> + */
>> +typedef void (*vext_amo_noatomic_fn)(void *vs3, target_ulong addr,
>> +        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
>> +
>> +/* no atomic operation for vector atomic instructions */
>> +#define DO_SWAP(N, M) (M)
>> +#define DO_AND(N, M)  (N & M)
>> +#define DO_XOR(N, M)  (N ^ M)
>> +#define DO_OR(N, M)   (N | M)
>> +#define DO_ADD(N, M)  (N + M)
> Why don't these need to be atomic?
>
>> +
>> +#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF) \
>> +static void vext_##NAME##_noatomic_op(void *vs3,                \
>> +            target_ulong addr, uint32_t wd, uint32_t idx,       \
>> +                CPURISCVState *env, uintptr_t retaddr)          \
>> +{                                                               \
>> +    typedef int##ESZ##_t ETYPE;                                 \
>> +    typedef int##MSZ##_t MTYPE;                                 \
>> +    typedef uint##MSZ##_t UMTYPE __attribute__((unused));       \
>> +    ETYPE *pe3 = (ETYPE *)vs3 + H(idx);                         \
>> +    MTYPE a = *pe3, b = cpu_ld##SUF##_data(env, addr);          \
>> +    a = DO_OP(a, b);                                            \
>> +    cpu_st##SUF##_data(env, addr, a);                           \
>> +    if (wd) {                                                   \
>> +        *pe3 = a;                                               \
>> +    }                                                           \
>> +}
>> +
>> +/* Signed min/max */
>> +#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
>> +#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
>> +
>> +/* Unsigned min/max */
>> +#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
>> +#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
>> +
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, 32, 32, H4, DO_SWAP, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  32, 32, H4, DO_ADD,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  32, 32, H4, DO_XOR,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  32, 32, H4, DO_AND,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   32, 32, H4, DO_OR,   l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  32, 32, H4, DO_MIN,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  32, 32, H4, DO_MAX,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, 32, 32, H4, DO_MINU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, 32, 32, H4, DO_MAXU, l)
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, 64, 32, H8, DO_SWAP, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, 64, 64, H8, DO_SWAP, q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  64, 32, H8, DO_ADD,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  64, 64, H8, DO_ADD,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  64, 32, H8, DO_XOR,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  64, 64, H8, DO_XOR,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  64, 32, H8, DO_AND,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  64, 64, H8, DO_AND,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   64, 32, H8, DO_OR,   l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   64, 64, H8, DO_OR,   q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  64, 32, H8, DO_MIN,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  64, 64, H8, DO_MIN,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  64, 32, H8, DO_MAX,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  64, 64, H8, DO_MAX,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, 64, 32, H8, DO_MINU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, 64, 64, H8, DO_MINU, q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
>> +#endif
> I'm confused why these are different to
>
>> +
>> +static inline void vext_amo_noatomic(void *vs3, void *v0, target_ulong base,
>> +        void *vs2, CPURISCVState *env, uint32_t desc,
>> +        vext_get_index_addr get_index_addr,
>> +        vext_amo_noatomic_fn noatomic_op,
>> +        vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra)
>> +{
>> +    uint32_t i;
>> +    target_long addr;
>> +    uint32_t wd = vext_wd(desc);
>> +    uint32_t vm = vext_vm(desc);
>> +    uint32_t mlen = vext_mlen(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_LOAD);
>> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_STORE);
>> +    }
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        addr = get_index_addr(base, i, vs2);
>> +        noatomic_op(vs3, addr, wd, i, env, ra);
>> +    }
>> +    clear_elem(vs3, env->vl, env->vl * esz, vlmax * esz);
>> +}
>> +
>> +#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, INDEX_FN, CLEAR_FN)    \
>> +void HELPER(NAME)(void *vs3, void *v0, target_ulong base,       \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)           \
>> +{                                                               \
>> +    vext_amo_noatomic(vs3, v0, base, vs2, env, desc,            \
>> +        INDEX_FN, vext_##NAME##_noatomic_op, CLEAR_FN,          \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC());                 \
>> +}
>> +
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, idx_d, clearq)
>> +#endif
>> +GEN_VEXT_AMO(vamoswapw_v_w, int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoaddw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoxorw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoandw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoorw_v_w,   int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
>> +GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
> These?
GEN_VEXT_AMO generates the AMO helper functions.
GEN_VEXT_AMO_NOATOMIC_OP generates the function that processes just
one element of the vector register group, and that per-element
function is what the helper functions call.

In an earlier version there were two macros for the per-element
operation, GEN_VEXT_AMO_ATOMIC_OP and GEN_VEXT_AMO_NOATOMIC_OP, so
two ops were available for the elements of an atomic instruction:
when running in a parallel context the ATOMIC_OP was used, otherwise
the NOATOMIC_OP.

However, GEN_VEXT_AMO_ATOMIC_OP needs to call some atomic helpers in
the TCG runtime. Those atomic helpers are designed for the atomic TCG
IR, so I can only use them very cautiously.

As Richard suggested, I should not use the atomic helpers in the
vector helpers. An atomic op is instead replaced by an atomic-exit
exception plus a non-atomic op.

That's why it is named GEN_VEXT_AMO_NOATOMIC_OP. If proper atomic
APIs are created in the future, I will also add a
GEN_VEXT_AMO_ATOMIC_OP macro here.

Zhiwei

>
> Alistair
>
>> --
>> 2.23.0
>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
@ 2020-03-14  0:36       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  0:36 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang,
	wxy194768, guoren, qemu-devel@nongnu.org Developers,
	open list:RISC-V



On 2020/3/14 8:02, Alistair Francis wrote:
> On Thu, Mar 12, 2020 at 8:15 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> Vector AMOs operate as if aq and rl bits were zero on each element
>> with regard to ordering relative to other instructions in the same hart.
>> Vector AMOs provide no ordering guarantee between element operations
>> in the same vector AMO instruction
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/cpu.h                      |   1 +
>>   target/riscv/helper.h                   |  29 +++++
>>   target/riscv/insn32-64.decode           |  11 ++
>>   target/riscv/insn32.decode              |  13 +++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 130 +++++++++++++++++++++
>>   target/riscv/vector_helper.c            | 143 ++++++++++++++++++++++++
>>   6 files changed, 327 insertions(+)
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index b6ebb9b0eb..e069e55e81 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
>>   FIELD(VDATA, VM, 8, 1)
>>   FIELD(VDATA, LMUL, 9, 2)
>>   FIELD(VDATA, NF, 11, 4)
>> +FIELD(VDATA, WD, 11, 1)
>>
>>   FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
>>   FIELD(TB_FLAGS, LMUL, 3, 2)
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 72ba4d9bdb..70a4b05f75 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
>>   DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
>> +#ifdef TARGET_RISCV64
>> +DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
>> +#endif
>> +DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> +DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>> diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
>> index 380bf791bc..86153d93fa 100644
>> --- a/target/riscv/insn32-64.decode
>> +++ b/target/riscv/insn32-64.decode
>> @@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
>>   amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
>>   amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
>>
>> +#*** Vector AMO operations (in addition to Zvamo) ***
>> +vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>> +
>>   # *** RV64F Standard Extension (in addition to RV32F) ***
>>   fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
>>   fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index b76c09c8c0..1330703720 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -44,6 +44,7 @@
>>   &u    imm rd
>>   &shift     shamt rs1 rd
>>   &atomic    aq rl rs2 rs1 rd
>> +&rwdvm     vm wd rd rs1 rs2
>>   &r2nfvm    vm rd rs1 nf
>>   &rnfvm     vm rd rs1 rs2 nf
>>
>> @@ -67,6 +68,7 @@
>>   @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>>   @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
>>   @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
>> +@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
>>   @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>
>>   @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
>> @@ -261,6 +263,17 @@ vsxh_v     ... -11 . ..... ..... 101 ..... 0100111 @r_nfvm
>>   vsxw_v     ... -11 . ..... ..... 110 ..... 0100111 @r_nfvm
>>   vsxe_v     ... -11 . ..... ..... 111 ..... 0100111 @r_nfvm
>>
>> +#*** Vector AMO operations are encoded under the standard AMO major opcode ***
>> +vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>> +
>>   # *** new major opcode OP-V ***
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index 9d9fc886d6..3c677160c5 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -600,3 +600,133 @@ GEN_VEXT_TRANS(vleff_v, 3, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlbuff_v, 4, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlhuff_v, 5, r2nfvm, ldff_op, ld_us_check)
>>   GEN_VEXT_TRANS(vlwuff_v, 6, r2nfvm, ldff_op, ld_us_check)
>> +
>> +/*
>> + *** vector atomic operation
>> + */
>> +typedef void gen_helper_amo(TCGv_ptr, TCGv_ptr, TCGv, TCGv_ptr,
>> +        TCGv_env, TCGv_i32);
>> +
>> +static bool amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
>> +        uint32_t data, gen_helper_amo *fn, DisasContext *s)
>> +{
>> +    TCGv_ptr dest, mask, index;
>> +    TCGv base;
>> +    TCGv_i32 desc;
>> +
>> +    dest = tcg_temp_new_ptr();
>> +    mask = tcg_temp_new_ptr();
>> +    index = tcg_temp_new_ptr();
>> +    base = tcg_temp_new();
>> +    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
>> +
>> +    gen_get_gpr(base, rs1);
>> +    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
>> +    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
>> +    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
>> +
>> +    fn(dest, mask, base, index, cpu_env, desc);
>> +
>> +    tcg_temp_free_ptr(dest);
>> +    tcg_temp_free_ptr(mask);
>> +    tcg_temp_free_ptr(index);
>> +    tcg_temp_free(base);
>> +    tcg_temp_free_i32(desc);
>> +    return true;
>> +}
>> +
>> +static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
>> +{
>> +    uint32_t data = 0;
>> +    gen_helper_amo *fn;
>> +    static gen_helper_amo *const fnsw[9] = {
>> +        /* no atomic operation */
>> +        gen_helper_vamoswapw_v_w,
>> +        gen_helper_vamoaddw_v_w,
>> +        gen_helper_vamoxorw_v_w,
>> +        gen_helper_vamoandw_v_w,
>> +        gen_helper_vamoorw_v_w,
>> +        gen_helper_vamominw_v_w,
>> +        gen_helper_vamomaxw_v_w,
>> +        gen_helper_vamominuw_v_w,
>> +        gen_helper_vamomaxuw_v_w
>> +    };
>> +#ifdef TARGET_RISCV64
>> +    static gen_helper_amo *const fnsd[18] = {
>> +        gen_helper_vamoswapw_v_d,
>> +        gen_helper_vamoaddw_v_d,
>> +        gen_helper_vamoxorw_v_d,
>> +        gen_helper_vamoandw_v_d,
>> +        gen_helper_vamoorw_v_d,
>> +        gen_helper_vamominw_v_d,
>> +        gen_helper_vamomaxw_v_d,
>> +        gen_helper_vamominuw_v_d,
>> +        gen_helper_vamomaxuw_v_d,
>> +        gen_helper_vamoswapd_v_d,
>> +        gen_helper_vamoaddd_v_d,
>> +        gen_helper_vamoxord_v_d,
>> +        gen_helper_vamoandd_v_d,
>> +        gen_helper_vamoord_v_d,
>> +        gen_helper_vamomind_v_d,
>> +        gen_helper_vamomaxd_v_d,
>> +        gen_helper_vamominud_v_d,
>> +        gen_helper_vamomaxud_v_d
>> +    };
>> +#endif
>> +
>> +    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
>> +        gen_helper_exit_atomic(cpu_env);
>> +        s->base.is_jmp = DISAS_NORETURN;
>> +        return true;
>> +    } else {
>> +        fn = fnsw[seq];
>> +#ifdef TARGET_RISCV64
>> +        if (s->sew == 3) {
>> +            fn = fnsd[seq];
>> +        }
>> +#endif
>> +    }
>> +
>> +    data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
>> +    data = FIELD_DP32(data, VDATA, VM, a->vm);
>> +    data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
>> +    data = FIELD_DP32(data, VDATA, WD, a->wd);
>> +    return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
>> +}
>> +/*
>> + * There are two rules checked here.
>> + *
>> + * 1. SEW must be at least as wide as the AMO memory element size.
>> + *
>> + * 2. If SEW is greater than XLEN, an illegal instruction exception is raised.
>> + */
>> +static bool amo_check(DisasContext *s, arg_rwdvm* a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV | RVA) &&
>> +            (!a->wd || vext_check_overlap_mask(s, a->rd, a->vm, false)) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_reg(s, a->rs2, false) &&
>> +            ((1 << s->sew) <= sizeof(target_ulong)) &&
>> +            ((1 << s->sew) >= 4));
>> +}
>> +
>> +GEN_VEXT_TRANS(vamoswapw_v, 0, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoaddw_v, 1, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoxorw_v, 2, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoandw_v, 3, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoorw_v, 4, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominw_v, 5, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxw_v, 6, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominuw_v, 7, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxuw_v, 8, rwdvm, amo_op, amo_check)
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_TRANS(vamoswapd_v, 9, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoaddd_v, 10, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoxord_v, 11, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoandd_v, 12, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamoord_v, 13, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomind_v, 14, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
>> +GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
>> +#endif
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 3841301b74..f9b409b169 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -94,6 +94,11 @@ static inline uint32_t vext_lmul(uint32_t desc)
>>       return FIELD_EX32(simd_data(desc), VDATA, LMUL);
>>   }
>>
>> +static uint32_t vext_wd(uint32_t desc)
>> +{
>> +    return (simd_data(desc) >> 11) & 0x1;
>> +}
>> +
>>   /*
>>    * Get vector group length in bytes. Its range is [64, 2048].
>>    *
>> @@ -685,3 +690,141 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW, ldhu_w, clearl)
>>   GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW, ldhu_d, clearq)
>>   GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL, ldwu_w, clearl)
>>   GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL, ldwu_d, clearq)
>> +
>> +/*
>> + *** Vector AMO Operations (Zvamo)
>> + */
>> +typedef void (*vext_amo_noatomic_fn)(void *vs3, target_ulong addr,
>> +        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
>> +
>> +/* no atomic operation for vector atomic instructions */
>> +#define DO_SWAP(N, M) (M)
>> +#define DO_AND(N, M)  (N & M)
>> +#define DO_XOR(N, M)  (N ^ M)
>> +#define DO_OR(N, M)   (N | M)
>> +#define DO_ADD(N, M)  (N + M)
> Why don't these need to be atomic?
>
>> +
>> +#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF) \
>> +static void vext_##NAME##_noatomic_op(void *vs3,                \
>> +            target_ulong addr, uint32_t wd, uint32_t idx,       \
>> +                CPURISCVState *env, uintptr_t retaddr)          \
>> +{                                                               \
>> +    typedef int##ESZ##_t ETYPE;                                 \
>> +    typedef int##MSZ##_t MTYPE;                                 \
>> +    typedef uint##MSZ##_t UMTYPE __attribute__((unused));       \
>> +    ETYPE *pe3 = (ETYPE *)vs3 + H(idx);                         \
>> +    MTYPE a = *pe3, b = cpu_ld##SUF##_data(env, addr);          \
>> +    a = DO_OP(a, b);                                            \
>> +    cpu_st##SUF##_data(env, addr, a);                           \
>> +    if (wd) {                                                   \
>> +        *pe3 = a;                                               \
>> +    }                                                           \
>> +}
>> +
>> +/* Signed min/max */
>> +#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
>> +#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
>> +
>> +/* Unsigned min/max */
>> +#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
>> +#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
>> +
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, 32, 32, H4, DO_SWAP, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  32, 32, H4, DO_ADD,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  32, 32, H4, DO_XOR,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  32, 32, H4, DO_AND,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   32, 32, H4, DO_OR,   l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  32, 32, H4, DO_MIN,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  32, 32, H4, DO_MAX,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, 32, 32, H4, DO_MINU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, 32, 32, H4, DO_MAXU, l)
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, 64, 32, H8, DO_SWAP, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, 64, 64, H8, DO_SWAP, q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  64, 32, H8, DO_ADD,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  64, 64, H8, DO_ADD,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  64, 32, H8, DO_XOR,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  64, 64, H8, DO_XOR,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  64, 32, H8, DO_AND,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  64, 64, H8, DO_AND,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   64, 32, H8, DO_OR,   l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   64, 64, H8, DO_OR,   q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  64, 32, H8, DO_MIN,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  64, 64, H8, DO_MIN,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  64, 32, H8, DO_MAX,  l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  64, 64, H8, DO_MAX,  q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, 64, 32, H8, DO_MINU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, 64, 64, H8, DO_MINU, q)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
>> +GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
>> +#endif
> I'm confused why these are different to
>
>> +
>> +static inline void vext_amo_noatomic(void *vs3, void *v0, target_ulong base,
>> +        void *vs2, CPURISCVState *env, uint32_t desc,
>> +        vext_get_index_addr get_index_addr,
>> +        vext_amo_noatomic_fn noatomic_op,
>> +        vext_ld_clear_elem clear_elem,
>> +        uint32_t esz, uint32_t msz, uintptr_t ra)
>> +{
>> +    uint32_t i;
>> +    target_long addr;
>> +    uint32_t wd = vext_wd(desc);
>> +    uint32_t vm = vext_vm(desc);
>> +    uint32_t mlen = vext_mlen(desc);
>> +    uint32_t vlmax = vext_maxsz(desc) / esz;
>> +
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_LOAD);
>> +        probe_pages(env, get_index_addr(base, i, vs2), msz, ra, MMU_DATA_STORE);
>> +    }
>> +    for (i = 0; i < env->vl; i++) {
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
>> +            continue;
>> +        }
>> +        addr = get_index_addr(base, i, vs2);
>> +        noatomic_op(vs3, addr, wd, i, env, ra);
>> +    }
>> +    clear_elem(vs3, env->vl, env->vl * esz, vlmax * esz);
>> +}
>> +
>> +#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, INDEX_FN, CLEAR_FN)    \
>> +void HELPER(NAME)(void *vs3, void *v0, target_ulong base,       \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)           \
>> +{                                                               \
>> +    vext_amo_noatomic(vs3, v0, base, vs2, env, desc,            \
>> +        INDEX_FN, vext_##NAME##_noatomic_op, CLEAR_FN,          \
>> +        sizeof(ETYPE), sizeof(MTYPE), GETPC());                 \
>> +}
>> +
>> +#ifdef TARGET_RISCV64
>> +GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  idx_d, clearq)
>> +GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, idx_d, clearq)
>> +GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, idx_d, clearq)
>> +#endif
>> +GEN_VEXT_AMO(vamoswapw_v_w, int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoaddw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoxorw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoandw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamoorw_v_w,   int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamominw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamomaxw_v_w,  int32_t,  int32_t,  idx_w, clearl)
>> +GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, idx_w, clearl)
>> +GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
> These?
GEN_VEXT_AMO generates the AMO helper functions.
GEN_VEXT_AMO_NOATOMIC_OP generates a function that processes just one
element of the vector register group, which is then used by those
helper functions.

Previously there were two macros for the element operation,
GEN_VEXT_AMO_ATOMIC_OP and GEN_VEXT_AMO_NOATOMIC_OP, so two ops were
available for the elements of an AMO instruction: when running in
parallel mode the ATOMIC_OP was used, otherwise the NOATOMIC_OP.

However, GEN_VEXT_AMO_ATOMIC_OP needs to call some atomic helpers in
the TCG runtime. Those helpers are designed for the atomic TCG IR, so
I can only use them very cautiously.

As Richard suggested, I should not use the atomic helpers in the
vector helpers. An atomic op is instead replaced by an atomic exit
exception plus a non-atomic op.

That's why it is named GEN_VEXT_AMO_NOATOMIC_OP. If proper atomic APIs
are created in the future, I will also add a GEN_VEXT_AMO_ATOMIC_OP
macro here.

Zhiwei

>
> Alistair
>
>> --
>> 2.23.0
>>



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 03/60] target/riscv: support vector extension csr
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  1:11     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:11 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> The v0.7.1 specification does not define vector status within mstatus.
> A future revision will define the privileged portion of the vector status.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu_bits.h | 15 +++++++++
>  target/riscv/csr.c      | 75 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 89 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 03/60] target/riscv: support vector extension csr
@ 2020-03-14  1:11     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:11 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> The v0.7.1 specification does not define vector status within mstatus.
> A future revision will define the privileged portion of the vector status.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu_bits.h | 15 +++++++++
>  target/riscv/csr.c      | 75 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 89 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 04/60] target/riscv: add vector configure instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  1:14     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:14 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> +{
> +    TCGv s1, s2, dst;
> +    s2 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
> +    if (a->rs1 == 0) {
> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> +        s1 = tcg_const_tl(RV_VLEN_MAX);
> +    } else {
> +        s1 = tcg_temp_new();
> +        gen_get_gpr(s1, a->rs1);
> +    }
> +    gen_get_gpr(s2, a->rs2);
> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> +    gen_set_gpr(a->rd, dst);
> +    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
> +    exit_tb(ctx);

You can use lookup_and_goto_ptr here.  But either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
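
For illustration, a minimal sketch of how the suggested change could look,
assuming the rest of trans_vsetvl stays as quoted above (this is a sketch,
not code from the patch; temporaries would still be freed as before):

    gen_get_gpr(s2, a->rs2);
    gen_helper_vsetvl(dst, cpu_env, s1, s2);
    gen_set_gpr(a->rd, dst);
    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
    /* vsetvl changes vl/vtype, so leave the TB via an indirect lookup
       rather than a direct chained exit. */
    tcg_gen_lookup_and_goto_ptr();
    ctx->base.is_jmp = DISAS_NORETURN;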



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 04/60] target/riscv: add vector configure instruction
@ 2020-03-14  1:14     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:14 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> +{
> +    TCGv s1, s2, dst;
> +    s2 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
> +    if (a->rs1 == 0) {
> +        /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> +        s1 = tcg_const_tl(RV_VLEN_MAX);
> +    } else {
> +        s1 = tcg_temp_new();
> +        gen_get_gpr(s1, a->rs1);
> +    }
> +    gen_get_gpr(s2, a->rs2);
> +    gen_helper_vsetvl(dst, cpu_env, s1, s2);
> +    gen_set_gpr(a->rd, dst);
> +    tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
> +    exit_tb(ctx);

You can use lookup_and_goto_ptr here.  But either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-13 21:32       ` LIU Zhiwei
@ 2020-03-14  1:26         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:26 UTC (permalink / raw)
  To: LIU Zhiwei, Alistair Francis
  Cc: guoren, open list:RISC-V, qemu-devel@nongnu.org Developers,
	wxy194768, Chih-Min Chao, wenmeng_zhang, Palmer Dabbelt

On 3/13/20 2:32 PM, LIU Zhiwei wrote:
>>> +/* check functions */
>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>> +{
>>> +    return !s->vill && ((s->misa & isa) == isa);
>>> +}
>> I don't think we need a new function to check ISA.
> I don't think so.
> 
> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not proper in this
> file,
> as it is in translation time and  usually DisasContext   is used here instead
> of CPURISCVState.

In translate.c we have has_ext() for this purpose.

I think you don't need to test has_ext(s, RVV) at all,
because in cpu_get_tb_cpu_state(), you already tested
RVV, and set VILL if RVV was not present.

Thus testing vill here is sufficient.  A comment here
to remind us of that fact would be appropriate.
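
For illustration, with the RVV test folded into vill the check could be
reduced to something like this (a sketch only, keeping the existing
function name but dropping the isa parameter):

    /*
     * cpu_get_tb_cpu_state() sets vill whenever RVV is absent, so
     * testing vill alone also covers the RVV extension check.
     */
    static bool vext_check_isa_ill(DisasContext *s)
    {
        return !s->vill;
    }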

For those few cases where you have an extension beyond
RVV, e.g. amo_check() I think you should simply use
has_ext() like so:

static bool amo_check(DisasContext *s, arg_rwdvm *a)
{
    return (!s->vill &&
            has_ext(s, RVA) &&
            ...);
}


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-14  1:26         ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:26 UTC (permalink / raw)
  To: LIU Zhiwei, Alistair Francis
  Cc: Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang, wxy194768, guoren,
	qemu-devel@nongnu.org Developers, open list:RISC-V

On 3/13/20 2:32 PM, LIU Zhiwei wrote:
>>> +/* check functions */
>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>> +{
>>> +    return !s->vill && ((s->misa & isa) == isa);
>>> +}
>> I don't think we need a new function to check ISA.
> I don't think so.
> 
> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not proper in this
> file,
> as it is in translation time and  usually DisasContext   is used here instead
> of CPURISCVState.

In translate.c we have has_ext() for this purpose.

I think you don't need to test has_ext(s, RVV) at all,
because in cpu_get_tb_cpu_state(), you already tested
RVV, and set VILL if RVV was not present.

Thus testing vill here is sufficient.  A comment here
to remind us of that fact would be appropriate.

For those few cases where you have an extension beyond
RVV, e.g. amo_check() I think you should simply use
has_ext() like so:

static bool amo_check(DisasContext *s, arg_rwdvm *a)
{
    return (!s->vill &&
            has_ext(s, RVA) &&
            ...);
}


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  1:36     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:36 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Vector strided operations access the first memory element at the base address,
> and then access subsequent elements at address increments given by the byte
> offset contained in the x register specified by rs2.
> 
> Vector unit-stride operations access elements stored contiguously in memory
> starting from the base effective address. It can be seen as a special
> case of strided operations.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.h                      |   6 +
>  target/riscv/helper.h                   | 105 ++++++
>  target/riscv/insn32.decode              |  32 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>  target/riscv/translate.c                |   7 +
>  target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>  6 files changed, 896 insertions(+)

With the changes for has_ext,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-14  1:36     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:36 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Vector strided operations access the first memory element at the base address,
> and then access subsequent elements at address increments given by the byte
> offset contained in the x register specified by rs2.
> 
> Vector unit-stride operations access elements stored contiguously in memory
> starting from the base effective address. It can be seen as a special
> case of strided operations.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/cpu.h                      |   6 +
>  target/riscv/helper.h                   | 105 ++++++
>  target/riscv/insn32.decode              |  32 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 340 ++++++++++++++++++++
>  target/riscv/translate.c                |   7 +
>  target/riscv/vector_helper.c            | 406 ++++++++++++++++++++++++
>  6 files changed, 896 insertions(+)

With the changes for has_ext,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
  2020-03-14  1:26         ` Richard Henderson
@ 2020-03-14  1:49           ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  1:49 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis
  Cc: guoren, open list:RISC-V, qemu-devel@nongnu.org Developers,
	wxy194768, Chih-Min Chao, wenmeng_zhang, Palmer Dabbelt



On 2020/3/14 9:26, Richard Henderson wrote:
> On 3/13/20 2:32 PM, LIU Zhiwei wrote:
>>>> +/* check functions */
>>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>>> +{
>>>> +    return !s->vill && ((s->misa & isa) == isa);
>>>> +}
>>> I don't think we need a new function to check ISA.
>> I don't think so.
>>
>> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not proper in this
>> file,
>> as it is in translation time and  usually DisasContext   is used here instead
>> of CPURISCVState.
> In translate.c we have has_ext() for this purpose.
Yes, I will use it.
> I think you don't need to test has_ext(s, RVV) at all,
> because in cpu_get_tb_cpu_state(), you already tested
> RVV, and set VILL if RVV was not present.
>
> Thus testing vill here is sufficient.  A comment here
> to remind us of that fact would be appropriate.
Yes, I forgot it. I will keep the function and add a comment.
> For those few cases where you have an extension beyond
> RVV, e.g. amo_check() I think you should simply use
> has_ext() like so:
>
> static bool amo_check(DisasContext *s, arg_rwdvm *a)
> {
>      return (!s->vill &&
>              has_ext(s, RVA) &&
>              ...);
> }
Yes, I will fix it in that patch.
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 05/60] target/riscv: add vector stride load and store instructions
@ 2020-03-14  1:49           ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  1:49 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis
  Cc: Chih-Min Chao, Palmer Dabbelt, wenmeng_zhang, wxy194768, guoren,
	qemu-devel@nongnu.org Developers, open list:RISC-V



On 2020/3/14 9:26, Richard Henderson wrote:
> On 3/13/20 2:32 PM, LIU Zhiwei wrote:
>>>> +/* check functions */
>>>> +static bool vext_check_isa_ill(DisasContext *s, target_ulong isa)
>>>> +{
>>>> +    return !s->vill && ((s->misa & isa) == isa);
>>>> +}
>>> I don't think we need a new function to check ISA.
>> I don't think so.
>>
>> Although there is a riscv_has_ext(env, isa) in cpu.h, it is not proper in this
>> file,
>> as it is in translation time and  usually DisasContext   is used here instead
>> of CPURISCVState.
> In translate.c we have has_ext() for this purpose.
Yes, I will use it.
> I think you don't need to test has_ext(s, RVV) at all,
> because in cpu_get_tb_cpu_state(), you already tested
> RVV, and set VILL if RVV was not present.
>
> Thus testing vill here is sufficient.  A comment here
> to remind us of that fact would be appropriate.
Yes, I forgot it. I will keep the function and add a comment.
> For those few cases where you have an extension beyond
> RVV, e.g. amo_check() I think you should simply use
> has_ext() like so:
>
> static bool amo_check(DisasContext *s, arg_rwdvm *a)
> {
>      return (!s->vill &&
>              has_ext(s, RVA) &&
>              ...);
> }
Yes, I will fix it in that patch.
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 06/60] target/riscv: add vector index load and store instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  1:49     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:49 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static inline void vext_ldst_index(void *vd, void *v0, target_ulong base,
> +        void *vs2, CPURISCVState *env, uint32_t desc,
> +        vext_get_index_addr get_index_addr,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access*/
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, get_index_addr(base, i, vs2), nf * msz, ra,
> +                access_type);

Indentation.

> +    /* load bytes from guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }

Why the odd formulation with k?

> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }

Using a for is certainly a bit clearer.

Which does bring to mind an optimization -- letting the compiler know that
these loops always go at least once.

We can do that either by writing all of them as do { } while.
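
For illustration, the do/while form of the inner nf loop quoted above
would be something like (sketch only):

    k = 0;
    do {
        abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
        ldst_elem(env, addr, i + k * vlmax, vd, ra);
        k++;
    } while (k < nf);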

Or by encoding NF in desc like the instruction does:

static inline uint32_t vext_nf(uint32_t desc)
{
    return FIELD_EX32(simd_data(desc), VDATA, NF) + 1;
}

which will let the compiler know that NF >= 1.

But that's minor, and we can look at these sorts of things later.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 06/60] target/riscv: add vector index load and store instructions
@ 2020-03-14  1:49     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:49 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static inline void vext_ldst_index(void *vd, void *v0, target_ulong base,
> +        void *vs2, CPURISCVState *env, uint32_t desc,
> +        vext_get_index_addr get_index_addr,
> +        vext_ldst_elem_fn ldst_elem,
> +        vext_ld_clear_elem clear_elem,
> +        uint32_t esz, uint32_t msz, uintptr_t ra,
> +        MMUAccessType access_type)
> +{
> +    uint32_t i, k;
> +    uint32_t nf = vext_nf(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vlmax = vext_maxsz(desc) / esz;
> +
> +    if (env->vl == 0) {
> +        return;
> +    }
> +    /* probe every access*/
> +    for (i = 0; i < env->vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        probe_pages(env, get_index_addr(base, i, vs2), nf * msz, ra,
> +                access_type);

Indentation.

> +    /* load bytes from guest memory */
> +    for (i = 0; i < env->vl; i++) {
> +        k = 0;
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {
> +            continue;
> +        }
> +        while (k < nf) {
> +            abi_ptr addr = get_index_addr(base, i, vs2) + k * msz;
> +            ldst_elem(env, addr, i + k * vlmax, vd, ra);
> +            k++;
> +        }

Why the odd formulation with k?

> +        for (k = 0; k < nf; k++) {
> +            clear_elem(vd, env->vl + k * vlmax, env->vl * esz, vlmax * esz);
> +        }

Using a for is certainly a bit clearer.

Which does bring to mind an optimization -- letting the compiler know that
these loops always go at least once.

We can do that either by writing all of them as do { } while.

Or by encoding NF in desc like the instruction does:

static inline uint32_t vext_nf(uint32_t desc)
{
    return FIELD_EX32(simd_data(desc), VDATA, NF) + 1;
}

which will let the compiler know that NF >= 1.

But that's minor, and we can look at these sorts of things later.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  1:50     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> The unit-stride fault-only-first load instructions are used to
> vectorize loops with data-dependent exit conditions (while loops).
> These instructions execute as a regular load except that they
> will only take a trap on element 0.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  22 +++++
>  target/riscv/insn32.decode              |   7 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  69 +++++++++++++++
>  target/riscv/vector_helper.c            | 111 ++++++++++++++++++++++++
>  4 files changed, 209 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load
@ 2020-03-14  1:50     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  1:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> The unit-stride fault-only-first load instructions are used to
> vectorize loops with data-dependent exit conditions (while loops).
> These instructions execute as a regular load except that they
> will only take a trap on element 0.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  22 +++++
>  target/riscv/insn32.decode              |   7 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  69 +++++++++++++++
>  target/riscv/vector_helper.c            | 111 ++++++++++++++++++++++++
>  4 files changed, 209 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  4:28     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  4:28 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +    static gen_helper_amo *const fnsw[9] = {
...
> +    static gen_helper_amo *const fnsd[18] = {
...
> +        fn = fnsw[seq];
> +#ifdef TARGET_RISCV64
> +        if (s->sew == 3) {
> +            fn = fnsd[seq];
> +        }
> +#endif

This indexing is wrong, since for seq == 11 you index past the end of fnsw[].

You need something like

    if (s->sew == 3) {
#ifdef TARGET_RISCV64
        fn = fnsd[seq];
#else
        /* Check done in amo_check(). */
        g_assert_not_reached();
#endif
    } else {
        fn = fnsw[seq];
    }

Otherwise it looks ok.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
@ 2020-03-14  4:28     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  4:28 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +    static gen_helper_amo *const fnsw[9] = {
...
> +    static gen_helper_amo *const fnsd[18] = {
...
> +        fn = fnsw[seq];
> +#ifdef TARGET_RISCV64
> +        if (s->sew == 3) {
> +            fn = fnsd[seq];
> +        }
> +#endif

This indexing is wrong, since for seq == 11 you index past the end of fnsw[].

You need something like

    if (s->sew == 3) {
#ifdef TARGET_RISCV64
        fn = fnsd[seq];
#else
        /* Check done in amo_check(). */
        g_assert_not_reached();
#endif
    } else {
        fn = fnsw[seq];
    }

Otherwise it looks ok.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
  2020-03-14  4:28     ` Richard Henderson
@ 2020-03-14  5:07       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  5:07 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 12:28, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +    static gen_helper_amo *const fnsw[9] = {
> ...
>> +    static gen_helper_amo *const fnsd[18] = {
> ...
>> +        fn = fnsw[seq];
>> +#ifdef TARGET_RISCV64
>> +        if (s->sew == 3) {
>> +            fn = fnsd[seq];
>> +        }
>> +#endif
> This indexing is wrong, since for seq == 11 you index past the end of fnsw[].
Yes, it really is a security bug.  Thanks for pointing that out.

Zhiwei
> You need something like
>
>      if (s->sew == 3) {
> #ifdef TARGET_RISCV64
>          fn = fnsd[seq];
> #else
>          /* Check done in amo_check(). */
>          g_assert_not_reached();
> #endif
>      } else {
>          fn = fnsw[seq];
>      }
> Otherwise it looks ok.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 08/60] target/riscv: add vector amo operations
@ 2020-03-14  5:07       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  5:07 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 12:28, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +    static gen_helper_amo *const fnsw[9] = {
> ...
>> +    static gen_helper_amo *const fnsd[18] = {
> ...
>> +        fn = fnsw[seq];
>> +#ifdef TARGET_RISCV64
>> +        if (s->sew == 3) {
>> +            fn = fnsd[seq];
>> +        }
>> +#endif
> This indexing is wrong, since for seq == 11 you index past the end of fnsw[].
Yes, it really is a security bug.  Thanks for pointing that out.

Zhiwei
> You need something like
>
>      if (s->sew == 3) {
> #ifdef TARGET_RISCV64
>          fn = fnsd[seq];
> #else
>          /* Check done in amo_check(). */
>          g_assert_not_reached();
> #endif
>      } else {
>          fn = fnsw[seq];
>      }
> Otherwise it looks ok.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  5:25     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:25 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +    if (a->vm && s->vl_eq_vlmax) {                                 \
> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
> +            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
> +            MAXSZ(s), MAXSZ(s));                                   \

The first argument here should be just s->sew.
You should have see the assert fire:

    tcg_debug_assert(vece <= MO_64);

It would be nice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a function,
and pass in tcg_gen_gvec_* as a function pointer, and fns as a pointer.

In general, I prefer the functions that are generated by macros like this to
have exactly one executable statement -- the call to the helper that does all
of the work using the arguments provided.  That way a maximum number of lines
are available for stepping with the debugger.
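
For illustration, such a shared function might look roughly like this
(a sketch only: do_opivv_gvec, gen_gvec_fn and gen_helper_opivv are
hypothetical names, and opivv_trans() is assumed to have the same shape
as the opivx_trans() call quoted further below; the trans_* wrapper is
assumed to have already run its check):

    typedef void gen_gvec_fn(unsigned, uint32_t, uint32_t, uint32_t,
                             uint32_t, uint32_t);
    typedef void gen_helper_opivv(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr,
                                  TCGv_env, TCGv_i32);

    static bool do_opivv_gvec(DisasContext *s, arg_rmrr *a,
                              gen_gvec_fn *gvec_fn, gen_helper_opivv *fn)
    {
        uint32_t data = 0;

        if (a->vm && s->vl_eq_vlmax) {
            /* in-line gvec expansion; note vece is just s->sew */
            gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
                    vreg_ofs(s, a->rs1), MAXSZ(s), MAXSZ(s));
            return true;
        }
        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);
        data = FIELD_DP32(data, VDATA, VM, a->vm);
        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
        /* fall back to the out-of-line helper */
        return opivv_trans(a->rd, a->rs1, a->rs2, data, fn, s);
    }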

> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \

Why are these replicated in each trans_* function, and not done in opiv?_trans,
where the rest of the descriptor is created?

> +/* OPIVX without GVEC IR */
> +#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
> +{                                                                        \
> +    if (CHECK(s, a)) {                                                   \
> +        uint32_t data = 0;                                               \
> +        static gen_helper_opivx const fns[4] = {                         \
> +            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
> +            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
> +        };                                                               \
> +                                                                         \
> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
> +        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
> +    }                                                                    \
> +    return false;                                                        \
> +}
> +
> +GEN_OPIVX_TRANS(vrsub_vx, opivx_check)

Note that you *can* generate vector code for this,
you just have to write your own helpers.

E.g.

static void gen_vec_rsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
{
    tcg_gen_vec_sub8_i64(d, b, a);
}
// etc, reversing the arguments and passing on to sub.

static const GVecGen2s rsub_op[4] = {
    { .fni8 = tcg_gen_vec_rsub8_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs8,
      .opt_opc = vecop_list_sub,
      .vece = MO_8 },
    { .fni8 = tcg_gen_vec_rsub16_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs16,
      .opt_opc = vecop_list_sub,
      .vece = MO_16 },
    { .fni4 = tcg_gen_rsub_i32,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs32,
      .opt_opc = vecop_list_sub,
      .vece = MO_32 },
    { .fni8 = tcg_gen_rsub_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs64,
      .opt_opc = vecop_list_sub,
      .prefer_i64 = TCG_TARGET_REG_BITS == 64,
      .vece = MO_64 },
};

static void gen_gvec_rsubs(unsigned vece, uint32_t dofs,
    uint32_t aofs, TCGv_i64 c,
    uint32_t oprsz, uint32_t maxsz)
{
    tcg_debug_assert(vece <= MO_64);
    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
}

static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
    uint32_t aofs, int64_t c,
    uint32_t oprsz, uint32_t maxsz)
{
    tcg_debug_assert(vece <= MO_64);
    tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
}

> +/* generate the helpers for OPIVV */
> +#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
> +{                                                         \
> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
> +    uint32_t mlen = vext_mlen(desc);                      \
> +    uint32_t vm = vext_vm(desc);                          \
> +    uint32_t vl = env->vl;                                \
> +    uint32_t i;                                           \
> +    for (i = 0; i < vl; i++) {                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
> +            continue;                                     \
> +        }                                                 \
> +        do_##NAME(vd, vs1, vs2, i);                       \
> +    }                                                     \
> +    if (i != 0) {                                         \
> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
> +    }                                                     \
> +}
> +
> +GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
> +GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
> +GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
> +GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)

The body of GEN_VEXT_VV can be an inline function, calling the helper functions
that you generated above.
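
For illustration, a sketch of that shared body (do_vext_vv, opivv2_fn and
clear_fn are hypothetical names; the per-element do_* functions are the
ones generated earlier in the patch, and the clear functions are assumed
to keep the four-argument form used above):

    typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
    typedef void clear_fn(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot);

    static inline void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
                                  CPURISCVState *env, uint32_t desc,
                                  uint32_t esz, uint32_t dsz,
                                  opivv2_fn *fn, clear_fn *clearfn)
    {
        uint32_t vlmax = vext_maxsz(desc) / esz;
        uint32_t mlen = vext_mlen(desc);
        uint32_t vm = vext_vm(desc);
        uint32_t vl = env->vl;
        uint32_t i;

        for (i = 0; i < vl; i++) {
            if (!vm && !vext_elem_mask(v0, mlen, i)) {
                continue;
            }
            fn(vd, vs1, vs2, i);
        }
        if (i != 0) {
            clearfn(vd, vl, vl * dsz, vlmax * dsz);
        }
    }

    /* GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN) then shrinks to a single
       call to do_vext_vv() with do_##NAME and CLEAR_FN as the pointers. */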

> +/*
> + * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
> + * So (target_long)s1 is needed. (T1)(target_long)s1 gives the real operator type.
> + * (TX1)(T1)(target_long)s1 expands the operator type of widen operations
> + * or narrow operations
> + */
> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
> +}

Why not just make the type of s1 be target_long in the parameter?
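
For illustration, only the parameter type and the cast would change
(a sketch of the suggestion, not code from the patch):

    #define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)              \
    static void do_##NAME(void *vd, target_long s1, void *vs2, int i)    \
    {                                                                     \
        TX2 s2 = *((T2 *)vs2 + HS2(i));                                   \
        *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                        \
    }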

> +/* generate the helpers for instructions with one vector and one scalar */
> +#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
> +{                                                         \
> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
> +    uint32_t mlen = vext_mlen(desc);                      \
> +    uint32_t vm = vext_vm(desc);                          \
> +    uint32_t vl = env->vl;                                \
> +    uint32_t i;                                           \
> +                                                          \
> +    for (i = 0; i < vl; i++) {                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
> +            continue;                                     \
> +        }                                                 \
> +        do_##NAME(vd, s1, vs2, i);                        \
> +    }                                                     \
> +    if (i != 0) {                                         \
> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
> +    }                                                     \
> +}

Likewise an inline function.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
@ 2020-03-14  5:25     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:25 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +    if (a->vm && s->vl_eq_vlmax) {                                 \
> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
> +            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
> +            MAXSZ(s), MAXSZ(s));                                   \

The first argument here should be just s->sew.
You should have seen the assert fire:

    tcg_debug_assert(vece <= MO_64);

It would be nice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a function,
and pass in tcg_gen_gvec_* as a function pointer, and fns as a pointer.

In general, I prefer the functions that are generated by macros like this to
have exactly one executable statement -- the call to the helper that does all
of the work using the arguments provided.  That way a maximum number of lines
are available for stepping with the debugger.

> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \

Why are these replicated in each trans_* function, and not done in opiv?_trans,
where the rest of the descriptor is created?

> +/* OPIVX without GVEC IR */
> +#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
> +{                                                                        \
> +    if (CHECK(s, a)) {                                                   \
> +        uint32_t data = 0;                                               \
> +        static gen_helper_opivx const fns[4] = {                         \
> +            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
> +            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
> +        };                                                               \
> +                                                                         \
> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
> +        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
> +    }                                                                    \
> +    return false;                                                        \
> +}
> +
> +GEN_OPIVX_TRANS(vrsub_vx, opivx_check)

Note that you *can* generate vector code for this,
you just have to write your own helpers.

E.g.

static void gen_vec_rsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
{
    tcg_gen_vec_sub8_i64(d, b, a);
}
// etc, reversing the arguments and passing on to sub.

static const GVecGen2s rsub_op[4] = {
    { .fni8 = tcg_gen_vec_rsub8_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs8,
      .opt_opc = vecop_list_sub,
      .vece = MO_8 },
    { .fni8 = tcg_gen_vec_rsub16_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs16,
      .opt_opc = vecop_list_sub,
      .vece = MO_16 },
    { .fni4 = tcg_gen_rsub_i32,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs32,
      .opt_opc = vecop_list_sub,
      .vece = MO_32 },
    { .fni8 = tcg_gen_rsub_i64,
      .fniv = tcg_gen_rsub_vec,
      .fno = gen_helper_gvec_rsubs64,
      .opt_opc = vecop_list_sub,
      .prefer_i64 = TCG_TARGET_REG_BITS == 64,
      .vece = MO_64 },
};

static void gen_gvec_rsubs(unsigned vece, uint32_t dofs,
    uint32_t aofs, TCGv_i64 c,
    uint32_t oprsz, uint32_t maxsz)
{
    tcg_debug_assert(vece <= MO_64);
    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
}

static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
    uint32_t aofs, int64_t c,
    uint32_t oprsz, uint32_t maxsz)
{
    tcg_debug_assert(vece <= MO_64);
    tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
}

> +/* generate the helpers for OPIVV */
> +#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
> +{                                                         \
> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
> +    uint32_t mlen = vext_mlen(desc);                      \
> +    uint32_t vm = vext_vm(desc);                          \
> +    uint32_t vl = env->vl;                                \
> +    uint32_t i;                                           \
> +    for (i = 0; i < vl; i++) {                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
> +            continue;                                     \
> +        }                                                 \
> +        do_##NAME(vd, vs1, vs2, i);                       \
> +    }                                                     \
> +    if (i != 0) {                                         \
> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
> +    }                                                     \
> +}
> +
> +GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
> +GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
> +GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
> +GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)

The body of GEN_VEXT_VV can be an inline function, calling the helper functions
that you generated above.
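
A rough sketch of that shape, assuming an opivv2_fn typedef matching the
do_##NAME functions above and a clear_fn typedef matching
clearb/clearh/clearl/clearq:

typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
typedef void clear_fn(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot);

static inline void
do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
           CPURISCVState *env, uint32_t desc,
           uint32_t esz, uint32_t dsz, opivv2_fn *fn, clear_fn *clearfn)
{
    uint32_t vlmax = vext_maxsz(desc) / esz;
    uint32_t mlen = vext_mlen(desc);
    uint32_t vm = vext_vm(desc);
    uint32_t vl = env->vl;
    uint32_t i;

    for (i = 0; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        fn(vd, vs1, vs2, i);
    }
    if (i != 0) {
        clearfn(vd, vl, vl * dsz, vlmax * dsz);
    }
}

#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)                   \
void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
                  CPURISCVState *env, uint32_t desc)            \
{                                                               \
    do_vext_vv(vd, v0, vs1, vs2, env, desc, ESZ, DSZ,           \
               do_##NAME, CLEAR_FN);                            \
}

GEN_VEXT_VX below would follow the same pattern, with a target_ulong s1
parameter in place of vs1.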

> +/*
> + * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
> + * So (target_long)s1 is needed. (T1)(target_long)s1 gives the real operator type.
> + * (TX1)(T1)(target_long)s1 expands the operator type of widen operations
> + * or narrow operations
> + */
> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
> +}

Why not just make the type of s1 be target_long in the parameter?

> +/* generate the helpers for instructions with one vector and one scalar */
> +#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
> +{                                                         \
> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
> +    uint32_t mlen = vext_mlen(desc);                      \
> +    uint32_t vm = vext_vm(desc);                          \
> +    uint32_t vl = env->vl;                                \
> +    uint32_t i;                                           \
> +                                                          \
> +    for (i = 0; i < vl; i++) {                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
> +            continue;                                     \
> +        }                                                 \
> +        do_##NAME(vd, s1, vs2, i);                        \
> +    }                                                     \
> +    if (i != 0) {                                         \
> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
> +    }                                                     \
> +}

Likewise an inline function.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 10/60] target/riscv: vector widening integer add and subtract
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  5:32     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:32 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  49 ++++++++
>  target/riscv/insn32.decode              |  16 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 154 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 112 +++++++++++++++++
>  4 files changed, 331 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 10/60] target/riscv: vector widening integer add and subtract
@ 2020-03-14  5:32     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:32 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  49 ++++++++
>  target/riscv/insn32.decode              |  16 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 154 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 112 +++++++++++++++++
>  4 files changed, 331 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  5:58     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:58 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define DO_MADC(N, M, C) ((__typeof(N))(N + M + C) < N ? 1 : 0)

Incorrect.  E.g. N = 1, M = UINT_MAX, C = 1, adds to 1, which is not less than
N, despite the carry-out.

You want

    C ? N + M <= N : N + M < N

> +#define DO_MSBC(N, M, C) ((__typeof(N))(N - M - C) > N ? 1 : 0)

Similarly

    C ? N <= M : N < M
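
In macro form, folding in the "N + M + 1" correction from the follow-up
later in this thread, one possible shape is:

#define DO_MADC(N, M, C) (C ? (__typeof(N))(N + M + 1) <= N : \
                              (__typeof(N))(N + M) < N)
#define DO_MSBC(N, M, C) (C ? N <= M : N < M)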


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
@ 2020-03-14  5:58     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  5:58 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define DO_MADC(N, M, C) ((__typeof(N))(N + M + C) < N ? 1 : 0)

Incorrect.  E.g. N = 1, M = UINT_MAX, C = 1, adds to 1, which is not less than
N, despite the carry-out.

You want

    C ? N + M <= N : N + M < N

> +#define DO_MSBC(N, M, C) ((__typeof(N))(N - M - C) > N ? 1 : 0)

Similarly

    C ? N <= M : N < M


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 12/60] target/riscv: vector bitwise logical instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:00     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:00 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 25 ++++++++++++
>  target/riscv/insn32.decode              |  9 +++++
>  target/riscv/insn_trans/trans_rvv.inc.c | 11 ++++++
>  target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
>  4 files changed, 96 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 12/60] target/riscv: vector bitwise logical instructions
@ 2020-03-14  6:00     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:00 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 25 ++++++++++++
>  target/riscv/insn32.decode              |  9 +++++
>  target/riscv/insn_trans/trans_rvv.inc.c | 11 ++++++
>  target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
>  4 files changed, 96 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 13/60] target/riscv: vector single-width bit shift instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:07     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:07 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define GEN_OPIVX_GVEC_SHIFT_TRANS(NAME, GVSUF)                               \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                        \
> +{                                                                             \
> +    if (!opivx_check(s, a)) {                                                 \
> +        return false;                                                         \
> +    }                                                                         \
> +                                                                              \
> +    if (a->vm && s->vl_eq_vlmax) {                                            \
> +        TCGv_i32 src1 = tcg_temp_new_i32();                                   \
> +        TCGv tmp = tcg_temp_new();                                            \
> +        gen_get_gpr(tmp, a->rs1);                                             \
> +        tcg_gen_trunc_tl_i32(src1, tmp);                                      \
> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),                 \
> +            vreg_ofs(s, a->rs2), src1, MAXSZ(s), MAXSZ(s));                   \

Incorrect first argument.
Prefer an inline function helper.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 13/60] target/riscv: vector single-width bit shift instructions
@ 2020-03-14  6:07     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:07 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define GEN_OPIVX_GVEC_SHIFT_TRANS(NAME, GVSUF)                               \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                        \
> +{                                                                             \
> +    if (!opivx_check(s, a)) {                                                 \
> +        return false;                                                         \
> +    }                                                                         \
> +                                                                              \
> +    if (a->vm && s->vl_eq_vlmax) {                                            \
> +        TCGv_i32 src1 = tcg_temp_new_i32();                                   \
> +        TCGv tmp = tcg_temp_new();                                            \
> +        gen_get_gpr(tmp, a->rs1);                                             \
> +        tcg_gen_trunc_tl_i32(src1, tmp);                                      \
> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),                 \
> +            vreg_ofs(s, a->rs2), src1, MAXSZ(s), MAXSZ(s));                   \

Incorrect first argument.
Prefer an inline function helper.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
  2020-03-14  5:58     ` Richard Henderson
@ 2020-03-14  6:08       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  6:08 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 13:58, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +#define DO_MADC(N, M, C) ((__typeof(N))(N + M + C) < N ? 1 : 0)
> Incorrect.  E.g N = 1, M = UINT_MAX, C = 1, adds to 1, which is not less than
> N, despite the carry-out.
Yes, it really is a corner case. I should test C first.

Thanks for pointing that out.

Zhiwei
>
> You want
>
>      C ? N + M <= N : N + M < N
>
>> +#define DO_MSBC(N, M, C) ((__typeof(N))(N - M - C) > N ? 1 : 0)
> Similarly
>
>      C ? N <= M : N < M
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
@ 2020-03-14  6:08       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  6:08 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 13:58, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +#define DO_MADC(N, M, C) ((__typeof(N))(N + M + C) < N ? 1 : 0)
> Incorrect.  E.g N = 1, M = UINT_MAX, C = 1, adds to 1, which is not less than
> N, despite the carry-out.
Yes, it really is a corner case. I should test C first.

Thanks for pointing that out.

Zhiwei
>
> You want
>
>      C ? N + M <= N : N + M < N
>
>> +#define DO_MSBC(N, M, C) ((__typeof(N))(N - M - C) > N ? 1 : 0)
> Similarly
>
>      C ? N <= M : N < M
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 14/60] target/riscv: vector narrowing integer right shift instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:10     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 13 ++++
>  target/riscv/insn32.decode              |  6 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 91 +++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 14 ++++
>  4 files changed, 124 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 14/60] target/riscv: vector narrowing integer right shift instructions
@ 2020-03-14  6:10     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 13 ++++
>  target/riscv/insn32.decode              |  6 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 91 +++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 14 ++++
>  4 files changed, 124 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
  2020-03-14  5:58     ` Richard Henderson
@ 2020-03-14  6:16       ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:16 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/13/20 10:58 PM, Richard Henderson wrote:
>     C ? N + M <= N : N + M < N

Ho hum.  N + M + 1 <= N.
I'm sure you saw the typo...


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
@ 2020-03-14  6:16       ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:16 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/13/20 10:58 PM, Richard Henderson wrote:
>     C ? N + M <= N : N + M < N

Ho hum.  N + M + 1 <= N.
I'm sure you saw the typo...


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
  2020-03-14  6:16       ` Richard Henderson
@ 2020-03-14  6:32         ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  6:32 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 14:16, Richard Henderson wrote:
> On 3/13/20 10:58 PM, Richard Henderson wrote:
>>      C ? N + M <= N : N + M < N
> Ho hum.  N + M + 1 <= N.
> I'm sure you saw the typo...
>

You gave the corner case and a very precise answer.

Thanks very much.

Zhiwei
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
@ 2020-03-14  6:32         ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  6:32 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 14:16, Richard Henderson wrote:
> On 3/13/20 10:58 PM, Richard Henderson wrote:
>>      C ? N + M <= N : N + M < N
> Ho hum.  N + M + 1 <= N.
> I'm sure you saw the typo...
>

You gave the corner case and a very precise answer.

Thanks very much.

Zhiwei
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 15/60] target/riscv: vector integer comparison instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:33     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:33 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Comparison Instructions */
> +#define DO_MSEQ(N, M) ((N == M) ? 1 : 0)
> +#define DO_MSNE(N, M) ((N != M) ? 1 : 0)
> +#define DO_MSLTU(N, M) ((N < M) ? 1 : 0)
> +#define DO_MSLT(N, M) ((N < M) ? 1 : 0)
> +#define DO_MSLEU(N, M) ((N <= M) ? 1 : 0)
> +#define DO_MSLE(N, M) ((N <= M) ? 1 : 0)
> +#define DO_MSGTU(N, M) ((N > M) ? 1 : 0)
> +#define DO_MSGT(N, M) ((N > M) ? 1 : 0)

You can drop the ? 1 : 0.  You can drop the LT/LTU (etc) distinction, since
that comes from the type.
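
I.e., keeping only the signed spellings, roughly:

#define DO_MSEQ(N, M) (N == M)
#define DO_MSNE(N, M) (N != M)
#define DO_MSLT(N, M) (N < M)
#define DO_MSLE(N, M) (N <= M)
#define DO_MSGT(N, M) (N > M)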

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 15/60] target/riscv: vector integer comparison instructions
@ 2020-03-14  6:33     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:33 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Comparison Instructions */
> +#define DO_MSEQ(N, M) ((N == M) ? 1 : 0)
> +#define DO_MSNE(N, M) ((N != M) ? 1 : 0)
> +#define DO_MSLTU(N, M) ((N < M) ? 1 : 0)
> +#define DO_MSLT(N, M) ((N < M) ? 1 : 0)
> +#define DO_MSLEU(N, M) ((N <= M) ? 1 : 0)
> +#define DO_MSLE(N, M) ((N <= M) ? 1 : 0)
> +#define DO_MSGTU(N, M) ((N > M) ? 1 : 0)
> +#define DO_MSGT(N, M) ((N > M) ? 1 : 0)

You can drop the ? 1 : 0.  You can drop the LT/LTU (etc) distinction, since
that comes from the type.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 16/60] target/riscv: vector integer min/max instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:40     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:40 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Min/Max Instructions */
> +GEN_OPIVV_GVEC_TRANS(vminu_vv, umin)
> +GEN_OPIVV_GVEC_TRANS(vmin_vv,  smin)
> +GEN_OPIVV_GVEC_TRANS(vmaxu_vv, umax)
> +GEN_OPIVV_GVEC_TRANS(vmax_vv,  smax)
> +GEN_OPIVX_TRANS(vminu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmax_vx,  opivx_check)

As with rsub, it is possible to use tcg_gen_gvec_2s to produce inline
vectorizations of {u,s}{min,max}.  But that can wait, if you like.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 16/60] target/riscv: vector integer min/max instructions
@ 2020-03-14  6:40     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:40 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Min/Max Instructions */
> +GEN_OPIVV_GVEC_TRANS(vminu_vv, umin)
> +GEN_OPIVV_GVEC_TRANS(vmin_vv,  smin)
> +GEN_OPIVV_GVEC_TRANS(vmaxu_vv, umax)
> +GEN_OPIVV_GVEC_TRANS(vmax_vv,  smax)
> +GEN_OPIVX_TRANS(vminu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmax_vx,  opivx_check)

As with rsub, it is possible to use tcg_gen_gvec_2s to produce inline
vectorizations of {u,s}{min,max}.  But that can wait, if you like.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:52     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
> +{
> +    uint64_t hi_64, lo_64, abs_s2 = s2;
> +
> +    if (s2 < 0) {
> +        abs_s2 = -s2;
> +    }
> +    mulu64(&lo_64, &hi_64, abs_s2, s1);
> +    if ((int64_t)(s2 ^ s1) < 0) {

Why would the sign of s1 be relevant?
It's always unsigned.

We have code for this in e.g. tcg_gen_mulsu2_i64

    mulu64(&lo, &hi, s1, s2);
    if ((int64_t)s2 < 0) {
        hi -= s1;
    }
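
Applied to do_mulhsu_d, that gives roughly the following (a sketch, using
mulu64() from host-utils.h):

static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
{
    uint64_t hi_64, lo_64;

    mulu64(&lo_64, &hi_64, s2, s1);
    if (s2 < 0) {
        /* adjust the high part for the negative signed operand */
        hi_64 -= s1;
    }
    return hi_64;
}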


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions
@ 2020-03-14  6:52     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
> +{
> +    uint64_t hi_64, lo_64, abs_s2 = s2;
> +
> +    if (s2 < 0) {
> +        abs_s2 = -s2;
> +    }
> +    mulu64(&lo_64, &hi_64, abs_s2, s1);
> +    if ((int64_t)(s2 ^ s1) < 0) {

Why would the sign of s1 be relevant?
It's always unsigned.

We have code for this in e.g. tcg_gen_mulsu2_i64

    mulu64(&lo, &hi, s1, s2);
    if ((int64_t)s2 < 0) {
        hi -= s1;
    }


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 18/60] target/riscv: vector integer divide instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  6:58     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:58 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 33 +++++++++++
>  target/riscv/insn32.decode              |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 ++++
>  target/riscv/vector_helper.c            | 74 +++++++++++++++++++++++++
>  4 files changed, 125 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 18/60] target/riscv: vector integer divide instructions
@ 2020-03-14  6:58     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  6:58 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 33 +++++++++++
>  target/riscv/insn32.decode              |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 ++++
>  target/riscv/vector_helper.c            | 74 +++++++++++++++++++++++++
>  4 files changed, 125 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions
  2020-03-14  6:52     ` Richard Henderson
@ 2020-03-14  7:02       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  7:02 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 14:52, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
>> +{
>> +    uint64_t hi_64, lo_64, abs_s2 = s2;
>> +
>> +    if (s2 < 0) {
>> +        abs_s2 = -s2;
>> +    }
>> +    mulu64(&lo_64, &hi_64, abs_s2, s1);
>> +    if ((int64_t)(s2 ^ s1) < 0) {
> Why would the sign of s1 be relevant?
> It's always unsigned.
Yes, it is a bug. Thanks for pointing that out.

Zhiwei
> We have code for this in e.g. tcg_gen_mulsu2_i64
>
>      mulu64(&lo, &hi, s1, s2);
>      if ((int64_t)s2 < 0) {
>          hi -= s1;
>      }
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions
@ 2020-03-14  7:02       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  7:02 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 14:52, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static int64_t do_mulhsu_d(int64_t s2, uint64_t s1)
>> +{
>> +    uint64_t hi_64, lo_64, abs_s2 = s2;
>> +
>> +    if (s2 < 0) {
>> +        abs_s2 = -s2;
>> +    }
>> +    mulu64(&lo_64, &hi_64, abs_s2, s1);
>> +    if ((int64_t)(s2 ^ s1) < 0) {
> Why would the sign of s1 be relevant?
> It's always unsigned.
Yes, it is a bug. Thanks for pointing that out.

Zhiwei
> We have code for this in e.g. tcg_gen_mulsu2_i64
>
>      mulu64(&lo, &hi, s1, s2);
>      if ((int64_t)s2 < 0) {
>          hi -= s1;
>      }
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 19/60] target/riscv: vector widening integer multiply instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  7:06     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:06 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 19 +++++++++
>  target/riscv/insn32.decode              |  6 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  8 ++++
>  target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
>  4 files changed, 84 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 19/60] target/riscv: vector widening integer multiply instructions
@ 2020-03-14  7:06     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:06 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 19 +++++++++
>  target/riscv/insn32.decode              |  6 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  8 ++++
>  target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++++
>  4 files changed, 84 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 20/60] target/riscv: vector single-width integer multiply-add instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  7:10     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define OPIVX3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)         \
> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    TD d = *((TD *)vd + HD(i));                                     \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1, d);      \
> +}

Change the type of s1?  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 20/60] target/riscv: vector single-width integer multiply-add instructions
@ 2020-03-14  7:10     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define OPIVX3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)         \
> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    TD d = *((TD *)vd + HD(i));                                     \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1, d);      \
> +}

Change the type of s1?  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 21/60] target/riscv: vector widening integer multiply-add instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  7:13     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:13 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 22 ++++++++++++
>  target/riscv/insn32.decode              |  7 ++++
>  target/riscv/insn_trans/trans_rvv.inc.c |  9 +++++
>  target/riscv/vector_helper.c            | 45 +++++++++++++++++++++++++
>  4 files changed, 83 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 21/60] target/riscv: vector widening integer multiply-add instructions
@ 2020-03-14  7:13     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:13 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 22 ++++++++++++
>  target/riscv/insn32.decode              |  7 ++++
>  target/riscv/insn_trans/trans_rvv.inc.c |  9 +++++
>  target/riscv/vector_helper.c            | 45 +++++++++++++++++++++++++
>  4 files changed, 83 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  7:27     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:27 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Merge and Move Instructions */
> +static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_reg(s, a->rs1, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)));
> +}
> +GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
> +
> +static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)));
> +}
> +GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
> +
> +GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)

I think you need to special case these.  The unmasked instructions are the
canonical move instructions: vmv.v.*.

You definitely want to use tcg_gen_gvec_mov (vv), tcg_gen_gvec_dup_i{32,64}
(vx) and tcg_gen_gvec_dup{8,16,32,64}i (vi).
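
For the unmasked vmv.v.v case, a minimal sketch of that fast path, reusing
the vl_eq_vlmax / MAXSZ machinery from the other trans functions:

    if (a->vm && s->vl_eq_vlmax) {
        /* whole register group copy; the element size is irrelevant here */
        tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd),
                         vreg_ofs(s, a->rs1), MAXSZ(s), MAXSZ(s));
        return true;
    }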

> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
> +            *((ETYPE *)vd + H1(i)) = s2;                             \
> +        } else {                                                     \
> +            ETYPE s1 = *((ETYPE *)vs1 + H(i));                       \
> +            *((ETYPE *)vd + H(i)) = s1;                              \
> +        }                                                            \

Perhaps better as

ETYPE *vt = (!vm && !vext_elem_mask(v0, mlen, i) ? vs2 : vs1);
*((ETYPE *)vd + H(i)) = *((ETYPE *)vt + H(i));

> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
> +            *((ETYPE *)vd + H1(i)) = s2;                             \
> +        } else {                                                     \
> +            *((ETYPE *)vd + H(i)) = (ETYPE)(target_long)s1;          \
> +        }                                                            \

Perhaps better as

ETYPE s2 = *((ETYPE *)vs2 + H(i));
ETYPE d = (!vm && !vext_elem_mask(v0, mlen, i)
           ? s2 : (ETYPE)(target_long)s1);
*((ETYPE *)vd + H(i)) = d;

as most host platforms have a conditional reg-reg move, but not a conditional load.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
@ 2020-03-14  7:27     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:27 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Integer Merge and Move Instructions */
> +static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            vext_check_reg(s, a->rs1, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)));
> +}
> +GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
> +
> +static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)));
> +}
> +GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
> +
> +GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)

I think you need to special case these.  The unmasked instructions are the
canonical move instructions: vmv.v.*.

You definitely want to use tcg_gen_gvec_mov (vv), tcg_gen_gvec_dup_i{32,64}
(vx) and tcg_gen_gvec_dup{8,16,32,64}i (vi).

> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
> +            *((ETYPE *)vd + H1(i)) = s2;                             \
> +        } else {                                                     \
> +            ETYPE s1 = *((ETYPE *)vs1 + H(i));                       \
> +            *((ETYPE *)vd + H(i)) = s1;                              \
> +        }                                                            \

Perhaps better as

ETYPE *vt = (!vm && !vext_elem_mask(v0, mlen, i) ? vs2 : vs1);
*((ETYPE *)vd + H(i)) = *((ETYPE *)vt + H(i));

> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
> +            *((ETYPE *)vd + H1(i)) = s2;                             \
> +        } else {                                                     \
> +            *((ETYPE *)vd + H(i)) = (ETYPE)(target_long)s1;          \
> +        }                                                            \

Perhaps better as

ETYPE s2 = *((ETYPE *)vs2 + H(i));
ETYPE d = (!vm && !vext_elem_mask(v0, mlen, i)
           ? s2 : (ETYPE)(target_long)s1);
*((ETYPE *)vd + H(i)) = d;

as most host platforms have a conditional reg-reg move, but not a conditional load.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 23/60] target/riscv: vector single-width saturating add and subtract
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  7:52     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Saturating Add and Subtract */
> +GEN_OPIVV_GVEC_TRANS(vsaddu_vv, usadd)
> +GEN_OPIVV_GVEC_TRANS(vsadd_vv,  ssadd)
> +GEN_OPIVV_GVEC_TRANS(vssubu_vv, ussub)
> +GEN_OPIVV_GVEC_TRANS(vssub_vv,  sssub)
> +GEN_OPIVX_TRANS(vsaddu_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vsadd_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
> +GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
> +GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)

The vxsat bit can't be set by the gvec routines, at least on its own.

For ppc I compute the saturation bit by doing the vector saturating add, the
vector normal add, and comparing the two.  See uses of vscr_sat.
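
The shape of that trick with TCG vector ops, as a sketch (this only
accumulates the sticky flag; the saturating result would still be written
to the destination separately):

static void gen_sat_acc(unsigned vece, TCGv_vec sat, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t = tcg_temp_new_vec_matching(sat);
    TCGv_vec u = tcg_temp_new_vec_matching(sat);

    tcg_gen_ssadd_vec(vece, t, a, b);   /* saturating sum */
    tcg_gen_add_vec(vece, u, a, b);     /* wrapping sum */
    tcg_gen_xor_vec(vece, t, t, u);     /* non-zero lanes saturated */
    tcg_gen_or_vec(vece, sat, sat, t);  /* fold into the sticky flag */

    tcg_temp_free_vec(t);
    tcg_temp_free_vec(u);
}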

But for now, you can just use your own current out-of-line functions.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 23/60] target/riscv: vector single-width saturating add and subtract
@ 2020-03-14  7:52     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  7:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Saturating Add and Subtract */
> +GEN_OPIVV_GVEC_TRANS(vsaddu_vv, usadd)
> +GEN_OPIVV_GVEC_TRANS(vsadd_vv,  ssadd)
> +GEN_OPIVV_GVEC_TRANS(vssubu_vv, ussub)
> +GEN_OPIVV_GVEC_TRANS(vssub_vv,  sssub)
> +GEN_OPIVX_TRANS(vsaddu_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vsadd_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
> +GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
> +GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)

The vxsat bit can't be set by the gvec routines, at least on its own.

For ppc I compute the saturation bit by doing the vector saturating add, the
vector normal add, and comparing the two.  See uses of vscr_sat.

But for now, you can just use your own current out-of-line functions.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
  2020-03-14  5:25     ` Richard Henderson
@ 2020-03-14  8:11       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  8:11 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 13:25, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +    if (a->vm && s->vl_eq_vlmax) {                                 \
>> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
>> +            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
>> +            MAXSZ(s), MAXSZ(s));                                   \
> The first argument here should be just s->sew.
> You should have seen the assert fire:
>
>      tcg_debug_assert(vece <= MO_64);
Oh, sorry, I did not see this. I must have missed testing this path.
> It would be nice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a function,
> and pass in tcg_gen_gvec_* as a function pointer, and fns as a pointer.
>
> In general, I prefer the functions that are generated by macros like this to
> have exactly one executable statement -- the call to the helper that does all
> of the work using the arguments provided.  That way a maximum number of lines
> are available for stepping with the debugger.
Can't agree more. When I debugged the test cases, I also found it hard to
debug the generated code. The macros that generate code should be as short
as possible.

I accept your advice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a
function.
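
For the record, a rough sketch of what that pulled-out function could look
like (the fallback path through the out-of-line helper is elided here):

static bool do_opivv_gvec(DisasContext *s, arg_rmrr *a,
                          GVecGen3Fn *gvec_fn, gen_helper_gvec_4_ptr *fn)
{
    if (!opivv_check(s, a)) {
        return false;
    }
    if (a->vm && s->vl_eq_vlmax) {
        gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
                vreg_ofs(s, a->rs1), MAXSZ(s), MAXSZ(s));
    } else {
        /* build the VDATA descriptor and call fn via tcg_gen_gvec_4_ptr() */
    }
    return true;
}
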
>
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \
> Why are these replicated in each trans_* function, and not done in opiv?_trans,
> where the rest of the descriptor is created?
The opiv?_trans is a better place.
>
>> +/* OPIVX without GVEC IR */
>> +#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
>> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
>> +{                                                                        \
>> +    if (CHECK(s, a)) {                                                   \
>> +        uint32_t data = 0;                                               \
>> +        static gen_helper_opivx const fns[4] = {                         \
>> +            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
>> +            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
>> +        };                                                               \
>> +                                                                         \
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
>> +        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
>> +    }                                                                    \
>> +    return false;                                                        \
>> +}
>> +
>> +GEN_OPIVX_TRANS(vrsub_vx, opivx_check)
> Note that you *can* generate vector code for this,
> you just have to write your own helpers.
>
> E.g.
>
> static void gen_vec_rsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
> {
>      tcg_gen_vec_sub8_i64(d, b, a);
> }
> // etc, reversing the arguments and passing on to sub.
>
> static const GVecGen2s rsub_op[4] = {
>      { .fni8 = tcg_gen_vec_rsub8_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs8,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_8 },
>      { .fni8 = tcg_gen_vec_rsub16_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs16,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_16 },
>      { .fni4 = tcg_gen_rsub_i32,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs32,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_32 },
>      { .fni8 = tcg_gen_rsub_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs64,
>        .opt_opc = vecop_list_sub,
>        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>        .vece = MO_64 },
> };
> static void gen_gvec_rsubs(unsigned vece, uint32_t dofs,
>      uint32_t aofs, TCGv_i64 c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
>
> static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
>      uint32_t aofs, int64_t c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
Good idea. I will try these GVEC IRs.
>> +/* generate the helpers for OPIVV */
>> +#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, vs1, vs2, i);                       \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
>> +
>> +GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
>> +GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)
> The body of GEN_VEXT_VV can be an inline function, calling the helper functions
> that you generated above.
Yes, I will.
>> +/*
>> + * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
>> + * So (target_long)s1 is needed. (T1)(target_long)s1 gives the real operator type.
>> + * (TX1)(T1)(target_long)s1 expands the operator type of widen operations
>> + * or narrow operations
>> + */
>> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
>> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
>> +{                                                                   \
>> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
>> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
>> +}
> Why not just make the type of s1 be target_long in the parameter?
Yes, I should.
>
>> +/* generate the helpers for instructions with one vector and one scalar */
>> +#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +                                                          \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, s1, vs2, i);                        \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
> Likewise an inline function.
Yes, I will.

Very informative comments. I will try to address them in the next patch set
soon.

Thanks very much.

Zhiwei
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
@ 2020-03-14  8:11       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  8:11 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 13:25, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +    if (a->vm && s->vl_eq_vlmax) {                                 \
>> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
>> +            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
>> +            MAXSZ(s), MAXSZ(s));                                   \
> The first argument here should be just s->sew.
> You should have seen the assert fire:
>
>      tcg_debug_assert(vece <= MO_64);
Oh, sorry, I did not see this. I must have missed testing this path.
> It would be nice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a function,
> and pass in tcg_gen_gvec_* as a function pointer, and fns as a pointer.
>
> In general, I prefer the functions that are generated by macros like this to
> have exactly one executable statement -- the call to the helper that does all
> of the work using the arguments provided.  That way a maximum number of lines
> are available for stepping with the debugger.
Can't agree more. When I debugged the test cases, I also found it hard to
debug the generated code. The macros that generate code should be as short
as possible.

I accept your advice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a
function.
>
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \
> Why are these replicated in each trans_* function, and not done in opiv?_trans,
> where the rest of the descriptor is created?
The opiv?_trans is a better place.
>
>> +/* OPIVX without GVEC IR */
>> +#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
>> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
>> +{                                                                        \
>> +    if (CHECK(s, a)) {                                                   \
>> +        uint32_t data = 0;                                               \
>> +        static gen_helper_opivx const fns[4] = {                         \
>> +            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
>> +            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
>> +        };                                                               \
>> +                                                                         \
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
>> +        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
>> +    }                                                                    \
>> +    return false;                                                        \
>> +}
>> +
>> +GEN_OPIVX_TRANS(vrsub_vx, opivx_check)
> Note that you *can* generate vector code for this,
> you just have to write your own helpers.
>
> E.g.
>
> static void gen_vec_rsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
> {
>      tcg_gen_vec_sub8_i64(d, b, a);
> }
> // etc, reversing the arguments and passing on to sub.
>
> static const GVecGen2s rsub_op[4] = {
>      { .fni8 = gen_vec_rsub8_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs8,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_8 },
>      { .fni8 = gen_vec_rsub16_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs16,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_16 },
>      { .fni4 = tcg_gen_rsub_i32,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs32,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_32 },
>      { .fni8 = tcg_gen_rsub_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs64,
>        .opt_opc = vecop_list_sub,
>        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>        .vece = MO_64 },
> };
> static void gen_gvec_rsubs(unsigned vece, uint32_t dofs,
>      uint32_t aofs, TCGv_i64 c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
>
> static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
>      uint32_t aofs, int64_t c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
Good idea. I will try these GVEC IRs.
>> +/* generate the helpers for OPIVV */
>> +#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, vs1, vs2, i);                       \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
>> +
>> +GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
>> +GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)
> The body of GEN_VEXT_VV can be an inline function, calling the helper functions
> that you generated above.
Yes, I will.
>> +/*
>> + * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
>> + * So (target_long)s1 is needed. (T1)(target_long)s1 gives the real operand type.
>> + * (TX1)(T1)(target_long)s1 expands the operand type for widening operations
>> + * or narrowing operations.
>> + */
>> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
>> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
>> +{                                                                   \
>> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
>> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
>> +}
> Why not just make the type of s1 be target_long in the parameter?
Yes, I should.
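I.e. something like this (sketch):

#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
{                                                                   \
    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
}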
>
>> +/* generate the helpers for instructions with one vector and one scalar operand */
>> +#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +                                                          \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, s1, vs2, i);                        \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
> Likewise an inline function.
Yes, I will.

These are very informative comments. I will try to address them in the next
patch set soon.

Thanks very much.

Zhiwei
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:14     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:14 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Averaging Add and Subtract */
> +static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
> +{
> +    uint8_t d = extract64(v, shift, 1);
> +    uint8_t d1;
> +    uint64_t D1, D2;
> +    int mod = env->vxrm;
> +
> +    if (shift == 0 || shift > 64) {
> +        return 0;
> +    }
> +
> +    d1 = extract64(v, shift - 1, 1);
> +    D1 = extract64(v, 0, shift);
> +    if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
> +        return d1;
> +    } else if (mod == 1) { /* round-to-nearest-even */
> +        if (shift > 1) {
> +            D2 = extract64(v, 0, shift - 1);
> +            return d1 & ((D2 != 0) | d);
> +        } else {
> +            return d1 & d;
> +        }
> +    } else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
> +        return !d & (D1 != 0);
> +    }
> +    return 0; /* round-down (truncate) */
> +}
> +
> +static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    int16_t res = (int16_t)a + (int16_t)b;
> +    uint8_t round = get_round(env, res, 1);
> +    res   = (res >> 1) + round;
> +    return res;
> +}

I think this is a suboptimal way to arrange things.  It leaves the vxrm lookup
inside the main loop, even though it is obviously loop invariant.

I think you should have 4 versions of aadd8, one for each of the rounding modes,

> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)

then use this, or something like it, to define 4 functions containing main
loops, which will get the helper above inlined.

Then use a final outermost wrapper to select one of the 4 functions based on
env->vxrm.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
@ 2020-03-14  8:14     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:14 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Averaging Add and Subtract */
> +static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
> +{
> +    uint8_t d = extract64(v, shift, 1);
> +    uint8_t d1;
> +    uint64_t D1, D2;
> +    int mod = env->vxrm;
> +
> +    if (shift == 0 || shift > 64) {
> +        return 0;
> +    }
> +
> +    d1 = extract64(v, shift - 1, 1);
> +    D1 = extract64(v, 0, shift);
> +    if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
> +        return d1;
> +    } else if (mod == 1) { /* round-to-nearest-even */
> +        if (shift > 1) {
> +            D2 = extract64(v, 0, shift - 1);
> +            return d1 & ((D2 != 0) | d);
> +        } else {
> +            return d1 & d;
> +        }
> +    } else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
> +        return !d & (D1 != 0);
> +    }
> +    return 0; /* round-down (truncate) */
> +}
> +
> +static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    int16_t res = (int16_t)a + (int16_t)b;
> +    uint8_t round = get_round(env, res, 1);
> +    res   = (res >> 1) + round;
> +    return res;
> +}

I think this is a suboptimal way to arrange things.  It leaves the vxrm lookup
inside the main loop, even though it is obviously loop invariant.

I think you should have 4 versions of aadd8, one for each of the rounding modes,

> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)

then use this, or something like it, to define 4 functions containing main
loops, which will get the helper above inlined.

Then use a final outermost wrapper to select one of the 4 functions based on
env->vxrm.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-14  8:14     ` Richard Henderson
@ 2020-03-14  8:25       ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:25 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 1:14 AM, Richard Henderson wrote:
> I think you should have 4 versions of aadd8, for each of the rounding modes,
> 
>> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)
> 
> then use this, or something like it, to define 4 functions containing main
> loops, which will get the helper above inlined.

Alternatively, a set of inlines, where a (constant) vxrm is passed down from above.

> Then use a final outermost wrapper to select one of the 4 functions based on
> env->vxrm.

The outermost wrapper could look like

    switch (env->vxrm) {
    case 0:  somefunc(some, args, 0); break;
    case 1:  somefunc(some, args, 1); break;
    case 2:  somefunc(some, args, 2); break;
    default: somefunc(some, args, 3); break;
    }

so that somefunc (and its subroutines) are expanded with a constant, and we
switch on that constant at the outermost level.
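For vaadd.vv with SEW=8 that might look something like this (an untested
sketch; the *_rm names are placeholders, and the tail-clearing CLEAR_FN
step is omitted for brevity):

static inline int8_t aadd8_rm(int vxrm, int8_t a, int8_t b)
{
    int16_t res = (int16_t)a + (int16_t)b;
    uint8_t lsb = res & 1;          /* the bit that will be shifted out */
    uint8_t round;

    switch (vxrm) {
    case 0: /* rnu: round to nearest, ties up */
        round = lsb;
        break;
    case 1: /* rne: round to nearest, ties to even */
        round = lsb & ((res >> 1) & 1);
        break;
    case 3: /* rod: round to odd */
        round = !((res >> 1) & 1) & lsb;
        break;
    default: /* rdn: truncate */
        round = 0;
        break;
    }
    return (res >> 1) + round;
}

static inline void
vaadd_vv_b_rm(void *vd, void *v0, void *vs1, void *vs2,
              CPURISCVState *env, uint32_t desc, int vxrm)
{
    uint32_t mlen = vext_mlen(desc);
    uint32_t vm = vext_vm(desc);
    uint32_t vl = env->vl;
    uint32_t i;

    for (i = 0; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((int8_t *)vd + H1(i)) = aadd8_rm(vxrm, *((int8_t *)vs1 + H1(i)),
                                           *((int8_t *)vs2 + H1(i)));
    }
}

void HELPER(vaadd_vv_b)(void *vd, void *v0, void *vs1, void *vs2,
                        CPURISCVState *env, uint32_t desc)
{
    switch (env->vxrm) {
    case 0:  vaadd_vv_b_rm(vd, v0, vs1, vs2, env, desc, 0); break;
    case 1:  vaadd_vv_b_rm(vd, v0, vs1, vs2, env, desc, 1); break;
    case 2:  vaadd_vv_b_rm(vd, v0, vs1, vs2, env, desc, 2); break;
    default: vaadd_vv_b_rm(vd, v0, vs1, vs2, env, desc, 3); break;
    }
}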


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
@ 2020-03-14  8:25       ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:25 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/14/20 1:14 AM, Richard Henderson wrote:
> I think you should have 4 versions of aadd8, for each of the rounding modes,
> 
>> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)
> 
> then use this, or something like it, to define 4 functions containing main
> loops, which will get the helper above inlined.

Alternatively, a set of inlines, where a (constant) vxrm is passed down from above.

> Then use a final outermost wrapper to select one of the 4 functions based on
> env->vxrm.

The outermost wrapper could look like

    switch (env->vxrm) {
    case 0:  somefunc(some, args, 0); break;
    case 1:  somefunc(some, args, 1); break;
    case 2:  somefunc(some, args, 2); break;
    default: somefunc(some, args, 3); break;
    }

so that somefunc (and its subroutines) are expanded with a constant, and we
switch on that constant at the outermost level.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 25/60] target/riscv: vector single-width fractional multiply with rounding and saturation
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:27     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:27 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
> +static inline int8_t vsmul8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    uint8_t round;
> +    int16_t res;
> +
> +    res = (int16_t)a * (int16_t)b;
> +    round = get_round(env, res, 7);
> +    res   = (res >> 7) + round;
> +
> +    if (res > INT8_MAX) {
> +        env->vxsat = 0x1;
> +        return INT8_MAX;
> +    } else if (res < INT8_MIN) {
> +        env->vxsat = 0x1;
> +        return INT8_MIN;
> +    } else {
> +        return res;
> +    }
> +}
> +static int16_t vsmul16(CPURISCVState *env, int16_t a, int16_t b)

With the same caveat for vxrm as before.  Oh, and watch the spacing between
these functions.  I noticed it before but didn't mention it.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 25/60] target/riscv: vector single-width fractional multiply with rounding and saturation
@ 2020-03-14  8:27     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:27 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
> +static inline int8_t vsmul8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    uint8_t round;
> +    int16_t res;
> +
> +    res = (int16_t)a * (int16_t)b;
> +    round = get_round(env, res, 7);
> +    res   = (res >> 7) + round;
> +
> +    if (res > INT8_MAX) {
> +        env->vxsat = 0x1;
> +        return INT8_MAX;
> +    } else if (res < INT8_MIN) {
> +        env->vxsat = 0x1;
> +        return INT8_MIN;
> +    } else {
> +        return res;
> +    }
> +}
> +static int16_t vsmul16(CPURISCVState *env, int16_t a, int16_t b)

With the same caveat for vxrm as before.  Oh, and watch the spacing between
these functions.  I noticed it before but didn't mention it.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 26/60] target/riscv: vector widening saturating scaled multiply-add
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:32     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:32 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint16_t vwsmaccu8(CPURISCVState *env, uint8_t a, uint8_t b,
> +    uint16_t c)
> +{
> +    uint8_t round;
> +    uint16_t res = (uint16_t)a * (uint16_t)b;
> +
> +    round = get_round(env, res, 4);
> +    res   = (res >> 4) + round;
> +    return saddu16(env, c, res);
> +}
> +static uint32_t vwsmaccu16(CPURISCVState *env, uint16_t a, uint16_t b,

With the same caveat for vxrm as before, and the spacing.

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 26/60] target/riscv: vector widening saturating scaled multiply-add
@ 2020-03-14  8:32     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:32 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint16_t vwsmaccu8(CPURISCVState *env, uint8_t a, uint8_t b,
> +    uint16_t c)
> +{
> +    uint8_t round;
> +    uint16_t res = (uint16_t)a * (uint16_t)b;
> +
> +    round = get_round(env, res, 4);
> +    res   = (res >> 4) + round;
> +    return saddu16(env, c, res);
> +}
> +static uint32_t vwsmaccu16(CPURISCVState *env, uint16_t a, uint16_t b,

With the same caveat for vxrm as before, and the spacing.

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 27/60] target/riscv: vector single-width scaling shift instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:34     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:34 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint8_t vssrl8(CPURISCVState *env, uint8_t a, uint8_t b)
> +{
> +    uint8_t round, shift = b & 0x7;
> +    uint8_t res;
> +
> +    round = get_round(env, a, shift);
> +    res   = (a >> shift)  + round;
> +    return res;
> +}
> +static uint16_t vssrl16(CPURISCVState *env, uint16_t a, uint16_t b)

Likewise.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 27/60] target/riscv: vector single-width scaling shift instructions
@ 2020-03-14  8:34     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:34 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint8_t vssrl8(CPURISCVState *env, uint8_t a, uint8_t b)
> +{
> +    uint8_t round, shift = b & 0x7;
> +    uint8_t res;
> +
> +    round = get_round(env, a, shift);
> +    res   = (a >> shift)  + round;
> +    return res;
> +}
> +static uint16_t vssrl16(CPURISCVState *env, uint16_t a, uint16_t b)

Likewise.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 28/60] target/riscv: vector narrowing fixed-point clip instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:36     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:36 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static int8_t vnclip8(CPURISCVState *env, int16_t a, int8_t b)
> +{
> +    uint8_t round, shift = b & 0xf;
> +    int16_t res;
> +
> +    round = get_round(env, a, shift);
> +    res   = (a >> shift)  + round;
> +    if (res > INT8_MAX) {
> +        env->vxsat = 0x1;
> +        return INT8_MAX;
> +    } else if (res < INT8_MIN) {
> +        env->vxsat = 0x1;
> +        return INT8_MIN;
> +    } else {
> +        return res;
> +    }
> +}
> +static int16_t vnclip16(CPURISCVState *env, int32_t a, int16_t b)

Likewise.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 28/60] target/riscv: vector narrowing fixed-point clip instructions
@ 2020-03-14  8:36     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:36 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static int8_t vnclip8(CPURISCVState *env, int16_t a, int8_t b)
> +{
> +    uint8_t round, shift = b & 0xf;
> +    int16_t res;
> +
> +    round = get_round(env, a, shift);
> +    res   = (a >> shift)  + round;
> +    if (res > INT8_MAX) {
> +        env->vxsat = 0x1;
> +        return INT8_MAX;
> +    } else if (res < INT8_MIN) {
> +        env->vxsat = 0x1;
> +        return INT8_MIN;
> +    } else {
> +        return res;
> +    }
> +}
> +static int16_t vnclip16(CPURISCVState *env, int32_t a, int16_t b)

Likewise.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 29/60] target/riscv: vector single-width floating-point add/subtract instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:40     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:40 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  16 ++++
>  target/riscv/insn32.decode              |   5 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 107 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            |  89 ++++++++++++++++++++
>  4 files changed, 217 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 29/60] target/riscv: vector single-width floating-point add/subtract instructions
@ 2020-03-14  8:40     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:40 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  16 ++++
>  target/riscv/insn32.decode              |   5 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 107 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            |  89 ++++++++++++++++++++
>  4 files changed, 217 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 30/60] target/riscv: vector widening floating-point add/subtract instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:43     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:43 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  17 +++
>  target/riscv/insn32.decode              |   8 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 131 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            |  77 ++++++++++++++
>  4 files changed, 233 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 30/60] target/riscv: vector widening floating-point add/subtract instructions
@ 2020-03-14  8:43     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:43 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  17 +++
>  target/riscv/insn32.decode              |   8 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 131 ++++++++++++++++++++++++
>  target/riscv/vector_helper.c            |  77 ++++++++++++++
>  4 files changed, 233 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 31/60] target/riscv: vector single-width floating-point multiply/divide instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:43     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:43 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 16 +++++++++
>  target/riscv/insn32.decode              |  5 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  7 ++++
>  target/riscv/vector_helper.c            | 48 +++++++++++++++++++++++++
>  4 files changed, 76 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 31/60] target/riscv: vector single-width floating-point multiply/divide instructions
@ 2020-03-14  8:43     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:43 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 16 +++++++++
>  target/riscv/insn32.decode              |  5 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  7 ++++
>  target/riscv/vector_helper.c            | 48 +++++++++++++++++++++++++
>  4 files changed, 76 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 32/60] target/riscv: vector widening floating-point multiply
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:46     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:46 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  5 +++++
>  target/riscv/insn32.decode              |  2 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  4 ++++
>  target/riscv/vector_helper.c            | 22 ++++++++++++++++++++++
>  4 files changed, 33 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 32/60] target/riscv: vector widening floating-point multiply
@ 2020-03-14  8:46     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:46 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  5 +++++
>  target/riscv/insn32.decode              |  2 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  4 ++++
>  target/riscv/vector_helper.c            | 22 ++++++++++++++++++++++
>  4 files changed, 33 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:49     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:49 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  49 +++++
>  target/riscv/insn32.decode              |  16 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  18 ++
>  target/riscv/vector_helper.c            | 228 ++++++++++++++++++++++++
>  4 files changed, 311 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions
@ 2020-03-14  8:49     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:49 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  49 +++++
>  target/riscv/insn32.decode              |  16 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  18 ++
>  target/riscv/vector_helper.c            | 228 ++++++++++++++++++++++++
>  4 files changed, 311 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 34/60] target/riscv: vector widening floating-point fused multiply-add instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:50     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 17 +++++
>  target/riscv/insn32.decode              |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
>  target/riscv/vector_helper.c            | 84 +++++++++++++++++++++++++
>  4 files changed, 119 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 34/60] target/riscv: vector widening floating-point fused multiply-add instructions
@ 2020-03-14  8:50     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 17 +++++
>  target/riscv/insn32.decode              |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
>  target/riscv/vector_helper.c            | 84 +++++++++++++++++++++++++
>  4 files changed, 119 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 36/60] target/riscv: vector floating-point min/max instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:52     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_max)
> +RVVCALL(OPFVV2, vfmax_vv_w, OP_UUU_W, H4, H4, H4, float32_max)
> +RVVCALL(OPFVV2, vfmax_vv_d, OP_UUU_D, H8, H8, H8, float64_max)
> +GEN_VEXT_VV_ENV(vfmax_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV_ENV(vfmax_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV_ENV(vfmax_vv_d, 8, 8, clearq)
> +RVVCALL(OPFVF2, vfmax_vf_h, OP_UUU_H, H2, H2, float16_max)
> +RVVCALL(OPFVF2, vfmax_vf_w, OP_UUU_W, H4, H4, float32_max)
> +RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_max)

These should use float16_maxnum/float32_maxnum/float64_maxnum rather than
the plain _max variants.
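E.g. (sketch; the other widths and the _vf forms follow the same pattern):

RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_maxnum)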

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 36/60] target/riscv: vector floating-point min/max instructions
@ 2020-03-14  8:52     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_max)
> +RVVCALL(OPFVV2, vfmax_vv_w, OP_UUU_W, H4, H4, H4, float32_max)
> +RVVCALL(OPFVV2, vfmax_vv_d, OP_UUU_D, H8, H8, H8, float64_max)
> +GEN_VEXT_VV_ENV(vfmax_vv_h, 2, 2, clearh)
> +GEN_VEXT_VV_ENV(vfmax_vv_w, 4, 4, clearl)
> +GEN_VEXT_VV_ENV(vfmax_vv_d, 8, 8, clearq)
> +RVVCALL(OPFVF2, vfmax_vf_h, OP_UUU_H, H2, H2, float16_max)
> +RVVCALL(OPFVF2, vfmax_vf_w, OP_UUU_W, H4, H4, float32_max)
> +RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_max)

These should use float16_maxnum/float32_maxnum/float64_maxnum rather than
the plain _max variants.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 37/60] target/riscv: vector floating-point sign-injection instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  8:57     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:57 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 19 +++++++
>  target/riscv/insn32.decode              |  6 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  8 +++
>  target/riscv/vector_helper.c            | 76 +++++++++++++++++++++++++
>  4 files changed, 109 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 37/60] target/riscv: vector floating-point sign-injection instructions
@ 2020-03-14  8:57     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  8:57 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 19 +++++++
>  target/riscv/insn32.decode              |  6 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  8 +++
>  target/riscv/vector_helper.c            | 76 +++++++++++++++++++++++++
>  4 files changed, 109 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 38/60] target/riscv: vector floating-point compare instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  9:08     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  9:08 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare_quiet(a, b, s);
> +    if (compare == float_relation_equal) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}

You really need to remember that boolean results in C are 1 and 0.
You do not need to keep translating true to 1 and false to 0.
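E.g. (sketch):

static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
{
    return float16_compare_quiet(a, b, s) == float_relation_equal;
}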

> +static uint8_t vmfne16(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare_quiet(a, b, s);
> +    if (compare != float_relation_equal &&
> +            compare != float_relation_unordered) {

Indentation.

> +static uint8_t float16_le(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare(a, b, s);
> +    if (compare == float_relation_less ||
> +            compare == float_relation_equal) {
> +        return 1;

Indentation.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 38/60] target/riscv: vector floating-point compare instructions
@ 2020-03-14  9:08     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  9:08 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare_quiet(a, b, s);
> +    if (compare == float_relation_equal) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}

You really need to remember that boolean results in C are 1 and 0.
You do not need to keep translating true to 1 and false to 0.

> +static uint8_t vmfne16(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare_quiet(a, b, s);
> +    if (compare != float_relation_equal &&
> +            compare != float_relation_unordered) {

Indentation.

> +static uint8_t float16_le(uint16_t a, uint16_t b, float_status *s)
> +{
> +    int compare = float16_compare(a, b, s);
> +    if (compare == float_relation_less ||
> +            compare == float_relation_equal) {
> +        return 1;

Indentation.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14  9:10     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  9:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Floating-Point Classify Instruction */
> +static uint16_t fclass_f16(uint16_t frs1, float_status *s)
> +{
> +    float16 f = frs1;
> +    bool sign = float16_is_neg(f);
> +
> +    if (float16_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float16_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float16_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float16_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float16_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}
> +static uint32_t fclass_s(uint32_t frs1, float_status *s)
> +{
> +    float32 f = frs1;
> +    bool sign = float32_is_neg(f);
> +
> +    if (float32_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float32_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float32_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float32_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float32_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}
> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
> +{
> +    float64 f = frs1;
> +    bool sign = float64_is_neg(f);
> +
> +    if (float64_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float64_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float64_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float64_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}

These need to be moved out of fpu_helper.c so they can be shared.

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
@ 2020-03-14  9:10     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14  9:10 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Floating-Point Classify Instruction */
> +static uint16_t fclass_f16(uint16_t frs1, float_status *s)
> +{
> +    float16 f = frs1;
> +    bool sign = float16_is_neg(f);
> +
> +    if (float16_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float16_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float16_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float16_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float16_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}
> +static uint32_t fclass_s(uint32_t frs1, float_status *s)
> +{
> +    float32 f = frs1;
> +    bool sign = float32_is_neg(f);
> +
> +    if (float32_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float32_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float32_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float32_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float32_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}
> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
> +{
> +    float64 f = frs1;
> +    bool sign = float64_is_neg(f);
> +
> +    if (float64_is_infinity(f)) {
> +        return sign ? 1 << 0 : 1 << 7;
> +    } else if (float64_is_zero(f)) {
> +        return sign ? 1 << 3 : 1 << 4;
> +    } else if (float64_is_zero_or_denormal(f)) {
> +        return sign ? 1 << 2 : 1 << 5;
> +    } else if (float64_is_any_nan(f)) {
> +        float_status s = { }; /* for snan_bit_is_one */
> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
> +    } else {
> +        return sign ? 1 << 1 : 1 << 6;
> +    }
> +}

These need to be moved out of fpu_helper.c so they can be shared.

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 38/60] target/riscv: vector floating-point compare instructions
  2020-03-14  9:08     ` Richard Henderson
@ 2020-03-14  9:11       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  9:11 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 17:08, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare_quiet(a, b, s);
>> +    if (compare == float_relation_equal) {
>> +        return 1;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
> You really need remember that boolean results in C are 1 and 0.
> You do not need to keep translating true to 1 and false to 0.
Got it. I was not sure before whether true in C is guaranteed to be 1 or
just non-zero.

Zhiwei
>> +static uint8_t vmfne16(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare_quiet(a, b, s);
>> +    if (compare != float_relation_equal &&
>> +            compare != float_relation_unordered) {
> Indentation.
>
>> +static uint8_t float16_le(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare(a, b, s);
>> +    if (compare == float_relation_less ||
>> +            compare == float_relation_equal) {
>> +        return 1;
> Indentation.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 38/60] target/riscv: vector floating-point compare instructions
@ 2020-03-14  9:11       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  9:11 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 17:08, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static uint8_t float16_eq_quiet(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare_quiet(a, b, s);
>> +    if (compare == float_relation_equal) {
>> +        return 1;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
> You really need remember that boolean results in C are 1 and 0.
> You do not need to keep translating true to 1 and false to 0.
Got it. I was not sure before whether true in C is guaranteed to be 1 or
just non-zero.

Zhiwei
>> +static uint8_t vmfne16(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare_quiet(a, b, s);
>> +    if (compare != float_relation_equal &&
>> +            compare != float_relation_unordered) {
> Indentation.
>
>> +static uint8_t float16_le(uint16_t a, uint16_t b, float_status *s)
>> +{
>> +    int compare = float16_compare(a, b, s);
>> +    if (compare == float_relation_less ||
>> +            compare == float_relation_equal) {
>> +        return 1;
> Indentation.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
  2020-03-14  9:10     ` Richard Henderson
@ 2020-03-14  9:15       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  9:15 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 17:10, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +/* Vector Floating-Point Classify Instruction */
>> +static uint16_t fclass_f16(uint16_t frs1, float_status *s)
>> +{
>> +    float16 f = frs1;
>> +    bool sign = float16_is_neg(f);
>> +
>> +    if (float16_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float16_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float16_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float16_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float16_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
>> +static uint32_t fclass_s(uint32_t frs1, float_status *s)
>> +{
>> +    float32 f = frs1;
>> +    bool sign = float32_is_neg(f);
>> +
>> +    if (float32_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float32_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float32_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float32_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float32_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
>> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
>> +{
>> +    float64 f = frs1;
>> +    bool sign = float64_is_neg(f);
>> +
>> +    if (float64_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float64_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float64_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float64_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
> These need to be moved out of fpu_helper.c so they can be shared.
I will add an internals.h and move the declarations there.

Zhiwei

>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
@ 2020-03-14  9:15       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14  9:15 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/14 17:10, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +/* Vector Floating-Point Classify Instruction */
>> +static uint16_t fclass_f16(uint16_t frs1, float_status *s)
>> +{
>> +    float16 f = frs1;
>> +    bool sign = float16_is_neg(f);
>> +
>> +    if (float16_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float16_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float16_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float16_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float16_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
>> +static uint32_t fclass_s(uint32_t frs1, float_status *s)
>> +{
>> +    float32 f = frs1;
>> +    bool sign = float32_is_neg(f);
>> +
>> +    if (float32_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float32_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float32_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float32_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float32_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
>> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
>> +{
>> +    float64 f = frs1;
>> +    bool sign = float64_is_neg(f);
>> +
>> +    if (float64_is_infinity(f)) {
>> +        return sign ? 1 << 0 : 1 << 7;
>> +    } else if (float64_is_zero(f)) {
>> +        return sign ? 1 << 3 : 1 << 4;
>> +    } else if (float64_is_zero_or_denormal(f)) {
>> +        return sign ? 1 << 2 : 1 << 5;
>> +    } else if (float64_is_any_nan(f)) {
>> +        float_status s = { }; /* for snan_bit_is_one */
>> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>> +    } else {
>> +        return sign ? 1 << 1 : 1 << 6;
>> +    }
>> +}
> These need to be moved out of fpu_helper.c so they can be shared.
I will add an internals.h and move the declarations there.

Zhiwei

>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
  2020-03-14  9:15       ` LIU Zhiwei
@ 2020-03-14 22:06         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 22:06 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 2:15 AM, LIU Zhiwei wrote:
>>> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
>>> +{
>>> +    float64 f = frs1;
>>> +    bool sign = float64_is_neg(f);
>>> +
>>> +    if (float64_is_infinity(f)) {
>>> +        return sign ? 1 << 0 : 1 << 7;
>>> +    } else if (float64_is_zero(f)) {
>>> +        return sign ? 1 << 3 : 1 << 4;
>>> +    } else if (float64_is_zero_or_denormal(f)) {
>>> +        return sign ? 1 << 2 : 1 << 5;
>>> +    } else if (float64_is_any_nan(f)) {
>>> +        float_status s = { }; /* for snan_bit_is_one */
>>> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>>> +    } else {
>>> +        return sign ? 1 << 1 : 1 << 6;
>>> +    }
>>> +}
>> These need to be moved out of fpu_helper.c so they can be shared.
> I will add an internals.h and move the declaration to internals.h.

Actually, let's just put declarations for them in internals.h and remove the
static.  They are large enough that they don't need to be inlined.
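I.e. roughly (sketch), with the definitions in the .c file losing the
static keyword:

/* target/riscv/internals.h */
uint16_t fclass_f16(uint16_t frs1, float_status *s);
uint32_t fclass_s(uint32_t frs1, float_status *s);
uint64_t fclass_d(uint64_t frs1, float_status *s);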


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 39/60] target/riscv: vector floating-point classify instructions
@ 2020-03-14 22:06         ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 22:06 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/14/20 2:15 AM, LIU Zhiwei wrote:
>>> +static uint64_t fclass_d(uint64_t frs1, float_status *s)
>>> +{
>>> +    float64 f = frs1;
>>> +    bool sign = float64_is_neg(f);
>>> +
>>> +    if (float64_is_infinity(f)) {
>>> +        return sign ? 1 << 0 : 1 << 7;
>>> +    } else if (float64_is_zero(f)) {
>>> +        return sign ? 1 << 3 : 1 << 4;
>>> +    } else if (float64_is_zero_or_denormal(f)) {
>>> +        return sign ? 1 << 2 : 1 << 5;
>>> +    } else if (float64_is_any_nan(f)) {
>>> +        float_status s = { }; /* for snan_bit_is_one */
>>> +        return float64_is_quiet_nan(f, &s) ? 1 << 9 : 1 << 8;
>>> +    } else {
>>> +        return sign ? 1 << 1 : 1 << 6;
>>> +    }
>>> +}
>> These need to be moved out of fpu_helper.c so they can be shared.
> I will add an internals.h and move the declaration to internals.h.

Actually, let's just put declarations for them in internals.h and remove the
static.  They are large enough that they don't need to be inlined.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 40/60] target/riscv: vector floating-point merge instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 22:47     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 22:47 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +
> +/* Vector Floating-Point Merge Instruction */
> +static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)) &&
> +            (s->sew != 0));
> +}
> +GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)

Similar comments as for integer merge, using tcg_gen_gvec_dup_i64 for
unpredicated merges.

In fact, there's no reason at all to define a helper function for this one.  I
would expect you do be able to use the exact same helpers as for the integer
merges.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 40/60] target/riscv: vector floating-point merge instructions
@ 2020-03-14 22:47     ` Richard Henderson
  0 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 22:47 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +
> +/* Vector Floating-Point Merge Instruction */
> +static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            ((a->vm == 0) || (a->rs2 == 0)) &&
> +            (s->sew != 0));
> +}
> +GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)

Similar comments as for integer merge, using tcg_gen_gvec_dup_i64 for
unpredicated merges.

In fact, there's no reason at all to define a helper function for this one.  I
would expect you do be able to use the exact same helpers as for the integer
merges.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 41/60] target/riscv: vector floating-point/integer type-convert instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 22:50     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 22:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 13 ++++++++++
>  target/riscv/insn32.decode              |  4 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  6 +++++
>  target/riscv/vector_helper.c            | 33 +++++++++++++++++++++++++
>  4 files changed, 56 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 42/60] target/riscv: widening floating-point/integer type-convert instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:03     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:03 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/*
> + * vfwcvt.f.f.v vd, vs2, vm #
> + * Convert single-width float to double-width float.
> + */
> +static uint32_t vfwcvtffv16(uint16_t a, float_status *s)
> +{
> +    return float16_to_float32(a, true, s);
> +}
> +static uint64_t vfwcvtffv32(uint32_t a, float_status *s)
> +{
> +    return float32_to_float64(a, s);
> +}

Do you actually need this second one, as opposed to using float32_to_float64
directly?

> +RVVCALL(OPFVV1, vfwcvt_f_f_v_h, WOP_UU_H, H4, H2, vfwcvtffv16)
> +RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, vfwcvtffv32)
> +GEN_VEXT_V_ENV(vfwcvt_f_f_v_h, 2, 4, clearl)
> +GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 4, 8, clearq)
> 
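
i.e. the 32->64 wrapper could presumably go away entirely, leaving just

RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, float32_to_float64)

(assuming OPFVV1 expands to a call of the form OP(src, &env->fp_status),
which float32_to_float64 already matches; the float16 variant still needs
its wrapper because of the extra ieee argument).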

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 43/60] target/riscv: narrowing floating-point/integer type-convert instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:08     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:08 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static uint32_t vfncvtffv32(uint64_t a, float_status *s)
> +{
> +    return float64_to_float32(a, s);
> +}
> +RVVCALL(OPFVV1, vfncvt_f_f_v_h, NOP_UU_H, H2, H4, vfncvtffv16)
> +RVVCALL(OPFVV1, vfncvt_f_f_v_w, NOP_UU_W, H4, H8, vfncvtffv32)
> +GEN_VEXT_V_ENV(vfncvt_f_f_v_h, 2, 2, clearh)
> +GEN_VEXT_V_ENV(vfncvt_f_f_v_w, 4, 4, clearl)

Same question as for float32_to_float64.
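
i.e. presumably just

RVVCALL(OPFVV1, vfncvt_f_f_v_w, NOP_UU_W, H4, H8, float64_to_float32)

with only the float16 narrowing keeping a wrapper for the ieee flag.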

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-14  8:25       ` Richard Henderson
@ 2020-03-14 23:12         ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-14 23:12 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/14 16:25, Richard Henderson wrote:
> On 3/14/20 1:14 AM, Richard Henderson wrote:
>> I think you should have 4 versions of aadd8, for each of the rounding modes,
>>
>>> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)
>> then use this, or something like it, to define 4 functions containing main
>> loops, which will get the helper above inlined.
> Alternately, a set of inlines, where a (constant) vxrm is passed down from above.

I am not sure whether I get it. In my opinion, the code should be modified like this:

static inline int8_t aadd8_rnu(CPURISCVState *env, int8_t a, int8_t b)
{
     int16_t res = (int16_t)a + (int16_t)b;
     uint8_t round = res & 0x1;
     res   = (res >> 1) + round;
     return res;
}

static inline int8_t aadd8_rne(CPURISCVState *env, int8_t a, int8_t b)
{
     int16_t res = (int16_t)a + (int16_t)b;
     uint8_t round = ((res & 0x3) == 0x3);
     res   = (res >> 1) + round;
     return res;
}

static inline int8_t aadd8_rdn(CPURISCVState *env, int8_t a, int8_t b)
{
     int16_t res = (int16_t)a + (int16_t)b;
     res   = (res >> 1);
     return res;
}

static inline int8_t aadd8_rod(CPURISCVState *env, int8_t a, int8_t b)
{
     int16_t res = (int16_t)a + (int16_t)b;
     uint8_t round = ((res & 0x3) == 0x1);
     res   = (res >> 1) + round;
     return res;
}

RVVCALL(OPIVV2_ENV, vaadd_vv_b_rnu, OP_SSS_B, H1, H1, H1, aadd8_rnu)
RVVCALL(OPIVV2_ENV, vaadd_vv_b_rne, OP_SSS_B, H1, H1, H1, aadd8_rne)
RVVCALL(OPIVV2_ENV, vaadd_vv_b_rdn, OP_SSS_B, H1, H1, H1, aadd8_rdn)
RVVCALL(OPIVV2_ENV, vaadd_vv_b_rod, OP_SSS_B, H1, H1, H1, aadd8_rod)

void do_vext_vv_env(void *vd, void *v0, void *vs1,
                     void *vs2, CPURISCVState *env, uint32_t desc,
                     uint32_t esz, uint32_t dsz,
                     opivv2_fn *fn, clear_fn *clearfn)
{
     uint32_t vlmax = vext_maxsz(desc) / esz;
     uint32_t mlen = vext_mlen(desc);
     uint32_t vm = vext_vm(desc);
     uint32_t vl = env->vl;
     uint32_t i;
     for (i = 0; i < vl; i++) {
         if (!vm && !vext_elem_mask(v0, mlen, i)) {
             continue;
         }
         fn(vd, vs1, vs2, i, env);
     }
     if (i != 0) {
         clearfn(vd, vl, vl * dsz,  vlmax * dsz);
     }
}

#define GEN_VEXT_VV_ENV(NAME, ESZ, DSZ, CLEAR_FN)         \
void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
                   void *vs2, CPURISCVState *env,          \
                   uint32_t desc)                          \
{                                                         \
     static opivv2_fn *fns[4] = {                          \
         NAME##_rnu, NAME##_rne,                           \
         NAME##_rdn, NAME##_rod                            \
     };                                                    \
     return do_vext_vv_env(vd, v0, vs1, vs2, env, desc,    \
                           ESZ, DSZ, fns[env->vxrm],       \
                           CLEAR_FN);                      \
}

Is that right?

Zhiwei

>> Then use a final outermost wrapper to select one of the 4 functions based on
>> env->vxrm.
> The outermost wrapper could look like
>
>      switch (env->vxrm) {
>      case 0:  somefunc(some, args, 0); break;
>      case 1:  somefunc(some, args, 1); break;
>      case 2:  somefunc(some, args, 2); break;
>      default: somefunc(some, args, 3); break;
>      }
>
> so that somefunc (and its subroutines) are expanded with a constant, and we
> switch on that constant at the outermost level.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 44/60] target/riscv: vector single-width integer reduction instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:29     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:29 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 33 +++++++++++
>  target/riscv/insn32.decode              |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 17 ++++++
>  target/riscv/vector_helper.c            | 76 +++++++++++++++++++++++++
>  4 files changed, 134 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 45/60] target/riscv: vector wideing integer reduction instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:34     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:34 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  7 +++++++
>  target/riscv/insn32.decode              |  2 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  4 ++++
>  target/riscv/vector_helper.c            | 11 +++++++++++
>  4 files changed, 24 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 46/60] target/riscv: vector single-width floating-point reduction instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:48     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:48 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   | 10 +++++++
>  target/riscv/insn32.decode              |  4 +++
>  target/riscv/insn_trans/trans_rvv.inc.c |  5 ++++
>  target/riscv/vector_helper.c            | 39 +++++++++++++++++++++++++
>  4 files changed, 58 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 47/60] target/riscv: vector widening floating-point reduction instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-14 23:49     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-14 23:49 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  3 ++
>  target/riscv/insn32.decode              |  2 +
>  target/riscv/insn_trans/trans_rvv.inc.c |  3 ++
>  target/riscv/vector_helper.c            | 50 +++++++++++++++++++++++++
>  4 files changed, 58 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-14 23:12         ` LIU Zhiwei
@ 2020-03-15  1:00           ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  1:00 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 4:12 PM, LIU Zhiwei wrote:
> I am not sure whether I get it. In my opinion, the code should be modified like this:
> 
> static inline int8_t aadd8_rnu(CPURISCVState *env, int8_t a, int8_t b)
> {
>     int16_t res = (int16_t)a + (int16_t)b;
>     uint8_t round = res & 0x1;
>     res   = (res >> 1) + round;
>     return res;
> }
> 
> static inline int8_t aadd8_rne(CPURISCVState *env, int8_t a, int8_t b)
> {
>     int16_t res = (int16_t)a + (int16_t)b;
>     uint8_t round = ((res & 0x3) == 0x3);
>     res   = (res >> 1) + round;
>     return res;
> }
> 
> static inline int8_t aadd8_rdn(CPURISCVState *env, int8_t a, int8_t b)
> {
>     int16_t res = (int16_t)a + (int16_t)b;
>     res   = (res >> 1);
>     return res;
> }
> 
> static inline int8_t aadd8_rod(CPURISCVState *env, int8_t a, int8_t b)
> {
>     int16_t res = (int16_t)a + (int16_t)b;
>     uint8_t round = ((res & 0x3) == 0x1);
>    res   = (res >> 1) + round;
>     return res;
> }
> 
> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rnu, OP_SSS_B, H1, H1, H1, aadd8_rnu)
> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rne, OP_SSS_B, H1, H1, H1, aadd8_rne)
> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rdn, OP_SSS_B, H1, H1, H1, aadd8_rdn)
> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rod, OP_SSS_B, H1, H1, H1, aadd8_rod)
> 
> void do_vext_vv_env(void *vd, void *v0, void *vs1,
>                     void *vs2, CPURISCVState *env, uint32_t desc,
>                     uint32_t esz, uint32_t dsz,
>                     opivv2_fn *fn, clear_fn *clearfn)
> {
>     uint32_t vlmax = vext_maxsz(desc) / esz;
>     uint32_t mlen = vext_mlen(desc);
>     uint32_t vm = vext_vm(desc);
>     uint32_t vl = env->vl;
>     uint32_t i;
>     for (i = 0; i < vl; i++) {
>         if (!vm && !vext_elem_mask(v0, mlen, i)) {
>             continue;
>         }
>         fn(vd, vs1, vs2, i, env);
>     }
>     if (i != 0) {
>         clearfn(vd, vl, vl * dsz,  vlmax * dsz);
>     }
> }
> 
> #define GEN_VEXT_VV_ENV(NAME, ESZ, DSZ, CLEAR_FN)         \
> void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>                   void *vs2, CPURISCVState *env,          \
>                   uint32_t desc)                          \
> {                                                         \
>     static opivv2_fn *fns[4] = {                          \
>         NAME##_rnu, NAME##_rne,                           \
>         NAME##_rdn, NAME##_rod                            \
>     };                                                    \
>     return do_vext_vv_env(vd, v0, vs1, vs2, env, desc,    \
>                           ESZ, DSZ, fns[env->vxrm],       \
>                           CLEAR_FN);                      \
> }
> 
> Is that right?

While that does look good for this case, there are many other uses of
get_round(), and it may not be quite as simple there.

My suggestion was

static inline int32_t aadd32(int vxrm, int32_t a, int32_t b)
{
    int64_t res = (int64_t)a + b;
    uint8_t round = get_round(vxrm, res, 1);

    return (res >> 1) + round;
}

static inline int64_t aadd64(int vxrm, int64_t a, int64_t b)
{
    int64_t res = a + b;
    uint8_t round = get_round(vxrm, res, 1);
    int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;

    /* With signed overflow, bit 64 is inverse of bit 63. */
    return ((res >> 1) ^ over) + round;
}

RVVCALL(OPIVV2_RM, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd32)
RVVCALL(OPIVV2_RM, vaadd_vv_h, OP_SSS_H, H2, H2, H2, aadd32)
RVVCALL(OPIVV2_RM, vaadd_vv_w, OP_SSS_W, H4, H4, H4, aadd32)
RVVCALL(OPIVV2_RM, vaadd_vv_d, OP_SSS_D, H8, H8, H8, aadd64)

static inline void
vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2,
             uint32_t vl, uint32_t vm, uint32_t mlen, int vxrm,
             opivv2_rm_fn *fn)
{
    for (uint32_t i = 0; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        fn(vd, vs1, vs2, i, vxrm);
    }
}

static inline void
vext_vv_rm_2(void *vd, void *v0, void *vs1,
             void *vs2, CPURISCVState *env, uint32_t desc,
             uint32_t esz, uint32_t dsz,
             opivv2_rm_fn *fn, clear_fn *clearfn)
{
    uint32_t vlmax = vext_maxsz(desc) / esz;
    uint32_t mlen = vext_mlen(desc);
    uint32_t vm = vext_vm(desc);
    uint32_t vl = env->vl;

    if (vl == 0) {
        return;
    }

    switch (env->vxrm) {
    case 0: /* rnu */
        vext_vv_rm_1(vd, v0, vs1, vs2,
                     vl, vm, mlen, 0, fn);
        break;
    case 1: /* rne */
        vext_vv_rm_1(vd, v0, vs1, vs2,
                     vl, vm, mlen, 1, fn);
        break;
    case 2: /* rdn */
        vext_vv_rm_1(vd, v0, vs1, vs2,
                     vl, vm, mlen, 2, fn);
        break;
    default: /* rod */
        vext_vv_rm_1(vd, v0, vs1, vs2,
                     vl, vm, mlen, 3, fn);
        break;
    }

    clearfn(vd, vl, vl * dsz,  vlmax * dsz);
}

From vext_vv_rm_2, a constant is passed down all of the inline functions, so
that a constant arrives in get_round() at the bottom of the call chain.  At
which point all of the expressions get folded by the compiler and we *should*
get generated code very similar to what you have above.
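
For reference, get_round() itself can then stay one small pure function of
the (now constant) vxrm, something like (sketch; vxrm encodings 0=rnu, 1=rne,
2=rdn, 3=rod per the spec, extract64() from "qemu/bitops.h"):

static inline uint8_t get_round(int vxrm, uint64_t v, uint8_t shift)
{
    uint8_t d, d1;
    uint64_t D1;

    if (shift == 0 || shift >= 64) {
        return 0;
    }
    d  = extract64(v, shift, 1);      /* bit that becomes the result LSB */
    d1 = extract64(v, shift - 1, 1);  /* most significant discarded bit */
    D1 = extract64(v, 0, shift);      /* all discarded bits */

    switch (vxrm) {
    case 0: /* rnu: round-to-nearest-up */
        return d1;
    case 1: /* rne: round-to-nearest-even */
        if (shift > 1) {
            return d1 & ((extract64(v, 0, shift - 1) != 0) | d);
        }
        return d1 & d;
    case 3: /* rod: round-to-odd (jam) */
        return !d & (D1 != 0);
    default: /* rdn: truncate */
        return 0;
    }
}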


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 49/60] target/riscv: vector mask population count vmpopc
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  1:20     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  1:20 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +target_ulong HELPER(vmpopc_m)(void *v0, void *vs2, CPURISCVState *env,
> +        uint32_t desc)
> +{
> +    target_ulong cnt = 0;
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t vl = env->vl;
> +    int i;
> +
> +    for (i = 0; i < vl; i++) {
> +        if (vm || vext_elem_mask(v0, mlen, i)) {
> +            if (vext_elem_mask(vs2, mlen, i)) {
> +                cnt++;
> +            }
> +        }
> +    }
> +    return cnt;
> +}

This is ok as-is, so
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

But you can do better.

You create an array, similar to arm's pred_esz_masks[],
indexed by log2(mlen).

    mask = pred_mlen_masks[log2_mlen];
    n = vl >> (6 - log2_mlen);
    r = extract32(vl, 0, 6 - log2_mlen);
    if (r) {
        rmask = extract64(mask, 0, r << log2_mlen);
    } else {
        rmask = 0;
    }

    if (vm) {
        for (i = 0; i < n; i++) {
            uint64_t j = ((uint64_t *)vs2)[i];
            cnt += ctpop64(j & mask);
        }
        if (rmask) {
            uint64_t j = ((uint64_t *)vs2)[i];
            cnt += ctpop64(j & rmask);
        }
    } else {
        for (i = 0; i < n; i++) {
            uint64_t j = ((uint64_t *)vs2)[i];
            uint64_t k = ((uint64_t *)v0)[i];
            cnt += ctpop64(j & k & mask);
        }
        if (rmask) {
            uint64_t j = ((uint64_t *)vs2)[i];
            uint64_t k = ((uint64_t *)v0)[i];
            cnt += ctpop64(j & k & rmask);
        }
    }
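
The table itself would be small (sketch; this assumes the layout used by
vext_elem_mask(), i.e. the flag in the lowest bit of each MLEN-bit group):

/* Indexed by log2(mlen), for mlen = 1, 2, 4, ..., 64. */
static const uint64_t pred_mlen_masks[7] = {
    0xffffffffffffffffull,
    0x5555555555555555ull,
    0x1111111111111111ull,
    0x0101010101010101ull,
    0x0001000100010001ull,
    0x0000000100000001ull,
    0x0000000000000001ull,
};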


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 50/60] target/riscv: vmfirst find-first-set mask bit
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  1:36     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  1:36 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* vmfirst find-first-set mask bit*/
> +target_ulong HELPER(vmfirst_m)(void *v0, void *vs2, CPURISCVState *env,
> +        uint32_t desc)
> +{
> +    uint32_t mlen = vext_mlen(desc);
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t vl = env->vl;
> +    int i;
> +
> +    for (i = 0; i < vl; i++) {
> +        if (vm || vext_elem_mask(v0, mlen, i)) {
> +            if (vext_elem_mask(vs2, mlen, i)) {
> +               return i;
> +            }
> +        }
> +    }
> +    return -1LL;
> +}

This is ok as-is, so
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

But you can do better.  With the mask, as discussed, the inner loop looks like

    j = mask;
    j &= ((uint64_t *)vs2)[i];
    j &= ((uint64_t *)v0)[i];
    if (j) {
        k = ctz64(j) + i * 64;
        return k >> log2_mlen;
    }
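
Wrapped up with the same mask/rmask/n setup as in the vmpopc suggestion, the
whole helper body becomes roughly (sketch):

    for (i = 0; i < n + (rmask != 0); i++) {
        uint64_t j = i < n ? mask : rmask;
        j &= ((uint64_t *)vs2)[i];
        if (!vm) {
            j &= ((uint64_t *)v0)[i];
        }
        if (j) {
            return (ctz64(j) + i * 64) >> log2_mlen;
        }
    }
    return -1LL;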


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 52/60] target/riscv: vector iota instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  1:50     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  1:50 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  5 ++++
>  target/riscv/insn32.decode              |  1 +
>  target/riscv/insn_trans/trans_rvv.inc.c | 22 ++++++++++++++++++
>  target/riscv/vector_helper.c            | 31 +++++++++++++++++++++++++
>  4 files changed, 59 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 53/60] target/riscv: vector element index instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  1:54     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  1:54 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  5 +++++
>  target/riscv/insn32.decode              |  2 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 21 ++++++++++++++++++++
>  target/riscv/vector_helper.c            | 26 +++++++++++++++++++++++++
>  4 files changed, 54 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 54/60] target/riscv: integer extract instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  2:53     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  2:53 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static bool trans_vext_x_v(DisasContext *s, arg_r *a)
> +{
> +    if (vext_check_isa_ill(s, RVV)) {
> +        TCGv_ptr src2;
> +        TCGv dest, src1;
> +        gen_helper_vext_x_v fns[4] = {
> +            gen_helper_vext_x_v_b, gen_helper_vext_x_v_h,
> +            gen_helper_vext_x_v_w, gen_helper_vext_x_v_d
> +        };
> +
> +        dest = tcg_temp_new();
> +        src1 = tcg_temp_new();
> +        src2 = tcg_temp_new_ptr();
> +
> +        gen_get_gpr(src1, a->rs1);
> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
> +
> +        fns[s->sew](dest, src2, src1, cpu_env);
> +        gen_set_gpr(a->rd, dest);
> +
> +        tcg_temp_free(dest);
> +        tcg_temp_free(src1);
> +        tcg_temp_free_ptr(src2);
> +        return true;
> +    }
> +    return false;
> +}

This entire operation can be performed inline easily.

static void extract_element(TCGv dest, TCGv_ptr base,
                            int ofs, int sew)
{
    switch (sew) {
    case MO_8:
        tcg_gen_ld8u_tl(dest, base, ofs);
        break;
    case MO_16:
        tcg_gen_ld16u_tl(dest, base, ofs);
        break;
    default:
        tcg_gen_ld32u_tl(dest, base, ofs);
        break;
#if TARGET_LONG_BITS == 64
    case MO_64:
        tcg_gen_ld_i64(dest, base, ofs);
        break;
#endif
    }
}

static bool trans_vext_x_v(DisasContext *s, arg_r *a)
{
...
    if (a->rs1 == 0) {
        /* Special case vmv.x.s rd, vs2. */
        extract_element(dest, cpu_env,
                        vreg_ofs(s, a->rs2), s->sew);
    } else {
        int vlen = s->vlen >> (3 + s->sew);
        TCGv_i32 ofs = tcg_temp_new_i32();
        TCGv_ptr  base = tcg_temp_new_ptr();
        TCGv t_vlen, t_zero;

        /* Mask the index to the length so that we do
           not produce an out-of-range load. */
        tcg_gen_trunc_tl_i32(ofs, cpu_gpr[a->rs1]);
        tcg_gen_andi_i32(ofs, ofs, vlen - 1);

        /* Convert the index to an offset.  */
        tcg_gen_shli_i32(ofs, ofs, s->sew);

        /* Convert the index to a pointer. */
        tcg_gen_extu_i32_ptr(base, ofs);
        tcg_gen_add_ptr(base, base, cpu_env);

        /* Perform the load. */
        extract_element(dest, base,
                        vreg_ofs(s, a->rs2), s->sew);
        tcg_temp_free_ptr(base);
        tcg_temp_free_i32(ofs);

        /* Flush out-of-range indexing to zero.  */
        t_vlen = tcg_const_tl(vlen);
        t_zero = tcg_const_tl(0);
        tcg_gen_movcond_tl(TCG_COND_LTU, dest, cpu_gpr[a->rs1],
                           t_vlen, dest, t_zero);
        tcg_temp_free(t_vlen);
        tcg_temp_free(t_zero);
    }


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 55/60] target/riscv: integer scalar move instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  3:54     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  3:54 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  5 +++++
>  target/riscv/insn32.decode              |  1 +
>  target/riscv/insn_trans/trans_rvv.inc.c | 26 +++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 15 ++++++++++++++
>  4 files changed, 47 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

What an annoying difference here between 0.7.1 and 0.8.
With 0.8, we can inline this operation as for vmv.x.s.
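
For reference, the 0.8 inline version (element 0 written, the rest left
alone) would mostly need a store counterpart of the extract_element() sketch
from the vext.x.v review, roughly as below -- plus the usual vl == 0 nop
check, and glossing over the TCGv vs TCGv_i64 extension needed for RV32:

static void store_element(TCGv_i64 val, TCGv_ptr base, int ofs, int sew)
{
    switch (sew) {
    case MO_8:
        tcg_gen_st8_i64(val, base, ofs);
        break;
    case MO_16:
        tcg_gen_st16_i64(val, base, ofs);
        break;
    case MO_32:
        tcg_gen_st32_i64(val, base, ofs);
        break;
    default:
        tcg_gen_st_i64(val, base, ofs);
        break;
    }
}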


r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  4:39     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  4:39 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  9 +++++
>  target/riscv/insn32.decode              |  2 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 47 +++++++++++++++++++++++++
>  target/riscv/vector_helper.c            | 36 +++++++++++++++++++
>  4 files changed, 94 insertions(+)
> 
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 41cecd266c..7a689a5c07 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1111,3 +1111,12 @@ DEF_HELPER_3(vmv_s_x_b, void, ptr, tl, env)
>  DEF_HELPER_3(vmv_s_x_h, void, ptr, tl, env)
>  DEF_HELPER_3(vmv_s_x_w, void, ptr, tl, env)
>  DEF_HELPER_3(vmv_s_x_d, void, ptr, tl, env)
> +
> +DEF_HELPER_2(vfmv_f_s_b, i64, ptr, env)
> +DEF_HELPER_2(vfmv_f_s_h, i64, ptr, env)
> +DEF_HELPER_2(vfmv_f_s_w, i64, ptr, env)
> +DEF_HELPER_2(vfmv_f_s_d, i64, ptr, env)
> +DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
> +DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
> +DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
> +DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 7e1efeec05..bfdce0979c 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -557,6 +557,8 @@ viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
>  vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
>  vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
>  vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
> +vfmv_f_s        001100 1 ..... 00000 001 ..... 1010111 @r2rd
> +vfmv_s_f        001101 1 00000 ..... 101 ..... 1010111 @r2
>  
>  vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>  vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index 7720ffecde..99cd45b0aa 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -2269,3 +2269,50 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
>      }
>      return false;
>  }
> +
> +/* Floating-Point Scalar Move Instructions */
> +typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
> +static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
> +{
> +    if (vext_check_isa_ill(s, RVV)) {
> +        TCGv_ptr src2;
> +        gen_helper_vfmv_f_s fns[4] = {
> +            gen_helper_vfmv_f_s_b, gen_helper_vfmv_f_s_h,
> +            gen_helper_vfmv_f_s_w, gen_helper_vfmv_f_s_d
> +        };
> +
> +        src2 = tcg_temp_new_ptr();
> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
> +
> +        fns[s->sew](cpu_fpr[a->rd], src2, cpu_env);
> +
> +        tcg_temp_free_ptr(src2);
> +        return true;
> +    }
> +    return false;
> +}

SEW == MO_8 should raise an illegal instruction exception.

Need a check for fp enabled.  Presumably

    if (s->mstatus_fs == 0 || !has_ext(s, RVF)) {
        return false;
    }

Need to mark_fs_dirty().

Like integer vmv.x.s, this can be done inline.  The nan-boxing is trivial as well.

For 0.8, we will have to validate the nan-boxing for SEW=MO_64 && !RVD.  That's
still not hard to do inline.
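
For illustration, the inline load plus nan-boxing could be as simple as
(sketch; element 0 offset used directly, big-endian element adjustment and
the checks above omitted):

    switch (s->sew) {
    case MO_16:
        tcg_gen_ld16u_i64(cpu_fpr[a->rd], cpu_env, vreg_ofs(s, a->rs2));
        tcg_gen_ori_i64(cpu_fpr[a->rd], cpu_fpr[a->rd],
                        MAKE_64BIT_MASK(16, 48));
        break;
    case MO_32:
        tcg_gen_ld32u_i64(cpu_fpr[a->rd], cpu_env, vreg_ofs(s, a->rs2));
        tcg_gen_ori_i64(cpu_fpr[a->rd], cpu_fpr[a->rd],
                        MAKE_64BIT_MASK(32, 32));
        break;
    default: /* MO_64 */
        tcg_gen_ld_i64(cpu_fpr[a->rd], cpu_env, vreg_ofs(s, a->rs2));
        break;
    }
    mark_fs_dirty(s);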



> +
> +typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
> +static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
> +{
> +    if (vext_check_isa_ill(s, RVV | RVF) ||
> +        vext_check_isa_ill(s, RVV | RVD)) {
> +        TCGv_ptr dest;
> +        TCGv_i64 src1;
> +        gen_helper_vfmv_s_f fns[4] = {
> +            gen_helper_vfmv_s_f_b, gen_helper_vfmv_s_f_h,
> +            gen_helper_vfmv_s_f_w, gen_helper_vfmv_s_f_d
> +        };
> +
> +        src1 = tcg_temp_new_i64();
> +        dest = tcg_temp_new_ptr();
> +        tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, a->rd));
> +
> +        fns[s->sew](dest, src1, cpu_env);
> +
> +        tcg_temp_free_i64(src1);
> +        tcg_temp_free_ptr(dest);
> +        return true;
> +    }
> +    return false;
> +}

Again, SEW == MO_8 is illegal.  Missing fp enable check.

I don't believe RVD without RVF is legal; you should not need to check for both.

Missing nan-boxing for SEW==MO_64 && FLEN==32 (!RVD).  Which I think should be
done here inline, so that the uint64_t passed to the helper is always correct.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 54/60] target/riscv: integer extract instruction
  2020-03-15  2:53     ` Richard Henderson
@ 2020-03-15  5:15       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-15  5:15 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/15 10:53, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static bool trans_vext_x_v(DisasContext *s, arg_r *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV)) {
>> +        TCGv_ptr src2;
>> +        TCGv dest, src1;
>> +        gen_helper_vext_x_v fns[4] = {
>> +            gen_helper_vext_x_v_b, gen_helper_vext_x_v_h,
>> +            gen_helper_vext_x_v_w, gen_helper_vext_x_v_d
>> +        };
>> +
>> +        dest = tcg_temp_new();
>> +        src1 = tcg_temp_new();
>> +        src2 = tcg_temp_new_ptr();
>> +
>> +        gen_get_gpr(src1, a->rs1);
>> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
>> +
>> +        fns[s->sew](dest, src2, src1, cpu_env);
>> +        gen_set_gpr(a->rd, dest);
>> +
>> +        tcg_temp_free(dest);
>> +        tcg_temp_free(src1);
>> +        tcg_temp_free_ptr(src2);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> This entire operation can be performed inline easily.
>
> static void extract_element(TCGv dest, TCGv_ptr base,
>                              int ofs, int sew)
> {
>      switch (sew) {
>      case MO_8:
>          tcg_gen_ld8u_tl(dest, base, ofs);
>          break;
>      case MO_16:
>          tcg_gen_ld16u_tl(dest, base, ofs);
>          break;
>      default:
>          tcg_gen_ld32u_tl(dest, base, ofs);
>          break;
> #if TARGET_LONG_BITS == 64
>      case MO_64:
>          tcg_gen_ld_i64(dest, base, ofs);
>          break;
> #endif
>      }
> }
>
> static bool trans_vext_x_v(DisasContext *s, arg_r *a)
> {
> ...
>      if (a->rs1 == 0) {
>          /* Special case vmv.x.s rd, vs2. */
>          do_extract(dest, cpu_env,
>                     vreg_ofs(s, a->rs2), s->sew);
>      } else {
>          int vlen = s->vlen >> (3 + s->sew);
>          TCGv_i32 ofs = tcg_temp_new_i32();
>          TCGv_ptr  base = tcg_temp_new_ptr();
>          TCGv t_vlen, t_zero;
>
>          /* Mask the index to the length so that we do
>             not produce an out-of-range load. */
>          tcg_gen_trunc_tl_i32(ofs, cpu_gpr[a->rs1]);
>          tcg_gen_andi_i32(ofs, ofs, vlen - 1);
>
>          /* Convert the index to an offset.  */
>          tcg_gen_shli_i32(ofs, ofs, s->sew);

On a big-endian host, should I convert the index first, before this
statement?

#ifdef HOST_WORDS_BIGENDIAN
static void convert_idx(TCGv_i32 idx, int sew)
{
     switch (sew) {
     case MO_8:
         tcg_gen_xori_i32(idx, idx, 7);
         break;
     case MO_16:
         tcg_gen_xori_i32(idx, idx, 3);
         break;
     case MO_32:
         tcg_gen_xori_i32(idx, idx, 1);
         break;
     default:
         break;
     }
}
#endif


When converting the index to an offset, call this function first:

#ifdef HOST_WORDS_BIGENDIAN
    convert_idx(ofs, s->sew);
#endif
/* Convert the index to an offset.  */
tcg_gen_shli_i32(ofs, ofs, s->sew);

Zhiwei
>          /* Convert the index to a pointer. */
>          tcg_gen_extu_i32_ptr(base, ofs);
>          tcg_gen_add_ptr(base, base, cpu_env);
>
>          /* Perform the load. */
>          do_extract(dest, base,
>                     vreg_ofs(s, a->rs2), s->sew);
>          tcg_temp_free_ptr(base);
>          tcg_temp_free_i32(ofs);
>
>          /* Flush out-of-range indexing to zero.  */
>          t_vlen = tcg_const_tl(vlen);
>          t_zero = tcg_const_tl(0);
>          tcg_gen_movcond_tl(TCG_COND_LTU, dest, cpu_gpr[a->rs1],
>                             t_vlen, dest, t_zero);
>          tcg_temp_free(t_vlen);
>          tcg_temp_free(t_zero);
>      }
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 54/60] target/riscv: integer extract instruction
@ 2020-03-15  5:15       ` LIU Zhiwei
  0 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-15  5:15 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, guoren, qemu-devel, qemu-riscv



On 2020/3/15 10:53, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +static bool trans_vext_x_v(DisasContext *s, arg_r *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV)) {
>> +        TCGv_ptr src2;
>> +        TCGv dest, src1;
>> +        gen_helper_vext_x_v fns[4] = {
>> +            gen_helper_vext_x_v_b, gen_helper_vext_x_v_h,
>> +            gen_helper_vext_x_v_w, gen_helper_vext_x_v_d
>> +        };
>> +
>> +        dest = tcg_temp_new();
>> +        src1 = tcg_temp_new();
>> +        src2 = tcg_temp_new_ptr();
>> +
>> +        gen_get_gpr(src1, a->rs1);
>> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
>> +
>> +        fns[s->sew](dest, src2, src1, cpu_env);
>> +        gen_set_gpr(a->rd, dest);
>> +
>> +        tcg_temp_free(dest);
>> +        tcg_temp_free(src1);
>> +        tcg_temp_free_ptr(src2);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> This entire operation can be performed inline easily.
>
> static void extract_element(TCGv dest, TCGv_ptr base,
>                              int ofs, int sew)
> {
>      switch (sew) {
>      case MO_8:
>          tcg_gen_ld8u_tl(dest, base, ofs);
>          break;
>      case MO_16:
>          tcg_gen_ld16u_tl(dest, base, ofs);
>          break;
>      default:
>          tcg_gen_ld32u_tl(dest, base, ofs);
>          break;
> #if TARGET_LONG_BITS == 64
>      case MO_64:
>          tcg_gen_ld_i64(dest, base, ofs);
>          break;
> #endif
>      }
> }
>
> static bool trans_vext_x_v(DisasContext *s, arg_r *a)
> {
> ...
>      if (a->rs1 == 0) {
>          /* Special case vmv.x.s rd, vs2. */
>          do_extract(dest, cpu_env,
>                     vreg_ofs(s, a->rs2), s->sew);
>      } else {
>          int vlen = s->vlen >> (3 + s->sew);
>          TCGv_i32 ofs = tcg_temp_new_i32();
>          TCGv_ptr  base = tcg_temp_new_ptr();
>          TCGv t_vlen, t_zero;
>
>          /* Mask the index to the length so that we do
>             not produce an out-of-range load. */
>          tcg_gen_trunc_tl_i32(ofs, cpu_gpr[a->rs1]);
>          tcg_gen_andi_i32(ofs, ofs, vlen - 1);
>
>          /* Convert the index to an offset.  */
>          tcg_gen_shli_i32(ofs, ofs, s->sew);

In  big endianess host, should I convert the index first before this 
statement.

#ifdef HOST_WORDS_BIGENDIAN
static void convert_idx(TCGv_i32 idx, int sew)
{
     switch (sew) {
     case MO_8:
         tcg_gen_xori_i32(idx, idx, 7);
         break;
     case MO_16:
         tcg_gen_xori_i32(idx, idx, 3);
         break;
     case MO_32:
         tcg_gen_xori_i32(idx, idx, 1);
         break;
     default:
         break;
     }
}
#endif


When convert the index to an offset, use this function first

#ifdef HOST_WORDS_BIGENDIAN
     convert_idx(ofs, s->sew)
#endif
/* Convert the index to an offset.  */
tcg_gen_shli_i32(ofs, ofs, s->sew)

Zhiwei
>          /* Convert the index to a pointer. */
>          tcg_gen_extu_i32_ptr(base, ofs);
>          tcg_gen_add_ptr(base, base, cpu_env);
>
>          /* Perform the load. */
>          do_extract(dest, base,
>                     vreg_ofs(s, a->rs2), s->sew);
>          tcg_temp_free_ptr(base);
>          tcg_temp_free_i32(ofs);
>
>          /* Flush out-of-range indexing to zero.  */
>          t_vlen = tcg_const_tl(vlen);
>          t_zero = tcg_const_tl(0);
>          tcg_gen_movcond_tl(TCG_COND_LTU, dest, cpu_gpr[a->rs1],
>                             t_vlen, dest, t_zero);
>          tcg_temp_free(t_vlen);
>          tcg_temp_free(t_zero);
>      }
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  5:16     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  5:16 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
> +        CPURISCVState *env, uint32_t desc)                                \
> +{                                                                         \
> +    uint32_t mlen = vext_mlen(desc);                                      \
> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
> +    uint32_t vm = vext_vm(desc);                                          \
> +    uint32_t vl = env->vl;                                                \
> +    uint32_t offset = s1, i;                                              \
> +                                                                          \
> +    if (offset > vl) {                                                    \
> +        offset = vl;                                                      \
> +    }                                                                     \

This isn't right.

> +    for (i = 0; i < vl; i++) {                                            \
> +        if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {    \
> +            continue;                                                     \
> +        }                                                                 \
> +        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
> +    }                                                                     \
> +    if (i == 0) {                                                         \
> +        return;                                                           \
> +    }                                                                     \

You need to eliminate vl == 0 first, not last.
Then

    for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.

It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.
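
Something along these lines, as a sketch of just the changed part of the
macro body (backslash continuations dropped, vl likewise widened to
target_ulong, and CLEAR_FN clearing the byte range [cnt, tot) as it does
elsewhere in the series):

    target_ulong offset = s1, i;

    if (vl == 0) {
        return;
    }
    for (i = offset; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));
    }
    /* Clear the tail elements once, rather than once per iteration.  */
    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));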


> +#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)                  \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
> +        CPURISCVState *env, uint32_t desc)                                \
> +{                                                                         \
> +    uint32_t mlen = vext_mlen(desc);                                      \
> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
> +    uint32_t vm = vext_vm(desc);                                          \
> +    uint32_t vl = env->vl;                                                \
> +    uint32_t offset = s1, i;                                              \
> +                                                                          \
> +    for (i = 0; i < vl; i++) {                                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
> +            continue;                                                     \
> +        }                                                                 \
> +        if (i + offset < vlmax) {                                         \
> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));      \

Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.

It is probably worth changing the loop bounds to

    if (offset >= vlmax) {
       max = 0;
    } else {
       max = MIN(vl, vlmax - offset);
    }
    for (i = 0; i < max; ++i)


> +        } else {                                                          \
> +            *((ETYPE *)vd + H(i)) = 0;                                    \
> +        }

Which lets these zeros merge into...

> +    for (; i < vlmax; i++) {                                              \
> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
> +    }                                                                     \

These zeros.

> +#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)                   \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
> +        CPURISCVState *env, uint32_t desc)                                \
> +{                                                                         \
> +    uint32_t mlen = vext_mlen(desc);                                      \
> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
> +    uint32_t vm = vext_vm(desc);                                          \
> +    uint32_t vl = env->vl;                                                \
> +    uint32_t i;                                                           \
> +                                                                          \
> +    for (i = 0; i < vl; i++) {                                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
> +            continue;                                                     \
> +        }                                                                 \
> +        if (i == 0) {                                                     \
> +            *((ETYPE *)vd + H(i)) = s1;                                   \
> +        } else {                                                          \
> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - 1));           \
> +        }                                                                 \
> +    }                                                                     \
> +    if (i == 0) {                                                         \
> +        return;                                                           \
> +    }                                                                     \
> +    for (; i < vlmax; i++) {                                              \
> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
> +    }                                                                     \
> +}

As a preference, I think you can do away with this helper.
Simply use the slideup helper with argument 1, and then
afterwards store the integer register into element 0.  You should be able to
re-use code from vmv.s.x for that.

> +#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ETYPE, H, CLEAR_FN)                 \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
> +        CPURISCVState *env, uint32_t desc)                                \
> +{                                                                         \

Likewise.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 54/60] target/riscv: integer extract instruction
  2020-03-15  5:15       ` LIU Zhiwei
@ 2020-03-15  5:21         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  5:21 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 10:15 PM, LIU Zhiwei wrote:
> 
> 
> On 2020/3/15 10:53, Richard Henderson wrote:
>> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>>> +static bool trans_vext_x_v(DisasContext *s, arg_r *a)
>>> +{
>>> +    if (vext_check_isa_ill(s, RVV)) {
>>> +        TCGv_ptr src2;
>>> +        TCGv dest, src1;
>>> +        gen_helper_vext_x_v fns[4] = {
>>> +            gen_helper_vext_x_v_b, gen_helper_vext_x_v_h,
>>> +            gen_helper_vext_x_v_w, gen_helper_vext_x_v_d
>>> +        };
>>> +
>>> +        dest = tcg_temp_new();
>>> +        src1 = tcg_temp_new();
>>> +        src2 = tcg_temp_new_ptr();
>>> +
>>> +        gen_get_gpr(src1, a->rs1);
>>> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
>>> +
>>> +        fns[s->sew](dest, src2, src1, cpu_env);
>>> +        gen_set_gpr(a->rd, dest);
>>> +
>>> +        tcg_temp_free(dest);
>>> +        tcg_temp_free(src1);
>>> +        tcg_temp_free_ptr(src2);
>>> +        return true;
>>> +    }
>>> +    return false;
>>> +}
>> This entire operation can be performed inline easily.
>>
>> static void extract_element(TCGv dest, TCGv_ptr base,
>>                              int ofs, int sew)
>> {
>>      switch (sew) {
>>      case MO_8:
>>          tcg_gen_ld8u_tl(dest, base, ofs);
>>          break;
>>      case MO_16:
>>          tcg_gen_ld16u_tl(dest, base, ofs);
>>          break;
>>      default:
>>          tcg_gen_ld32u_tl(dest, base, ofs);
>>          break;
>> #if TARGET_LONG_BITS == 64
>>      case MO_64:
>>          tcg_gen_ld_i64(dest, base, ofs);
>>          break;
>> #endif
>>      }
>> }
>>
>> static bool trans_vext_x_v(DisasContext *s, arg_r *a)
>> {
>> ...
>>      if (a->rs1 == 0) {
>>          /* Special case vmv.x.s rd, vs2. */
>>          do_extract(dest, cpu_env,
>>                     vreg_ofs(s, a->rs2), s->sew);
>>      } else {
>>          int vlen = s->vlen >> (3 + s->sew);
>>          TCGv_i32 ofs = tcg_temp_new_i32();
>>          TCGv_ptr  base = tcg_temp_new_ptr();
>>          TCGv t_vlen, t_zero;
>>
>>          /* Mask the index to the length so that we do
>>             not produce an out-of-range load. */
>>          tcg_gen_trunc_tl_i32(ofs, cpu_gpr[a->rs1]);
>>          tcg_gen_andi_i32(ofs, ofs, vlen - 1);
>>
>>          /* Convert the index to an offset.  */
>>          tcg_gen_shli_i32(ofs, ofs, s->sew);
> 
> On a big-endian host, should I convert the index first, before this statement?
> 
> #ifdef HOST_WORDS_BIGENDIAN
> static void convert_idx(TCGv_i32 idx, int sew)
> {
>     switch (sew) {
>     case MO_8:
>         tcg_gen_xori_i32(idx, idx, 7);
>         break;
>     case MO_16:
>         tcg_gen_xori_i32(idx, idx, 3);
>         break;
>     case MO_32:
>         tcg_gen_xori_i32(idx, idx, 1);
>         break;
>     default:
>         break;
>     }
> }
> #endif
> 
> 
> When converting the index to an offset, call this function first:
> 
> #ifdef HOST_WORDS_BIGENDIAN
>     convert_idx(ofs, s->sew);
> #endif

Yes, I forgot about endian adjust.

I would say

static void endian_adjust(TCGv_i32 ofs, int sew)
{
#ifdef HOST_WORDS_BIGENDIAN
    tcg_gen_xori_i32(ofs, ofs, 7 >> sew);
#endif
}

so that you don't need the ifdef at the use site.
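
Then the use site is just (hypothetical, following the sketch above):

    tcg_gen_trunc_tl_i32(ofs, cpu_gpr[a->rs1]);
    tcg_gen_andi_i32(ofs, ofs, vlen - 1);
    endian_adjust(ofs, s->sew);    /* no-op on little-endian hosts */

    /* Convert the index to an offset.  */
    tcg_gen_shli_i32(ofs, ofs, s->sew);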


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 58/60] target/riscv: vector register gather instruction
  2020-03-12 14:58   ` LIU Zhiwei
@ 2020-03-15  5:44     ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  5:44 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +static bool vrgather_vx_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return (vext_check_isa_ill(s, RVV) &&
> +            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
> +            vext_check_reg(s, a->rd, false) &&
> +            vext_check_reg(s, a->rs2, false) &&
> +            (a->rd != a->rs2));
> +}
> +GEN_OPIVX_TRANS(vrgather_vx, vrgather_vx_check)
> +GEN_OPIVI_TRANS(vrgather_vi, 1, vrgather_vx, vrgather_vx_check)

The unmasked versions of these should use gvec_dup.

For the immediate version, where we can validate the index at translation time,
we can use tcg_gen_gvec_dup_mem, so that the host vector dup-from-memory
instruction can be used.
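
That might look roughly like this (sketch only: LMUL = 1, the 0.7.1 tail
clearing and the host big-endian element offset are left out; gvec entry
points as in tcg-op-gvec.h; a->rs1 here is the zero-extended immediate):

    if (a->vm) {
        int vlmax = s->vlen >> (3 + s->sew);
        if (a->rs1 < vlmax) {
            /* Splat element vs2[rs1] across vd with the host dup insn.  */
            tcg_gen_gvec_dup_mem(s->sew, vreg_ofs(s, a->rd),
                                 vreg_ofs(s, a->rs2) + (a->rs1 << s->sew),
                                 s->vlen / 8, s->vlen / 8);
        } else {
            /* Out-of-range index reads as zero, so vd becomes all zero.  */
            tcg_gen_gvec_dup_imm(s->sew, vreg_ofs(s, a->rd),
                                 s->vlen / 8, s->vlen / 8, 0);
        }
        return true;
    }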

For the register version, we should re-use the code from vext.x.s where we load
the element, bound the index and squash the value to zero for index >= VLMAX.
Then use tcg_gen_gvec_dup_i64.

For the masked versions, we should load the value, as above, and then re-use
the vmerge helper with vs2 = vd, so that we get

    vd[i] = v0[i].lsb ? val : vd[i]


> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 2219fdd6c5..5788e46dcf 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -4647,3 +4647,71 @@ GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_b, uint8_t, H1, clearb)
>  GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_h, uint16_t, H2, clearh)
>  GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_w, uint32_t, H4, clearl)
>  GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_d, uint64_t, H8, clearq)
> +
> +/* Vector Register Gather Instruction */
> +#define GEN_VEXT_VRGATHER_VV(NAME, ETYPE, H, CLEAR_FN)                    \
> +void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,               \
> +        CPURISCVState *env, uint32_t desc)                                \
> +{                                                                         \
> +    uint32_t mlen = vext_mlen(desc);                                      \
> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
> +    uint32_t vm = vext_vm(desc);                                          \
> +    uint32_t vl = env->vl;                                                \
> +    uint32_t index, i;                                                    \
> +                                                                          \
> +    for (i = 0; i < vl; i++) {                                            \
> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
> +            continue;                                                     \
> +        }                                                                 \
> +        index = *((ETYPE *)vs1 + H(i));                                   \
> +        if (index >= vlmax) {

The type of index should be ETYPE or uint64_t, and similar for vlmax just so
they match.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-15  4:39     ` Richard Henderson
@ 2020-03-15  6:13       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-15  6:13 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/15 12:39, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/helper.h                   |  9 +++++
>>   target/riscv/insn32.decode              |  2 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 47 +++++++++++++++++++++++++
>>   target/riscv/vector_helper.c            | 36 +++++++++++++++++++
>>   4 files changed, 94 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 41cecd266c..7a689a5c07 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -1111,3 +1111,12 @@ DEF_HELPER_3(vmv_s_x_b, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_h, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_w, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_d, void, ptr, tl, env)
>> +
>> +DEF_HELPER_2(vfmv_f_s_b, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_h, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_w, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_d, i64, ptr, env)
>> +DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 7e1efeec05..bfdce0979c 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -557,6 +557,8 @@ viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
>>   vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
>>   vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
>>   vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
>> +vfmv_f_s        001100 1 ..... 00000 001 ..... 1010111 @r2rd
>> +vfmv_s_f        001101 1 00000 ..... 101 ..... 1010111 @r2
>>   
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index 7720ffecde..99cd45b0aa 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -2269,3 +2269,50 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
>>       }
>>       return false;
>>   }
>> +
>> +/* Floating-Point Scalar Move Instructions */
>> +typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
>> +static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV)) {
>> +        TCGv_ptr src2;
>> +        gen_helper_vfmv_f_s fns[4] = {
>> +            gen_helper_vfmv_f_s_b, gen_helper_vfmv_f_s_h,
>> +            gen_helper_vfmv_f_s_w, gen_helper_vfmv_f_s_d
>> +        };
>> +
>> +        src2 = tcg_temp_new_ptr();
>> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
>> +
>> +        fns[s->sew](cpu_fpr[a->rd], src2, cpu_env);
>> +
>> +        tcg_temp_free_ptr(src2);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> SEW == MO_8 should raise illegal instruction exception.
I agree. But I didn't find a reference in Section 17.3 in either v0.7.1
or v0.8.

Perhaps I should refer to:

"If the current SEW does not correspond to a supported IEEE floating-point
type, an illegal instruction exception is raised."(Section 14)


> Need a check for fp enabled.  Presumably
>
>      if (s->mstatus_fs == 0 || !has_ext(s, RVF)) {
>          return false;
>      }
>
> Need to mark_fs_dirty().
Yes, I should.
>
> Like integer vmv.x.s, this can be done inline.  The nan-boxing is trivial as well.
>
> For 0.8, we will have to validate the nan-boxing for SEW=MO_64 && !RVD.  That's
> still not hard to do inline.
>
I see it. Thanks.
>
>> +
>> +typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
>> +static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV | RVF) ||
>> +        vext_check_isa_ill(s, RVV | RVD)) {
>> +        TCGv_ptr dest;
>> +        TCGv_i64 src1;
>> +        gen_helper_vfmv_s_f fns[4] = {
>> +            gen_helper_vfmv_s_f_b, gen_helper_vfmv_s_f_h,
>> +            gen_helper_vfmv_s_f_w, gen_helper_vfmv_s_f_d
>> +        };
>> +
>> +        src1 = tcg_temp_new_i64();
>> +        dest = tcg_temp_new_ptr();
>> +        tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, a->rd));
>> +
>> +        fns[s->sew](dest, src1, cpu_env);
There is a mistake here; it should be:

fns[s->sew](dest, cpu_fpr[a->rs1], cpu_env);
>> +
>> +        tcg_temp_free_i64(src1);
>> +        tcg_temp_free_ptr(dest);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> Again, SEW == MO_8 is illegal.  Missing fp enable check.
>
> I don't believe RVD without RVF is legal; you should not need to check for both.
Reasonable.
>
> Missing nan-boxing for SEW==MO_64 && FLEN==32 (!RVD).  Which I think should be
> done here inline, so that the uint64_t passed to the helper is always correct.
I think all the float registers have already been NaN-boxed in QEMU
target/riscv.

As the float registers are always 64 bits, if FLEN is 32 a float register
has already been NaN-boxed by FLW or VFMV.F.S.

Should I NaN-box the float register explicitly here?

Zhiwei
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-15  6:13       ` LIU Zhiwei
@ 2020-03-15  6:48         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  6:48 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 11:13 PM, LIU Zhiwei wrote:
>> SEW == MO_8 should raise illegal instruction exception.
> I agree. But I didn't find a reference in Section 17.3 in either v0.7.1 or v0.8.
> 
> Perhaps I should refer to:
> 
> "If the current SEW does not correspond to a supported IEEE floating-point
> type, an illegal instruction exception is raised."(Section 14)

Yes, that's the rule I was thinking of.

>> Missing nan-boxing for SEW==MO_64 && FLEN==32 (!RVD).  Which I think should be
>> done here inline, so that the uint64_t passed to the helper is always correct.
> I think all the float registers have already been NaN-boxed in QEMU
> target/riscv.
> 
> As the float registers are always 64 bits, if FLEN is 32 a float register
> has already been NaN-boxed by FLW or VFMV.F.S.
> 
> Should I NaN-box the float register explicitly here?

Hmm, I see what you mean -- RVF is supposed to have already boxed all of the
values.  Except that it doesn't at the moment.  I remember now that we were
talking about this some months ago; I thought it had been taken care of, but
hasn't.

I think we should explicitly do it here, with a comment.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-15  5:16     ` Richard Henderson
@ 2020-03-15  6:49       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-15  6:49 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/15 13:16, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t offset = s1, i;                                              \
>> +                                                                          \
>> +    if (offset > vl) {                                                    \
>> +        offset = vl;                                                      \
>> +    }                                                                     \
> This isn't right.
That's to handle a corner case.  As you can see from the behavior of
vslideup.vx in Section 17.4.1:

  0 < i < max(vstart, OFFSET)      unchanged
  max(vstart, OFFSET) <= i < vl    vd[i] = vs2[i-OFFSET] if mask enabled,
                                   unchanged if not
  vl <= i < VLMAX                  tail elements, vd[i] = 0


Neither spec v0.7.1 nor v0.8 specifies what happens when OFFSET > vl.

Should the elements (vl <= i < OFFSET) be seen as tail elements, or
unchanged?

And this case is possible, because OFFSET comes from a scalar register.

Here the elements (vl <= i < OFFSET) are treated as tail elements.

>
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {    \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
>> +    }                                                                     \
>> +    if (i == 0) {                                                         \
>> +        return;                                                           \
>> +    }                                                                     \
> You need to eliminate vl == 0 first, not last.
> Then
>
>      for (i = offset; i < vl; i++)
>
> The types of i and vl need to be extended to target_ulong, so that you don't
> incorrectly crop the input offset.
Yes, I should.
>
> It may be worth special-casing vm=1, or hoisting it out of the loop.  The
> operation becomes a memcpy (at least for little-endian) at that point.  See
> swap_memmove in arm/sve_helper.c.
>
>
>> +#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)                  \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t offset = s1, i;                                              \
>> +                                                                          \
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        if (i + offset < vlmax) {                                         \
>> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));      \
> Again, eliminate vl == 0 first.  In fact, why don't we make that a global
> request for all of the patches for the next revision.
I don't get it.

Check vl == 0 first for all patches. Is it right?
>   Checking for i == 0 last
> is silly, and checks for the zero twice: once in the loop bounds and again at
> the end.

>
> It is probably worth changing the loop bounds to
>
>      if (offset >= vlmax) {
>         max = 0;
>      } else {
>         max = MIN(vl, vlmax - offset);
>      }
>      for (i = 0; i < max; ++i)
>
Yes.
>> +        } else {                                                          \
>> +            *((ETYPE *)vd + H(i)) = 0;                                    \
>> +        }
> Which lets these zeros merge into...
It's a mistake here.
>
>> +    for (; i < vlmax; i++) {                                              \
>> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
>> +    }                                                                     \
> These zeros.
>
>> +#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)                   \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t i;                                                           \
>> +                                                                          \
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        if (i == 0) {                                                     \
>> +            *((ETYPE *)vd + H(i)) = s1;                                   \
>> +        } else {                                                          \
>> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - 1));           \
>> +        }                                                                 \
>> +    }                                                                     \
>> +    if (i == 0) {                                                         \
>> +        return;                                                           \
>> +    }                                                                     \
>> +    for (; i < vlmax; i++) {                                              \
>> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
>> +    }                                                                     \
>> +}
> As a preference, I think you can do away with this helper.
> Simply use the slideup helper with argument 1, and then
> afterwards store the integer register into element 0.  You should be able to
> re-use code from vmv.s.x for that.
I will try to just inline it.

Zhiwei
>> +#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ETYPE, H, CLEAR_FN)                 \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
> Likewise.
>
>
> r~


[-- Attachment #2: Type: text/html, Size: 10776 bytes --]

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-15  6:49       ` LIU Zhiwei
@ 2020-03-15  6:56         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  6:56 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/14/20 11:49 PM, LIU Zhiwei wrote:
>>> +    if (offset > vl) {                                                    \
>>> +        offset = vl;                                                      \
>>> +    }                                                                     \
>> This isn't right.
> That's to handle a corner case.  As you can see, the behavior of
> vslideup.vx in Section 17.4.1 is:
> 
>   0 < i < max(vstart, OFFSET)      unchanged
>   max(vstart, OFFSET) <= i < vl    vd[i] = vs2[i-OFFSET] if mask enabled,
>                                    unchanged if not
>   vl <= i < VLMAX                  tail elements, vd[i] = 0
> 
> 
> The spec (v0.7.1 or v0.8) does not specify what happens when OFFSET > vl.

Certainly it does, right there:

   offset <= i < vl.

If offset >= vl, then that range is empty of elements.

> Should the elements (vl <= i < OFFSET) be treated as tail elements, or left unchanged?

Tail elements.

> Here the (vl <= i < OFFSET) elements are treated as tail elements.

Exactly.

>> Again, eliminate vl == 0 first.  In fact, why don't we make that a global
>> request for all of the patches for the next revision. 
> I don't get it.
> 
> Check vl == 0 first for all patches. Is it right?

Yes.


r~
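
For reference, a minimal sketch of how the slideup helper could look with the
points above applied (vl == 0 eliminated first, loop starting at offset, index
widened to target_ulong).  The vext_mlen/vext_vm/vext_elem_mask helpers, the
H() macros and the CLEAR_FN tail-clear semantics are assumed from this series;
this is a sketch, not the final code:

#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
        CPURISCVState *env, uint32_t desc)                                \
{                                                                         \
    uint32_t mlen = vext_mlen(desc);                                      \
    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
    uint32_t vm = vext_vm(desc);                                          \
    uint32_t vl = env->vl;                                                \
    target_ulong offset = s1, i;                                          \
                                                                          \
    if (vl == 0) {                                                        \
        return;                                                           \
    }                                                                     \
    /* Elements below offset stay unchanged; loop is empty if offset >= vl. */ \
    for (i = offset; i < vl; i++) {                                       \
        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
            continue;                                                     \
        }                                                                 \
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
    }                                                                     \
    /* Elements from vl to VLMAX (including vl <= i < offset) are tail. */ \
    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));          \
}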


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 35/60] target/riscv: vector floating-point square-root instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-15  7:00   ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  7:00 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[ Patch didn't make it to the list, so reviewing

https://github.com/romanheros/qemu/commit/c77bef489c5517951077679ec9228438d05f1e1a
]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 51/60] target/riscv: set-X-first mask bit
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-15  7:26   ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  7:26 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[ Patch didn't make it to the list, so reviewing

https://github.com/romanheros/qemu/commit/60668e86d94ffa48adb2f9c346753cf77f582686
]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

It should be possible to rewrite the helpers in units of uint64_t.  The
unmasked path is easy; the masked path is more complicated.


r~
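
As an illustration of the easy unmasked path, here is a rough sketch of
vmsbf.m (set-before-first) processed one uint64_t word at a time.  It assumes
a packed one-bit-per-element mask layout (i.e. mlen == 1) and leaves the
series' tail-clear policy aside, so it only shows the word-wise idea:

#include <stdbool.h>
#include <stdint.h>

static void vmsbf_m_unmasked(uint64_t *vd, const uint64_t *vs2, uint32_t vl)
{
    uint32_t words = (vl + 63) / 64;
    bool found = false;

    for (uint32_t w = 0; w < words; w++) {
        uint64_t src = vs2[w];
        uint64_t out;

        if (found) {
            out = 0;
        } else if (src == 0) {
            /* no set bit seen yet: everything so far is "before first" */
            out = ~UINT64_C(0);
        } else {
            /* ones strictly below the lowest set bit of this word */
            out = (src & -src) - 1;
            found = true;
        }
        if (w == words - 1 && (vl % 64) != 0) {
            /* drop bits at and above vl in the last word */
            out &= (UINT64_C(1) << (vl % 64)) - 1;
        }
        vd[w] = out;
    }
}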


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 59/60] target/riscv: vector compress instruction
  2020-03-12 14:58 ` LIU Zhiwei
@ 2020-03-15  7:34   ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15  7:34 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[ The patch didn't make it to the list, so reviewing

https://github.com/romanheros/qemu/commit/f46b8c8bbbf0acd78746a49fe712306d7c05c7e6
]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-15  1:00           ` Richard Henderson
@ 2020-03-15 23:23             ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-15 23:23 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[-- Attachment #1: Type: text/plain, Size: 7089 bytes --]



On 2020/3/15 9:00, Richard Henderson wrote:
> On 3/14/20 4:12 PM, LIU Zhiwei wrote:
>> I am not sure whether I get it. In my opinion, the code should be modified like
>>
>> static inline int8_t aadd8_rnu(CPURISCVState *env, int8_t a, int8_t b)
>> {
>>      int16_t res = (int16_t)a + (int16_t)b;
>>      uint8_t round = res & 0x1;
>>      res   = (res >> 1) + round;
>>      return res;
>> }
>>
>> static inline int8_t aadd8_rne(CPURISCVState *env, int8_t a, int8_t b)
>> {
>>      int16_t res = (int16_t)a + (int16_t)b;
>>      uint8_t round = ((res & 0x3) == 0x3);
>>      res   = (res >> 1) + round;
>>      return res;
>> }
>>
>> static inline int8_t aadd8_rdn(CPURISCVState *env, int8_t a, int8_t b)
>> {
>>      int16_t res = (int16_t)a + (int16_t)b;
>>      res   = (res >> 1);
>>      return res;
>> }
>>
>> static inline int8_t aadd8_rod(CPURISCVState *env, int8_t a, int8_t b)
>> {
>>      int16_t res = (int16_t)a + (int16_t)b;
>>      uint8_t round = ((res & 0x3) == 0x1);
>>     res   = (res >> 1) + round;
>>      return res;
>> }
>>
>> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rnu, OP_SSS_B, H1, H1, H1, aadd8_rnu)
>> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rne, OP_SSS_B, H1, H1, H1, aadd8_rne)
>> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rdn, OP_SSS_B, H1, H1, H1, aadd8_rdn)
>> RVVCALL(OPIVV2_ENV, vaadd_vv_b_rod, OP_SSS_B, H1, H1, H1, aadd8_rod)
>>
>> void do_vext_vv_env(void *vd, void *v0, void *vs1,
>>                      void *vs2, CPURISCVState *env, uint32_t desc,
>>                      uint32_t esz, uint32_t dsz,
>>                      opivv2_fn *fn, clear_fn *clearfn)
>> {
>>      uint32_t vlmax = vext_maxsz(desc) / esz;
>>      uint32_t mlen = vext_mlen(desc);
>>      uint32_t vm = vext_vm(desc);
>>      uint32_t vl = env->vl;
>>      uint32_t i;
>>      for (i = 0; i < vl; i++) {
>>          if (!vm && !vext_elem_mask(v0, mlen, i)) {
>>              continue;
>>          }
>>          fn(vd, vs1, vs2, i, env);
>>      }
>>      if (i != 0) {
>>          clear_fn(vd, vl, vl * dsz,  vlmax * dsz);
>>      }
>> }
>>
>> #define GEN_VEXT_VV_ENV(NAME, ESZ, DSZ, CLEAR_FN)         \
>> void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>>                    void *vs2, CPURISCVState *env,          \
>>                    uint32_t desc)                          \
>> {                                                         \
>>      static opivv2_fn *fns[4] = {                          \
>>          NAME##_rnu, NAME##_rne,                           \
>>          NAME##_rdn, NAME##_rod                            \
>>      }                                                     \
>>      return do_vext_vv_env(vd, v0, vs1, vs2, env, desc,    \
>>                            ESZ, DSZ, fns[env->vxrm],       \
>>                CLEAR_FN);                      \
>> }
>>
>> Is it true?
> While that does look good for this case, there are many other uses of
> get_round(), and it may not be quite as simple there.
>
> My suggestion was
>
> static inline int32_t aadd32(int vxrm, int32_t a, int32_t b)
> {
>      int64_t res = (int64_t)a + b;
>      uint8_t round = get_round(vxrm, res, 1);
>
>      return (res >> 1) + round;
> }
>
> static inline int64_t aadd64(int vxrm, int64_t a, int64_t b)
> {
>      int64_t res = a + b;
>      uint8_t round = get_round(vxrm, res, 1);
>      int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;
>
>      /* With signed overflow, bit 64 is inverse of bit 63. */
>      return ((res >> 1) ^ over) + round;
> }
>
> RVVCALL(OPIVV2_RM, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd32)
> RVVCALL(OPIVV2_RM, vaadd_vv_h, OP_SSS_H, H2, H2, H2, aadd32)
> RVVCALL(OPIVV2_RM, vaadd_vv_w, OP_SSS_W, H4, H4, H4, aadd32)
> RVVCALL(OPIVV2_RM, vaadd_vv_d, OP_SSS_D, H8, H8, H8, aadd64)
>
> static inline void
> vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2,
>               uint32_t vl, uint32_t vm, uint32_t mlen, int vxrm,
>               opivv2_rm_fn *fn)
> {
>      for (uint32_t i = 0; i < vl; i++) {
>          if (!vm && !vext_elem_mask(v0, mlen, i)) {
>              continue;
>          }
>          fn(vd, vs1, vs2, i, vxrm);
>      }
> }
>
> static inline void
> vext_vv_rm_2(void *vd, void *v0, void *vs1,
>               void *vs2, CPURISCVState *env, uint32_t desc,
>               uint32_t esz, uint32_t dsz,
>               opivv2_rm_fn *fn, clear_fn *clearfn)
> {
>      uint32_t vlmax = vext_maxsz(desc) / esz;
>      uint32_t mlen = vext_mlen(desc);
>      uint32_t vm = vext_vm(desc);
>      uint32_t vl = env->vl;
>
>      if (vl == 0) {
>          return;
>      }
>
>      switch (env->vxrm) {
>      case 0: /* rnu */
>          vext_vv_rm_1(vd, v0, vs1, vs2,
>                       vl, vm, mlen, 0, fn);
>          break;
>      case 1: /* rne */
>          vext_vv_rm_1(vd, v0, vs1, vs2,
>                       vl, vm, mlen, 1, fn);
>          break;
>      case 2: /* rdn */
>          vext_vv_rm_1(vd, v0, vs1, vs2,
>                       vl, vm, mlen, 2, fn);
>          break;
>      default: /* rod */
>          vext_vv_rm_1(vd, v0, vs1, vs2,
>                       vl, vm, mlen, 3, fn);
>          break;
>      }
>
>      clear_fn(vd, vl, vl * dsz,  vlmax * dsz);
> }
>
> >From vext_vv_rm_2, a constant is passed down all of the inline functions, so
> that a constant arrives in get_round() at the bottom of the call chain.  At
> which point all of the expressions get folded by the compiler and we *should*
> get very similar generated code as to what you have above.
Yes, it will be much better.

I still have one question here.

Many other fixed-point instructions also need vxsat besides vxrm.

In those cases, can I just define OPIVV2_RM like this:

#define OPIVV2_RM(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)     \
static inline void                                                  \
do_##NAME(void *vd, void *vs1, void *vs2, int i,                    \
           CPURISCVState *env, int vxrm)                             \
{                                                                   \
     TX1 s1 = *((T1 *)vs1 + HS1(i));                                 \
     TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
     *((TD *)vd + HD(i)) = OP(env, vxrm, s2, s1);                    \
}

static inline int32_t aadd32(__attribute__((unused)) CPURISCVState *env,
                             int vxrm, int32_t a, int32_t b)
{
     int64_t res = (int64_t)a + b;
     uint8_t round = get_round(vxrm, res, 1);

     return (res >> 1) + round;
}


In this way, I can write just one OPIVV2_RM instead of (OPIVV2_RM, 
OPIVV2_RM_ENV, OPIVV2_ENV).

Zhiwei

>
> r~


[-- Attachment #2: Type: text/html, Size: 8669 bytes --]

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 24/60] target/riscv: vector single-width averaging add and subtract
  2020-03-15 23:23             ` LIU Zhiwei
@ 2020-03-15 23:27               ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-15 23:27 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/15/20 4:23 PM, LIU Zhiwei wrote:
> Many other fixed-point instructions also need vxsat besides vxrm.

Ah yes.

> In those cases, can I just define OPIVV2_RM like this:
> 
> #define OPIVV2_RM(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)     \
> static inline void                                                  \
> do_##NAME(void *vd, void *vs1, void *vs2, int i,                    \
>           CPURISCVState *env, int vxrm)                             \
> {                                                                   \
>     TX1 s1 = *((T1 *)vs1 + HS1(i));                                 \
>     TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
>     *((TD *)vd + HD(i)) = OP(env, vxrm, s2, s1);                    \
> }
> 
> static inline int32_t aadd32(__attribute__((unused)) CPURISCVState *env,
>                              int vxrm, int32_t a, int32_t b)

You can drop the unused.  We don't turn on warnings for unused arguments, as we
have a *lot* of them for exactly this reason -- keeping a common functional
interface.


> {
>     int64_t res = (int64_t)a + b;
>     uint8_t round = get_round(vxrm, res, 1);
> 
>     return (res >> 1) + round;
> }
> 
> 
> In this way, I can write just one OPIVV2_RM instead of (OPIVV2_RM,
> OPIVV2_RM_ENV, OPIVV2_ENV).

Yes, that's fine.


r~
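
Put together, the agreed shape is roughly the code already quoted above with
the attribute dropped.  OP_SSS_B, H1, RVVCALL and get_round() are the series'
own helpers; this is just a sketch of the end result:

#define OPIVV2_RM(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)     \
static inline void                                                  \
do_##NAME(void *vd, void *vs1, void *vs2, int i,                    \
          CPURISCVState *env, int vxrm)                             \
{                                                                   \
    TX1 s1 = *((T1 *)vs1 + HS1(i));                                 \
    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
    *((TD *)vd + HD(i)) = OP(env, vxrm, s2, s1);                    \
}

/* env is unused here but kept for the common functional interface */
static inline int32_t aadd32(CPURISCVState *env, int vxrm,
                             int32_t a, int32_t b)
{
    int64_t res = (int64_t)a + b;
    uint8_t round = get_round(vxrm, res, 1);

    return (res >> 1) + round;
}

RVVCALL(OPIVV2_RM, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd32)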


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
  2020-03-14  7:27     ` Richard Henderson
@ 2020-03-16  2:57       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-16  2:57 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[-- Attachment #1: Type: text/plain, Size: 3761 bytes --]



On 2020/3/14 15:27, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +/* Vector Integer Merge and Move Instructions */
>> +static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_reg(s, a->rs2, false) &&
>> +            vext_check_reg(s, a->rs1, false) &&
>> +            ((a->vm == 0) || (a->rs2 == 0)));
>> +}
>> +GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
>> +
>> +static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_reg(s, a->rs2, false) &&
>> +            ((a->vm == 0) || (a->rs2 == 0)));
>> +}
>> +GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
>> +
>> +GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
> I think you need to special case these.  The unmasked instructions are the
> canonical move instructions: vmv.v.*.
>
> You definitely want to use tcg_gen_gvec_mov (vv), tcg_gen_gvec_dup_i{32,64}
> (vx) and tcg_gen_gvec_dup{8,16,32,64}i (vi).
I have a question here.

Are these GVEC IRs suitable for any vl, or just when vl equals vlmax?
I see there are some alignment asserts in these GVEC IRs.

Now the code is like

static bool trans_vmv_v_v(DisasContext *s, arg_r *a)
{
     if (vext_check_isa_ill(s, RVV) &&
         vext_check_reg(s, a->rd, false) &&
         vext_check_reg(s, a->rs1, false)) {

         if (s->vl_eq_vlmax) {
             tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd),
                              vreg_ofs(s, a->rs1),
                              MAXSZ(s), MAXSZ(s));
         } else {
             uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul);
             static gen_helper_gvec_2_ptr * const fns[4] = {
                 gen_helper_vmv_v_v_b, gen_helper_vmv_v_v_h,
                 gen_helper_vmv_v_v_w, gen_helper_vmv_v_v_d,
             };

             tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),
                                cpu_env, 0, s->vlen / 8, data, fns[s->sew]);
         }
         return true;
     }
     return false;
}

Is it right?

Zhiwei
>
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
>> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
>> +            *((ETYPE *)vd + H1(i)) = s2;                             \
>> +        } else {                                                     \
>> +            ETYPE s1 = *((ETYPE *)vs1 + H(i));                       \
>> +            *((ETYPE *)vd + H(i)) = s1;                              \
>> +        }                                                            \
> Perhaps better as
>
> ETYPE *vt = (!vm && !vext_elem_mask(v0, mlen, i) ? vs2 : vs1);
> *((ETYPE *)vd + H(i)) = *((ETYPE *)vt + H(i));
>
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                   \
>> +            ETYPE s2 = *((ETYPE *)vs2 + H(i));                       \
>> +            *((ETYPE *)vd + H1(i)) = s2;                             \
>> +        } else {                                                     \
>> +            *((ETYPE *)vd + H(i)) = (ETYPE)(target_long)s1;          \
>> +        }                                                            \
> Perhaps better as
>
> ETYPE s2 = *((ETYPE *)vs2 + H(i));
> ETYPE d = (!vm && !vext_elem_mask(v0, mlen, i)
>             ? s2 : (ETYPE)(target_long)s1);
> *((ETYPE *)vd + H(i)) = d;
>
> as most host platforms have a conditional reg-reg move, but not a conditional load.
>
>
> r~


[-- Attachment #2: Type: text/html, Size: 4851 bytes --]

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 40/60] target/riscv: vector floating-point merge instructions
  2020-03-14 22:47     ` Richard Henderson
@ 2020-03-16  3:41       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-16  3:41 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]



On 2020/3/15 6:47, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +
>> +/* Vector Floating-Point Merge Instruction */
>> +static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
>> +{
>> +    return (vext_check_isa_ill(s, RVV) &&
>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>> +            vext_check_reg(s, a->rd, false) &&
>> +            vext_check_reg(s, a->rs2, false) &&
>> +            ((a->vm == 0) || (a->rs2 == 0)) &&
>> +            (s->sew != 0));
>> +}
>> +GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)
> Similar comments as for integer merge, using tcg_gen_gvec_dup_i64 for
> unpredicated merges.
>
> In fact, there's no reason at all to define a helper function for this one.  I
> would expect you do be able to use the exact same helpers as for the integer
> merges.

Do you mean that I should expand TCGv to TCGv_i64 for vmv.v.x in
translation, so that I can reuse it?

void gen_helper_vmv_v_x(TCGv_ptr, TCGv_i64, TCGv_env, TCGv_i32);

Zhiwei

>
> r~


[-- Attachment #2: Type: text/html, Size: 1764 bytes --]

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 22/60] target/riscv: vector integer merge and move instructions
  2020-03-16  2:57       ` LIU Zhiwei
@ 2020-03-16  5:32         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-16  5:32 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/15/20 7:57 PM, LIU Zhiwei wrote:
>> You definitely want to use tcg_gen_gvec_mov (vv), tcg_gen_gvec_dup_i{32,64}
>> (vx) and tcg_gen_gvec_dup{8,16,32,64}i (vi).
> I have a question here.
> 
> Are these GVEC IRs suitable for any vl, or just when vl equals vlmax?
> I see there are some alignment asserts in these GVEC IRs.

Only vl_eq_vlmax.  I should have been more precise.
But I expect this boolean to be true quite often.

> 
> Now the code is like
> 
> static bool trans_vmv_v_v(DisasContext *s, arg_r *a)
> {
>     if (vext_check_isa_ill(s, RVV) &&
>         vext_check_reg(s, a->rd, false) &&
>         vext_check_reg(s, a->rs1, false)) {
> 
>         if (s->vl_eq_vlmax) {
>             tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd),
>                              vreg_ofs(s, a->rs1),
>                              MAXSZ(s), MAXSZ(s));
>         } else {
>             uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul);
>             static gen_helper_gvec_2_ptr * const fns[4] = {
>                 gen_helper_vmv_v_v_b, gen_helper_vmv_v_v_h,
>                 gen_helper_vmv_v_v_w, gen_helper_vmv_v_v_d,
>             };
> 
>             tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),
>                                cpu_env, 0, s->vlen / 8, data, fns[s->sew]);
>         }
>         return true;
>     }
>     return false;
> }
> 
> Is it right?

Yes, that looks fine.


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 40/60] target/riscv: vector floating-point merge instructions
  2020-03-16  3:41       ` LIU Zhiwei
@ 2020-03-16  5:37         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-16  5:37 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/15/20 8:41 PM, LIU Zhiwei wrote:
> 
> 
> On 2020/3/15 6:47, Richard Henderson wrote:
>> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>>> +
>>> +/* Vector Floating-Point Merge Instruction */
>>> +static bool opfvf_vfmerge_check(DisasContext *s, arg_rmrr *a)
>>> +{
>>> +    return (vext_check_isa_ill(s, RVV) &&
>>> +            vext_check_overlap_mask(s, a->rd, a->vm, false) &&
>>> +            vext_check_reg(s, a->rd, false) &&
>>> +            vext_check_reg(s, a->rs2, false) &&
>>> +            ((a->vm == 0) || (a->rs2 == 0)) &&
>>> +            (s->sew != 0));
>>> +}
>>> +GEN_OPFVF_TRANS(vfmerge_vfm, opfvf_vfmerge_check)
>> Similar comments as for integer merge, using tcg_gen_gvec_dup_i64 for
>> unpredicated merges.
>>
>> In fact, there's no reason at all to define a helper function for this one.  I
>> would expect you do be able to use the exact same helpers as for the integer
>> merges.
> 
> Do you mean that I should expand TCGv to TCGv_i64 for vmv.v.x in
> translation, so that I can reuse it?
> 
> void gen_helper_vmv_v_x(TCGv_ptr, TCGv_i64, TCGv_env, TCGv_i32);

Oh, I see, yes currently the integer helper is TCGv.
Yes, it might be easiest to extend to TCGv_i64.


r~
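
In other words, the helper argument would be widened to i64 so the integer
and floating-point merge/move paths can share it.  A sketch only: the variable
names (dest, desc) and the exact translation plumbing are assumed here, not
taken from the series:

/* helper.h: second argument becomes i64 instead of tl */
DEF_HELPER_4(vmv_v_x, void, ptr, i64, env, i32)

/* vmv.v.x translation: extend the GPR into an i64 before the call */
TCGv s1 = tcg_temp_new();
TCGv_i64 s1_i64 = tcg_temp_new_i64();
gen_get_gpr(s1, a->rs1);
tcg_gen_ext_tl_i64(s1_i64, s1);
gen_helper_vmv_v_x(dest, s1_i64, cpu_env, desc);
tcg_temp_free(s1);
tcg_temp_free_i64(s1_i64);

/* vfmerge.vfm / vfmv.v.f can then pass cpu_fpr[a->rs1] directly */
gen_helper_vmv_v_x(dest, cpu_fpr[a->rs1], cpu_env, desc);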


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-15  5:16     ` Richard Henderson
@ 2020-03-16  8:04       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-16  8:04 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/15 13:16, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t offset = s1, i;                                              \
>> +                                                                          \
>> +    if (offset > vl) {                                                    \
>> +        offset = vl;                                                      \
>> +    }                                                                     \
> This isn't right.
>
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {    \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
>> +    }                                                                     \
>> +    if (i == 0) {                                                         \
>> +        return;                                                           \
>> +    }                                                                     \
> You need to eliminate vl == 0 first, not last.
> Then
>
>      for (i = offset; i < vl; i++)
>
> The types of i and vl need to be extended to target_ulong, so that you don't
> incorrectly crop the input offset.
>
> It may be worth special-casing vm=1, or hoisting it out of the loop.  The
> operation becomes a memcpy (at least for little-endian) at that point.  See
> swap_memmove in arm/sve_helper.c.
>
>
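
A sketch of the slideup body restructured along those lines (macro
backslashes omitted, untested; ETYPE, H and CLEAR_FN as in the patch):

    void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,
                      CPURISCVState *env, uint32_t desc)
    {
        uint32_t mlen = vext_mlen(desc);
        uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;
        uint32_t vm = vext_vm(desc);
        target_ulong vl = env->vl;
        target_ulong offset = s1, i;

        if (vl == 0) {
            return;
        }
        for (i = offset; i < vl; i++) {
            if (!vm && !vext_elem_mask(v0, mlen, i)) {
                continue;
            }
            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));
        }
        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));
    }
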
>> +#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)                  \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t offset = s1, i;                                              \
>> +                                                                          \
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        if (i + offset < vlmax) {                                         \
>> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));      \
> Again, eliminate vl == 0 first.  In fact, why don't we make that a global
> request for all of the patches for the next revision.  Checking for i == 0 last
> is silly, and checks for the zero twice: once in the loop bounds and again at
> the end.
>
> It is probably worth changing the loop bounds to
>
>      if (offset >= vlmax) {
>         max = 0;
>      } else {
>         max = MIN(vl, vlmax - offset);
>      }
>      for (i = 0; i < max; ++i)
>
>
>> +        } else {                                                          \
>> +            *((ETYPE *)vd + H(i)) = 0;                                    \
>> +        }
> Which lets these zeros merge into...
>
>> +    for (; i < vlmax; i++) {                                              \
>> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
>> +    }                                                                     \
> These zeros.
>
>> +#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)                   \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
>> +    uint32_t mlen = vext_mlen(desc);                                      \
>> +    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
>> +    uint32_t vm = vext_vm(desc);                                          \
>> +    uint32_t vl = env->vl;                                                \
>> +    uint32_t i;                                                           \
>> +                                                                          \
>> +    for (i = 0; i < vl; i++) {                                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
>> +            continue;                                                     \
>> +        }                                                                 \
>> +        if (i == 0) {                                                     \
>> +            *((ETYPE *)vd + H(i)) = s1;                                   \
>> +        } else {                                                          \
>> +            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - 1));           \
>> +        }                                                                 \
>> +    }                                                                     \
>> +    if (i == 0) {                                                         \
>> +        return;                                                           \
>> +    }                                                                     \
>> +    for (; i < vlmax; i++) {                                              \
>> +        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
>> +    }                                                                     \
>> +}
> As a preference, I think you can do away with this helper.
> Simply use the slideup helper with argument 1, and then
> afterwards store the integer register into element 0.  You should be able to
> re-use code from vmv.s.x for that.
When I try it, I find it somewhat difficult, because vmv.s.x will clear
the elements (0 < index < VLEN/SEW).

Zhiwei
>> +#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ETYPE, H, CLEAR_FN)                 \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
>> +        CPURISCVState *env, uint32_t desc)                                \
>> +{                                                                         \
> Likewise.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-16  8:04       ` LIU Zhiwei
@ 2020-03-16 17:42         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-16 17:42 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/16/20 1:04 AM, LIU Zhiwei wrote:
>> As a preference, I think you can do away with this helper.
>> Simply use the slideup helper with argument 1, and then
>> afterwards store the integer register into element 0.  You should be able to
>> re-use code from vmv.s.x for that.
> When I try it, I find it somewhat difficult, because vmv.s.x will clear
> the elements (0 < index < VLEN/SEW).

Well, two things about that:

(1) The 0.8 version of vmv.s.x does *not* zero the other elements, so we'll
want to be prepared for that.

(2) We have 8 insns that, in the end come down to a direct element access,
possibly with some other processing.

So we'll want basic helper functions that can locate an element by immediate
offset and by variable offset:

/* Compute the offset of vreg[idx] relative to cpu_env.
   The index must be in range of VLMAX. */
int vec_element_ofsi(int vreg, int idx, int sew);

/* Compute a pointer to vreg[idx].
   If need_bound is true, mask idx into VLMAX,
   Otherwise we know a-priori that idx is already in bounds. */
void vec_element_ofsx(DisasContext *s, TCGv_ptr base,
                      TCGv idx, int sew, bool need_bound);

/* Load idx >= VLMAX ? 0 : vreg[idx] */
void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                       int vreg, int idx, int sew);
void vec_element_loadx(DisasContext *s, TCGv_i64 val,
                       int vreg, TCGv idx, int sew);

/* Store vreg[imm] = val.
   The index must be in range of VLMAX.  */
void vec_element_storei(DisasContext *s, int vreg, int imm,
                        TCGv_i64 val);
void vec_element_storex(DisasContext *s, int vreg,
                        TCGv idx, TCGv_i64 val);

(3) It would be handy to have TCGv cpu_vl.

Then:

vext.x.v:
    If rs1 == 0,
        Use vec_element_loadi(s, x[rd], vs2, 0, s->sew).
    else
        Use vec_element_loadx(s, x[rd], vs2, x[rs1], true).

vmv.s.x:
    over = gen_new_label();
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    For 0.7.1:
        Use tcg_gen_dup8i to zero all VLMAX elements of vd.
        If rs1 == 0, goto done.
    Use vec_element_storei(s, vs2, 0, x[rs1]).
 done:
    gen_set_label(over);

vfmv.f.s:
    Use vec_element_loadi(s, f[rd], vs2, 0, s->sew).
    NaN-box f[rd] as necessary for SEW.

vfmv.s.f:
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    For 0.7.1:
        Use tcg_gen_dup8i to zero all VLMAX elements of vd.
    Let tmp = f[rs1], nan-boxed as necessary for SEW.
    Use vec_element_storei(s, vs2, 0, tmp).
    gen_set_label(over);

vslide1up.vx:
    Ho hum, I forgot about masking.  Some options:
    (1) Call a helper just as you did in your original patch.
    (2) Call a helper only for !vm, for vm as below.
    (3) Call vslideup w/1.
        tcg_gen_brcondi(TCG_COND_EQ, cpu_vl, 0, over);
        If !vm,
            // inline test for v0[0]
            vec_element_loadi(s, tmp, 0, 0, MO_8);
            tcg_gen_andi_i64(tmp, tmp, 1);
            tcg_gen_brcondi(TCG_COND_EQ, tmp, 0, over);
        Use vec_element_store(s, vd, 0, x[rs1]).
        gen_set_label(over);

vslide1down.vx:
    For !vm, this is complicated enough for a helper.
    If using option 3 for vslide1up, then the store becomes:
    tcg_gen_subi_tl(tmp, cpu_vl, 1);
    vec_element_storex(s, base, tmp, x[rs1]);

vrgather.vx:
    If !vm or !vl_eq_vlmax, use helper.
    vec_element_loadx(s, tmp, vs2, x[rs1]);
    Use tcg_gen_gvec_dup_i64 to store to tmp to vd.

vrgather.vi:
    If !vm or !vl_eq_vlmax, use helper.
    If imm >= vlmax,
        Use tcg_gen_dup8i to zero vd;
    else,
        ofs = vec_element_ofsi(s, vs2, imm, s->sew);
        tcg_gen_gvec_dup_mem(sew, vreg_ofs(vd),
                             ofs, vlmax, vlmax);
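
A minimal sketch of the first of these primitives, assuming elements sit at
vreg_ofs() byte offsets in cpu_env as in the patch (big-endian host
adjustment glossed over, and idx assumed to be within VLMAX here); the
store side would mirror it with tcg_gen_st*_i64:

    static void vec_element_loadi(DisasContext *s, TCGv_i64 dest,
                                  int vreg, int idx, int sew)
    {
        int ofs = vreg_ofs(s, vreg) + (idx << sew);

        switch (sew) {
        case MO_8:
            tcg_gen_ld8u_i64(dest, cpu_env, ofs);
            break;
        case MO_16:
            tcg_gen_ld16u_i64(dest, cpu_env, ofs);
            break;
        case MO_32:
            tcg_gen_ld32u_i64(dest, cpu_env, ofs);
            break;
        default:
            tcg_gen_ld_i64(dest, cpu_env, ofs);
            break;
        }
    }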


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-15  4:39     ` Richard Henderson
@ 2020-03-17  6:01       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-17  6:01 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/15 12:39, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/helper.h                   |  9 +++++
>>   target/riscv/insn32.decode              |  2 ++
>>   target/riscv/insn_trans/trans_rvv.inc.c | 47 +++++++++++++++++++++++++
>>   target/riscv/vector_helper.c            | 36 +++++++++++++++++++
>>   4 files changed, 94 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 41cecd266c..7a689a5c07 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -1111,3 +1111,12 @@ DEF_HELPER_3(vmv_s_x_b, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_h, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_w, void, ptr, tl, env)
>>   DEF_HELPER_3(vmv_s_x_d, void, ptr, tl, env)
>> +
>> +DEF_HELPER_2(vfmv_f_s_b, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_h, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_w, i64, ptr, env)
>> +DEF_HELPER_2(vfmv_f_s_d, i64, ptr, env)
>> +DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
>> +DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 7e1efeec05..bfdce0979c 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -557,6 +557,8 @@ viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
>>   vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
>>   vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
>>   vmv_s_x         001101 1 00000 ..... 110 ..... 1010111 @r2
>> +vfmv_f_s        001100 1 ..... 00000 001 ..... 1010111 @r2rd
>> +vfmv_s_f        001101 1 00000 ..... 101 ..... 1010111 @r2
>>   
>>   vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>   vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>> index 7720ffecde..99cd45b0aa 100644
>> --- a/target/riscv/insn_trans/trans_rvv.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>> @@ -2269,3 +2269,50 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
>>       }
>>       return false;
>>   }
>> +
>> +/* Floating-Point Scalar Move Instructions */
>> +typedef void (* gen_helper_vfmv_f_s)(TCGv_i64, TCGv_ptr, TCGv_env);
>> +static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV)) {
>> +        TCGv_ptr src2;
>> +        gen_helper_vfmv_f_s fns[4] = {
>> +            gen_helper_vfmv_f_s_b, gen_helper_vfmv_f_s_h,
>> +            gen_helper_vfmv_f_s_w, gen_helper_vfmv_f_s_d
>> +        };
>> +
>> +        src2 = tcg_temp_new_ptr();
>> +        tcg_gen_addi_ptr(src2, cpu_env, vreg_ofs(s, a->rs2));
>> +
>> +        fns[s->sew](cpu_fpr[a->rd], src2, cpu_env);
>> +
>> +        tcg_temp_free_ptr(src2);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> SEW == MO_8 should raise illegal instruction exception.
>
> Need a check for fp enabled.  Presumably
>
>      if (s->mstatus_fs == 0 || !has_ext(s, RVF)) {
>          return false;
>      }
Hi  Richard,

Two questions here. I don't find the answer in the specification.

1. Should I check RVF if the instruction uses a float register, such as
all floating-point instructions and some other instructions?

2. Should I check mstatus_fs if the instruction uses float registers, or
just for instructions that write a floating-point register?

Zhiwei

> Need to mark_fs_dirty().
>
> Like integer vmv.x.s, this can be done inline.  The nan-boxing is trivial as well.
>
> For 0.8, we will have to validate the nan-boxing for SEW=MO_64 && !RVD.  That's
> still not hard to do inline.
>
>
>
>> +
>> +typedef void (* gen_helper_vfmv_s_f)(TCGv_ptr, TCGv_i64, TCGv_env);
>> +static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
>> +{
>> +    if (vext_check_isa_ill(s, RVV | RVF) ||
>> +        vext_check_isa_ill(s, RVV | RVD)) {
>> +        TCGv_ptr dest;
>> +        TCGv_i64 src1;
>> +        gen_helper_vfmv_s_f fns[4] = {
>> +            gen_helper_vfmv_s_f_b, gen_helper_vfmv_s_f_h,
>> +            gen_helper_vfmv_s_f_w, gen_helper_vfmv_s_f_d
>> +        };
>> +
>> +        src1 = tcg_temp_new_i64();
>> +        dest = tcg_temp_new_ptr();
>> +        tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, a->rd));
>> +
>> +        fns[s->sew](dest, src1, cpu_env);
>> +
>> +        tcg_temp_free_i64(src1);
>> +        tcg_temp_free_ptr(dest);
>> +        return true;
>> +    }
>> +    return false;
>> +}
> Again, SEW == MO_8 is illegal.  Missing fp enable check.
>
> I don't believe RVD without RVF is legal; you should not need to check for both.
>
> Missing nan-boxing for SEW==MO_64 && FLEN==32 (!RVD).  Which I think should be
> done here inline, so that the uint64_t passed to the helper is always correct.
>
>
> r~



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 56/60] target/riscv: floating-point scalar move instructions
  2020-03-17  6:01       ` LIU Zhiwei
@ 2020-03-17 15:11         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-17 15:11 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/16/20 11:01 PM, LIU Zhiwei wrote:
> Two questions here. I don't find the answer in the specification.
> 
> 1. Should I check RVF if the instruction uses a float register, such as all
> floating-point instructions and some other instructions?

I would think so, but even the 0.8 spec isn't clear.


> 2. Should I check mstatus_fs if the instruction uses float registers, or just
> for instructions that write a floating-point register?

Definitely, just like the regular fp instructions.

This trap is how the kernel implements lazy fp context switching, so if you
allow access to fp when disabled you may be accessing values from a different
process.
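
A minimal sketch of how the checks could sit at the top of trans_vfmv_f_s,
mirroring the regular FP translators; the body uses the vec_element_loadi
primitive sketched earlier in the thread, and the NaN-boxing mask is only
an assumption for the narrow-SEW cases:

    static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
    {
        if (s->mstatus_fs == 0 || !has_ext(s, RVF) ||
            !vext_check_isa_ill(s, RVV) || s->sew == 0) {
            return false;
        }

        vec_element_loadi(s, cpu_fpr[a->rd], a->rs2, 0, s->sew);
        if (s->sew < MO_64) {
            /* NaN-box the narrower value into the 64-bit f register */
            tcg_gen_ori_i64(cpu_fpr[a->rd], cpu_fpr[a->rd],
                            MAKE_64BIT_MASK(8 << s->sew, 64 - (8 << s->sew)));
        }
        mark_fs_dirty(s);
        return true;
    }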


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
  2020-03-14  5:25     ` Richard Henderson
@ 2020-03-23  8:10       ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-23  8:10 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768


On 2020/3/14 13:25, Richard Henderson wrote:
> On 3/12/20 7:58 AM, LIU Zhiwei wrote:
>> +    if (a->vm && s->vl_eq_vlmax) {                                 \
>> +        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd),      \
>> +            vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),              \
>> +            MAXSZ(s), MAXSZ(s));                                   \
> The first argument here should be just s->sew.
> You should have see the assert fire:
>
>      tcg_debug_assert(vece <= MO_64);
>
> It would be nice to pull out the bulk of GEN_OPIVV_GVEC_TRANS as a function,
> and pass in tcg_gen_gvec_* as a function pointer, and fns as a pointer.
>
> In general, I prefer the functions that are generated by macros like this to
> have exactly one executable statement -- the call to the helper that does all
> of the work using the arguments provided.  That way a maximum number of lines
> are available for stepping with the debugger.
>
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                        \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                            \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                        \
> Why are these replicated in each trans_* function, and not done in opiv?_trans,
> where the rest of the descriptor is created?
>
>> +/* OPIVX without GVEC IR */
>> +#define GEN_OPIVX_TRANS(NAME, CHECK)                                     \
>> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                   \
>> +{                                                                        \
>> +    if (CHECK(s, a)) {                                                   \
>> +        uint32_t data = 0;                                               \
>> +        static gen_helper_opivx const fns[4] = {                         \
>> +            gen_helper_##NAME##_b, gen_helper_##NAME##_h,                \
>> +            gen_helper_##NAME##_w, gen_helper_##NAME##_d,                \
>> +        };                                                               \
>> +                                                                         \
>> +        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);                   \
>> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                       \
>> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                   \
>> +        return opivx_trans(a->rd, a->rs1, a->rs2, data, fns[s->sew], s); \
>> +    }                                                                    \
>> +    return false;                                                        \
>> +}
>> +
>> +GEN_OPIVX_TRANS(vrsub_vx, opivx_check)
> Note that you *can* generate vector code for this,
> you just have to write your own helpers.
>
> E.g.
>
> static void gen_vec_rsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
> {
>      tcg_gen_vec_sub8_i64(d, b, a);
> }
> // etc, reversing the arguments and passing on to sub.
>
> static const GVecGen2s rsub_op[4] = {
>      { .fni8 = tcg_gen_vec_rsub8_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs8,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_8 },
>      { .fni8 = tcg_gen_vec_rsub16_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs16,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_16 },
>      { .fni4 = tcg_gen_rsub_i32,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs32,
>        .opt_opc = vecop_list_sub,
>        .vece = MO_32 },
>      { .fni8 = tcg_gen_rsub_i64,
>        .fniv = tcg_gen_rsub_vec,
>        .fno = gen_helper_gvec_rsubs64,
>        .opt_opc = vecop_list_sub,
>        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>        .vece = MO_64 },
> };
>
> static void gen_gvec_rsubs(unsigned vece, uint32_t dofs,
>      uint32_t aofs, TCGv_i64 c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
>
> static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
>      uint32_t aofs, int64_t c,
>      uint32_t oprsz, uint32_t maxsz)
> {
>      tcg_debug_assert(vece <= MO_64);
>      tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
> }
Hi Richard,

When I try to add the GVEC IR rsubs, I find it somewhat difficult to keep it
separate from tcg-runtime-gvec.c.

The .fno functions, e.g. gen_helper_gvec_rsubs8, need to be defined like:

    void HELPER(gvec_subs8)(void *d, void *a, uint64_t b, uint32_t desc)
    {
        intptr_t oprsz = simd_oprsz(desc);
        vec8 vecb = (vec8)DUP16(b);
        intptr_t i;

        for (i = 0; i < oprsz; i += sizeof(vec8)) {
            *(vec8 *)(d + i) = vecb - *(vec8 *)(a + i);
        }
        clear_high(d, oprsz, desc);
    }


The vec8 and DUP16 macros are defined in tcg-runtime-gvec.c.

Should I declare them somewhere else, or just put HELPER(gvec_subs8)
into tcg-runtime-gvec.c?

Zhiwei

>> +/* generate the helpers for OPIVV */
>> +#define GEN_VEXT_VV(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, vs1, vs2, i);                       \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
>> +
>> +GEN_VEXT_VV(vadd_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vadd_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vadd_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vadd_vv_d, 8, 8, clearq)
>> +GEN_VEXT_VV(vsub_vv_b, 1, 1, clearb)
>> +GEN_VEXT_VV(vsub_vv_h, 2, 2, clearh)
>> +GEN_VEXT_VV(vsub_vv_w, 4, 4, clearl)
>> +GEN_VEXT_VV(vsub_vv_d, 8, 8, clearq)
> The body of GEN_VEXT_VV can be an inline function, calling the helper functions
> that you generated above.
>
>> +/*
>> + * If XLEN < SEW, the value from the x register is sign-extended to SEW bits.
>> + * So (target_long)s1 is need. (T1)(target_long)s1 gives the real operator type.
>> + * (TX1)(T1)(target_long)s1 expands the operator type of widen operations
>> + * or narrow operations
>> + */
>> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
>> +static void do_##NAME(void *vd, target_ulong s1, void *vs2, int i)  \
>> +{                                                                   \
>> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
>> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)(target_long)s1);         \
>> +}
> Why not just make the type of s1 be target_long in the parameter?
>
>> +/* generate the helpers for instructions with one vector and one sclar */
>> +#define GEN_VEXT_VX(NAME, ESZ, DSZ, CLEAR_FN)             \
>> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
>> +        void *vs2, CPURISCVState *env, uint32_t desc)     \
>> +{                                                         \
>> +    uint32_t vlmax = vext_maxsz(desc) / ESZ;              \
>> +    uint32_t mlen = vext_mlen(desc);                      \
>> +    uint32_t vm = vext_vm(desc);                          \
>> +    uint32_t vl = env->vl;                                \
>> +    uint32_t i;                                           \
>> +                                                          \
>> +    for (i = 0; i < vl; i++) {                            \
>> +        if (!vm && !vext_elem_mask(v0, mlen, i)) {        \
>> +            continue;                                     \
>> +        }                                                 \
>> +        do_##NAME(vd, s1, vs2, i);                        \
>> +    }                                                     \
>> +    if (i != 0) {                                         \
>> +        CLEAR_FN(vd, vl, vl * DSZ,  vlmax * DSZ);         \
>> +    }                                                     \
>> +}
> Likewise an inline function.
>
>
> r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract
  2020-03-23  8:10       ` LIU Zhiwei
@ 2020-03-23 17:46         ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-23 17:46 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/23/20 1:10 AM, LIU Zhiwei wrote:
>> static void gen_gvec_rsubi(unsigned vece, uint32_t dofs,
>>     uint32_t aofs, int64_t c,
>>     uint32_t oprsz, uint32_t maxsz)
>> {
>>     tcg_debug_assert(vece <= MO_64);
>>     tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &rsub_op[vece]);
>> }
> Hi Richard,
> 
> When I try to add GVEC IR rsubs,I find it is some difficult to keep it
> separate from tcg-runtime-gvec.c.
> 
> The .fno functions, e.g.,  gen_helper_gvec_rsubs8  need to be defined like
> 
>     void HELPER(gvec_subs8)(void *d, void *a, uint64_t b, uint32_t desc)
> 
>     {
> 
>         intptr_t oprsz = simd_oprsz(desc);
> 
>         vec8 vecb = (vec8)DUP16(b);
> 
>         intptr_t i;
> 
>         for (i = 0; i < oprsz; i += sizeof(vec8)) {
> 
>             *(vec8 *)(d + i) = vecb - *(vec8 *)(a + i);
> 
>         }
> 
>         clear_high(d, oprsz, desc);
> 
>     }
> 
>    
> The vec8 and DUP are defined in tcg-runtime-gvec.c. 

Update your branch -- they're gone since commit 0a83e43a9ee6.
Just use normal integer types.
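
For example, with plain integer types the .fno helper might reduce to
something like this (gvec_rsubs8 is a hypothetical name; note clear_high()
is static to tcg-runtime-gvec.c, which bears on where the helper lives):

    void HELPER(gvec_rsubs8)(void *d, void *a, uint64_t b, uint32_t desc)
    {
        intptr_t oprsz = simd_oprsz(desc);
        uint8_t vecb = (uint8_t)b;    /* scalar operand, per-element width */
        intptr_t i;

        for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
            *(uint8_t *)(d + i) = vecb - *(uint8_t *)(a + i);
        }
        clear_high(d, oprsz, desc);
    }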


r~


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-16 17:42         ` Richard Henderson
@ 2020-03-24 10:51           ` LIU Zhiwei
  -1 siblings, 0 replies; 336+ messages in thread
From: LIU Zhiwei @ 2020-03-24 10:51 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768



On 2020/3/17 1:42, Richard Henderson wrote:
> On 3/16/20 1:04 AM, LIU Zhiwei wrote:
>>> As a preference, I think you can do away with this helper.
>>> Simply use the slideup helper with argument 1, and then
>>> afterwards store the integer register into element 0.  You should be able to
>>> re-use code from vmv.s.x for that.
>> When I try it, I find it somewhat difficult, because vmv.s.x will clear
>> the elements (0 < index < VLEN/SEW).
> Well, two things about that:
>
> (1) The 0.8 version of vmv.s.x does *not* zero the other elements, so we'll
> want to be prepared for that.
>
> (2) We have 8 insns that, in the end come down to a direct element access,
> possibly with some other processing.
>
> So we'll want basic helper functions that can locate an element by immediate
> offset and by variable offset:
>
> /* Compute the offset of vreg[idx] relative to cpu_env.
>     The index must be in range of VLMAX. */
> int vec_element_ofsi(int vreg, int idx, int sew);
>
> /* Compute a pointer to vreg[idx].
>     If need_bound is true, mask idx into VLMAX,
>     Otherwise we know a-priori that idx is already in bounds. */
> void vec_element_ofsx(DisasContext *s, TCGv_ptr base,
>                        TCGv idx, int sew, bool need_bound);
>
> /* Load idx >= VLMAX ? 0 : vreg[idx] */
> void vec_element_loadi(DisasContext *s, TCGv_i64 val,
>                         int vreg, int idx, int sew);
> void vec_element_loadx(DisasContext *s, TCGv_i64 val,
>                         int vreg, TCGv idx, int sew);
>
> /* Store vreg[imm] = val.
>     The index must be in range of VLMAX.  */
> void vec_element_storei(DisasContext *s, int vreg, int imm,
>                          TCGv_i64 val);
> void vec_element_storex(DisasContext *s, int vreg,
>                          TCGv idx, TCGv_i64 val);
>
> (3) It would be handy to have TCGv cpu_vl.
Do you mean I should define cpu_vl as a global TCG variable like cpu_pc,
so that I can check vl == 0 at translation time?

Or just a temp variable?
>
> Then:
>
> vext.x.v:
>      If rs1 == 0,
>          Use vec_element_loadi(s, x[rd], vs2, 0, s->sew).
>      else
>          Use vec_element_loadx(s, x[rd], vs2, x[rs1], true).
>
> vmv.s.x:
>      over = gen_new_label();
>      tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
>      For 0.7.1:
>          Use tcg_gen_dup8i to zero all VLMAX elements of vd.
>          If rs1 == 0, goto done.
>      Use vec_element_storei(s, vs2, 0, x[rs1]).
>   done:
>      gen_set_label(over);
>
> vfmv.f.s:
>      Use vec_element_loadi(x, f[rd], vs2, 0).
>      NaN-box f[rd] as necessary for SEW.
>
> vfmv.s.f:
>      tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
>      For 0.7.1:
>          Use tcg_gen_dup8i to zero all VLMAX elements of vd.
>      Let tmp = f[rs1], nan-boxed as necessary for SEW.
>      Use vec_element_storei(s, vs2, 0, tmp).
>      gen_set_label(over);
>
> vslide1up.vx:
>      Ho hum, I forgot about masking.  Some options:
>      (1) Call a helper just as you did in your original patch.
>      (2) Call a helper only for !vm, for vm as below.

Sorry, I don't get why I need a helper for !vm.
I think I can call vslideup w/1 whether !vm or vm, then do a store to vd[0].

Zhiwei
>      (3) Call vslideup w/1.
>          tcg_gen_brcondi(TCG_COND_EQ, cpu_vl, 0, over);
>          If !vm,
>              // inline test for v0[0]
>              vec_element_loadi(s, tmp, 0, 0, MO_8);
>              tcg_gen_andi_i64(tmp, tmp, 1);
>              tcg_gen_brcondi(TCG_COND_EQ, tmp, 0, over);
>          Use vec_element_store(s, vd, 0, x[rs1]).
>          gen_set_label(over);
>
> vslide1down.vx:
>      For !vm, this is complicated enough for a helper.
>      If using option 3 for vslide1up, then the store becomes:
>      tcg_gen_subi_tl(tmp, cpu_vl, 1);
>      vec_element_storex(s, base, tmp, x[rs1]);
>
> vrgather.vx:
>      If !vm or !vl_eq_vlmax, use helper.
>      vec_element_loadx(s, tmp, vs2, x[rs1]);
>      Use tcg_gen_gvec_dup_i64 to store to tmp to vd.
>
> vrgather.vi:
>      If !vm or !vl_eq_vlmax, use helper.
>      If imm >= vlmax,
>          Use tcg_gen_dup8i to zero vd;
>      else,
>          ofs = vec_element_ofsi(s, vs2, imm, s->sew);
>          tcg_gen_gvec_dup_mem(sew, vreg_ofs(vd),
>                               ofs, vlmax, vlmax);
>
>
> r~
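
For reference, a rough sketch of the two immediate-index helpers from the list
above.  It assumes vreg_ofs() yields the byte offset of a vector register
inside CPURISCVState, ignores the host-endianness adjustment sub-quadword
elements would need, and leaves out the "idx >= VLMAX returns 0" case that the
loadi comment asks for.

    /* Byte offset of vreg[idx] relative to cpu_env; idx must be < VLMAX. */
    static int vec_element_ofsi(int vreg, int idx, int sew)
    {
        return vreg_ofs(vreg) + (idx << sew);
    }

    /* Load vreg[idx] into val, zero-extended to 64 bits. */
    static void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                                  int vreg, int idx, int sew)
    {
        int ofs = vec_element_ofsi(vreg, idx, sew);

        switch (sew) {
        case MO_8:
            tcg_gen_ld8u_i64(val, cpu_env, ofs);
            break;
        case MO_16:
            tcg_gen_ld16u_i64(val, cpu_env, ofs);
            break;
        case MO_32:
            tcg_gen_ld32u_i64(val, cpu_env, ofs);
            break;
        default:
            tcg_gen_ld_i64(val, cpu_env, ofs);
            break;
        }
    }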



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH v5 57/60] target/riscv: vector slide instructions
  2020-03-24 10:51           ` LIU Zhiwei
@ 2020-03-24 14:52             ` Richard Henderson
  -1 siblings, 0 replies; 336+ messages in thread
From: Richard Henderson @ 2020-03-24 14:52 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: guoren, wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 3/24/20 3:51 AM, LIU Zhiwei wrote:
>> (3) It would be handy to have TCGv cpu_vl.
> Do you mean I should define cpu_vl as a global TCG variable like cpu_pc,
> so that I can check vl == 0 at translation time?

Yes.

>> vslide1up.vx:
>>      Ho hum, I forgot about masking.  Some options:
>>      (1) Call a helper just as you did in your original patch.
>>      (2) Call a helper only for !vm, for vm as below.
> 
> Sorry, I don't get why I need a helper for !vm.
> I think I can call vslideup w/1 whether !vm or vm, then do a store to vd[0].

That's right.  I didn't mean a helper specific to vslide1up, but any helper.


r~
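
A minimal sketch of the cpu_vl global discussed above, assuming env->vl is a
target_ulong field of CPURISCVState.  The init call would sit next to the
existing cpu_pc setup in target/riscv/translate.c, and the vl == 0 check then
becomes a plain branch inside the relevant trans_* function.

    static TCGv cpu_vl;

    void riscv_translate_init(void)
    {
        /* ... existing cpu_pc / cpu_gpr / cpu_fpr initialisation ... */
        cpu_vl = tcg_global_mem_new(cpu_env,
                                    offsetof(CPURISCVState, vl), "vl");
    }

    /* Inside a trans_* function: skip the body entirely when vl == 0. */
    TCGLabel *over = gen_new_label();
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    /* ... emit the instruction body ... */
    gen_set_label(over);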


^ permalink raw reply	[flat|nested] 336+ messages in thread

end of thread, other threads:[~2020-03-24 14:53 UTC | newest]

Thread overview: 336+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-12 14:58 [PATCH v5 00/60] target/riscv: support vector extension v0.7.1 LIU Zhiwei
2020-03-12 14:58 ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 01/60] target/riscv: add vector extension field in CPURISCVState LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 02/60] target/riscv: implementation-defined constant parameters LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 03/60] target/riscv: support vector extension csr LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 20:54   ` Alistair Francis
2020-03-12 20:54     ` Alistair Francis
2020-03-14  1:11   ` Richard Henderson
2020-03-14  1:11     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 04/60] target/riscv: add vector configure instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 21:23   ` Alistair Francis
2020-03-12 21:23     ` Alistair Francis
2020-03-12 22:00     ` LIU Zhiwei
2020-03-12 22:00       ` LIU Zhiwei
2020-03-12 22:07       ` Alistair Francis
2020-03-12 22:07         ` Alistair Francis
2020-03-14  1:14   ` Richard Henderson
2020-03-14  1:14     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 05/60] target/riscv: add vector stride load and store instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-13 20:38   ` Alistair Francis
2020-03-13 20:38     ` Alistair Francis
2020-03-13 21:32     ` LIU Zhiwei
2020-03-13 21:32       ` LIU Zhiwei
2020-03-13 22:05       ` Alistair Francis
2020-03-13 22:05         ` Alistair Francis
2020-03-13 22:17         ` LIU Zhiwei
2020-03-13 22:17           ` LIU Zhiwei
2020-03-13 23:38           ` Alistair Francis
2020-03-13 23:38             ` Alistair Francis
2020-03-14  1:26       ` Richard Henderson
2020-03-14  1:26         ` Richard Henderson
2020-03-14  1:49         ` LIU Zhiwei
2020-03-14  1:49           ` LIU Zhiwei
2020-03-14  1:36   ` Richard Henderson
2020-03-14  1:36     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 06/60] target/riscv: add vector index " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-13 21:21   ` Alistair Francis
2020-03-13 21:21     ` Alistair Francis
2020-03-14  1:49   ` Richard Henderson
2020-03-14  1:49     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 07/60] target/riscv: add fault-only-first unit stride load LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-13 22:24   ` Alistair Francis
2020-03-13 22:24     ` Alistair Francis
2020-03-13 22:41     ` LIU Zhiwei
2020-03-13 22:41       ` LIU Zhiwei
2020-03-14  1:50   ` Richard Henderson
2020-03-14  1:50     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 08/60] target/riscv: add vector amo operations LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  0:02   ` Alistair Francis
2020-03-14  0:02     ` Alistair Francis
2020-03-14  0:36     ` LIU Zhiwei
2020-03-14  0:36       ` LIU Zhiwei
2020-03-14  4:28   ` Richard Henderson
2020-03-14  4:28     ` Richard Henderson
2020-03-14  5:07     ` LIU Zhiwei
2020-03-14  5:07       ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 09/60] target/riscv: vector single-width integer add and subtract LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  5:25   ` Richard Henderson
2020-03-14  5:25     ` Richard Henderson
2020-03-14  8:11     ` LIU Zhiwei
2020-03-14  8:11       ` LIU Zhiwei
2020-03-23  8:10     ` LIU Zhiwei
2020-03-23  8:10       ` LIU Zhiwei
2020-03-23 17:46       ` Richard Henderson
2020-03-23 17:46         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 10/60] target/riscv: vector widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  5:32   ` Richard Henderson
2020-03-14  5:32     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  5:58   ` Richard Henderson
2020-03-14  5:58     ` Richard Henderson
2020-03-14  6:08     ` LIU Zhiwei
2020-03-14  6:08       ` LIU Zhiwei
2020-03-14  6:16     ` Richard Henderson
2020-03-14  6:16       ` Richard Henderson
2020-03-14  6:32       ` LIU Zhiwei
2020-03-14  6:32         ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 12/60] target/riscv: vector bitwise logical instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:00   ` Richard Henderson
2020-03-14  6:00     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 13/60] target/riscv: vector single-width bit shift instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:07   ` Richard Henderson
2020-03-14  6:07     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 14/60] target/riscv: vector narrowing integer right " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:10   ` Richard Henderson
2020-03-14  6:10     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 15/60] target/riscv: vector integer comparison instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:33   ` Richard Henderson
2020-03-14  6:33     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 16/60] target/riscv: vector integer min/max instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:40   ` Richard Henderson
2020-03-14  6:40     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 17/60] target/riscv: vector single-width integer multiply instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:52   ` Richard Henderson
2020-03-14  6:52     ` Richard Henderson
2020-03-14  7:02     ` LIU Zhiwei
2020-03-14  7:02       ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 18/60] target/riscv: vector integer divide instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  6:58   ` Richard Henderson
2020-03-14  6:58     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 19/60] target/riscv: vector widening integer multiply instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  7:06   ` Richard Henderson
2020-03-14  7:06     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 20/60] target/riscv: vector single-width integer multiply-add instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  7:10   ` Richard Henderson
2020-03-14  7:10     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 21/60] target/riscv: vector widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  7:13   ` Richard Henderson
2020-03-14  7:13     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 22/60] target/riscv: vector integer merge and move instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  7:27   ` Richard Henderson
2020-03-14  7:27     ` Richard Henderson
2020-03-16  2:57     ` LIU Zhiwei
2020-03-16  2:57       ` LIU Zhiwei
2020-03-16  5:32       ` Richard Henderson
2020-03-16  5:32         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 23/60] target/riscv: vector single-width saturating add and subtract LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  7:52   ` Richard Henderson
2020-03-14  7:52     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 24/60] target/riscv: vector single-width averaging " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:14   ` Richard Henderson
2020-03-14  8:14     ` Richard Henderson
2020-03-14  8:25     ` Richard Henderson
2020-03-14  8:25       ` Richard Henderson
2020-03-14 23:12       ` LIU Zhiwei
2020-03-14 23:12         ` LIU Zhiwei
2020-03-15  1:00         ` Richard Henderson
2020-03-15  1:00           ` Richard Henderson
2020-03-15 23:23           ` LIU Zhiwei
2020-03-15 23:23             ` LIU Zhiwei
2020-03-15 23:27             ` Richard Henderson
2020-03-15 23:27               ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 25/60] target/riscv: vector single-width fractional multiply with rounding and saturation LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:27   ` Richard Henderson
2020-03-14  8:27     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 26/60] target/riscv: vector widening saturating scaled multiply-add LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:32   ` Richard Henderson
2020-03-14  8:32     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 27/60] target/riscv: vector single-width scaling shift instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:34   ` Richard Henderson
2020-03-14  8:34     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 28/60] target/riscv: vector narrowing fixed-point clip instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:36   ` Richard Henderson
2020-03-14  8:36     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 29/60] target/riscv: vector single-width floating-point add/subtract instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:40   ` Richard Henderson
2020-03-14  8:40     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 30/60] target/riscv: vector widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:43   ` Richard Henderson
2020-03-14  8:43     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 31/60] target/riscv: vector single-width floating-point multiply/divide instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:43   ` Richard Henderson
2020-03-14  8:43     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 32/60] target/riscv: vector widening floating-point multiply LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:46   ` Richard Henderson
2020-03-14  8:46     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:49   ` Richard Henderson
2020-03-14  8:49     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 34/60] target/riscv: vector widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:50   ` Richard Henderson
2020-03-14  8:50     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 35/60] target/riscv: vector floating-point square-root instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 36/60] target/riscv: vector floating-point min/max instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:52   ` Richard Henderson
2020-03-14  8:52     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 37/60] target/riscv: vector floating-point sign-injection instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  8:57   ` Richard Henderson
2020-03-14  8:57     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 38/60] target/riscv: vector floating-point compare instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  9:08   ` Richard Henderson
2020-03-14  9:08     ` Richard Henderson
2020-03-14  9:11     ` LIU Zhiwei
2020-03-14  9:11       ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 39/60] target/riscv: vector floating-point classify instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14  9:10   ` Richard Henderson
2020-03-14  9:10     ` Richard Henderson
2020-03-14  9:15     ` LIU Zhiwei
2020-03-14  9:15       ` LIU Zhiwei
2020-03-14 22:06       ` Richard Henderson
2020-03-14 22:06         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 40/60] target/riscv: vector floating-point merge instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 22:47   ` Richard Henderson
2020-03-14 22:47     ` Richard Henderson
2020-03-16  3:41     ` LIU Zhiwei
2020-03-16  3:41       ` LIU Zhiwei
2020-03-16  5:37       ` Richard Henderson
2020-03-16  5:37         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 41/60] target/riscv: vector floating-point/integer type-convert instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 22:50   ` Richard Henderson
2020-03-14 22:50     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 42/60] target/riscv: widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:03   ` Richard Henderson
2020-03-14 23:03     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 43/60] target/riscv: narrowing " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:08   ` Richard Henderson
2020-03-14 23:08     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 44/60] target/riscv: vector single-width integer reduction instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:29   ` Richard Henderson
2020-03-14 23:29     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 45/60] target/riscv: vector wideing " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:34   ` Richard Henderson
2020-03-14 23:34     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 46/60] target/riscv: vector single-width floating-point " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:48   ` Richard Henderson
2020-03-14 23:48     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 47/60] target/riscv: vector widening " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-14 23:49   ` Richard Henderson
2020-03-14 23:49     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 48/60] target/riscv: vector mask-register logical instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 49/60] target/riscv: vector mask population count vmpopc LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  1:20   ` Richard Henderson
2020-03-15  1:20     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 50/60] target/riscv: vmfirst find-first-set mask bit LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  1:36   ` Richard Henderson
2020-03-15  1:36     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 51/60] target/riscv: set-X-first " LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:58 ` [PATCH v5 52/60] target/riscv: vector iota instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  1:50   ` Richard Henderson
2020-03-15  1:50     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 53/60] target/riscv: vector element index instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  1:54   ` Richard Henderson
2020-03-15  1:54     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 54/60] target/riscv: integer extract instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  2:53   ` Richard Henderson
2020-03-15  2:53     ` Richard Henderson
2020-03-15  5:15     ` LIU Zhiwei
2020-03-15  5:15       ` LIU Zhiwei
2020-03-15  5:21       ` Richard Henderson
2020-03-15  5:21         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 55/60] target/riscv: integer scalar move instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  3:54   ` Richard Henderson
2020-03-15  3:54     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 56/60] target/riscv: floating-point scalar move instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  4:39   ` Richard Henderson
2020-03-15  4:39     ` Richard Henderson
2020-03-15  6:13     ` LIU Zhiwei
2020-03-15  6:13       ` LIU Zhiwei
2020-03-15  6:48       ` Richard Henderson
2020-03-15  6:48         ` Richard Henderson
2020-03-17  6:01     ` LIU Zhiwei
2020-03-17  6:01       ` LIU Zhiwei
2020-03-17 15:11       ` Richard Henderson
2020-03-17 15:11         ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 57/60] target/riscv: vector slide instructions LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  5:16   ` Richard Henderson
2020-03-15  5:16     ` Richard Henderson
2020-03-15  6:49     ` LIU Zhiwei
2020-03-15  6:49       ` LIU Zhiwei
2020-03-15  6:56       ` Richard Henderson
2020-03-15  6:56         ` Richard Henderson
2020-03-16  8:04     ` LIU Zhiwei
2020-03-16  8:04       ` LIU Zhiwei
2020-03-16 17:42       ` Richard Henderson
2020-03-16 17:42         ` Richard Henderson
2020-03-24 10:51         ` LIU Zhiwei
2020-03-24 10:51           ` LIU Zhiwei
2020-03-24 14:52           ` Richard Henderson
2020-03-24 14:52             ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 58/60] target/riscv: vector register gather instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-15  5:44   ` Richard Henderson
2020-03-15  5:44     ` Richard Henderson
2020-03-12 14:58 ` [PATCH v5 59/60] target/riscv: vector compress instruction LIU Zhiwei
2020-03-12 14:58   ` LIU Zhiwei
2020-03-12 14:59 ` [PATCH v5 60/60] target/riscv: configure and turn on vector extension from command line LIU Zhiwei
2020-03-12 14:59   ` LIU Zhiwei
2020-03-13 21:41   ` Alistair Francis
2020-03-13 21:41     ` Alistair Francis
2020-03-13 21:52     ` LIU Zhiwei
2020-03-13 21:52       ` LIU Zhiwei
2020-03-13  0:41 ` [PATCH v5 00/60] target/riscv: support vector extension v0.7.1 no-reply
2020-03-13  0:41   ` no-reply
2020-03-15  7:00 ` [PATCH v5 35/60] target/riscv: vector floating-point square-root instruction Richard Henderson
2020-03-15  7:00   ` Richard Henderson
2020-03-15  7:26 ` [PATCH v5 51/60] target/riscv: set-X-first mask bit Richard Henderson
2020-03-15  7:26   ` Richard Henderson
2020-03-15  7:34 ` [PATCH v5 59/60] target/riscv: vector compress instruction Richard Henderson
2020-03-15  7:34   ` Richard Henderson
