* [PATCH v2 00/37] target/riscv: support packed extension v0.9.4
@ 2021-06-10  7:58 ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

This patchset implements the packed extension for RISC-V on QEMU.

This patchset has passed all of my direct Linux user-mode test cases (RV64)
on an x86-64 Ubuntu host. You can also find this patch set in my repo
(https://github.com/romanheros/qemu.git, branch packed-upstream-v2).

I have ported the packed extension to RISU, but I can't compare against
Spike because the Spike PK lacks the socket and signal-handling syscalls.
Nor can I compare against RISCOF, as its P extension support is incomplete.
If anyone knows of a random test method, please let me know.

Features:
* support packed extension specification
  v0.9.4 (https://github.com/riscv/riscv-p-spec/)
* support the basic packed extension.
* support Zpsfoperand.

v2:
* remove all the TARGET_RISCV64 macros.
* use tcg_gen_vec_* to accelerate.
* update to the latest specification, v0.9.4.
* fix kmsxda32, kmsda32, kslra32, smal

LIU Zhiwei (37):
  target/riscv: implementation-defined constant parameters
  target/riscv: Make the vector helper functions public
  target/riscv: 16-bit Addition & Subtraction Instructions
  target/riscv: 8-bit Addition & Subtraction Instruction
  target/riscv: SIMD 16-bit Shift Instructions
  target/riscv: SIMD 8-bit Shift Instructions
  target/riscv: SIMD 16-bit Compare Instructions
  target/riscv: SIMD 8-bit Compare Instructions
  target/riscv: SIMD 16-bit Multiply Instructions
  target/riscv: SIMD 8-bit Multiply Instructions
  target/riscv: SIMD 16-bit Miscellaneous Instructions
  target/riscv: SIMD 8-bit Miscellaneous Instructions
  target/riscv: 8-bit Unpacking Instructions
  target/riscv: 16-bit Packing Instructions
  target/riscv: Signed MSW 32x32 Multiply and Add Instructions
  target/riscv: Signed MSW 32x16 Multiply and Add Instructions
  target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
  target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
  target/riscv: Partial-SIMD Miscellaneous Instructions
  target/riscv: 8-bit Multiply with 32-bit Add Instructions
  target/riscv: 64-bit Add/Subtract Instructions
  target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
  target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
    Instructions
  target/riscv: Non-SIMD Q15 saturation ALU Instructions
  target/riscv: Non-SIMD Q31 saturation ALU Instructions
  target/riscv: 32-bit Computation Instructions
  target/riscv: Non-SIMD Miscellaneous Instructions
  target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
  target/riscv: RV64 Only SIMD 32-bit Shift Instructions
  target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
  target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
  target/riscv: RV64 Only 32-bit Multiply Instructions
  target/riscv: RV64 Only 32-bit Multiply & Add Instructions
  target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
  target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
  target/riscv: RV64 Only 32-bit Packing Instructions
  target/riscv: configure and turn on packed extension from command line

 include/tcg/tcg-op-gvec.h               |   38 +
 target/riscv/cpu.c                      |   34 +
 target/riscv/cpu.h                      |    6 +
 target/riscv/helper.h                   |  330 ++
 target/riscv/insn32.decode              |  370 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 1155 +++++++
 target/riscv/internals.h                |   50 +
 target/riscv/meson.build                |    1 +
 target/riscv/packed_helper.c            | 3851 +++++++++++++++++++++++
 target/riscv/translate.c                |    3 +
 target/riscv/vector_helper.c            |   82 +-
 tcg/tcg-op-gvec.c                       |  131 +
 12 files changed, 5993 insertions(+), 58 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c

-- 
2.25.1



* [PATCH v2 01/37] target/riscv: implementation-defined constant parameters
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

ext_psfoperand indicates whether the Zpsfoperand sub-extension is supported.
pext_ver is the packed specification version; the default value is v0.9.4.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/cpu.c       | 31 +++++++++++++++++++++++++++++++
 target/riscv/cpu.h       |  6 ++++++
 target/riscv/translate.c |  2 ++
 3 files changed, 39 insertions(+)
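
As a reading aid for the version selection in this patch, here is a minimal
standalone sketch of the string check done in riscv_cpu_realize. The helper
name pext_version_from_spec and its -1 error return are hypothetical and are
not part of the patch; only the constant value is taken from cpu.h. Plain
strcmp is used in place of g_strcmp0 so the sketch has no GLib dependency.

```c
#include <stddef.h>
#include <string.h>

/* Same encoding as in the patch: 0x00000904 stands for v0.9.4. */
#define PEXT_VERSION_0_09_4 0x00000904

/*
 * Hypothetical helper (illustration only): map a user-supplied spec
 * string to a version constant. NULL means "not specified" and selects
 * the default; an unknown string returns -1 so that the caller can
 * report an error, as the patch does with error_setg().
 */
static int pext_version_from_spec(const char *spec)
{
    if (spec == NULL) {
        return PEXT_VERSION_0_09_4;   /* default value */
    }
    if (strcmp(spec, "v0.9.4") == 0) {
        return PEXT_VERSION_0_09_4;
    }
    return -1;                        /* unsupported version string */
}
```

With this sketch, a missing string and "v0.9.4" both yield 0x904, while any
other string is rejected.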

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 991a6bb760..9d8cf60a1c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -137,6 +137,11 @@ static void set_vext_version(CPURISCVState *env, int vext_ver)
     env->vext_ver = vext_ver;
 }
 
+static void set_pext_version(CPURISCVState *env, int pext_ver)
+{
+    env->pext_ver = pext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
     env->features |= (1ULL << feature);
@@ -395,6 +400,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     int priv_version = PRIV_VERSION_1_11_0;
     int bext_version = BEXT_VERSION_0_93_0;
     int vext_version = VEXT_VERSION_0_07_1;
+    int pext_version = PEXT_VERSION_0_09_4;
     target_ulong target_misa = env->misa;
     Error *local_err = NULL;
 
@@ -420,6 +426,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     set_priv_version(env, priv_version);
     set_bext_version(env, bext_version);
     set_vext_version(env, vext_version);
+    set_pext_version(env, pext_version);
 
     if (cpu->cfg.mmu) {
         set_feature(env, RISCV_FEATURE_MMU);
@@ -553,6 +560,30 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
             }
             set_vext_version(env, vext_version);
         }
+        if (cpu->cfg.ext_p) {
+            target_misa |= RVP;
+            if (cpu->cfg.pext_spec) {
+                if (!g_strcmp0(cpu->cfg.pext_spec, "v0.9.4")) {
+                    pext_version = PEXT_VERSION_0_09_4;
+                } else {
+                    error_setg(errp,
+                               "Unsupported packed spec version '%s'",
+                               cpu->cfg.pext_spec);
+                    return;
+                }
+            } else {
+                qemu_log("packed version is not specified, "
+                         "use the default value v0.9.4\n");
+            }
+            if (env->misa == RV64) {
+                if (!cpu->cfg.ext_psfoperand) {
+                    error_setg(errp, "The Zpsfoperand "
+                                     "sub-extension is required for RV64P.");
+                    return;
+                }
+            }
+            set_pext_version(env, pext_version);
+        }
 
         set_misa(env, target_misa);
     }
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index bf1c899c00..4d20afb267 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -63,6 +63,7 @@
 #define RVF RV('F')
 #define RVD RV('D')
 #define RVV RV('V')
+#define RVP RV('P')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -85,6 +86,7 @@ enum {
 
 #define BEXT_VERSION_0_93_0 0x00009300
 #define VEXT_VERSION_0_07_1 0x00000701
+#define PEXT_VERSION_0_09_4 0x00000904
 
 enum {
     TRANSLATE_SUCCESS,
@@ -135,6 +137,7 @@ struct CPURISCVState {
     target_ulong priv_ver;
     target_ulong bext_ver;
     target_ulong vext_ver;
+    target_ulong pext_ver;
     target_ulong misa;
     target_ulong misa_mask;
 
@@ -293,14 +296,17 @@ struct RISCVCPU {
         bool ext_u;
         bool ext_h;
         bool ext_v;
+        bool ext_p;
         bool ext_counters;
         bool ext_ifencei;
         bool ext_icsr;
+        bool ext_psfoperand;
 
         char *priv_spec;
         char *user_spec;
         char *bext_spec;
         char *vext_spec;
+        char *pext_spec;
         uint16_t vlen;
         uint16_t elen;
         bool mmu;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index c6e8739614..0e6ede4d71 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -56,6 +56,7 @@ typedef struct DisasContext {
        to reset this known value.  */
     int frm;
     bool ext_ifencei;
+    bool ext_psfoperand;
     bool hlsx;
     /* vector extension */
     bool vill;
@@ -965,6 +966,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
     ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
     ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
+    ctx->ext_psfoperand = cpu->cfg.ext_psfoperand;
     ctx->cs = cs;
 }
 
-- 
2.25.1



* [PATCH v2 02/37] target/riscv: Make the vector helper functions public
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

The saturating add and subtract functions and the scaling shift functions
can also be used by the packed extension, so hoist them up into internals.h.

The endianness fixup macros are hoisted as well.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/internals.h     | 50 ++++++++++++++++++++++
 target/riscv/vector_helper.c | 82 +++++++++++-------------------------
 2 files changed, 74 insertions(+), 58 deletions(-)
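
For readers unfamiliar with the overflow tests used by the hoisted helpers,
here is a minimal standalone sketch of 8-bit saturating addition. The names
sat_addu8 and sat_add8 and the bool pointer used as a saturation flag are
assumptions made for illustration; the real helpers take a CPURISCVState
pointer and a vxrm argument instead. The overflow conditions are the ones
visible in the diff context below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned: if the sum wrapped around, it is smaller than an operand. */
static uint8_t sat_addu8(uint8_t a, uint8_t b, bool *sat)
{
    uint8_t res = (uint8_t)(a + b);
    if (res < a) {
        res = UINT8_MAX;
        *sat = true;
    }
    return res;
}

/*
 * Signed: overflow happened if the sign of the result differs from the
 * sign of both operands. The addition is done on unsigned values so the
 * wraparound itself is well defined; the narrowing cast back to int8_t
 * is implementation-defined and wraps on GCC and Clang.
 */
static int8_t sat_add8(int8_t a, int8_t b, bool *sat)
{
    int8_t res = (int8_t)((uint8_t)a + (uint8_t)b);
    if ((res ^ a) & (res ^ b) & INT8_MIN) {
        res = a > 0 ? INT8_MAX : INT8_MIN;
        *sat = true;
    }
    return res;
}
```

The ssub* helpers apply the same idea with the test
(res ^ a) & (a ^ b) & INT8_MIN, since subtraction can only overflow when the
operands have different signs.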

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index b15ad394bb..698158e116 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -58,4 +58,54 @@ static inline float32 check_nanbox_s(uint64_t f)
     }
 }
 
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
+/* shared saturating add/subtract functions */
+int8_t sadd8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t sadd16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t sadd32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t sadd64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t saddu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t saddu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t saddu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t saddu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+int8_t ssub8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t ssub16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t ssub32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t ssub64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t ssubu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t ssubu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t ssubu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t ssubu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+/* shared scaling shift functions */
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b);
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b);
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b);
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b);
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b);
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b);
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b);
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b);
 #endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 12c31aa4b4..c720e7b1fc 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -56,26 +56,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     return vl;
 }
 
-/*
- * Note that vector data is stored in host-endian 64-bit chunks,
- * so addressing units smaller than that needs a host-endian fixup.
- */
-#ifdef HOST_WORDS_BIGENDIAN
-#define H1(x)   ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x)   ((x) ^ 3)
-#define H4(x)   ((x) ^ 1)
-#define H8(x)   ((x))
-#else
-#define H1(x)   (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x)   (x)
-#define H4(x)   (x)
-#define H8(x)   (x)
-#endif
-
 static inline uint32_t vext_nf(uint32_t desc)
 {
     return FIELD_EX32(simd_data(desc), VDATA, NF);
@@ -2195,7 +2175,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
                  do_##NAME, CLEAR_FN);                          \
 }
 
-static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a + b;
     if (res < a) {
@@ -2205,8 +2185,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a + b;
     if (res < a) {
@@ -2216,8 +2195,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a + b;
     if (res < a) {
@@ -2227,8 +2205,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a + b;
     if (res < a) {
@@ -2324,7 +2301,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
 
-static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT8_MIN) {
@@ -2334,7 +2311,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT16_MIN) {
@@ -2344,7 +2321,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT32_MIN) {
@@ -2354,7 +2331,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT64_MIN) {
@@ -2382,7 +2359,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
 
-static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a - b;
     if (res > a) {
@@ -2392,8 +2369,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a - b;
     if (res > a) {
@@ -2403,8 +2379,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a - b;
     if (res > a) {
@@ -2414,8 +2389,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a - b;
     if (res > a) {
@@ -2443,7 +2417,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
 
-static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT8_MIN) {
@@ -2453,7 +2427,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT16_MIN) {
@@ -2463,7 +2437,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT32_MIN) {
@@ -2473,7 +2447,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT64_MIN) {
@@ -2914,8 +2888,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
 
 /* Vector Single-Width Scaling Shift Instructions */
-static inline uint8_t
-vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t round, shift = b & 0x7;
     uint8_t res;
@@ -2924,8 +2897,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint16_t
-vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint8_t round, shift = b & 0xf;
     uint16_t res;
@@ -2934,8 +2906,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint32_t
-vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     uint32_t res;
@@ -2944,8 +2915,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint64_t
-vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     uint64_t res;
@@ -2972,8 +2942,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
 
-static inline int8_t
-vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     uint8_t round, shift = b & 0x7;
     int8_t res;
@@ -2982,8 +2951,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int16_t
-vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     uint8_t round, shift = b & 0xf;
     int16_t res;
@@ -2992,8 +2960,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int32_t
-vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     int32_t res;
@@ -3002,8 +2969,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int64_t
-vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     int64_t res;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 02/37] target/riscv: Make the vector helper functions public
@ 2021-06-10  7:58   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson,
	LIU Zhiwei, Alistair Francis

The saturate functions about add,subtract and shift functions can
be used in packed extension.Therefore hoist them up.

The endianess process macro is also be hoisted.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/internals.h     | 50 ++++++++++++++++++++++
 target/riscv/vector_helper.c | 82 +++++++++++-------------------------
 2 files changed, 74 insertions(+), 58 deletions(-)

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index b15ad394bb..698158e116 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -58,4 +58,54 @@ static inline float32 check_nanbox_s(uint64_t f)
     }
 }
 
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
+/* Shared saturating add/subtract helpers */
+int8_t sadd8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t sadd16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t sadd32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t sadd64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t saddu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t saddu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t saddu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t saddu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+int8_t ssub8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t ssub16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t ssub32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t ssub64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t ssubu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t ssubu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t ssubu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t ssubu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+/* Shared scaling shift helpers */
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b);
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b);
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b);
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b);
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b);
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b);
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b);
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b);
 #endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 12c31aa4b4..c720e7b1fc 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -56,26 +56,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     return vl;
 }
 
-/*
- * Note that vector data is stored in host-endian 64-bit chunks,
- * so addressing units smaller than that needs a host-endian fixup.
- */
-#ifdef HOST_WORDS_BIGENDIAN
-#define H1(x)   ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x)   ((x) ^ 3)
-#define H4(x)   ((x) ^ 1)
-#define H8(x)   ((x))
-#else
-#define H1(x)   (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x)   (x)
-#define H4(x)   (x)
-#define H8(x)   (x)
-#endif
-
 static inline uint32_t vext_nf(uint32_t desc)
 {
     return FIELD_EX32(simd_data(desc), VDATA, NF);
@@ -2195,7 +2175,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
                  do_##NAME, CLEAR_FN);                          \
 }
 
-static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a + b;
     if (res < a) {
@@ -2205,8 +2185,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a + b;
     if (res < a) {
@@ -2216,8 +2195,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a + b;
     if (res < a) {
@@ -2227,8 +2205,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a + b;
     if (res < a) {
@@ -2324,7 +2301,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
 
-static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT8_MIN) {
@@ -2334,7 +2311,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT16_MIN) {
@@ -2344,7 +2321,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT32_MIN) {
@@ -2354,7 +2331,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT64_MIN) {
@@ -2382,7 +2359,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
 
-static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a - b;
     if (res > a) {
@@ -2392,8 +2369,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a - b;
     if (res > a) {
@@ -2403,8 +2379,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a - b;
     if (res > a) {
@@ -2414,8 +2389,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a - b;
     if (res > a) {
@@ -2443,7 +2417,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
 
-static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT8_MIN) {
@@ -2453,7 +2427,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT16_MIN) {
@@ -2463,7 +2437,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT32_MIN) {
@@ -2473,7 +2447,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT64_MIN) {
@@ -2914,8 +2888,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
 
 /* Vector Single-Width Scaling Shift Instructions */
-static inline uint8_t
-vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t round, shift = b & 0x7;
     uint8_t res;
@@ -2924,8 +2897,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint16_t
-vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint8_t round, shift = b & 0xf;
     uint16_t res;
@@ -2934,8 +2906,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint32_t
-vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     uint32_t res;
@@ -2944,8 +2915,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint64_t
-vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     uint64_t res;
@@ -2972,8 +2942,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
 
-static inline int8_t
-vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     uint8_t round, shift = b & 0x7;
     int8_t res;
@@ -2982,8 +2951,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int16_t
-vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     uint8_t round, shift = b & 0xf;
     int16_t res;
@@ -2992,8 +2960,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int32_t
-vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     int32_t res;
@@ -3002,8 +2969,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int64_t
-vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     int64_t res;
-- 
2.25.1




* [PATCH v2 03/37] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

This patch covers five groups of instructions: wrap-around (overflow is
dropped), signed halving, unsigned halving, signed saturation, and
unsigned saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
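Note for readers: the five groups differ only in how a single 16-bit
lane handles a sum that does not fit. The following is a minimal
standalone sketch of the add variants on one lane, written with widening
arithmetic for clarity. The lane_* names are hypothetical and are not
the helpers in this patch. The signed halving case relies on an
arithmetic right shift of negative values, as the patch's own hadd32()
does.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-around (add16): keep the low 16 bits, drop the carry. */
static uint16_t lane_add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed halving (radd16): widen, add, shift right by one. */
static int16_t lane_radd16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + b) >> 1);
}

/* Unsigned halving (uradd16): same idea with zero extension. */
static uint16_t lane_uradd16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturation (kadd16): clamp the widened sum to int16_t. */
static int16_t lane_kadd16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;

    if (s > INT16_MAX) {
        return INT16_MAX;
    }
    if (s < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)s;
}

/* Unsigned saturation (ukadd16): clamp the widened sum to uint16_t. */
static uint16_t lane_ukadd16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;

    return s > UINT16_MAX ? UINT16_MAX : (uint16_t)s;
}
```

Halving never overflows because the widened sum is shifted back into
range, so only the saturating variants need to report saturation.
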
 include/tcg/tcg-op-gvec.h               |  10 +
 target/riscv/helper.h                   |  30 ++
 target/riscv/insn32.decode              |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 117 ++++++++
 target/riscv/meson.build                |   1 +
 target/riscv/packed_helper.c            | 354 ++++++++++++++++++++++++
 target/riscv/translate.c                |   1 +
 tcg/tcg-op-gvec.c                       |  28 ++
 8 files changed, 573 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c
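
Note for readers: tcg_gen_vec_add16_i32() and tcg_gen_vec_sub16_i32()
below add or subtract two 16-bit lanes inside one 32-bit value without
letting a carry or borrow cross the lane boundary. The following is a
minimal sketch of the same arithmetic on plain uint32_t values, checked
against a lane-by-lane reference. The swar_* and ref_add16 names are
hypothetical, and the final mask-and-or plays the role of the deposit
op.

```c
#include <assert.h>
#include <stdint.h>

/* Two-lane 16-bit add inside a 32-bit word (sketch, not TCG code). */
static uint32_t swar_add16(uint32_t a, uint32_t b)
{
    /* Low lane of a cleared: the low-lane sum cannot carry into bit 16. */
    uint32_t hi = (a & ~0xffffu) + b;
    /* Plain add: its low 16 bits are the correct low-lane result. */
    uint32_t lo = a + b;

    /* Combine: high lane from hi, low lane from lo. */
    return (hi & ~0xffffu) | (lo & 0xffffu);
}

/* Two-lane 16-bit subtract inside a 32-bit word (sketch, not TCG code). */
static uint32_t swar_sub16(uint32_t a, uint32_t b)
{
    /* Low lane of b cleared: the low lane cannot borrow from bit 16. */
    uint32_t hi = a - (b & ~0xffffu);
    uint32_t lo = a - b;

    return (hi & ~0xffffu) | (lo & 0xffffu);
}

/* Reference: add each 16-bit lane separately. */
static uint32_t ref_add16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xffffu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xffffu;

    return (hi << 16) | lo;
}
```

For example, swar_add16(0x0001ffff, 0x00010001) gives 0x00020000: the
low lane wraps to zero and the high lane is unaffected by that carry.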

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index c69a7de984..2dae9e78d0 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -386,10 +386,12 @@ void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a);
 
 void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_add16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
@@ -401,4 +403,12 @@ void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
+#if TARGET_LONG_BITS == 64
+#define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i64
+#define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i64
+#else
+#define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
+#define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
+#endif
+
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 415e37bc37..b6a71ade33 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1149,3 +1149,33 @@ DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+/* P extension helpers */
+DEF_HELPER_3(radd16, tl, env, tl, tl)
+DEF_HELPER_3(uradd16, tl, env, tl, tl)
+DEF_HELPER_3(kadd16, tl, env, tl, tl)
+DEF_HELPER_3(ukadd16, tl, env, tl, tl)
+DEF_HELPER_3(rsub16, tl, env, tl, tl)
+DEF_HELPER_3(ursub16, tl, env, tl, tl)
+DEF_HELPER_3(ksub16, tl, env, tl, tl)
+DEF_HELPER_3(uksub16, tl, env, tl, tl)
+DEF_HELPER_3(cras16, tl, env, tl, tl)
+DEF_HELPER_3(rcras16, tl, env, tl, tl)
+DEF_HELPER_3(urcras16, tl, env, tl, tl)
+DEF_HELPER_3(kcras16, tl, env, tl, tl)
+DEF_HELPER_3(ukcras16, tl, env, tl, tl)
+DEF_HELPER_3(crsa16, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(stas16, tl, env, tl, tl)
+DEF_HELPER_3(rstas16, tl, env, tl, tl)
+DEF_HELPER_3(urstas16, tl, env, tl, tl)
+DEF_HELPER_3(kstas16, tl, env, tl, tl)
+DEF_HELPER_3(ukstas16, tl, env, tl, tl)
+DEF_HELPER_3(stsa16, tl, env, tl, tl)
+DEF_HELPER_3(rstsa16, tl, env, tl, tl)
+DEF_HELPER_3(urstsa16, tl, env, tl, tl)
+DEF_HELPER_3(kstsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f09f8d5faf..57f72fabf6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -732,3 +732,35 @@ greviw     0110100 .......... 101 ..... 0011011 @sh5
 gorciw     0010100 .......... 101 ..... 0011011 @sh5
 
 slli_uw    00001. ........... 001 ..... 0011011 @sh
+
+# *** RV32P Extension ***
+add16      0100000  ..... ..... 000 ..... 1110111 @r
+radd16     0000000  ..... ..... 000 ..... 1110111 @r
+uradd16    0010000  ..... ..... 000 ..... 1110111 @r
+kadd16     0001000  ..... ..... 000 ..... 1110111 @r
+ukadd16    0011000  ..... ..... 000 ..... 1110111 @r
+sub16      0100001  ..... ..... 000 ..... 1110111 @r
+rsub16     0000001  ..... ..... 000 ..... 1110111 @r
+ursub16    0010001  ..... ..... 000 ..... 1110111 @r
+ksub16     0001001  ..... ..... 000 ..... 1110111 @r
+uksub16    0011001  ..... ..... 000 ..... 1110111 @r
+cras16     0100010  ..... ..... 000 ..... 1110111 @r
+rcras16    0000010  ..... ..... 000 ..... 1110111 @r
+urcras16   0010010  ..... ..... 000 ..... 1110111 @r
+kcras16    0001010  ..... ..... 000 ..... 1110111 @r
+ukcras16   0011010  ..... ..... 000 ..... 1110111 @r
+crsa16     0100011  ..... ..... 000 ..... 1110111 @r
+rcrsa16    0000011  ..... ..... 000 ..... 1110111 @r
+urcrsa16   0010011  ..... ..... 000 ..... 1110111 @r
+kcrsa16    0001011  ..... ..... 000 ..... 1110111 @r
+ukcrsa16   0011011  ..... ..... 000 ..... 1110111 @r
+stas16     1111010  ..... ..... 010 ..... 1110111 @r
+rstas16    1011010  ..... ..... 010 ..... 1110111 @r
+urstas16   1101010  ..... ..... 010 ..... 1110111 @r
+kstas16    1100010  ..... ..... 010 ..... 1110111 @r
+ukstas16   1110010  ..... ..... 010 ..... 1110111 @r
+stsa16     1111011  ..... ..... 010 ..... 1110111 @r
+rstsa16    1011011  ..... ..... 010 ..... 1110111 @r
+urstsa16   1101011  ..... ..... 010 ..... 1110111 @r
+kstsa16    1100011  ..... ..... 010 ..... 1110111 @r
+ukstsa16   1110011  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..43f395657a
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,117 @@
+/*
+ * RISC-V translation routines for the RVP Standard Extension.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "tcg/tcg.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+
+/*
+ * Some instructions, such as add16, can be translated inline:
+ * 1) If any register is x0, use a plain op on the whole register.
+ * 2) Otherwise, accelerate it with a vector op.
+ */
+static inline bool
+r_inline(DisasContext *ctx, arg_r *a,
+         void (* vecop)(TCGv, TCGv, TCGv),
+         void (* op)(TCGv, TCGv, TCGv))
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+    if (a->rd && a->rs1 && a->rs2) {
+        vecop(cpu_gpr[a->rd], cpu_gpr[a->rs1], cpu_gpr[a->rs2]);
+    } else {
+        gen_arith(ctx, a, op);
+    }
+    return true;
+}
+
+/* Complete inline implementation */
+#define GEN_RVP_R_INLINE(NAME, VECOP, OP)                \
+static bool trans_##NAME(DisasContext *s, arg_r *a)      \
+{                                                        \
+    return r_inline(s, a, VECOP, OP);                    \
+}
+
+GEN_RVP_R_INLINE(add16, tcg_gen_vec_add16_tl, tcg_gen_add_tl);
+GEN_RVP_R_INLINE(sub16, tcg_gen_vec_sub16_tl, tcg_gen_sub_tl);
+
+/* Out of line helpers for R format packed instructions */
+static inline bool
+r_ool(DisasContext *ctx, arg_r *a, void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    TCGv src1, src2, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_OOL(NAME)                            \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_ool(s, a, gen_helper_##NAME);             \
+}
+
+GEN_RVP_R_OOL(radd16);
+GEN_RVP_R_OOL(uradd16);
+GEN_RVP_R_OOL(kadd16);
+GEN_RVP_R_OOL(ukadd16);
+GEN_RVP_R_OOL(rsub16);
+GEN_RVP_R_OOL(ursub16);
+GEN_RVP_R_OOL(ksub16);
+GEN_RVP_R_OOL(uksub16);
+GEN_RVP_R_OOL(cras16);
+GEN_RVP_R_OOL(rcras16);
+GEN_RVP_R_OOL(urcras16);
+GEN_RVP_R_OOL(kcras16);
+GEN_RVP_R_OOL(ukcras16);
+GEN_RVP_R_OOL(crsa16);
+GEN_RVP_R_OOL(rcrsa16);
+GEN_RVP_R_OOL(urcrsa16);
+GEN_RVP_R_OOL(kcrsa16);
+GEN_RVP_R_OOL(ukcrsa16);
+GEN_RVP_R_OOL(stas16);
+GEN_RVP_R_OOL(rstas16);
+GEN_RVP_R_OOL(urstas16);
+GEN_RVP_R_OOL(kstas16);
+GEN_RVP_R_OOL(ukstas16);
+GEN_RVP_R_OOL(stsa16);
+GEN_RVP_R_OOL(rstsa16);
+GEN_RVP_R_OOL(urstsa16);
+GEN_RVP_R_OOL(kstsa16);
+GEN_RVP_R_OOL(ukstsa16);
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index d5e0bc93ea..cc169e1b2c 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -17,6 +17,7 @@ riscv_ss.add(files(
   'op_helper.c',
   'vector_helper.c',
   'bitmanip_helper.c',
+  'packed_helper.c',
   'translate.c',
 ))
 
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
new file mode 100644
index 0000000000..b84abaaf25
--- /dev/null
+++ b/target/riscv/packed_helper.c
@@ -0,0 +1,354 @@
+/*
+ * RISC-V P Extension Helpers for QEMU.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "fpu/softfloat.h"
+#include <math.h>
+#include "internals.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+typedef void PackedFn3i(CPURISCVState *, void *, void *, void *, uint8_t);
+
+/* Common function to loop over the elements of a packed register */
+static inline target_ulong
+rvpr(CPURISCVState *env, target_ulong a, target_ulong b,
+     uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,   \
+                          target_ulong b)                       \
+{                                                               \
+    return rvpr(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);\
+}
+
+static inline int32_t hadd32(int32_t a, int32_t b)
+{
+    return ((int64_t)a + b) >> 1;
+}
+
+static inline void do_radd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd16, 1, 2);
+
+static inline uint32_t haddu32(uint32_t a, uint32_t b)
+{
+    return ((uint64_t)a + b) >> 1;
+}
+
+static inline void do_uradd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd16, 1, 2);
+
+static inline void do_kadd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd16(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd16, 1, 2);
+
+static inline void do_ukadd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu16(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd16, 1, 2);
+
+static inline int32_t hsub32(int32_t a, int32_t b)
+{
+    return ((int64_t)a - b) >> 1;
+}
+
+static inline int64_t hsub64(int64_t a, int64_t b)
+{
+    int64_t res = a - b;
+    int64_t over = (res ^ a) & (a ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_rsub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub16, 1, 2);
+
+static inline uint64_t hsubu64(uint64_t a, uint64_t b)
+{
+    return (a - b) >> 1;
+}
+
+static inline void do_ursub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub16, 1, 2);
+
+static inline void do_ksub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub16(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub16, 1, 2);
+
+static inline void do_uksub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu16(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub16, 1, 2);
+
+static inline void do_cras16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i)];
+}
+
+RVPR(cras16, 2, 2);
+
+static inline void do_rcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcras16, 2, 2);
+
+static inline void do_urcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcras16, 2, 2);
+
+static inline void do_kcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcras16, 2, 2);
+
+static inline void do_ukcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcras16, 2, 2);
+
+static inline void do_crsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i)];
+}
+
+RVPR(crsa16, 2, 2);
+
+static inline void do_rcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcrsa16, 2, 2);
+
+static inline void do_urcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcrsa16, 2, 2);
+
+static inline void do_kcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcrsa16, 2, 2);
+
+static inline void do_ukcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcrsa16, 2, 2);
+
+static inline void do_stas16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i + 1)];
+}
+
+RVPR(stas16, 2, 2);
+
+static inline void do_rstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstas16, 2, 2);
+
+static inline void do_urstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstas16, 2, 2);
+
+static inline void do_kstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstas16, 2, 2);
+
+static inline void do_ukstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstas16, 2, 2);
+
+static inline void do_stsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i + 1)];
+}
+
+RVPR(stsa16, 2, 2);
+
+static inline void do_rstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstsa16, 2, 2);
+
+static inline void do_urstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstsa16, 2, 2);
+
+static inline void do_kstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstsa16, 2, 2);
+
+static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstsa16, 2, 2);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0e6ede4d71..51b144e9be 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -908,6 +908,7 @@ static bool gen_unary(DisasContext *ctx, arg_r2 *a,
 #include "insn_trans/trans_rvh.c.inc"
 #include "insn_trans/trans_rvv.c.inc"
 #include "insn_trans/trans_rvb.c.inc"
+#include "insn_trans/trans_rvp.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 
 /* Include the auto-generated decoder for 16 bit insn */
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 498a959839..a8898ba7bf 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1742,6 +1742,20 @@ void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_addv_mask(d, a, b, m);
 }
 
+void tcg_gen_vec_add16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t1, a, ~0xffff);
+    tcg_gen_add_i32(t2, a, b);
+    tcg_gen_add_i32(t1, t1, b);
+    tcg_gen_deposit_i32(d, t1, t2, 0, 16);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 t1 = tcg_temp_new_i64();
@@ -1892,6 +1906,20 @@ void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_subv_mask(d, a, b, m);
 }
 
+void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t1, b, ~0xffff);
+    tcg_gen_sub_i32(t2, a, b);
+    tcg_gen_sub_i32(t1, a, t1);
+    tcg_gen_deposit_i32(d, t1, t2, 0, 16);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 t1 = tcg_temp_new_i64();
-- 
2.25.1




* [PATCH v2 03/37] target/riscv: 16-bit Addition & Subtraction Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson, LIU Zhiwei

Include 5 groups: Wrap-around (dropping overflow), Signed Halving,
Unsigned Halving, Signed Saturation, and Unsigned Saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
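For readers less familiar with the P extension naming, the five groups
named above can be modeled on a single 16-bit lane in plain C. This is
only a sketch: the lane_* names and the "bool *sat" out-parameter are
hypothetical stand-ins (the helpers in this patch pass env to sadd16()
and friends instead), and signed ">>" is assumed to be an arithmetic
shift.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-around: keep the low 16 bits and drop the carry (add16). */
uint16_t lane_add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed halving: widen, add, then shift right by one (radd16). */
int16_t lane_radd16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + b) >> 1);
}

/* Unsigned halving: same idea with zero extension (uradd16). */
uint16_t lane_uradd16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturation: clamp to [INT16_MIN, INT16_MAX] (kadd16). */
int16_t lane_kadd16(int16_t a, int16_t b, bool *sat)
{
    int32_t res = (int32_t)a + b;

    if (res > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (res < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)res;
}

/* Unsigned saturation: clamp to UINT16_MAX (ukadd16). */
uint16_t lane_ukadd16(uint16_t a, uint16_t b, bool *sat)
{
    uint32_t res = (uint32_t)a + b;

    if (res > UINT16_MAX) {
        *sat = true;
        return UINT16_MAX;
    }
    return (uint16_t)res;
}
```

The halving variants never overflow because the sum is formed in a
wider type before the shift, which is why only the saturating variants
need somewhere to report saturation.
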
 include/tcg/tcg-op-gvec.h               |  10 +
 target/riscv/helper.h                   |  30 ++
 target/riscv/insn32.decode              |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 117 ++++++++
 target/riscv/meson.build                |   1 +
 target/riscv/packed_helper.c            | 354 ++++++++++++++++++++++++
 target/riscv/translate.c                |   1 +
 tcg/tcg-op-gvec.c                       |  28 ++
 8 files changed, 573 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c
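
The new tcg_gen_vec_add16_i32() needs no lane mask, because a 32-bit
value holds only two 16-bit lanes. A plain-C model of the four TCG ops
it emits might look like the sketch below; model_vec_add16_i32 is a
hypothetical name used for illustration, not a QEMU function.

```c
#include <stdint.h>

/*
 * Model of tcg_gen_vec_add16_i32(). Each comment names the TCG op that
 * the statement stands in for.
 */
uint32_t model_vec_add16_i32(uint32_t a, uint32_t b)
{
    uint32_t t1 = a & ~0xffffu;  /* andi: clear the low lane of a */
    uint32_t t2 = a + b;         /* add: low 16 bits hold the low-lane sum */

    t1 = t1 + b;                 /* add: b's low lane cannot carry upward */

    /* deposit: low 16 bits come from t2, high 16 bits from t1 */
    return (t1 & ~0xffffu) | (t2 & 0xffffu);
}
```

Clearing the low lane of a before the second add is what keeps the
low-lane carry out of the high lane: 0 plus a 16-bit value never
reaches bit 16.
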

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index c69a7de984..2dae9e78d0 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -386,10 +386,12 @@ void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a);
 
 void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_add16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
@@ -401,4 +403,12 @@ void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
+#if TARGET_LONG_BITS == 64
+#define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i64
+#define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i64
+#else
+#define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
+#define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
+#endif
+
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 415e37bc37..b6a71ade33 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1149,3 +1149,33 @@ DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+/* P extension function */
+DEF_HELPER_3(radd16, tl, env, tl, tl)
+DEF_HELPER_3(uradd16, tl, env, tl, tl)
+DEF_HELPER_3(kadd16, tl, env, tl, tl)
+DEF_HELPER_3(ukadd16, tl, env, tl, tl)
+DEF_HELPER_3(rsub16, tl, env, tl, tl)
+DEF_HELPER_3(ursub16, tl, env, tl, tl)
+DEF_HELPER_3(ksub16, tl, env, tl, tl)
+DEF_HELPER_3(uksub16, tl, env, tl, tl)
+DEF_HELPER_3(cras16, tl, env, tl, tl)
+DEF_HELPER_3(rcras16, tl, env, tl, tl)
+DEF_HELPER_3(urcras16, tl, env, tl, tl)
+DEF_HELPER_3(kcras16, tl, env, tl, tl)
+DEF_HELPER_3(ukcras16, tl, env, tl, tl)
+DEF_HELPER_3(crsa16, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(stas16, tl, env, tl, tl)
+DEF_HELPER_3(rstas16, tl, env, tl, tl)
+DEF_HELPER_3(urstas16, tl, env, tl, tl)
+DEF_HELPER_3(kstas16, tl, env, tl, tl)
+DEF_HELPER_3(ukstas16, tl, env, tl, tl)
+DEF_HELPER_3(stsa16, tl, env, tl, tl)
+DEF_HELPER_3(rstsa16, tl, env, tl, tl)
+DEF_HELPER_3(urstsa16, tl, env, tl, tl)
+DEF_HELPER_3(kstsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f09f8d5faf..57f72fabf6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -732,3 +732,35 @@ greviw     0110100 .......... 101 ..... 0011011 @sh5
 gorciw     0010100 .......... 101 ..... 0011011 @sh5
 
 slli_uw    00001. ........... 001 ..... 0011011 @sh
+
+# *** RV32P Extension ***
+add16      0100000  ..... ..... 000 ..... 1110111 @r
+radd16     0000000  ..... ..... 000 ..... 1110111 @r
+uradd16    0010000  ..... ..... 000 ..... 1110111 @r
+kadd16     0001000  ..... ..... 000 ..... 1110111 @r
+ukadd16    0011000  ..... ..... 000 ..... 1110111 @r
+sub16      0100001  ..... ..... 000 ..... 1110111 @r
+rsub16     0000001  ..... ..... 000 ..... 1110111 @r
+ursub16    0010001  ..... ..... 000 ..... 1110111 @r
+ksub16     0001001  ..... ..... 000 ..... 1110111 @r
+uksub16    0011001  ..... ..... 000 ..... 1110111 @r
+cras16     0100010  ..... ..... 000 ..... 1110111 @r
+rcras16    0000010  ..... ..... 000 ..... 1110111 @r
+urcras16   0010010  ..... ..... 000 ..... 1110111 @r
+kcras16    0001010  ..... ..... 000 ..... 1110111 @r
+ukcras16   0011010  ..... ..... 000 ..... 1110111 @r
+crsa16     0100011  ..... ..... 000 ..... 1110111 @r
+rcrsa16    0000011  ..... ..... 000 ..... 1110111 @r
+urcrsa16   0010011  ..... ..... 000 ..... 1110111 @r
+kcrsa16    0001011  ..... ..... 000 ..... 1110111 @r
+ukcrsa16   0011011  ..... ..... 000 ..... 1110111 @r
+stas16     1111010  ..... ..... 010 ..... 1110111 @r
+rstas16    1011010  ..... ..... 010 ..... 1110111 @r
+urstas16   1101010  ..... ..... 010 ..... 1110111 @r
+kstas16    1100010  ..... ..... 010 ..... 1110111 @r
+ukstas16   1110010  ..... ..... 010 ..... 1110111 @r
+stsa16     1111011  ..... ..... 010 ..... 1110111 @r
+rstsa16    1011011  ..... ..... 010 ..... 1110111 @r
+urstsa16   1101011  ..... ..... 010 ..... 1110111 @r
+kstsa16    1100011  ..... ..... 010 ..... 1110111 @r
+ukstsa16   1110011  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..43f395657a
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,117 @@
+/*
+ * RISC-V translation routines for the RVP Standard Extension.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "tcg/tcg.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+
+/*
+ * For some instructions, such as add16, an observation can be utilized:
+ * 1) If any reg is zero, it can be reduced to an inline op on the whole reg.
+ * 2) Otherwise, it can be accelerated by a vec op.
+ */
+static inline bool
+r_inline(DisasContext *ctx, arg_r *a,
+         void (* vecop)(TCGv, TCGv, TCGv),
+         void (* op)(TCGv, TCGv, TCGv))
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+    if (a->rd && a->rs1 && a->rs2) {
+        vecop(cpu_gpr[a->rd], cpu_gpr[a->rs1], cpu_gpr[a->rs2]);
+    } else {
+        gen_arith(ctx, a, op);
+    }
+    return true;
+}
+
+/* Complete inline implementation */
+#define GEN_RVP_R_INLINE(NAME, VECOP, OP)                \
+static bool trans_##NAME(DisasContext *s, arg_r *a)      \
+{                                                        \
+    return r_inline(s, a, VECOP, OP);                    \
+}
+
+GEN_RVP_R_INLINE(add16, tcg_gen_vec_add16_tl, tcg_gen_add_tl);
+GEN_RVP_R_INLINE(sub16, tcg_gen_vec_sub16_tl, tcg_gen_sub_tl);
+
+/* Out of line helpers for R format packed instructions */
+static inline bool
+r_ool(DisasContext *ctx, arg_r *a, void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    TCGv src1, src2, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_OOL(NAME)                            \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_ool(s, a, gen_helper_##NAME);             \
+}
+
+GEN_RVP_R_OOL(radd16);
+GEN_RVP_R_OOL(uradd16);
+GEN_RVP_R_OOL(kadd16);
+GEN_RVP_R_OOL(ukadd16);
+GEN_RVP_R_OOL(rsub16);
+GEN_RVP_R_OOL(ursub16);
+GEN_RVP_R_OOL(ksub16);
+GEN_RVP_R_OOL(uksub16);
+GEN_RVP_R_OOL(cras16);
+GEN_RVP_R_OOL(rcras16);
+GEN_RVP_R_OOL(urcras16);
+GEN_RVP_R_OOL(kcras16);
+GEN_RVP_R_OOL(ukcras16);
+GEN_RVP_R_OOL(crsa16);
+GEN_RVP_R_OOL(rcrsa16);
+GEN_RVP_R_OOL(urcrsa16);
+GEN_RVP_R_OOL(kcrsa16);
+GEN_RVP_R_OOL(ukcrsa16);
+GEN_RVP_R_OOL(stas16);
+GEN_RVP_R_OOL(rstas16);
+GEN_RVP_R_OOL(urstas16);
+GEN_RVP_R_OOL(kstas16);
+GEN_RVP_R_OOL(ukstas16);
+GEN_RVP_R_OOL(stsa16);
+GEN_RVP_R_OOL(rstsa16);
+GEN_RVP_R_OOL(urstsa16);
+GEN_RVP_R_OOL(kstsa16);
+GEN_RVP_R_OOL(ukstsa16);
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index d5e0bc93ea..cc169e1b2c 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -17,6 +17,7 @@ riscv_ss.add(files(
   'op_helper.c',
   'vector_helper.c',
   'bitmanip_helper.c',
+  'packed_helper.c',
   'translate.c',
 ))
 
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
new file mode 100644
index 0000000000..b84abaaf25
--- /dev/null
+++ b/target/riscv/packed_helper.c
@@ -0,0 +1,354 @@
+/*
+ * RISC-V P Extension Helpers for QEMU.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "fpu/softfloat.h"
+#include <math.h>
+#include "internals.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+typedef void PackedFn3i(CPURISCVState *, void *, void *, void *, uint8_t);
+
+/* Define a common function to loop elements in packed register */
+static inline target_ulong
+rvpr(CPURISCVState *env, target_ulong a, target_ulong b,
+     uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,   \
+                          target_ulong b)                       \
+{                                                               \
+    return rvpr(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);\
+}
+
+static inline int32_t hadd32(int32_t a, int32_t b)
+{
+    return ((int64_t)a + b) >> 1;
+}
+
+static inline void do_radd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd16, 1, 2);
+
+static inline uint32_t haddu32(uint32_t a, uint32_t b)
+{
+    return ((uint64_t)a + b) >> 1;
+}
+
+static inline void do_uradd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd16, 1, 2);
+
+static inline void do_kadd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd16(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd16, 1, 2);
+
+static inline void do_ukadd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu16(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd16, 1, 2);
+
+static inline int32_t hsub32(int32_t a, int32_t b)
+{
+    return ((int64_t)a - b) >> 1;
+}
+
+static inline int64_t hsub64(int64_t a, int64_t b)
+{
+    int64_t res = a - b;
+    int64_t over = (res ^ a) & (a ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_rsub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub16, 1, 2);
+
+static inline uint64_t hsubu64(uint64_t a, uint64_t b)
+{
+    return (a - b) >> 1;
+}
+
+static inline void do_ursub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub16, 1, 2);
+
+static inline void do_ksub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub16(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub16, 1, 2);
+
+static inline void do_uksub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu16(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub16, 1, 2);
+
+static inline void do_cras16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i)];
+}
+
+RVPR(cras16, 2, 2);
+
+static inline void do_rcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcras16, 2, 2);
+
+static inline void do_urcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcras16, 2, 2);
+
+static inline void do_kcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcras16, 2, 2);
+
+static inline void do_ukcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcras16, 2, 2);
+
+static inline void do_crsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i)];
+}
+
+RVPR(crsa16, 2, 2);
+
+static inline void do_rcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcrsa16, 2, 2);
+
+static inline void do_urcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcrsa16, 2, 2);
+
+static inline void do_kcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcrsa16, 2, 2);
+
+static inline void do_ukcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcrsa16, 2, 2);
+
+static inline void do_stas16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i + 1)];
+}
+
+RVPR(stas16, 2, 2);
+
+static inline void do_rstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstas16, 2, 2);
+
+static inline void do_urstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstas16, 2, 2);
+
+static inline void do_kstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstas16, 2, 2);
+
+static inline void do_ukstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstas16, 2, 2);
+
+static inline void do_stsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i + 1)];
+}
+
+RVPR(stsa16, 2, 2);
+
+static inline void do_rstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstsa16, 2, 2);
+
+static inline void do_urstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstsa16, 2, 2);
+
+static inline void do_kstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstsa16, 2, 2);
+
+static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstsa16, 2, 2);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0e6ede4d71..51b144e9be 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -908,6 +908,7 @@ static bool gen_unary(DisasContext *ctx, arg_r2 *a,
 #include "insn_trans/trans_rvh.c.inc"
 #include "insn_trans/trans_rvv.c.inc"
 #include "insn_trans/trans_rvb.c.inc"
+#include "insn_trans/trans_rvp.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 
 /* Include the auto-generated decoder for 16 bit insn */
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 498a959839..a8898ba7bf 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1742,6 +1742,20 @@ void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_addv_mask(d, a, b, m);
 }
 
+void tcg_gen_vec_add16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t1, a, ~0xffff);
+    tcg_gen_add_i32(t2, a, b);
+    tcg_gen_add_i32(t1, t1, b);
+    tcg_gen_deposit_i32(d, t1, t2, 0, 16);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 t1 = tcg_temp_new_i64();
@@ -1892,6 +1906,20 @@ void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_subv_mask(d, a, b, m);
 }
 
+void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t1, b, ~0xffff);
+    tcg_gen_sub_i32(t2, a, b);
+    tcg_gen_sub_i32(t1, a, t1);
+    tcg_gen_deposit_i32(d, t1, t2, 0, 16);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 t1 = tcg_temp_new_i64();
-- 
2.25.1




* [PATCH v2 04/37] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: bin.meng, Palmer Dabbelt, richard.henderson, palmer,
	Alistair Francis, LIU Zhiwei

Include 5 groups: Wrap-around (dropping overflow), Signed Halving,
Unsigned Halving, Signed Saturation, and Unsigned Saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
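With four 8-bit lanes in a 32-bit register, gen_addv_mask_i32() keeps
carries inside each lane by masking off every lane's top bit before the
add and patching those bits back in afterwards. A plain-C model,
assuming the mask is 0x80808080 as produced by dup_const(MO_8, 0x80);
the model_* names are hypothetical:

```c
#include <stdint.h>

/* Model of gen_addv_mask_i32(); comments name the corresponding TCG op. */
uint32_t model_addv_mask_i32(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t t1 = a & ~m;   /* andc: drop each lane's top bit */
    uint32_t t2 = b & ~m;   /* andc */
    uint32_t t3 = a ^ b;    /* xor: carry-less sum of every bit */
    uint32_t d = t1 + t2;   /* add: carries stop at the cleared top bits */

    t3 &= m;                /* and: keep only the top-bit sums */
    return d ^ t3;          /* xor: fold the top bits back in */
}

/* Model of tcg_gen_vec_add8_i32(): four independent 8-bit adds. */
uint32_t model_vec_add8_i32(uint32_t a, uint32_t b)
{
    return model_addv_mask_i32(a, b, 0x80808080u);
}
```

After the masked add, bit 7 of each lane in d holds the carry out of
bit 6, so the final xor yields a7 ^ b7 ^ carry, which is the correct
top bit with the carry out of the lane discarded.
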
 include/tcg/tcg-op-gvec.h               |  6 ++
 target/riscv/helper.h                   |  9 +++
 target/riscv/insn32.decode              | 11 ++++
 target/riscv/insn_trans/trans_rvp.c.inc | 13 +++++
 target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++++
 tcg/tcg-op-gvec.c                       | 47 ++++++++++++++++
 6 files changed, 159 insertions(+)
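
The subtract helper gen_subv_mask_i32() works by forcing each lane's
top bit to 1 in the minuend and to 0 in the subtrahend, so a borrow can
never leave its lane. A plain-C model, again assuming a mask of
0x80808080 for 8-bit lanes; the model_* names are hypothetical:

```c
#include <stdint.h>

/* Model of gen_subv_mask_i32(); comments name the corresponding TCG op. */
uint32_t model_subv_mask_i32(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t t1 = a | m;      /* or: set each lane's top bit in a */
    uint32_t t2 = b & ~m;     /* andc: clear each lane's top bit in b */
    uint32_t t3 = ~(a ^ b);   /* eqv: 1 where the bits of a and b agree */
    uint32_t d = t1 - t2;     /* sub: borrows are absorbed by the set top bits */

    t3 &= m;                  /* and: keep only the top-bit positions */
    return d ^ t3;            /* xor: restore the true top bits */
}

/* Model of tcg_gen_vec_sub8_i32(): four independent 8-bit subtracts. */
uint32_t model_vec_sub8_i32(uint32_t a, uint32_t b)
{
    return model_subv_mask_i32(a, b, 0x80808080u);
}
```

After the masked subtract, each lane's top bit in d is 1 ^ borrow, and
xor with eqv(a7, b7) turns that into a7 ^ b7 ^ borrow.
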

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 2dae9e78d0..392c0f95a4 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -385,11 +385,13 @@ void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 a);
 void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a);
 
 void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_add8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_add16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void tcg_gen_vec_sub8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
@@ -406,9 +408,13 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #if TARGET_LONG_BITS == 64
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i64
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i64
+#define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i64
+#define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i64
 #else
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
+#define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i32
+#define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i32
 #endif
 
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b6a71ade33..629ff13402 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1179,3 +1179,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
 DEF_HELPER_3(urstsa16, tl, env, tl, tl)
 DEF_HELPER_3(kstsa16, tl, env, tl, tl)
 DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
+
+DEF_HELPER_3(radd8, tl, env, tl, tl)
+DEF_HELPER_3(uradd8, tl, env, tl, tl)
+DEF_HELPER_3(kadd8, tl, env, tl, tl)
+DEF_HELPER_3(ukadd8, tl, env, tl, tl)
+DEF_HELPER_3(rsub8, tl, env, tl, tl)
+DEF_HELPER_3(ursub8, tl, env, tl, tl)
+DEF_HELPER_3(ksub8, tl, env, tl, tl)
+DEF_HELPER_3(uksub8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 57f72fabf6..13e1222296 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -764,3 +764,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1110111 @r
 urstsa16   1101011  ..... ..... 010 ..... 1110111 @r
 kstsa16    1100011  ..... ..... 010 ..... 1110111 @r
 ukstsa16   1110011  ..... ..... 010 ..... 1110111 @r
+
+add8       0100100  ..... ..... 000 ..... 1110111 @r
+radd8      0000100  ..... ..... 000 ..... 1110111 @r
+uradd8     0010100  ..... ..... 000 ..... 1110111 @r
+kadd8      0001100  ..... ..... 000 ..... 1110111 @r
+ukadd8     0011100  ..... ..... 000 ..... 1110111 @r
+sub8       0100101  ..... ..... 000 ..... 1110111 @r
+rsub8      0000101  ..... ..... 000 ..... 1110111 @r
+ursub8     0010101  ..... ..... 000 ..... 1110111 @r
+ksub8      0001101  ..... ..... 000 ..... 1110111 @r
+uksub8     0011101  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 43f395657a..80bec35ac9 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -115,3 +115,16 @@ GEN_RVP_R_OOL(rstsa16);
 GEN_RVP_R_OOL(urstsa16);
 GEN_RVP_R_OOL(kstsa16);
 GEN_RVP_R_OOL(ukstsa16);
+
+/* 8-bit Addition & Subtraction Instructions */
+GEN_RVP_R_INLINE(add8, tcg_gen_vec_add8_tl, tcg_gen_add_tl);
+GEN_RVP_R_INLINE(sub8, tcg_gen_vec_sub8_tl, tcg_gen_sub_tl);
+
+GEN_RVP_R_OOL(radd8);
+GEN_RVP_R_OOL(uradd8);
+GEN_RVP_R_OOL(kadd8);
+GEN_RVP_R_OOL(ukadd8);
+GEN_RVP_R_OOL(rsub8);
+GEN_RVP_R_OOL(ursub8);
+GEN_RVP_R_OOL(ksub8);
+GEN_RVP_R_OOL(uksub8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index b84abaaf25..62db072204 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ukstsa16, 2, 2);
+
+/* 8-bit Addition & Subtraction Instructions */
+static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd8, 1, 1);
+
+static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
+                                  void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd8, 1, 1);
+
+static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd8(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd8, 1, 1);
+
+static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu8(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd8, 1, 1);
+
+static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub8, 1, 1);
+
+static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub8, 1, 1);
+
+static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub8(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub8, 1, 1);
+
+static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu8(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub8, 1, 1);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index a8898ba7bf..484ced3054 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1736,6 +1736,30 @@ void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_addv_mask(d, a, b, m);
 }
 
+static void gen_addv_mask_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 m)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    TCGv_i32 t3 = tcg_temp_new_i32();
+
+    tcg_gen_andc_i32(t1, a, m);
+    tcg_gen_andc_i32(t2, b, m);
+    tcg_gen_xor_i32(t3, a, b);
+    tcg_gen_add_i32(d, t1, t2);
+    tcg_gen_and_i32(t3, t3, m);
+    tcg_gen_xor_i32(d, d, t3);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+}
+
+void tcg_gen_vec_add8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 m = tcg_constant_i32((int32_t)dup_const(MO_8, 0x80));
+    gen_addv_mask_i32(d, a, b, m);
+}
+
 void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0x8000));
@@ -1900,6 +1924,29 @@ void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     gen_subv_mask(d, a, b, m);
 }
 
+static void gen_subv_mask_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 m)
+{
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    TCGv_i32 t3 = tcg_temp_new_i32();
+
+    tcg_gen_or_i32(t1, a, m);
+    tcg_gen_andc_i32(t2, b, m);
+    tcg_gen_eqv_i32(t3, a, b);
+    tcg_gen_sub_i32(d, t1, t2);
+    tcg_gen_and_i32(t3, t3, m);
+    tcg_gen_xor_i32(d, d, t3);
+
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+}
+
+void tcg_gen_vec_sub8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 m = tcg_constant_i32((int32_t)dup_const(MO_8, 0x80));
+    gen_subv_mask_i32(d, a, b, m);
+}
 void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0x8000));
-- 
2.25.1
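
A note on the masked add/sub above, as a hedged illustration rather than
part of the patch: the same op sequence can be replayed on a plain
uint32_t to see why carries and borrows never cross an 8-bit lane. The
names swar_add8, swar_sub8, lane_op8 and check_swar8 are hypothetical
and exist only for this sketch; LANE_MSB stands in for
dup_const(MO_8, 0x80).

```c
#include <assert.h>
#include <stdint.h>

/* Top bit of every 8-bit lane, standing in for dup_const(MO_8, 0x80). */
#define LANE_MSB 0x80808080u

/* Same op sequence as gen_addv_mask_i32, on a plain 32-bit word. */
static uint32_t swar_add8(uint32_t a, uint32_t b)
{
    uint32_t t1 = a & ~LANE_MSB;        /* andc: clear each lane's top bit */
    uint32_t t2 = b & ~LANE_MSB;
    uint32_t t3 = (a ^ b) & LANE_MSB;   /* top-bit sum without carry-in */
    return (t1 + t2) ^ t3;              /* carry-in lands in the cleared bit */
}

/* Same op sequence as gen_subv_mask_i32. */
static uint32_t swar_sub8(uint32_t a, uint32_t b)
{
    uint32_t t1 = a | LANE_MSB;         /* set top bit so borrows stay in-lane */
    uint32_t t2 = b & ~LANE_MSB;
    uint32_t t3 = ~(a ^ b) & LANE_MSB;  /* eqv is xnor */
    return (t1 - t2) ^ t3;
}

/* Reference: handle the four bytes one at a time. */
static uint32_t lane_op8(uint32_t a, uint32_t b, int sub)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t z = (uint8_t)(sub ? x - y : x + y);
        r |= (uint32_t)z << (8 * i);
    }
    return r;
}

/* Compare both forms on a stream of pseudo-random words. */
static void check_swar8(void)
{
    uint32_t x = 0x12345678u, y = 0x9abcdef0u;
    for (int n = 0; n < 1000; n++) {
        assert(swar_add8(x, y) == lane_op8(x, y, 0));
        assert(swar_sub8(x, y) == lane_op8(x, y, 1));
        x = x * 1664525u + 1013904223u;  /* simple LCG step */
        y = y * 22695477u + 1u;
    }
}
```

Calling check_swar8() from a test program exercises both helpers against
the byte-wise reference.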




* [PATCH v2 05/37] target/riscv: SIMD 16-bit Shift Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Instructions include arithmetic right shift, logical right shift,
and left shift.

The shift amount can be an immediate or a scalar register value.
Right shifts have rounding variants, and left shifts have
saturating variants.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
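Illustration only, not part of the patch: a minimal per-lane sketch of
the rounding and saturating behaviour described above. It assumes the
rounding variants add the last bit shifted out (round-to-nearest-up),
and it uses a value-range check for saturation instead of the helper's
sign-bit count. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rounding arithmetic right shift of one 16-bit lane.
 * Assumption: rounding adds the last bit shifted out.
 * Relies on >> of a negative int being arithmetic (true for GCC/Clang).
 */
static int16_t sra16_round(int16_t a, unsigned shift)
{
    if (shift == 0) {
        return a;
    }
    return (int16_t)((((int32_t)a >> (shift - 1)) + 1) >> 1);
}

/* Saturating left shift of one lane; *sat stands in for the vxsat flag. */
static int16_t ksll16_sat(int16_t a, unsigned shift, int *sat)
{
    /* Multiply instead of shifting so negative inputs are well defined. */
    int32_t wide = (int32_t)a * (1 << (shift & 0xf));
    if (wide > INT16_MAX) {
        *sat = 1;
        return INT16_MAX;
    }
    if (wide < INT16_MIN) {
        *sat = 1;
        return INT16_MIN;
    }
    return (int16_t)wide;
}

/* Signed 5-bit amount: >= 0 shifts left with saturation, < 0 shifts right. */
static int16_t kslra16_lane(int16_t a, uint32_t rs2, int *sat)
{
    int shift = (int)(rs2 & 0x1f);
    if (shift & 0x10) {
        shift -= 32;                 /* sign-extend the 5-bit field */
    }
    if (shift >= 0) {
        return ksll16_sat(a, (unsigned)shift, sat);
    }
    shift = -shift;
    if (shift == 16) {
        shift = 15;                  /* a right shift by 16 is clamped to 15 */
    }
    return (int16_t)(a >> shift);
}

/* A few hand-checked cases. */
static void check_shift16_examples(void)
{
    int sat = 0;
    assert(sra16_round(5, 1) == 3);      /* 2.5 rounds up to 3 */
    assert(sra16_round(-5, 1) == -2);    /* -2.5 rounds up to -2 */
    assert(ksll16_sat(0x4000, 1, &sat) == INT16_MAX && sat == 1);
    sat = 0;
    assert(ksll16_sat(-1, 15, &sat) == INT16_MIN && sat == 0);
    assert(kslra16_lane(3, 2, &sat) == 12);
    assert(kslra16_lane(-8, 0x1f, &sat) == -4);
    assert(kslra16_lane(INT16_MIN, 0x10, &sat) == -1);
}
```

The range check in ksll16_sat is meant to agree with the clrsb32-based
test in do_ksll16 for shifts of 0 to 15; the multiply form only sidesteps
left-shifting a negative value in portable C.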
 include/tcg/tcg-op-gvec.h               |   9 ++
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |  17 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  59 ++++++++++++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 tcg/tcg-op-gvec.c                       |  28 +++++++
 6 files changed, 226 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 392c0f95a4..72cf697646 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -398,10 +398,13 @@ void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shl16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shr16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_sar16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
@@ -410,11 +413,17 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i64
 #define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i64
 #define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i64
+#define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i64
+#define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i64
+#define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i64
 #else
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
 #define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i32
 #define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i32
+#define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i32
+#define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i32
+#define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i32
 #endif
 
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 629ff13402..de7b4fc17d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1188,3 +1188,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
 DEF_HELPER_3(ursub8, tl, env, tl, tl)
 DEF_HELPER_3(ksub8, tl, env, tl, tl)
 DEF_HELPER_3(uksub8, tl, env, tl, tl)
+
+DEF_HELPER_3(sra16, tl, env, tl, tl)
+DEF_HELPER_3(sra16_u, tl, env, tl, tl)
+DEF_HELPER_3(srl16, tl, env, tl, tl)
+DEF_HELPER_3(srl16_u, tl, env, tl, tl)
+DEF_HELPER_3(sll16, tl, env, tl, tl)
+DEF_HELPER_3(ksll16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 13e1222296..44c497f28a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -24,6 +24,7 @@
 %sh5       20:5
 
 %sh7    20:7
+%sh4    20:4
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -61,6 +62,7 @@
 @j       ....................      ..... ....... &j      imm=%imm_j          %rd
 
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh7     %rs1 %rd
+@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -775,3 +777,18 @@ rsub8      0000101  ..... ..... 000 ..... 1110111 @r
 ursub8     0010101  ..... ..... 000 ..... 1110111 @r
 ksub8      0001101  ..... ..... 000 ..... 1110111 @r
 uksub8     0011101  ..... ..... 000 ..... 1110111 @r
+
+sra16      0101000  ..... ..... 000 ..... 1110111 @r
+sra16_u    0110000  ..... ..... 000 ..... 1110111 @r
+srai16     0111000  0.... ..... 000 ..... 1110111 @sh4
+srai16_u   0111000  1.... ..... 000 ..... 1110111 @sh4
+srl16      0101001  ..... ..... 000 ..... 1110111 @r
+srl16_u    0110001  ..... ..... 000 ..... 1110111 @r
+srli16     0111001  0.... ..... 000 ..... 1110111 @sh4
+srli16_u   0111001  1.... ..... 000 ..... 1110111 @sh4
+sll16      0101010  ..... ..... 000 ..... 1110111 @r
+slli16     0111010  0.... ..... 000 ..... 1110111 @sh4
+ksll16     0110010  ..... ..... 000 ..... 1110111 @r
+kslli16    0111010  1.... ..... 000 ..... 1110111 @sh4
+kslra16    0101011  ..... ..... 000 ..... 1110111 @r
+kslra16_u  0110011  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 80bec35ac9..afafa49824 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -128,3 +128,62 @@ GEN_RVP_R_OOL(rsub8);
 GEN_RVP_R_OOL(ursub8);
 GEN_RVP_R_OOL(ksub8);
 GEN_RVP_R_OOL(uksub8);
+
+/* 16-bit Shift Instructions */
+GEN_RVP_R_OOL(sra16);
+GEN_RVP_R_OOL(srl16);
+GEN_RVP_R_OOL(sll16);
+GEN_RVP_R_OOL(sra16_u);
+GEN_RVP_R_OOL(srl16_u);
+GEN_RVP_R_OOL(ksll16);
+GEN_RVP_R_OOL(kslra16);
+GEN_RVP_R_OOL(kslra16_u);
+
+static bool
+rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
+               void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    TCGv src1, dst, shift;
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(shift);
+    return true;
+}
+
+static inline bool
+rvp_shifti(DisasContext *ctx, arg_shift *a,
+           void (* vecop)(TCGv, TCGv, target_long),
+           void (* op)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    if (a->rd && a->rs1 && vecop) {
+        vecop(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
+        return true;
+    }
+    return rvp_shifti_ool(ctx, a, op);
+}
+
+#define GEN_RVP_SHIFTI(NAME, VECOP, OP)                  \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
+{                                                        \
+    return rvp_shifti(s, a, VECOP, OP);                  \
+}
+
+GEN_RVP_SHIFTI(srai16, tcg_gen_vec_sar16i_tl, gen_helper_sra16);
+GEN_RVP_SHIFTI(srli16, tcg_gen_vec_shr16i_tl, gen_helper_srl16);
+GEN_RVP_SHIFTI(slli16, tcg_gen_vec_shl16i_tl, gen_helper_sll16);
+GEN_RVP_SHIFTI(srai16_u, NULL, gen_helper_sra16_u);
+GEN_RVP_SHIFTI(srli16_u, NULL, gen_helper_srl16_u);
+GEN_RVP_SHIFTI(kslli16, NULL, gen_helper_ksll16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 62db072204..7e31c2fe46 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksub8, 1, 1);
+
+/* 16-bit Shift Instructions */
+static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra16, 1, 2);
+
+static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl16, 1, 2);
+
+static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll16, 1, 2);
+
+static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssra16(env, 0, a[i], shift);
+}
+
+RVPR(sra16_u, 1, 2);
+
+static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssrl16(env, 0, a[i], shift);
+}
+
+RVPR(srl16_u, 1, 2);
+
+static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    result = a[i] << shift;
+    if (shift > (clrsb32(a[i]) - 16)) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll16, 1, 2);
+
+static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra16, 1, 2);
+
+static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = vssra16(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra16_u, 1, 2);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 484ced3054..cf1357cee1 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2687,6 +2687,13 @@ void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shl16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_16, 0xffff << c);
+    tcg_gen_shli_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
@@ -2738,6 +2745,13 @@ void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shr16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_16, 0xffff >> c);
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
@@ -2803,6 +2817,20 @@ void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_temp_free_i64(s);
 }
 
+void tcg_gen_vec_sar16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t s_mask = dup_const(MO_16, 0x8000 >> c);
+    uint32_t c_mask = dup_const(MO_16, 0xffff >> c);
+    TCGv_i32 s = tcg_temp_new_i32();
+
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(s, d, s_mask);  /* isolate (shifted) sign bit */
+    tcg_gen_andi_i32(d, d, c_mask);  /* clear out bits above sign  */
+    tcg_gen_muli_i32(s, s, (2 << c) - 2); /* replicate isolated signs */
+    tcg_gen_or_i32(d, d, s);         /* include sign extension */
+    tcg_temp_free_i32(s);
+}
+
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
-- 
2.25.1
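
A hedged illustration of the three 32-bit immediate-shift helpers added
above, replayed on a plain uint32_t holding two 16-bit lanes. dup16 is a
stand-in for dup_const(MO_16, ...), and every function name below is
hypothetical. The arithmetic shift is the interesting one: multiplying
the isolated sign bit by (2 << c) - 2 smears it across the c bits that
the logical shift vacated.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate a 16-bit pattern into both halves of a 32-bit word. */
static uint32_t dup16(uint32_t x)
{
    x &= 0xffffu;
    return x | (x << 16);
}

/* Per-lane logical left shift: shift the word, mask off spilled bits. */
static uint32_t swar_shl16i(uint32_t a, unsigned c)
{
    return (a << c) & dup16(0xffffu << c);
}

/* Per-lane logical right shift. */
static uint32_t swar_shr16i(uint32_t a, unsigned c)
{
    return (a >> c) & dup16(0xffffu >> c);
}

/* Per-lane arithmetic right shift, same steps as tcg_gen_vec_sar16i_i32. */
static uint32_t swar_sar16i(uint32_t a, unsigned c)
{
    uint32_t d = a >> c;
    uint32_t s = d & dup16(0x8000u >> c);  /* isolate shifted sign bits */
    d &= dup16(0xffffu >> c);              /* clear bits above the sign */
    s *= (2u << c) - 2u;                   /* replicate sign upward */
    return d | s;
}

/* Reference: shift each 16-bit lane on its own, without signed shifts. */
static uint32_t ref_sar16i(uint32_t a, unsigned c)
{
    uint32_t r = 0;
    for (int i = 0; i < 2; i++) {
        uint16_t u = (uint16_t)(a >> (16 * i));
        uint16_t v = (u & 0x8000u)
            ? (uint16_t)~((uint16_t)~u >> c)  /* negative: shift complement */
            : (uint16_t)(u >> c);
        r |= (uint32_t)v << (16 * i);
    }
    return r;
}

/* Compare the word-wide form with the reference for every shift 0..15. */
static void check_swar16(void)
{
    const uint32_t v[] = {
        0x00000000u, 0x8000fff0u, 0x7fff8001u, 0xffff0001u, 0x12349abcu
    };
    for (unsigned i = 0; i < sizeof(v) / sizeof(v[0]); i++) {
        for (unsigned c = 0; c < 16; c++) {
            assert(swar_sar16i(v[i], c) == ref_sar16i(v[i], c));
        }
    }
}
```

The product in swar_sar16i stays inside its lane because the isolated
sign bit sits at position 15 - c, so the largest per-lane product is
0x10000 - (0x10000 >> c), which still fits in 16 bits.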




* [PATCH v2 05/37] target/riscv: SIMD 16-bit Shift Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson, LIU Zhiwei

Instructions include arithmetic right shift, logical right shift,
and left shift.

The shift amount can be an immediate or a scalar register value.
Right shifts have rounding variants, and left shifts have
saturating variants.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
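Illustration only, not part of the patch: a small sketch of how the
immediate forms in the insn32.decode hunk below are laid out, using
slli16 and kslli16 as the example pair. Bits 31:25 hold the 7-bit
function code, bit 24 selects the plain or saturating form, and the
4-bit shift amount sits in bits 23:20 (the %sh4 field). The enum, the
struct and the function names are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result types for this sketch. */
typedef enum { OP_UNKNOWN, OP_SLLI16, OP_KSLLI16 } Shift16Op;

typedef struct {
    Shift16Op op;
    unsigned rd, rs1, shamt;
} Shift16Insn;

/* Build an instruction word from the fields listed in the decode table. */
static uint32_t encode_slli16(int saturating, unsigned shamt,
                              unsigned rs1, unsigned rd)
{
    return (0x3au << 25)                    /* 0111010 */
         | ((uint32_t)(saturating & 1) << 24)
         | ((uint32_t)(shamt & 0xf) << 20)  /* %sh4 = 20:4 */
         | ((uint32_t)(rs1 & 0x1f) << 15)
         | (0u << 12)                       /* funct3 = 000 */
         | ((uint32_t)(rd & 0x1f) << 7)
         | 0x77u;                           /* opcode 1110111 */
}

/* Recover the fields; anything else is reported as OP_UNKNOWN. */
static Shift16Insn decode_slli16(uint32_t insn)
{
    Shift16Insn r = { OP_UNKNOWN, 0, 0, 0 };
    if ((insn & 0x7f) != 0x77 ||
        ((insn >> 12) & 0x7) != 0 ||
        ((insn >> 25) & 0x7f) != 0x3a) {
        return r;
    }
    r.op = ((insn >> 24) & 1) ? OP_KSLLI16 : OP_SLLI16;
    r.rd = (insn >> 7) & 0x1f;
    r.rs1 = (insn >> 15) & 0x1f;
    r.shamt = (insn >> 20) & 0xf;
    return r;
}

/* Round-trip one encoding of each form. */
static void check_decode_slli16(void)
{
    Shift16Insn k = decode_slli16(encode_slli16(1, 5, 10, 11));
    assert(k.op == OP_KSLLI16 && k.shamt == 5 && k.rs1 == 10 && k.rd == 11);
    Shift16Insn s = decode_slli16(encode_slli16(0, 15, 1, 2));
    assert(s.op == OP_SLLI16 && s.shamt == 15 && s.rs1 == 1 && s.rd == 2);
}
```

In the patch itself this work is done by the generated decoder; the
sketch only shows why a 4-bit shamt field leaves bit 24 free to
distinguish the two instructions.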
 include/tcg/tcg-op-gvec.h               |   9 ++
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |  17 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  59 ++++++++++++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 tcg/tcg-op-gvec.c                       |  28 +++++++
 6 files changed, 226 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 392c0f95a4..72cf697646 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -398,10 +398,13 @@ void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shl16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shr16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_sar16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
@@ -410,11 +413,17 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i64
 #define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i64
 #define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i64
+#define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i64
+#define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i64
+#define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i64
 #else
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
 #define tcg_gen_vec_add8_tl  tcg_gen_vec_add8_i32
 #define tcg_gen_vec_sub8_tl  tcg_gen_vec_sub8_i32
+#define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i32
+#define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i32
+#define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i32
 #endif
 
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 629ff13402..de7b4fc17d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1188,3 +1188,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
 DEF_HELPER_3(ursub8, tl, env, tl, tl)
 DEF_HELPER_3(ksub8, tl, env, tl, tl)
 DEF_HELPER_3(uksub8, tl, env, tl, tl)
+
+DEF_HELPER_3(sra16, tl, env, tl, tl)
+DEF_HELPER_3(sra16_u, tl, env, tl, tl)
+DEF_HELPER_3(srl16, tl, env, tl, tl)
+DEF_HELPER_3(srl16_u, tl, env, tl, tl)
+DEF_HELPER_3(sll16, tl, env, tl, tl)
+DEF_HELPER_3(ksll16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 13e1222296..44c497f28a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -24,6 +24,7 @@
 %sh5       20:5
 
 %sh7    20:7
+%sh4    20:4
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -61,6 +62,7 @@
 @j       ....................      ..... ....... &j      imm=%imm_j          %rd
 
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh7     %rs1 %rd
+@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -775,3 +777,18 @@ rsub8      0000101  ..... ..... 000 ..... 1110111 @r
 ursub8     0010101  ..... ..... 000 ..... 1110111 @r
 ksub8      0001101  ..... ..... 000 ..... 1110111 @r
 uksub8     0011101  ..... ..... 000 ..... 1110111 @r
+
+sra16      0101000  ..... ..... 000 ..... 1110111 @r
+sra16_u    0110000  ..... ..... 000 ..... 1110111 @r
+srai16     0111000  0.... ..... 000 ..... 1110111 @sh4
+srai16_u   0111000  1.... ..... 000 ..... 1110111 @sh4
+srl16      0101001  ..... ..... 000 ..... 1110111 @r
+srl16_u    0110001  ..... ..... 000 ..... 1110111 @r
+srli16     0111001  0.... ..... 000 ..... 1110111 @sh4
+srli16_u   0111001  1.... ..... 000 ..... 1110111 @sh4
+sll16      0101010  ..... ..... 000 ..... 1110111 @r
+slli16     0111010  0.... ..... 000 ..... 1110111 @sh4
+ksll16     0110010  ..... ..... 000 ..... 1110111 @r
+kslli16    0111010  1.... ..... 000 ..... 1110111 @sh4
+kslra16    0101011  ..... ..... 000 ..... 1110111 @r
+kslra16_u  0110011  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 80bec35ac9..afafa49824 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -128,3 +128,62 @@ GEN_RVP_R_OOL(rsub8);
 GEN_RVP_R_OOL(ursub8);
 GEN_RVP_R_OOL(ksub8);
 GEN_RVP_R_OOL(uksub8);
+
+/* 16-bit Shift Instructions */
+GEN_RVP_R_OOL(sra16);
+GEN_RVP_R_OOL(srl16);
+GEN_RVP_R_OOL(sll16);
+GEN_RVP_R_OOL(sra16_u);
+GEN_RVP_R_OOL(srl16_u);
+GEN_RVP_R_OOL(ksll16);
+GEN_RVP_R_OOL(kslra16);
+GEN_RVP_R_OOL(kslra16_u);
+
+static bool
+rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
+               void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    TCGv src1, dst, shift;
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(shift);
+    return true;
+}
+
+static inline bool
+rvp_shifti(DisasContext *ctx, arg_shift *a,
+           void (* vecop)(TCGv, TCGv, target_long),
+           void (* op)(TCGv, TCGv_ptr, TCGv, TCGv))
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    if (a->rd && a->rs1 && vecop) {
+        vecop(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
+        return true;
+    }
+    return rvp_shifti_ool(ctx, a, op);
+}
+
+#define GEN_RVP_SHIFTI(NAME, VECOP, OP)                  \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
+{                                                        \
+    return rvp_shifti(s, a, VECOP, OP);                  \
+}
+
+GEN_RVP_SHIFTI(srai16, tcg_gen_vec_sar16i_tl, gen_helper_sra16);
+GEN_RVP_SHIFTI(srli16, tcg_gen_vec_shr16i_tl, gen_helper_srl16);
+GEN_RVP_SHIFTI(slli16, tcg_gen_vec_shl16i_tl, gen_helper_sll16);
+GEN_RVP_SHIFTI(srai16_u, NULL, gen_helper_sra16_u);
+GEN_RVP_SHIFTI(srli16_u, NULL, gen_helper_srl16_u);
+GEN_RVP_SHIFTI(kslli16, NULL, gen_helper_ksll16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 62db072204..7e31c2fe46 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksub8, 1, 1);
+
+/* 16-bit Shift Instructions */
+static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra16, 1, 2);
+
+static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl16, 1, 2);
+
+static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll16, 1, 2);
+
+static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssra16(env, 0, a[i], shift);
+}
+
+RVPR(sra16_u, 1, 2);
+
+static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssrl16(env, 0, a[i], shift);
+}
+
+RVPR(srl16_u, 1, 2);
+
+static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    result = a[i] << shift;
+    if (shift > (clrsb32(a[i]) - 16)) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll16, 1, 2);
+
+static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra16, 1, 2);
+
+static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = vssra16(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra16_u, 1, 2);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 484ced3054..cf1357cee1 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2687,6 +2687,13 @@ void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shl16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_16, 0xffff << c);
+    tcg_gen_shli_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
@@ -2738,6 +2745,13 @@ void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shr16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_16, 0xffff >> c);
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
@@ -2803,6 +2817,20 @@ void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_temp_free_i64(s);
 }
 
+void tcg_gen_vec_sar16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t s_mask = dup_const(MO_16, 0x8000 >> c);
+    uint32_t c_mask = dup_const(MO_16, 0xffff >> c);
+    TCGv_i32 s = tcg_temp_new_i32();
+
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(s, d, s_mask);  /* isolate (shifted) sign bit */
+    tcg_gen_andi_i32(d, d, c_mask);  /* clear out bits above sign  */
+    tcg_gen_muli_i32(s, s, (2 << c) - 2); /* replicate isolated signs */
+    tcg_gen_or_i32(d, d, s);         /* include sign extension */
+    tcg_temp_free_i32(s);
+}
+
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
-- 
2.25.1




* [PATCH v2 06/37] target/riscv: SIMD 8-bit Shift Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: bin.meng, Palmer Dabbelt, richard.henderson, palmer,
	Alistair Francis, LIU Zhiwei

This patch adds the SIMD 8-bit shift instructions: arithmetic right
shift, logical right shift, and left shift.

The shift amount is either an immediate or a scalar taken from a
register. The right shifts have rounding variants (the _u forms),
and the left shift has saturating variants (the k-prefixed forms).

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
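For illustration, the rounding and saturation described in the commit
message can be sketched per lane in plain C. This is a minimal sketch
under stated assumptions, not the helper code in this patch: the names
sra8_round_sketch and ksll8_sat_sketch are hypothetical, the rounding is
assumed to be round-to-nearest-up (which appears to be what the 0 passed
to vssra8() in the diff selects), and the bool flag is a stand-in for
env->vxsat.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Rounding arithmetic right shift of one signed 8-bit lane.
 * Assumption: round-to-nearest-up, i.e. add the last bit shifted out.
 * Relies on >> of a negative int being arithmetic, as QEMU does.
 */
static int8_t sra8_round_sketch(int8_t a, unsigned shift)
{
    int32_t wide = a;

    shift &= 0x7;                 /* only the low 3 bits are used */
    if (shift == 0) {
        return a;                 /* nothing shifted out, nothing to round */
    }
    return (int8_t)(((wide >> (shift - 1)) + 1) >> 1);
}

/*
 * Saturating left shift of one signed 8-bit lane.
 * Sets *sat (a stand-in for env->vxsat) when the result is clamped.
 */
static int8_t ksll8_sat_sketch(int8_t a, unsigned shift, bool *sat)
{
    /* Multiply instead of shifting so negative inputs stay well defined. */
    int32_t wide = (int32_t)a * (1 << (shift & 0x7));

    if (wide > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    if (wide < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)wide;
}
```

With these definitions, shifting 5 right by 1 rounds to 3 rather than 2,
and shifting 0x40 left by 1 clamps to INT8_MAX and raises the flag.
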
 include/tcg/tcg-op-gvec.h               |   9 +++
 target/riscv/helper.h                   |   9 +++
 target/riscv/insn32.decode              |  17 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
 target/riscv/packed_helper.c            | 102 ++++++++++++++++++++++++
 tcg/tcg-op-gvec.c                       |  28 +++++++
 6 files changed, 181 insertions(+)
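
The tcg_gen_vec_*8i_i32 functions added to tcg/tcg-op-gvec.c shift all
four bytes of a 32-bit value at once and then repair the bits that
crossed a byte boundary. The same arithmetic can be sketched on plain
integers as below. The function names are hypothetical, dup8() is an
assumed stand-in for dup_const(MO_8, ...), and the shift count c is
assumed to be in the range 0..7.

```c
#include <stdint.h>

/* Replicate one byte into all four byte lanes of a 32-bit word. */
static uint32_t dup8(uint8_t b)
{
    return 0x01010101u * b;
}

/* Per-byte logical left shift: shift the word, drop bits from the lane below. */
static uint32_t swar_shl8_sketch(uint32_t a, unsigned c)
{
    return (a << c) & dup8((uint8_t)(0xff << c));
}

/* Per-byte logical right shift: shift the word, drop bits from the lane above. */
static uint32_t swar_shr8_sketch(uint32_t a, unsigned c)
{
    return (a >> c) & dup8((uint8_t)(0xff >> c));
}

/* Per-byte arithmetic right shift, following the same steps as the TCG code. */
static uint32_t swar_sar8_sketch(uint32_t a, unsigned c)
{
    uint32_t s_mask = dup8((uint8_t)(0x80 >> c));
    uint32_t c_mask = dup8((uint8_t)(0xff >> c));
    uint32_t d = a >> c;
    uint32_t s = d & s_mask;      /* isolate the (shifted) sign bit */

    s *= (2u << c) - 2;           /* copy it into the c bits above it */
    d &= c_mask;                  /* clear bits that came from the lane above */
    return d | s;                 /* merge in the sign extension */
}
```

The multiplier (2 << c) - 2 has bits 1..c set, so the product places a
copy of each lane's sign bit in every vacated position without carrying
into the neighbouring lane.
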

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 72cf697646..91531ecb0b 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -397,12 +397,15 @@ void tcg_gen_vec_sub16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shl8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shl16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_shr8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shr16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_sar8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar16i_i32(TCGv_i32 d, TCGv_i32 a, int32_t);
 void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
@@ -416,6 +419,9 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i64
 #define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i64
 #define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i64
+#define tcg_gen_vec_shl8i_tl tcg_gen_vec_shl8i_i64
+#define tcg_gen_vec_shr8i_tl tcg_gen_vec_shr8i_i64
+#define tcg_gen_vec_sar8i_tl tcg_gen_vec_sar8i_i64
 #else
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
@@ -424,6 +430,9 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_shl16i_tl tcg_gen_vec_shl16i_i32
 #define tcg_gen_vec_shr16i_tl tcg_gen_vec_shr16i_i32
 #define tcg_gen_vec_sar16i_tl tcg_gen_vec_sar16i_i32
+#define tcg_gen_vec_shl8i_tl tcg_gen_vec_shl8i_i32
+#define tcg_gen_vec_shr8i_tl tcg_gen_vec_shr8i_i32
+#define tcg_gen_vec_sar8i_tl tcg_gen_vec_sar8i_i32
 #endif
 
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index de7b4fc17d..1b365135ff 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1197,3 +1197,12 @@ DEF_HELPER_3(sll16, tl, env, tl, tl)
 DEF_HELPER_3(ksll16, tl, env, tl, tl)
 DEF_HELPER_3(kslra16, tl, env, tl, tl)
 DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
+
+DEF_HELPER_3(sra8, tl, env, tl, tl)
+DEF_HELPER_3(sra8_u, tl, env, tl, tl)
+DEF_HELPER_3(srl8, tl, env, tl, tl)
+DEF_HELPER_3(srl8_u, tl, env, tl, tl)
+DEF_HELPER_3(sll8, tl, env, tl, tl)
+DEF_HELPER_3(ksll8, tl, env, tl, tl)
+DEF_HELPER_3(kslra8, tl, env, tl, tl)
+DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 44c497f28a..8b78fb24bc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -25,6 +25,7 @@
 
 %sh7    20:7
 %sh4    20:4
+%sh3    20:3
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -63,6 +64,7 @@
 
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh7     %rs1 %rd
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
+@sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -792,3 +794,18 @@ ksll16     0110010  ..... ..... 000 ..... 1110111 @r
 kslli16    0111010  1.... ..... 000 ..... 1110111 @sh4
 kslra16    0101011  ..... ..... 000 ..... 1110111 @r
 kslra16_u  0110011  ..... ..... 000 ..... 1110111 @r
+
+sra8       0101100  ..... ..... 000 ..... 1110111 @r
+sra8_u     0110100  ..... ..... 000 ..... 1110111 @r
+srai8      0111100  00... ..... 000 ..... 1110111 @sh3
+srai8_u    0111100  01... ..... 000 ..... 1110111 @sh3
+srl8       0101101  ..... ..... 000 ..... 1110111 @r
+srl8_u     0110101  ..... ..... 000 ..... 1110111 @r
+srli8      0111101  00... ..... 000 ..... 1110111 @sh3
+srli8_u    0111101  01... ..... 000 ..... 1110111 @sh3
+sll8       0101110  ..... ..... 000 ..... 1110111 @r
+slli8      0111110  00... ..... 000 ..... 1110111 @sh3
+ksll8      0110110  ..... ..... 000 ..... 1110111 @r
+kslli8     0111110  01... ..... 000 ..... 1110111 @sh3
+kslra8     0101111  ..... ..... 000 ..... 1110111 @r
+kslra8_u   0110111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index afafa49824..e6c5f2ddf5 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -187,3 +187,19 @@ GEN_RVP_SHIFTI(slli16, tcg_gen_vec_shl16i_tl, gen_helper_sll16);
 GEN_RVP_SHIFTI(srai16_u, NULL, gen_helper_sra16_u);
 GEN_RVP_SHIFTI(srli16_u, NULL, gen_helper_srl16_u);
 GEN_RVP_SHIFTI(kslli16, NULL, gen_helper_ksll16);
+
+/* SIMD 8-bit Shift Instructions */
+GEN_RVP_R_OOL(sra8);
+GEN_RVP_R_OOL(srl8);
+GEN_RVP_R_OOL(sll8);
+GEN_RVP_R_OOL(sra8_u);
+GEN_RVP_R_OOL(srl8_u);
+GEN_RVP_R_OOL(ksll8);
+GEN_RVP_R_OOL(kslra8);
+GEN_RVP_R_OOL(kslra8_u);
+GEN_RVP_SHIFTI(srai8, tcg_gen_vec_sar8i_tl, gen_helper_sra8);
+GEN_RVP_SHIFTI(srli8, tcg_gen_vec_shr8i_tl, gen_helper_srl8);
+GEN_RVP_SHIFTI(slli8, tcg_gen_vec_shl8i_tl, gen_helper_sll8);
+GEN_RVP_SHIFTI(srai8_u, NULL, gen_helper_sra8_u);
+GEN_RVP_SHIFTI(srli8_u, NULL, gen_helper_srl8_u);
+GEN_RVP_SHIFTI(kslli8, NULL, gen_helper_ksll8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 7e31c2fe46..ab9ebc472b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -529,3 +529,105 @@ static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kslra16_u, 1, 2);
+
+/* SIMD 8-bit Shift Instructions */
+static inline void do_sra8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra8, 1, 1);
+
+static inline void do_srl8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl8, 1, 1);
+
+static inline void do_sll8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll8, 1, 1);
+
+static inline void do_sra8_u(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = vssra8(env, 0, a[i], shift);
+}
+
+RVPR(sra8_u, 1, 1);
+
+static inline void do_srl8_u(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = vssrl8(env, 0, a[i], shift);
+}
+
+RVPR(srl8_u, 1, 1);
+
+static inline void do_ksll8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    result = a[i] << shift;
+    if (shift > (clrsb32(a[i]) - 24)) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT8_MIN) ? INT8_MIN : INT8_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll8, 1, 1);
+
+static inline void do_kslra8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
+
+    if (shift >= 0) {
+        do_ksll8(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 8) ? 7 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra8, 1, 1);
+
+static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
+
+    if (shift >= 0) {
+        do_ksll8(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 8) ? 7 : shift;
+        d[i] = vssra8(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra8_u, 1, 1);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index cf1357cee1..f8d00a7ffa 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2680,6 +2680,13 @@ void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shl8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_8, 0xff << c);
+    tcg_gen_shli_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 {
     uint64_t mask = dup_const(MO_16, 0xffff << c);
@@ -2738,6 +2745,13 @@ void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_gen_andi_i64(d, d, mask);
 }
 
+void tcg_gen_vec_shr8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t mask = dup_const(MO_8, 0xff >> c);
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(d, d, mask);
+}
+
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 {
     uint64_t mask = dup_const(MO_16, 0xffff >> c);
@@ -2803,6 +2817,20 @@ void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
     tcg_temp_free_i64(s);
 }
 
+void tcg_gen_vec_sar8i_i32(TCGv_i32 d, TCGv_i32 a, int32_t c)
+{
+    uint32_t s_mask = dup_const(MO_8, 0x80 >> c);
+    uint32_t c_mask = dup_const(MO_8, 0xff >> c);
+    TCGv_i32 s = tcg_temp_new_i32();
+
+    tcg_gen_shri_i32(d, a, c);
+    tcg_gen_andi_i32(s, d, s_mask);  /* isolate (shifted) sign bit */
+    tcg_gen_muli_i32(s, s, (2 << c) - 2); /* replicate isolated signs */
+    tcg_gen_andi_i32(d, d, c_mask);  /* clear out bits above sign  */
+    tcg_gen_or_i32(d, d, s);         /* include sign extension */
+    tcg_temp_free_i32(s);
+}
+
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 {
     uint64_t s_mask = dup_const(MO_16, 0x8000 >> c);
-- 
2.25.1




* [PATCH v2 07/37] target/riscv: SIMD 16-bit Compare Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

This patch adds five 16-bit compare instructions: equal, signed
less than, signed less than or equal, unsigned less than, and
unsigned less than or equal.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
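For illustration, each compare writes an all-ones or all-zeros mask into
the corresponding 16-bit lane, which makes the result usable for
branch-free selection. The sketch below models the two lanes of a 32-bit
register in plain C. All function names are hypothetical, and the lane
layout (lane 0 in the low half) is an assumption made for the example.

```c
#include <stdint.h>

/* Signed "less than" on the two 16-bit lanes of a 32-bit word. */
static uint32_t scmplt16_sketch(uint32_t a, uint32_t b)
{
    uint32_t d = 0;

    for (int i = 0; i < 2; i++) {
        /* The casts reinterpret each lane as signed (two's complement). */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));

        if (x < y) {
            d |= (uint32_t)0xffff << (16 * i);
        }
    }
    return d;
}

/* Unsigned "less than" on the same lanes. */
static uint32_t ucmplt16_sketch(uint32_t a, uint32_t b)
{
    uint32_t d = 0;

    for (int i = 0; i < 2; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));

        if (x < y) {
            d |= (uint32_t)0xffff << (16 * i);
        }
    }
    return d;
}

/* Per-lane signed maximum built from the compare mask. */
static uint32_t smax16_sketch(uint32_t a, uint32_t b)
{
    uint32_t mask = scmplt16_sketch(a, b);   /* ones where a < b */

    return (b & mask) | (a & ~mask);
}
```

The signed and unsigned forms differ only in how a lane such as 0xffff
is interpreted: as -1 it is less than 0, as 65535 it is not.
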
 target/riscv/helper.h                   |  6 ++++
 target/riscv/insn32.decode              |  6 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
 target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1b365135ff..830845761b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1206,3 +1206,9 @@ DEF_HELPER_3(sll8, tl, env, tl, tl)
 DEF_HELPER_3(ksll8, tl, env, tl, tl)
 DEF_HELPER_3(kslra8, tl, env, tl, tl)
 DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
+
+DEF_HELPER_3(cmpeq16, tl, env, tl, tl)
+DEF_HELPER_3(scmplt16, tl, env, tl, tl)
+DEF_HELPER_3(scmple16, tl, env, tl, tl)
+DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
+DEF_HELPER_3(ucmple16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8b78fb24bc..5031cebf1f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -809,3 +809,9 @@ ksll8      0110110  ..... ..... 000 ..... 1110111 @r
 kslli8     0111110  01... ..... 000 ..... 1110111 @sh3
 kslra8     0101111  ..... ..... 000 ..... 1110111 @r
 kslra8_u   0110111  ..... ..... 000 ..... 1110111 @r
+
+cmpeq16    0100110  ..... ..... 000 ..... 1110111 @r
+scmplt16   0000110  ..... ..... 000 ..... 1110111 @r
+scmple16   0001110  ..... ..... 000 ..... 1110111 @r
+ucmplt16   0010110  ..... ..... 000 ..... 1110111 @r
+ucmple16   0011110  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index e6c5f2ddf5..65199ffb5a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -203,3 +203,10 @@ GEN_RVP_SHIFTI(slli8, tcg_gen_vec_shl8i_tl, gen_helper_sll8);
 GEN_RVP_SHIFTI(srai8_u, NULL, gen_helper_sra8_u);
 GEN_RVP_SHIFTI(srli8_u, NULL, gen_helper_srl8_u);
 GEN_RVP_SHIFTI(kslli8, NULL, gen_helper_ksll8);
+
+/* SIMD 16-bit Compare Instructions */
+GEN_RVP_R_OOL(cmpeq16);
+GEN_RVP_R_OOL(scmplt16);
+GEN_RVP_R_OOL(scmple16);
+GEN_RVP_R_OOL(ucmplt16);
+GEN_RVP_R_OOL(ucmple16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index ab9ebc472b..30b916b5ad 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -631,3 +631,49 @@ static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kslra8_u, 1, 1);
+
+/* SIMD 16-bit Compare Instructions */
+static inline void do_cmpeq16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] == b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(cmpeq16, 1, 2);
+
+static inline void do_scmplt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(scmplt16, 1, 2);
+
+static inline void do_scmple16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(scmple16, 1, 2);
+
+static inline void do_ucmplt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(ucmplt16, 1, 2);
+
+static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(ucmple16, 1, 2);
-- 
2.25.1




* [PATCH v2 08/37] target/riscv: SIMD 8-bit Compare Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Add 5 instructions: 8-bit compare equal, signed less than,
signed less than or equal, unsigned less than, and unsigned
less than or equal.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
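Reviewer note, not part of the patch: a hedged sketch of why the compares
return an all-ones or all-zeros byte per lane. Assuming a 32-bit register
with four 8-bit lanes, the mask can drive a branchless per-byte select.
model_ucmplt8 and model_umin8 are hypothetical names for illustration,
not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* ucmplt8 model: each byte lane becomes 0xff when a < b (unsigned), else 0. */
static uint32_t model_ucmplt8(uint32_t a, uint32_t b)
{
    uint32_t d = 0;
    for (int lane = 0; lane < 4; lane++) {
        int sh = 8 * lane;
        if (((a >> sh) & 0xff) < ((b >> sh) & 0xff)) {
            d |= UINT32_C(0xff) << sh;
        }
    }
    return d;
}

/* Per-byte unsigned minimum built from the compare mask. */
static uint32_t model_umin8(uint32_t a, uint32_t b)
{
    uint32_t m = model_ucmplt8(a, b);
    return (a & m) | (b & ~m);   /* take a where a < b, otherwise b */
}

void model_cmp8_demo(void)
{
    assert(model_ucmplt8(0x01ff7f80, 0x02fe8001) == 0xff00ff00);
    assert(model_umin8(0x01ff7f80, 0x02fe8001) == 0x01fe7f01);
}
```
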
 target/riscv/helper.h                   |  6 ++++
 target/riscv/insn32.decode              |  6 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
 target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 830845761b..c424e45fe5 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1212,3 +1212,9 @@ DEF_HELPER_3(scmplt16, tl, env, tl, tl)
 DEF_HELPER_3(scmple16, tl, env, tl, tl)
 DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
 DEF_HELPER_3(ucmple16, tl, env, tl, tl)
+
+DEF_HELPER_3(cmpeq8, tl, env, tl, tl)
+DEF_HELPER_3(scmplt8, tl, env, tl, tl)
+DEF_HELPER_3(scmple8, tl, env, tl, tl)
+DEF_HELPER_3(ucmplt8, tl, env, tl, tl)
+DEF_HELPER_3(ucmple8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5031cebf1f..fdbf3798c7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -815,3 +815,9 @@ scmplt16   0000110  ..... ..... 000 ..... 1110111 @r
 scmple16   0001110  ..... ..... 000 ..... 1110111 @r
 ucmplt16   0010110  ..... ..... 000 ..... 1110111 @r
 ucmple16   0011110  ..... ..... 000 ..... 1110111 @r
+
+cmpeq8     0100111  ..... ..... 000 ..... 1110111 @r
+scmplt8    0000111  ..... ..... 000 ..... 1110111 @r
+scmple8    0001111  ..... ..... 000 ..... 1110111 @r
+ucmplt8    0010111  ..... ..... 000 ..... 1110111 @r
+ucmple8    0011111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 65199ffb5a..aa432701c8 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -210,3 +210,10 @@ GEN_RVP_R_OOL(scmplt16);
 GEN_RVP_R_OOL(scmple16);
 GEN_RVP_R_OOL(ucmplt16);
 GEN_RVP_R_OOL(ucmple16);
+
+/* SIMD 8-bit Compare Instructions */
+GEN_RVP_R_OOL(cmpeq8);
+GEN_RVP_R_OOL(scmplt8);
+GEN_RVP_R_OOL(scmple8);
+GEN_RVP_R_OOL(ucmplt8);
+GEN_RVP_R_OOL(ucmple8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 30b916b5ad..ff86e015e4 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -677,3 +677,49 @@ static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ucmple16, 1, 2);
+
+/* SIMD 8-bit Compare Instructions */
+static inline void do_cmpeq8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] == b[i]) ? 0xff : 0x0;
+}
+
+RVPR(cmpeq8, 1, 1);
+
+static inline void do_scmplt8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
+}
+
+RVPR(scmplt8, 1, 1);
+
+static inline void do_scmple8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
+}
+
+RVPR(scmple8, 1, 1);
+
+static inline void do_ucmplt8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
+}
+
+RVPR(ucmplt8, 1, 1);
+
+static inline void do_ucmple8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
+}
+
+RVPR(ucmple8, 1, 1);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 09/37] target/riscv: SIMD 16-bit Multiply Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Add 6 instructions: 16-bit signed and unsigned multiply, 16-bit signed
and unsigned crossed multiply, and Q15 signed saturating multiply in
plain and crossed forms.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
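Reviewer note, not part of the patch: a minimal standalone sketch of the
two result shapes in this patch, assuming 32-bit source registers. The
widening multiplies return a 64-bit value (on RV32 set_pair_regs writes
the low half to rd and the high half to rd + 1), while the Q15 multiply
keeps the lane width and saturates. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v without an implementation-defined cast. */
static int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xffff) ^ 0x8000) - 0x8000;
}

/* smul16 model: two signed 16x16 products packed as {high lane, low lane}. */
static uint64_t model_smul16(uint32_t a, uint32_t b)
{
    int32_t lo = sext16(a) * sext16(b);
    int32_t hi = sext16(a >> 16) * sext16(b >> 16);
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/*
 * khm16 model for one lane: Q15 x Q15 -> Q15.
 * Only (-1.0) * (-1.0) overflows, so that case saturates and sets *sat.
 * Like the patch, this assumes >> on a negative value is arithmetic.
 */
static int16_t model_khm16_lane(int16_t a, int16_t b, int *sat)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *sat = 1;
        return INT16_MAX;
    }
    return (int16_t)(((int32_t)a * b) >> 15);
}

void model_mul16_demo(void)
{
    int sat = 0;
    /* lanes: (-1 * 3, 2 * 4) = (-3, 8) */
    assert(model_smul16(0xffff0002, 0x00030004) == UINT64_C(0xfffffffd00000008));
    /* 0.5 * 0.5 = 0.25 in Q15, no saturation */
    assert(model_khm16_lane(0x4000, 0x4000, &sat) == 0x2000 && sat == 0);
    assert(model_khm16_lane(INT16_MIN, INT16_MIN, &sat) == INT16_MAX && sat == 1);
}
```
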
 target/riscv/helper.h                   |   7 ++
 target/riscv/insn32.decode              |   7 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  69 ++++++++++++++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 4 files changed, 187 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c424e45fe5..d13b84f165 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1218,3 +1218,10 @@ DEF_HELPER_3(scmplt8, tl, env, tl, tl)
 DEF_HELPER_3(scmple8, tl, env, tl, tl)
 DEF_HELPER_3(ucmplt8, tl, env, tl, tl)
 DEF_HELPER_3(ucmple8, tl, env, tl, tl)
+
+DEF_HELPER_3(smul16, i64, env, tl, tl)
+DEF_HELPER_3(smulx16, i64, env, tl, tl)
+DEF_HELPER_3(umul16, i64, env, tl, tl)
+DEF_HELPER_3(umulx16, i64, env, tl, tl)
+DEF_HELPER_3(khm16, tl, env, tl, tl)
+DEF_HELPER_3(khmx16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fdbf3798c7..cbee995229 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -821,3 +821,10 @@ scmplt8    0000111  ..... ..... 000 ..... 1110111 @r
 scmple8    0001111  ..... ..... 000 ..... 1110111 @r
 ucmplt8    0010111  ..... ..... 000 ..... 1110111 @r
 ucmple8    0011111  ..... ..... 000 ..... 1110111 @r
+
+smul16     1010000  ..... ..... 000 ..... 1110111 @r
+smulx16    1010001  ..... ..... 000 ..... 1110111 @r
+umul16     1011000  ..... ..... 000 ..... 1110111 @r
+umulx16    1011001  ..... ..... 000 ..... 1110111 @r
+khm16      1000011  ..... ..... 000 ..... 1110111 @r
+khmx16     1001011  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index aa432701c8..b93ba63dd8 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -217,3 +217,72 @@ GEN_RVP_R_OOL(scmplt8);
 GEN_RVP_R_OOL(scmple8);
 GEN_RVP_R_OOL(ucmplt8);
 GEN_RVP_R_OOL(ucmple8);
+
+/* SIMD 16-bit Multiply Instructions */
+static void set_pair_regs(DisasContext *ctx, TCGv_i64 dst, int rd)
+{
+    TCGv t1, t2;
+
+    t1 = tcg_temp_new();
+    t2 = tcg_temp_new();
+
+    if (is_32bit(ctx)) {
+        TCGv_i32 lo, hi;
+
+        lo = tcg_temp_new_i32();
+        hi = tcg_temp_new_i32();
+        tcg_gen_extr_i64_i32(lo, hi, dst);
+
+        tcg_gen_ext_i32_tl(t1, lo);
+        tcg_gen_ext_i32_tl(t2, hi);
+
+        gen_set_gpr(rd, t1);
+        gen_set_gpr(rd + 1, t2);
+        tcg_temp_free_i32(lo);
+        tcg_temp_free_i32(hi);
+    } else {
+        tcg_gen_trunc_i64_tl(t1, dst);
+        gen_set_gpr(rd, t1);
+    }
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+static inline bool
+r_d64_ool(DisasContext *ctx, arg_r *a,
+          void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv))
+{
+    TCGv t1, t2;
+    TCGv_i64 t3;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    t1 = tcg_temp_new();
+    t2 = tcg_temp_new();
+    t3 = tcg_temp_new_i64();
+
+    gen_get_gpr(t1, a->rs1);
+    gen_get_gpr(t2, a->rs2);
+    fn(t3, cpu_env, t1, t2);
+    set_pair_regs(ctx, t3, a->rd);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free_i64(t3);
+    return true;
+}
+
+#define GEN_RVP_R_D64_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_D64_OOL(smul16);
+GEN_RVP_R_D64_OOL(smulx16);
+GEN_RVP_R_D64_OOL(umul16);
+GEN_RVP_R_D64_OOL(umulx16);
+GEN_RVP_R_OOL(khm16);
+GEN_RVP_R_OOL(khmx16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index ff86e015e4..13fed2c4d1 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -723,3 +723,107 @@ static inline void do_ucmple8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ucmple8, 1, 1);
+
+/* SIMD 16-bit Multiply Instructions */
+typedef void PackedFn3(CPURISCVState *, void *, void *, void *);
+static inline uint64_t rvpr64(CPURISCVState *env, target_ulong a,
+                              target_ulong b, PackedFn3 *fn)
+{
+    uint64_t result;
+
+    fn(env, &result, &a, &b);
+    return result;
+}
+
+#define RVPR64(NAME)                                            \
+uint64_t HELPER(NAME)(CPURISCVState *env, target_ulong a,       \
+                      target_ulong b)                           \
+{                                                               \
+    return rvpr64(env, a, b, (PackedFn3 *)do_##NAME);           \
+}
+
+static inline void do_smul16(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(0)] = (int32_t)a[H2(0)] * b[H2(0)];
+    d[H4(1)] = (int32_t)a[H2(1)] * b[H2(1)];
+}
+
+RVPR64(smul16);
+
+static inline void do_smulx16(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(0)] = (int32_t)a[H2(0)] * b[H2(1)];
+    d[H4(1)] = (int32_t)a[H2(1)] * b[H2(0)];
+}
+
+RVPR64(smulx16);
+
+static inline void do_umul16(CPURISCVState *env, void *vd, void *va,
+                             void *vb)
+{
+    uint32_t *d = vd;
+    uint16_t *a = va, *b = vb;
+    d[H4(0)] = (uint32_t)a[H2(0)] * b[H2(0)];
+    d[H4(1)] = (uint32_t)a[H2(1)] * b[H2(1)];
+}
+
+RVPR64(umul16);
+
+static inline void do_umulx16(CPURISCVState *env, void *vd, void *va,
+                              void *vb)
+{
+    uint32_t *d = vd;
+    uint16_t *a = va, *b = vb;
+    d[H4(0)] = (uint32_t)a[H2(0)] * b[H2(1)];
+    d[H4(1)] = (uint32_t)a[H2(1)] * b[H2(0)];
+}
+
+RVPR64(umulx16);
+
+static inline void do_khm16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    if (a[i] == INT16_MIN && b[i] == INT16_MIN) {
+        env->vxsat = 1;
+        d[i] = INT16_MAX;
+    } else {
+        d[i] = (int32_t)a[i] * b[i] >> 15;
+    }
+}
+
+RVPR(khm16, 1, 2);
+
+static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    /*
+     * t[x] = ra.H[x] s* rb.H[y];
+     * rt.H[x] = SAT.Q15(t[x] s>> 15);
+     *
+     * RV32: (x,y)=(1,0),(0,1)
+     * RV64: (x,y)=(3,2),(2,3),
+     *             (1,0),(0,1)
+     */
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        env->vxsat = 1;
+        d[H2(i)] = INT16_MAX;
+    } else {
+        d[H2(i)] = (int32_t)a[H2(i)] * b[H2(i + 1)] >> 15;
+    }
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        env->vxsat = 1;
+        d[H2(i + 1)] = INT16_MAX;
+    } else {
+        d[H2(i + 1)] = (int32_t)a[H2(i + 1)] * b[H2(i)] >> 15;
+    }
+}
+
+RVPR(khmx16, 2, 2);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 10/37] target/riscv: SIMD 8-bit Multiply Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Add 6 instructions: 8-bit signed and unsigned multiply, 8-bit signed
and unsigned crossed multiply, and Q7 signed saturating multiply in
plain and crossed forms.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
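Reviewer note, not part of the patch: a hedged sketch of the "crossed"
pairing used by smulx8/umulx8, assuming the four byte lanes come from the
low 32 bits of each source. Byte x of the first operand is multiplied by
its neighbour (x ^ 1) in the second operand, and each product widens to
16 bits. The model_* and sext8 names are hypothetical, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of v without an implementation-defined cast. */
static int32_t sext8(uint32_t v)
{
    return (int32_t)((v & 0xff) ^ 0x80) - 0x80;
}

/* umulx8 model: d.H[x] = a.B[x] * b.B[x ^ 1], unsigned. */
static uint64_t model_umulx8(uint32_t a, uint32_t b)
{
    uint64_t d = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xff;
        uint32_t y = (b >> (8 * (lane ^ 1))) & 0xff;   /* partner byte */
        d |= (uint64_t)(x * y) << (16 * lane);
    }
    return d;
}

/* smulx8 model: same pairing, operands treated as signed bytes. */
static uint64_t model_smulx8(uint32_t a, uint32_t b)
{
    uint64_t d = 0;
    for (int lane = 0; lane < 4; lane++) {
        int32_t x = sext8(a >> (8 * lane));
        int32_t y = sext8(b >> (8 * (lane ^ 1)));
        d |= (uint64_t)(uint16_t)(x * y) << (16 * lane);
    }
    return d;
}

void model_mulx8_demo(void)
{
    /* a bytes (3..0) = 4,3,2,1; b bytes = 8,7,6,5 -> 4*7, 3*8, 2*5, 1*6 */
    assert(model_umulx8(0x04030201, 0x08070605) == UINT64_C(0x001c0018000a0006));
    /* a.B[0] = -1, b.B[1] = 2 -> lane 0 holds -2 */
    assert(model_smulx8(0x000000ff, 0x00000200) == UINT64_C(0xfffe));
}
```
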
 target/riscv/helper.h                   |  7 ++
 target/riscv/insn32.decode              |  7 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  8 +++
 target/riscv/packed_helper.c            | 93 +++++++++++++++++++++++++
 4 files changed, 115 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d13b84f165..4d0918b9a9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1225,3 +1225,10 @@ DEF_HELPER_3(umul16, i64, env, tl, tl)
 DEF_HELPER_3(umulx16, i64, env, tl, tl)
 DEF_HELPER_3(khm16, tl, env, tl, tl)
 DEF_HELPER_3(khmx16, tl, env, tl, tl)
+
+DEF_HELPER_3(smul8, i64, env, tl, tl)
+DEF_HELPER_3(smulx8, i64, env, tl, tl)
+DEF_HELPER_3(umul8, i64, env, tl, tl)
+DEF_HELPER_3(umulx8, i64, env, tl, tl)
+DEF_HELPER_3(khm8, tl, env, tl, tl)
+DEF_HELPER_3(khmx8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index cbee995229..05c3e67477 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -828,3 +828,10 @@ umul16     1011000  ..... ..... 000 ..... 1110111 @r
 umulx16    1011001  ..... ..... 000 ..... 1110111 @r
 khm16      1000011  ..... ..... 000 ..... 1110111 @r
 khmx16     1001011  ..... ..... 000 ..... 1110111 @r
+
+smul8      1010100  ..... ..... 000 ..... 1110111 @r
+smulx8     1010101  ..... ..... 000 ..... 1110111 @r
+umul8      1011100  ..... ..... 000 ..... 1110111 @r
+umulx8     1011101  ..... ..... 000 ..... 1110111 @r
+khm8       1000111  ..... ..... 000 ..... 1110111 @r
+khmx8      1001111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b93ba63dd8..2188de8505 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -286,3 +286,11 @@ GEN_RVP_R_D64_OOL(umul16);
 GEN_RVP_R_D64_OOL(umulx16);
 GEN_RVP_R_OOL(khm16);
 GEN_RVP_R_OOL(khmx16);
+
+/* SIMD 8-bit Multiply Instructions */
+GEN_RVP_R_D64_OOL(smul8);
+GEN_RVP_R_D64_OOL(smulx8);
+GEN_RVP_R_D64_OOL(umul8);
+GEN_RVP_R_D64_OOL(umulx8);
+GEN_RVP_R_OOL(khm8);
+GEN_RVP_R_OOL(khmx8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 13fed2c4d1..56baefeb8e 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -827,3 +827,96 @@ static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx16, 2, 2);
+
+/* SIMD 8-bit Multiply Instructions */
+static inline void do_smul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(smul8);
+
+static inline void do_smulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(smulx8);
+
+static inline void do_umul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(umul8);
+
+static inline void do_umulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(umulx8);
+
+static inline void do_khm8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    if (a[i] == INT8_MIN && b[i] == INT8_MIN) {
+        env->vxsat = 1;
+        d[i] = INT8_MAX;
+    } else {
+        d[i] = (int16_t)a[i] * b[i] >> 7;
+    }
+}
+
+RVPR(khm8, 1, 1);
+
+static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    /*
+     * t[x] = ra.B[x] s* rb.B[y];
+     * rt.B[x] = SAT.Q7(t[x] s>> 7);
+     *
+     * RV32: (x,y)=(3,2),(2,3),
+     *             (1,0),(0,1)
+     * RV64: (x,y)=(7,6),(6,7),(5,4),(4,5),
+     *             (3,2),(2,3),(1,0),(0,1)
+     */
+    if (a[H1(i)] == INT8_MIN && b[H1(i + 1)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i)] = INT8_MAX;
+    } else {
+        d[H1(i)] = (int16_t)a[H1(i)] * b[H1(i + 1)] >> 7;
+    }
+    if (a[H1(i + 1)] == INT8_MIN && b[H1(i)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i + 1)] = INT8_MAX;
+    } else {
+        d[H1(i + 1)] = (int16_t)a[H1(i + 1)] * b[H1(i)] >> 7;
+    }
+}
+
+RVPR(khmx8, 2, 1);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 10/37] target/riscv: SIMD 8-bit Multiply Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson,
	LIU Zhiwei, Alistair Francis

Add 6 instructions: 8-bit signed and unsigned multiply, 8-bit signed
and unsigned crossed multiply, and Q7 signed saturating multiply in
plain and crossed forms.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/helper.h                   |  7 ++
 target/riscv/insn32.decode              |  7 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  8 +++
 target/riscv/packed_helper.c            | 93 +++++++++++++++++++++++++
 4 files changed, 115 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d13b84f165..4d0918b9a9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1225,3 +1225,10 @@ DEF_HELPER_3(umul16, i64, env, tl, tl)
 DEF_HELPER_3(umulx16, i64, env, tl, tl)
 DEF_HELPER_3(khm16, tl, env, tl, tl)
 DEF_HELPER_3(khmx16, tl, env, tl, tl)
+
+DEF_HELPER_3(smul8, i64, env, tl, tl)
+DEF_HELPER_3(smulx8, i64, env, tl, tl)
+DEF_HELPER_3(umul8, i64, env, tl, tl)
+DEF_HELPER_3(umulx8, i64, env, tl, tl)
+DEF_HELPER_3(khm8, tl, env, tl, tl)
+DEF_HELPER_3(khmx8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index cbee995229..05c3e67477 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -828,3 +828,10 @@ umul16     1011000  ..... ..... 000 ..... 1110111 @r
 umulx16    1011001  ..... ..... 000 ..... 1110111 @r
 khm16      1000011  ..... ..... 000 ..... 1110111 @r
 khmx16     1001011  ..... ..... 000 ..... 1110111 @r
+
+smul8      1010100  ..... ..... 000 ..... 1110111 @r
+smulx8     1010101  ..... ..... 000 ..... 1110111 @r
+umul8      1011100  ..... ..... 000 ..... 1110111 @r
+umulx8     1011101  ..... ..... 000 ..... 1110111 @r
+khm8       1000111  ..... ..... 000 ..... 1110111 @r
+khmx8      1001111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b93ba63dd8..2188de8505 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -286,3 +286,11 @@ GEN_RVP_R_D64_OOL(umul16);
 GEN_RVP_R_D64_OOL(umulx16);
 GEN_RVP_R_OOL(khm16);
 GEN_RVP_R_OOL(khmx16);
+
+/* SIMD 8-bit Multiply Instructions */
+GEN_RVP_R_D64_OOL(smul8);
+GEN_RVP_R_D64_OOL(smulx8);
+GEN_RVP_R_D64_OOL(umul8);
+GEN_RVP_R_D64_OOL(umulx8);
+GEN_RVP_R_OOL(khm8);
+GEN_RVP_R_OOL(khmx8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 13fed2c4d1..56baefeb8e 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -827,3 +827,96 @@ static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx16, 2, 2);
+
+/* SIMD 8-bit Multiply Instructions */
+static inline void do_smul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(smul8);
+
+static inline void do_smulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(smulx8);
+
+static inline void do_umul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(umul8);
+
+static inline void do_umulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(umulx8);
+
+static inline void do_khm8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    if (a[i] == INT8_MIN && b[i] == INT8_MIN) {
+        env->vxsat = 1;
+        d[i] = INT8_MAX;
+    } else {
+        d[i] = (int16_t)a[i] * b[i] >> 7;
+    }
+}
+
+RVPR(khm8, 1, 1);
+
+static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    /*
+     * t[x] = ra.B[x] s* rb.B[y];
+     * rt.B[x] = SAT.Q7(t[x] s>> 7);
+     *
+     * (RV32: (x,y)=(3,2),(2,3),
+     *              (1,0),(0,1),
+     * (RV64: (x,y)=(7,6),(6,7),(5,4),(4,5),
+     *              (3,2),(2,3),(1,0),(0,1))
+     */
+    if (a[H1(i)] == INT8_MIN && b[H1(i + 1)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i)] = INT8_MAX;
+    } else {
+        d[H1(i)] = (int16_t)a[H1(i)] * b[H1(i + 1)] >> 7;
+    }
+    if (a[H1(i + 1)] == INT8_MIN && b[H1(i)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i + 1)] = INT8_MAX;
+    } else {
+        d[H1(i + 1)] = (int16_t)a[H1(i + 1)] * b[H1(i)] >> 7;
+    }
+}
+
+RVPR(khmx8, 2, 1);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 11/37] target/riscv: SIMD 16-bit Miscellaneous Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Add 10 instructions: signed and unsigned minimum, maximum, and clip
value; saturating absolute value; and counts of leading redundant sign
bits, leading zeros, and leading ones.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
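
Illustration of the saturating semantics (not part of the commit): a
minimal per-lane sketch of what sclip16, uclip16 and kabs16 compute,
following the bounds used by sat64()/satu64() in this patch. The function
names and the global vxsat (standing in for env->vxsat) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat: a sticky saturation flag. */
static int vxsat;

/* Signed clip of one lane to [-2^imm, 2^imm - 1], with imm in 0..15. */
static int16_t sclip16_lane(int16_t a, unsigned imm)
{
    assert(imm < 16);
    int32_t max = (1 << imm) - 1;
    int32_t min = -(1 << imm);

    if (a > max) {
        vxsat = 1;
        return (int16_t)max;
    }
    if (a < min) {
        vxsat = 1;
        return (int16_t)min;
    }
    return a;
}

/* Unsigned clip of one signed lane to [0, 2^imm - 1]. */
static int16_t uclip16_lane(int16_t a, unsigned imm)
{
    assert(imm < 16);
    int32_t max = (1 << imm) - 1;

    if (a < 0) {
        vxsat = 1;
        return 0;
    }
    if (a > max) {
        vxsat = 1;
        return (int16_t)max;
    }
    return a;
}

/* Saturating absolute value: |INT16_MIN| does not fit, so clamp it. */
static int16_t kabs16_lane(int16_t a)
{
    if (a == INT16_MIN) {
        vxsat = 1;
        return INT16_MAX;
    }
    return (int16_t)(a < 0 ? -a : a);
}
```

For example, sclip16_lane(1000, 8) clamps to 255 and sets the flag, while
sclip16_lane(100, 8) returns 100 and leaves the flag untouched.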
 target/riscv/helper.h                   |  11 ++
 target/riscv/insn32.decode              |  11 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  41 ++++++
 target/riscv/packed_helper.c            | 158 ++++++++++++++++++++++++
 4 files changed, 221 insertions(+)
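
Illustration of the count instructions (not part of the commit): the
helpers derive the 16-bit counts from 32-bit primitives by subtracting
16. The sketch below computes the same per-lane results directly, which
may be handy as a reference when writing tests. All names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Leading zeros of one 16-bit lane, scanning down from bit 15. */
static int clz16_lane(uint16_t a)
{
    int n = 0;

    for (int bit = 15; bit >= 0 && !((a >> bit) & 1); bit--) {
        n++;
    }
    return n;
}

/* Leading ones are the leading zeros of the complement. */
static int clo16_lane(uint16_t a)
{
    return clz16_lane((uint16_t)~a);
}

/*
 * Redundant sign bits: leading bits equal to the sign bit,
 * not counting the sign bit itself.
 */
static int clrs16_lane(uint16_t a)
{
    uint16_t x = (a & 0x8000) ? (uint16_t)~a : a;

    assert(clz16_lane(x) >= 1);
    return clz16_lane(x) - 1;
}
```

A lane of 0 gives clz16 = 16 and clrs16 = 15; a lane of 0xF000 gives
clo16 = 4.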

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4d0918b9a9..88035aafad 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1232,3 +1232,14 @@ DEF_HELPER_3(umul8, i64, env, tl, tl)
 DEF_HELPER_3(umulx8, i64, env, tl, tl)
 DEF_HELPER_3(khm8, tl, env, tl, tl)
 DEF_HELPER_3(khmx8, tl, env, tl, tl)
+
+DEF_HELPER_3(smin16, tl, env, tl, tl)
+DEF_HELPER_3(umin16, tl, env, tl, tl)
+DEF_HELPER_3(smax16, tl, env, tl, tl)
+DEF_HELPER_3(umax16, tl, env, tl, tl)
+DEF_HELPER_3(sclip16, tl, env, tl, tl)
+DEF_HELPER_3(uclip16, tl, env, tl, tl)
+DEF_HELPER_2(kabs16, tl, env, tl)
+DEF_HELPER_2(clrs16, tl, env, tl)
+DEF_HELPER_2(clz16, tl, env, tl)
+DEF_HELPER_2(clo16, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 05c3e67477..847c796874 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -835,3 +835,14 @@ umul8      1011100  ..... ..... 000 ..... 1110111 @r
 umulx8     1011101  ..... ..... 000 ..... 1110111 @r
 khm8       1000111  ..... ..... 000 ..... 1110111 @r
 khmx8      1001111  ..... ..... 000 ..... 1110111 @r
+
+smin16     1000000  ..... ..... 000 ..... 1110111 @r
+umin16     1001000  ..... ..... 000 ..... 1110111 @r
+smax16     1000001  ..... ..... 000 ..... 1110111 @r
+umax16     1001001  ..... ..... 000 ..... 1110111 @r
+sclip16    1000010  0.... ..... 000 ..... 1110111 @sh4
+uclip16    1000010  1.... ..... 000 ..... 1110111 @sh4
+kabs16     1010110  10001 ..... 000 ..... 1110111 @r2
+clrs16     1010111  01000 ..... 000 ..... 1110111 @r2
+clz16      1010111  01001 ..... 000 ..... 1110111 @r2
+clo16      1010111  01011 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 2188de8505..3e6307cdc3 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -294,3 +294,44 @@ GEN_RVP_R_D64_OOL(umul8);
 GEN_RVP_R_D64_OOL(umulx8);
 GEN_RVP_R_OOL(khm8);
 GEN_RVP_R_OOL(khmx8);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin16);
+GEN_RVP_R_OOL(umin16);
+GEN_RVP_R_OOL(smax16);
+GEN_RVP_R_OOL(umax16);
+GEN_RVP_SHIFTI(sclip16, NULL, gen_helper_sclip16);
+GEN_RVP_SHIFTI(uclip16, NULL, gen_helper_uclip16);
+
+/* Out of line helpers for R2 format */
+static bool
+r2_ool(DisasContext *ctx, arg_r2 *a,
+       void (* fn)(TCGv, TCGv_ptr, TCGv))
+{
+    TCGv src1, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    fn(dst, cpu_env, src1);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R2_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
+{                                                      \
+    return r2_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R2_OOL(kabs16);
+GEN_RVP_R2_OOL(clrs16);
+GEN_RVP_R2_OOL(clz16);
+GEN_RVP_R2_OOL(clo16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 56baefeb8e..e4a9463135 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -920,3 +920,161 @@ static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx8, 2, 1);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+static inline void do_smin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin16, 1, 2);
+
+static inline void do_umin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin16, 1, 2);
+
+static inline void do_smax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax16, 1, 2);
+
+static inline void do_umax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax16, 1, 2);
+
+static int64_t sat64(CPURISCVState *env, int64_t a, uint8_t shift)
+{
+    int64_t max = shift >= 64 ? INT64_MAX : (1ull << shift) - 1;
+    int64_t min = shift >= 64 ? INT64_MIN : -(1ull << shift);
+    int64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else if (a < min) {
+        result = min;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_sclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(target_ulong *)vb & 0xf;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip16, 1, 2);
+
+static uint64_t satu64(CPURISCVState *env, uint64_t a, uint8_t shift)
+{
+    uint64_t max = shift >= 64 ? UINT64_MAX : (1ull << shift) - 1;
+    uint64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_uclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(target_ulong *)vb & 0xf;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip16, 1, 2);
+
+typedef void PackedFn2i(CPURISCVState *, void *, void *, uint8_t);
+
+static inline target_ulong rvpr2(CPURISCVState *env, target_ulong a,
+                                 uint8_t step, uint8_t size, PackedFn2i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, i);
+    }
+    return result;
+}
+
+#define RVPR2(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a)    \
+{                                                                \
+    return rvpr2(env, a, STEP, SIZE, (PackedFn2i *)do_##NAME);   \
+}
+
+static inline void do_kabs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+
+    if (a[i] == INT16_MIN) {
+        d[i] = INT16_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs16, 1, 2);
+
+static inline void do_clrs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 16;
+}
+
+RVPR2(clrs16, 1, 2);
+
+static inline void do_clz16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 16);
+}
+
+RVPR2(clz16, 1, 2);
+
+static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 16);
+}
+
+RVPR2(clo16, 1, 2);
-- 
2.25.1



* [PATCH v2 12/37] target/riscv: SIMD 8-bit Miscellaneous Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Add 11 instructions: signed and unsigned minimum, maximum, and clip
value; saturating absolute value; counts of leading redundant sign
bits, leading zeros, and leading ones; and byte swap within each
halfword.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
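
Illustration of the lane walk (not part of the commit): swap8 is
registered with a step of 2 because each call handles a pair of bytes,
whereas the other 8-bit helpers use a step of 1. The sketch below models
that pattern on a 32-bit value. walk_lanes, swap8_lane and kabs8_lane are
hypothetical names, loosely modelled on rvpr2() and the do_*() helpers,
and the model ignores host byte order by unpacking lanes with shifts.

```c
#include <assert.h>
#include <stdint.h>

typedef void lane_fn(uint8_t *d, const uint8_t *a, int i);

/* Call fn on every step-th byte lane of src and repack the result. */
static uint32_t walk_lanes(uint32_t src, int step, lane_fn *fn)
{
    uint8_t a[4], d[4] = {0};
    uint32_t out = 0;

    assert(step == 1 || step == 2);
    for (int i = 0; i < 4; i++) {
        a[i] = (uint8_t)(src >> (8 * i));
    }
    for (int i = 0; i < 4; i += step) {
        fn(d, a, i);
    }
    for (int i = 0; i < 4; i++) {
        out |= (uint32_t)d[i] << (8 * i);
    }
    return out;
}

/* Handles lanes i and i + 1 together, so it needs step = 2. */
static void swap8_lane(uint8_t *d, const uint8_t *a, int i)
{
    d[i] = a[i + 1];
    d[i + 1] = a[i];
}

/* Handles a single lane, so it uses step = 1. */
static void kabs8_lane(uint8_t *d, const uint8_t *a, int i)
{
    int8_t v = (int8_t)a[i];

    d[i] = (uint8_t)(v == INT8_MIN ? INT8_MAX : (v < 0 ? -v : v));
}
```

walk_lanes(0x11223344, 2, swap8_lane) returns 0x22114433: the two bytes
of each halfword trade places while the halfwords stay put.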
 target/riscv/helper.h                   |  12 +++
 target/riscv/insn32.decode              |  12 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 +++
 target/riscv/packed_helper.c            | 115 ++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 88035aafad..240df8b766 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1243,3 +1243,15 @@ DEF_HELPER_2(kabs16, tl, env, tl)
 DEF_HELPER_2(clrs16, tl, env, tl)
 DEF_HELPER_2(clz16, tl, env, tl)
 DEF_HELPER_2(clo16, tl, env, tl)
+
+DEF_HELPER_3(smin8, tl, env, tl, tl)
+DEF_HELPER_3(umin8, tl, env, tl, tl)
+DEF_HELPER_3(smax8, tl, env, tl, tl)
+DEF_HELPER_3(umax8, tl, env, tl, tl)
+DEF_HELPER_3(sclip8, tl, env, tl, tl)
+DEF_HELPER_3(uclip8, tl, env, tl, tl)
+DEF_HELPER_2(kabs8, tl, env, tl)
+DEF_HELPER_2(clrs8, tl, env, tl)
+DEF_HELPER_2(clz8, tl, env, tl)
+DEF_HELPER_2(clo8, tl, env, tl)
+DEF_HELPER_2(swap8, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 847c796874..4c34f0f4f4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -846,3 +846,15 @@ kabs16     1010110  10001 ..... 000 ..... 1110111 @r2
 clrs16     1010111  01000 ..... 000 ..... 1110111 @r2
 clz16      1010111  01001 ..... 000 ..... 1110111 @r2
 clo16      1010111  01011 ..... 000 ..... 1110111 @r2
+
+smin8      1000100  ..... ..... 000 ..... 1110111 @r
+umin8      1001100  ..... ..... 000 ..... 1110111 @r
+smax8      1000101  ..... ..... 000 ..... 1110111 @r
+umax8      1001101  ..... ..... 000 ..... 1110111 @r
+sclip8     1000110  00... ..... 000 ..... 1110111 @sh3
+uclip8     1000110  10... ..... 000 ..... 1110111 @sh3
+kabs8      1010110  10000 ..... 000 ..... 1110111 @r2
+clrs8      1010111  00000 ..... 000 ..... 1110111 @r2
+clz8       1010111  00001 ..... 000 ..... 1110111 @r2
+clo8       1010111  00011 ..... 000 ..... 1110111 @r2
+swap8      1010110  11000 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 3e6307cdc3..c5ec530fd7 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -335,3 +335,16 @@ GEN_RVP_R2_OOL(kabs16);
 GEN_RVP_R2_OOL(clrs16);
 GEN_RVP_R2_OOL(clz16);
 GEN_RVP_R2_OOL(clo16);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin8);
+GEN_RVP_R_OOL(umin8);
+GEN_RVP_R_OOL(smax8);
+GEN_RVP_R_OOL(umax8);
+GEN_RVP_SHIFTI(sclip8, NULL, gen_helper_sclip8);
+GEN_RVP_SHIFTI(uclip8, NULL, gen_helper_uclip8);
+GEN_RVP_R2_OOL(kabs8);
+GEN_RVP_R2_OOL(clrs8);
+GEN_RVP_R2_OOL(clz8);
+GEN_RVP_R2_OOL(clo8);
+GEN_RVP_R2_OOL(swap8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index e4a9463135..3d3d2bf3e4 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1078,3 +1078,118 @@ static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(clo16, 1, 2);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+static inline void do_smin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin8, 1, 1);
+
+static inline void do_umin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin8, 1, 1);
+
+static inline void do_smax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax8, 1, 1);
+
+static inline void do_umax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax8, 1, 1);
+
+static inline void do_sclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(target_ulong *)vb & 0x7;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip8, 1, 1);
+
+static inline void do_uclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(target_ulong *)vb & 0x7;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip8, 1, 1);
+
+static inline void do_kabs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+
+    if (a[i] == INT8_MIN) {
+        d[i] = INT8_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs8, 1, 1);
+
+static inline void do_clrs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 24;
+}
+
+RVPR2(clrs8, 1, 1);
+
+static inline void do_clz8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 24);
+}
+
+RVPR2(clz8, 1, 1);
+
+static inline void do_clo8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 24);
+}
+
+RVPR2(clo8, 1, 1);
+
+static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[H1(i)] = a[H1(i + 1)];
+    d[H1(i + 1)] = a[H1(i)];
+}
+
+RVPR2(swap8, 2, 1);
-- 
2.25.1



* [PATCH v2 12/37] target/riscv: SIMD 8-bit Miscellaneous Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson,
	LIU Zhiwei, Alistair Francis

Instructions include signed or unsigned minimum, maximum,
clip value, absolute value, and leading zero, leading one
count instructions.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/helper.h                   |  12 +++
 target/riscv/insn32.decode              |  12 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 +++
 target/riscv/packed_helper.c            | 115 ++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 88035aafad..240df8b766 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1243,3 +1243,15 @@ DEF_HELPER_2(kabs16, tl, env, tl)
 DEF_HELPER_2(clrs16, tl, env, tl)
 DEF_HELPER_2(clz16, tl, env, tl)
 DEF_HELPER_2(clo16, tl, env, tl)
+
+DEF_HELPER_3(smin8, tl, env, tl, tl)
+DEF_HELPER_3(umin8, tl, env, tl, tl)
+DEF_HELPER_3(smax8, tl, env, tl, tl)
+DEF_HELPER_3(umax8, tl, env, tl, tl)
+DEF_HELPER_3(sclip8, tl, env, tl, tl)
+DEF_HELPER_3(uclip8, tl, env, tl, tl)
+DEF_HELPER_2(kabs8, tl, env, tl)
+DEF_HELPER_2(clrs8, tl, env, tl)
+DEF_HELPER_2(clz8, tl, env, tl)
+DEF_HELPER_2(clo8, tl, env, tl)
+DEF_HELPER_2(swap8, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 847c796874..4c34f0f4f4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -846,3 +846,15 @@ kabs16     1010110  10001 ..... 000 ..... 1110111 @r2
 clrs16     1010111  01000 ..... 000 ..... 1110111 @r2
 clz16      1010111  01001 ..... 000 ..... 1110111 @r2
 clo16      1010111  01011 ..... 000 ..... 1110111 @r2
+
+smin8      1000100  ..... ..... 000 ..... 1110111 @r
+umin8      1001100  ..... ..... 000 ..... 1110111 @r
+smax8      1000101  ..... ..... 000 ..... 1110111 @r
+umax8      1001101  ..... ..... 000 ..... 1110111 @r
+sclip8     1000110  00... ..... 000 ..... 1110111 @sh3
+uclip8     1000110  10... ..... 000 ..... 1110111 @sh3
+kabs8      1010110  10000 ..... 000 ..... 1110111 @r2
+clrs8      1010111  00000 ..... 000 ..... 1110111 @r2
+clz8       1010111  00001 ..... 000 ..... 1110111 @r2
+clo8       1010111  00011 ..... 000 ..... 1110111 @r2
+swap8      1010110  11000 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 3e6307cdc3..c5ec530fd7 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -335,3 +335,16 @@ GEN_RVP_R2_OOL(kabs16);
 GEN_RVP_R2_OOL(clrs16);
 GEN_RVP_R2_OOL(clz16);
 GEN_RVP_R2_OOL(clo16);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin8);
+GEN_RVP_R_OOL(umin8);
+GEN_RVP_R_OOL(smax8);
+GEN_RVP_R_OOL(umax8);
+GEN_RVP_SHIFTI(sclip8, NULL, gen_helper_sclip8);
+GEN_RVP_SHIFTI(uclip8, NULL, gen_helper_uclip8);
+GEN_RVP_R2_OOL(kabs8);
+GEN_RVP_R2_OOL(clrs8);
+GEN_RVP_R2_OOL(clz8);
+GEN_RVP_R2_OOL(clo8);
+GEN_RVP_R2_OOL(swap8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index e4a9463135..3d3d2bf3e4 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1078,3 +1078,118 @@ static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(clo16, 1, 2);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+static inline void do_smin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin8, 1, 1);
+
+static inline void do_umin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin8, 1, 1);
+
+static inline void do_smax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax8, 1, 1);
+
+static inline void do_umax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax8, 1, 1);
+
+static inline void do_sclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip8, 1, 1);
+
+static inline void do_uclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip8, 1, 1);
+
+static inline void do_kabs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+
+    if (a[i] == INT8_MIN) {
+        d[i] = INT8_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs8, 1, 1);
+
+static inline void do_clrs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 24;
+}
+
+RVPR2(clrs8, 1, 1);
+
+static inline void do_clz8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 24);
+}
+
+RVPR2(clz8, 1, 1);
+
+static inline void do_clo8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 24);
+}
+
+RVPR2(clo8, 1, 1);
+
+static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[H1(i)] = a[H1(i + 1)];
+    d[H1(i + 1)] = a[H1(i)];
+}
+
+RVPR2(swap8, 2, 1);
-- 
2.25.1




* [PATCH v2 13/37] target/riscv: 8-bit Unpacking Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Sign-extend or zero-extend selected 8-bit elements to
16-bit elements.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
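For readers following the description above, here is a minimal sketch of
the unpacking operation on a single 32-bit word, written with plain shifts
so that it does not depend on host byte order. The names byte_of, sext8,
unpack8_signed and unpack8_unsigned, and the hi/lo byte-index parameters,
are hypothetical and are not part of this patch; the helpers in the patch
index byte arrays through H1()/H2() instead. It assumes the two trailing
digits of a mnemonic (for example 31 in sunpkd831) name the source bytes
that go to the upper and lower 16-bit halves of each 32-bit chunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not QEMU code: byte n of a 32-bit word. */
static inline uint8_t byte_of(uint32_t x, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(x >> (8 * n));
}

/* Sign-extend an 8-bit value to 16 bits using unsigned arithmetic only. */
static inline uint16_t sext8(uint8_t b)
{
    return (uint16_t)((b ^ 0x80u) - 0x80u);
}

/* Sign-extend bytes hi and lo of x into the upper and lower halves. */
static inline uint32_t unpack8_signed(uint32_t x, unsigned hi, unsigned lo)
{
    return ((uint32_t)sext8(byte_of(x, hi)) << 16) | sext8(byte_of(x, lo));
}

/* Zero-extend bytes hi and lo of x into the upper and lower halves. */
static inline uint32_t unpack8_unsigned(uint32_t x, unsigned hi, unsigned lo)
{
    return ((uint32_t)byte_of(x, hi) << 16) | byte_of(x, lo);
}
```

Under these assumptions, unpack8_signed(0x807F01FF, 1, 0) models the
sunpkd810 case and yields 0x0001FFFF, while unpack8_unsigned with the same
arguments yields 0x000100FF.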
 target/riscv/helper.h                   |  11 +++
 target/riscv/insn32.decode              |  11 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  12 +++
 target/riscv/packed_helper.c            | 121 ++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 240df8b766..9fd2a70f7d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1255,3 +1255,14 @@ DEF_HELPER_2(clrs8, tl, env, tl)
 DEF_HELPER_2(clz8, tl, env, tl)
 DEF_HELPER_2(clo8, tl, env, tl)
 DEF_HELPER_2(swap8, tl, env, tl)
+
+DEF_HELPER_2(sunpkd810, tl, env, tl)
+DEF_HELPER_2(sunpkd820, tl, env, tl)
+DEF_HELPER_2(sunpkd830, tl, env, tl)
+DEF_HELPER_2(sunpkd831, tl, env, tl)
+DEF_HELPER_2(sunpkd832, tl, env, tl)
+DEF_HELPER_2(zunpkd810, tl, env, tl)
+DEF_HELPER_2(zunpkd820, tl, env, tl)
+DEF_HELPER_2(zunpkd830, tl, env, tl)
+DEF_HELPER_2(zunpkd831, tl, env, tl)
+DEF_HELPER_2(zunpkd832, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4c34f0f4f4..9b8ea0f9ab 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -858,3 +858,14 @@ clrs8      1010111  00000 ..... 000 ..... 1110111 @r2
 clz8       1010111  00001 ..... 000 ..... 1110111 @r2
 clo8       1010111  00011 ..... 000 ..... 1110111 @r2
 swap8      1010110  11000 ..... 000 ..... 1110111 @r2
+
+sunpkd810  1010110  01000 ..... 000 ..... 1110111 @r2
+sunpkd820  1010110  01001 ..... 000 ..... 1110111 @r2
+sunpkd830  1010110  01010 ..... 000 ..... 1110111 @r2
+sunpkd831  1010110  01011 ..... 000 ..... 1110111 @r2
+sunpkd832  1010110  10011 ..... 000 ..... 1110111 @r2
+zunpkd810  1010110  01100 ..... 000 ..... 1110111 @r2
+zunpkd820  1010110  01101 ..... 000 ..... 1110111 @r2
+zunpkd830  1010110  01110 ..... 000 ..... 1110111 @r2
+zunpkd831  1010110  01111 ..... 000 ..... 1110111 @r2
+zunpkd832  1010110  10111 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index c5ec530fd7..5af2c7c2cc 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -348,3 +348,15 @@ GEN_RVP_R2_OOL(clrs8);
 GEN_RVP_R2_OOL(clz8);
 GEN_RVP_R2_OOL(clo8);
 GEN_RVP_R2_OOL(swap8);
+
+/* 8-bit Unpacking Instructions */
+GEN_RVP_R2_OOL(sunpkd810);
+GEN_RVP_R2_OOL(sunpkd820);
+GEN_RVP_R2_OOL(sunpkd830);
+GEN_RVP_R2_OOL(sunpkd831);
+GEN_RVP_R2_OOL(sunpkd832);
+GEN_RVP_R2_OOL(zunpkd810);
+GEN_RVP_R2_OOL(zunpkd820);
+GEN_RVP_R2_OOL(zunpkd830);
+GEN_RVP_R2_OOL(zunpkd831);
+GEN_RVP_R2_OOL(zunpkd832);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 3d3d2bf3e4..8226dbd079 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1193,3 +1193,124 @@ static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(swap8, 2, 1);
+
+/* 8-bit Unpacking Instructions */
+static inline void
+do_sunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 1)];
+}
+
+RVPR2(sunpkd810, 4, 1);
+
+static inline void
+do_sunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 2)];
+}
+
+RVPR2(sunpkd820, 4, 1);
+
+static inline void
+do_sunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd830, 4, 1);
+
+static inline void
+do_sunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 1)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd831, 4, 1);
+
+static inline void
+do_sunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 2)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd832, 4, 1);
+
+static inline void
+do_zunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 1)];
+}
+
+RVPR2(zunpkd810, 4, 1);
+
+static inline void
+do_zunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 2)];
+}
+
+RVPR2(zunpkd820, 4, 1);
+
+static inline void
+do_zunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd830, 4, 1);
+
+static inline void
+do_zunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 1)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd831, 4, 1);
+
+static inline void
+do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 2)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd832, 4, 1);
-- 
2.25.1




* [PATCH v2 14/37] target/riscv: 16-bit Packing Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Concatenate 16-bit elements from the source registers into 32-bit
elements in the destination register.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
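As an illustration of the concatenation described above, the following
sketch models the four packing variants on a single 32-bit word. The
function names (bottom16, top16, concat16 and the *_model wrappers) are
hypothetical and not part of this patch. The operand selection mirrors
the do_pk*16 helpers in the diff: the first letter pair in the mnemonic
picks the bottom (b) or top (t) half of rs1 for the upper result half,
and of rs2 for the lower result half.

```c
#include <stdint.h>

/* Hypothetical reference model, not QEMU code. */
static inline uint16_t bottom16(uint32_t x)
{
    return (uint16_t)x;
}

static inline uint16_t top16(uint32_t x)
{
    return (uint16_t)(x >> 16);
}

/* Place hi in bits 31..16 and lo in bits 15..0. */
static inline uint32_t concat16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

static inline uint32_t pkbb16_model(uint32_t a, uint32_t b)
{
    return concat16(bottom16(a), bottom16(b));
}

static inline uint32_t pkbt16_model(uint32_t a, uint32_t b)
{
    return concat16(bottom16(a), top16(b));
}

static inline uint32_t pktt16_model(uint32_t a, uint32_t b)
{
    return concat16(top16(a), top16(b));
}

static inline uint32_t pktb16_model(uint32_t a, uint32_t b)
{
    return concat16(top16(a), bottom16(b));
}
```

With a = 0x11112222 and b = 0x33334444, pkbb16_model gives 0x22224444 and
pktt16_model gives 0x11113333. On RV64 the same selection would apply to
each 32-bit chunk of the registers independently.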
 target/riscv/helper.h                   |  5 +++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++++
 target/riscv/packed_helper.c            | 45 +++++++++++++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9fd2a70f7d..9872f5efbd 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1266,3 +1266,8 @@ DEF_HELPER_2(zunpkd820, tl, env, tl)
 DEF_HELPER_2(zunpkd830, tl, env, tl)
 DEF_HELPER_2(zunpkd831, tl, env, tl)
 DEF_HELPER_2(zunpkd832, tl, env, tl)
+
+DEF_HELPER_3(pkbb16, tl, env, tl, tl)
+DEF_HELPER_3(pkbt16, tl, env, tl, tl)
+DEF_HELPER_3(pktt16, tl, env, tl, tl)
+DEF_HELPER_3(pktb16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9b8ea0f9ab..0b6830c76e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -869,3 +869,8 @@ zunpkd820  1010110  01101 ..... 000 ..... 1110111 @r2
 zunpkd830  1010110  01110 ..... 000 ..... 1110111 @r2
 zunpkd831  1010110  01111 ..... 000 ..... 1110111 @r2
 zunpkd832  1010110  10111 ..... 000 ..... 1110111 @r2
+
+pkbb16     0000111  ..... ..... 001 ..... 1110111 @r
+pkbt16     0001111  ..... ..... 001 ..... 1110111 @r
+pktt16     0010111  ..... ..... 001 ..... 1110111 @r
+pktb16     0011111  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 5af2c7c2cc..b5bd8b1406 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -360,3 +360,12 @@ GEN_RVP_R2_OOL(zunpkd820);
 GEN_RVP_R2_OOL(zunpkd830);
 GEN_RVP_R2_OOL(zunpkd831);
 GEN_RVP_R2_OOL(zunpkd832);
+
+/*
+ *** Partial-SIMD Data Processing Instructions
+ */
+/* 16-bit Packing Instructions */
+GEN_RVP_R_OOL(pkbb16);
+GEN_RVP_R_OOL(pkbt16);
+GEN_RVP_R_OOL(pktt16);
+GEN_RVP_R_OOL(pktb16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 8226dbd079..f6cea654b2 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1314,3 +1314,48 @@ do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(zunpkd832, 4, 1);
+
+/*
+ *** Partial-SIMD Data Processing Instructions
+ */
+
+/* 16-bit Packing Instructions */
+static inline void do_pkbb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pkbb16, 2, 2);
+
+static inline void do_pkbt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pkbt16, 2, 2);
+
+static inline void do_pktt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pktt16, 2, 2);
+
+static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pktb16, 2, 2);
-- 
2.25.1




* [PATCH v2 15/37] target/riscv: Signed MSW 32x32 Multiply and Add Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Each of these instructions performs a 32x32 multiplication. The most
significant word of the product is used either as the result or as an
operand of an add or subtract operation, with rounding or saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
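To make the description above concrete, here is a minimal sketch of the
three building blocks for one 32-bit element: the truncating MSW multiply,
the rounding variant, and a saturating accumulate. The names msw_mul,
msw_mul_round, sat_add32 and msw_mac are hypothetical and not part of this
patch, and the saturation flag is modelled as a plain bool rather than
env->vxsat. The sketch assumes that right-shifting a negative int64_t is
an arithmetic shift, which the helpers in the diff also rely on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model, not QEMU code. */

/* Most significant word of the signed 64-bit product (truncating). */
static inline int32_t msw_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Same product, rounded by adding 1 at bit 31 before taking the MSW. */
static inline int32_t msw_mul_round(int32_t a, int32_t b)
{
    return (int32_t)((((int64_t)a * b) + (INT64_C(1) << 31)) >> 32);
}

/* Saturating 32-bit add; *sat is set when the result is clamped. */
static inline int32_t sat_add32(int32_t x, int32_t y, bool *sat)
{
    int64_t sum = (int64_t)x + y;

    if (sum > INT32_MAX) {
        *sat = true;
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        *sat = true;
        return INT32_MIN;
    }
    return (int32_t)sum;
}

/* Accumulate in the style of kmmac: acc + MSW(a * b), saturated. */
static inline int32_t msw_mac(int32_t acc, int32_t a, int32_t b, bool *sat)
{
    return sat_add32(acc, msw_mul(a, b), sat);
}
```

For example, msw_mul(0x40000000, 2) is 0 because the product 2^31 lies
entirely in the low word, whereas msw_mul_round(0x40000000, 2) is 1 because
the rounding constant carries into bit 32.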
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |   9 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  44 ++++++++++
 target/riscv/packed_helper.c            | 109 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9872f5efbd..600e8dee44 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1271,3 +1271,12 @@ DEF_HELPER_3(pkbb16, tl, env, tl, tl)
 DEF_HELPER_3(pkbt16, tl, env, tl, tl)
 DEF_HELPER_3(pktt16, tl, env, tl, tl)
 DEF_HELPER_3(pktb16, tl, env, tl, tl)
+
+DEF_HELPER_3(smmul, tl, env, tl, tl)
+DEF_HELPER_3(smmul_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmac, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmac_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kwmmul, tl, env, tl, tl)
+DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0b6830c76e..0484de140b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -874,3 +874,12 @@ pkbb16     0000111  ..... ..... 001 ..... 1110111 @r
 pkbt16     0001111  ..... ..... 001 ..... 1110111 @r
 pktt16     0010111  ..... ..... 001 ..... 1110111 @r
 pktb16     0011111  ..... ..... 001 ..... 1110111 @r
+
+smmul      0100000  ..... ..... 001 ..... 1110111 @r
+smmul_u    0101000  ..... ..... 001 ..... 1110111 @r
+kmmac      0110000  ..... ..... 001 ..... 1110111 @r
+kmmac_u    0111000  ..... ..... 001 ..... 1110111 @r
+kmmsb      0100001  ..... ..... 001 ..... 1110111 @r
+kmmsb_u    0101001  ..... ..... 001 ..... 1110111 @r
+kwmmul     0110001  ..... ..... 001 ..... 1110111 @r
+kwmmul_u   0111001  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b5bd8b1406..073558b950 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -369,3 +369,47 @@ GEN_RVP_R_OOL(pkbb16);
 GEN_RVP_R_OOL(pkbt16);
 GEN_RVP_R_OOL(pktt16);
 GEN_RVP_R_OOL(pktb16);
+
+/* Most Significant Word "32x32" Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmul);
+GEN_RVP_R_OOL(smmul_u);
+
+/* Accumulating form: rd is also read, as the third source operand */
+static inline bool r_acc_ool(DisasContext *ctx, arg_r *a,
+                             void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv, TCGv))
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rd);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_ACC_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_acc_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_ACC_OOL(kmmac);
+GEN_RVP_R_ACC_OOL(kmmac_u);
+GEN_RVP_R_ACC_OOL(kmmsb);
+GEN_RVP_R_ACC_OOL(kmmsb_u);
+GEN_RVP_R_OOL(kwmmul);
+GEN_RVP_R_OOL(kwmmul_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index f6cea654b2..465cb5a3b3 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1359,3 +1359,112 @@ static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(pktb16, 2, 2);
+
+/* Most Significant Word "32x32" Multiply & Add Instructions */
+static inline void do_smmul(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = (int64_t)a[i] * b[i] >> 32;
+}
+
+RVPR(smmul, 1, 4);
+
+static inline void do_smmul_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ((int64_t)a[i] * b[i] + (uint32_t)INT32_MIN) >> 32;
+}
+
+RVPR(smmul_u, 1, 4);
+
+typedef void PackedFn4i(CPURISCVState *, void *, void *,
+                        void *, void *, uint8_t);
+
+static inline target_ulong
+rvpr_acc(CPURISCVState *env, target_ulong a,
+         target_ulong b, target_ulong c,
+         uint8_t step, uint8_t size, PackedFn4i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, &c, i);
+    }
+    return result;
+}
+
+#define RVPR_ACC(NAME, STEP, SIZE)                                     \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,          \
+                          target_ulong b, target_ulong c)              \
+{                                                                      \
+    return rvpr_acc(env, a, b, c, STEP, SIZE, (PackedFn4i *)do_##NAME);\
+}
+
+static inline void do_kmmac(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i]) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac, 1, 4);
+
+static inline void do_kmmac_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i] +
+                           (uint32_t)INT32_MIN) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac_u, 1, 4);
+
+static inline void do_kmmsb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], (int64_t)a[i] * b[i] >> 32);
+}
+
+RVPR_ACC(kmmsb, 1, 4);
+
+static inline void do_kmmsb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], ((int64_t)a[i] * b[i] +
+                                 (uint32_t)INT32_MIN) >> 32);
+}
+
+RVPR_ACC(kmmsb_u, 1, 4);
+
+static inline void do_kwmmul(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = (int64_t)a[i] * b[i] >> 31;
+    }
+}
+
+RVPR(kwmmul, 1, 4);
+
+static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = ((int64_t)a[i] * b[i] + (1ull << 30)) >> 31;
+    }
+}
+
+RVPR(kwmmul_u, 1, 4);
-- 
2.25.1




* [PATCH v2 15/37] target/riscv: Signed MSW 32x32 Multiply and Add Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson, LIU Zhiwei

Each of these instructions performs a 32x32 multiplication. The most
significant word of the product is used either as the result or as an
operand of an add or subtract operation, with rounding or saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |   9 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  44 ++++++++++
 target/riscv/packed_helper.c            | 109 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9872f5efbd..600e8dee44 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1271,3 +1271,12 @@ DEF_HELPER_3(pkbb16, tl, env, tl, tl)
 DEF_HELPER_3(pkbt16, tl, env, tl, tl)
 DEF_HELPER_3(pktt16, tl, env, tl, tl)
 DEF_HELPER_3(pktb16, tl, env, tl, tl)
+
+DEF_HELPER_3(smmul, tl, env, tl, tl)
+DEF_HELPER_3(smmul_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmac, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmac_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kwmmul, tl, env, tl, tl)
+DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0b6830c76e..0484de140b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -874,3 +874,12 @@ pkbb16     0000111  ..... ..... 001 ..... 1110111 @r
 pkbt16     0001111  ..... ..... 001 ..... 1110111 @r
 pktt16     0010111  ..... ..... 001 ..... 1110111 @r
 pktb16     0011111  ..... ..... 001 ..... 1110111 @r
+
+smmul      0100000  ..... ..... 001 ..... 1110111 @r
+smmul_u    0101000  ..... ..... 001 ..... 1110111 @r
+kmmac      0110000  ..... ..... 001 ..... 1110111 @r
+kmmac_u    0111000  ..... ..... 001 ..... 1110111 @r
+kmmsb      0100001  ..... ..... 001 ..... 1110111 @r
+kmmsb_u    0101001  ..... ..... 001 ..... 1110111 @r
+kwmmul     0110001  ..... ..... 001 ..... 1110111 @r
+kwmmul_u   0111001  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b5bd8b1406..073558b950 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -369,3 +369,47 @@ GEN_RVP_R_OOL(pkbb16);
 GEN_RVP_R_OOL(pkbt16);
 GEN_RVP_R_OOL(pktt16);
 GEN_RVP_R_OOL(pktb16);
+
+/* Most Significant Word "32x32" Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmul);
+GEN_RVP_R_OOL(smmul_u);
+
+/* Accumulating form: rd is also read, as the third source operand */
+static inline bool r_acc_ool(DisasContext *ctx, arg_r *a,
+                             void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv, TCGv))
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rd);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_ACC_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_acc_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_ACC_OOL(kmmac);
+GEN_RVP_R_ACC_OOL(kmmac_u);
+GEN_RVP_R_ACC_OOL(kmmsb);
+GEN_RVP_R_ACC_OOL(kmmsb_u);
+GEN_RVP_R_OOL(kwmmul);
+GEN_RVP_R_OOL(kwmmul_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index f6cea654b2..465cb5a3b3 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1359,3 +1359,112 @@ static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(pktb16, 2, 2);
+
+/* Most Significant Word “32x32” Multiply & Add Instructions */
+static inline void do_smmul(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = (int64_t)a[i] * b[i] >> 32;
+}
+
+RVPR(smmul, 1, 4);
+
+static inline void do_smmul_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ((int64_t)a[i] * b[i] + (uint32_t)INT32_MIN) >> 32;
+}
+
+RVPR(smmul_u, 1, 4);
+
+typedef void PackedFn4i(CPURISCVState *, void *, void *,
+                        void *, void *, uint8_t);
+
+static inline target_ulong
+rvpr_acc(CPURISCVState *env, target_ulong a,
+         target_ulong b, target_ulong c,
+         uint8_t step, uint8_t size, PackedFn4i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, &c, i);
+    }
+    return result;
+}
+
+#define RVPR_ACC(NAME, STEP, SIZE)                                     \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,          \
+                          target_ulong b, target_ulong c)              \
+{                                                                      \
+    return rvpr_acc(env, a, b, c, STEP, SIZE, (PackedFn4i *)do_##NAME);\
+}
+
+static inline void do_kmmac(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i]) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac, 1, 4);
+
+static inline void do_kmmac_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i] +
+                           (uint32_t)INT32_MIN) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac_u, 1, 4);
+
+static inline void do_kmmsb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], (int64_t)a[i] * b[i] >> 32);
+}
+
+RVPR_ACC(kmmsb, 1, 4);
+
+static inline void do_kmmsb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], ((int64_t)a[i] * b[i] +
+                                 (uint32_t)INT32_MIN) >> 32);
+}
+
+RVPR_ACC(kmmsb_u, 1, 4);
+
+static inline void do_kwmmul(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = (int64_t)a[i] * b[i] >> 31;
+    }
+}
+
+RVPR(kwmmul, 1, 4);
+
+static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = ((int64_t)a[i] * b[i] + (1ull << 30)) >> 31;
+    }
+}
+
+RVPR(kwmmul_u, 1, 4);
-- 
2.25.1




* [PATCH v2 16/37] target/riscv: Signed MSW 32x16 Multiply and Add Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Each instruction performs a 32x16 multiplication. The most significant
word of the product is either used directly as the result or serves as
an operand of an add or subtract operation, with rounding or saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/helper.h                   |  17 ++
 target/riscv/insn32.decode              |  17 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  18 ++
 target/riscv/packed_helper.c            | 208 ++++++++++++++++++++++++
 4 files changed, 260 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 600e8dee44..854f48d385 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1280,3 +1280,20 @@ DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
 DEF_HELPER_3(kwmmul, tl, env, tl, tl)
 DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
+
+DEF_HELPER_3(smmwb, tl, env, tl, tl)
+DEF_HELPER_3(smmwb_u, tl, env, tl, tl)
+DEF_HELPER_3(smmwt, tl, env, tl, tl)
+DEF_HELPER_3(smmwt_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kmmwb2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwb2_u, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0484de140b..e5a8f663dc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -883,3 +883,20 @@ kmmsb      0100001  ..... ..... 001 ..... 1110111 @r
 kmmsb_u    0101001  ..... ..... 001 ..... 1110111 @r
 kwmmul     0110001  ..... ..... 001 ..... 1110111 @r
 kwmmul_u   0111001  ..... ..... 001 ..... 1110111 @r
+
+smmwb      0100010  ..... ..... 001 ..... 1110111 @r
+smmwb_u    0101010  ..... ..... 001 ..... 1110111 @r
+smmwt      0110010  ..... ..... 001 ..... 1110111 @r
+smmwt_u    0111010  ..... ..... 001 ..... 1110111 @r
+kmmawb     0100011  ..... ..... 001 ..... 1110111 @r
+kmmawb_u   0101011  ..... ..... 001 ..... 1110111 @r
+kmmawt     0110011  ..... ..... 001 ..... 1110111 @r
+kmmawt_u   0111011  ..... ..... 001 ..... 1110111 @r
+kmmwb2     1000111  ..... ..... 001 ..... 1110111 @r
+kmmwb2_u   1001111  ..... ..... 001 ..... 1110111 @r
+kmmwt2     1010111  ..... ..... 001 ..... 1110111 @r
+kmmwt2_u   1011111  ..... ..... 001 ..... 1110111 @r
+kmmawb2    1100111  ..... ..... 001 ..... 1110111 @r
+kmmawb2_u  1101111  ..... ..... 001 ..... 1110111 @r
+kmmawt2    1110111  ..... ..... 001 ..... 1110111 @r
+kmmawt2_u  1111111  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 073558b950..af490a5ef0 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -413,3 +413,21 @@ GEN_RVP_R_ACC_OOL(kmmsb);
 GEN_RVP_R_ACC_OOL(kmmsb_u);
 GEN_RVP_R_OOL(kwmmul);
 GEN_RVP_R_OOL(kwmmul_u);
+
+/* Most Significant Word “32x16” Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmwb);
+GEN_RVP_R_OOL(smmwb_u);
+GEN_RVP_R_OOL(smmwt);
+GEN_RVP_R_OOL(smmwt_u);
+GEN_RVP_R_ACC_OOL(kmmawb);
+GEN_RVP_R_ACC_OOL(kmmawb_u);
+GEN_RVP_R_ACC_OOL(kmmawt);
+GEN_RVP_R_ACC_OOL(kmmawt_u);
+GEN_RVP_R_OOL(kmmwb2);
+GEN_RVP_R_OOL(kmmwb2_u);
+GEN_RVP_R_OOL(kmmwt2);
+GEN_RVP_R_OOL(kmmwt2_u);
+GEN_RVP_R_ACC_OOL(kmmawb2);
+GEN_RVP_R_ACC_OOL(kmmawb2_u);
+GEN_RVP_R_ACC_OOL(kmmawt2);
+GEN_RVP_R_ACC_OOL(kmmawt2_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 465cb5a3b3..868a1a71ba 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1468,3 +1468,211 @@ static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kwmmul_u, 1, 4);
+
+/* Most Significant Word “32x16” Multiply & Add Instructions */
+static inline void do_smmwb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16;
+}
+
+RVPR(smmwb, 1, 4);
+
+static inline void do_smmwb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwb_u, 1, 4);
+
+static inline void do_smmwt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16;
+}
+
+RVPR(smmwt, 1, 4);
+
+static inline void do_smmwt_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwt_u, 1, 4);
+
+static inline void do_kmmawb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb, 1, 4);
+
+static inline void do_kmmawb_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb_u, 1, 4);
+
+static inline void do_kmmawt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16,
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt, 1, 4);
+
+static inline void do_kmmawt_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt_u, 1, 4);
+
+static inline void do_kmmwb2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+}
+
+RVPR(kmmwb2, 1, 4);
+
+static inline void do_kmmwb2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwb2_u, 1, 4);
+
+static inline void do_kmmwt2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+}
+
+RVPR(kmmwt2, 1, 4);
+
+static inline void do_kmmwt2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwt2_u, 1, 4);
+
+static inline void do_kmmawb2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2, 1, 4);
+
+static inline void do_kmmawb2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2_u, 1, 4);
+
+static inline void do_kmmawt2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2, 1, 4);
+
+static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2_u, 1, 4);
-- 
2.25.1



* [PATCH v2 17/37] target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Each instruction performs a signed 16x16 multiplication. The 32-bit
product is either written to the destination register or used as an
operand of an add/subtract operation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  19 ++
 target/riscv/insn32.decode              |  19 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  20 ++
 target/riscv/packed_helper.c            | 268 ++++++++++++++++++++++++
 4 files changed, 326 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 854f48d385..5aac6ba578 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1297,3 +1297,22 @@ DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smbb16, tl, env, tl, tl)
+DEF_HELPER_3(smbt16, tl, env, tl, tl)
+DEF_HELPER_3(smtt16, tl, env, tl, tl)
+DEF_HELPER_3(kmda, tl, env, tl, tl)
+DEF_HELPER_3(kmxda, tl, env, tl, tl)
+DEF_HELPER_3(smds, tl, env, tl, tl)
+DEF_HELPER_3(smdrs, tl, env, tl, tl)
+DEF_HELPER_3(smxds, tl, env, tl, tl)
+DEF_HELPER_4(kmabb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmabt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmatt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmada, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxda, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmads, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmadrs, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxds, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e5a8f663dc..f590880750 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -900,3 +900,22 @@ kmmawb2    1100111  ..... ..... 001 ..... 1110111 @r
 kmmawb2_u  1101111  ..... ..... 001 ..... 1110111 @r
 kmmawt2    1110111  ..... ..... 001 ..... 1110111 @r
 kmmawt2_u  1111111  ..... ..... 001 ..... 1110111 @r
+
+smbb16     0000100  ..... ..... 001 ..... 1110111 @r
+smbt16     0001100  ..... ..... 001 ..... 1110111 @r
+smtt16     0010100  ..... ..... 001 ..... 1110111 @r
+kmda       0011100  ..... ..... 001 ..... 1110111 @r
+kmxda      0011101  ..... ..... 001 ..... 1110111 @r
+smds       0101100  ..... ..... 001 ..... 1110111 @r
+smdrs      0110100  ..... ..... 001 ..... 1110111 @r
+smxds      0111100  ..... ..... 001 ..... 1110111 @r
+kmabb      0101101  ..... ..... 001 ..... 1110111 @r
+kmabt      0110101  ..... ..... 001 ..... 1110111 @r
+kmatt      0111101  ..... ..... 001 ..... 1110111 @r
+kmada      0100100  ..... ..... 001 ..... 1110111 @r
+kmaxda     0100101  ..... ..... 001 ..... 1110111 @r
+kmads      0101110  ..... ..... 001 ..... 1110111 @r
+kmadrs     0110110  ..... ..... 001 ..... 1110111 @r
+kmaxds     0111110  ..... ..... 001 ..... 1110111 @r
+kmsda      0100110  ..... ..... 001 ..... 1110111 @r
+kmsxda     0100111  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index af490a5ef0..308fc223db 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -431,3 +431,23 @@ GEN_RVP_R_ACC_OOL(kmmawb2);
 GEN_RVP_R_ACC_OOL(kmmawb2_u);
 GEN_RVP_R_ACC_OOL(kmmawt2);
 GEN_RVP_R_ACC_OOL(kmmawt2_u);
+
+/* Signed 16-bit Multiply with 32-bit Add/Subtract Instructions */
+GEN_RVP_R_OOL(smbb16);
+GEN_RVP_R_OOL(smbt16);
+GEN_RVP_R_OOL(smtt16);
+GEN_RVP_R_OOL(kmda);
+GEN_RVP_R_OOL(kmxda);
+GEN_RVP_R_OOL(smds);
+GEN_RVP_R_OOL(smdrs);
+GEN_RVP_R_OOL(smxds);
+GEN_RVP_R_ACC_OOL(kmabb);
+GEN_RVP_R_ACC_OOL(kmabt);
+GEN_RVP_R_ACC_OOL(kmatt);
+GEN_RVP_R_ACC_OOL(kmada);
+GEN_RVP_R_ACC_OOL(kmaxda);
+GEN_RVP_R_ACC_OOL(kmads);
+GEN_RVP_R_ACC_OOL(kmadrs);
+GEN_RVP_R_ACC_OOL(kmaxds);
+GEN_RVP_R_ACC_OOL(kmsda);
+GEN_RVP_R_ACC_OOL(kmsxda);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 868a1a71ba..88509fd118 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1676,3 +1676,271 @@ static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmmawt2_u, 1, 4);
+
+/* Signed 16-bit Multiply with 32-bit Add/Subtract Instructions */
+static inline void do_smbb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+}
+
+RVPR(smbb16, 1, 4);
+
+static inline void do_smbt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smbt16, 1, 4);
+
+static inline void do_smtt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smtt16, 1, 4);
+
+static inline void do_kmda(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        d[H4(i)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)] +
+                   (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+    }
+}
+
+RVPR(kmda, 1, 4);
+
+static inline void do_kmxda(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        d[H4(i)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)] +
+                   (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+    }
+}
+
+RVPR(kmxda, 1, 4);
+
+static inline void do_smds(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)] -
+               (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+}
+
+RVPR(smds, 1, 4);
+
+static inline void do_smdrs(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)] -
+               (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smdrs, 1, 4);
+
+static inline void do_smxds(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)] -
+               (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smxds, 1, 4);
+
+static inline void do_kmabb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i)] * b[H2(2 * i)], c[H4(i)]);
+}
+
+RVPR_ACC(kmabb, 1, 4);
+
+static inline void do_kmabt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)],
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmabt, 1, 4);
+
+static inline void do_kmatt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)],
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmatt, 1, 4);
+
+static inline void do_kmada(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            d[H4(i)] = INT32_MAX + c[H4(i)] + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = sadd32(env, 0, p1 + p2, c[H4(i)]);
+    }
+}
+
+RVPR_ACC(kmada, 1, 4);
+
+static inline void do_kmaxda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            d[H4(i)] = INT32_MAX + c[H4(i)] + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = sadd32(env, 0, p1 + p2, c[H4(i)]);
+    }
+}
+
+RVPR_ACC(kmaxda, 1, 4);
+
+static inline void do_kmads(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmads, 1, 4);
+
+static inline void do_kmadrs(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmadrs, 1, 4);
+
+static inline void do_kmaxds(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmaxds, 1, 4);
+
+static inline void do_kmsda(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MIN;
+        } else {
+            d[H4(i)] = c[H4(i)] - 1ll - INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = ssub32(env, 0, c[H4(i)], p1 + p2);
+    }
+}
+
+RVPR_ACC(kmsda, 1, 4);
+
+static inline void do_kmsxda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MIN;
+        } else {
+            d[H4(i)] = c[H4(i)] - 1ll - INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = ssub32(env, 0, c[H4(i)], p1 + p2);
+    }
+}
+
+RVPR_ACC(kmsxda, 1, 4);
-- 
2.25.1
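
For anyone checking the saturation corner cases in this patch, the following is a minimal standalone sketch and not part of the patch. It models one 32-bit lane of kmda, kmada and kmsda with 64-bit intermediates, under the assumption that the accumulating forms saturate once on the final result. The names sat_q31, dot16, kmda_ref, kmada_ref, kmsda_ref and check_dot_refs are made up for illustration, and the `sat` flag only stands in for env->vxsat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp to the int32_t range; *sat is set (never cleared) on clamping. */
static int32_t sat_q31(int64_t v, bool *sat)
{
    if (v > INT32_MAX) {
        *sat = true;
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        *sat = true;
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* Sum of the bottom and top halfword products, kept in 64 bits. */
static int64_t dot16(int16_t a_lo, int16_t a_hi, int16_t b_lo, int16_t b_hi)
{
    return (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
}

/* Hypothetical reference for one kmda lane. */
int32_t kmda_ref(int16_t a_lo, int16_t a_hi, int16_t b_lo, int16_t b_hi,
                 bool *sat)
{
    return sat_q31(dot16(a_lo, a_hi, b_lo, b_hi), sat);
}

/* Hypothetical reference for one kmada lane: acc + dot product. */
int32_t kmada_ref(int32_t acc, int16_t a_lo, int16_t a_hi,
                  int16_t b_lo, int16_t b_hi, bool *sat)
{
    return sat_q31((int64_t)acc + dot16(a_lo, a_hi, b_lo, b_hi), sat);
}

/* Hypothetical reference for one kmsda lane: acc - dot product. */
int32_t kmsda_ref(int32_t acc, int16_t a_lo, int16_t a_hi,
                  int16_t b_lo, int16_t b_hi, bool *sat)
{
    return sat_q31((int64_t)acc - dot16(a_lo, a_hi, b_lo, b_hi), sat);
}

void check_dot_refs(void)
{
    const int16_t m = INT16_MIN;
    bool sat = false;

    /* All four operands INT16_MIN: 2^30 + 2^30 = 2^31 saturates. */
    assert(kmda_ref(m, m, m, m, &sat) == INT32_MAX);
    assert(sat);

    /* One operand at -32767: 2^31 - 32768 still fits in int32_t. */
    sat = false;
    assert(kmda_ref(m, m, m, -32767, &sat) == 2147450880);
    assert(!sat);

    /* A negative accumulator pulls the kmada sum back into range. */
    assert(kmada_ref(-1, m, m, m, m, &sat) == INT32_MAX);
    assert(!sat);

    /* kmsda: 0 - 2^31 is exactly INT32_MIN, so no saturation. */
    assert(kmsda_ref(0, m, m, m, m, &sat) == INT32_MIN);
    assert(!sat);

    /* kmsda: -1 - 2^31 underflows and saturates. */
    assert(kmsda_ref(-1, m, m, m, m, &sat) == INT32_MIN);
    assert(sat);
}
```

The sum of two 16x16 products leaves the int32_t range only when all four halfword operands are INT16_MIN, so a helper that works in 32 bits needs to special-case exactly that input. Calling check_dot_refs() from a test main exercises those boundaries.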



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 18/37] target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

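Add SMAL, a signed 16x16 multiply with 64-bit accumulation
(64 = 64 + 16x16): for each 32-bit element of rs2, multiply its two
signed 16-bit halves and add the product to the 64-bit value in rs1
(a register pair on RV32).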

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  2 +
 target/riscv/insn32.decode              |  2 +
 target/riscv/insn_trans/trans_rvp.c.inc | 51 +++++++++++++++++++++++++
 target/riscv/packed_helper.c            | 25 ++++++++++++
 4 files changed, 80 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 5aac6ba578..a37b023c53 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1316,3 +1316,5 @@ DEF_HELPER_4(kmadrs, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmaxds, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smal, i64, env, i64, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f590880750..233df941b4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -919,3 +919,5 @@ kmadrs     0110110  ..... ..... 001 ..... 1110111 @r
 kmaxds     0111110  ..... ..... 001 ..... 1110111 @r
 kmsda      0100110  ..... ..... 001 ..... 1110111 @r
 kmsxda     0100111  ..... ..... 001 ..... 1110111 @r
+
+smal       0101111  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 308fc223db..8b0728fc5a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -451,3 +451,54 @@ GEN_RVP_R_ACC_OOL(kmadrs);
 GEN_RVP_R_ACC_OOL(kmaxds);
 GEN_RVP_R_ACC_OOL(kmsda);
 GEN_RVP_R_ACC_OOL(kmsxda);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static bool
+r_d64_s64_ool(DisasContext *ctx, arg_r *a,
+              void (*fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv))
+{
+    TCGv src2;
+    TCGv_i64 src1, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new_i64();
+
+    if (is_32bit(ctx)) {
+        TCGv t0, t1;
+        t0 = tcg_temp_new();
+        t1 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        gen_get_gpr(t1, a->rs1 + 1);
+        tcg_gen_concat_tl_i64(src1, t0, t1);
+        tcg_temp_free(t0);
+        tcg_temp_free(t1);
+    } else {
+        TCGv t0;
+        t0 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        tcg_gen_ext_tl_i64(src1, t0);
+        tcg_temp_free(t0);
+    }
+
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    set_pair_regs(ctx, dst, a->rd);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(dst);
+    tcg_temp_free(src2);
+    return true;
+}
+
+#define GEN_RVP_R_D64_S64_OOL(NAME)                    \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_s64_ool(s, a, gen_helper_##NAME);     \
+}
+
+GEN_RVP_R_D64_S64_OOL(smal);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 88509fd118..1f9a5d620f 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1944,3 +1944,28 @@ static inline void do_kmsxda(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmsxda, 1, 4);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline void do_smal(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va;
+    int16_t *b = vb;
+
+    if (i == 0) {
+        *d = *a;
+    }
+
+    *d += b[H2(i)] * b[H2(i + 1)];
+}
+
+uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
+{
+    int i;
+    int64_t result = 0;
+
+    for (i = 0; i < sizeof(target_ulong) / 2; i += 2) {
+        do_smal(env, &result, &a, &b, i);
+    }
+    return result;
+}
-- 
2.25.1
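
As a cross-check of what helper_smal computes, here is a small standalone model, offered as a sketch rather than as the patch's implementation. It assumes rs2 is presented as an array of 32-bit elements (one on RV32, two on RV64) and that the addition wraps without saturation, as the helper above does. The names sext16, smal_ref and check_smal_ref are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reinterpret the low 16 bits of v as a signed value without relying on
 * implementation-defined narrowing conversions.
 */
static int32_t sext16(uint32_t v)
{
    v &= 0xffff;
    return (v & 0x8000) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/*
 * Hypothetical reference for SMAL: add bottom * top of every 32-bit
 * element of rs2 to the 64-bit accumulator taken from rs1.
 */
uint64_t smal_ref(uint64_t acc, const uint32_t *rs2_elems, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t lo = sext16(rs2_elems[i]);
        int32_t hi = sext16(rs2_elems[i] >> 16);

        /* |lo * hi| <= 2^30, so the product fits in int32_t. */
        acc += (uint64_t)(int64_t)(lo * hi);
    }
    return acc;
}

void check_smal_ref(void)
{
    const uint32_t rv32[] = { 0x00030002 };             /* 3 * 2 = 6 */
    const uint32_t rv64[] = { 0xffff0002, 0x00040005 }; /* -2 + 20 = 18 */

    assert(smal_ref(10, rv32, 1) == 16);
    assert(smal_ref(100, rv64, 2) == 118);
    /* A negative product wraps modulo 2^64. */
    assert(smal_ref(0, rv64, 1) == (uint64_t)-2);
}
```

Keeping the accumulator unsigned makes the wrap-around well defined in C; the result has the same bit pattern a two's-complement signed addition would produce.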



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 19/37] target/riscv: Partial-SIMD Miscellaneous Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair Francis, LIU Zhiwei

Add the partial-SIMD miscellaneous instructions: 32-bit signed and
unsigned clip (sclip32, uclip32); 32-bit count of leading redundant
sign bits, leading zeros, and leading ones (clrs32, clz32, clo32);
and parallel byte sum of absolute differences, without and with
accumulation (pbsad, pbsada).

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 target/riscv/helper.h                   |  8 +++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++
 target/riscv/packed_helper.c            | 75 +++++++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a37b023c53..35c8c61b00 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1318,3 +1318,11 @@ DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
 
 DEF_HELPER_3(smal, i64, env, i64, tl)
+
+DEF_HELPER_3(sclip32, tl, env, tl, tl)
+DEF_HELPER_3(uclip32, tl, env, tl, tl)
+DEF_HELPER_2(clrs32, tl, env, tl)
+DEF_HELPER_2(clz32, tl, env, tl)
+DEF_HELPER_2(clo32, tl, env, tl)
+DEF_HELPER_3(pbsad, tl, env, tl, tl)
+DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 233df941b4..ce8bdee34b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -921,3 +921,11 @@ kmsda      0100110  ..... ..... 001 ..... 1110111 @r
 kmsxda     0100111  ..... ..... 001 ..... 1110111 @r
 
 smal       0101111  ..... ..... 001 ..... 1110111 @r
+
+sclip32    1110010  ..... ..... 000 ..... 1110111 @sh5
+uclip32    1111010  ..... ..... 000 ..... 1110111 @sh5
+clrs32     1010111  11000 ..... 000 ..... 1110111 @r2
+clz32      1010111  11001 ..... 000 ..... 1110111 @r2
+clo32      1010111  11011 ..... 000 ..... 1110111 @r2
+pbsad      1111110  ..... ..... 000 ..... 1110111 @r
+pbsada     1111111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 8b0728fc5a..43e7e5a75d 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -502,3 +502,12 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)    \
 }
 
 GEN_RVP_R_D64_S64_OOL(smal);
+
+/* Partial-SIMD Miscellaneous Instructions */
+GEN_RVP_SHIFTI(sclip32, NULL, gen_helper_sclip32);
+GEN_RVP_SHIFTI(uclip32, NULL, gen_helper_uclip32);
+GEN_RVP_R2_OOL(clrs32);
+GEN_RVP_R2_OOL(clz32);
+GEN_RVP_R2_OOL(clo32);
+GEN_RVP_R_OOL(pbsad);
+GEN_RVP_R_ACC_OOL(pbsada);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 1f9a5d620f..1f2b90c394 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1969,3 +1969,78 @@ uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
     }
     return result;
 }
+
+/* Partial-SIMD Miscellaneous Instructions */
+static inline void do_sclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip32, 1, 4);
+
+static inline void do_uclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip32, 1, 4);
+
+static inline void do_clrs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]);
+}
+
+RVPR2(clrs32, 1, 4);
+
+static inline void do_clz32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clz32(a[i]);
+}
+
+RVPR2(clz32, 1, 4);
+
+static inline void do_clo32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clo32(a[i]);
+}
+
+RVPR2(clo32, 1, 4);
+
+static inline void do_pbsad(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_ulong *d = vd;
+    uint8_t *a = va, *b = vb;
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR(pbsad, 1, 1);
+
+static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    target_ulong *d = vd, *c = vc;
+    uint8_t *a = va, *b = vb;
+    if (i == 0) {
+        *d += *c;
+    }
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR_ACC(pbsada, 1, 1);
-- 
2.25.1




* [PATCH v2 19/37] target/riscv: Partial-SIMD Miscellaneous Instructions
@ 2021-06-10  7:58   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson,
	LIU Zhiwei, Alistair Francis

Add the 32-bit signed and unsigned clip instructions (sclip32,
uclip32), the 32-bit leading redundant sign, leading zero and
leading one counts (clrs32, clz32, clo32), and the parallel byte
sum of absolute differences, without and with accumulation
(pbsad, pbsada).

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
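Illustration only, not part of the patch: a minimal standalone sketch of
the arithmetic described above. It assumes that the signed clip limits a
value to [-2^imm, 2^imm - 1] and the unsigned clip to [0, 2^imm - 1], and
that pbsada adds the byte-wise absolute differences to an accumulator.
The function names and the `sat` out-parameter (standing in for the
vxsat flag) are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clip x to [-2^bits, 2^bits - 1]; bits is expected to be 0..31. */
static int32_t sclip32_sketch(int32_t x, unsigned bits, bool *sat)
{
    int64_t max = ((int64_t)1 << bits) - 1;
    int64_t min = -((int64_t)1 << bits);

    if (x > max) {
        *sat = true;            /* models setting the saturation flag */
        return (int32_t)max;
    }
    if (x < min) {
        *sat = true;
        return (int32_t)min;
    }
    return x;
}

/* Clip x to [0, 2^bits - 1]; bits is expected to be 0..31. */
static int32_t uclip32_sketch(int32_t x, unsigned bits, bool *sat)
{
    int64_t max = ((int64_t)1 << bits) - 1;

    if (x < 0) {
        *sat = true;
        return 0;
    }
    if (x > max) {
        *sat = true;
        return (int32_t)max;
    }
    return x;
}

/* Add |a.byte[i] - b.byte[i]| for the low nbytes bytes to acc. */
static uint64_t pbsada_sketch(uint64_t acc, uint64_t a, uint64_t b,
                              unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        int x = (int)((a >> (8 * i)) & 0xff);
        int y = (int)((b >> (8 * i)) & 0xff);

        acc += (uint64_t)(x > y ? x - y : y - x);
    }
    return acc;
}
```

Calling pbsada_sketch() with acc = 0 gives the non-accumulating form
(pbsad). nbytes would be 4 for a 32-bit register and 8 for a 64-bit one.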
 target/riscv/helper.h                   |  8 +++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++
 target/riscv/packed_helper.c            | 75 +++++++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a37b023c53..35c8c61b00 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1318,3 +1318,11 @@ DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
 
 DEF_HELPER_3(smal, i64, env, i64, tl)
+
+DEF_HELPER_3(sclip32, tl, env, tl, tl)
+DEF_HELPER_3(uclip32, tl, env, tl, tl)
+DEF_HELPER_2(clrs32, tl, env, tl)
+DEF_HELPER_2(clz32, tl, env, tl)
+DEF_HELPER_2(clo32, tl, env, tl)
+DEF_HELPER_3(pbsad, tl, env, tl, tl)
+DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 233df941b4..ce8bdee34b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -921,3 +921,11 @@ kmsda      0100110  ..... ..... 001 ..... 1110111 @r
 kmsxda     0100111  ..... ..... 001 ..... 1110111 @r
 
 smal       0101111  ..... ..... 001 ..... 1110111 @r
+
+sclip32    1110010  ..... ..... 000 ..... 1110111 @sh5
+uclip32    1111010  ..... ..... 000 ..... 1110111 @sh5
+clrs32     1010111  11000 ..... 000 ..... 1110111 @r2
+clz32      1010111  11001 ..... 000 ..... 1110111 @r2
+clo32      1010111  11011 ..... 000 ..... 1110111 @r2
+pbsad      1111110  ..... ..... 000 ..... 1110111 @r
+pbsada     1111111  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 8b0728fc5a..43e7e5a75d 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -502,3 +502,12 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)    \
 }
 
 GEN_RVP_R_D64_S64_OOL(smal);
+
+/* Partial-SIMD Miscellaneous Instructions */
+GEN_RVP_SHIFTI(sclip32, NULL, gen_helper_sclip32);
+GEN_RVP_SHIFTI(uclip32, NULL, gen_helper_uclip32);
+GEN_RVP_R2_OOL(clrs32);
+GEN_RVP_R2_OOL(clz32);
+GEN_RVP_R2_OOL(clo32);
+GEN_RVP_R_OOL(pbsad);
+GEN_RVP_R_ACC_OOL(pbsada);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 1f9a5d620f..1f2b90c394 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1969,3 +1969,78 @@ uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
     }
     return result;
 }
+
+/* Partial-SIMD Miscellaneous Instructions */
+static inline void do_sclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip32, 1, 4);
+
+static inline void do_uclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip32, 1, 4);
+
+static inline void do_clrs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]);
+}
+
+RVPR2(clrs32, 1, 4);
+
+static inline void do_clz32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clz32(a[i]);
+}
+
+RVPR2(clz32, 1, 4);
+
+static inline void do_clo32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clo32(a[i]);
+}
+
+RVPR2(clo32, 1, 4);
+
+static inline void do_pbsad(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_ulong *d = vd;
+    uint8_t *a = va, *b = vb;
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR(pbsad, 1, 1);
+
+static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    target_ulong *d = vd, *c = vc;
+    uint8_t *a = va, *b = vb;
+    if (i == 0) {
+        *d += *c;
+    }
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR_ACC(pbsada, 1, 1);
-- 
2.25.1




* [PATCH v2 20/37] target/riscv: 8-bit Multiply with 32-bit Add Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

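Add smaqa, umaqa and smaqa_su. Each multiplies four pairs of 8-bit
elements (signed x signed, unsigned x unsigned, or signed x unsigned)
and adds the four products to a 32-bit accumulator
(32 = 32 + 8x8 + 8x8 + 8x8 + 8x8).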

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
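Illustration only, not part of the patch: a minimal standalone sketch of
the "32 = 32 + 8x8 + 8x8 + 8x8 + 8x8" arithmetic for one 32-bit lane,
written with shifts instead of the H1()/H4() byte-order macros used in
the helper. The function names are hypothetical. The narrowing casts to
int8_t assume the usual two's-complement conversion, and the accumulator
wraps on overflow.

```c
#include <stdint.h>

/* Signed x signed: acc + sum of four int8_t * int8_t products. */
static int32_t smaqa_lane_sketch(int32_t acc, uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)acc;   /* unsigned so that overflow wraps */

    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(a >> (8 * i));
        int8_t y = (int8_t)(b >> (8 * i));

        sum += (uint32_t)(x * y);
    }
    return (int32_t)sum;
}

/* Signed x unsigned: bytes of a are int8_t, bytes of b are uint8_t. */
static int32_t smaqa_su_lane_sketch(int32_t acc, uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)acc;

    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));

        sum += (uint32_t)(x * y);
    }
    return (int32_t)sum;
}
```

The only difference between the variants is the element type chosen for
each operand, which mirrors how do_smaqa, do_umaqa and do_smaqa_su
differ only in their pointer declarations.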
 target/riscv/helper.h                   |  4 +++
 target/riscv/insn32.decode              |  4 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  5 +++
 target/riscv/packed_helper.c            | 44 +++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 35c8c61b00..a0e3131512 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1326,3 +1326,7 @@ DEF_HELPER_2(clz32, tl, env, tl)
 DEF_HELPER_2(clo32, tl, env, tl)
 DEF_HELPER_3(pbsad, tl, env, tl, tl)
 DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
+
+DEF_HELPER_4(smaqa, tl, env, tl, tl, tl)
+DEF_HELPER_4(umaqa, tl, env, tl, tl, tl)
+DEF_HELPER_4(smaqa_su, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ce8bdee34b..96288370a6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -929,3 +929,7 @@ clz32      1010111  11001 ..... 000 ..... 1110111 @r2
 clo32      1010111  11011 ..... 000 ..... 1110111 @r2
 pbsad      1111110  ..... ..... 000 ..... 1110111 @r
 pbsada     1111111  ..... ..... 000 ..... 1110111 @r
+
+smaqa      1100100  ..... ..... 000 ..... 1110111 @r
+umaqa      1100110  ..... ..... 000 ..... 1110111 @r
+smaqa_su   1100101  ..... ..... 000 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 43e7e5a75d..1a10f13318 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -511,3 +511,8 @@ GEN_RVP_R2_OOL(clz32);
 GEN_RVP_R2_OOL(clo32);
 GEN_RVP_R_OOL(pbsad);
 GEN_RVP_R_ACC_OOL(pbsada);
+
+/* 8-bit Multiply with 32-bit Add Instructions */
+GEN_RVP_R_ACC_OOL(smaqa);
+GEN_RVP_R_ACC_OOL(umaqa);
+GEN_RVP_R_ACC_OOL(smaqa_su);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 1f2b90c394..02178d6e61 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2044,3 +2044,47 @@ static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(pbsada, 1, 1);
+
+/* 8-bit Multiply with 32-bit Add Instructions */
+static inline void do_smaqa(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int8_t *a = va, *b = vb;
+    int32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(smaqa, 1, 4);
+
+static inline void do_umaqa(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    uint8_t *a = va, *b = vb;
+    uint32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(umaqa, 1, 4);
+
+static inline void do_smaqa_su(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int8_t *a = va;
+    uint8_t *b = vb;
+    int32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(smaqa_su, 1, 4);
-- 
2.25.1




* [PATCH v2 21/37] target/riscv: 64-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

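Add the 64-bit add/subtract instructions: plain (add64, sub64),
halving (radd64, uradd64, rsub64, ursub64) and saturating (kadd64,
ukadd64, ksub64, uksub64).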

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
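Illustration only, not part of the patch: a minimal standalone sketch of
what "halving" means here, namely (a + b) >> 1 computed as if the sum
had one extra bit of precision. This is an alternative formulation that
never forms the full sum, not the method used by hadd64()/haddu64() in
the patch. The function names are hypothetical, and the signed version
assumes that >> on a negative int64_t is an arithmetic shift (true for
GCC and Clang).

```c
#include <stdint.h>

/*
 * floor((a + b) / 2) without overflow: halve each operand first, then
 * add back the carry that is lost when both low bits are set.
 */
static int64_t hadd64_sketch(int64_t a, int64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

/* Unsigned variant of the same identity. */
static uint64_t haddu64_sketch(uint64_t a, uint64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}
```

A reference model of this kind could be used to cross-check the helper
results at the extremes, for example when both inputs are INT64_MAX or
INT64_MIN.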
 target/riscv/helper.h                   |  11 ++
 target/riscv/insn32.decode              |  11 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  74 +++++++++++++
 target/riscv/packed_helper.c            | 132 ++++++++++++++++++++++++
 4 files changed, 228 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a0e3131512..192ef42d2a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1330,3 +1330,14 @@ DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
 DEF_HELPER_4(smaqa, tl, env, tl, tl, tl)
 DEF_HELPER_4(umaqa, tl, env, tl, tl, tl)
 DEF_HELPER_4(smaqa_su, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(add64, i64, env, i64, i64)
+DEF_HELPER_3(radd64, i64, env, i64, i64)
+DEF_HELPER_3(uradd64, i64, env, i64, i64)
+DEF_HELPER_3(kadd64, i64, env, i64, i64)
+DEF_HELPER_3(ukadd64, i64, env, i64, i64)
+DEF_HELPER_3(sub64, i64, env, i64, i64)
+DEF_HELPER_3(rsub64, i64, env, i64, i64)
+DEF_HELPER_3(ursub64, i64, env, i64, i64)
+DEF_HELPER_3(ksub64, i64, env, i64, i64)
+DEF_HELPER_3(uksub64, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 96288370a6..5156fa060e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -933,3 +933,14 @@ pbsada     1111111  ..... ..... 000 ..... 1110111 @r
 smaqa      1100100  ..... ..... 000 ..... 1110111 @r
 umaqa      1100110  ..... ..... 000 ..... 1110111 @r
 smaqa_su   1100101  ..... ..... 000 ..... 1110111 @r
+
+add64      1100000  ..... ..... 001 ..... 1110111 @r
+radd64     1000000  ..... ..... 001 ..... 1110111 @r
+uradd64    1010000  ..... ..... 001 ..... 1110111 @r
+kadd64     1001000  ..... ..... 001 ..... 1110111 @r
+ukadd64    1011000  ..... ..... 001 ..... 1110111 @r
+sub64      1100001  ..... ..... 001 ..... 1110111 @r
+rsub64     1000001  ..... ..... 001 ..... 1110111 @r
+ursub64    1010001  ..... ..... 001 ..... 1110111 @r
+ksub64     1001001  ..... ..... 001 ..... 1110111 @r
+uksub64    1011001  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 1a10f13318..e04c79931d 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -516,3 +516,77 @@ GEN_RVP_R_ACC_OOL(pbsada);
 GEN_RVP_R_ACC_OOL(smaqa);
 GEN_RVP_R_ACC_OOL(umaqa);
 GEN_RVP_R_ACC_OOL(smaqa_su);
+
+/*
+ *** 64-bit Profile Instructions
+ */
+/* 64-bit Addition & Subtraction Instructions */
+static bool
+r_d64_s64_s64_ool(DisasContext *ctx, arg_r *a,
+                  void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
+{
+    TCGv t1, t2;
+    TCGv_i64 src1, src2, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+
+    if (is_32bit(ctx)) {
+        TCGv a0, a1, b0, b1;
+        a0 = tcg_temp_new();
+        a1 = tcg_temp_new();
+        b0 = tcg_temp_new();
+        b1 = tcg_temp_new();
+
+        gen_get_gpr(a0, a->rs1);
+        gen_get_gpr(a1, a->rs1 + 1);
+        tcg_gen_concat_tl_i64(src1, a0, a1);
+        gen_get_gpr(b0, a->rs2);
+        gen_get_gpr(b1, a->rs2 + 1);
+        tcg_gen_concat_tl_i64(src2, b0, b1);
+
+        tcg_temp_free(a0);
+        tcg_temp_free(a1);
+        tcg_temp_free(b0);
+        tcg_temp_free(b1);
+    } else {
+        t1 = tcg_temp_new();
+        t2 = tcg_temp_new();
+        gen_get_gpr(t1, a->rs1);
+        tcg_gen_ext_tl_i64(src1, t1);
+        gen_get_gpr(t2, a->rs2);
+        tcg_gen_ext_tl_i64(src2, t2);
+        tcg_temp_free(t1);
+        tcg_temp_free(t2);
+    }
+
+    fn(dst, cpu_env, src1, src2);
+    set_pair_regs(ctx, dst, a->rd);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(src2);
+    tcg_temp_free_i64(dst);
+    return true;
+}
+
+#define GEN_RVP_R_D64_S64_S64_OOL(NAME)                   \
+static bool trans_##NAME(DisasContext *s, arg_r *a)       \
+{                                                         \
+    return r_d64_s64_s64_ool(s, a, gen_helper_##NAME);    \
+}
+
+GEN_RVP_R_D64_S64_S64_OOL(add64);
+GEN_RVP_R_D64_S64_S64_OOL(radd64);
+GEN_RVP_R_D64_S64_S64_OOL(uradd64);
+GEN_RVP_R_D64_S64_S64_OOL(kadd64);
+GEN_RVP_R_D64_S64_S64_OOL(ukadd64);
+GEN_RVP_R_D64_S64_S64_OOL(sub64);
+GEN_RVP_R_D64_S64_S64_OOL(rsub64);
+GEN_RVP_R_D64_S64_S64_OOL(ursub64);
+GEN_RVP_R_D64_S64_S64_OOL(ksub64);
+GEN_RVP_R_D64_S64_S64_OOL(uksub64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 02178d6e61..b8be234d97 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2088,3 +2088,135 @@ static inline void do_smaqa_su(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(smaqa_su, 1, 4);
+
+/*
+ *** 64-bit Profile Instructions
+ */
+/* 64-bit Addition & Subtraction Instructions */
+
+/* Define a common function to loop elements in packed register */
+static inline uint64_t
+rvpr64_64_64(CPURISCVState *env, uint64_t a, uint64_t b,
+             uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(uint64_t) / size;
+    uint64_t result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR64_64_64(NAME, STEP, SIZE)                                    \
+uint64_t HELPER(NAME)(CPURISCVState *env, uint64_t a, uint64_t b)         \
+{                                                                         \
+    return rvpr64_64_64(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);  \
+}
+
+static inline void do_add64(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = *a + *b;
+}
+
+RVPR64_64_64(add64, 1, 8);
+
+static inline int64_t hadd64(int64_t a, int64_t b)
+{
+    int64_t res = a + b;
+    int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_radd64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = hadd64(*a, *b);
+}
+
+RVPR64_64_64(radd64, 1, 8);
+
+static inline uint64_t haddu64(uint64_t a, uint64_t b)
+{
+    uint64_t res = a + b;
+    bool over = res < a;
+
+    return over ? ((res >> 1) | INT64_MIN) : (res >> 1);
+}
+
+static inline void do_uradd64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = haddu64(*a, *b);
+}
+
+RVPR64_64_64(uradd64, 1, 8);
+
+static inline void do_kadd64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = sadd64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(kadd64, 1, 8);
+
+static inline void do_ukadd64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = saddu64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(ukadd64, 1, 8);
+
+static inline void do_sub64(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = *a - *b;
+}
+
+RVPR64_64_64(sub64, 1, 8);
+
+static inline void do_rsub64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = hsub64(*a, *b);
+}
+
+RVPR64_64_64(rsub64, 1, 8);
+
+static inline void do_ursub64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = hsubu64(*a, *b);
+}
+
+RVPR64_64_64(ursub64, 1, 8);
+
+static inline void do_ksub64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = ssub64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(ksub64, 1, 8);
+
+static inline void do_uksub64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = ssubu64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(uksub64, 1, 8);
-- 
2.25.1




* [PATCH v2 22/37] target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

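Add instructions that multiply two 32-bit values and add the product
to, or subtract it from, a 64-bit accumulator, with or without
saturation (smar64, smsr64, umar64, umsr64, kmar64, kmsr64, ukmar64,
ukmsr64).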

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
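Illustration only, not part of the patch: a minimal standalone sketch of
a signed 32x32 multiply added to a 64-bit accumulator, once with
wrap-around and once with saturation. It models a single 32-bit product
only and says nothing about how the helpers treat wider registers. All
names are hypothetical, and the `sat` out-parameter stands in for the
saturation flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturating signed 64-bit add; sets *sat when the result is clamped. */
static int64_t sat_add64_sketch(int64_t a, int64_t b, bool *sat)
{
    if (b > 0 && a > INT64_MAX - b) {
        *sat = true;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {
        *sat = true;
        return INT64_MIN;
    }
    return a + b;
}

/* Wrapping form: acc + a * b, computed modulo 2^64. */
static int64_t smar64_sketch(int64_t acc, int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * b;      /* always fits in 64 bits */

    return (int64_t)((uint64_t)acc + (uint64_t)prod);
}

/* Saturating form: acc + a * b, clamped to the int64_t range. */
static int64_t kmar64_sketch(int64_t acc, int32_t a, int32_t b, bool *sat)
{
    return sat_add64_sketch(acc, (int64_t)a * b, sat);
}
```

The subtracting forms would follow the same pattern with the product
negated or subtracted, and the unsigned forms with uint32_t operands and
a uint64_t accumulator.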
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |   9 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  67 ++++++++++
 target/riscv/packed_helper.c            | 155 ++++++++++++++++++++++++
 4 files changed, 240 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 192ef42d2a..c3c086bed0 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1341,3 +1341,12 @@ DEF_HELPER_3(rsub64, i64, env, i64, i64)
 DEF_HELPER_3(ursub64, i64, env, i64, i64)
 DEF_HELPER_3(ksub64, i64, env, i64, i64)
 DEF_HELPER_3(uksub64, i64, env, i64, i64)
+
+DEF_HELPER_4(smar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(smsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(umar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(umsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(kmar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(kmsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(ukmar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(ukmsr64, i64, env, tl, tl, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5156fa060e..5d123bbb97 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -944,3 +944,12 @@ rsub64     1000001  ..... ..... 001 ..... 1110111 @r
 ursub64    1010001  ..... ..... 001 ..... 1110111 @r
 ksub64     1001001  ..... ..... 001 ..... 1110111 @r
 uksub64    1011001  ..... ..... 001 ..... 1110111 @r
+
+smar64     1000010  ..... ..... 001 ..... 1110111 @r
+smsr64     1000011  ..... ..... 001 ..... 1110111 @r
+umar64     1010010  ..... ..... 001 ..... 1110111 @r
+umsr64     1010011  ..... ..... 001 ..... 1110111 @r
+kmar64     1001010  ..... ..... 001 ..... 1110111 @r
+kmsr64     1001011  ..... ..... 001 ..... 1110111 @r
+ukmar64    1011010  ..... ..... 001 ..... 1110111 @r
+ukmsr64    1011011  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index e04c79931d..63b6810227 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -590,3 +590,70 @@ GEN_RVP_R_D64_S64_S64_OOL(rsub64);
 GEN_RVP_R_D64_S64_S64_OOL(ursub64);
 GEN_RVP_R_D64_S64_S64_OOL(ksub64);
 GEN_RVP_R_D64_S64_S64_OOL(uksub64);
+
+/* 32-bit Multiply with 64-bit Add/Subtract Instructions */
+
+/* Function to accumulate 64bit destination register */
+static bool
+r_d64_acc_ool(DisasContext *ctx, arg_r *a,
+              void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv, TCGv_i64))
+{
+    TCGv src1, src2;
+    TCGv_i64 dst, src3;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+
+    if (is_32bit(ctx)) {
+        TCGv t0, t1;
+        t0 = tcg_temp_new();
+        t1 = tcg_temp_new();
+
+        gen_get_gpr(t0, a->rd);
+        gen_get_gpr(t1, a->rd + 1);
+        tcg_gen_concat_tl_i64(src3, t0, t1);
+        tcg_temp_free(t0);
+        tcg_temp_free(t1);
+    } else {
+        TCGv t0;
+        t0 = tcg_temp_new();
+
+        gen_get_gpr(t0, a->rd);
+        tcg_gen_ext_tl_i64(src3, t0);
+        tcg_temp_free(t0);
+    }
+
+    fn(dst, cpu_env, src1, src2, src3);
+
+    set_pair_regs(ctx, dst, a->rd);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free_i64(src3);
+    tcg_temp_free_i64(dst);
+    return true;
+}
+
+#define GEN_RVP_R_D64_ACC_OOL(NAME)                    \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_acc_ool(s, a, gen_helper_##NAME);     \
+}
+
+GEN_RVP_R_D64_ACC_OOL(smar64);
+GEN_RVP_R_D64_ACC_OOL(smsr64);
+GEN_RVP_R_D64_ACC_OOL(umar64);
+GEN_RVP_R_D64_ACC_OOL(umsr64);
+GEN_RVP_R_D64_ACC_OOL(kmar64);
+GEN_RVP_R_D64_ACC_OOL(kmsr64);
+GEN_RVP_R_D64_ACC_OOL(ukmar64);
+GEN_RVP_R_D64_ACC_OOL(ukmsr64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index b8be234d97..59a06c604d 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2220,3 +2220,158 @@ static inline void do_uksub64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(uksub64, 1, 8);
+
+/* 32-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline uint64_t
+rvpr64_acc(CPURISCVState *env, target_ulong a,
+           target_ulong b, uint64_t c,
+           uint8_t step, uint8_t size, PackedFn4i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    uint64_t result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, &c, i);
+    }
+    return result;
+}
+
+#define RVPR64_ACC(NAME, STEP, SIZE)                                     \
+uint64_t HELPER(NAME)(CPURISCVState *env, target_ulong a,                \
+                      target_ulong b, uint64_t c)                        \
+{                                                                        \
+    return rvpr64_acc(env, a, b, c, STEP, SIZE, (PackedFn4i *)do_##NAME);\
+}
+
+static inline void do_smar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d += (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(smar64, 1, 4);
+
+static inline void do_smsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d -= (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(smsr64, 1, 4);
+
+static inline void do_umar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d += (uint64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(umar64, 1, 4);
+
+static inline void do_umsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d -= (uint64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(umsr64, 1, 4);
+
+static inline void do_kmar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    int64_t m0 =  (int64_t)a[H4(i)] * b[H4(i)];
+    if (!riscv_cpu_is_32bit(env)) {
+        int64_t m1 =  (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+        if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+            a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+            if (*c >= 0) {
+                *d = INT64_MAX;
+                env->vxsat = 1;
+            } else {
+                *d = sadd64(env, 0, *c + m0, m1);
+            }
+        } else {
+            *d = sadd64(env, 0, *c, m0 + m1);
+        }
+    } else {
+        *d = sadd64(env, 0, *c, m0);
+    }
+}
+
+RVPR64_ACC(kmar64, 1, sizeof(target_ulong));
+
+static inline void do_kmsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+
+    int64_t m0 =  (int64_t)a[H4(i)] * b[H4(i)];
+    if (!riscv_cpu_is_32bit(env)) {
+        int64_t m1 =  (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+        if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+            a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+            if (*c < 0) {
+                *d = INT64_MIN;
+                env->vxsat = 1;
+            } else {
+                *d = ssub64(env, 0, *c - m0, m1);
+            }
+        } else {
+            *d = ssub64(env, 0, *c, m0 + m1);
+        }
+    } else {
+        *d = ssub64(env, 0, *c, m0);
+    }
+}
+
+RVPR64_ACC(kmsr64, 1, sizeof(target_ulong));
+
+static inline void do_ukmar64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+
+    if (i == 0) {
+        *d = *c;
+    }
+    *d = saddu64(env, 0, *d, (uint64_t)a[H4(i)] * b[H4(i)]);
+}
+
+RVPR64_ACC(ukmar64, 1, 4);
+
+static inline void do_ukmsr64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+
+    if (i == 0) {
+        *d = *c;
+    }
+    *d = ssubu64(env, 0, *d, (uint64_t)a[H4(i)] * b[H4(i)]);
+}
+
+RVPR64_ACC(ukmsr64, 1, 4);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 23/37] target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Use one or two 16x16 multiply results as operands of an add/subtract
operation with another 64-bit operand.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
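For reference, a minimal standalone sketch of three of these
instructions on RV32, where each source register holds one pair of
signed 16-bit halves. It is illustrative only: lo16(), hi16(), acc_add()
and the *_rv32() functions are hypothetical names that do not appear in
this patch. The accumulation wraps (no saturation), so the sketch adds
through uint64_t to avoid signed-overflow undefined behaviour in C.

```c
#include <stdint.h>

/* Hypothetical accessors for the bottom and top 16-bit halves. */
static int16_t lo16(uint32_t x) { return (int16_t)(uint16_t)(x & 0xffff); }
static int16_t hi16(uint32_t x) { return (int16_t)(uint16_t)(x >> 16); }

/* Wrapping 64-bit add, done in unsigned arithmetic. */
static int64_t acc_add(int64_t acc, int64_t v)
{
    return (int64_t)((uint64_t)acc + (uint64_t)v);
}

/* smalda: acc + lo*lo + hi*hi */
static int64_t smalda_rv32(int64_t acc, uint32_t rs1, uint32_t rs2)
{
    int64_t p = (int64_t)lo16(rs1) * lo16(rs2) +
                (int64_t)hi16(rs1) * hi16(rs2);
    return acc_add(acc, p);
}

/* smalxda: acc + lo*hi + hi*lo (crossed halves) */
static int64_t smalxda_rv32(int64_t acc, uint32_t rs1, uint32_t rs2)
{
    int64_t p = (int64_t)lo16(rs1) * hi16(rs2) +
                (int64_t)hi16(rs1) * lo16(rs2);
    return acc_add(acc, p);
}

/* smalds: acc + hi*hi - lo*lo */
static int64_t smalds_rv32(int64_t acc, uint32_t rs1, uint32_t rs2)
{
    int64_t p = (int64_t)hi16(rs1) * hi16(rs2) -
                (int64_t)lo16(rs1) * lo16(rs2);
    return acc_add(acc, p);
}
```

The helpers in this patch express the same arithmetic with a[H2(i)] and
a[H2(i + 1)] as the bottom and top halves, and on RV64 they repeat it
for the second 32-bit word of each source register.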
 target/riscv/helper.h                   |  11 ++
 target/riscv/insn32.decode              |  11 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  12 ++
 target/riscv/packed_helper.c            | 151 ++++++++++++++++++++++++
 4 files changed, 185 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c3c086bed0..87a0779842 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1350,3 +1350,14 @@ DEF_HELPER_4(kmar64, i64, env, tl, tl, i64)
 DEF_HELPER_4(kmsr64, i64, env, tl, tl, i64)
 DEF_HELPER_4(ukmar64, i64, env, tl, tl, i64)
 DEF_HELPER_4(ukmsr64, i64, env, tl, tl, i64)
+
+DEF_HELPER_4(smalbb, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalbt, i64, env, tl, tl, i64)
+DEF_HELPER_4(smaltt, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalxda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalds, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalxds, i64, env, tl, tl, i64)
+DEF_HELPER_4(smaldrs, i64, env, tl, tl, i64)
+DEF_HELPER_4(smslda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smslxda, i64, env, tl, tl, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5d123bbb97..d1668b34cb 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -953,3 +953,14 @@ kmar64     1001010  ..... ..... 001 ..... 1110111 @r
 kmsr64     1001011  ..... ..... 001 ..... 1110111 @r
 ukmar64    1011010  ..... ..... 001 ..... 1110111 @r
 ukmsr64    1011011  ..... ..... 001 ..... 1110111 @r
+
+smalbb     1000100  ..... ..... 001 ..... 1110111 @r
+smalbt     1001100  ..... ..... 001 ..... 1110111 @r
+smaltt     1010100  ..... ..... 001 ..... 1110111 @r
+smalda     1000110  ..... ..... 001 ..... 1110111 @r
+smalxda    1001110  ..... ..... 001 ..... 1110111 @r
+smalds     1000101  ..... ..... 001 ..... 1110111 @r
+smaldrs    1001101  ..... ..... 001 ..... 1110111 @r
+smalxds    1010101  ..... ..... 001 ..... 1110111 @r
+smslda     1010110  ..... ..... 001 ..... 1110111 @r
+smslxda    1011110  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 63b6810227..7c91bdc888 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -657,3 +657,15 @@ GEN_RVP_R_D64_ACC_OOL(kmar64);
 GEN_RVP_R_D64_ACC_OOL(kmsr64);
 GEN_RVP_R_D64_ACC_OOL(ukmar64);
 GEN_RVP_R_D64_ACC_OOL(ukmsr64);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+GEN_RVP_R_D64_ACC_OOL(smalbb);
+GEN_RVP_R_D64_ACC_OOL(smalbt);
+GEN_RVP_R_D64_ACC_OOL(smaltt);
+GEN_RVP_R_D64_ACC_OOL(smalda);
+GEN_RVP_R_D64_ACC_OOL(smalxda);
+GEN_RVP_R_D64_ACC_OOL(smalds);
+GEN_RVP_R_D64_ACC_OOL(smaldrs);
+GEN_RVP_R_D64_ACC_OOL(smalxds);
+GEN_RVP_R_D64_ACC_OOL(smslda);
+GEN_RVP_R_D64_ACC_OOL(smslxda);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 59a06c604d..3330a2ecec 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2375,3 +2375,154 @@ static inline void do_ukmsr64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_ACC(ukmsr64, 1, 4);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline void do_smalbb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalbb, 2, 2);
+
+static inline void do_smalbt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalbt, 2, 2);
+
+static inline void do_smaltt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smaltt, 2, 2);
+
+static inline void do_smalda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)] + (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalda, 2, 2);
+
+static inline void do_smalxda(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i + 1)] + (int64_t)a[H2(i + 1)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalxda, 2, 2);
+
+static inline void do_smalds(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i + 1)] - (int64_t)a[H2(i)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalds, 2, 2);
+
+static inline void do_smaldrs(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)] - (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smaldrs, 2, 2);
+
+static inline void do_smalxds(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i)] - (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalxds, 2, 2);
+
+static inline void do_smslda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d -= (int64_t)a[H2(i)] * b[H2(i)] + (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smslda, 2, 2);
+
+static inline void do_smslxda(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d -= (int64_t)a[H2(i + 1)] * b[H2(i)] + (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smslxda, 2, 2);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 24/37] target/riscv: Non-SIMD Q15 saturation ALU Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Q15 saturation limits the result to the range
[INT16_MIN, INT16_MAX].

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  8 +++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 12 ++++
 target/riscv/packed_helper.c            | 78 +++++++++++++++++++++++++
 4 files changed, 106 insertions(+)
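
Reader's note (not part of the patch): a minimal standalone sketch of the
Q15 clamp described in the commit message. The names sat_q15, satu_u16,
q15_add, q15_mul, u16_add and the ov_flag variable are hypothetical
stand-ins chosen for illustration; the patch itself relies on sat64(),
satu64() and env->vxsat, whose definitions are not shown in this message.
The sketch also assumes that >> on a negative value is an arithmetic
shift, as it is with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky overflow flag (env->vxsat). */
static bool ov_flag;

/* Clamp a wide intermediate to [INT16_MIN, INT16_MAX], recording overflow. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) {
        ov_flag = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        ov_flag = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Clamp a wide unsigned intermediate to [0, UINT16_MAX]. */
static uint16_t satu_u16(uint64_t v)
{
    if (v > UINT16_MAX) {
        ov_flag = true;
        return UINT16_MAX;
    }
    return (uint16_t)v;
}

/* kaddh-style: add two 32-bit operands exactly, then saturate to Q15. */
static int16_t q15_add(int32_t a, int32_t b)
{
    return sat_q15((int64_t)a + b);
}

/* khmbb-style: Q15 x Q15 fractional multiply, then saturate to Q15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    return sat_q15(((int64_t)a * b) >> 15);
}

/* ukaddh-style: unsigned add, then saturate to 16 bits. */
static uint16_t u16_add(uint32_t a, uint32_t b)
{
    return satu_u16((uint64_t)a + b);
}
```

The only Q15 product that overflows is INT16_MIN * INT16_MIN, which
yields 32768 after the shift and is clamped to INT16_MAX.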

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 87a0779842..6ce22a186e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1361,3 +1361,11 @@ DEF_HELPER_4(smalxds, i64, env, tl, tl, i64)
 DEF_HELPER_4(smaldrs, i64, env, tl, tl, i64)
 DEF_HELPER_4(smslda, i64, env, tl, tl, i64)
 DEF_HELPER_4(smslxda, i64, env, tl, tl, i64)
+
+DEF_HELPER_3(kaddh, tl, env, tl, tl)
+DEF_HELPER_3(ksubh, tl, env, tl, tl)
+DEF_HELPER_3(khmbb, tl, env, tl, tl)
+DEF_HELPER_3(khmbt, tl, env, tl, tl)
+DEF_HELPER_3(khmtt, tl, env, tl, tl)
+DEF_HELPER_3(ukaddh, tl, env, tl, tl)
+DEF_HELPER_3(uksubh, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d1668b34cb..f465851f03 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -964,3 +964,11 @@ smaldrs    1001101  ..... ..... 001 ..... 1110111 @r
 smalxds    1010101  ..... ..... 001 ..... 1110111 @r
 smslda     1010110  ..... ..... 001 ..... 1110111 @r
 smslxda    1011110  ..... ..... 001 ..... 1110111 @r
+
+kaddh      0000010  ..... ..... 001 ..... 1110111 @r
+ksubh      0000011  ..... ..... 001 ..... 1110111 @r
+khmbb      0000110  ..... ..... 001 ..... 1110111 @r
+khmbt      0001110  ..... ..... 001 ..... 1110111 @r
+khmtt      0010110  ..... ..... 001 ..... 1110111 @r
+ukaddh     0001010  ..... ..... 001 ..... 1110111 @r
+uksubh     0001011  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 7c91bdc888..48eb190bc6 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -669,3 +669,15 @@ GEN_RVP_R_D64_ACC_OOL(smaldrs);
 GEN_RVP_R_D64_ACC_OOL(smalxds);
 GEN_RVP_R_D64_ACC_OOL(smslda);
 GEN_RVP_R_D64_ACC_OOL(smslxda);
+
+/*
+ *** Non-SIMD Instructions
+ */
+/* Non-SIMD Q15 saturation ALU Instructions */
+GEN_RVP_R_OOL(kaddh);
+GEN_RVP_R_OOL(ksubh);
+GEN_RVP_R_OOL(khmbb);
+GEN_RVP_R_OOL(khmbt);
+GEN_RVP_R_OOL(khmtt);
+GEN_RVP_R_OOL(ukaddh);
+GEN_RVP_R_OOL(uksubh);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 3330a2ecec..171f88face 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2526,3 +2526,81 @@ static inline void do_smslxda(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_ACC(smslxda, 2, 2);
+
+/* Q15 saturation instructions */
+static inline void do_kaddh(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H4(i)] + b[H4(i)], 15);
+}
+
+RVPR(kaddh, 2, 4);
+
+static inline void do_ksubh(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H4(i)] - b[H4(i)], 15);
+}
+
+RVPR(ksubh, 2, 4);
+
+static inline void do_khmbb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i)] * b[H2(i)] >> 15, 15);
+}
+
+RVPR(khmbb, 4, 2);
+
+static inline void do_khmbt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmbt, 4, 2);
+
+static inline void do_khmtt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i + 1)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmtt, 4, 2);
+
+static inline void do_ukaddh(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int16_t)satu64(env, saddu32(env, 0, a[H4(i)], b[H4(i)]), 16);
+}
+
+RVPR(ukaddh, 2, 4);
+
+static inline void do_uksubh(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int16_t)satu64(env, ssubu32(env, 0, a[H4(i)], b[H4(i)]), 16);
+}
+
+RVPR(uksubh, 2, 4);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 25/37] target/riscv: Non-SIMD Q31 saturation ALU Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

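Add the non-SIMD Q31 saturating ALU instructions. Q31 saturation clamps
the result to the range [INT32_MIN, INT32_MAX]; the unsigned variants
(ukaddw, uksubw) clamp to [0, UINT32_MAX].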

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  15 ++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  17 ++
 target/riscv/packed_helper.c            | 214 ++++++++++++++++++++++++
 4 files changed, 262 insertions(+)
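
Reader's note (not part of the patch): a minimal standalone sketch of
three behaviours in this patch: the saturating doubling multiply
(kdmbb-style), the bidirectional shift with a signed 6-bit amount
(kslraw-style), and the saturating absolute value (kabsw-style). All
names here (sat_q31, q31_dmul, q31_shift, q31_abs, ov_flag) are
hypothetical stand-ins for illustration; the patch uses sat64(),
sextract32() and env->vxsat instead. The sketch assumes that >> on a
negative value is an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky overflow flag (env->vxsat). */
static bool ov_flag;

/* Clamp a wide intermediate to [INT32_MIN, INT32_MAX], recording overflow. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) {
        ov_flag = true;
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        ov_flag = true;
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* kdmbb-style: 2 * a * b; only INT16_MIN * INT16_MIN can overflow. */
static int32_t q31_dmul(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        ov_flag = true;
        return INT32_MAX;
    }
    return (int32_t)(2 * (int64_t)a * b);
}

/*
 * kslraw-style: the low 6 bits of the amount are a signed value in
 * [-32, 31]. Non-negative amounts shift left with saturation; negative
 * amounts shift right arithmetically, with -32 treated as 31.
 */
static int32_t q31_shift(int32_t a, uint32_t amount)
{
    int sh = (int)(amount & 0x3f);

    if (sh >= 32) {
        sh -= 64;               /* sign-extend the 6-bit field */
    }
    if (sh >= 0) {
        /* Multiply instead of shifting so negative inputs stay defined. */
        return sat_q31((int64_t)a * ((int64_t)1 << sh));
    }
    sh = -sh;
    if (sh == 32) {
        sh = 31;
    }
    return a >> sh;
}

/* kabsw-style: |a|, with INT32_MIN clamped to INT32_MAX. */
static int32_t q31_abs(int32_t a)
{
    if (a == INT32_MIN) {
        ov_flag = true;
        return INT32_MAX;
    }
    return a < 0 ? -a : a;
}
```

Widening to 64 bits before the left shift is what lets the clamp see the
exact result; a 32-bit shift would already have discarded the overflow.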

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6ce22a186e..b3485f95a2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1369,3 +1369,18 @@ DEF_HELPER_3(khmbt, tl, env, tl, tl)
 DEF_HELPER_3(khmtt, tl, env, tl, tl)
 DEF_HELPER_3(ukaddh, tl, env, tl, tl)
 DEF_HELPER_3(uksubh, tl, env, tl, tl)
+
+DEF_HELPER_3(kaddw, tl, env, tl, tl)
+DEF_HELPER_3(ukaddw, tl, env, tl, tl)
+DEF_HELPER_3(ksubw, tl, env, tl, tl)
+DEF_HELPER_3(uksubw, tl, env, tl, tl)
+DEF_HELPER_3(kdmbb, tl, env, tl, tl)
+DEF_HELPER_3(kdmbt, tl, env, tl, tl)
+DEF_HELPER_3(kdmtt, tl, env, tl, tl)
+DEF_HELPER_3(kslraw, tl, env, tl, tl)
+DEF_HELPER_3(kslraw_u, tl, env, tl, tl)
+DEF_HELPER_3(ksllw, tl, env, tl, tl)
+DEF_HELPER_4(kdmabb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmabt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmatt, tl, env, tl, tl, tl)
+DEF_HELPER_2(kabsw, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f465851f03..a25294baab 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -972,3 +972,19 @@ khmbt      0001110  ..... ..... 001 ..... 1110111 @r
 khmtt      0010110  ..... ..... 001 ..... 1110111 @r
 ukaddh     0001010  ..... ..... 001 ..... 1110111 @r
 uksubh     0001011  ..... ..... 001 ..... 1110111 @r
+
+kaddw      0000000  ..... ..... 001 ..... 1110111 @r
+ukaddw     0001000  ..... ..... 001 ..... 1110111 @r
+ksubw      0000001  ..... ..... 001 ..... 1110111 @r
+uksubw     0001001  ..... ..... 001 ..... 1110111 @r
+kdmbb      0000101  ..... ..... 001 ..... 1110111 @r
+kdmbt      0001101  ..... ..... 001 ..... 1110111 @r
+kdmtt      0010101  ..... ..... 001 ..... 1110111 @r
+kslraw     0110111  ..... ..... 001 ..... 1110111 @r
+kslraw_u   0111111  ..... ..... 001 ..... 1110111 @r
+ksllw      0010011  ..... ..... 001 ..... 1110111 @r
+kslliw     0011011  ..... ..... 001 ..... 1110111 @sh5
+kdmabb     1101001  ..... ..... 001 ..... 1110111 @r
+kdmabt     1110001  ..... ..... 001 ..... 1110111 @r
+kdmatt     1111001  ..... ..... 001 ..... 1110111 @r
+kabsw      1010110  10100 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 48eb190bc6..d2c7ab1440 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -681,3 +681,20 @@ GEN_RVP_R_OOL(khmbt);
 GEN_RVP_R_OOL(khmtt);
 GEN_RVP_R_OOL(ukaddh);
 GEN_RVP_R_OOL(uksubh);
+
+/* Non-SIMD Q31 saturation ALU Instructions */
+GEN_RVP_R_OOL(kaddw);
+GEN_RVP_R_OOL(ukaddw);
+GEN_RVP_R_OOL(ksubw);
+GEN_RVP_R_OOL(uksubw);
+GEN_RVP_R_OOL(kdmbb);
+GEN_RVP_R_OOL(kdmbt);
+GEN_RVP_R_OOL(kdmtt);
+GEN_RVP_R_OOL(kslraw);
+GEN_RVP_R_OOL(kslraw_u);
+GEN_RVP_R_OOL(ksllw);
+GEN_RVP_SHIFTI(kslliw, NULL, gen_helper_ksllw);
+GEN_RVP_R_ACC_OOL(kdmabb);
+GEN_RVP_R_ACC_OOL(kdmabt);
+GEN_RVP_R_ACC_OOL(kdmatt);
+GEN_RVP_R2_OOL(kabsw);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 171f88face..89d203730d 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2604,3 +2604,217 @@ static inline void do_uksubh(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksubh, 2, 4);
+
+/* Q31 saturation instructions */
+static inline void do_kaddw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(kaddw, 2, 4);
+
+static inline void do_ukaddw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)saddu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ukaddw, 2, 4);
+
+static inline void do_ksubw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ksubw, 2, 4);
+
+static inline void do_uksubw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(uksubw, 2, 4);
+
+static inline void do_kdmbb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+}
+
+RVPR(kdmbb, 4, 2);
+
+static inline void do_kdmbt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmbt, 4, 2);
+
+static inline void do_kdmtt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmtt, 4, 2);
+
+static inline void do_kslraw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = a[H4(i)] >> shift;
+    }
+}
+
+RVPR(kslraw, 2, 4);
+
+static inline void do_kslraw_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = vssra32(env, 0, a[H4(i)], shift);
+    }
+}
+
+RVPR(kslraw_u, 2, 4);
+
+static inline void do_ksllw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+}
+
+RVPR(ksllw, 2, 4);
+
+static inline void do_kdmabb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabb, 4, 2);
+
+static inline void do_kdmabt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabt, 4, 2);
+
+static inline void do_kdmatt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmatt, 4, 2);
+
+static inline void do_kabsw(CPURISCVState *env, void *vd, void *va, uint8_t i)
+
+{
+    target_long *d = vd;
+    int32_t *a = va;
+
+    if (a[H4(i)] == INT32_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int32_t)abs(a[H4(i)]);
+    }
+}
+
+RVPR2(kabsw, 2, 4);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 26/37] target/riscv: 32-bit Computation Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Add the 32-bit computation instructions: halving addition and
subtraction, maximum, minimum, and 32x32 multiplication with a 64-bit
result.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 +++
 target/riscv/insn32.decode              |  9 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 10 +++
 target/riscv/packed_helper.c            | 92 +++++++++++++++++++++++++
 4 files changed, 120 insertions(+)
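
Reader's note (not part of the patch): a minimal standalone sketch of
halving arithmetic and the widening multiplies. The function names below
are hypothetical and chosen for illustration. The hadd32(), hsub32() and
haddu32() helpers called by the patch are defined elsewhere in the series
and are not shown in this message, so this is only an assumption about
what they compute. The sketch also assumes that >> on a negative value
is an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdint.h>

/* raddw-style: (a + b) >> 1, widened so the sum cannot overflow. */
static int32_t halving_add(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a + b) >> 1);
}

/* rsubw-style: (a - b) >> 1, widened so the difference cannot overflow. */
static int32_t halving_sub(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a - b) >> 1);
}

/* uraddw-style: unsigned (a + b) >> 1 with a 33-bit intermediate. */
static uint32_t halving_addu(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 1);
}

/* mulsr64-style: signed 32x32 multiply producing the full 64-bit result. */
static int64_t mul_s32x32(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* mulr64-style: unsigned 32x32 multiply producing the full 64-bit result. */
static uint64_t mul_u32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Because the result is halved, these operations never need saturation:
the halved sum or difference of two 32-bit values always fits in 32 bits.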

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b3485f95a2..3063b583f3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1384,3 +1384,12 @@ DEF_HELPER_4(kdmabb, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmabt, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmatt, tl, env, tl, tl, tl)
 DEF_HELPER_2(kabsw, tl, env, tl)
+
+DEF_HELPER_3(raddw, tl, env, tl, tl)
+DEF_HELPER_3(uraddw, tl, env, tl, tl)
+DEF_HELPER_3(rsubw, tl, env, tl, tl)
+DEF_HELPER_3(ursubw, tl, env, tl, tl)
+DEF_HELPER_3(maxw, tl, env, tl, tl)
+DEF_HELPER_3(minw, tl, env, tl, tl)
+DEF_HELPER_3(mulr64, i64, env, tl, tl)
+DEF_HELPER_3(mulsr64, i64, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a25294baab..9cfe5570b0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -988,3 +988,12 @@ kdmabb     1101001  ..... ..... 001 ..... 1110111 @r
 kdmabt     1110001  ..... ..... 001 ..... 1110111 @r
 kdmatt     1111001  ..... ..... 001 ..... 1110111 @r
 kabsw      1010110  10100 ..... 000 ..... 1110111 @r2
+
+raddw      0010000  ..... ..... 001 ..... 1110111 @r
+uraddw     0011000  ..... ..... 001 ..... 1110111 @r
+rsubw      0010001  ..... ..... 001 ..... 1110111 @r
+ursubw     0011001  ..... ..... 001 ..... 1110111 @r
+maxw       1111001  ..... ..... 000 ..... 1110111 @r
+minw       1111000  ..... ..... 000 ..... 1110111 @r
+mulr64     1111000  ..... ..... 001 ..... 1110111 @r
+mulsr64    1110000  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index d2c7ab1440..b720c6e037 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -698,3 +698,13 @@ GEN_RVP_R_ACC_OOL(kdmabb);
 GEN_RVP_R_ACC_OOL(kdmabt);
 GEN_RVP_R_ACC_OOL(kdmatt);
 GEN_RVP_R2_OOL(kabsw);
+
+/* 32-bit Computation Instructions */
+GEN_RVP_R_OOL(raddw);
+GEN_RVP_R_OOL(uraddw);
+GEN_RVP_R_OOL(rsubw);
+GEN_RVP_R_OOL(ursubw);
+GEN_RVP_R_OOL(minw);
+GEN_RVP_R_OOL(maxw);
+GEN_RVP_R_D64_OOL(mulr64);
+GEN_RVP_R_D64_OOL(mulsr64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 89d203730d..c0e3b6bbdb 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2818,3 +2818,95 @@ static inline void do_kabsw(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(kabsw, 2, 4);
+
+/* 32-bit Computation Instructions */
+static inline void do_raddw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = hadd32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(raddw, 2, 4);
+
+static inline void do_uraddw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = (int32_t)haddu32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(uraddw, 2, 4);
+
+static inline void do_rsubw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = hsub32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(rsubw, 2, 4);
+
+static inline void do_ursubw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = (int32_t)hsubu64(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ursubw, 2, 4);
+
+static inline void do_maxw(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = (a[H4(i)] > b[H4(i)]) ? a[H4(i)] : b[H4(i)];
+}
+
+RVPR(maxw, 2, 4);
+
+static inline void do_minw(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = (a[H4(i)] < b[H4(i)]) ? a[H4(i)] : b[H4(i)];
+}
+
+RVPR(minw, 2, 4);
+
+static inline void do_mulr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint64_t *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (uint64_t)a[H4(0)] * b[H4(0)];
+}
+
+RVPR64(mulr64);
+
+static inline void do_mulsr64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int64_t result;
+    int32_t *a = va, *b = vb;
+
+    result = (int64_t)a[H4(0)] * b[H4(0)];
+    d[H4(1)] = result >> 32;
+    d[H4(0)] = result & UINT32_MAX;
+}
+
+RVPR64(mulsr64);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread
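
For readers following the helpers in this patch, here is a minimal sketch of
the halving-add semantics that do_raddw and do_uraddw appear to rely on. It
assumes that hadd32()/haddu32() (defined elsewhere in the series and not shown
in this message) compute (a + b) >> 1 without intermediate overflow. The
sketch_* names are hypothetical and not part of the patch. Arithmetic right
shift of negative values is assumed, and the 32-bit result is sign-extended to
a 64-bit destination as the (int32_t) cast into target_long does on RV64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not the hadd32()/haddu32() from the series. */

/* Signed halving add: widen, add, shift, then sign-extend to 64 bits. */
static int64_t sketch_raddw(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */

    return (int32_t)(sum >> 1);             /* assumes arithmetic shift */
}

/*
 * Unsigned halving add. The 32-bit result is reinterpreted as signed and
 * sign-extended, mirroring the (int32_t) cast in do_uraddw.
 */
static int64_t sketch_uraddw(uint32_t a, uint32_t b)
{
    uint64_t sum = (uint64_t)a + (uint64_t)b;

    return (int32_t)(uint32_t)(sum >> 1);
}
```

Widening before the add is what keeps the carry bit that a plain 32-bit
addition would drop; the halved value always fits back into 32 bits.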

* [PATCH v2 27/37] target/riscv: Non-SIMD Miscellaneous Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Bit reverse, average, rounding shift, extract and insert byte
instructions.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   6 +
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc | 241 ++++++++++++++++++++++++
 target/riscv/packed_helper.c            |  77 ++++++++
 4 files changed, 340 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3063b583f3..bdd5ca1251 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1393,3 +1393,9 @@ DEF_HELPER_3(maxw, tl, env, tl, tl)
 DEF_HELPER_3(minw, tl, env, tl, tl)
 DEF_HELPER_3(mulr64, i64, env, tl, tl)
 DEF_HELPER_3(mulsr64, i64, env, tl, tl)
+
+DEF_HELPER_3(ave, tl, env, tl, tl)
+DEF_HELPER_3(sra_u, tl, env, tl, tl)
+DEF_HELPER_3(bitrev, tl, env, tl, tl)
+DEF_HELPER_3(wext, tl, env, i64, tl)
+DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9cfe5570b0..b70f6f0dc2 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -26,6 +26,7 @@
 %sh7    20:7
 %sh4    20:4
 %sh3    20:3
+%sh6    20:6
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -44,6 +45,7 @@
 &j    imm rd
 &r    rd rs1 rs2
 &r2   rd rs1
+&r4   rd rs1 rs2 rs3
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
@@ -65,6 +67,7 @@
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh7     %rs1 %rd
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
+@sh6     ......  ...... .....  ... ..... ....... &shift  shamt=%sh6      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -74,6 +77,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... &r2 %rs1 %rd
+@r4      ..... ..  ..... ..... ... ..... ....... %rs3 %rs2 %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
@@ -997,3 +1001,15 @@ maxw       1111001  ..... ..... 000 ..... 1110111 @r
 minw       1111000  ..... ..... 000 ..... 1110111 @r
 mulr64     1111000  ..... ..... 001 ..... 1110111 @r
 mulsr64    1110000  ..... ..... 001 ..... 1110111 @r
+
+ave        1110000  ..... ..... 000 ..... 1110111 @r
+sra_u      0010010  ..... ..... 001 ..... 1110111 @r
+srai_u     110101  ...... ..... 001 ..... 1110111 @sh6
+bitrev     1110011  ..... ..... 000 ..... 1110111 @r
+bitrevi    111010  ...... ..... 000 ..... 1110111 @sh6
+wext       1100111  ..... ..... 000 ..... 1110111 @r
+wexti      1101111  ..... ..... 000 ..... 1110111 @sh5
+bpick      .....00  ..... ..... 011 ..... 1110111 @r4
+insb       1010110  00 ... ..... 000 ..... 1110111 @sh3
+maddr32    1100010  ..... ..... 001 ..... 1110111 @r
+msubr32    1100011  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b720c6e037..51e140d157 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -708,3 +708,244 @@ GEN_RVP_R_OOL(minw);
 GEN_RVP_R_OOL(maxw);
 GEN_RVP_R_D64_OOL(mulr64);
 GEN_RVP_R_D64_OOL(mulsr64);
+
+/* Non-SIMD Miscellaneous Instructions */
+GEN_RVP_R_OOL(ave);
+GEN_RVP_R_OOL(sra_u);
+GEN_RVP_SHIFTI(srai_u, NULL, gen_helper_sra_u);
+GEN_RVP_R_OOL(bitrev);
+GEN_RVP_SHIFTI(bitrevi, NULL, gen_helper_bitrev);
+
+static bool
+r_s64_ool(DisasContext *ctx, arg_r *a,
+          void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+    TCGv_i64 src1;
+    TCGv src2, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    if (is_32bit(ctx)) {
+        TCGv t0, t1;
+        t0 = tcg_temp_new();
+        t1 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        gen_get_gpr(t1, a->rs1 + 1);
+        tcg_gen_concat_tl_i64(src1, t0, t1);
+        tcg_temp_free(t0);
+        tcg_temp_free(t1);
+    } else {
+        TCGv t0;
+        t0 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        tcg_gen_ext_tl_i64(src1, t0);
+        tcg_temp_free(t0);
+    }
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_S64_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_s64_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_S64_OOL(wext);
+
+static bool rvp_shifti_s64_ool(DisasContext *ctx, arg_shift *a,
+                               void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+    TCGv_i64 src1;
+    TCGv shift, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_psfoperand) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    dst = tcg_temp_new();
+
+    if (is_32bit(ctx)) {
+        TCGv t0, t1;
+        t0 = tcg_temp_new();
+        t1 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        gen_get_gpr(t1, a->rs1 + 1);
+        tcg_gen_concat_tl_i64(src1, t0, t1);
+        tcg_temp_free(t0);
+        tcg_temp_free(t1);
+    } else {
+        TCGv t0;
+        t0 = tcg_temp_new();
+        gen_get_gpr(t0, a->rs1);
+        tcg_gen_ext_tl_i64(src1, t0);
+        tcg_temp_free(t0);
+    }
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free(shift);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_SHIFTI_S64_OOL(NAME, OP)                    \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)     \
+{                                                           \
+    return rvp_shifti_s64_ool(s, a, gen_helper_##OP);       \
+}
+
+GEN_RVP_SHIFTI_S64_OOL(wexti, wext);
+
+typedef void gen_helper_rvp_r4(TCGv, TCGv_ptr, TCGv, TCGv, TCGv);
+
+static bool r4_ool(DisasContext *ctx, arg_r4 *a, gen_helper_rvp_r4 *fn)
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rs3);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R4_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r4 *a)   \
+{                                                      \
+    return r4_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R4_OOL(bpick);
+
+static bool trans_insb(DisasContext *ctx, arg_shift *a)
+{
+    TCGv src1, dst, b0;
+    uint8_t shift;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+    if (is_32bit(ctx)) {
+        shift = a->shamt & 0x3;
+    } else {
+        shift = a->shamt;
+    }
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+    b0 = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_andi_tl(b0, src1, 0xff);
+    tcg_gen_deposit_tl(dst, dst, b0, shift * 8, 8);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(b0);
+    return true;
+}
+
+static bool trans_maddr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_add_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
+
+static bool trans_msubr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_sub_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index c0e3b6bbdb..4e0c7a92eb 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2910,3 +2910,80 @@ static inline void do_mulsr64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64(mulsr64);
+
+/* Miscellaneous Instructions */
+static inline void do_ave(CPURISCVState *env, void *vd, void *va,
+                          void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, half;
+
+    half = hadd64(*a, *b);
+    if ((*a ^ *b) & 0x1) {
+        half++;
+    }
+    *d = half;
+}
+
+RVPR(ave, 1, sizeof(target_ulong));
+
+static inline void do_sra_u(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = vssra64(env, 0, *a, shift);
+}
+
+RVPR(sra_u, 1, sizeof(target_ulong));
+
+static inline void do_bitrev(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_ulong *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = revbit64(*a) >> (64 - shift - 1);
+}
+
+RVPR(bitrev, 1, sizeof(target_ulong));
+
+static inline target_ulong
+rvpr_64(CPURISCVState *env, uint64_t a, target_ulong b, PackedFn3 *fn)
+{
+    target_ulong result = 0;
+
+    fn(env, &result, &a, &b);
+    return result;
+}
+
+#define RVPR_64(NAME)                                       \
+target_ulong HELPER(NAME)(CPURISCVState *env, uint64_t a,   \
+                          target_ulong b)                   \
+{                                                           \
+    return rvpr_64(env, a, b, (PackedFn3 *)do_##NAME);      \
+}
+
+static inline void do_wext(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int64_t *a = va;
+    uint8_t b = *(uint8_t *)vb & 0x1f;
+
+    *d = sextract64(*a, b, 32);
+}
+
+RVPR_64(wext);
+
+static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, *c = vc;
+
+    *d = (*c & *a) | (~*c & *b);
+}
+
+RVPR_ACC(bpick, 1, sizeof(target_ulong));
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread
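
As a reading aid for two of the instructions in this patch, here is a minimal
plain-C sketch of what do_ave and trans_insb appear to compute on RV64. It
assumes that hadd64() (not shown in this message) returns floor((a + b) / 2)
without overflow, and that arithmetic right shift is used for negative values.
The sketch_* names and the byte index parameter n are hypothetical and not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not code from the series. */

/* Rounding average: equivalent to (a + b + 1) >> 1 without overflow. */
static int64_t sketch_ave(int64_t a, int64_t b)
{
    /* floor((a + b) / 2), computed from the halves plus the shared low bit */
    int64_t half = (a >> 1) + (b >> 1) + (a & b & 1);

    /* Same adjustment as do_ave: round up when exactly one input is odd. */
    if ((a ^ b) & 1) {
        half++;
    }
    return half;
}

/* Insert the low byte of rs1 into byte n (0..7) of rd, leaving the rest. */
static uint64_t sketch_insb(uint64_t rd, uint64_t rs1, unsigned n)
{
    uint64_t mask = (uint64_t)0xff << (8 * n);

    return (rd & ~mask) | ((rs1 & 0xff) << (8 * n));
}
```

The mask-and-merge in sketch_insb is the plain-C counterpart of the
tcg_gen_deposit_tl(dst, dst, b0, shift * 8, 8) call in trans_insb.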

* [PATCH v2 28/37] target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:58   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:58 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

SIMD 32-bit straight or crossed add/subtract with rounding, halving,
or saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
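Note: the commit message above names three result treatments (halving,
saturation, plain wrap-around) and two lane pairings (straight and
crossed). The sketch below illustrates them on a 64-bit value holding two
32-bit lanes. It is an illustration under stated assumptions, not the
patch's implementation: lane, pack, halving_add_s32, saturating_add_s32,
add32_lanes and cross_add_sub32 are hypothetical names, lane 0 is taken
to be the low word, and right shifts of negative values are assumed to be
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not names from the patch. */

/* Return 32-bit lane i of a 64-bit register image (lane 0 = low word). */
static inline uint32_t lane(uint64_t r, unsigned i)
{
    assert(i < 2);
    return (uint32_t)(r >> (32 * i));
}

/* Build a 64-bit register image from a low and a high lane. */
static inline uint64_t pack(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Signed halving add: sum in 64 bits so the carry survives, then halve.
 * Assumes >> on a negative int64_t is an arithmetic shift. */
static inline int32_t halving_add_s32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a + b) >> 1);
}

/* Signed saturating add: clamp to the int32_t range, flag overflow. */
static inline int32_t saturating_add_s32(int32_t a, int32_t b, int *ov)
{
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX) { *ov = 1; return INT32_MAX; }
    if (sum < INT32_MIN) { *ov = 1; return INT32_MIN; }
    return (int32_t)sum;
}

/* Straight pairing: lane i of a with lane i of b, wrapping per lane. */
static inline uint64_t add32_lanes(uint64_t a, uint64_t b)
{
    return pack(lane(a, 0) + lane(b, 0), lane(a, 1) + lane(b, 1));
}

/* Crossed pairing in the shape of do_cras32:
 * low lane = a.lo - b.hi, high lane = a.hi + b.lo, both wrapping. */
static inline uint64_t cross_add_sub32(uint64_t a, uint64_t b)
{
    return pack(lane(a, 0) - lane(b, 1), lane(a, 1) + lane(b, 0));
}
```

Widening to 64 bits before adding keeps the halving and saturating
variants free of intermediate overflow; the wrapping variants stay in
unsigned 32-bit arithmetic so that no carry crosses a lane boundary.
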
 include/tcg/tcg-op-gvec.h               |   4 +
 target/riscv/helper.h                   |  29 +++
 target/riscv/insn32.decode              |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  84 ++++++++
 target/riscv/packed_helper.c            | 276 ++++++++++++++++++++++++
 5 files changed, 425 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 91531ecb0b..023190e063 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -422,6 +422,8 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_shl8i_tl tcg_gen_vec_shl8i_i64
 #define tcg_gen_vec_shr8i_tl tcg_gen_vec_shr8i_i64
 #define tcg_gen_vec_sar8i_tl tcg_gen_vec_sar8i_i64
+#define tcg_gen_vec_add32_tl tcg_gen_vec_add32_i64
+#define tcg_gen_vec_sub32_tl tcg_gen_vec_sub32_i64
 #else
 #define tcg_gen_vec_add16_tl tcg_gen_vec_add16_i32
 #define tcg_gen_vec_sub16_tl tcg_gen_vec_sub16_i32
@@ -433,6 +435,8 @@ void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 #define tcg_gen_vec_shl8i_tl tcg_gen_vec_shl8i_i32
 #define tcg_gen_vec_shr8i_tl tcg_gen_vec_shr8i_i32
 #define tcg_gen_vec_sar8i_tl tcg_gen_vec_sar8i_i32
+#define tcg_gen_vec_add32_tl tcg_gen_add_i32
+#define tcg_gen_vec_sub32_tl tcg_gen_sub_i32
 #endif
 
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index bdd5ca1251..0f02e140f5 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1399,3 +1399,32 @@ DEF_HELPER_3(sra_u, tl, env, tl, tl)
 DEF_HELPER_3(bitrev, tl, env, tl, tl)
 DEF_HELPER_3(wext, tl, env, i64, tl)
 DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(radd32, i64, env, i64, i64)
+DEF_HELPER_3(uradd32, i64, env, i64, i64)
+DEF_HELPER_3(kadd32, i64, env, i64, i64)
+DEF_HELPER_3(ukadd32, i64, env, i64, i64)
+DEF_HELPER_3(rsub32, i64, env, i64, i64)
+DEF_HELPER_3(ursub32, i64, env, i64, i64)
+DEF_HELPER_3(ksub32, i64, env, i64, i64)
+DEF_HELPER_3(uksub32, i64, env, i64, i64)
+DEF_HELPER_3(cras32, i64, env, i64, i64)
+DEF_HELPER_3(rcras32, i64, env, i64, i64)
+DEF_HELPER_3(urcras32, i64, env, i64, i64)
+DEF_HELPER_3(kcras32, i64, env, i64, i64)
+DEF_HELPER_3(ukcras32, i64, env, i64, i64)
+DEF_HELPER_3(crsa32, i64, env, i64, i64)
+DEF_HELPER_3(rcrsa32, i64, env, i64, i64)
+DEF_HELPER_3(urcrsa32, i64, env, i64, i64)
+DEF_HELPER_3(kcrsa32, i64, env, i64, i64)
+DEF_HELPER_3(ukcrsa32, i64, env, i64, i64)
+DEF_HELPER_3(stas32, i64, env, i64, i64)
+DEF_HELPER_3(rstas32, i64, env, i64, i64)
+DEF_HELPER_3(urstas32, i64, env, i64, i64)
+DEF_HELPER_3(kstas32, i64, env, i64, i64)
+DEF_HELPER_3(ukstas32, i64, env, i64, i64)
+DEF_HELPER_3(stsa32, i64, env, i64, i64)
+DEF_HELPER_3(rstsa32, i64, env, i64, i64)
+DEF_HELPER_3(urstsa32, i64, env, i64, i64)
+DEF_HELPER_3(kstsa32, i64, env, i64, i64)
+DEF_HELPER_3(ukstsa32, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b70f6f0dc2..05151c6c51 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1013,3 +1013,35 @@ bpick      .....00  ..... ..... 011 ..... 1110111 @r4
 insb       1010110  00 ... ..... 000 ..... 1110111 @sh3
 maddr32    1100010  ..... ..... 001 ..... 1110111 @r
 msubr32    1100011  ..... ..... 001 ..... 1110111 @r
+
+# *** RV64P Standard Extension (in addition to RV32P) ***
+add32      0100000  ..... ..... 010 ..... 1110111 @r
+radd32     0000000  ..... ..... 010 ..... 1110111 @r
+uradd32    0010000  ..... ..... 010 ..... 1110111 @r
+kadd32     0001000  ..... ..... 010 ..... 1110111 @r
+ukadd32    0011000  ..... ..... 010 ..... 1110111 @r
+sub32      0100001  ..... ..... 010 ..... 1110111 @r
+rsub32     0000001  ..... ..... 010 ..... 1110111 @r
+ursub32    0010001  ..... ..... 010 ..... 1110111 @r
+ksub32     0001001  ..... ..... 010 ..... 1110111 @r
+uksub32    0011001  ..... ..... 010 ..... 1110111 @r
+cras32     0100010  ..... ..... 010 ..... 1110111 @r
+rcras32    0000010  ..... ..... 010 ..... 1110111 @r
+urcras32   0010010  ..... ..... 010 ..... 1110111 @r
+kcras32    0001010  ..... ..... 010 ..... 1110111 @r
+ukcras32   0011010  ..... ..... 010 ..... 1110111 @r
+crsa32     0100011  ..... ..... 010 ..... 1110111 @r
+rcrsa32    0000011  ..... ..... 010 ..... 1110111 @r
+urcrsa32   0010011  ..... ..... 010 ..... 1110111 @r
+kcrsa32    0001011  ..... ..... 010 ..... 1110111 @r
+ukcrsa32   0011011  ..... ..... 010 ..... 1110111 @r
+stas32     1111000  ..... ..... 010 ..... 1110111 @r
+rstas32    1011000  ..... ..... 010 ..... 1110111 @r
+urstas32   1101000  ..... ..... 010 ..... 1110111 @r
+kstas32    1100000  ..... ..... 010 ..... 1110111 @r
+ukstas32   1110000  ..... ..... 010 ..... 1110111 @r
+stsa32     1111001  ..... ..... 010 ..... 1110111 @r
+rstsa32    1011001  ..... ..... 010 ..... 1110111 @r
+urstsa32   1101001  ..... ..... 010 ..... 1110111 @r
+kstsa32    1100001  ..... ..... 010 ..... 1110111 @r
+ukstsa32   1110001  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 51e140d157..293c2c4597 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -949,3 +949,87 @@ static bool trans_msubr32(DisasContext *ctx, arg_r *a)
     tcg_temp_free_i32(w3);
     return true;
 }
+
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+#define GEN_RVP64_R_INLINE(NAME, VECOP, OP)              \
+static bool trans_##NAME(DisasContext *s, arg_r *a)      \
+{                                                        \
+    REQUIRE_64BIT(s);                                    \
+    return r_inline(s, a, VECOP, OP);                    \
+}
+
+GEN_RVP64_R_INLINE(add32, tcg_gen_vec_add32_tl, tcg_gen_add_tl);
+GEN_RVP64_R_INLINE(sub32, tcg_gen_vec_sub32_tl, tcg_gen_sub_tl);
+
+static bool
+r_64_ool(DisasContext *ctx, arg_r *a,
+         void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
+{
+    TCGv t1, t2;
+    TCGv_i64 src1, src2, dst;
+
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+
+    t1 = tcg_temp_new();
+    t2 = tcg_temp_new();
+    gen_get_gpr(t1, a->rs1);
+    tcg_gen_ext_tl_i64(src1, t1);
+    gen_get_gpr(t2, a->rs2);
+    tcg_gen_ext_tl_i64(src2, t2);
+
+    fn(dst, cpu_env, src1, src2);
+    tcg_gen_trunc_i64_tl(t1, dst);
+    gen_set_gpr(a->rd, t1);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(src2);
+    tcg_temp_free_i64(dst);
+    return true;
+}
+
+#define GEN_RVP64_R_OOL(NAME)                          \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    REQUIRE_64BIT(s);                                  \
+    return r_64_ool(s, a, gen_helper_##NAME);          \
+}
+
+GEN_RVP64_R_OOL(radd32);
+GEN_RVP64_R_OOL(uradd32);
+GEN_RVP64_R_OOL(kadd32);
+GEN_RVP64_R_OOL(ukadd32);
+GEN_RVP64_R_OOL(rsub32);
+GEN_RVP64_R_OOL(ursub32);
+GEN_RVP64_R_OOL(ksub32);
+GEN_RVP64_R_OOL(uksub32);
+GEN_RVP64_R_OOL(cras32);
+GEN_RVP64_R_OOL(rcras32);
+GEN_RVP64_R_OOL(urcras32);
+GEN_RVP64_R_OOL(kcras32);
+GEN_RVP64_R_OOL(ukcras32);
+GEN_RVP64_R_OOL(crsa32);
+GEN_RVP64_R_OOL(rcrsa32);
+GEN_RVP64_R_OOL(urcrsa32);
+GEN_RVP64_R_OOL(kcrsa32);
+GEN_RVP64_R_OOL(ukcrsa32);
+GEN_RVP64_R_OOL(stas32);
+GEN_RVP64_R_OOL(rstas32);
+GEN_RVP64_R_OOL(urstas32);
+GEN_RVP64_R_OOL(kstas32);
+GEN_RVP64_R_OOL(ukstas32);
+GEN_RVP64_R_OOL(stsa32);
+GEN_RVP64_R_OOL(rstsa32);
+GEN_RVP64_R_OOL(urstsa32);
+GEN_RVP64_R_OOL(kstsa32);
+GEN_RVP64_R_OOL(ukstsa32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 4e0c7a92eb..305c515132 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2987,3 +2987,279 @@ static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(bpick, 1, sizeof(target_ulong));
+
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+static inline void do_radd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint16_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR64_64_64(radd32, 1, 4);
+
+static inline void do_uradd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint16_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR64_64_64(uradd32, 1, 4);
+
+static inline void do_kadd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint16_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd32(env, 0, a[i], b[i]);
+}
+
+RVPR64_64_64(kadd32, 1, 4);
+
+static inline void do_ukadd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint16_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu32(env, 0, a[i], b[i]);
+}
+
+RVPR64_64_64(ukadd32, 1, 4);
+
+static inline void do_rsub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint16_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR64_64_64(rsub32, 1, 4);
+
+static inline void do_ursub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint16_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR64_64_64(ursub32, 1, 4);
+
+static inline void do_ksub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint16_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub32(env, 0, a[i], b[i]);
+}
+
+RVPR64_64_64(ksub32, 1, 4);
+
+static inline void do_uksub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint16_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu32(env, 0, a[i], b[i]);
+}
+
+RVPR64_64_64(uksub32, 1, 4);
+
+static inline void do_cras32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i)];
+}
+
+RVPR64_64_64(cras32, 2, 4);
+
+static inline void do_rcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(rcras32, 2, 4);
+
+static inline void do_urcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(urcras32, 2, 4);
+
+static inline void do_kcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(kcras32, 2, 4);
+
+static inline void do_ukcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(ukcras32, 2, 4);
+
+static inline void do_crsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i)];
+}
+
+RVPR64_64_64(crsa32, 2, 4);
+
+static inline void do_rcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(rcrsa32, 2, 4);
+
+static inline void do_urcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(urcrsa32, 2, 4);
+
+static inline void do_kcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(kcrsa32, 2, 4);
+
+static inline void do_ukcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR64_64_64(ukcrsa32, 2, 4);
+
+static inline void do_stas32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i + 1)];
+}
+
+RVPR64_64_64(stas32, 2, 4);
+
+static inline void do_rstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(rstas32, 2, 4);
+
+static inline void do_urstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(urstas32, 2, 4);
+
+static inline void do_kstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(kstas32, 2, 4);
+
+static inline void do_ukstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(ukstas32, 2, 4);
+
+static inline void do_stsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i + 1)];
+}
+
+RVPR64_64_64(stsa32, 2, 4);
+
+static inline void do_rstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(rstsa32, 2, 4);
+
+static inline void do_urstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(urstsa32, 2, 4);
+
+static inline void do_kstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(kstsa32, 2, 4);
+
+static inline void do_ukstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR64_64_64(ukstsa32, 2, 4);
-- 
2.25.1




* [PATCH v2 29/37] target/riscv: RV64 Only SIMD 32-bit Shift Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

SIMD 32-bit right shift with rounding or left shift with
saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |  15 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  55 +++++++++++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 4 files changed, 183 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0f02e140f5..3b2a73db9a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1428,3 +1428,12 @@ DEF_HELPER_3(rstsa32, i64, env, i64, i64)
 DEF_HELPER_3(urstsa32, i64, env, i64, i64)
 DEF_HELPER_3(kstsa32, i64, env, i64, i64)
 DEF_HELPER_3(ukstsa32, i64, env, i64, i64)
+
+DEF_HELPER_3(sra32, i64, env, i64, i64)
+DEF_HELPER_3(sra32_u, i64, env, i64, i64)
+DEF_HELPER_3(srl32, i64, env, i64, i64)
+DEF_HELPER_3(srl32_u, i64, env, i64, i64)
+DEF_HELPER_3(sll32, i64, env, i64, i64)
+DEF_HELPER_3(ksll32, i64, env, i64, i64)
+DEF_HELPER_3(kslra32, i64, env, i64, i64)
+DEF_HELPER_3(kslra32_u, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 05151c6c51..80150c693a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1045,3 +1045,18 @@ rstsa32    1011001  ..... ..... 010 ..... 1110111 @r
 urstsa32   1101001  ..... ..... 010 ..... 1110111 @r
 kstsa32    1100001  ..... ..... 010 ..... 1110111 @r
 ukstsa32   1110001  ..... ..... 010 ..... 1110111 @r
+
+sra32      0101000  ..... ..... 010 ..... 1110111 @r
+sra32_u    0110000  ..... ..... 010 ..... 1110111 @r
+srai32     0111000  ..... ..... 010 ..... 1110111 @sh5
+srai32_u   1000000  ..... ..... 010 ..... 1110111 @sh5
+srl32      0101001  ..... ..... 010 ..... 1110111 @r
+srl32_u    0110001  ..... ..... 010 ..... 1110111 @r
+srli32     0111001  ..... ..... 010 ..... 1110111 @sh5
+srli32_u   1000001  ..... ..... 010 ..... 1110111 @sh5
+sll32      0101010  ..... ..... 010 ..... 1110111 @r
+slli32     0111010  ..... ..... 010 ..... 1110111 @sh5
+ksll32     0110010  ..... ..... 010 ..... 1110111 @r
+kslli32    1000010  ..... ..... 010 ..... 1110111 @sh5
+kslra32    0101011  ..... ..... 010 ..... 1110111 @r
+kslra32_u  0110011  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 293c2c4597..6cba14be84 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1033,3 +1033,58 @@ GEN_RVP64_R_OOL(rstsa32);
 GEN_RVP64_R_OOL(urstsa32);
 GEN_RVP64_R_OOL(kstsa32);
 GEN_RVP64_R_OOL(ukstsa32);
+
+/* (RV64 Only) SIMD 32-bit Shift Instructions */
+static inline bool
+rvp64_shifti(DisasContext *ctx, arg_shift *a,
+             void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
+{
+    TCGv t1;
+    TCGv_i64 src1, dst, shift;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+    t1 = tcg_temp_new();
+
+    gen_get_gpr(t1, a->rs1);
+    tcg_gen_ext_tl_i64(src1, t1);
+    shift = tcg_const_i64(a->shamt);
+
+    fn(dst, cpu_env, src1, shift);
+    tcg_gen_trunc_i64_tl(t1, dst);
+    gen_set_gpr(a->rd, t1);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(dst);
+    tcg_temp_free_i64(shift);
+    tcg_temp_free(t1);
+    return true;
+}
+
+#define GEN_RVP64_SHIFTI(NAME, OP)                       \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
+{                                                        \
+    REQUIRE_64BIT(s);                                    \
+    return rvp64_shifti(s, a, OP);                       \
+}
+
+GEN_RVP64_SHIFTI(srai32, gen_helper_sra32);
+GEN_RVP64_SHIFTI(srli32, gen_helper_srl32);
+GEN_RVP64_SHIFTI(slli32, gen_helper_sll32);
+
+GEN_RVP64_SHIFTI(srai32_u, gen_helper_sra32_u);
+GEN_RVP64_SHIFTI(srli32_u, gen_helper_srl32_u);
+GEN_RVP64_SHIFTI(kslli32, gen_helper_ksll32);
+
+GEN_RVP64_R_OOL(sra32);
+GEN_RVP64_R_OOL(srl32);
+GEN_RVP64_R_OOL(sll32);
+GEN_RVP64_R_OOL(ksll32);
+GEN_RVP64_R_OOL(kslra32);
+
+GEN_RVP64_R_OOL(sra32_u);
+GEN_RVP64_R_OOL(srl32_u);
+GEN_RVP64_R_OOL(kslra32_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 305c515132..74d42e4c33 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3263,3 +3263,107 @@ static inline void do_ukstsa32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(ukstsa32, 2, 4);
+
+/* (RV64 Only) SIMD 32-bit Shift Instructions */
+static inline void do_sra32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] >> shift;
+}
+
+RVPR64_64_64(sra32, 1, 4);
+
+static inline void do_srl32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] >> shift;
+}
+
+RVPR64_64_64(srl32, 1, 4);
+
+static inline void do_sll32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] << shift;
+}
+
+RVPR64_64_64(sll32, 1, 4);
+
+static inline void do_sra32_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = vssra32(env, 0, a[i], shift);
+}
+
+RVPR64_64_64(sra32_u, 1, 4);
+
+static inline void do_srl32_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = vssrl32(env, 0, a[i], shift);
+}
+
+RVPR64_64_64(srl32_u, 1, 4);
+
+static inline void do_ksll32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint64_t *)vb & 0x1f;
+
+    result = a[i] << shift;
+    if (shift > clrsb32(a[i])) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT32_MIN) ? INT32_MIN : INT32_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR64_64_64(ksll32, 1, 4);
+
+static inline void do_kslra32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int64_t shift = sextract64(*(uint64_t *)vb, 0, 6);
+
+    if (shift >= 0) {
+        do_ksll32(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR64_64_64(kslra32, 1, 4);
+
+static inline void do_kslra32_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        do_ksll32(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        d[i] = vssra32(env, 0, a[i], shift);
+    }
+}
+
+RVPR64_64_64(kslra32_u, 1, 4);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 30/37] target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

SIMD 32-bit absolute value, signed or unsigned maximum, minimum.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  6 +++
 target/riscv/insn32.decode              |  6 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 15 +++++++
 target/riscv/packed_helper.c            | 55 +++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3b2a73db9a..d992859747 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1437,3 +1437,9 @@ DEF_HELPER_3(sll32, i64, env, i64, i64)
 DEF_HELPER_3(ksll32, i64, env, i64, i64)
 DEF_HELPER_3(kslra32, i64, env, i64, i64)
 DEF_HELPER_3(kslra32_u, i64, env, i64, i64)
+
+DEF_HELPER_3(smin32, i64, env, i64, i64)
+DEF_HELPER_3(umin32, i64, env, i64, i64)
+DEF_HELPER_3(smax32, i64, env, i64, i64)
+DEF_HELPER_3(umax32, i64, env, i64, i64)
+DEF_HELPER_2(kabs32, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 80150c693a..ee5f855f28 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1060,3 +1060,9 @@ ksll32     0110010  ..... ..... 010 ..... 1110111 @r
 kslli32    1000010  ..... ..... 010 ..... 1110111 @sh5
 kslra32    0101011  ..... ..... 010 ..... 1110111 @r
 kslra32_u  0110011  ..... ..... 010 ..... 1110111 @r
+
+smin32     1001000  ..... ..... 010 ..... 1110111 @r
+umin32     1010000  ..... ..... 010 ..... 1110111 @r
+smax32     1001001  ..... ..... 010 ..... 1110111 @r
+umax32     1010001  ..... ..... 010 ..... 1110111 @r
+kabs32     1010110  10010 ..... 000 ..... 1110111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 6cba14be84..77586e07e4 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1088,3 +1088,18 @@ GEN_RVP64_R_OOL(kslra32);
 GEN_RVP64_R_OOL(sra32_u);
 GEN_RVP64_R_OOL(srl32_u);
 GEN_RVP64_R_OOL(kslra32_u);
+
+/* (RV64 Only) SIMD 32-bit Miscellaneous Instructions */
+GEN_RVP64_R_OOL(smin32);
+GEN_RVP64_R_OOL(umin32);
+GEN_RVP64_R_OOL(smax32);
+GEN_RVP64_R_OOL(umax32);
+
+#define GEN_RVP64_R2_OOL(NAME)                         \
+static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
+{                                                      \
+    REQUIRE_64BIT(s);                                  \
+    return r2_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP64_R2_OOL(kabs32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 74d42e4c33..a808dae9d8 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3367,3 +3367,58 @@ static inline void do_kslra32_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(kslra32_u, 1, 4);
+
+/* (RV64 Only) SIMD 32-bit Miscellaneous Instructions */
+static inline void do_smin32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR64_64_64(smin32, 1, 4);
+
+static inline void do_umin32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR64_64_64(umin32, 1, 4);
+
+static inline void do_smax32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR64_64_64(smax32, 1, 4);
+
+static inline void do_umax32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR64_64_64(umax32, 1, 4);
+
+static inline void do_kabs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+
+    if (a[i] == INT32_MIN) {
+        d[i] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs32, 1, 4);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 31/37] target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Q15 saturation limits the result to the range [INT16_MIN, INT16_MAX].

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  10 ++
 target/riscv/insn32.decode              |  10 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  19 ++++
 target/riscv/packed_helper.c            | 139 ++++++++++++++++++++++++
 4 files changed, 178 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d992859747..5edaf389e4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1443,3 +1443,13 @@ DEF_HELPER_3(umin32, i64, env, i64, i64)
 DEF_HELPER_3(smax32, i64, env, i64, i64)
 DEF_HELPER_3(umax32, i64, env, i64, i64)
 DEF_HELPER_2(kabs32, tl, env, tl)
+
+DEF_HELPER_3(khmbb16, i64, env, i64, i64)
+DEF_HELPER_3(khmbt16, i64, env, i64, i64)
+DEF_HELPER_3(khmtt16, i64, env, i64, i64)
+DEF_HELPER_3(kdmbb16, i64, env, i64, i64)
+DEF_HELPER_3(kdmbt16, i64, env, i64, i64)
+DEF_HELPER_3(kdmtt16, i64, env, i64, i64)
+DEF_HELPER_4(kdmabb16, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmabt16, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ee5f855f28..a7b5643d5f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1066,3 +1066,13 @@ umin32     1010000  ..... ..... 010 ..... 1110111 @r
 smax32     1001001  ..... ..... 010 ..... 1110111 @r
 umax32     1010001  ..... ..... 010 ..... 1110111 @r
 kabs32     1010110  10010 ..... 000 ..... 1110111 @r2
+
+khmbb16    1101110  ..... ..... 001 ..... 1110111 @r
+khmbt16    1110110  ..... ..... 001 ..... 1110111 @r
+khmtt16    1111110  ..... ..... 001 ..... 1110111 @r
+kdmbb16    1101101  ..... ..... 001 ..... 1110111 @r
+kdmbt16    1110101  ..... ..... 001 ..... 1110111 @r
+kdmtt16    1111101  ..... ..... 001 ..... 1110111 @r
+kdmabb16   1101100  ..... ..... 001 ..... 1110111 @r
+kdmabt16   1110100  ..... ..... 001 ..... 1110111 @r
+kdmatt16   1111100  ..... ..... 001 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 77586e07e4..aa97161697 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1103,3 +1103,22 @@ static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
 }
 
 GEN_RVP64_R2_OOL(kabs32);
+
+/* (RV64 Only) SIMD Q15 saturating Multiply Instructions */
+GEN_RVP64_R_OOL(khmbb16);
+GEN_RVP64_R_OOL(khmbt16);
+GEN_RVP64_R_OOL(khmtt16);
+GEN_RVP64_R_OOL(kdmbb16);
+GEN_RVP64_R_OOL(kdmbt16);
+GEN_RVP64_R_OOL(kdmtt16);
+
+#define GEN_RVP64_R_ACC_OOL(NAME)                      \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    REQUIRE_64BIT(s);                                  \
+    return r_acc_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP64_R_ACC_OOL(kdmabb16);
+GEN_RVP64_R_ACC_OOL(kdmabt16);
+GEN_RVP64_R_ACC_OOL(kdmatt16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index a808dae9d8..32e0af2ef6 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3422,3 +3422,142 @@ static inline void do_kabs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(kabs32, 1, 4);
+
+/* (RV64 Only) SIMD Q15 saturating Multiply Instructions */
+static inline void do_khmbb16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i)] * b[H2(i)] >> 15, 15);
+}
+
+RVPR64_64_64(khmbb16, 2, 2);
+
+static inline void do_khmbt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR64_64_64(khmbt16, 2, 2);
+
+static inline void do_khmtt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i + 1)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR64_64_64(khmtt16, 2, 2);
+
+static inline void do_kdmbb16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+}
+
+RVPR64_64_64(kdmbb16, 2, 2);
+
+static inline void do_kdmbt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR64_64_64(kdmbt16, 2, 2);
+
+static inline void do_kdmtt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR64_64_64(kdmtt16, 2, 2);
+
+static inline void do_kdmabb16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmabb16, 2, 2);
+
+static inline void do_kdmabt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmabt16, 2, 2);
+
+static inline void do_kdmatt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmatt16, 2, 2);
-- 
2.25.1
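
Illustration (not part of the patch): a minimal standalone sketch of the
Q15 arithmetic behind the khm*/kdm* helpers above. The names ov_flag,
q15_doubling_mul, q15_high_mul and q15_selftest are hypothetical; ov_flag
only stands in for env->vxsat. The sketch assumes that >> on a negative
int32_t is an arithmetic shift, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat: set when a result saturates. */
static bool ov_flag;

/* Q15 x Q15 -> Q31 doubling multiply (the kdm* pattern). */
static inline int32_t q15_doubling_mul(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        /* (-1.0) * (-1.0) = +1.0 is not representable in Q31. */
        ov_flag = true;
        return INT32_MAX;
    }
    /* |a * b| < 2^30 here, so doubling cannot overflow int32_t. */
    return 2 * ((int32_t)a * b);
}

/* Q15 x Q15 -> Q15 multiply keeping the high part (the khm* pattern). */
static inline int16_t q15_high_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15; /* arithmetic shift assumed */

    if (p > INT16_MAX) {
        /* Only reachable for INT16_MIN * INT16_MIN. */
        ov_flag = true;
        return INT16_MAX;
    }
    return (int16_t)p;
}

/* Small usage example: 0.5 * 0.5 = 0.25 in both formats. */
static inline void q15_selftest(void)
{
    ov_flag = false;
    assert(q15_doubling_mul(0x4000, 0x4000) == 0x20000000);
    assert(q15_high_mul(0x4000, 0x4000) == 0x2000);
    assert(!ov_flag);
    assert(q15_doubling_mul(INT16_MIN, INT16_MIN) == INT32_MAX && ov_flag);
}
```

In both cases the only input pair that can saturate is
INT16_MIN * INT16_MIN, which is why the kdm* helpers test for it directly.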



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 32/37] target/riscv: RV64 Only 32-bit Multiply Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Multiply the straight or crossed 32-bit elements of two registers.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
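Illustration (not part of the commit): a minimal standalone sketch of
what "bottom" and "top" element selection means for smbt32 and smtt32.
The helper names below are hypothetical. The sketch assumes that element
index 0 is the least-significant 32 bits of the register, and that
converting an out-of-range uint32_t to int32_t wraps (implementation-
defined in C, but the behaviour of mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>

/* Low ("bottom") signed 32-bit element of a 64-bit register value. */
static inline int32_t bottom32(uint64_t r)
{
    return (int32_t)(uint32_t)r;
}

/* High ("top") signed 32-bit element of a 64-bit register value. */
static inline int32_t top32(uint64_t r)
{
    return (int32_t)(uint32_t)(r >> 32);
}

/* Bottom element of rs1 times top element of rs2, widened to 64 bits. */
static inline int64_t smbt32_sketch(uint64_t rs1, uint64_t rs2)
{
    return (int64_t)bottom32(rs1) * top32(rs2);
}

/* Top element of rs1 times top element of rs2, widened to 64 bits. */
static inline int64_t smtt32_sketch(uint64_t rs1, uint64_t rs2)
{
    return (int64_t)top32(rs1) * top32(rs2);
}

/* Usage example: rs1 = {top = -1, bottom = 3}, rs2 = {top = 5, bottom = -2}. */
static inline void smul32_selftest(void)
{
    uint64_t rs1 = UINT64_C(0xFFFFFFFF00000003);
    uint64_t rs2 = UINT64_C(0x00000005FFFFFFFE);

    assert(smbt32_sketch(rs1, rs2) == 15);
    assert(smtt32_sketch(rs1, rs2) == -5);
}
```

A 32x32 signed product always fits in 64 bits, so no saturation is needed
for these two instructions.
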
 target/riscv/helper.h                   |  3 +++
 target/riscv/insn32.decode              |  3 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  4 ++++
 target/riscv/packed_helper.c            | 21 +++++++++++++++++++++
 4 files changed, 31 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 5edaf389e4..0fa48955d8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1453,3 +1453,6 @@ DEF_HELPER_3(kdmtt16, i64, env, i64, i64)
 DEF_HELPER_4(kdmabb16, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmabt16, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smbt32, i64, env, i64, i64)
+DEF_HELPER_3(smtt32, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a7b5643d5f..d06075c062 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1076,3 +1076,6 @@ kdmtt16    1111101  ..... ..... 001 ..... 1110111 @r
 kdmabb16   1101100  ..... ..... 001 ..... 1110111 @r
 kdmabt16   1110100  ..... ..... 001 ..... 1110111 @r
 kdmatt16   1111100  ..... ..... 001 ..... 1110111 @r
+
+smbt32     0001100  ..... ..... 010 ..... 1110111 @r
+smtt32     0010100  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index aa97161697..a88ce7a5c4 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1122,3 +1122,7 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)    \
 GEN_RVP64_R_ACC_OOL(kdmabb16);
 GEN_RVP64_R_ACC_OOL(kdmabt16);
 GEN_RVP64_R_ACC_OOL(kdmatt16);
+
+/* (RV64 Only) 32-bit Multiply Instructions */
+GEN_RVP64_R_OOL(smbt32);
+GEN_RVP64_R_OOL(smtt32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 32e0af2ef6..eb086b775f 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3561,3 +3561,24 @@ static inline void do_kdmatt16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kdmatt16, 2, 2);
+
+/* (RV64 Only) 32-bit Multiply Instructions */
+static inline void do_smbt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(2 * i)] * b[H4(2 * i + 1)];
+}
+
+RVPR64_64_64(smbt32, 1, 8);
+
+static inline void do_smtt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(2 * i + 1)] * b[H4(2 * i + 1)];
+}
+
+RVPR64_64_64(smtt32, 1, 8);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 33/37] target/riscv: RV64 Only 32-bit Multiply & Add Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

The 32x32 multiplication result is added to a third register with Q63
saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
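Illustration (not part of the commit): a minimal standalone sketch of a
32x32 multiply followed by a Q63 (signed 64-bit) saturating accumulate.
sat_add64, kmabb32_sketch and ov_flag are hypothetical names; sat_add64
is an assumed stand-in for the series' sadd64() and ov_flag for
env->vxsat, not their actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat: set when a result saturates. */
static bool ov_flag;

/* Signed 64-bit add that clamps to [INT64_MIN, INT64_MAX]. */
static inline int64_t sat_add64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b) {
        ov_flag = true;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {
        ov_flag = true;
        return INT64_MIN;
    }
    return a + b;
}

/* acc + a * b with Q63 saturation; a and b are the selected elements. */
static inline int64_t kmabb32_sketch(int64_t acc, int32_t a, int32_t b)
{
    /* The widened product is at most 2^62 in magnitude, so it never overflows. */
    return sat_add64((int64_t)a * b, acc);
}

/* Usage example covering the plain case and both saturation directions. */
static inline void kmabb32_selftest(void)
{
    ov_flag = false;
    assert(kmabb32_sketch(10, 3, -4) == -2);
    assert(!ov_flag);
    assert(kmabb32_sketch(INT64_MAX, 2, 3) == INT64_MAX && ov_flag);
    ov_flag = false;
    assert(kmabb32_sketch(INT64_MIN, -2, 3) == INT64_MIN && ov_flag);
}
```

The overflow test is done before the addition because signed overflow is
undefined behaviour in standard C.
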
 target/riscv/helper.h                   |  4 ++++
 target/riscv/insn32.decode              |  4 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  5 ++++
 target/riscv/packed_helper.c            | 31 +++++++++++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0fa48955d8..05f8f31367 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1456,3 +1456,7 @@ DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
 
 DEF_HELPER_3(smbt32, i64, env, i64, i64)
 DEF_HELPER_3(smtt32, i64, env, i64, i64)
+
+DEF_HELPER_4(kmabb32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmabt32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmatt32, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d06075c062..dec714a064 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1079,3 +1079,7 @@ kdmatt16   1111100  ..... ..... 001 ..... 1110111 @r
 
 smbt32     0001100  ..... ..... 010 ..... 1110111 @r
 smtt32     0010100  ..... ..... 010 ..... 1110111 @r
+
+kmabb32    0101101  ..... ..... 010 ..... 1110111 @r
+kmabt32    0110101  ..... ..... 010 ..... 1110111 @r
+kmatt32    0111101  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index a88ce7a5c4..2de81abbb8 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1126,3 +1126,8 @@ GEN_RVP64_R_ACC_OOL(kdmatt16);
 /* (RV64 Only) 32-bit Multiply Instructions */
 GEN_RVP64_R_OOL(smbt32);
 GEN_RVP64_R_OOL(smtt32);
+
+/* (RV64 Only) 32-bit Multiply & Add Instructions */
+GEN_RVP64_R_ACC_OOL(kmabb32);
+GEN_RVP64_R_ACC_OOL(kmabt32);
+GEN_RVP64_R_ACC_OOL(kmatt32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index eb086b775f..3c05c748c4 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3582,3 +3582,34 @@ static inline void do_smtt32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(smtt32, 1, 8);
+
+/* (RV64 Only) 32-bit Multiply & Add Instructions */
+static inline void do_kmabb32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i)] * b[H4(2 * i)], *c);
+}
+
+RVPR_ACC(kmabb32, 1, 8);
+
+static inline void do_kmabt32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i)] * b[H4(2 * i + 1)], *c);
+}
+
+RVPR_ACC(kmabt32, 1, 8);
+
+static inline void do_kmatt32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i + 1)] * b[H4(2 * i + 1)], *c);
+}
+
+RVPR_ACC(kmatt32, 1, 8);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 34/37] target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Two 32x32 results are written directly to the destination register or
added as operands to a 64-bit register.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
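Illustration (not part of the commit): a minimal standalone sketch of the
one corner case these helpers special-case. The sum of two 32x32 products
overflows int64_t only when all four operands are INT32_MIN, where it is
exactly 2^63. All names below are hypothetical; ov_flag stands in for
env->vxsat and sat_add64 is an assumed equivalent of sadd64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat: set when a result saturates. */
static bool ov_flag;

/* Signed 64-bit add that clamps to [INT64_MIN, INT64_MAX]. */
static inline int64_t sat_add64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b) {
        ov_flag = true;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {
        ov_flag = true;
        return INT64_MIN;
    }
    return a + b;
}

static inline bool all_int32_min(int32_t a0, int32_t a1, int32_t b0, int32_t b1)
{
    return a0 == INT32_MIN && a1 == INT32_MIN &&
           b0 == INT32_MIN && b1 == INT32_MIN;
}

/* a0 * b0 + a1 * b1, saturated (the kmda32 pattern). */
static inline int64_t kmda32_sketch(int32_t a0, int32_t a1,
                                    int32_t b0, int32_t b1)
{
    if (all_int32_min(a0, a1, b0, b1)) {
        ov_flag = true;            /* 2^62 + 2^62 = 2^63 does not fit */
        return INT64_MAX;
    }
    return (int64_t)a0 * b0 + (int64_t)a1 * b1;
}

/* acc + a0 * b1 + a1 * b0, saturated (the kmaxda32 pattern). */
static inline int64_t kmaxda32_sketch(int64_t acc, int32_t a0, int32_t a1,
                                      int32_t b0, int32_t b1)
{
    if (all_int32_min(a0, a1, b0, b1)) {
        if (acc < 0) {
            return (INT64_MAX + acc) + 1; /* 2^63 + acc fits exactly */
        }
        ov_flag = true;
        return INT64_MAX;
    }
    return sat_add64((int64_t)a0 * b1 + (int64_t)a1 * b0, acc);
}

/* Usage example for the ordinary case and the corner case. */
static inline void kmda32_selftest(void)
{
    ov_flag = false;
    assert(kmda32_sketch(2, 3, 4, 5) == 23);
    assert(kmaxda32_sketch(1, 2, 3, 4, 5) == 23);
    assert(kmaxda32_sketch(-1, INT32_MIN, INT32_MIN,
                           INT32_MIN, INT32_MIN) == INT64_MAX);
    assert(!ov_flag);
    assert(kmaxda32_sketch(0, INT32_MIN, INT32_MIN,
                           INT32_MIN, INT32_MIN) == INT64_MAX && ov_flag);
}
```

With a negative accumulator the exact result 2^63 + acc is representable,
so the sketch saturates only when the accumulator is non-negative.
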
 target/riscv/helper.h                   |  12 ++
 target/riscv/insn32.decode              |  12 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 ++
 target/riscv/packed_helper.c            | 182 ++++++++++++++++++++++++
 4 files changed, 219 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 05f8f31367..aa80095e1d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1460,3 +1460,15 @@ DEF_HELPER_3(smtt32, i64, env, i64, i64)
 DEF_HELPER_4(kmabb32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmabt32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmatt32, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(kmda32, i64, env, i64, i64)
+DEF_HELPER_3(kmxda32, i64, env, i64, i64)
+DEF_HELPER_4(kmaxda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmads32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmadrs32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxds32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsxda32, tl, env, tl, tl, tl)
+DEF_HELPER_3(smds32, i64, env, i64, i64)
+DEF_HELPER_3(smdrs32, i64, env, i64, i64)
+DEF_HELPER_3(smxds32, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dec714a064..b9eeb57ca7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1083,3 +1083,15 @@ smtt32     0010100  ..... ..... 010 ..... 1110111 @r
 kmabb32    0101101  ..... ..... 010 ..... 1110111 @r
 kmabt32    0110101  ..... ..... 010 ..... 1110111 @r
 kmatt32    0111101  ..... ..... 010 ..... 1110111 @r
+
+kmda32     0011100  ..... ..... 010 ..... 1110111 @r
+kmxda32    0011101  ..... ..... 010 ..... 1110111 @r
+kmaxda32   0100101  ..... ..... 010 ..... 1110111 @r
+kmads32    0101110  ..... ..... 010 ..... 1110111 @r
+kmadrs32   0110110  ..... ..... 010 ..... 1110111 @r
+kmaxds32   0111110  ..... ..... 010 ..... 1110111 @r
+kmsda32    0100110  ..... ..... 010 ..... 1110111 @r
+kmsxda32   0100111  ..... ..... 010 ..... 1110111 @r
+smds32     0101100  ..... ..... 010 ..... 1110111 @r
+smdrs32    0110100  ..... ..... 010 ..... 1110111 @r
+smxds32    0111100  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 2de81abbb8..48bcf37e36 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1131,3 +1131,16 @@ GEN_RVP64_R_OOL(smtt32);
 GEN_RVP64_R_ACC_OOL(kmabb32);
 GEN_RVP64_R_ACC_OOL(kmabt32);
 GEN_RVP64_R_ACC_OOL(kmatt32);
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+GEN_RVP64_R_OOL(kmda32);
+GEN_RVP64_R_OOL(kmxda32);
+GEN_RVP64_R_ACC_OOL(kmaxda32);
+GEN_RVP64_R_ACC_OOL(kmads32);
+GEN_RVP64_R_ACC_OOL(kmadrs32);
+GEN_RVP64_R_ACC_OOL(kmaxds32);
+GEN_RVP64_R_ACC_OOL(kmsda32);
+GEN_RVP64_R_ACC_OOL(kmsxda32);
+GEN_RVP64_R_OOL(smds32);
+GEN_RVP64_R_OOL(smdrs32);
+GEN_RVP64_R_OOL(smxds32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 3c05c748c4..834e7dbebb 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3613,3 +3613,185 @@ static inline void do_kmatt32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmatt32, 1, 8);
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+static inline void do_kmda32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    }
+}
+
+RVPR64_64_64(kmda32, 1, 8);
+
+static inline void do_kmxda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i + 1)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i)];
+    }
+}
+
+RVPR64_64_64(kmxda32, 1, 8);
+
+static inline void do_kmaxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t p1, p2;
+    p1 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    p2 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            *d = (INT64_MAX + *c) + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            *d = INT64_MAX;
+        }
+    } else {
+        *d = sadd64(env, 0, p1 + p2, *c);
+    }
+}
+
+RVPR_ACC(kmaxda32, 1, 8);
+
+static inline void do_kmads32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t1 - t0, *c);
+}
+
+RVPR_ACC(kmads32, 1, 8);
+
+static inline void do_kmadrs32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t0 - t1, *c);
+}
+
+RVPR_ACC(kmadrs32, 1, 8);
+
+static inline void do_kmaxds32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t10 - t01, *c);
+}
+
+RVPR_ACC(kmaxds32, 1, 8);
+
+static inline void do_kmsda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, *c, t0 + t1);
+    }
+}
+
+RVPR_ACC(kmsda32, 1, 8);
+
+static inline void do_kmsxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, *c, t10 + t01);
+    }
+}
+
+RVPR_ACC(kmsxda32, 1, 8);
+
+static inline void do_smds32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i + 1)] -
+         (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_64_64(smds32, 1, 8);
+
+static inline void do_smdrs32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i)] * b[H4(i)] -
+         (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+}
+
+RVPR64_64_64(smdrs32, 1, 8);
+
+static inline void do_smxds32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i)] -
+         (int64_t)a[H4(i)] * b[H4(i + 1)];
+}
+
+RVPR64_64_64(smxds32, 1, 8);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 34/37] target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
@ 2021-06-10  7:59   ` LIU Zhiwei
  0 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Alistair.Francis, palmer, bin.meng, richard.henderson, LIU Zhiwei

Two 32x32 results are written directly to the destination register or
added as operands to a 64-bit register.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  12 ++
 target/riscv/insn32.decode              |  12 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 ++
 target/riscv/packed_helper.c            | 182 ++++++++++++++++++++++++
 4 files changed, 219 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 05f8f31367..aa80095e1d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1460,3 +1460,15 @@ DEF_HELPER_3(smtt32, i64, env, i64, i64)
 DEF_HELPER_4(kmabb32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmabt32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmatt32, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(kmda32, i64, env, i64, i64)
+DEF_HELPER_3(kmxda32, i64, env, i64, i64)
+DEF_HELPER_4(kmaxda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmads32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmadrs32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxds32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsxda32, tl, env, tl, tl, tl)
+DEF_HELPER_3(smds32, i64, env, i64, i64)
+DEF_HELPER_3(smdrs32, i64, env, i64, i64)
+DEF_HELPER_3(smxds32, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dec714a064..b9eeb57ca7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1083,3 +1083,15 @@ smtt32     0010100  ..... ..... 010 ..... 1110111 @r
 kmabb32    0101101  ..... ..... 010 ..... 1110111 @r
 kmabt32    0110101  ..... ..... 010 ..... 1110111 @r
 kmatt32    0111101  ..... ..... 010 ..... 1110111 @r
+
+kmda32     0011100  ..... ..... 010 ..... 1110111 @r
+kmxda32    0011101  ..... ..... 010 ..... 1110111 @r
+kmaxda32   0100101  ..... ..... 010 ..... 1110111 @r
+kmads32    0101110  ..... ..... 010 ..... 1110111 @r
+kmadrs32   0110110  ..... ..... 010 ..... 1110111 @r
+kmaxds32   0111110  ..... ..... 010 ..... 1110111 @r
+kmsda32    0100110  ..... ..... 010 ..... 1110111 @r
+kmsxda32   0100111  ..... ..... 010 ..... 1110111 @r
+smds32     0101100  ..... ..... 010 ..... 1110111 @r
+smdrs32    0110100  ..... ..... 010 ..... 1110111 @r
+smxds32    0111100  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 2de81abbb8..48bcf37e36 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1131,3 +1131,16 @@ GEN_RVP64_R_OOL(smtt32);
 GEN_RVP64_R_ACC_OOL(kmabb32);
 GEN_RVP64_R_ACC_OOL(kmabt32);
 GEN_RVP64_R_ACC_OOL(kmatt32);
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+GEN_RVP64_R_OOL(kmda32);
+GEN_RVP64_R_OOL(kmxda32);
+GEN_RVP64_R_ACC_OOL(kmaxda32);
+GEN_RVP64_R_ACC_OOL(kmads32);
+GEN_RVP64_R_ACC_OOL(kmadrs32);
+GEN_RVP64_R_ACC_OOL(kmaxds32);
+GEN_RVP64_R_ACC_OOL(kmsda32);
+GEN_RVP64_R_ACC_OOL(kmsxda32);
+GEN_RVP64_R_OOL(smds32);
+GEN_RVP64_R_OOL(smdrs32);
+GEN_RVP64_R_OOL(smxds32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 3c05c748c4..834e7dbebb 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3613,3 +3613,185 @@ static inline void do_kmatt32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmatt32, 1, 8);
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+static inline void do_kmda32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    }
+}
+
+RVPR64_64_64(kmda32, 1, 8);
+
+static inline void do_kmxda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i + 1)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i)];
+    }
+}
+
+RVPR64_64_64(kmxda32, 1, 8);
+
+static inline void do_kmaxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t p1, p2;
+    p1 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    p2 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*d < 0) {
+            *d = (INT64_MAX + *c) + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            *d = INT64_MAX;
+        }
+    } else {
+        *d = sadd64(env, 0, p1 + p2, *c);
+    }
+}
+
+RVPR_ACC(kmaxda32, 1, 8);
+
+static inline void do_kmads32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t1 - t0, *c);
+}
+
+RVPR_ACC(kmads32, 1, 8);
+
+static inline void do_kmadrs32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t0 - t1, *c);
+}
+
+RVPR_ACC(kmadrs32, 1, 8);
+
+static inline void do_kmaxds32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t10 - t01, *c);
+}
+
+RVPR_ACC(kmaxds32, 1, 8);
+
+static inline void do_kmsda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, *c, t0 + t1);
+    }
+}
+
+RVPR_ACC(kmsda32, 1, 8);
+
+static inline void do_kmsxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, *c, t10 + t01);
+    }
+}
+
+RVPR_ACC(kmsxda32, 1, 8);
+
+static inline void do_smds32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i + 1)] -
+         (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_64_64(smds32, 1, 8);
+
+static inline void do_smdrs32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i)] * b[H4(i)] -
+         (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+}
+
+RVPR64_64_64(smdrs32, 1, 8);
+
+static inline void do_smxds32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i)] -
+         (int64_t)a[H4(i)] * b[H4(i + 1)];
+}
+
+RVPR64_64_64(smxds32, 1, 8);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 35/37] target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

32-bit rounding arithmetic shift right immediate.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
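Note (not part of the commit message): a minimal sketch of what a rounding
arithmetic shift right does, assuming the rounding mode 0 passed to
vssra32() means round-to-nearest-up, i.e. add back the last bit shifted
out. The function name sraiw_u_model is hypothetical.

```c
#include <stdint.h>

/*
 * Model of SRAIW.u: shift the low 32-bit word right arithmetically by a
 * 5-bit immediate, round using the last bit shifted out, and sign-extend
 * the 32-bit result to 64 bits.
 *
 * Right-shifting a negative signed value is implementation-defined in C;
 * this sketch assumes the usual arithmetic (sign-propagating) shift.
 */
static int64_t sraiw_u_model(int32_t a, unsigned shamt)
{
    shamt &= 0x1f;
    if (shamt == 0) {
        return a;               /* nothing is shifted out, no rounding */
    }
    int32_t round = (a >> (shamt - 1)) & 1;
    /* a >> shamt lies in [-2^30, 2^30 - 1], so adding 1 cannot overflow. */
    return (int64_t)((a >> shamt) + round);
}
```

For example, 5 shifted right by 1 gives 3 (2.5 rounded up) and -5 shifted
right by 1 gives -2 (-2.5 rounded up), whereas a plain arithmetic shift
would give 2 and -3.
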
 target/riscv/helper.h                   |  2 ++
 target/riscv/insn32.decode              |  2 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  3 +++
 target/riscv/packed_helper.c            | 13 +++++++++++++
 4 files changed, 20 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index aa80095e1d..b998c86abf 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1472,3 +1472,5 @@ DEF_HELPER_4(kmsxda32, tl, env, tl, tl, tl)
 DEF_HELPER_3(smds32, i64, env, i64, i64)
 DEF_HELPER_3(smdrs32, i64, env, i64, i64)
 DEF_HELPER_3(smxds32, i64, env, i64, i64)
+
+DEF_HELPER_3(sraiw_u, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b9eeb57ca7..8e8aca4ea1 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1095,3 +1095,5 @@ kmsxda32   0100111  ..... ..... 010 ..... 1110111 @r
 smds32     0101100  ..... ..... 010 ..... 1110111 @r
 smdrs32    0110100  ..... ..... 010 ..... 1110111 @r
 smxds32    0111100  ..... ..... 010 ..... 1110111 @r
+
+sraiw_u    0011010  ..... ..... 001 ..... 1110111 @sh5
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 48bcf37e36..68c1ef9f48 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1144,3 +1144,6 @@ GEN_RVP64_R_ACC_OOL(kmsxda32);
 GEN_RVP64_R_OOL(smds32);
 GEN_RVP64_R_OOL(smdrs32);
 GEN_RVP64_R_OOL(smxds32);
+
+/* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
+GEN_RVP64_SHIFTI(sraiw_u, gen_helper_sraiw_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 834e7dbebb..42f1d96fa5 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3795,3 +3795,16 @@ static inline void do_smxds32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(smxds32, 1, 8);
+
+/* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
+static inline void do_sraiw_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va;
+    uint8_t shift = *(uint8_t *)vb;
+
+    *d = vssra32(env, 0, a[H4(i)], shift);
+}
+
+RVPR64_64_64(sraiw_u, 1, 8);
-- 
2.25.1




* [PATCH v2 36/37] target/riscv: RV64 Only 32-bit Packing Instructions
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

Concatenate two 32-bit elements to form a 64-bit element.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
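Note (not part of the commit message): the four packing variants differ
only in which 32-bit half of each source is selected. The sketch below
models them on plain 64-bit values, assuming the helper's first source
(va) is rs1 and the second (vb) is rs2; the *_model names and the small
lo32/hi32/concat32 helpers are hypothetical.

```c
#include <stdint.h>

static uint32_t lo32(uint64_t x) { return (uint32_t)x; }
static uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }

/* Build a 64-bit value from a top and a bottom 32-bit element. */
static uint64_t concat32(uint32_t top, uint32_t bottom)
{
    return ((uint64_t)top << 32) | bottom;
}

/* B = bottom (low) element, T = top (high) element; rs1 first, rs2 second. */
static uint64_t pkbb32_model(uint64_t rs1, uint64_t rs2)
{
    return concat32(lo32(rs1), lo32(rs2));
}

static uint64_t pkbt32_model(uint64_t rs1, uint64_t rs2)
{
    return concat32(lo32(rs1), hi32(rs2));
}

static uint64_t pktb32_model(uint64_t rs1, uint64_t rs2)
{
    return concat32(hi32(rs1), lo32(rs2));
}

static uint64_t pktt32_model(uint64_t rs1, uint64_t rs2)
{
    return concat32(hi32(rs1), hi32(rs2));
}
```

With rs1 = 0x1111111122222222 and rs2 = 0x3333333344444444, pkbb32_model
returns 0x2222222244444444 and pktt32_model returns 0x1111111133333333.
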
 target/riscv/helper.h                   |  5 +++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  6 ++++
 target/riscv/packed_helper.c            | 41 +++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b998c86abf..bfcf0ff761 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1474,3 +1474,8 @@ DEF_HELPER_3(smdrs32, i64, env, i64, i64)
 DEF_HELPER_3(smxds32, i64, env, i64, i64)
 
 DEF_HELPER_3(sraiw_u, i64, env, i64, i64)
+
+DEF_HELPER_3(pkbb32, i64, env, i64, i64)
+DEF_HELPER_3(pkbt32, i64, env, i64, i64)
+DEF_HELPER_3(pktt32, i64, env, i64, i64)
+DEF_HELPER_3(pktb32, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8e8aca4ea1..65682f70b5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1097,3 +1097,8 @@ smdrs32    0110100  ..... ..... 010 ..... 1110111 @r
 smxds32    0111100  ..... ..... 010 ..... 1110111 @r
 
 sraiw_u    0011010  ..... ..... 001 ..... 1110111 @sh5
+
+pkbb32     0000111  ..... ..... 010 ..... 1110111 @r
+pkbt32     0001111  ..... ..... 010 ..... 1110111 @r
+pktt32     0010111  ..... ..... 010 ..... 1110111 @r
+pktb32     0011111  ..... ..... 010 ..... 1110111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 68c1ef9f48..7505a0f89b 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1147,3 +1147,9 @@ GEN_RVP64_R_OOL(smxds32);
 
 /* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
 GEN_RVP64_SHIFTI(sraiw_u, gen_helper_sraiw_u);
+
+/* (RV64 Only) 32-bit Packing Instructions */
+GEN_RVP64_R_OOL(pkbb32);
+GEN_RVP64_R_OOL(pkbt32);
+GEN_RVP64_R_OOL(pktt32);
+GEN_RVP64_R_OOL(pktb32);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 42f1d96fa5..3f4bc593f9 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3808,3 +3808,44 @@ static inline void do_sraiw_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(sraiw_u, 1, 8);
+
+/* (RV64 Only)  32-bit packing instructions here */
+static inline void do_pkbb32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i)];
+    d[H4(i + 1)] = a[H4(i)];
+}
+
+RVPR64_64_64(pkbb32, 2, 4);
+
+static inline void do_pkbt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i)];
+}
+
+RVPR64_64_64(pkbt32, 2, 4);
+
+static inline void do_pktb32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)];
+}
+
+RVPR64_64_64(pktb32, 2, 4);
+
+static inline void do_pktt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)];
+}
+
+RVPR64_64_64(pktt32, 2, 4);
-- 
2.25.1




* [PATCH v2 37/37] target/riscv: configure and turn on packed extension from command line
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-10  7:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-10  7:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: palmer, richard.henderson, bin.meng, Alistair.Francis, LIU Zhiwei

The packed extension is off by default. The only way to use it is to:
1. use cpu rv32 or rv64
2. turn it on from the command line, e.g.
   "-cpu rv32,x-p=true,Zpsfoperand=true,pext_spec=v0.9.4".

Zpsfoperand selects whether the Zpsfoperand sub-extension is supported;
its default value is true.
pext_spec is the packed specification version; its default value is v0.9.4.
These properties can be set to other values.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9d8cf60a1c..21020b902e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -618,14 +618,17 @@ static Property riscv_cpu_properties[] = {
     DEFINE_PROP_BOOL("x-b", RISCVCPU, cfg.ext_b, false),
     DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
     DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false),
+    DEFINE_PROP_BOOL("x-p", RISCVCPU, cfg.ext_p, false),
     DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
     DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
     DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
     DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
     DEFINE_PROP_STRING("bext_spec", RISCVCPU, cfg.bext_spec),
+    DEFINE_PROP_STRING("pext_spec", RISCVCPU, cfg.pext_spec),
     DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
     DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
     DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
+    DEFINE_PROP_BOOL("Zpsfoperand", RISCVCPU, cfg.ext_psfoperand, true),
     DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
     DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
     DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false),
-- 
2.25.1




* Re: [PATCH v2 03/37] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-06-10  7:58   ` LIU Zhiwei
@ 2021-06-10 18:00     ` Richard Henderson
  -1 siblings, 0 replies; 88+ messages in thread
From: Richard Henderson @ 2021-06-10 18:00 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv; +Cc: palmer, bin.meng, Alistair.Francis

On 6/10/21 12:58 AM, LIU Zhiwei wrote:
> Include 5 groups: Wrap-around (dropping overflow), Signed Halving,
> Unsigned Halving, Signed Saturation, and Unsigned Saturation.
> 
> Signed-off-by: LIU Zhiwei<zhiwei_liu@c-sky.com>
> ---
>   include/tcg/tcg-op-gvec.h               |  10 +
>   target/riscv/helper.h                   |  30 ++
>   target/riscv/insn32.decode              |  32 +++
>   target/riscv/insn_trans/trans_rvp.c.inc | 117 ++++++++
>   target/riscv/meson.build                |   1 +
>   target/riscv/packed_helper.c            | 354 ++++++++++++++++++++++++
>   target/riscv/translate.c                |   1 +
>   tcg/tcg-op-gvec.c                       |  28 ++

The tcg part needs to be split out, and I'm happy to give a

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

on that.


r~



* Re: [PATCH v2 04/37] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-06-10  7:58   ` LIU Zhiwei
@ 2021-06-10 19:39     ` Richard Henderson
  -1 siblings, 0 replies; 88+ messages in thread
From: Richard Henderson @ 2021-06-10 19:39 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: Palmer Dabbelt, palmer, bin.meng, Alistair.Francis

On 6/10/21 12:58 AM, LIU Zhiwei wrote:
>   include/tcg/tcg-op-gvec.h               |  6 ++
>   tcg/tcg-op-gvec.c                       | 47 ++++++++++++++++

Likewise, should be split from the larger patch.

> +static void gen_addv_mask_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 m)
> +{
> +    TCGv_i32 t1 = tcg_temp_new_i32();
> +    TCGv_i32 t2 = tcg_temp_new_i32();
> +    TCGv_i32 t3 = tcg_temp_new_i32();
> +
> +    tcg_gen_andc_i32(t1, a, m);
> +    tcg_gen_andc_i32(t2, b, m);
> +    tcg_gen_xor_i32(t3, a, b);
> +    tcg_gen_add_i32(d, t1, t2);
> +    tcg_gen_and_i32(t3, t3, m);
> +    tcg_gen_xor_i32(d, d, t3);
> +
> +    tcg_temp_free_i32(t1);
> +    tcg_temp_free_i32(t2);
> +    tcg_temp_free_i32(t3);
> +}
> +
> +void tcg_gen_vec_add8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
> +{
> +    TCGv_i32 m = tcg_constant_i32((int32_t)dup_const(MO_8, 0x80));
> +    gen_addv_mask_i32(d, a, b, m);
> +}

There will only ever be one use; we might as well merge them.
The cast is unnecessary.
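
Note (added for readers, not part of the review): the quoted TCG sequence
is the usual trick for adding four 8-bit lanes inside one 32-bit word
without carries crossing lane boundaries. A minimal plain-C sketch of the
same arithmetic, with the mask folded into a single function, is shown
below. The name add8x4 is hypothetical, and this is not the TCG code
itself.

```c
#include <stdint.h>

/* Add the four 8-bit lanes of a and b independently (wrap-around). */
static uint32_t add8x4(uint32_t a, uint32_t b)
{
    const uint32_t m = 0x80808080u;   /* top bit of every 8-bit lane */

    /* Clear each lane's top bit so the add cannot carry into the next lane. */
    uint32_t t1 = a & ~m;
    uint32_t t2 = b & ~m;

    /* The top bit of each lane's sum is a ^ b ^ carry-in; keep a ^ b here. */
    uint32_t t3 = (a ^ b) & m;

    /* t1 + t2 leaves the carry-in in each top bit; XOR restores a ^ b. */
    return (t1 + t2) ^ t3;
}
```

For example, add8x4(0x01FF7F80, 0x01010180) yields 0x02008000: the lanes
0xFF + 0x01 and 0x80 + 0x80 each wrap to 0x00 without disturbing their
neighbours.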


r~



* Re: [PATCH v2 05/37] target/riscv: SIMD 16-bit Shift Instructions
  2021-06-10  7:58   ` LIU Zhiwei
@ 2021-06-10 19:44     ` Richard Henderson
  -1 siblings, 0 replies; 88+ messages in thread
From: Richard Henderson @ 2021-06-10 19:44 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv; +Cc: palmer, bin.meng, Alistair.Francis

On 6/10/21 12:58 AM, LIU Zhiwei wrote:
>   include/tcg/tcg-op-gvec.h               |   9 ++
>   tcg/tcg-op-gvec.c                       |  28 +++++++

Again, should be split out, with a

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v2 04/37] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-06-10 19:39     ` Richard Henderson
@ 2021-06-11  4:36       ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-11  4:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-riscv
  Cc: Palmer Dabbelt, palmer, bin.meng, Alistair.Francis

On 6/11/21 3:39 AM, Richard Henderson wrote:

> On 6/10/21 12:58 AM, LIU Zhiwei wrote:
>>   include/tcg/tcg-op-gvec.h               |  6 ++
>>   tcg/tcg-op-gvec.c                       | 47 ++++++++++++++++
>
> Likewise, should be split from the larger patch.
>
OK
>> +static void gen_addv_mask_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 m)
>> +{
>> +    TCGv_i32 t1 = tcg_temp_new_i32();
>> +    TCGv_i32 t2 = tcg_temp_new_i32();
>> +    TCGv_i32 t3 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_andc_i32(t1, a, m);
>> +    tcg_gen_andc_i32(t2, b, m);
>> +    tcg_gen_xor_i32(t3, a, b);
>> +    tcg_gen_add_i32(d, t1, t2);
>> +    tcg_gen_and_i32(t3, t3, m);
>> +    tcg_gen_xor_i32(d, d, t3);
>> +
>> +    tcg_temp_free_i32(t1);
>> +    tcg_temp_free_i32(t2);
>> +    tcg_temp_free_i32(t3);
>> +}
>> +
>> +void tcg_gen_vec_add8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
>> +{
>> +    TCGv_i32 m = tcg_constant_i32((int32_t)dup_const(MO_8, 0x80));
>> +    gen_addv_mask_i32(d, a, b, m);
>> +}
>
> There will only ever be one use; we might as well merge them.
> The cast is unnecessary.

I'm a little puzzled. Should I still split it?


Zhiwei

>
>
> r~

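For readers following the quoted helper, the masked add can be modelled in plain C outside of TCG. This is only a sketch: addv_mask_u32 and vec_add8_u32 are hypothetical names, not QEMU functions, and the mask 0x80808080 is assumed to be what dup_const(MO_8, 0x80) reduces to in 32 bits. The top bit of each byte lane is cleared before the add so that no carry crosses a lane boundary, and the top bits are then restored with an XOR.

```c
#include <stdint.h>

/*
 * Plain-C model of the masked add quoted above (hypothetical name, not a
 * QEMU function). m must have the top bit of every lane set.
 */
static uint32_t addv_mask_u32(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t t1 = a & ~m;        /* a with each lane's top bit cleared */
    uint32_t t2 = b & ~m;        /* b with each lane's top bit cleared */
    uint32_t t3 = (a ^ b) & m;   /* carry-less sum of the top bits */

    /* t1 + t2 cannot carry out of a lane because the top bits are clear. */
    return (t1 + t2) ^ t3;
}

/* Four independent 8-bit additions packed into one 32-bit word. */
static uint32_t vec_add8_u32(uint32_t a, uint32_t b)
{
    return addv_mask_u32(a, b, 0x80808080u);
}
```

With a = 0xff7f1080 and b = 0x01012080 the four lanes are 0xff+0x01, 0x7f+0x01, 0x10+0x20 and 0x80+0x80, and each lane wraps on its own, giving 0x00803000.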

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 00/37] target/riscv: support packed extension v0.9.4
  2021-06-10  7:58 ` LIU Zhiwei
@ 2021-06-14 22:55   ` no-reply
  -1 siblings, 0 replies; 88+ messages in thread
From: no-reply @ 2021-06-14 22:55 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: qemu-riscv, bin.meng, richard.henderson, qemu-devel, palmer,
	Alistair.Francis, zhiwei_liu

Patchew URL: https://patchew.org/QEMU/20210610075908.3305506-1-zhiwei_liu@c-sky.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210610075908.3305506-1-zhiwei_liu@c-sky.com
Subject: [PATCH v2 00/37] target/riscv: support packed extension v0.9.4

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
58bab55 target/riscv: configure and turn on packed extension from command line
aaa9443 target/riscv: RV64 Only 32-bit Packing Instructions
fd98368 target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
13ee829 target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
a45bf09 target/riscv: RV64 Only 32-bit Multiply & Add Instructions
afa9d9f target/riscv: RV64 Only 32-bit Multiply Instructions
5e47cf9 target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
0707fb2 target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
9103b42 target/riscv: RV64 Only SIMD 32-bit Shift Instructions
aa8562e target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
4e3c751 target/riscv: Non-SIMD Miscellaneous Instructions
98463d7 target/riscv: 32-bit Computation Instructions
a2b5fa4 target/riscv: Non-SIMD Q31 saturation ALU Instructions
5ac11aa target/riscv: Non-SIMD Q15 saturation ALU Instructions
8f8cc98 target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract Instructions
562fe16 target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
abd68e9 target/riscv: 64-bit Add/Subtract Instructions
1101a08 target/riscv: 8-bit Multiply with 32-bit Add Instructions
cade413 target/riscv: Partial-SIMD Miscellaneous Instructions
868fc8a target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
55ea8d5 target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
fc7375d target/riscv: Signed MSW 32x16 Multiply and Add Instructions
14d0690 target/riscv: Signed MSW 32x32 Multiply and Add Instructions
75852f9 target/riscv: 16-bit Packing Instructions
4c0f92a target/riscv: 8-bit Unpacking Instructions
da3eb1d target/riscv: SIMD 8-bit Miscellaneous Instructions
cda90fe target/riscv: SIMD 16-bit Miscellaneous Instructions
2c1cebb target/riscv: SIMD 8-bit Multiply Instructions
ea6538c target/riscv: SIMD 16-bit Multiply Instructions
e6d145d target/riscv: SIMD 8-bit Compare Instructions
c7dc098 target/riscv: SIMD 16-bit Compare Instructions
98fdd40 target/riscv: SIMD 8-bit Shift Instructions
161cf36 target/riscv: SIMD 16-bit Shift Instructions
52b81ce target/riscv: 8-bit Addition & Subtraction Instruction
51a264e target/riscv: 16-bit Addition & Subtraction Instructions
9bfdab5 target/riscv: Make the vector helper functions public
8966803 target/riscv: implementation-defined constant parameters

=== OUTPUT BEGIN ===
1/37 Checking commit 896680352f63 (target/riscv: implementation-defined constant parameters)
2/37 Checking commit 9bfdab58457e (target/riscv: Make the vector helper functions public)
3/37 Checking commit 51a264e82cf5 (target/riscv: 16-bit Addition & Subtraction Instructions)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#132: 
new file mode 100644

ERROR: space prohibited after that '*' (ctx:BxW)
#172: FILE: target/riscv/insn_trans/trans_rvp.c.inc:36:
+         void (* vecop)(TCGv, TCGv, TCGv),
                ^

ERROR: space prohibited after that '*' (ctx:BxW)
#173: FILE: target/riscv/insn_trans/trans_rvp.c.inc:37:
+         void (* op)(TCGv, TCGv, TCGv))
                ^

ERROR: space prohibited after that '*' (ctx:BxW)
#198: FILE: target/riscv/insn_trans/trans_rvp.c.inc:62:
+r_ool(DisasContext *ctx, arg_r *a, void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
                                          ^

total: 3 errors, 1 warnings, 617 lines checked

Patch 3/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/37 Checking commit 52b81cea79fe (target/riscv: 8-bit Addition & Subtraction Instruction)
5/37 Checking commit 161cf360f41e (target/riscv: SIMD 16-bit Shift Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#138: FILE: target/riscv/insn_trans/trans_rvp.c.inc:144:
+               void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv))
                      ^

ERROR: space prohibited after that '*' (ctx:BxW)
#158: FILE: target/riscv/insn_trans/trans_rvp.c.inc:164:
+           void (* vecop)(TCGv, TCGv, target_long),
                  ^

ERROR: space prohibited after that '*' (ctx:BxW)
#159: FILE: target/riscv/insn_trans/trans_rvp.c.inc:165:
+           void (* op)(TCGv, TCGv_ptr, TCGv, TCGv))
                  ^

total: 3 errors, 0 warnings, 289 lines checked

Patch 5/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

6/37 Checking commit 98fdd4058bd6 (target/riscv: SIMD 8-bit Shift Instructions)
7/37 Checking commit c7dc0984c42d (target/riscv: SIMD 16-bit Compare Instructions)
8/37 Checking commit e6d145d8c6e2 (target/riscv: SIMD 8-bit Compare Instructions)
9/37 Checking commit ea6538c95033 (target/riscv: SIMD 16-bit Multiply Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#91: FILE: target/riscv/insn_trans/trans_rvp.c.inc:253:
+          void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv))
                 ^

total: 1 errors, 0 warnings, 199 lines checked

Patch 9/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

10/37 Checking commit 2c1cebb7751d (target/riscv: SIMD 8-bit Multiply Instructions)
11/37 Checking commit cda90fe68120 (target/riscv: SIMD 16-bit Miscellaneous Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#79: FILE: target/riscv/insn_trans/trans_rvp.c.inc:309:
+       void (* fn)(TCGv, TCGv_ptr, TCGv))
              ^

total: 1 errors, 0 warnings, 233 lines checked

Patch 11/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/37 Checking commit da3eb1d28281 (target/riscv: SIMD 8-bit Miscellaneous Instructions)
13/37 Checking commit 4c0f92ad768c (target/riscv: 8-bit Unpacking Instructions)
14/37 Checking commit 75852f9829dd (target/riscv: 16-bit Packing Instructions)
15/37 Checking commit 14d069035135 (target/riscv: Signed MSW 32x32 Multiply and Add Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#69: FILE: target/riscv/insn_trans/trans_rvp.c.inc:379:
+                             void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv, TCGv))
                                    ^

total: 1 errors, 0 warnings, 183 lines checked

Patch 15/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

16/37 Checking commit fc7375de7797 (target/riscv: Signed MSW 32x16 Multiply and Add Instructions)
17/37 Checking commit 55ea8d559755 (target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions)
18/37 Checking commit 868fc8a71557 (target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#50: FILE: target/riscv/insn_trans/trans_rvp.c.inc:458:
+              void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv))
                     ^

total: 1 errors, 0 warnings, 92 lines checked

Patch 18/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

19/37 Checking commit cade413e7d67 (target/riscv: Partial-SIMD Miscellaneous Instructions)
20/37 Checking commit 1101a08fa021 (target/riscv: 8-bit Multiply with 32-bit Add Instructions)
21/37 Checking commit abd68e9846f7 (target/riscv: 64-bit Add/Subtract Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#71: FILE: target/riscv/insn_trans/trans_rvp.c.inc:526:
+                  void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
                         ^

total: 1 errors, 0 warnings, 240 lines checked

Patch 21/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

22/37 Checking commit 562fe1664758 (target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#67: FILE: target/riscv/insn_trans/trans_rvp.c.inc:599:
+              void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv, TCGv_i64))
                     ^

total: 1 errors, 0 warnings, 252 lines checked

Patch 22/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

23/37 Checking commit 8f8cc98490dd (target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract Instructions)
24/37 Checking commit 5ac11aaba982 (target/riscv: Non-SIMD Q15 saturation ALU Instructions)
25/37 Checking commit a2b5fa48ee23 (target/riscv: Non-SIMD Q31 saturation ALU Instructions)
26/37 Checking commit 98463d7bddc4 (target/riscv: 32-bit Computation Instructions)
27/37 Checking commit 4e3c7519d03b (target/riscv: Non-SIMD Miscellaneous Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#103: FILE: target/riscv/insn_trans/trans_rvp.c.inc:721:
+          void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
                 ^

ERROR: space prohibited after that '*' (ctx:BxW)
#151: FILE: target/riscv/insn_trans/trans_rvp.c.inc:769:
+                               void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
                                      ^

total: 2 errors, 0 warnings, 376 lines checked

Patch 27/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

28/37 Checking commit aa8562e69e4f (target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#144: FILE: target/riscv/insn_trans/trans_rvp.c.inc:969:
+         void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
                ^

total: 1 errors, 0 warnings, 449 lines checked

Patch 28/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

29/37 Checking commit 9103b42ea0f6 (target/riscv: RV64 Only SIMD 32-bit Shift Instructions)
ERROR: space prohibited after that '*' (ctx:BxW)
#71: FILE: target/riscv/insn_trans/trans_rvp.c.inc:1040:
+             void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
                    ^

total: 1 errors, 0 warnings, 195 lines checked

Patch 29/37 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

30/37 Checking commit 0707fb22513f (target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions)
31/37 Checking commit 5e47cf943de0 (target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions)
32/37 Checking commit afa9d9f289b8 (target/riscv: RV64 Only 32-bit Multiply Instructions)
33/37 Checking commit a45bf09dd5f8 (target/riscv: RV64 Only 32-bit Multiply & Add Instructions)
34/37 Checking commit 13ee829e5c05 (target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions)
35/37 Checking commit fd983689c420 (target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions)
36/37 Checking commit aaa9443b6f93 (target/riscv: RV64 Only 32-bit Packing Instructions)
37/37 Checking commit 58bab55e2bde (target/riscv: configure and turn on packed extension from command line)
=== OUTPUT END ===

Test command exited with code: 1

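Every ERROR in the output above is the same complaint: a space between the '*' and the name in a function-pointer declarator, as in "void (* fn)(...)". A minimal sketch of the spacing checkpatch accepts is shown here; apply_op and add_op are hypothetical names used only to illustrate the declarator, not code from the series.

```c
/* Hypothetical helper: the callback is declared as (*op), with no space. */
static int apply_op(int a, int b, int (*op)(int, int))
{
    return op(a, b);
}

/* Hypothetical callback matching the pointer type above. */
static int add_op(int a, int b)
{
    return a + b;
}
```

Applied to the series, this would presumably mean writing "void (*fn)(TCGv, TCGv_ptr, TCGv, TCGv)" in place of "void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv)" at each flagged line.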

The full log is available at
http://patchew.org/logs/20210610075908.3305506-1-zhiwei_liu@c-sky.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 04/37] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-06-10 19:39     ` Richard Henderson
@ 2021-06-24  6:05       ` LIU Zhiwei
  -1 siblings, 0 replies; 88+ messages in thread
From: LIU Zhiwei @ 2021-06-24  6:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-riscv
  Cc: Palmer Dabbelt, palmer, bin.meng, Alistair.Francis

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]


On 2021/6/11 3:39 AM, Richard Henderson wrote:
> On 6/10/21 12:58 AM, LIU Zhiwei wrote:
>>   include/tcg/tcg-op-gvec.h |  6 ++
>>   tcg/tcg-op-gvec.c                       | 47 ++++++++++++++++
>
> Likewise, should be split from the larger patch.
>
>> +static void gen_addv_mask_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 m)
>> +{
>> +    TCGv_i32 t1 = tcg_temp_new_i32();
>> +    TCGv_i32 t2 = tcg_temp_new_i32();
>> +    TCGv_i32 t3 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_andc_i32(t1, a, m);
>> +    tcg_gen_andc_i32(t2, b, m);
>> +    tcg_gen_xor_i32(t3, a, b);
>> +    tcg_gen_add_i32(d, t1, t2);
>> +    tcg_gen_and_i32(t3, t3, m);
>> +    tcg_gen_xor_i32(d, d, t3);
>> +
>> +    tcg_temp_free_i32(t1);
>> +    tcg_temp_free_i32(t2);
>> +    tcg_temp_free_i32(t3);
>> +}
>> +
>> +void tcg_gen_vec_add8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
>> +{
>> +    TCGv_i32 m = tcg_constant_i32((int32_t)dup_const(MO_8, 0x80));
>> +    gen_addv_mask_i32(d, a, b, m);
>> +}
>
> There will only ever be one use; we might as well merge them.
OK
> The cast is unnecessary.

I get a compiler error without the cast, so I have kept it.

../tcg/tcg-op-gvec.c: In function ‘tcg_gen_vec_sub8_i32’:
/home/roman/git/qemu/include/tcg/tcg.h:1327:5: error: overflow in implicit constant conversion [-Werror=overflow]
      (__builtin_constant_p(VECE)                                    \
      ^
../tcg/tcg-op-gvec.c:1947:35: note: in expansion of macro ‘dup_const’
      TCGv_i32 m = tcg_constant_i32(dup_const(MO_8, 0x80));
                                    ^~~~~~~~~
cc1: all warnings being treated as errors

Thanks,
Zhiwei

>
>
> r~

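The diagnostic above is consistent with dup_const(MO_8, 0x80) expanding to a 64-bit unsigned constant (0x8080808080808080) that is then narrowed implicitly to a 32-bit signed argument. The sketch below models that in plain C under that assumption; dup8_u64 and dup8_u32 are hypothetical stand-ins, not the QEMU macro. It narrows through uint32_t, where the conversion is well defined as reduction modulo 2^32. The (int32_t) cast in the patch gives the same bit pattern on GCC, which documents out-of-range signed conversion as modular.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the MO_8 case of dup_const: replicate one
 * byte across a 64-bit unsigned value.
 */
static uint64_t dup8_u64(uint8_t c)
{
    return 0x0101010101010101ull * c;
}

/*
 * Keep only the low 32 bits. The explicit conversion to an unsigned type
 * is well defined, so the narrowing is intentional and visible.
 */
static uint32_t dup8_u32(uint8_t c)
{
    return (uint32_t)dup8_u64(c);
}
```

Whether the series should keep the explicit cast or build the 32-bit mask some other way is a question for the reviewers; the sketch only shows why the 64-bit value does not fit without one.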
[-- Attachment #2: Type: text/html, Size: 3090 bytes --]

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2021-06-24  6:07 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10  7:58 [PATCH v2 00/37] target/riscv: support packed extension v0.9.4 LIU Zhiwei
2021-06-10  7:58 ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 01/37] target/riscv: implementation-defined constant parameters LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 02/37] target/riscv: Make the vector helper functions public LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 03/37] target/riscv: 16-bit Addition & Subtraction Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10 18:00   ` Richard Henderson
2021-06-10 18:00     ` Richard Henderson
2021-06-10  7:58 ` [PATCH v2 04/37] target/riscv: 8-bit Addition & Subtraction Instruction LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10 19:39   ` Richard Henderson
2021-06-10 19:39     ` Richard Henderson
2021-06-11  4:36     ` LIU Zhiwei
2021-06-11  4:36       ` LIU Zhiwei
2021-06-24  6:05     ` LIU Zhiwei
2021-06-24  6:05       ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 05/37] target/riscv: SIMD 16-bit Shift Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10 19:44   ` Richard Henderson
2021-06-10 19:44     ` Richard Henderson
2021-06-10  7:58 ` [PATCH v2 06/37] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 07/37] target/riscv: SIMD 16-bit Compare Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 08/37] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 09/37] target/riscv: SIMD 16-bit Multiply Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 10/37] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 11/37] target/riscv: SIMD 16-bit Miscellaneous Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 12/37] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 13/37] target/riscv: 8-bit Unpacking Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 14/37] target/riscv: 16-bit Packing Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 15/37] target/riscv: Signed MSW 32x32 Multiply and Add Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 16/37] target/riscv: Signed MSW 32x16 " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 17/37] target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 18/37] target/riscv: Signed 16-bit Multiply 64-bit " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 19/37] target/riscv: Partial-SIMD Miscellaneous Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 20/37] target/riscv: 8-bit Multiply with 32-bit Add Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 21/37] target/riscv: 64-bit Add/Subtract Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 22/37] target/riscv: 32-bit Multiply " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 23/37] target/riscv: Signed 16-bit Multiply with " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 24/37] target/riscv: Non-SIMD Q15 saturation ALU Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 25/37] target/riscv: Non-SIMD Q31 " LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 26/37] target/riscv: 32-bit Computation Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 27/37] target/riscv: Non-SIMD Miscellaneous Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:58 ` [PATCH v2 28/37] target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions LIU Zhiwei
2021-06-10  7:58   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 29/37] target/riscv: RV64 Only SIMD 32-bit Shift Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 30/37] target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 31/37] target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 32/37] target/riscv: RV64 Only 32-bit " LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 33/37] target/riscv: RV64 Only 32-bit Multiply & Add Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 34/37] target/riscv: RV64 Only 32-bit Parallel " LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 35/37] target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 36/37] target/riscv: RV64 Only 32-bit Packing Instructions LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-10  7:59 ` [PATCH v2 37/37] target/riscv: configure and turn on packed extension from command line LIU Zhiwei
2021-06-10  7:59   ` LIU Zhiwei
2021-06-14 22:55 ` [PATCH v2 00/37] target/riscv: support packed extension v0.9.4 no-reply
2021-06-14 22:55   ` no-reply
