* [PATCH 00/38] target/riscv: support packed extension v0.9.2
From: LIU Zhiwei @ 2021-02-12 15:02 UTC
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

This patchset implements the packed extension for RISC-V on QEMU.

This patchset has passed all my direct Linux user-mode cases (RV64) and
bare-metal cases (RV32) on an x86-64 Ubuntu host. I will push these test
cases to my repo later (https://github.com/romanheros/qemu.git,
branch: packed-upstream-v1).

I have ported the packed extension to RISU, but I have not found a simulator
or hardware to compare against. If anyone has one, please let me know.

Features:
  * support packed extension specification v0.9.2 (https://github.com/riscv/riscv-p-spec/)
  * support basic packed extension.
  * support Zp64.
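
As a rough illustration of what "packed" means here (this is not code from
the series), the sketch below adds two 32-bit values lane by lane, treating
each as a pair of independent 16-bit elements. The helper name packed_add16
is hypothetical, and the behaviour shown is plain wrapping addition; the
saturating and halving variants in the spec differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: lane-wise wrapping add of
 * two 32-bit values, each holding two packed 16-bit elements.
 */
static uint32_t packed_add16(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (int lane = 0; lane < 2; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        /* The carry out of a lane is dropped, not propagated upward. */
        uint16_t sum = (uint16_t)(x + y);

        result |= (uint32_t)sum << (16 * lane);
    }
    return result;
}
```

For example, packed_add16(0x0001FFFF, 0x00020001) gives 0x00030000: the low
lane wraps from 0xFFFF + 0x0001 to 0x0000 without disturbing the high lane,
which holds 0x0001 + 0x0002 = 0x0003.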

LIU Zhiwei (38):
  target/riscv: implementation-defined constant parameters
  target/riscv: Hoist vector functions
  target/riscv: Fixup saturate subtract function
  target/riscv: 16-bit Addition & Subtraction Instructions
  target/riscv: 8-bit Addition & Subtraction Instruction
  target/riscv: SIMD 16-bit Shift Instructions
  target/riscv: SIMD 8-bit Shift Instructions
  target/riscv: SIMD 16-bit Compare Instructions
  target/riscv: SIMD 8-bit Compare Instructions
  target/riscv: SIMD 16-bit Multiply Instructions
  target/riscv: SIMD 8-bit Multiply Instructions
  target/riscv: SIMD 16-bit Miscellaneous Instructions
  target/riscv: SIMD 8-bit Miscellaneous Instructions
  target/riscv: 8-bit Unpacking Instructions
  target/riscv: 16-bit Packing Instructions
  target/riscv: Signed MSW 32x32 Multiply and Add Instructions
  target/riscv: Signed MSW 32x16 Multiply and Add Instructions
  target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
  target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
  target/riscv: Partial-SIMD Miscellaneous Instructions
  target/riscv: 8-bit Multiply with 32-bit Add Instructions
  target/riscv: 64-bit Add/Subtract Instructions
  target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
  target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
    Instructions
  target/riscv: Non-SIMD Q15 saturation ALU Instructions
  target/riscv: Non-SIMD Q31 saturation ALU Instructions
  target/riscv: 32-bit Computation Instructions
  target/riscv: Non-SIMD Miscellaneous Instructions
  target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
  target/riscv: RV64 Only SIMD 32-bit Shift Instructions
  target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
  target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
  target/riscv: RV64 Only 32-bit Multiply Instructions
  target/riscv: RV64 Only 32-bit Multiply & Add Instructions
  target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
  target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
  target/riscv: RV64 Only 32-bit Packing Instructions
  target/riscv: configure and turn on packed extension from command line

 target/riscv/cpu.c                      |   32 +
 target/riscv/cpu.h                      |    6 +
 target/riscv/helper.h                   |  332 ++
 target/riscv/insn32-64.decode           |   93 +-
 target/riscv/insn32.decode              |  285 ++
 target/riscv/insn_trans/trans_rvp.c.inc | 1224 +++++++
 target/riscv/internals.h                |   50 +
 target/riscv/meson.build                |    1 +
 target/riscv/packed_helper.c            | 3862 +++++++++++++++++++++++
 target/riscv/translate.c                |    3 +
 target/riscv/vector_helper.c            |   90 +-
 11 files changed, 5912 insertions(+), 66 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c

-- 
2.17.1



* [PATCH 01/38] target/riscv: implementation-defined constant parameters
From: LIU Zhiwei @ 2021-02-12 15:02 UTC
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

ext_p64 selects whether the Zp64 extension is supported on RV32; it
defaults to true.
pext_ver is the packed extension specification version; it defaults to
v0.9.2.
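
The version selection can be sketched on its own as follows. This is a
simplified stand-in for the logic in riscv_cpu_realize(), not the patch
itself: the helper name pext_version_from_string is hypothetical, and error
reporting is reduced to a boolean return. Only the constant value and the
"v0.9.2" string come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PEXT_VERSION_0_09_2 0x00000902  /* value taken from the patch */

/*
 * Hypothetical helper, not part of the patch: map a user-supplied spec
 * string to the encoded version. A NULL string selects the default.
 * Returns false for an unsupported version so the caller can report it.
 */
static bool pext_version_from_string(const char *spec, uint32_t *version)
{
    if (spec == NULL || strcmp(spec, "v0.9.2") == 0) {
        *version = PEXT_VERSION_0_09_2;
        return true;
    }
    return false;
}
```

Any string other than "v0.9.2" is rejected, which matches the patch's
behaviour of failing realize with "Unsupported packed spec version".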

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.c       | 29 +++++++++++++++++++++++++++++
 target/riscv/cpu.h       |  6 ++++++
 target/riscv/translate.c |  2 ++
 3 files changed, 37 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 16f1a34238..1b99f629ec 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -132,6 +132,11 @@ static void set_vext_version(CPURISCVState *env, int vext_ver)
     env->vext_ver = vext_ver;
 }
 
+static void set_pext_version(CPURISCVState *env, int pext_ver)
+{
+    env->pext_ver = pext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
     env->features |= (1ULL << feature);
@@ -380,6 +385,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
     int priv_version = PRIV_VERSION_1_11_0;
     int vext_version = VEXT_VERSION_0_07_1;
+    int pext_version = PEXT_VERSION_0_09_2;
     target_ulong target_misa = env->misa;
     Error *local_err = NULL;
 
@@ -404,6 +410,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 
     set_priv_version(env, priv_version);
     set_vext_version(env, vext_version);
+    set_pext_version(env, pext_version);
 
     if (cpu->cfg.mmu) {
         set_feature(env, RISCV_FEATURE_MMU);
@@ -511,6 +518,28 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
             }
             set_vext_version(env, vext_version);
         }
+        if (cpu->cfg.ext_p) {
+            target_misa |= RVP;
+            if (cpu->cfg.pext_spec) {
+                if (!g_strcmp0(cpu->cfg.pext_spec, "v0.9.2")) {
+                    pext_version = PEXT_VERSION_0_09_2;
+                } else {
+                    error_setg(errp,
+                               "Unsupported packed spec version '%s'",
+                               cpu->cfg.pext_spec);
+                    return;
+                }
+            } else {
+                qemu_log("packed version is not specified, "
+                         "use the default value v0.9.2\n");
+            }
+            if (!cpu->cfg.ext_p64 && env->misa == RV64) {
+                error_setg(errp, "For RV64, the Zp64 instructions will be "
+                                 "included in the baseline P extension.");
+                return;
+            }
+            set_pext_version(env, pext_version);
+        }
 
         set_misa(env, target_misa);
     }
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 02758ae0eb..f458722646 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -68,6 +68,7 @@
 #define RVF RV('F')
 #define RVD RV('D')
 #define RVV RV('V')
+#define RVP RV('P')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -87,6 +88,7 @@ enum {
 #define PRIV_VERSION_1_11_0 0x00011100
 
 #define VEXT_VERSION_0_07_1 0x00000701
+#define PEXT_VERSION_0_09_2 0x00000902
 
 enum {
     TRANSLATE_SUCCESS,
@@ -134,6 +136,7 @@ struct CPURISCVState {
 
     target_ulong priv_ver;
     target_ulong vext_ver;
+    target_ulong pext_ver;
     target_ulong misa;
     target_ulong misa_mask;
 
@@ -288,13 +291,16 @@ struct RISCVCPU {
         bool ext_u;
         bool ext_h;
         bool ext_v;
+        bool ext_p;
         bool ext_counters;
         bool ext_ifencei;
         bool ext_icsr;
+        bool ext_p64;
 
         char *priv_spec;
         char *user_spec;
         char *vext_spec;
+        char *pext_spec;
         uint16_t vlen;
         uint16_t elen;
         bool mmu;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0f28b5f41e..eb810efec6 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -56,6 +56,7 @@ typedef struct DisasContext {
        to reset this known value.  */
     int frm;
     bool ext_ifencei;
+    bool ext_p64;
     bool hlsx;
     /* vector extension */
     bool vill;
@@ -824,6 +825,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
     ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
     ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
+    ctx->ext_p64 = cpu->cfg.ext_p64;
     ctx->cs = cs;
 }
 
-- 
2.17.1



* [PATCH 02/38] target/riscv: Hoist vector functions
From: LIU Zhiwei @ 2021-02-12 15:02 UTC
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

The saturating add, subtract, and shift functions can also be used by the
packed extension, so make them non-static and declare them in internals.h.
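
For readers unfamiliar with the overflow test these helpers use, here is a
minimal standalone sketch of a saturating signed 8-bit add. It follows the
sign-bit check visible in sadd8() in the diff, but the name sat_add8 and the
bool out-parameter are assumptions made so the example compiles without
CPURISCVState; the real helpers take env and vxrm arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only: saturating signed 8-bit add.
 * Overflow occurred when both operands have the same sign and the
 * truncated result has the opposite sign, i.e. the sign bit of
 * (res ^ a) and of (res ^ b) is set.
 */
static int8_t sat_add8(int8_t a, int8_t b, bool *saturated)
{
    /* Truncation to 8 bits wraps on the usual two's-complement hosts. */
    int8_t res = (int8_t)(a + b);

    if ((res ^ a) & (res ^ b) & INT8_MIN) {
        res = a > 0 ? INT8_MAX : INT8_MIN;
        *saturated = true;  /* stands in for a sticky saturation flag */
    }
    return res;
}
```

With this sketch, sat_add8(100, 100, &flag) returns 127 and sets the flag,
while sat_add8(100, -100, &flag) returns 0 and leaves the flag untouched.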

The host-endian fixup macros (H1 through H8) are hoisted as well.
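
To show why the fixup is needed, the sketch below reads a logical byte
element out of data stored as host-endian 64-bit chunks. It is an
illustration under assumptions, not code from the patch: it selects the H1
definition with the compiler-provided __BYTE_ORDER__ macro (GCC and Clang)
instead of QEMU's HOST_WORDS_BIGENDIAN, and read_byte_elem is a hypothetical
name.

```c
#include <stdint.h>

/* Assumption: __BYTE_ORDER__ is predefined, as it is on GCC and Clang. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define H1(x) ((x) ^ 7)   /* flip the byte index within each 8-byte chunk */
#else
#define H1(x) (x)
#endif

/*
 * Hypothetical helper: return logical byte element idx, where element i
 * occupies bits [8*i+7 : 8*i] of 64-bit chunk i / 8.
 */
static uint8_t read_byte_elem(const uint64_t *chunks, int idx)
{
    const uint8_t *bytes = (const uint8_t *)chunks;

    return bytes[H1(idx)];
}
```

On a little-endian host the logical and memory orders coincide, so H1 is the
identity. On a big-endian host the least significant byte of each chunk sits
at offset 7, which is why the index is XORed with 7.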

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/internals.h     | 50 ++++++++++++++++++++++
 target/riscv/vector_helper.c | 82 +++++++++++-------------------------
 2 files changed, 74 insertions(+), 58 deletions(-)

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index b15ad394bb..698158e116 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -58,4 +58,54 @@ static inline float32 check_nanbox_s(uint64_t f)
     }
 }
 
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
+/* shared saturation functions */
+int8_t sadd8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t sadd16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t sadd32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t sadd64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t saddu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t saddu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t saddu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t saddu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+int8_t ssub8(CPURISCVState *, int vxrm, int8_t, int8_t);
+int16_t ssub16(CPURISCVState *, int vxrm, int16_t, int16_t);
+int32_t ssub32(CPURISCVState *, int vxrm, int32_t, int32_t);
+int64_t ssub64(CPURISCVState *, int vxrm, int64_t, int64_t);
+
+uint8_t ssubu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
+uint16_t ssubu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
+uint32_t ssubu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
+uint64_t ssubu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
+
+/* shared shift functions */
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b);
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b);
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b);
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b);
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b);
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b);
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b);
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b);
 #endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index a156573d28..9371d70f6b 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -56,26 +56,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     return vl;
 }
 
-/*
- * Note that vector data is stored in host-endian 64-bit chunks,
- * so addressing units smaller than that needs a host-endian fixup.
- */
-#ifdef HOST_WORDS_BIGENDIAN
-#define H1(x)   ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x)   ((x) ^ 3)
-#define H4(x)   ((x) ^ 1)
-#define H8(x)   ((x))
-#else
-#define H1(x)   (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x)   (x)
-#define H4(x)   (x)
-#define H8(x)   (x)
-#endif
-
 static inline uint32_t vext_nf(uint32_t desc)
 {
     return FIELD_EX32(simd_data(desc), VDATA, NF);
@@ -2199,7 +2179,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
                  do_##NAME, CLEAR_FN);                          \
 }
 
-static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a + b;
     if (res < a) {
@@ -2209,8 +2189,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a + b;
     if (res < a) {
@@ -2220,8 +2199,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a + b;
     if (res < a) {
@@ -2231,8 +2209,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a + b;
     if (res < a) {
@@ -2328,7 +2305,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
 
-static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT8_MIN) {
@@ -2338,7 +2315,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT16_MIN) {
@@ -2348,7 +2325,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT32_MIN) {
@@ -2358,7 +2335,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT64_MIN) {
@@ -2386,7 +2363,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
 
-static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a - b;
     if (res > a) {
@@ -2396,8 +2373,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a - b;
     if (res > a) {
@@ -2407,8 +2383,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a - b;
     if (res > a) {
@@ -2418,8 +2393,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a - b;
     if (res > a) {
@@ -2447,7 +2421,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
 
-static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT8_MIN) {
@@ -2457,7 +2431,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT16_MIN) {
@@ -2467,7 +2441,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT32_MIN) {
@@ -2477,7 +2451,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT64_MIN) {
@@ -2918,8 +2892,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
 
 /* Vector Single-Width Scaling Shift Instructions */
-static inline uint8_t
-vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t round, shift = b & 0x7;
     uint8_t res;
@@ -2928,8 +2901,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint16_t
-vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint8_t round, shift = b & 0xf;
     uint16_t res;
@@ -2938,8 +2910,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint32_t
-vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     uint32_t res;
@@ -2948,8 +2919,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint64_t
-vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     uint64_t res;
@@ -2976,8 +2946,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
 
-static inline int8_t
-vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     uint8_t round, shift = b & 0x7;
     int8_t res;
@@ -2986,8 +2955,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int16_t
-vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     uint8_t round, shift = b & 0xf;
     int16_t res;
@@ -2996,8 +2964,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int32_t
-vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     int32_t res;
@@ -3006,8 +2973,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int64_t
-vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     int64_t res;
-- 
2.17.1



 }
 
-static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a + b;
     if (res < a) {
@@ -2209,8 +2189,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a + b;
     if (res < a) {
@@ -2220,8 +2199,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a + b;
     if (res < a) {
@@ -2231,8 +2209,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a + b;
     if (res < a) {
@@ -2328,7 +2305,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
 
-static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT8_MIN) {
@@ -2338,7 +2315,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT16_MIN) {
@@ -2348,7 +2325,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT32_MIN) {
@@ -2358,7 +2335,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a + b;
     if ((res ^ a) & (res ^ b) & INT64_MIN) {
@@ -2386,7 +2363,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
 
-static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t res = a - b;
     if (res > a) {
@@ -2396,8 +2373,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     return res;
 }
 
-static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
-                               uint16_t b)
+uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint16_t res = a - b;
     if (res > a) {
@@ -2407,8 +2383,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
     return res;
 }
 
-static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
-                               uint32_t b)
+uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint32_t res = a - b;
     if (res > a) {
@@ -2418,8 +2393,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
     return res;
 }
 
-static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
-                               uint64_t b)
+uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint64_t res = a - b;
     if (res > a) {
@@ -2447,7 +2421,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
 
-static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT8_MIN) {
@@ -2457,7 +2431,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     return res;
 }
 
-static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT16_MIN) {
@@ -2467,7 +2441,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     return res;
 }
 
-static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT32_MIN) {
@@ -2477,7 +2451,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     return res;
 }
 
-static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT64_MIN) {
@@ -2918,8 +2892,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
 
 /* Vector Single-Width Scaling Shift Instructions */
-static inline uint8_t
-vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
+uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
 {
     uint8_t round, shift = b & 0x7;
     uint8_t res;
@@ -2928,8 +2901,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint16_t
-vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
+uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
 {
     uint8_t round, shift = b & 0xf;
     uint16_t res;
@@ -2938,8 +2910,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint32_t
-vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
+uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     uint32_t res;
@@ -2948,8 +2919,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline uint64_t
-vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
+uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     uint64_t res;
@@ -2976,8 +2946,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
 
-static inline int8_t
-vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
+int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     uint8_t round, shift = b & 0x7;
     int8_t res;
@@ -2986,8 +2955,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int16_t
-vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
+int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     uint8_t round, shift = b & 0xf;
     int16_t res;
@@ -2996,8 +2964,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int32_t
-vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     uint8_t round, shift = b & 0x1f;
     int32_t res;
@@ -3006,8 +2973,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
     res   = (a >> shift)  + round;
     return res;
 }
-static inline int64_t
-vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     uint8_t round, shift = b & 0x3f;
     int64_t res;
-- 
2.17.1




* [PATCH 03/38] target/riscv: Fixup saturate subtract function
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

The overflow predicate ((a - b) ^ a) & (a ^ b) & INT64_MIN is correct.
However, when the predicate is true and a is 0, the result should
saturate to the maximum value, not the minimum.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
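For illustration only (not part of the patch): a small sketch of the
one case the fix changes, shown on the 8-bit variant. The function name,
the extra boolean parameter and the sat_flag variable are hypothetical;
sat_flag stands in for env->vxsat. Like the helpers in the patch, the
sketch relies on the usual two's-complement wrap when an out-of-range
int is converted to int8_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat. */
static bool sat_flag;

/* Saturating 8-bit subtract using the same overflow predicate as the
 * patch. With zero_is_positive == false it behaves like the old code
 * (a > 0); with true it behaves like the fixed code (a >= 0). */
static int8_t ssub8_sketch(int8_t a, int8_t b, bool zero_is_positive)
{
    int8_t res = a - b;

    if ((res ^ a) & (a ^ b) & INT8_MIN) {
        bool positive = zero_is_positive ? (a >= 0) : (a > 0);

        res = positive ? INT8_MAX : INT8_MIN;
        sat_flag = true;
    }
    return res;
}
```

The only overflowing input with a == 0 is b == INT8_MIN: the exact
result is 0 - (-128) = 128, which is above INT8_MAX, so the saturated
result has to be INT8_MAX. The old comparison picks INT8_MIN there.
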
 target/riscv/vector_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 9371d70f6b..9786f630b4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2425,7 +2425,7 @@ int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
 {
     int8_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT8_MIN) {
-        res = a > 0 ? INT8_MAX : INT8_MIN;
+        res = a >= 0 ? INT8_MAX : INT8_MIN;
         env->vxsat = 0x1;
     }
     return res;
@@ -2435,7 +2435,7 @@ int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
 {
     int16_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT16_MIN) {
-        res = a > 0 ? INT16_MAX : INT16_MIN;
+        res = a >= 0 ? INT16_MAX : INT16_MIN;
         env->vxsat = 0x1;
     }
     return res;
@@ -2445,7 +2445,7 @@ int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
 {
     int32_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT32_MIN) {
-        res = a > 0 ? INT32_MAX : INT32_MIN;
+        res = a >= 0 ? INT32_MAX : INT32_MIN;
         env->vxsat = 0x1;
     }
     return res;
@@ -2455,7 +2455,7 @@ int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
 {
     int64_t res = a - b;
     if ((res ^ a) & (a ^ b) & INT64_MIN) {
-        res = a > 0 ? INT64_MAX : INT64_MIN;
+        res = a >= 0 ? INT64_MAX : INT64_MIN;
         env->vxsat = 0x1;
     }
     return res;
-- 
2.17.1




* [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

This includes five groups: wrap-around (dropping overflow), signed
halving, unsigned halving, signed saturation, and unsigned saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
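For illustration only (not part of the patch): a sketch of how the five
groups differ on a single 16-bit lane, using addition as the example.
The lane_* names are hypothetical and the code uses plain widening
arithmetic rather than the helpers in this series. The signed halving
variant assumes an arithmetic right shift of negative values, as the
helpers in the patch do.

```c
#include <stdint.h>

/* Wrap-around (add16): keep the low 16 bits, drop the carry. */
static uint16_t lane_add_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed halving (radd16): widen, add, shift right by one. */
static int16_t lane_add_half_s(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + b) >> 1);
}

/* Unsigned halving (uradd16). */
static uint16_t lane_add_half_u(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturation (kadd16): clamp to [INT16_MIN, INT16_MAX]. */
static int16_t lane_add_sat_s(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;

    if (sum > INT16_MAX) {
        return INT16_MAX;
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)sum;
}

/* Unsigned saturation (ukadd16): clamp to UINT16_MAX. */
static uint16_t lane_add_sat_u(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;

    return sum > UINT16_MAX ? UINT16_MAX : (uint16_t)sum;
}
```

The halving forms cannot overflow because the sum is divided by two
before it is narrowed, which is why only the saturating forms need to
touch the saturation flag.
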
 target/riscv/helper.h                   |  30 ++
 target/riscv/insn32.decode              |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 161 +++++++++++
 target/riscv/meson.build                |   1 +
 target/riscv/packed_helper.c            | 354 ++++++++++++++++++++++++
 target/riscv/translate.c                |   1 +
 6 files changed, 579 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e3f3f41e89..6d622c732a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1145,3 +1145,33 @@ DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+/* P extension function */
+DEF_HELPER_3(radd16, tl, env, tl, tl)
+DEF_HELPER_3(uradd16, tl, env, tl, tl)
+DEF_HELPER_3(kadd16, tl, env, tl, tl)
+DEF_HELPER_3(ukadd16, tl, env, tl, tl)
+DEF_HELPER_3(rsub16, tl, env, tl, tl)
+DEF_HELPER_3(ursub16, tl, env, tl, tl)
+DEF_HELPER_3(ksub16, tl, env, tl, tl)
+DEF_HELPER_3(uksub16, tl, env, tl, tl)
+DEF_HELPER_3(cras16, tl, env, tl, tl)
+DEF_HELPER_3(rcras16, tl, env, tl, tl)
+DEF_HELPER_3(urcras16, tl, env, tl, tl)
+DEF_HELPER_3(kcras16, tl, env, tl, tl)
+DEF_HELPER_3(ukcras16, tl, env, tl, tl)
+DEF_HELPER_3(crsa16, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(stas16, tl, env, tl, tl)
+DEF_HELPER_3(rstas16, tl, env, tl, tl)
+DEF_HELPER_3(urstas16, tl, env, tl, tl)
+DEF_HELPER_3(kstas16, tl, env, tl, tl)
+DEF_HELPER_3(ukstas16, tl, env, tl, tl)
+DEF_HELPER_3(stsa16, tl, env, tl, tl)
+DEF_HELPER_3(rstsa16, tl, env, tl, tl)
+DEF_HELPER_3(urstsa16, tl, env, tl, tl)
+DEF_HELPER_3(kstsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 84080dd18c..8815e90476 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -592,3 +592,35 @@ vcompress_vm    010111 - ..... ..... 010 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
+
+# *** RV32P Extension ***
+add16      0100000  ..... ..... 000 ..... 1111111 @r
+radd16     0000000  ..... ..... 000 ..... 1111111 @r
+uradd16    0010000  ..... ..... 000 ..... 1111111 @r
+kadd16     0001000  ..... ..... 000 ..... 1111111 @r
+ukadd16    0011000  ..... ..... 000 ..... 1111111 @r
+sub16      0100001  ..... ..... 000 ..... 1111111 @r
+rsub16     0000001  ..... ..... 000 ..... 1111111 @r
+ursub16    0010001  ..... ..... 000 ..... 1111111 @r
+ksub16     0001001  ..... ..... 000 ..... 1111111 @r
+uksub16    0011001  ..... ..... 000 ..... 1111111 @r
+cras16     0100010  ..... ..... 000 ..... 1111111 @r
+rcras16    0000010  ..... ..... 000 ..... 1111111 @r
+urcras16   0010010  ..... ..... 000 ..... 1111111 @r
+kcras16    0001010  ..... ..... 000 ..... 1111111 @r
+ukcras16   0011010  ..... ..... 000 ..... 1111111 @r
+crsa16     0100011  ..... ..... 000 ..... 1111111 @r
+rcrsa16    0000011  ..... ..... 000 ..... 1111111 @r
+urcrsa16   0010011  ..... ..... 000 ..... 1111111 @r
+kcrsa16    0001011  ..... ..... 000 ..... 1111111 @r
+ukcrsa16   0011011  ..... ..... 000 ..... 1111111 @r
+stas16     1111010  ..... ..... 010 ..... 1111111 @r
+rstas16    1011010  ..... ..... 010 ..... 1111111 @r
+urstas16   1101010  ..... ..... 010 ..... 1111111 @r
+kstas16    1100010  ..... ..... 010 ..... 1111111 @r
+ukstas16   1110010  ..... ..... 010 ..... 1111111 @r
+stsa16     1111011  ..... ..... 010 ..... 1111111 @r
+rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
+urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
+kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
+ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..0885a4fd45
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,161 @@
+/*
+ * RISC-V translation routines for the RVP Standard Extension.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "tcg/tcg.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+
+/*
+ * For some instructions, such as add16, an observation can be exploited:
+ * 1) If any reg is zero, it can be reduced to an inline op on the whole reg.
+ * 2) Otherwise, it can be accelerated by a gvec op or an inline op.
+ */
+
+typedef void GenZeroFn(DisasContext *, arg_r *);
+typedef void GenNoZero32Fn(TCGv, TCGv, TCGv);
+typedef void GenNoZero64Fn(unsigned, uint32_t, uint32_t,
+                           uint32_t, uint32_t, uint32_t);
+
+static inline bool
+r_inline(DisasContext *ctx, arg_r *a, uint8_t vece,
+         GenNoZero64Fn *f64, GenNoZero32Fn *f32,
+         GenZeroFn *fn)
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+    if (a->rd && a->rs1 && a->rs2) {
+#ifdef TARGET_RISCV64
+        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
+            offsetof(CPURISCVState, gpr[a->rs1]),
+            offsetof(CPURISCVState, gpr[a->rs2]),
+            8, 8);
+#else
+        f32(cpu_gpr[a->rd], cpu_gpr[a->rs1], cpu_gpr[a->rs2]);
+#endif
+    } else {
+        fn(ctx, a);
+    }
+    return true;
+}
+
+/* Complete inline implementation */
+#define GEN_RVP_R_INLINE(NAME, GSUF, VECE, FN)                     \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
+{                                                                  \
+    return r_inline(s, a, VECE, tcg_gen_gvec_##GSUF,               \
+                    tcg_gen_simd_##NAME, (GenZeroFn *)FN);         \
+}                                                                  \
+
+static void tcg_gen_simd_add16(TCGv d, TCGv a, TCGv b)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+
+    tcg_gen_andi_tl(t1, a, ~0xffff);
+    tcg_gen_add_tl(t2, a, b);
+    tcg_gen_add_tl(t1, t1, b);
+    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+GEN_RVP_R_INLINE(add16, add, 1, trans_add);
+
+static void tcg_gen_simd_sub16(TCGv d, TCGv a, TCGv b)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+
+    tcg_gen_andi_tl(t1, b, ~0xffff);
+    tcg_gen_sub_tl(t2, a, b);
+    tcg_gen_sub_tl(t1, a, t1);
+    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+GEN_RVP_R_INLINE(sub16, sub, 1, trans_sub);
+
+/* Out of line helpers for R format packed instructions */
+typedef void gen_helper_rvp_r(TCGv, TCGv_ptr, TCGv, TCGv);
+
+static inline bool r_ool(DisasContext *ctx, arg_r *a, gen_helper_rvp_r *fn)
+{
+    TCGv src1, src2, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_OOL(NAME)                            \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_ool(s, a, gen_helper_##NAME);             \
+}
+
+GEN_RVP_R_OOL(radd16);
+GEN_RVP_R_OOL(uradd16);
+GEN_RVP_R_OOL(kadd16);
+GEN_RVP_R_OOL(ukadd16);
+GEN_RVP_R_OOL(rsub16);
+GEN_RVP_R_OOL(ursub16);
+GEN_RVP_R_OOL(ksub16);
+GEN_RVP_R_OOL(uksub16);
+GEN_RVP_R_OOL(cras16);
+GEN_RVP_R_OOL(rcras16);
+GEN_RVP_R_OOL(urcras16);
+GEN_RVP_R_OOL(kcras16);
+GEN_RVP_R_OOL(ukcras16);
+GEN_RVP_R_OOL(crsa16);
+GEN_RVP_R_OOL(rcrsa16);
+GEN_RVP_R_OOL(urcrsa16);
+GEN_RVP_R_OOL(kcrsa16);
+GEN_RVP_R_OOL(ukcrsa16);
+GEN_RVP_R_OOL(stas16);
+GEN_RVP_R_OOL(rstas16);
+GEN_RVP_R_OOL(urstas16);
+GEN_RVP_R_OOL(kstas16);
+GEN_RVP_R_OOL(ukstas16);
+GEN_RVP_R_OOL(stsa16);
+GEN_RVP_R_OOL(rstsa16);
+GEN_RVP_R_OOL(urstsa16);
+GEN_RVP_R_OOL(kstsa16);
+GEN_RVP_R_OOL(ukstsa16);
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index 14a5c62dac..d26a437ee8 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -21,6 +21,7 @@ riscv_ss.add(files(
   'gdbstub.c',
   'op_helper.c',
   'vector_helper.c',
+  'packed_helper.c',
   'translate.c',
 ))
 
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
new file mode 100644
index 0000000000..b84abaaf25
--- /dev/null
+++ b/target/riscv/packed_helper.c
@@ -0,0 +1,354 @@
+/*
+ * RISC-V P Extension Helpers for QEMU.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "fpu/softfloat.h"
+#include <math.h>
+#include "internals.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+typedef void PackedFn3i(CPURISCVState *, void *, void *, void *, uint8_t);
+
+/* Define a common function to loop elements in packed register */
+static inline target_ulong
+rvpr(CPURISCVState *env, target_ulong a, target_ulong b,
+     uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,   \
+                          target_ulong b)                       \
+{                                                               \
+    return rvpr(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);\
+}
+
+static inline int32_t hadd32(int32_t a, int32_t b)
+{
+    return ((int64_t)a + b) >> 1;
+}
+
+static inline void do_radd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd16, 1, 2);
+
+static inline uint32_t haddu32(uint32_t a, uint32_t b)
+{
+    return ((uint64_t)a + b) >> 1;
+}
+
+static inline void do_uradd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd16, 1, 2);
+
+static inline void do_kadd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd16(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd16, 1, 2);
+
+static inline void do_ukadd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu16(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd16, 1, 2);
+
+static inline int32_t hsub32(int32_t a, int32_t b)
+{
+    return ((int64_t)a - b) >> 1;
+}
+
+static inline int64_t hsub64(int64_t a, int64_t b)
+{
+    int64_t res = a - b;
+    int64_t over = (res ^ a) & (a ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_rsub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub16, 1, 2);
+
+static inline uint64_t hsubu64(uint64_t a, uint64_t b)
+{
+    return (a - b) >> 1;
+}
+
+static inline void do_ursub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub16, 1, 2);
+
+static inline void do_ksub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub16(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub16, 1, 2);
+
+static inline void do_uksub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu16(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub16, 1, 2);
+
+static inline void do_cras16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i)];
+}
+
+RVPR(cras16, 2, 2);
+
+static inline void do_rcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcras16, 2, 2);
+
+static inline void do_urcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcras16, 2, 2);
+
+static inline void do_kcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcras16, 2, 2);
+
+static inline void do_ukcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcras16, 2, 2);
+
+static inline void do_crsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i)];
+}
+
+RVPR(crsa16, 2, 2);
+
+static inline void do_rcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcrsa16, 2, 2);
+
+static inline void do_urcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcrsa16, 2, 2);
+
+static inline void do_kcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcrsa16, 2, 2);
+
+static inline void do_ukcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcrsa16, 2, 2);
+
+static inline void do_stas16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i + 1)];
+}
+
+RVPR(stas16, 2, 2);
+
+static inline void do_rstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstas16, 2, 2);
+
+static inline void do_urstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstas16, 2, 2);
+
+static inline void do_kstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstas16, 2, 2);
+
+static inline void do_ukstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstas16, 2, 2);
+
+static inline void do_stsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i + 1)];
+}
+
+RVPR(stsa16, 2, 2);
+
+static inline void do_rstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstsa16, 2, 2);
+
+static inline void do_urstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstsa16, 2, 2);
+
+static inline void do_kstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstsa16, 2, 2);
+
+static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstsa16, 2, 2);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index eb810efec6..f0a753f9c7 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -766,6 +766,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvd.c.inc"
 #include "insn_trans/trans_rvh.c.inc"
 #include "insn_trans/trans_rvv.c.inc"
+#include "insn_trans/trans_rvp.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 
 /* Include the auto-generated decoder for 16 bit insn */
-- 
2.17.1




* [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

This patch adds five groups of instructions: wrap-around (overflow is
dropped), signed halving, unsigned halving, signed saturation, and
unsigned saturation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
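Note (illustration only, not part of the patch): the five groups named in
the commit message can be sketched for a single 16-bit lane as below. The
function names are hypothetical and do not appear in the patch. The real
saturating helpers (sadd16, saddu16, ...) also take env, presumably to
record that saturation happened; this sketch leaves that out. The signed
halving variant assumes that >> on a negative value is an arithmetic
shift, as the patch itself does.

```c
#include <stdint.h>

/* Wrap-around (add16): keep the low 16 bits, drop the carry. */
static uint16_t wrap_add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed halving (radd16): add in a wider type, then shift right by one. */
static int16_t shalve_add16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + b) >> 1);
}

/* Unsigned halving (uradd16): same idea with unsigned operands. */
static uint16_t uhalve_add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturation (kadd16): clamp to [INT16_MIN, INT16_MAX]. */
static int16_t ssat_add16(int16_t a, int16_t b)
{
    int32_t r = (int32_t)a + b;

    if (r > INT16_MAX) {
        return INT16_MAX;
    }
    if (r < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)r;
}

/* Unsigned saturation (ukadd16): clamp to [0, UINT16_MAX]. */
static uint16_t usat_add16(uint16_t a, uint16_t b)
{
    uint32_t r = (uint32_t)a + b;

    return r > UINT16_MAX ? UINT16_MAX : (uint16_t)r;
}
```

The halving forms cannot overflow because the sum is formed in 32 bits
before the shift; the saturating forms clamp where the wrap-around form
would silently truncate.
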
 target/riscv/helper.h                   |  30 ++
 target/riscv/insn32.decode              |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 161 +++++++++++
 target/riscv/meson.build                |   1 +
 target/riscv/packed_helper.c            | 354 ++++++++++++++++++++++++
 target/riscv/translate.c                |   1 +
 6 files changed, 579 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/packed_helper.c
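
Note (illustration only, not part of the patch): on RV32 the inline path
for add16/sub16 uses one mask, two full-width operations and a deposit
instead of splitting the lanes. A plain C model of that sequence, under
the assumption that the register holds two 16-bit lanes, could look like
this. The names add16_rv32 and sub16_rv32 are hypothetical; the temporaries
mirror t1 and t2 in tcg_gen_simd_add16 and tcg_gen_simd_sub16.

```c
#include <stdint.h>

#define LOW16 UINT32_C(0xffff)

/* Two independent 16-bit additions inside one 32-bit word. */
static uint32_t add16_rv32(uint32_t a, uint32_t b)
{
    uint32_t t1 = a & ~LOW16;   /* a with its low lane cleared */
    uint32_t t2 = a + b;        /* low 16 bits hold the low-lane sum */

    t1 += b;                    /* low lane of t1 was 0, so no carry
                                   reaches the high lane */
    /* deposit: low 16 bits from t2, the rest from t1 */
    return (t1 & ~LOW16) | (t2 & LOW16);
}

/* Two independent 16-bit subtractions inside one 32-bit word. */
static uint32_t sub16_rv32(uint32_t a, uint32_t b)
{
    uint32_t t1 = b & ~LOW16;   /* b with its low lane cleared */
    uint32_t t2 = a - b;        /* low 16 bits hold the low-lane difference */

    t1 = a - t1;                /* subtracting 0 from the low lane cannot
                                   borrow from the high lane */
    return (t1 & ~LOW16) | (t2 & LOW16);
}
```

Clearing the low lane of one operand is what keeps a carry or borrow from
crossing the lane boundary, so the high lane of t1 is correct even though
the operation is performed on the whole word.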

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e3f3f41e89..6d622c732a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1145,3 +1145,33 @@ DEF_HELPER_6(vcompress_vm_b, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+/* P extension functions */
+DEF_HELPER_3(radd16, tl, env, tl, tl)
+DEF_HELPER_3(uradd16, tl, env, tl, tl)
+DEF_HELPER_3(kadd16, tl, env, tl, tl)
+DEF_HELPER_3(ukadd16, tl, env, tl, tl)
+DEF_HELPER_3(rsub16, tl, env, tl, tl)
+DEF_HELPER_3(ursub16, tl, env, tl, tl)
+DEF_HELPER_3(ksub16, tl, env, tl, tl)
+DEF_HELPER_3(uksub16, tl, env, tl, tl)
+DEF_HELPER_3(cras16, tl, env, tl, tl)
+DEF_HELPER_3(rcras16, tl, env, tl, tl)
+DEF_HELPER_3(urcras16, tl, env, tl, tl)
+DEF_HELPER_3(kcras16, tl, env, tl, tl)
+DEF_HELPER_3(ukcras16, tl, env, tl, tl)
+DEF_HELPER_3(crsa16, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa16, tl, env, tl, tl)
+DEF_HELPER_3(stas16, tl, env, tl, tl)
+DEF_HELPER_3(rstas16, tl, env, tl, tl)
+DEF_HELPER_3(urstas16, tl, env, tl, tl)
+DEF_HELPER_3(kstas16, tl, env, tl, tl)
+DEF_HELPER_3(ukstas16, tl, env, tl, tl)
+DEF_HELPER_3(stsa16, tl, env, tl, tl)
+DEF_HELPER_3(rstsa16, tl, env, tl, tl)
+DEF_HELPER_3(urstsa16, tl, env, tl, tl)
+DEF_HELPER_3(kstsa16, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 84080dd18c..8815e90476 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -592,3 +592,35 @@ vcompress_vm    010111 - ..... ..... 010 ..... 1010111 @r
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
+
+# *** RV32P Extension ***
+add16      0100000  ..... ..... 000 ..... 1111111 @r
+radd16     0000000  ..... ..... 000 ..... 1111111 @r
+uradd16    0010000  ..... ..... 000 ..... 1111111 @r
+kadd16     0001000  ..... ..... 000 ..... 1111111 @r
+ukadd16    0011000  ..... ..... 000 ..... 1111111 @r
+sub16      0100001  ..... ..... 000 ..... 1111111 @r
+rsub16     0000001  ..... ..... 000 ..... 1111111 @r
+ursub16    0010001  ..... ..... 000 ..... 1111111 @r
+ksub16     0001001  ..... ..... 000 ..... 1111111 @r
+uksub16    0011001  ..... ..... 000 ..... 1111111 @r
+cras16     0100010  ..... ..... 000 ..... 1111111 @r
+rcras16    0000010  ..... ..... 000 ..... 1111111 @r
+urcras16   0010010  ..... ..... 000 ..... 1111111 @r
+kcras16    0001010  ..... ..... 000 ..... 1111111 @r
+ukcras16   0011010  ..... ..... 000 ..... 1111111 @r
+crsa16     0100011  ..... ..... 000 ..... 1111111 @r
+rcrsa16    0000011  ..... ..... 000 ..... 1111111 @r
+urcrsa16   0010011  ..... ..... 000 ..... 1111111 @r
+kcrsa16    0001011  ..... ..... 000 ..... 1111111 @r
+ukcrsa16   0011011  ..... ..... 000 ..... 1111111 @r
+stas16     1111010  ..... ..... 010 ..... 1111111 @r
+rstas16    1011010  ..... ..... 010 ..... 1111111 @r
+urstas16   1101010  ..... ..... 010 ..... 1111111 @r
+kstas16    1100010  ..... ..... 010 ..... 1111111 @r
+ukstas16   1110010  ..... ..... 010 ..... 1111111 @r
+stsa16     1111011  ..... ..... 010 ..... 1111111 @r
+rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
+urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
+kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
+ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..0885a4fd45
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,161 @@
+/*
+ * RISC-V translation routines for the RVP Standard Extension.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "tcg/tcg.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+
+/*
+ * For some instructions, such as add16, an observation can be utilized:
+ * 1) If any reg is zero, it can be reduced to an inline op on the whole reg.
+ * 2) Otherwise, it can be accelerated by a gvec op or an inline op.
+ */
+
+typedef void GenZeroFn(DisasContext *, arg_r *);
+typedef void GenNoZero32Fn(TCGv, TCGv, TCGv);
+typedef void GenNoZero64Fn(unsigned, uint32_t, uint32_t,
+                           uint32_t, uint32_t, uint32_t);
+
+static inline bool
+r_inline(DisasContext *ctx, arg_r *a, uint8_t vece,
+         GenNoZero64Fn *f64, GenNoZero32Fn *f32,
+         GenZeroFn *fn)
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+    if (a->rd && a->rs1 && a->rs2) {
+#ifdef TARGET_RISCV64
+        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
+            offsetof(CPURISCVState, gpr[a->rs1]),
+            offsetof(CPURISCVState, gpr[a->rs2]),
+            8, 8);
+#else
+        f32(cpu_gpr[a->rd], cpu_gpr[a->rs1], cpu_gpr[a->rs2]);
+#endif
+    } else {
+        fn(ctx, a);
+    }
+    return true;
+}
+
+/* Complete inline implementation */
+#define GEN_RVP_R_INLINE(NAME, GSUF, VECE, FN)                     \
+static bool trans_##NAME(DisasContext *s, arg_r *a)                \
+{                                                                  \
+    return r_inline(s, a, VECE, tcg_gen_gvec_##GSUF,               \
+                    tcg_gen_simd_##NAME, (GenZeroFn *)FN);         \
+}                                                                  \
+
+static void tcg_gen_simd_add16(TCGv d, TCGv a, TCGv b)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+
+    tcg_gen_andi_tl(t1, a, ~0xffff);
+    tcg_gen_add_tl(t2, a, b);
+    tcg_gen_add_tl(t1, t1, b);
+    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+GEN_RVP_R_INLINE(add16, add, 1, trans_add);
+
+static void tcg_gen_simd_sub16(TCGv d, TCGv a, TCGv b)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+
+    tcg_gen_andi_tl(t1, b, ~0xffff);
+    tcg_gen_sub_tl(t2, a, b);
+    tcg_gen_sub_tl(t1, a, t1);
+    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+GEN_RVP_R_INLINE(sub16, sub, 1, trans_sub);
+
+/* Out of line helpers for R format packed instructions */
+typedef void gen_helper_rvp_r(TCGv, TCGv_ptr, TCGv, TCGv);
+
+static inline bool r_ool(DisasContext *ctx, arg_r *a, gen_helper_rvp_r *fn)
+{
+    TCGv src1, src2, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_OOL(NAME)                            \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_ool(s, a, gen_helper_##NAME);             \
+}
+
+GEN_RVP_R_OOL(radd16);
+GEN_RVP_R_OOL(uradd16);
+GEN_RVP_R_OOL(kadd16);
+GEN_RVP_R_OOL(ukadd16);
+GEN_RVP_R_OOL(rsub16);
+GEN_RVP_R_OOL(ursub16);
+GEN_RVP_R_OOL(ksub16);
+GEN_RVP_R_OOL(uksub16);
+GEN_RVP_R_OOL(cras16);
+GEN_RVP_R_OOL(rcras16);
+GEN_RVP_R_OOL(urcras16);
+GEN_RVP_R_OOL(kcras16);
+GEN_RVP_R_OOL(ukcras16);
+GEN_RVP_R_OOL(crsa16);
+GEN_RVP_R_OOL(rcrsa16);
+GEN_RVP_R_OOL(urcrsa16);
+GEN_RVP_R_OOL(kcrsa16);
+GEN_RVP_R_OOL(ukcrsa16);
+GEN_RVP_R_OOL(stas16);
+GEN_RVP_R_OOL(rstas16);
+GEN_RVP_R_OOL(urstas16);
+GEN_RVP_R_OOL(kstas16);
+GEN_RVP_R_OOL(ukstas16);
+GEN_RVP_R_OOL(stsa16);
+GEN_RVP_R_OOL(rstsa16);
+GEN_RVP_R_OOL(urstsa16);
+GEN_RVP_R_OOL(kstsa16);
+GEN_RVP_R_OOL(ukstsa16);
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index 14a5c62dac..d26a437ee8 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -21,6 +21,7 @@ riscv_ss.add(files(
   'gdbstub.c',
   'op_helper.c',
   'vector_helper.c',
+  'packed_helper.c',
   'translate.c',
 ))
 
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
new file mode 100644
index 0000000000..b84abaaf25
--- /dev/null
+++ b/target/riscv/packed_helper.c
@@ -0,0 +1,354 @@
+/*
+ * RISC-V P Extension Helpers for QEMU.
+ *
+ * Copyright (c) 2021 T-Head Semiconductor Co., Ltd. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "fpu/softfloat.h"
+#include <math.h>
+#include "internals.h"
+
+/*
+ *** SIMD Data Processing Instructions
+ */
+
+/* 16-bit Addition & Subtraction Instructions */
+typedef void PackedFn3i(CPURISCVState *, void *, void *, void *, uint8_t);
+
+/* Define a common function to loop over the elements of a packed register */
+static inline target_ulong
+rvpr(CPURISCVState *env, target_ulong a, target_ulong b,
+     uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,   \
+                          target_ulong b)                       \
+{                                                               \
+    return rvpr(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);\
+}
+
+static inline int32_t hadd32(int32_t a, int32_t b)
+{
+    return ((int64_t)a + b) >> 1;
+}
+
+static inline void do_radd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd16, 1, 2);
+
+static inline uint32_t haddu32(uint32_t a, uint32_t b)
+{
+    return ((uint64_t)a + b) >> 1;
+}
+
+static inline void do_uradd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd16, 1, 2);
+
+static inline void do_kadd16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd16(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd16, 1, 2);
+
+static inline void do_ukadd16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu16(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd16, 1, 2);
+
+static inline int32_t hsub32(int32_t a, int32_t b)
+{
+    return ((int64_t)a - b) >> 1;
+}
+
+static inline int64_t hsub64(int64_t a, int64_t b)
+{
+    int64_t res = a - b;
+    int64_t over = (res ^ a) & (a ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_rsub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub16, 1, 2);
+
+static inline uint64_t hsubu64(uint64_t a, uint64_t b)
+{
+    return (a - b) >> 1;
+}
+
+static inline void do_ursub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub16, 1, 2);
+
+static inline void do_ksub16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub16(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub16, 1, 2);
+
+static inline void do_uksub16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu16(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub16, 1, 2);
+
+static inline void do_cras16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i)];
+}
+
+RVPR(cras16, 2, 2);
+
+static inline void do_rcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcras16, 2, 2);
+
+static inline void do_urcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcras16, 2, 2);
+
+static inline void do_kcras16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcras16, 2, 2);
+
+static inline void do_ukcras16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcras16, 2, 2);
+
+static inline void do_crsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i)];
+}
+
+RVPR(crsa16, 2, 2);
+
+static inline void do_rcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(rcrsa16, 2, 2);
+
+static inline void do_urcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(urcrsa16, 2, 2);
+
+static inline void do_kcrsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(kcrsa16, 2, 2);
+
+static inline void do_ukcrsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i + 1)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i)]);
+}
+
+RVPR(ukcrsa16, 2, 2);
+
+static inline void do_stas16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] - b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] + b[H2(i + 1)];
+}
+
+RVPR(stas16, 2, 2);
+
+static inline void do_rstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsub32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hadd32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstas16, 2, 2);
+
+static inline void do_urstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hsubu64(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = haddu32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstas16, 2, 2);
+
+static inline void do_kstas16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssub16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = sadd16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstas16, 2, 2);
+
+static inline void do_ukstas16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = ssubu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = saddu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstas16, 2, 2);
+
+static inline void do_stsa16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = a[H2(i)] + b[H2(i)];
+    d[H2(i + 1)] = a[H2(i + 1)] - b[H2(i + 1)];
+}
+
+RVPR(stsa16, 2, 2);
+
+static inline void do_rstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = hadd32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsub32(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(rstsa16, 2, 2);
+
+static inline void do_urstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = haddu32(a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = hsubu64(a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(urstsa16, 2, 2);
+
+static inline void do_kstsa16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = sadd16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssub16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(kstsa16, 2, 2);
+
+static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i)] = saddu16(env, 0, a[H2(i)], b[H2(i)]);
+    d[H2(i + 1)] = ssubu16(env, 0, a[H2(i + 1)], b[H2(i + 1)]);
+}
+
+RVPR(ukstsa16, 2, 2);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index eb810efec6..f0a753f9c7 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -766,6 +766,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvd.c.inc"
 #include "insn_trans/trans_rvh.c.inc"
 #include "insn_trans/trans_rvv.c.inc"
+#include "insn_trans/trans_rvp.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 
 /* Include the auto-generated decoder for 16 bit insn */
-- 
2.17.1




* [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
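Note (illustration only, not part of the patch): the masked add and
subtract that this patch copies from tcg-op-gvec.c can be modelled in
plain C for four 8-bit lanes in a 32-bit word. The names add8_masked and
sub8_masked are hypothetical; the temporaries follow gen_simd_add_mask
and gen_simd_sub_mask, and SIGN8 plays the role of
dup_const(MO_8, 0x80).

```c
#include <stdint.h>

/* Sign bit of every 8-bit lane. */
#define SIGN8 UINT32_C(0x80808080)

static uint32_t add8_masked(uint32_t a, uint32_t b)
{
    uint32_t t1 = a & ~SIGN8;        /* andc: low 7 bits of each lane */
    uint32_t t2 = b & ~SIGN8;
    uint32_t t3 = (a ^ b) & SIGN8;   /* sum of the sign bits, no carry */

    /* t1 + t2 carries at most into bit 7 of its own lane; xor-ing in t3
       completes bit 7 without carrying into the next lane. */
    return (t1 + t2) ^ t3;
}

static uint32_t sub8_masked(uint32_t a, uint32_t b)
{
    uint32_t t1 = a | SIGN8;         /* force bit 7 so a lane never borrows
                                        from its neighbour */
    uint32_t t2 = b & ~SIGN8;
    uint32_t t3 = ~(a ^ b) & SIGN8;  /* eqv: xnor of the sign bits */

    return (t1 - t2) ^ t3;
}
```

Both forms take six operations regardless of the number of lanes, which
is why the comment in tcg-op-gvec.c recommends them once a word holds
four or more lanes.
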
 target/riscv/helper.h                   |  9 +++
 target/riscv/insn32.decode              | 11 ++++
 target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
 target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
 4 files changed, 172 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6d622c732a..a69a6b4e84 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
 DEF_HELPER_3(urstsa16, tl, env, tl, tl)
 DEF_HELPER_3(kstsa16, tl, env, tl, tl)
 DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
+
+DEF_HELPER_3(radd8, tl, env, tl, tl)
+DEF_HELPER_3(uradd8, tl, env, tl, tl)
+DEF_HELPER_3(kadd8, tl, env, tl, tl)
+DEF_HELPER_3(ukadd8, tl, env, tl, tl)
+DEF_HELPER_3(rsub8, tl, env, tl, tl)
+DEF_HELPER_3(ursub8, tl, env, tl, tl)
+DEF_HELPER_3(ksub8, tl, env, tl, tl)
+DEF_HELPER_3(uksub8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8815e90476..358dd1fa10 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
 urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
 kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
 ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
+
+add8       0100100  ..... ..... 000 ..... 1111111 @r
+radd8      0000100  ..... ..... 000 ..... 1111111 @r
+uradd8     0010100  ..... ..... 000 ..... 1111111 @r
+kadd8      0001100  ..... ..... 000 ..... 1111111 @r
+ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
+sub8       0100101  ..... ..... 000 ..... 1111111 @r
+rsub8      0000101  ..... ..... 000 ..... 1111111 @r
+ursub8     0010101  ..... ..... 000 ..... 1111111 @r
+ksub8      0001101  ..... ..... 000 ..... 1111111 @r
+uksub8     0011101  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 0885a4fd45..109f560ec9 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
 GEN_RVP_R_OOL(urstsa16);
 GEN_RVP_R_OOL(kstsa16);
 GEN_RVP_R_OOL(ukstsa16);
+
+/* 8-bit Addition & Subtraction Instructions */
+/*
+ *  Copied from tcg-op-gvec.c.
+ *
+ *  Perform a vector addition using normal addition and a mask.  The mask
+ *  should be the sign bit of each lane.  This 6-operation form is more
+ *  efficient than separate additions when there are 4 or more lanes in
+ *  the 64-bit operation.
+ */
+
+static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+    TCGv t3 = tcg_temp_new();
+
+    tcg_gen_andc_tl(t1, a, m);
+    tcg_gen_andc_tl(t2, b, m);
+    tcg_gen_xor_tl(t3, a, b);
+    tcg_gen_add_tl(d, t1, t2);
+    tcg_gen_and_tl(t3, t3, m);
+    tcg_gen_xor_tl(d, d, t3);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+}
+
+static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
+{
+    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
+    gen_simd_add_mask(d, a, b, m);
+    tcg_temp_free(m);
+}
+
+GEN_RVP_R_INLINE(add8, add, 0, trans_add);
+
+/*
+ *  Copied from tcg-op-gvec.c.
+ *
+ *  Perform a vector subtraction using normal subtraction and a mask.
+ *  Compare gen_simd_add_mask above.
+ */
+static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+    TCGv t3 = tcg_temp_new();
+
+    tcg_gen_or_tl(t1, a, m);
+    tcg_gen_andc_tl(t2, b, m);
+    tcg_gen_eqv_tl(t3, a, b);
+    tcg_gen_sub_tl(d, t1, t2);
+    tcg_gen_and_tl(t3, t3, m);
+    tcg_gen_xor_tl(d, d, t3);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+}
+
+static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
+{
+    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
+    gen_simd_sub_mask(d, a, b, m);
+    tcg_temp_free(m);
+}
+
+GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
+
+GEN_RVP_R_OOL(radd8);
+GEN_RVP_R_OOL(uradd8);
+GEN_RVP_R_OOL(kadd8);
+GEN_RVP_R_OOL(ukadd8);
+GEN_RVP_R_OOL(rsub8);
+GEN_RVP_R_OOL(ursub8);
+GEN_RVP_R_OOL(ksub8);
+GEN_RVP_R_OOL(uksub8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index b84abaaf25..62db072204 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ukstsa16, 2, 2);
+
+/* 8-bit Addition & Subtraction Instructions */
+static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd8, 1, 1);
+
+static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd8, 1, 1);
+
+static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd8(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd8, 1, 1);
+
+static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu8(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd8, 1, 1);
+
+static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub8, 1, 1);
+
+static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub8, 1, 1);
+
+static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub8(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub8, 1, 1);
+
+static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu8(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub8, 1, 1);
-- 
2.17.1




* [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  9 +++
 target/riscv/insn32.decode              | 11 ++++
 target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
 target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
 4 files changed, 172 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6d622c732a..a69a6b4e84 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
 DEF_HELPER_3(urstsa16, tl, env, tl, tl)
 DEF_HELPER_3(kstsa16, tl, env, tl, tl)
 DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
+
+DEF_HELPER_3(radd8, tl, env, tl, tl)
+DEF_HELPER_3(uradd8, tl, env, tl, tl)
+DEF_HELPER_3(kadd8, tl, env, tl, tl)
+DEF_HELPER_3(ukadd8, tl, env, tl, tl)
+DEF_HELPER_3(rsub8, tl, env, tl, tl)
+DEF_HELPER_3(ursub8, tl, env, tl, tl)
+DEF_HELPER_3(ksub8, tl, env, tl, tl)
+DEF_HELPER_3(uksub8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8815e90476..358dd1fa10 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
 urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
 kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
 ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
+
+add8       0100100  ..... ..... 000 ..... 1111111 @r
+radd8      0000100  ..... ..... 000 ..... 1111111 @r
+uradd8     0010100  ..... ..... 000 ..... 1111111 @r
+kadd8      0001100  ..... ..... 000 ..... 1111111 @r
+ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
+sub8       0100101  ..... ..... 000 ..... 1111111 @r
+rsub8      0000101  ..... ..... 000 ..... 1111111 @r
+ursub8     0010101  ..... ..... 000 ..... 1111111 @r
+ksub8      0001101  ..... ..... 000 ..... 1111111 @r
+uksub8     0011101  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 0885a4fd45..109f560ec9 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
 GEN_RVP_R_OOL(urstsa16);
 GEN_RVP_R_OOL(kstsa16);
 GEN_RVP_R_OOL(ukstsa16);
+
+/* 8-bit Addition & Subtraction Instructions */
+/*
+ *  Copied from tcg-op-gvec.c.
+ *
+ *  Perform a vector addition using normal addition and a mask.  The mask
+ *  should be the sign bit of each lane.  This 6-operation form is more
+ *  efficient than separate additions when there are 4 or more lanes in
+ *  the 64-bit operation.
+ */
+
+static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+    TCGv t3 = tcg_temp_new();
+
+    tcg_gen_andc_tl(t1, a, m);
+    tcg_gen_andc_tl(t2, b, m);
+    tcg_gen_xor_tl(t3, a, b);
+    tcg_gen_add_tl(d, t1, t2);
+    tcg_gen_and_tl(t3, t3, m);
+    tcg_gen_xor_tl(d, d, t3);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+}
+
+static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
+{
+    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
+    gen_simd_add_mask(d, a, b, m);
+    tcg_temp_free(m);
+}
+
+GEN_RVP_R_INLINE(add8, add, 0, trans_add);
+
+/*
+ *  Copied from tcg-op-gvec.c.
+ *
+ *  Perform a vector subtraction using normal subtraction and a mask.
+ *  Compare gen_addv_mask above.
+ */
+static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
+{
+    TCGv t1 = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+    TCGv t3 = tcg_temp_new();
+
+    tcg_gen_or_tl(t1, a, m);
+    tcg_gen_andc_tl(t2, b, m);
+    tcg_gen_eqv_tl(t3, a, b);
+    tcg_gen_sub_tl(d, t1, t2);
+    tcg_gen_and_tl(t3, t3, m);
+    tcg_gen_xor_tl(d, d, t3);
+
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+}
+
+static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
+{
+    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
+    gen_simd_sub_mask(d, a, b, m);
+    tcg_temp_free(m);
+}
+
+GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
+
+GEN_RVP_R_OOL(radd8);
+GEN_RVP_R_OOL(uradd8);
+GEN_RVP_R_OOL(kadd8);
+GEN_RVP_R_OOL(ukadd8);
+GEN_RVP_R_OOL(rsub8);
+GEN_RVP_R_OOL(ursub8);
+GEN_RVP_R_OOL(ksub8);
+GEN_RVP_R_OOL(uksub8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index b84abaaf25..62db072204 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ukstsa16, 2, 2);
+
+/* 8-bit Addition & Subtraction Instructions */
+static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd8, 1, 1);
+
+static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
+                                  void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd8, 1, 1);
+
+static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd8(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd8, 1, 1);
+
+static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu8(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd8, 1, 1);
+
+static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub8, 1, 1);
+
+static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub8, 1, 1);
+
+static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub8(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub8, 1, 1);
+
+static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu8(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub8, 1, 1);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |  17 ++++
 target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
 target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
 4 files changed, 245 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a69a6b4e84..20bf400ac2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
 DEF_HELPER_3(ursub8, tl, env, tl, tl)
 DEF_HELPER_3(ksub8, tl, env, tl, tl)
 DEF_HELPER_3(uksub8, tl, env, tl, tl)
+
+DEF_HELPER_3(sra16, tl, env, tl, tl)
+DEF_HELPER_3(sra16_u, tl, env, tl, tl)
+DEF_HELPER_3(srl16, tl, env, tl, tl)
+DEF_HELPER_3(srl16_u, tl, env, tl, tl)
+DEF_HELPER_3(sll16, tl, env, tl, tl)
+DEF_HELPER_3(ksll16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16, tl, env, tl, tl)
+DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 358dd1fa10..6f053bfeb7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -23,6 +23,7 @@
 %rd        7:5
 
 %sh10    20:10
+%sh4    20:4
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -59,6 +60,7 @@
 @j       ....................      ..... ....... &j      imm=%imm_j          %rd
 
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
+@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
 ursub8     0010101  ..... ..... 000 ..... 1111111 @r
 ksub8      0001101  ..... ..... 000 ..... 1111111 @r
 uksub8     0011101  ..... ..... 000 ..... 1111111 @r
+
+sra16      0101000  ..... ..... 000 ..... 1111111 @r
+sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
+srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
+srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
+srl16      0101001  ..... ..... 000 ..... 1111111 @r
+srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
+srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
+srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
+sll16      0101010  ..... ..... 000 ..... 1111111 @r
+slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
+ksll16     0110010  ..... ..... 000 ..... 1111111 @r
+kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
+kslra16    0101011  ..... ..... 000 ..... 1111111 @r
+kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 109f560ec9..848edab7e5 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
 GEN_RVP_R_OOL(ursub8);
 GEN_RVP_R_OOL(ksub8);
 GEN_RVP_R_OOL(uksub8);
+
+/* SIMD 16-bit Shift Instructions */
+static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
+                          gen_helper_rvp_r *fn, target_ulong mask)
+{
+    TCGv src1, src2, dst;
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    tcg_gen_andi_tl(src2, src2, mask);
+
+    fn(dst, cpu_env, src1, src2);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    return true;
+}
+
+typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
+                          uint32_t, uint32_t);
+static inline bool
+rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
+          GenGvecShift *f64, gen_helper_rvp_r *fn,
+          uint8_t mask)
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+#ifdef TARGET_RISCV64
+    if (a->rd && a->rs1 && a->rs2) {
+        TCGv_i32 shift = tcg_temp_new_i32();
+        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
+        tcg_gen_andi_i32(shift, shift, mask);
+        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
+            offsetof(CPURISCVState, gpr[a->rs1]),
+            shift, 8, 8);
+        tcg_temp_free_i32(shift);
+        return true;
+    }
+#endif
+    return rvp_shift_ool(ctx, a, fn, mask);
+}
+
+#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
+static bool trans_##NAME(DisasContext *s, arg_r *a)         \
+{                                                           \
+    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
+                     (8 << VECE) - 1);                      \
+}
+
+GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
+GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
+GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
+GEN_RVP_R_OOL(sra16_u);
+GEN_RVP_R_OOL(srl16_u);
+GEN_RVP_R_OOL(ksll16);
+GEN_RVP_R_OOL(kslra16);
+GEN_RVP_R_OOL(kslra16_u);
+
+static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
+                           gen_helper_rvp_r *fn)
+{
+    TCGv src1, dst, shift;
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(shift);
+    return true;
+}
+
+static inline bool
+rvp_shifti(DisasContext *ctx, arg_shift *a,
+           void (*f64)(TCGv_i64, TCGv_i64, int64_t),
+           gen_helper_rvp_r *fn)
+{
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+#ifdef TARGET_RISCV64
+    if (a->rd && a->rs1 && f64) {
+        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
+        return true;
+    }
+#endif
+    return rvp_shifti_ool(ctx, a, fn);
+}
+
+#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
+{                                                        \
+    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
+}
+
+GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
+GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
+GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
+GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
+GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
+GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 62db072204..7e31c2fe46 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksub8, 1, 1);
+
+/* SIMD 16-bit Shift Instructions */
+static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra16, 1, 2);
+
+static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl16, 1, 2);
+
+static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll16, 1, 2);
+
+static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssra16(env, 0, a[i], shift);
+}
+
+RVPR(sra16_u, 1, 2);
+
+static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = vssrl16(env, 0, a[i], shift);
+}
+
+RVPR(srl16_u, 1, 2);
+
+static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    result = a[i] << shift;
+    if (shift > (clrsb32(a[i]) - 16)) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll16, 1, 2);
+
+static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra16, 1, 2);
+
+static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
+
+    if (shift >= 0) {
+        do_ksll16(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 16) ? 15 : shift;
+        d[i] = vssra16(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra16_u, 1, 2);
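
The overflow test in do_ksll16 rests on counting redundant sign bits: a 16-bit value with n copies of its sign bit below the sign bit itself can be shifted left by at most n positions without changing sign. A self-contained sketch of that rule follows. redundant_sign_bits16 is a hypothetical stand-in for clrsb32(a) - 16, and sat_shl_s16 with its int *sat flag (modelling env->vxsat) is likewise illustrative rather than the patch's code.

```c
#include <stdint.h>

/* Number of bits below the sign bit that equal the sign bit (0..15). */
static int redundant_sign_bits16(int16_t a)
{
    uint16_t u = (uint16_t)a;
    int sign = (u >> 15) & 1;
    int n = 0;

    for (int bit = 14; bit >= 0 && ((u >> bit) & 1) == sign; bit--) {
        n++;
    }
    return n;
}

/* Saturating left shift of one 16-bit lane; *sat models env->vxsat. */
static int16_t sat_shl_s16(int16_t a, unsigned shift, int *sat)
{
    shift &= 0xf;
    if ((int)shift > redundant_sign_bits16(a)) {
        *sat = 1;
        return a < 0 ? INT16_MIN : INT16_MAX;
    }
    /* Shift the unsigned bit pattern so a negative value is never shifted. */
    return (int16_t)(uint16_t)((uint16_t)a << shift);
}
```

Note that -1 shifted left by 15 gives INT16_MIN exactly and does not saturate, whereas -2 shifted by 15 does.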
-- 
2.17.1
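
The kslra16 helper in this patch treats the low five bits of rs2 as a signed amount in [-16, 15]: a non-negative amount selects the saturating left shift, a negative one an arithmetic right shift by the negated amount, with 16 clamped to 15. The sketch below walks through that decode for a single lane. sext5 is a hypothetical stand-in for sextract32(x, 0, 5), kslra16_lane is an illustrative name, and the left-shift path saturates through a 64-bit multiply instead of the patch's clrsb32 test (same intent, different technique). It assumes >> on a negative signed value is arithmetic.

```c
#include <stdint.h>

/* Sign-extend the low 5 bits of x to an int in [-16, 15]. */
static int sext5(uint32_t x)
{
    int v = (int)(x & 0x1f);
    return (v & 0x10) ? v - 32 : v;
}

/* One 16-bit lane of a kslra16-style shift; *sat models env->vxsat. */
static int16_t kslra16_lane(int16_t a, uint32_t rs2, int *sat)
{
    int shift = sext5(rs2);

    if (shift >= 0) {
        /* Left shift computed in 64 bits, then clamped to the lane range. */
        int64_t r = (int64_t)a * ((int64_t)1 << shift);
        if (r > INT16_MAX) {
            *sat = 1;
            return INT16_MAX;
        }
        if (r < INT16_MIN) {
            *sat = 1;
            return INT16_MIN;
        }
        return (int16_t)r;
    }

    shift = -shift;
    if (shift == 16) {
        shift = 15;                 /* a shift by 16 behaves like 15 */
    }
    return (int16_t)(a >> shift);
}
```

Only the left-shift direction can set the saturation flag; the right-shift direction always fits.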



* [PATCH 07/38] target/riscv: SIMD 8-bit Shift Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 +++
 target/riscv/insn32.decode              |  17 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
 target/riscv/packed_helper.c            | 102 ++++++++++++++++++++++++
 4 files changed, 144 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 20bf400ac2..0ecd4d53f9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1193,3 +1193,12 @@ DEF_HELPER_3(sll16, tl, env, tl, tl)
 DEF_HELPER_3(ksll16, tl, env, tl, tl)
 DEF_HELPER_3(kslra16, tl, env, tl, tl)
 DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
+
+DEF_HELPER_3(sra8, tl, env, tl, tl)
+DEF_HELPER_3(sra8_u, tl, env, tl, tl)
+DEF_HELPER_3(srl8, tl, env, tl, tl)
+DEF_HELPER_3(srl8_u, tl, env, tl, tl)
+DEF_HELPER_3(sll8, tl, env, tl, tl)
+DEF_HELPER_3(ksll8, tl, env, tl, tl)
+DEF_HELPER_3(kslra8, tl, env, tl, tl)
+DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6f053bfeb7..cc782fcde5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -24,6 +24,7 @@
 
 %sh10    20:10
 %sh4    20:4
+%sh3    20:3
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -61,6 +62,7 @@
 
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
+@sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -652,3 +654,18 @@ ksll16     0110010  ..... ..... 000 ..... 1111111 @r
 kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
 kslra16    0101011  ..... ..... 000 ..... 1111111 @r
 kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
+
+sra8       0101100  ..... ..... 000 ..... 1111111 @r
+sra8_u     0110100  ..... ..... 000 ..... 1111111 @r
+srai8      0111100  00... ..... 000 ..... 1111111 @sh3
+srai8_u    0111100  01... ..... 000 ..... 1111111 @sh3
+srl8       0101101  ..... ..... 000 ..... 1111111 @r
+srl8_u     0110101  ..... ..... 000 ..... 1111111 @r
+srli8      0111101  00... ..... 000 ..... 1111111 @sh3
+srli8_u    0111101  01... ..... 000 ..... 1111111 @sh3
+sll8       0101110  ..... ..... 000 ..... 1111111 @r
+slli8      0111110  00... ..... 000 ..... 1111111 @sh3
+ksll8      0110110  ..... ..... 000 ..... 1111111 @r
+kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
+kslra8     0101111  ..... ..... 000 ..... 1111111 @r
+kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 848edab7e5..12a64849eb 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -353,3 +353,19 @@ GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
 GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
 GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
 GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
+
+/* SIMD 8-bit Shift Instructions */
+GEN_RVP_SHIFT(sra8, tcg_gen_gvec_sars, 0);
+GEN_RVP_SHIFT(srl8, tcg_gen_gvec_shrs, 0);
+GEN_RVP_SHIFT(sll8, tcg_gen_gvec_shls, 0);
+GEN_RVP_R_OOL(sra8_u);
+GEN_RVP_R_OOL(srl8_u);
+GEN_RVP_R_OOL(ksll8);
+GEN_RVP_R_OOL(kslra8);
+GEN_RVP_R_OOL(kslra8_u);
+GEN_RVP_SHIFTI(srai8, sra8, tcg_gen_vec_sar8i_i64);
+GEN_RVP_SHIFTI(srli8, srl8, tcg_gen_vec_shr8i_i64);
+GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
+GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
+GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
+GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 7e31c2fe46..ab9ebc472b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -529,3 +529,105 @@ static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kslra16_u, 1, 2);
+
+/* SIMD 8-bit Shift Instructions */
+static inline void do_sra8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra8, 1, 1);
+
+static inline void do_srl8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl8, 1, 1);
+
+static inline void do_sll8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll8, 1, 1);
+
+static inline void do_sra8_u(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = vssra8(env, 0, a[i], shift);
+}
+
+RVPR(sra8_u, 1, 1);
+
+static inline void do_srl8_u(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+    d[i] = vssrl8(env, 0, a[i], shift);
+}
+
+RVPR(srl8_u, 1, 1);
+
+static inline void do_ksll8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    result = a[i] << shift;
+    if (shift > (clrsb32(a[i]) - 24)) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT8_MIN) ? INT8_MIN : INT8_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll8, 1, 1);
+
+static inline void do_kslra8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(target_ulong *)vb), 0, 4);
+
+    if (shift >= 0) {
+        do_ksll8(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 8) ? 7 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra8, 1, 1);
+
+static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
+
+    if (shift >= 0) {
+        do_ksll8(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 8) ? 7 : shift;
+        d[i] = vssra8(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra8_u, 1, 1);
-- 
2.17.1
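
A note for readers of the helpers above: the sketch below restates, outside
QEMU, the two non-trivial behaviours in this patch, namely the saturating left
shift (ksll8, and kslra8 with a non-negative amount) and the rounding right
shift (the _u variants). The names sat_shl8(), round_sra8() and kslra8_lane()
are hypothetical and not part of the patch. The rounding rule (add the last
bit shifted out) is an assumption based on the P-extension description of the
.u forms; the patch itself delegates rounding to vssra8(). Right-shifting a
negative value is assumed to be arithmetic, as it is on common compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturating left shift of one signed 8-bit lane (ksll8-style sketch). */
static int8_t sat_shl8(int8_t a, unsigned shift, bool *sat)
{
    /* Compute in a wider type so no significant bits are lost. */
    int32_t wide = (int32_t)a * (1 << (shift & 7));

    if (wide > INT8_MAX) {
        *sat = true;            /* models setting the vxsat flag */
        return INT8_MAX;
    }
    if (wide < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)wide;
}

/* Rounding arithmetic right shift (assumed rule: round half up). */
static int8_t round_sra8(int8_t a, unsigned shift)
{
    shift &= 7;
    if (shift == 0) {
        return a;
    }
    /* Keep one extra bit, add 1 to it, then drop it. */
    return (int8_t)(((a >> (shift - 1)) + 1) >> 1);
}

/* kslra8-style lane: the low 4 bits of rs2 are a signed amount in -8..7. */
static int8_t kslra8_lane(int8_t a, uint32_t rs2, bool *sat)
{
    int shift = (int)(rs2 & 0xf);

    if (shift >= 8) {
        shift -= 16;            /* sign-extend the 4-bit field */
    }
    if (shift >= 0) {
        return sat_shl8(a, (unsigned)shift, sat);
    }
    shift = -shift;
    if (shift == 8) {
        shift = 7;              /* a shift by 8 is clamped to 7, as in the patch */
    }
    return (int8_t)(a >> shift);
}
```

The wide-multiply test in sat_shl8() gives the same saturation decision as the
clrsb32()-based comparison in do_ksll8(): a lane overflows exactly when the
shift amount exceeds its count of redundant sign bits.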



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 08/38] target/riscv: SIMD 16-bit Compare Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  6 ++++
 target/riscv/insn32.decode              |  6 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
 target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0ecd4d53f9..f41f9acccc 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1202,3 +1202,9 @@ DEF_HELPER_3(sll8, tl, env, tl, tl)
 DEF_HELPER_3(ksll8, tl, env, tl, tl)
 DEF_HELPER_3(kslra8, tl, env, tl, tl)
 DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
+
+DEF_HELPER_3(cmpeq16, tl, env, tl, tl)
+DEF_HELPER_3(scmplt16, tl, env, tl, tl)
+DEF_HELPER_3(scmple16, tl, env, tl, tl)
+DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
+DEF_HELPER_3(ucmple16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index cc782fcde5..f3cd508396 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -669,3 +669,9 @@ ksll8      0110110  ..... ..... 000 ..... 1111111 @r
 kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
 kslra8     0101111  ..... ..... 000 ..... 1111111 @r
 kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
+
+cmpeq16    0100110  ..... ..... 000 ..... 1111111 @r
+scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
+scmple16   0001110  ..... ..... 000 ..... 1111111 @r
+ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
+ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 12a64849eb..6438dfb776 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -369,3 +369,10 @@ GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
 GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
 GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
 GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
+
+/* SIMD 16-bit Compare Instructions */
+GEN_RVP_R_OOL(cmpeq16);
+GEN_RVP_R_OOL(scmplt16);
+GEN_RVP_R_OOL(scmple16);
+GEN_RVP_R_OOL(ucmplt16);
+GEN_RVP_R_OOL(ucmple16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index ab9ebc472b..30b916b5ad 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -631,3 +631,49 @@ static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kslra8_u, 1, 1);
+
+/* SIMD 16-bit Compare Instructions */
+static inline void do_cmpeq16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] == b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(cmpeq16, 1, 2);
+
+static inline void do_scmplt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(scmplt16, 1, 2);
+
+static inline void do_scmple16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(scmple16, 1, 2);
+
+static inline void do_ucmplt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(ucmplt16, 1, 2);
+
+static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
+}
+
+RVPR(ucmple16, 1, 2);
-- 
2.17.1
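
For orientation, a standalone sketch of what the 16-bit compare helpers in
this patch compute on a 32-bit register: each lane of the result is 0xffff
when the comparison holds and 0 otherwise. The function names below are
hypothetical and the code works on plain uint32_t values rather than going
through RVPR(); it is only meant to show why separate signed and unsigned
variants are needed.

```c
#include <stdint.h>

/* Signed less-than on each 16-bit lane (scmplt16-style sketch). */
static uint32_t scmplt16_word(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (int lane = 0; lane < 2; lane++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * lane));

        if (x < y) {
            out |= 0xffffu << (16 * lane);  /* all-ones mask for this lane */
        }
    }
    return out;
}

/* Unsigned less-than on each 16-bit lane (ucmplt16-style sketch). */
static uint32_t ucmplt16_word(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (int lane = 0; lane < 2; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));

        if (x < y) {
            out |= 0xffffu << (16 * lane);
        }
    }
    return out;
}
```

With a = 0x80000001 and b = 0x00010002 the upper lane holds 0x8000 against
0x0001: that is -32768 < 1 as signed values but 32768 > 1 as unsigned values,
so the two variants disagree on that lane only.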



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 09/38] target/riscv: SIMD 8-bit Compare Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  6 ++++
 target/riscv/insn32.decode              |  6 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
 target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f41f9acccc..4d9c36609c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1208,3 +1208,9 @@ DEF_HELPER_3(scmplt16, tl, env, tl, tl)
 DEF_HELPER_3(scmple16, tl, env, tl, tl)
 DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
 DEF_HELPER_3(ucmple16, tl, env, tl, tl)
+
+DEF_HELPER_3(cmpeq8, tl, env, tl, tl)
+DEF_HELPER_3(scmplt8, tl, env, tl, tl)
+DEF_HELPER_3(scmple8, tl, env, tl, tl)
+DEF_HELPER_3(ucmplt8, tl, env, tl, tl)
+DEF_HELPER_3(ucmple8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f3cd508396..7519df7e20 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -675,3 +675,9 @@ scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
 scmple16   0001110  ..... ..... 000 ..... 1111111 @r
 ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
 ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
+
+cmpeq8     0100111  ..... ..... 000 ..... 1111111 @r
+scmplt8    0000111  ..... ..... 000 ..... 1111111 @r
+scmple8    0001111  ..... ..... 000 ..... 1111111 @r
+ucmplt8    0010111  ..... ..... 000 ..... 1111111 @r
+ucmple8    0011111  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 6438dfb776..6eb9e83c6f 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -376,3 +376,10 @@ GEN_RVP_R_OOL(scmplt16);
 GEN_RVP_R_OOL(scmple16);
 GEN_RVP_R_OOL(ucmplt16);
 GEN_RVP_R_OOL(ucmple16);
+
+/* SIMD 8-bit Compare Instructions */
+GEN_RVP_R_OOL(cmpeq8);
+GEN_RVP_R_OOL(scmplt8);
+GEN_RVP_R_OOL(scmple8);
+GEN_RVP_R_OOL(ucmplt8);
+GEN_RVP_R_OOL(ucmple8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 30b916b5ad..ff86e015e4 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -677,3 +677,49 @@ static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ucmple16, 1, 2);
+
+/* SIMD 8-bit Compare Instructions */
+static inline void do_cmpeq8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] == b[i]) ? 0xff : 0x0;
+}
+
+RVPR(cmpeq8, 1, 1);
+
+static inline void do_scmplt8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
+}
+
+RVPR(scmplt8, 1, 1);
+
+static inline void do_scmple8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
+}
+
+RVPR(scmple8, 1, 1);
+
+static inline void do_ucmplt8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
+}
+
+RVPR(ucmplt8, 1, 1);
+
+static inline void do_ucmple8(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
+}
+
+RVPR(ucmple8, 1, 1);
-- 
2.17.1
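
As an illustration of how the all-ones lane masks from these compare helpers
are typically consumed, the sketch below builds a byte-wise unsigned maximum
from a ucmplt8-style mask. This usage is an assumption for illustration only;
ucmplt8_word() and umax8_word() are hypothetical names and neither appears in
the patch.

```c
#include <stdint.h>

/* Unsigned less-than on each 8-bit lane (ucmplt8-style sketch). */
static uint32_t ucmplt8_word(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));

        if (x < y) {
            out |= 0xffu << (8 * lane);     /* lane becomes 0xff */
        }
    }
    return out;
}

/* Branch-free per-byte select: take b where a < b, otherwise keep a. */
static uint32_t umax8_word(uint32_t a, uint32_t b)
{
    uint32_t m = ucmplt8_word(a, b);

    return (a & ~m) | (b & m);
}
```

Because every lane of the mask is either 0x00 or 0xff, the AND/OR select
never mixes bits from the two operands within a lane.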



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 10/38] target/riscv: SIMD 16-bit Multiply Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   7 ++
 target/riscv/insn32.decode              |   7 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  53 ++++++++++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4d9c36609c..bc60712bd9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1214,3 +1214,10 @@ DEF_HELPER_3(scmplt8, tl, env, tl, tl)
 DEF_HELPER_3(scmple8, tl, env, tl, tl)
 DEF_HELPER_3(ucmplt8, tl, env, tl, tl)
 DEF_HELPER_3(ucmple8, tl, env, tl, tl)
+
+DEF_HELPER_3(smul16, i64, env, tl, tl)
+DEF_HELPER_3(smulx16, i64, env, tl, tl)
+DEF_HELPER_3(umul16, i64, env, tl, tl)
+DEF_HELPER_3(umulx16, i64, env, tl, tl)
+DEF_HELPER_3(khm16, tl, env, tl, tl)
+DEF_HELPER_3(khmx16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 7519df7e20..38519a477c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -681,3 +681,10 @@ scmplt8    0000111  ..... ..... 000 ..... 1111111 @r
 scmple8    0001111  ..... ..... 000 ..... 1111111 @r
 ucmplt8    0010111  ..... ..... 000 ..... 1111111 @r
 ucmple8    0011111  ..... ..... 000 ..... 1111111 @r
+
+smul16     1010000  ..... ..... 000 ..... 1111111 @r
+smulx16    1010001  ..... ..... 000 ..... 1111111 @r
+umul16     1011000  ..... ..... 000 ..... 1111111 @r
+umulx16    1011001  ..... ..... 000 ..... 1111111 @r
+khm16      1000011  ..... ..... 000 ..... 1111111 @r
+khmx16     1001011  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 6eb9e83c6f..7e5bf9041d 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -383,3 +383,56 @@ GEN_RVP_R_OOL(scmplt8);
 GEN_RVP_R_OOL(scmple8);
 GEN_RVP_R_OOL(ucmplt8);
 GEN_RVP_R_OOL(ucmple8);
+
+/* SIMD 16-bit Multiply Instructions */
+static inline bool
+r_d64_ool(DisasContext *ctx, arg_r *a,
+          void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return r_ool(ctx, a, fn);
+#else
+    TCGv src1, src2;
+    TCGv_i64 dst;
+    TCGv_i32 low, high;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new_i64();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+
+    low = tcg_temp_new_i32();
+    high = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(low, dst);
+    tcg_gen_extrh_i64_i32(high, dst);
+    gen_set_gpr(a->rd, low);
+    gen_set_gpr(a->rd + 1, high);
+    tcg_temp_free_i32(low);
+    tcg_temp_free_i32(high);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free_i64(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_D64_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_D64_OOL(smul16);
+GEN_RVP_R_D64_OOL(smulx16);
+GEN_RVP_R_D64_OOL(umul16);
+GEN_RVP_R_D64_OOL(umulx16);
+GEN_RVP_R_OOL(khm16);
+GEN_RVP_R_OOL(khmx16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index ff86e015e4..13fed2c4d1 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -723,3 +723,107 @@ static inline void do_ucmple8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ucmple8, 1, 1);
+
+/* SIMD 16-bit Multiply Instructions */
+typedef void PackedFn3(CPURISCVState *, void *, void *, void *);
+static inline uint64_t rvpr64(CPURISCVState *env, target_ulong a,
+                              target_ulong b, PackedFn3 *fn)
+{
+    uint64_t result;
+
+    fn(env, &result, &a, &b);
+    return result;
+}
+
+#define RVPR64(NAME)                                            \
+uint64_t HELPER(NAME)(CPURISCVState *env, target_ulong a,       \
+                      target_ulong b)                           \
+{                                                               \
+    return rvpr64(env, a, b, (PackedFn3 *)do_##NAME);           \
+}
+
+static inline void do_smul16(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(0)] = (int32_t)a[H2(0)] * b[H2(0)];
+    d[H4(1)] = (int32_t)a[H2(1)] * b[H2(1)];
+}
+
+RVPR64(smul16);
+
+static inline void do_smulx16(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(0)] = (int32_t)a[H2(0)] * b[H2(1)];
+    d[H4(1)] = (int32_t)a[H2(1)] * b[H2(0)];
+}
+
+RVPR64(smulx16);
+
+static inline void do_umul16(CPURISCVState *env, void *vd, void *va,
+                             void *vb)
+{
+    uint32_t *d = vd;
+    uint16_t *a = va, *b = vb;
+    d[H4(0)] = (uint32_t)a[H2(0)] * b[H2(0)];
+    d[H4(1)] = (uint32_t)a[H2(1)] * b[H2(1)];
+}
+
+RVPR64(umul16);
+
+static inline void do_umulx16(CPURISCVState *env, void *vd, void *va,
+                              void *vb)
+{
+    uint32_t *d = vd;
+    uint16_t *a = va, *b = vb;
+    d[H4(0)] = (uint32_t)a[H2(0)] * b[H2(1)];
+    d[H4(1)] = (uint32_t)a[H2(1)] * b[H2(0)];
+}
+
+RVPR64(umulx16);
+
+static inline void do_khm16(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    if (a[i] == INT16_MIN && b[i] == INT16_MIN) {
+        env->vxsat = 1;
+        d[i] = INT16_MAX;
+    } else {
+        d[i] = (int32_t)a[i] * b[i] >> 15;
+    }
+}
+
+RVPR(khm16, 1, 2);
+
+static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    /*
+     * t[x] = ra.H[x] s* rb.H[y];
+     * rt.H[x] = SAT.Q15(t[x] s>> 15);
+     *
+     * (RV32: (x,y)=(1,0),(0,1),
+     *  RV64: (x,y)=(3,2),(2,3),
+     *              (1,0),(0,1))
+     */
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        env->vxsat = 1;
+        d[H2(i)] = INT16_MAX;
+    } else {
+        d[H2(i)] = (int32_t)a[H2(i)] * b[H2(i + 1)] >> 15;
+    }
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        env->vxsat = 1;
+        d[H2(i + 1)] = INT16_MAX;
+    } else {
+        d[H2(i + 1)] = (int32_t)a[H2(i + 1)] * b[H2(i)] >> 15;
+    }
+}
+
+RVPR(khmx16, 2, 2);
-- 
2.17.1
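
A standalone sketch of the two result shapes in this patch, for readers who
want to check values by hand. khm16_lane() mirrors the Q15 multiply in
do_khm16(), where only INT16_MIN * INT16_MIN can overflow. smul16_word()
mirrors do_smul16() for a 32-bit source register, and split_pair() shows the
low/high split that r_d64_ool() performs on RV32 before writing rd and rd + 1.
All three names are hypothetical, and an arithmetic right shift of negative
values is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Q15 x Q15 -> Q15 with saturation (khm16-style sketch). */
static int16_t khm16_lane(int16_t a, int16_t b, bool *sat)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *sat = true;            /* (-1.0) * (-1.0) = +1.0 does not fit in Q15 */
        return INT16_MAX;
    }
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Two signed 16x16 -> 32 products packed into 64 bits (smul16-style). */
static uint64_t smul16_word(uint32_t a, uint32_t b)
{
    int32_t lo = (int32_t)(int16_t)(uint16_t)a * (int16_t)(uint16_t)b;
    int32_t hi = (int32_t)(int16_t)(uint16_t)(a >> 16) *
                 (int16_t)(uint16_t)(b >> 16);

    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* RV32 view of the 64-bit result: low word for rd, high word for rd + 1. */
static void split_pair(uint64_t result, uint32_t *rd_lo, uint32_t *rd_hi)
{
    *rd_lo = (uint32_t)result;
    *rd_hi = (uint32_t)(result >> 32);
}
```

For example, 0x4000 is 0.5 in Q15, so khm16_lane(0x4000, 0x4000) gives 0x2000
(0.25). With a = 0xffff0002 and b = 0x00030004 the lane products are
2 * 4 = 8 and -1 * 3 = -3, which split_pair() hands back as 0x00000008 and
0xfffffffd.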



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 11/38] target/riscv: SIMD 8-bit Multiply Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  7 ++
 target/riscv/insn32.decode              |  7 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  8 +++
 target/riscv/packed_helper.c            | 93 +++++++++++++++++++++++++
 4 files changed, 115 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index bc60712bd9..6bb601b436 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1221,3 +1221,10 @@ DEF_HELPER_3(umul16, i64, env, tl, tl)
 DEF_HELPER_3(umulx16, i64, env, tl, tl)
 DEF_HELPER_3(khm16, tl, env, tl, tl)
 DEF_HELPER_3(khmx16, tl, env, tl, tl)
+
+DEF_HELPER_3(smul8, i64, env, tl, tl)
+DEF_HELPER_3(smulx8, i64, env, tl, tl)
+DEF_HELPER_3(umul8, i64, env, tl, tl)
+DEF_HELPER_3(umulx8, i64, env, tl, tl)
+DEF_HELPER_3(khm8, tl, env, tl, tl)
+DEF_HELPER_3(khmx8, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 38519a477c..9d165efba9 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -688,3 +688,10 @@ umul16     1011000  ..... ..... 000 ..... 1111111 @r
 umulx16    1011001  ..... ..... 000 ..... 1111111 @r
 khm16      1000011  ..... ..... 000 ..... 1111111 @r
 khmx16     1001011  ..... ..... 000 ..... 1111111 @r
+
+smul8      1010100  ..... ..... 000 ..... 1111111 @r
+smulx8     1010101  ..... ..... 000 ..... 1111111 @r
+umul8      1011100  ..... ..... 000 ..... 1111111 @r
+umulx8     1011101  ..... ..... 000 ..... 1111111 @r
+khm8       1000111  ..... ..... 000 ..... 1111111 @r
+khmx8      1001111  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 7e5bf9041d..336f3418b1 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -436,3 +436,11 @@ GEN_RVP_R_D64_OOL(umul16);
 GEN_RVP_R_D64_OOL(umulx16);
 GEN_RVP_R_OOL(khm16);
 GEN_RVP_R_OOL(khmx16);
+
+/* SIMD 8-bit Multiply Instructions */
+GEN_RVP_R_D64_OOL(smul8);
+GEN_RVP_R_D64_OOL(smulx8);
+GEN_RVP_R_D64_OOL(umul8);
+GEN_RVP_R_D64_OOL(umulx8);
+GEN_RVP_R_OOL(khm8);
+GEN_RVP_R_OOL(khmx8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 13fed2c4d1..56baefeb8e 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -827,3 +827,96 @@ static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx16, 2, 2);
+
+/* SIMD 8-bit Multiply Instructions */
+static inline void do_smul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(smul8);
+
+static inline void do_smulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    int16_t *d = vd;
+    int8_t *a = va, *b = vb;
+    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(smulx8);
+
+static inline void do_umul8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(0)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(1)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(2)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(3)];
+}
+
+RVPR64(umul8);
+
+static inline void do_umulx8(CPURISCVState *env, void *vd, void *va, void *vb)
+{
+    uint16_t *d = vd;
+    uint8_t *a = va, *b = vb;
+    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(1)];
+    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(0)];
+    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(3)];
+    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(2)];
+}
+
+RVPR64(umulx8);
+
+static inline void do_khm8(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    if (a[i] == INT8_MIN && b[i] == INT8_MIN) {
+        env->vxsat = 1;
+        d[i] = INT8_MAX;
+    } else {
+        d[i] = (int16_t)a[i] * b[i] >> 7;
+    }
+}
+
+RVPR(khm8, 1, 1);
+
+static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+    /*
+     * t[x] = ra.B[x] s* rb.B[y];
+     * rt.B[x] = SAT.Q7(t[x] s>> 7);
+     *
+     * (RV32: (x,y)=(3,2),(2,3),
+     *              (1,0),(0,1);
+     *  RV64: (x,y)=(7,6),(6,7),(5,4),(4,5),
+     *              (3,2),(2,3),(1,0),(0,1))
+     */
+    if (a[H1(i)] == INT8_MIN && b[H1(i + 1)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i)] = INT8_MAX;
+    } else {
+        d[H1(i)] = (int16_t)a[H1(i)] * b[H1(i + 1)] >> 7;
+    }
+    if (a[H1(i + 1)] == INT8_MIN && b[H1(i)] == INT8_MIN) {
+        env->vxsat = 1;
+        d[H1(i + 1)] = INT8_MAX;
+    } else {
+        d[H1(i + 1)] = (int16_t)a[H1(i + 1)] * b[H1(i)] >> 7;
+    }
+}
+
+RVPR(khmx8, 2, 1);
-- 
2.17.1
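
As an aid to reading do_smul8 and do_smulx8 above, the following standalone
sketch models the widening multiply with shifts and masks instead of pointer
casts, so it does not depend on host byte order. It is not part of the patch.
The names lane_s8, smul8_model and smulx8_model are hypothetical, and the
sketch assumes lane 0 is the least significant byte of the source word and
the least significant halfword of the 64-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend byte lane x (0 = least significant) of r, avoiding casts
 * that rely on implementation-defined narrowing. */
static int lane_s8(uint32_t r, unsigned x)
{
    assert(x < 4);
    int v = (int)((r >> (8 * x)) & 0xff);
    return v >= 128 ? v - 256 : v;
}

/* Hypothetical model of smul8: four signed 8x8 -> 16-bit products. */
static uint64_t smul8_model(uint32_t ra, uint32_t rb)
{
    uint64_t rd = 0;

    for (unsigned x = 0; x < 4; x++) {
        /* The product lies in [-16256, 16384], so 16 bits always suffice. */
        uint16_t p = (uint16_t)(lane_s8(ra, x) * lane_s8(rb, x));
        rd |= (uint64_t)p << (16 * x);
    }
    return rd;
}

/* Crossed form: lane x of ra is multiplied by lane (x ^ 1) of rb. */
static uint64_t smulx8_model(uint32_t ra, uint32_t rb)
{
    uint64_t rd = 0;

    for (unsigned x = 0; x < 4; x++) {
        uint16_t p = (uint16_t)(lane_s8(ra, x) * lane_s8(rb, x ^ 1));
        rd |= (uint64_t)p << (16 * x);
    }
    return rd;
}
```

Because no 8x8 product can overflow 16 bits, these instructions need no
saturation path, unlike khm8 and khmx8, which narrow the result back to 8 bits.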




* [PATCH 12/38] target/riscv: SIMD 16-bit Miscellaneous Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  12 ++
 target/riscv/insn32.decode              |  13 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  42 ++++++
 target/riscv/packed_helper.c            | 167 ++++++++++++++++++++++++
 4 files changed, 234 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6bb601b436..866484e37d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1228,3 +1228,15 @@ DEF_HELPER_3(umul8, i64, env, tl, tl)
 DEF_HELPER_3(umulx8, i64, env, tl, tl)
 DEF_HELPER_3(khm8, tl, env, tl, tl)
 DEF_HELPER_3(khmx8, tl, env, tl, tl)
+
+DEF_HELPER_3(smin16, tl, env, tl, tl)
+DEF_HELPER_3(umin16, tl, env, tl, tl)
+DEF_HELPER_3(smax16, tl, env, tl, tl)
+DEF_HELPER_3(umax16, tl, env, tl, tl)
+DEF_HELPER_3(sclip16, tl, env, tl, tl)
+DEF_HELPER_3(uclip16, tl, env, tl, tl)
+DEF_HELPER_2(kabs16, tl, env, tl)
+DEF_HELPER_2(clrs16, tl, env, tl)
+DEF_HELPER_2(clz16, tl, env, tl)
+DEF_HELPER_2(clo16, tl, env, tl)
+DEF_HELPER_2(swap16, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9d165efba9..bc9d5fc967 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -42,6 +42,7 @@
 &i    imm rs1 rd
 &j    imm rd
 &r    rd rs1 rs2
+&r2   rd rs1
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
@@ -695,3 +696,15 @@ umul8      1011100  ..... ..... 000 ..... 1111111 @r
 umulx8     1011101  ..... ..... 000 ..... 1111111 @r
 khm8       1000111  ..... ..... 000 ..... 1111111 @r
 khmx8      1001111  ..... ..... 000 ..... 1111111 @r
+
+smin16     1000000  ..... ..... 000 ..... 1111111 @r
+umin16     1001000  ..... ..... 000 ..... 1111111 @r
+smax16     1000001  ..... ..... 000 ..... 1111111 @r
+umax16     1001001  ..... ..... 000 ..... 1111111 @r
+sclip16    1000010  0.... ..... 000 ..... 1111111 @sh4
+uclip16    1000010  1.... ..... 000 ..... 1111111 @sh4
+kabs16     1010110  10001 ..... 000 ..... 1111111 @r2
+clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
+clz16      1010111  01001 ..... 000 ..... 1111111 @r2
+clo16      1010111  01011 ..... 000 ..... 1111111 @r2
+swap16     1010110  11001 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 336f3418b1..56fb8b2523 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -444,3 +444,45 @@ GEN_RVP_R_D64_OOL(umul8);
 GEN_RVP_R_D64_OOL(umulx8);
 GEN_RVP_R_OOL(khm8);
 GEN_RVP_R_OOL(khmx8);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin16);
+GEN_RVP_R_OOL(umin16);
+GEN_RVP_R_OOL(smax16);
+GEN_RVP_R_OOL(umax16);
+GEN_RVP_SHIFTI(sclip16, sclip16, NULL);
+GEN_RVP_SHIFTI(uclip16, uclip16, NULL);
+
+/* Out of line helpers for R2 format */
+static bool
+r2_ool(DisasContext *ctx, arg_r2 *a,
+       void (* fn)(TCGv, TCGv_ptr, TCGv))
+{
+    TCGv src1, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    fn(dst, cpu_env, src1);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R2_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
+{                                                      \
+    return r2_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R2_OOL(kabs16);
+GEN_RVP_R2_OOL(clrs16);
+GEN_RVP_R2_OOL(clz16);
+GEN_RVP_R2_OOL(clo16);
+GEN_RVP_R2_OOL(swap16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 56baefeb8e..a6ab011ace 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -920,3 +920,170 @@ static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx8, 2, 1);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+static inline void do_smin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin16, 1, 2);
+
+static inline void do_umin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin16, 1, 2);
+
+static inline void do_smax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax16, 1, 2);
+
+static inline void do_umax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax16, 1, 2);
+
+static int64_t sat64(CPURISCVState *env, int64_t a, uint8_t shift)
+{
+    int64_t max = shift >= 64 ? INT64_MAX : (1ull << shift) - 1;
+    int64_t min = shift >= 64 ? INT64_MIN : -(1ull << shift);
+    int64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else if (a < min) {
+        result = min;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_sclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip16, 1, 2);
+
+static uint64_t satu64(CPURISCVState *env, uint64_t a, uint8_t shift)
+{
+    uint64_t max = shift >= 64 ? UINT64_MAX : (1ull << shift) - 1;
+    uint64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_uclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip16, 1, 2);
+
+typedef void PackedFn2i(CPURISCVState *, void *, void *, uint8_t);
+
+static inline target_ulong rvpr2(CPURISCVState *env, target_ulong a,
+                                 uint8_t step, uint8_t size, PackedFn2i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, i);
+    }
+    return result;
+}
+
+#define RVPR2(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a)    \
+{                                                                \
+    return rvpr2(env, a, STEP, SIZE, (PackedFn2i *)do_##NAME);   \
+}
+
+static inline void do_kabs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+
+    if (a[i] == INT16_MIN) {
+        d[i] = INT16_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs16, 1, 2);
+
+static inline void do_clrs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 16;
+}
+
+RVPR2(clrs16, 1, 2);
+
+static inline void do_clz16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 16);
+}
+
+RVPR2(clz16, 1, 2);
+
+static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 16);
+}
+
+RVPR2(clo16, 1, 2);
+
+static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[H2(i)] = a[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i)];
+}
+
+RVPR2(swap16, 2, 2);
-- 
2.17.1
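
The clip helpers above (do_sclip16 and do_uclip16, built on sat64 and satu64)
can be summarised per lane as follows. This is a minimal sketch, not part of
the patch: the names sclip16_lane and uclip16_lane are hypothetical, the
immediate is passed directly instead of being read through vb, and a bool
stands in for env->vxsat. The ranges follow the posted code: [-2^imm, 2^imm - 1]
for the signed clip and [0, 2^imm - 1] for the unsigned one, with imm in 0..15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sclip16 lane: clamp to [-2^imm, 2^imm - 1]. */
static int16_t sclip16_lane(int16_t a, unsigned imm, bool *sat)
{
    assert(imm < 16 && sat);
    int32_t max = ((int32_t)1 << imm) - 1;
    int32_t min = -((int32_t)1 << imm);

    if (a > max) {
        *sat = true;
        return (int16_t)max;
    }
    if (a < min) {
        *sat = true;
        return (int16_t)min;
    }
    return a;
}

/* Hypothetical model of one uclip16 lane: clamp to [0, 2^imm - 1]. */
static int16_t uclip16_lane(int16_t a, unsigned imm, bool *sat)
{
    assert(imm < 16 && sat);
    int32_t max = ((int32_t)1 << imm) - 1;

    if (a < 0) {
        /* Negative inputs clip to zero and count as saturation. */
        *sat = true;
        return 0;
    }
    if (a > max) {
        *sat = true;
        return (int16_t)max;
    }
    return a;
}
```

With imm = 15 the signed range covers every int16_t value, so sclip16_lane
can never saturate in that case.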




* [PATCH 12/38] target/riscv: SIMD 16-bit Miscellaneous Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  12 ++
 target/riscv/insn32.decode              |  13 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  42 ++++++
 target/riscv/packed_helper.c            | 167 ++++++++++++++++++++++++
 4 files changed, 234 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6bb601b436..866484e37d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1228,3 +1228,15 @@ DEF_HELPER_3(umul8, i64, env, tl, tl)
 DEF_HELPER_3(umulx8, i64, env, tl, tl)
 DEF_HELPER_3(khm8, tl, env, tl, tl)
 DEF_HELPER_3(khmx8, tl, env, tl, tl)
+
+DEF_HELPER_3(smin16, tl, env, tl, tl)
+DEF_HELPER_3(umin16, tl, env, tl, tl)
+DEF_HELPER_3(smax16, tl, env, tl, tl)
+DEF_HELPER_3(umax16, tl, env, tl, tl)
+DEF_HELPER_3(sclip16, tl, env, tl, tl)
+DEF_HELPER_3(uclip16, tl, env, tl, tl)
+DEF_HELPER_2(kabs16, tl, env, tl)
+DEF_HELPER_2(clrs16, tl, env, tl)
+DEF_HELPER_2(clz16, tl, env, tl)
+DEF_HELPER_2(clo16, tl, env, tl)
+DEF_HELPER_2(swap16, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9d165efba9..bc9d5fc967 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -42,6 +42,7 @@
 &i    imm rs1 rd
 &j    imm rd
 &r    rd rs1 rs2
+&r2   rd rs1
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
@@ -695,3 +696,15 @@ umul8      1011100  ..... ..... 000 ..... 1111111 @r
 umulx8     1011101  ..... ..... 000 ..... 1111111 @r
 khm8       1000111  ..... ..... 000 ..... 1111111 @r
 khmx8      1001111  ..... ..... 000 ..... 1111111 @r
+
+smin16     1000000  ..... ..... 000 ..... 1111111 @r
+umin16     1001000  ..... ..... 000 ..... 1111111 @r
+smax16     1000001  ..... ..... 000 ..... 1111111 @r
+umax16     1001001  ..... ..... 000 ..... 1111111 @r
+sclip16    1000010  0.... ..... 000 ..... 1111111 @sh4
+uclip16    1000010  1.... ..... 000 ..... 1111111 @sh4
+kabs16     1010110  10001 ..... 000 ..... 1111111 @r2
+clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
+clz16      1010111  01001 ..... 000 ..... 1111111 @r2
+clo16      1010111  01011 ..... 000 ..... 1111111 @r2
+swap16     1010110  11001 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 336f3418b1..56fb8b2523 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -444,3 +444,45 @@ GEN_RVP_R_D64_OOL(umul8);
 GEN_RVP_R_D64_OOL(umulx8);
 GEN_RVP_R_OOL(khm8);
 GEN_RVP_R_OOL(khmx8);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin16);
+GEN_RVP_R_OOL(umin16);
+GEN_RVP_R_OOL(smax16);
+GEN_RVP_R_OOL(umax16);
+GEN_RVP_SHIFTI(sclip16, sclip16, NULL);
+GEN_RVP_SHIFTI(uclip16, uclip16, NULL);
+
+/* Out of line helpers for R2 format */
+static bool
+r2_ool(DisasContext *ctx, arg_r2 *a,
+       void (* fn)(TCGv, TCGv_ptr, TCGv))
+{
+    TCGv src1, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    fn(dst, cpu_env, src1);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R2_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
+{                                                      \
+    return r2_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R2_OOL(kabs16);
+GEN_RVP_R2_OOL(clrs16);
+GEN_RVP_R2_OOL(clz16);
+GEN_RVP_R2_OOL(clo16);
+GEN_RVP_R2_OOL(swap16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 56baefeb8e..a6ab011ace 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -920,3 +920,170 @@ static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(khmx8, 2, 1);
+
+/* SIMD 16-bit Miscellaneous Instructions */
+static inline void do_smin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin16, 1, 2);
+
+static inline void do_umin16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin16, 1, 2);
+
+static inline void do_smax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax16, 1, 2);
+
+static inline void do_umax16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax16, 1, 2);
+
+static int64_t sat64(CPURISCVState *env, int64_t a, uint8_t shift)
+{
+    int64_t max = shift >= 64 ? INT64_MAX : (1ull << shift) - 1;
+    int64_t min = shift >= 64 ? INT64_MIN : -(1ull << shift);
+    int64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else if (a < min) {
+        result = min;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_sclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip16, 1, 2);
+
+static uint64_t satu64(CPURISCVState *env, uint64_t a, uint8_t shift)
+{
+    uint64_t max = shift >= 64 ? UINT64_MAX : (1ull << shift) - 1;
+    uint64_t result;
+
+    if (a > max) {
+        result = max;
+        env->vxsat = 0x1;
+    } else {
+        result = a;
+    }
+    return result;
+}
+
+static inline void do_uclip16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0xf;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip16, 1, 2);
+
+typedef void PackedFn2i(CPURISCVState *, void *, void *, uint8_t);
+
+static inline target_ulong rvpr2(CPURISCVState *env, target_ulong a,
+                                 uint8_t step, uint8_t size, PackedFn2i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, i);
+    }
+    return result;
+}
+
+#define RVPR2(NAME, STEP, SIZE)                                  \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a)    \
+{                                                                \
+    return rvpr2(env, a, STEP, SIZE, (PackedFn2i *)do_##NAME);   \
+}
+
+static inline void do_kabs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+
+    if (a[i] == INT16_MIN) {
+        d[i] = INT16_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs16, 1, 2);
+
+static inline void do_clrs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 16;
+}
+
+RVPR2(clrs16, 1, 2);
+
+static inline void do_clz16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 16);
+}
+
+RVPR2(clz16, 1, 2);
+
+static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 16);
+}
+
+RVPR2(clo16, 1, 2);
+
+static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int16_t *d = vd, *a = va;
+    d[H2(i)] = a[H2(i + 1)];
+    d[H2(i + 1)] = a[H2(i)];
+}
+
+RVPR2(swap16, 2, 2);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread
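
The signed clip helpers above clamp each lane to the range [-(2^imm), 2^imm - 1] and record saturation in vxsat. A minimal standalone sketch of that per-lane behaviour follows, assuming a 32-bit register and using a plain bool in place of env->vxsat. The names clip_signed and sclip16_word are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clamp a to [-(2^shift), 2^shift - 1] and set *sat when clamping happens.
 * Shifts of 63 or more are treated as "no limit" so that the signed shift
 * below never overflows (an assumption of this sketch, not of the patch).
 */
static int64_t clip_signed(int64_t a, unsigned shift, bool *sat)
{
    assert(sat != NULL);
    int64_t max = shift >= 63 ? INT64_MAX : ((int64_t)1 << shift) - 1;
    int64_t min = shift >= 63 ? INT64_MIN : -((int64_t)1 << shift);

    if (a > max) {
        *sat = true;
        return max;
    }
    if (a < min) {
        *sat = true;
        return min;
    }
    return a;
}

/* Apply the clip to both 16-bit lanes of a 32-bit word. */
uint32_t sclip16_word(uint32_t a, unsigned imm, bool *sat)
{
    uint32_t result = 0;

    for (int lane = 0; lane < 2; lane++) {
        /* Reinterpret the lane as signed so negative values clip to min. */
        int16_t elem = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t clipped = (int16_t)clip_signed(elem, imm & 0xf, sat);

        result |= (uint32_t)(uint16_t)clipped << (16 * lane);
    }
    return result;
}
```

The saturation flag is only ever set, never cleared, which mirrors the sticky behaviour of the vxsat writes in the helpers.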

* [PATCH 13/38] target/riscv: SIMD 8-bit Miscellaneous Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  12 +++
 target/riscv/insn32.decode              |  12 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 +++
 target/riscv/packed_helper.c            | 115 ++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 866484e37d..83778b532a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1240,3 +1240,15 @@ DEF_HELPER_2(clrs16, tl, env, tl)
 DEF_HELPER_2(clz16, tl, env, tl)
 DEF_HELPER_2(clo16, tl, env, tl)
 DEF_HELPER_2(swap16, tl, env, tl)
+
+DEF_HELPER_3(smin8, tl, env, tl, tl)
+DEF_HELPER_3(umin8, tl, env, tl, tl)
+DEF_HELPER_3(smax8, tl, env, tl, tl)
+DEF_HELPER_3(umax8, tl, env, tl, tl)
+DEF_HELPER_3(sclip8, tl, env, tl, tl)
+DEF_HELPER_3(uclip8, tl, env, tl, tl)
+DEF_HELPER_2(kabs8, tl, env, tl)
+DEF_HELPER_2(clrs8, tl, env, tl)
+DEF_HELPER_2(clz8, tl, env, tl)
+DEF_HELPER_2(clo8, tl, env, tl)
+DEF_HELPER_2(swap8, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bc9d5fc967..e158066353 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -708,3 +708,15 @@ clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
 clz16      1010111  01001 ..... 000 ..... 1111111 @r2
 clo16      1010111  01011 ..... 000 ..... 1111111 @r2
 swap16     1010110  11001 ..... 000 ..... 1111111 @r2
+
+smin8      1000100  ..... ..... 000 ..... 1111111 @r
+umin8      1001100  ..... ..... 000 ..... 1111111 @r
+smax8      1000101  ..... ..... 000 ..... 1111111 @r
+umax8      1001101  ..... ..... 000 ..... 1111111 @r
+sclip8     1000110  00... ..... 000 ..... 1111111 @sh3
+uclip8     1000110  10... ..... 000 ..... 1111111 @sh3
+kabs8      1010110  10000 ..... 000 ..... 1111111 @r2
+clrs8      1010111  00000 ..... 000 ..... 1111111 @r2
+clz8       1010111  00001 ..... 000 ..... 1111111 @r2
+clo8       1010111  00011 ..... 000 ..... 1111111 @r2
+swap8      1010110  11000 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 56fb8b2523..5ad057d7ac 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -486,3 +486,16 @@ GEN_RVP_R2_OOL(clrs16);
 GEN_RVP_R2_OOL(clz16);
 GEN_RVP_R2_OOL(clo16);
 GEN_RVP_R2_OOL(swap16);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin8);
+GEN_RVP_R_OOL(umin8);
+GEN_RVP_R_OOL(smax8);
+GEN_RVP_R_OOL(umax8);
+GEN_RVP_SHIFTI(sclip8, sclip8, NULL);
+GEN_RVP_SHIFTI(uclip8, uclip8, NULL);
+GEN_RVP_R2_OOL(kabs8);
+GEN_RVP_R2_OOL(clrs8);
+GEN_RVP_R2_OOL(clz8);
+GEN_RVP_R2_OOL(clo8);
+GEN_RVP_R2_OOL(swap8);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index a6ab011ace..be91d308e5 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1087,3 +1087,118 @@ static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(swap16, 2, 2);
+
+/* SIMD 8-bit Miscellaneous Instructions */
+static inline void do_smin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin8, 1, 1);
+
+static inline void do_umin8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin8, 1, 1);
+
+static inline void do_smax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax8, 1, 1);
+
+static inline void do_umax8(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint8_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax8, 1, 1);
+
+static inline void do_sclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip8, 1, 1);
+
+static inline void do_uclip8(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x7;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip8, 1, 1);
+
+static inline void do_kabs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+
+    if (a[i] == INT8_MIN) {
+        d[i] = INT8_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs8, 1, 1);
+
+static inline void do_clrs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]) - 24;
+}
+
+RVPR2(clrs8, 1, 1);
+
+static inline void do_clz8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 24);
+}
+
+RVPR2(clz8, 1, 1);
+
+static inline void do_clo8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 24);
+}
+
+RVPR2(clo8, 1, 1);
+
+static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *d = vd, *a = va;
+    d[H1(i)] = a[H1(i + 1)];
+    d[H1(i + 1)] = a[H1(i)];
+}
+
+RVPR2(swap8, 2, 1);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread
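
The 8-bit count helpers above reuse 32-bit primitives and subtract 24, because a promoted 8-bit lane carries 24 extra leading bits. A minimal sketch of that arithmetic for a single lane follows. clz32_sketch is a portable stand-in written for this example and is only assumed to behave like QEMU's clz32; the *_lane names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 32-bit count-leading-zeros; returns 32 for 0. */
static int clz32_sketch(uint32_t v)
{
    int n = 0;

    for (uint32_t mask = 0x80000000u; mask && !(v & mask); mask >>= 1) {
        n++;
    }
    return n;
}

/* Leading zeros of an 8-bit lane: a negative lane has its top bit set. */
int clz8_lane(int8_t x)
{
    int n = x < 0 ? 0 : clz32_sketch((uint32_t)x) - 24;

    assert(n >= 0 && n <= 8);
    return n;
}

/* Leading ones: sign extension adds 24 ones in front of a negative lane. */
int clo8_lane(int8_t x)
{
    int n = x >= 0 ? 0 : clz32_sketch(~(uint32_t)(int32_t)x) - 24;

    assert(n >= 0 && n <= 8);
    return n;
}

/* Redundant sign bits: leading copies of the sign bit, minus the sign itself. */
int clrs8_lane(int8_t x)
{
    return (x < 0 ? clo8_lane(x) : clz8_lane(x)) - 1;
}
```

For example, a lane holding 1 has seven leading zeros and six redundant sign bits, while a lane holding 0 or -1 reports seven redundant sign bits.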

* [PATCH 14/38] target/riscv: 8-bit Unpacking Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  11 +++
 target/riscv/insn32.decode              |  11 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  12 +++
 target/riscv/packed_helper.c            | 121 ++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 83778b532a..585905a689 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1252,3 +1252,14 @@ DEF_HELPER_2(clrs8, tl, env, tl)
 DEF_HELPER_2(clz8, tl, env, tl)
 DEF_HELPER_2(clo8, tl, env, tl)
 DEF_HELPER_2(swap8, tl, env, tl)
+
+DEF_HELPER_2(sunpkd810, tl, env, tl)
+DEF_HELPER_2(sunpkd820, tl, env, tl)
+DEF_HELPER_2(sunpkd830, tl, env, tl)
+DEF_HELPER_2(sunpkd831, tl, env, tl)
+DEF_HELPER_2(sunpkd832, tl, env, tl)
+DEF_HELPER_2(zunpkd810, tl, env, tl)
+DEF_HELPER_2(zunpkd820, tl, env, tl)
+DEF_HELPER_2(zunpkd830, tl, env, tl)
+DEF_HELPER_2(zunpkd831, tl, env, tl)
+DEF_HELPER_2(zunpkd832, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e158066353..fa4a02c9db 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -720,3 +720,14 @@ clrs8      1010111  00000 ..... 000 ..... 1111111 @r2
 clz8       1010111  00001 ..... 000 ..... 1111111 @r2
 clo8       1010111  00011 ..... 000 ..... 1111111 @r2
 swap8      1010110  11000 ..... 000 ..... 1111111 @r2
+
+sunpkd810  1010110  01000 ..... 000 ..... 1111111 @r2
+sunpkd820  1010110  01001 ..... 000 ..... 1111111 @r2
+sunpkd830  1010110  01010 ..... 000 ..... 1111111 @r2
+sunpkd831  1010110  01011 ..... 000 ..... 1111111 @r2
+sunpkd832  1010110  10011 ..... 000 ..... 1111111 @r2
+zunpkd810  1010110  01100 ..... 000 ..... 1111111 @r2
+zunpkd820  1010110  01101 ..... 000 ..... 1111111 @r2
+zunpkd830  1010110  01110 ..... 000 ..... 1111111 @r2
+zunpkd831  1010110  01111 ..... 000 ..... 1111111 @r2
+zunpkd832  1010110  10111 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 5ad057d7ac..b69e964cb4 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -499,3 +499,15 @@ GEN_RVP_R2_OOL(clrs8);
 GEN_RVP_R2_OOL(clz8);
 GEN_RVP_R2_OOL(clo8);
 GEN_RVP_R2_OOL(swap8);
+
+/* 8-bit Unpacking Instructions */
+GEN_RVP_R2_OOL(sunpkd810);
+GEN_RVP_R2_OOL(sunpkd820);
+GEN_RVP_R2_OOL(sunpkd830);
+GEN_RVP_R2_OOL(sunpkd831);
+GEN_RVP_R2_OOL(sunpkd832);
+GEN_RVP_R2_OOL(zunpkd810);
+GEN_RVP_R2_OOL(zunpkd820);
+GEN_RVP_R2_OOL(zunpkd830);
+GEN_RVP_R2_OOL(zunpkd831);
+GEN_RVP_R2_OOL(zunpkd832);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index be91d308e5..d0dcb692f5 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1202,3 +1202,124 @@ static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(swap8, 2, 1);
+
+/* 8-bit Unpacking Instructions */
+static inline void
+do_sunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 1)];
+}
+
+RVPR2(sunpkd810, 4, 1);
+
+static inline void
+do_sunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 2)];
+}
+
+RVPR2(sunpkd820, 4, 1);
+
+static inline void
+do_sunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd830, 4, 1);
+
+static inline void
+do_sunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 1)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd831, 4, 1);
+
+static inline void
+do_sunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int8_t *a = va;
+    int16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 2)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(sunpkd832, 4, 1);
+
+static inline void
+do_zunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 1)];
+}
+
+RVPR2(zunpkd810, 4, 1);
+
+static inline void
+do_zunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 2)];
+}
+
+RVPR2(zunpkd820, 4, 1);
+
+static inline void
+do_zunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd830, 4, 1);
+
+static inline void
+do_zunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 1)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd831, 4, 1);
+
+static inline void
+do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    uint8_t *a = va;
+    uint16_t *d = vd;
+
+    d[H2(i / 2)] = a[H1(i + 2)];
+    d[H2(i / 2 + 1)] = a[H1(i + 3)];
+}
+
+RVPR2(zunpkd832, 4, 1);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread
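
The unpacking helpers index bytes through H1(), which maps a logical element index to a host memory offset. A small sketch of how such a remap behaves on a big-endian host follows, assuming a 4-byte register image. H1_BE, logical_byte and sunpkd831_sketch are hypothetical names for this illustration, and the XOR-with-3 mapping is an assumption chosen to fit a 4-byte word rather than the definition QEMU uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical remap for a big-endian host holding a 4-byte word:
 * logical byte i (0 = least significant) lives at host offset i ^ 3.
 */
#define H1_BE(i) ((i) ^ 3)

/* Fetch logical byte idx from a word stored in big-endian host order. */
static int8_t logical_byte(const int8_t *bytes, int idx)
{
    assert(idx >= 0 && idx < 4);
    return bytes[H1_BE(idx)];
}

/*
 * SUNPKD831-style unpack of one word: out[0] = sext(B[1]), out[1] = sext(B[3]).
 * The offset is added to the logical index before the remap is applied.
 */
void sunpkd831_sketch(const int8_t bytes[4], int16_t out[2])
{
    out[0] = logical_byte(bytes, 0 + 1);
    out[1] = logical_byte(bytes, 0 + 3);
}
```

With this mapping, H1_BE(0) + 1 evaluates to 4, which is past the end of the word, while H1_BE(0 + 1) evaluates to 2. On a little-endian host the remap is the identity and the two forms coincide, so only big-endian hosts expose the difference.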

* [PATCH 15/38] target/riscv: 16-bit Packing Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++++
 target/riscv/packed_helper.c            | 45 +++++++++++++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 585905a689..4dc66cf4cc 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1263,3 +1263,8 @@ DEF_HELPER_2(zunpkd820, tl, env, tl)
 DEF_HELPER_2(zunpkd830, tl, env, tl)
 DEF_HELPER_2(zunpkd831, tl, env, tl)
 DEF_HELPER_2(zunpkd832, tl, env, tl)
+
+DEF_HELPER_3(pkbb16, tl, env, tl, tl)
+DEF_HELPER_3(pkbt16, tl, env, tl, tl)
+DEF_HELPER_3(pktt16, tl, env, tl, tl)
+DEF_HELPER_3(pktb16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fa4a02c9db..a4d9ff2282 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -731,3 +731,8 @@ zunpkd820  1010110  01101 ..... 000 ..... 1111111 @r2
 zunpkd830  1010110  01110 ..... 000 ..... 1111111 @r2
 zunpkd831  1010110  01111 ..... 000 ..... 1111111 @r2
 zunpkd832  1010110  10111 ..... 000 ..... 1111111 @r2
+
+pkbb16     0000111  ..... ..... 001 ..... 1111111 @r
+pkbt16     0001111  ..... ..... 001 ..... 1111111 @r
+pktt16     0010111  ..... ..... 001 ..... 1111111 @r
+pktb16     0011111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b69e964cb4..99a19019eb 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -511,3 +511,12 @@ GEN_RVP_R2_OOL(zunpkd820);
 GEN_RVP_R2_OOL(zunpkd830);
 GEN_RVP_R2_OOL(zunpkd831);
 GEN_RVP_R2_OOL(zunpkd832);
+
+/*
+ *** Partial-SIMD Data Processing Instructions
+ */
+/* 16-bit Packing Instructions */
+GEN_RVP_R_OOL(pkbb16);
+GEN_RVP_R_OOL(pkbt16);
+GEN_RVP_R_OOL(pktt16);
+GEN_RVP_R_OOL(pktb16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index d0dcb692f5..fe1b48c86d 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1323,3 +1323,48 @@ do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(zunpkd832, 4, 1);
+
+/*
+ *** Partial-SIMD Data Processing Instructions
+ */
+
+/* 16-bit Packing Instructions */
+static inline void do_pkbb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pkbb16, 2, 2);
+
+static inline void do_pkbt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pkbt16, 2, 2);
+
+static inline void do_pktt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pktt16, 2, 2);
+
+static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pktb16, 2, 2);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread
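
The four packing helpers differ only in which 16-bit half of each source lands in the result: the first letter after "pk" selects the half of the first operand (placed in the top half of the result), the second letter selects the half of the second operand (placed in the bottom half). A minimal sketch on a single 32-bit word follows. The half, pack16 and *_word names are hypothetical and it uses shifts instead of the pointer indexing in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 16-bit half idx of a 32-bit word: 0 = bottom, 1 = top. */
static uint16_t half(uint32_t w, int idx)
{
    assert(idx == 0 || idx == 1);
    return (uint16_t)(w >> (16 * idx));
}

/* Combine two halves into one word, hi in bits 31..16 and lo in bits 15..0. */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

uint32_t pkbb16_word(uint32_t a, uint32_t b)
{
    return pack16(half(a, 0), half(b, 0)); /* bottom of a, bottom of b */
}

uint32_t pkbt16_word(uint32_t a, uint32_t b)
{
    return pack16(half(a, 0), half(b, 1)); /* bottom of a, top of b */
}

uint32_t pktt16_word(uint32_t a, uint32_t b)
{
    return pack16(half(a, 1), half(b, 1)); /* top of a, top of b */
}

uint32_t pktb16_word(uint32_t a, uint32_t b)
{
    return pack16(half(a, 1), half(b, 0)); /* top of a, bottom of b */
}
```

On RV64 the helpers in the patch repeat the same selection for each 32-bit word of the register, which is why the lane loop advances two 16-bit elements per pass.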

* [PATCH 15/38] target/riscv: 16-bit Packing Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++
 target/riscv/insn32.decode              |  5 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++++
 target/riscv/packed_helper.c            | 45 +++++++++++++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 585905a689..4dc66cf4cc 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1263,3 +1263,8 @@ DEF_HELPER_2(zunpkd820, tl, env, tl)
 DEF_HELPER_2(zunpkd830, tl, env, tl)
 DEF_HELPER_2(zunpkd831, tl, env, tl)
 DEF_HELPER_2(zunpkd832, tl, env, tl)
+
+DEF_HELPER_3(pkbb16, tl, env, tl, tl)
+DEF_HELPER_3(pkbt16, tl, env, tl, tl)
+DEF_HELPER_3(pktt16, tl, env, tl, tl)
+DEF_HELPER_3(pktb16, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fa4a02c9db..a4d9ff2282 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -731,3 +731,8 @@ zunpkd820  1010110  01101 ..... 000 ..... 1111111 @r2
 zunpkd830  1010110  01110 ..... 000 ..... 1111111 @r2
 zunpkd831  1010110  01111 ..... 000 ..... 1111111 @r2
 zunpkd832  1010110  10111 ..... 000 ..... 1111111 @r2
+
+pkbb16     0000111  ..... ..... 001 ..... 1111111 @r
+pkbt16     0001111  ..... ..... 001 ..... 1111111 @r
+pktt16     0010111  ..... ..... 001 ..... 1111111 @r
+pktb16     0011111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b69e964cb4..99a19019eb 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -511,3 +511,12 @@ GEN_RVP_R2_OOL(zunpkd820);
 GEN_RVP_R2_OOL(zunpkd830);
 GEN_RVP_R2_OOL(zunpkd831);
 GEN_RVP_R2_OOL(zunpkd832);
+
+/*
+ *** Partial-SIMD Data Processing Instruction
+ */
+/* 16-bit Packing Instructions */
+GEN_RVP_R_OOL(pkbb16);
+GEN_RVP_R_OOL(pkbt16);
+GEN_RVP_R_OOL(pktt16);
+GEN_RVP_R_OOL(pktb16);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index d0dcb692f5..fe1b48c86d 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1323,3 +1323,48 @@ do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(zunpkd832, 4, 1);
+
+/*
+ *** Partial-SIMD Data Processing Instructions
+ */
+
+/* 16-bit Packing Instructions */
+static inline void do_pkbb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pkbb16, 2, 2);
+
+static inline void do_pkbt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pkbt16, 2, 2);
+
+static inline void do_pktt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i + 1)];
+}
+
+RVPR(pktt16, 2, 2);
+
+static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint16_t *d = vd, *a = va, *b = vb;
+    d[H2(i + 1)] = a[H2(i + 1)];
+    d[H2(i)] = b[H2(i)];
+}
+
+RVPR(pktb16, 2, 2);
-- 
2.17.1
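
Reviewer note: the four packing helpers above differ only in which halfword of each source lands in which half of the result. The sketch below restates that on a single 32-bit word, outside QEMU. It is an illustration under assumptions, not the patch's code: the names bottom16, top16 and pack16 are hypothetical, and the host-endianness handling that H2() provides in the helpers is left out because the sketch works on integer values rather than on memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessors: bits 15:0 and bits 31:16 of a 32-bit word. */
static inline uint16_t bottom16(uint32_t w) { return (uint16_t)(w & 0xffffu); }
static inline uint16_t top16(uint32_t w)    { return (uint16_t)(w >> 16); }

/* Hypothetical builder: hi goes to bits 31:16, lo to bits 15:0. */
static inline uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Same selection as do_pkbb16 and friends: rs1 feeds the top half of the
 * result, rs2 feeds the bottom half. */
static inline uint32_t pkbb16(uint32_t rs1, uint32_t rs2)
{
    return pack16(bottom16(rs1), bottom16(rs2));
}

static inline uint32_t pkbt16(uint32_t rs1, uint32_t rs2)
{
    return pack16(bottom16(rs1), top16(rs2));
}

static inline uint32_t pktt16(uint32_t rs1, uint32_t rs2)
{
    return pack16(top16(rs1), top16(rs2));
}

static inline uint32_t pktb16(uint32_t rs1, uint32_t rs2)
{
    return pack16(top16(rs1), bottom16(rs2));
}

/* Small self-check with distinguishable halfwords. */
static inline void pack_self_check(void)
{
    assert(pkbb16(0xAAAA1111u, 0xBBBB2222u) == 0x11112222u);
    assert(pktt16(0xAAAA1111u, 0xBBBB2222u) == 0xAAAABBBBu);
}
```

On RV64 the helpers apply the same selection to each 32-bit half of the register, which is what the RVPR(..., 2, 2) step and size arguments appear to express.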




* [PATCH 16/38] target/riscv: Signed MSW 32x32 Multiply and Add Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
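Reviewer note: this patch carries no commit message, so the sketch below restates what the non-accumulating helpers compute for one 32-bit element. It is a standalone illustration under assumptions: the function names are hypothetical, the overflow flag is a plain int rather than env->vxsat, and right-shifting a negative int64_t is assumed to be an arithmetic shift (implementation-defined in ISO C, but what the helpers already rely on).

```c
#include <assert.h>
#include <stdint.h>

/* smmul-style: most significant word of the signed 64-bit product,
 * truncated (the arithmetic shift rounds toward minus infinity). */
static inline int32_t msw_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* smmul.u-style: add half of the discarded range (1 << 31) before the
 * shift, so the result is rounded to nearest instead of truncated. */
static inline int32_t msw_mul_round(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + (INT64_C(1) << 31)) >> 32);
}

/* kwmmul-style: doubled product (shift by 31 instead of 32). The only
 * input pair whose doubled product does not fit in 32 bits is
 * INT32_MIN * INT32_MIN, so it saturates and sets the flag. */
static inline int32_t msw_mul_x2_sat(int32_t a, int32_t b, int *ov)
{
    if (a == INT32_MIN && b == INT32_MIN) {
        *ov = 1;
        return INT32_MAX;
    }
    return (int32_t)(((int64_t)a * b) >> 31);
}

/* Self-check: 0x40000000 * 2 is exactly half a result unit, so the
 * truncating and rounding forms differ. */
static inline void msw_self_check(void)
{
    assert(msw_mul(0x40000000, 2) == 0);
    assert(msw_mul_round(0x40000000, 2) == 1);
}
```

Read as Q31 fixed point, msw_mul_x2_sat(0x40000000, 0x40000000, &ov) is 0.5 * 0.5 and yields 0x20000000, i.e. 0.25.
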
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |   9 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  44 ++++++++++
 target/riscv/packed_helper.c            | 109 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4dc66cf4cc..0bd21c8514 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1268,3 +1268,12 @@ DEF_HELPER_3(pkbb16, tl, env, tl, tl)
 DEF_HELPER_3(pkbt16, tl, env, tl, tl)
 DEF_HELPER_3(pktt16, tl, env, tl, tl)
 DEF_HELPER_3(pktb16, tl, env, tl, tl)
+
+DEF_HELPER_3(smmul, tl, env, tl, tl)
+DEF_HELPER_3(smmul_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmac, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmac_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kwmmul, tl, env, tl, tl)
+DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a4d9ff2282..e0be2790dc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -736,3 +736,12 @@ pkbb16     0000111  ..... ..... 001 ..... 1111111 @r
 pkbt16     0001111  ..... ..... 001 ..... 1111111 @r
 pktt16     0010111  ..... ..... 001 ..... 1111111 @r
 pktb16     0011111  ..... ..... 001 ..... 1111111 @r
+
+smmul      0100000  ..... ..... 001 ..... 1111111 @r
+smmul_u    0101000  ..... ..... 001 ..... 1111111 @r
+kmmac      0110000  ..... ..... 001 ..... 1111111 @r
+kmmac_u    0111000  ..... ..... 001 ..... 1111111 @r
+kmmsb      0100001  ..... ..... 001 ..... 1111111 @r
+kmmsb_u    0101001  ..... ..... 001 ..... 1111111 @r
+kwmmul     0110001  ..... ..... 001 ..... 1111111 @r
+kwmmul_u   0111001  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 99a19019eb..fbc9c0b57b 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -520,3 +520,47 @@ GEN_RVP_R_OOL(pkbb16);
 GEN_RVP_R_OOL(pkbt16);
 GEN_RVP_R_OOL(pktt16);
 GEN_RVP_R_OOL(pktb16);
+
+/* Most Significant Word "32x32" Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmul);
+GEN_RVP_R_OOL(smmul_u);
+
+/* Function to accumulate destination register */
+static inline bool r_acc_ool(DisasContext *ctx, arg_r *a,
+                             void (* fn)(TCGv, TCGv_ptr, TCGv, TCGv, TCGv))
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rd);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R_ACC_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_acc_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_ACC_OOL(kmmac);
+GEN_RVP_R_ACC_OOL(kmmac_u);
+GEN_RVP_R_ACC_OOL(kmmsb);
+GEN_RVP_R_ACC_OOL(kmmsb_u);
+GEN_RVP_R_OOL(kwmmul);
+GEN_RVP_R_OOL(kwmmul_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index fe1b48c86d..c1322d2fac 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1368,3 +1368,112 @@ static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(pktb16, 2, 2);
+
+/* Most Significant Word "32x32" Multiply & Add Instructions */
+static inline void do_smmul(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = (int64_t)a[i] * b[i] >> 32;
+}
+
+RVPR(smmul, 1, 4);
+
+static inline void do_smmul_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ((int64_t)a[i] * b[i] + (uint32_t)INT32_MIN) >> 32;
+}
+
+RVPR(smmul_u, 1, 4);
+
+typedef void PackedFn4i(CPURISCVState *, void *, void *,
+                        void *, void *, uint8_t);
+
+static inline target_ulong
+rvpr_acc(CPURISCVState *env, target_ulong a,
+         target_ulong b, target_ulong c,
+         uint8_t step, uint8_t size, PackedFn4i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    target_ulong result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, &c, i);
+    }
+    return result;
+}
+
+#define RVPR_ACC(NAME, STEP, SIZE)                                     \
+target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a,          \
+                          target_ulong b, target_ulong c)              \
+{                                                                      \
+    return rvpr_acc(env, a, b, c, STEP, SIZE, (PackedFn4i *)do_##NAME);\
+}
+
+static inline void do_kmmac(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i]) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac, 1, 4);
+
+static inline void do_kmmac_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = sadd32(env, 0, ((int64_t)a[i] * b[i] +
+                           (uint32_t)INT32_MIN) >> 32, c[i]);
+}
+
+RVPR_ACC(kmmac_u, 1, 4);
+
+static inline void do_kmmsb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], (int64_t)a[i] * b[i] >> 32);
+}
+
+RVPR_ACC(kmmsb, 1, 4);
+
+static inline void do_kmmsb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb, *c = vc;
+    d[i] = ssub32(env, 0, c[i], ((int64_t)a[i] * b[i] +
+                                 (uint32_t)INT32_MIN) >> 32);
+}
+
+RVPR_ACC(kmmsb_u, 1, 4);
+
+static inline void do_kwmmul(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = (int64_t)a[i] * b[i] >> 31;
+    }
+}
+
+RVPR(kwmmul, 1, 4);
+
+static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    if (a[i] == INT32_MIN && b[i] == INT32_MIN) {
+        env->vxsat = 0x1;
+        d[i] = INT32_MAX;
+    } else {
+        d[i] = ((int64_t)a[i] * b[i] + (1ull << 30)) >> 31;
+    }
+}
+
+RVPR(kwmmul_u, 1, 4);
-- 
2.17.1
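
Reviewer note: the kmmac/kmmsb helpers above pass the old value of rd as a third operand and combine it with the product's most significant word through sadd32()/ssub32(), which are defined elsewhere in the series. The sketch below shows the intended arithmetic for one 32-bit element with hypothetical stand-ins (sat_add32, sat_sub32, and an int overflow flag in place of env->vxsat). It assumes arithmetic right shift of negative values and is not the QEMU implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sadd32(): clamp the exact sum to int32_t. */
static inline int32_t sat_add32(int32_t a, int32_t b, int *ov)
{
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX) {
        *ov = 1;
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        *ov = 1;
        return INT32_MIN;
    }
    return (int32_t)sum;
}

/* Hypothetical stand-in for ssub32(): clamp the exact difference. */
static inline int32_t sat_sub32(int32_t a, int32_t b, int *ov)
{
    int64_t diff = (int64_t)a - b;
    if (diff > INT32_MAX) {
        *ov = 1;
        return INT32_MAX;
    }
    if (diff < INT32_MIN) {
        *ov = 1;
        return INT32_MIN;
    }
    return (int32_t)diff;
}

/* kmmac-style: acc + msw(a * b), saturated. */
static inline int32_t kmmac_elem(int32_t acc, int32_t a, int32_t b, int *ov)
{
    int32_t msw = (int32_t)(((int64_t)a * b) >> 32);
    return sat_add32(acc, msw, ov);
}

/* kmmsb-style: acc - msw(a * b), saturated. */
static inline int32_t kmmsb_elem(int32_t acc, int32_t a, int32_t b, int *ov)
{
    int32_t msw = (int32_t)(((int64_t)a * b) >> 32);
    return sat_sub32(acc, msw, ov);
}

/* Self-check: 0x40000000 * 8 has a most significant word of 2. */
static inline void acc_self_check(void)
{
    int ov = 0;
    assert(kmmac_elem(10, 0x40000000, 8, &ov) == 12);
    assert(ov == 0);
}
```

The flag is only ever set, never cleared, which mirrors how a sticky saturation bit is normally used; whether env->vxsat behaves exactly this way depends on sadd32()/ssub32(), which are not shown in this patch.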




* [PATCH 17/38] target/riscv: Signed MSW 32x16 Multiply and Add Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
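Reviewer note: this patch also has no commit message. The 32x16 helpers multiply each 32-bit element of rs1 by either the bottom or the top signed halfword of the matching 32-bit element of rs2, then keep the upper 32 bits of the 48-bit product. The sketch below shows that for one element. The names are hypothetical, the overflow flag is a plain int rather than env->vxsat, and it assumes two's-complement narrowing casts and arithmetic right shift, as the helpers do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor: signed bottom (top == 0) or top halfword of w. */
static inline int16_t half16(uint32_t w, int top)
{
    return (int16_t)(uint16_t)(top ? (w >> 16) : (w & 0xffffu));
}

/* smmwb/smmwt-style: upper 32 bits of the 48-bit product a * half. */
static inline int32_t mul_w_h(int32_t a, uint32_t w, int top)
{
    return (int32_t)(((int64_t)a * half16(w, top)) >> 16);
}

/* smmwb.u/smmwt.u-style: add 1 << 15 first to round to nearest. */
static inline int32_t mul_w_h_round(int32_t a, uint32_t w, int top)
{
    return (int32_t)(((int64_t)a * half16(w, top) + (INT64_C(1) << 15)) >> 16);
}

/* kmmwb2/kmmwt2-style: doubled product (shift by 15). Only
 * INT32_MIN * INT16_MIN overflows, so that pair saturates. */
static inline int32_t mul_w_h_x2_sat(int32_t a, uint32_t w, int top, int *ov)
{
    int16_t h = half16(w, top);
    if (a == INT32_MIN && h == INT16_MIN) {
        *ov = 1;
        return INT32_MAX;
    }
    return (int32_t)(((int64_t)a * h) >> 15);
}

/* Self-check: 0x00030002 has top halfword 3 and bottom halfword 2. */
static inline void w_h_self_check(void)
{
    assert(mul_w_h(0x00010000, 0x00030002u, 0) == 2);
    assert(mul_w_h(0x00010000, 0x00030002u, 1) == 3);
}
```

The accumulating variants in this patch (kmmawb, kmmawt and their "2" forms) feed the same intermediate value into a saturating add with the old destination value.
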
 target/riscv/helper.h                   |  17 ++
 target/riscv/insn32.decode              |  17 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  18 ++
 target/riscv/packed_helper.c            | 208 ++++++++++++++++++++++++
 4 files changed, 260 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0bd21c8514..25aa07a7ff 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1277,3 +1277,20 @@ DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
 DEF_HELPER_3(kwmmul, tl, env, tl, tl)
 DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
+
+DEF_HELPER_3(smmwb, tl, env, tl, tl)
+DEF_HELPER_3(smmwb_u, tl, env, tl, tl)
+DEF_HELPER_3(smmwt, tl, env, tl, tl)
+DEF_HELPER_3(smmwt_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kmmwb2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwb2_u, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e0be2790dc..6e63bab2d9 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -745,3 +745,20 @@ kmmsb      0100001  ..... ..... 001 ..... 1111111 @r
 kmmsb_u    0101001  ..... ..... 001 ..... 1111111 @r
 kwmmul     0110001  ..... ..... 001 ..... 1111111 @r
 kwmmul_u   0111001  ..... ..... 001 ..... 1111111 @r
+
+smmwb      0100010  ..... ..... 001 ..... 1111111 @r
+smmwb_u    0101010  ..... ..... 001 ..... 1111111 @r
+smmwt      0110010  ..... ..... 001 ..... 1111111 @r
+smmwt_u    0111010  ..... ..... 001 ..... 1111111 @r
+kmmawb     0100011  ..... ..... 001 ..... 1111111 @r
+kmmawb_u   0101011  ..... ..... 001 ..... 1111111 @r
+kmmawt     0110011  ..... ..... 001 ..... 1111111 @r
+kmmawt_u   0111011  ..... ..... 001 ..... 1111111 @r
+kmmwb2     1000111  ..... ..... 001 ..... 1111111 @r
+kmmwb2_u   1001111  ..... ..... 001 ..... 1111111 @r
+kmmwt2     1010111  ..... ..... 001 ..... 1111111 @r
+kmmwt2_u   1011111  ..... ..... 001 ..... 1111111 @r
+kmmawb2    1100111  ..... ..... 001 ..... 1111111 @r
+kmmawb2_u  1101111  ..... ..... 001 ..... 1111111 @r
+kmmawt2    1110111  ..... ..... 001 ..... 1111111 @r
+kmmawt2_u  1111111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index fbc9c0b57b..e708ae7a6a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -564,3 +564,21 @@ GEN_RVP_R_ACC_OOL(kmmsb);
 GEN_RVP_R_ACC_OOL(kmmsb_u);
 GEN_RVP_R_OOL(kwmmul);
 GEN_RVP_R_OOL(kwmmul_u);
+
+/* Most Significant Word "32x16" Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmwb);
+GEN_RVP_R_OOL(smmwb_u);
+GEN_RVP_R_OOL(smmwt);
+GEN_RVP_R_OOL(smmwt_u);
+GEN_RVP_R_ACC_OOL(kmmawb);
+GEN_RVP_R_ACC_OOL(kmmawb_u);
+GEN_RVP_R_ACC_OOL(kmmawt);
+GEN_RVP_R_ACC_OOL(kmmawt_u);
+GEN_RVP_R_OOL(kmmwb2);
+GEN_RVP_R_OOL(kmmwb2_u);
+GEN_RVP_R_OOL(kmmwt2);
+GEN_RVP_R_OOL(kmmwt2_u);
+GEN_RVP_R_ACC_OOL(kmmawb2);
+GEN_RVP_R_ACC_OOL(kmmawb2_u);
+GEN_RVP_R_ACC_OOL(kmmawt2);
+GEN_RVP_R_ACC_OOL(kmmawt2_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index c1322d2fac..ea3c9f6dd8 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1477,3 +1477,211 @@ static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kwmmul_u, 1, 4);
+
+/* Most Significant Word "32x16" Multiply & Add Instructions */
+static inline void do_smmwb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16;
+}
+
+RVPR(smmwb, 1, 4);
+
+static inline void do_smmwb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwb_u, 1, 4);
+
+static inline void do_smmwt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16;
+}
+
+RVPR(smmwt, 1, 4);
+
+static inline void do_smmwt_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwt_u, 1, 4);
+
+static inline void do_kmmawb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb, 1, 4);
+
+static inline void do_kmmawb_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb_u, 1, 4);
+
+static inline void do_kmmawt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16,
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt, 1, 4);
+
+static inline void do_kmmawt_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt_u, 1, 4);
+
+static inline void do_kmmwb2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+}
+
+RVPR(kmmwb2, 1, 4);
+
+static inline void do_kmmwb2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwb2_u, 1, 4);
+
+static inline void do_kmmwt2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+}
+
+RVPR(kmmwt2, 1, 4);
+
+static inline void do_kmmwt2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwt2_u, 1, 4);
+
+static inline void do_kmmawb2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2, 1, 4);
+
+static inline void do_kmmawb2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2_u, 1, 4);
+
+static inline void do_kmmawt2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2, 1, 4);
+
+static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2_u, 1, 4);
-- 
2.17.1




* [PATCH 17/38] target/riscv: Signed MSW 32x16 Multiply and Add Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  17 ++
 target/riscv/insn32.decode              |  17 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  18 ++
 target/riscv/packed_helper.c            | 208 ++++++++++++++++++++++++
 4 files changed, 260 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0bd21c8514..25aa07a7ff 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1277,3 +1277,20 @@ DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
 DEF_HELPER_3(kwmmul, tl, env, tl, tl)
 DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
+
+DEF_HELPER_3(smmwb, tl, env, tl, tl)
+DEF_HELPER_3(smmwb_u, tl, env, tl, tl)
+DEF_HELPER_3(smmwt, tl, env, tl, tl)
+DEF_HELPER_3(smmwt_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt_u, tl, env, tl, tl, tl)
+DEF_HELPER_3(kmmwb2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwb2_u, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2, tl, env, tl, tl)
+DEF_HELPER_3(kmmwt2_u, tl, env, tl, tl)
+DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e0be2790dc..6e63bab2d9 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -745,3 +745,20 @@ kmmsb      0100001  ..... ..... 001 ..... 1111111 @r
 kmmsb_u    0101001  ..... ..... 001 ..... 1111111 @r
 kwmmul     0110001  ..... ..... 001 ..... 1111111 @r
 kwmmul_u   0111001  ..... ..... 001 ..... 1111111 @r
+
+smmwb      0100010  ..... ..... 001 ..... 1111111 @r
+smmwb_u    0101010  ..... ..... 001 ..... 1111111 @r
+smmwt      0110010  ..... ..... 001 ..... 1111111 @r
+smmwt_u    0111010  ..... ..... 001 ..... 1111111 @r
+kmmawb     0100011  ..... ..... 001 ..... 1111111 @r
+kmmawb_u   0101011  ..... ..... 001 ..... 1111111 @r
+kmmawt     0110011  ..... ..... 001 ..... 1111111 @r
+kmmawt_u   0111011  ..... ..... 001 ..... 1111111 @r
+kmmwb2     1000111  ..... ..... 001 ..... 1111111 @r
+kmmwb2_u   1001111  ..... ..... 001 ..... 1111111 @r
+kmmwt2     1010111  ..... ..... 001 ..... 1111111 @r
+kmmwt2_u   1011111  ..... ..... 001 ..... 1111111 @r
+kmmawb2    1100111  ..... ..... 001 ..... 1111111 @r
+kmmawb2_u  1101111  ..... ..... 001 ..... 1111111 @r
+kmmawt2    1110111  ..... ..... 001 ..... 1111111 @r
+kmmawt2_u  1111111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index fbc9c0b57b..e708ae7a6a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -564,3 +564,21 @@ GEN_RVP_R_ACC_OOL(kmmsb);
 GEN_RVP_R_ACC_OOL(kmmsb_u);
 GEN_RVP_R_OOL(kwmmul);
 GEN_RVP_R_OOL(kwmmul_u);
+
+/* Most Significant Word "32x16" Multiply & Add Instructions */
+GEN_RVP_R_OOL(smmwb);
+GEN_RVP_R_OOL(smmwb_u);
+GEN_RVP_R_OOL(smmwt);
+GEN_RVP_R_OOL(smmwt_u);
+GEN_RVP_R_ACC_OOL(kmmawb);
+GEN_RVP_R_ACC_OOL(kmmawb_u);
+GEN_RVP_R_ACC_OOL(kmmawt);
+GEN_RVP_R_ACC_OOL(kmmawt_u);
+GEN_RVP_R_OOL(kmmwb2);
+GEN_RVP_R_OOL(kmmwb2_u);
+GEN_RVP_R_OOL(kmmwt2);
+GEN_RVP_R_OOL(kmmwt2_u);
+GEN_RVP_R_ACC_OOL(kmmawb2);
+GEN_RVP_R_ACC_OOL(kmmawb2_u);
+GEN_RVP_R_ACC_OOL(kmmawt2);
+GEN_RVP_R_ACC_OOL(kmmawt2_u);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index c1322d2fac..ea3c9f6dd8 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1477,3 +1477,211 @@ static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kwmmul_u, 1, 4);
+
+/* Most Significant Word "32x16" Multiply & Add Instructions */
+static inline void do_smmwb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16;
+}
+
+RVPR(smmwb, 1, 4);
+
+static inline void do_smmwb_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwb_u, 1, 4);
+
+static inline void do_smmwt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16;
+}
+
+RVPR(smmwt, 1, 4);
+
+static inline void do_smmwt_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 15)) >> 16;
+}
+
+RVPR(smmwt_u, 1, 4);
+
+static inline void do_kmmawb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb, 1, 4);
+
+static inline void do_kmmawb_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb_u, 1, 4);
+
+static inline void do_kmmawt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16,
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt, 1, 4);
+
+static inline void do_kmmawt_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc;
+    int16_t *b = vb;
+    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] +
+                               (1ull << 15)) >> 16, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt_u, 1, 4);
+
+static inline void do_kmmwb2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+}
+
+RVPR(kmmwb2, 1, 4);
+
+static inline void do_kmmwb2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwb2_u, 1, 4);
+
+static inline void do_kmmwt2(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+}
+
+RVPR(kmmwt2, 1, 4);
+
+static inline void do_kmmwt2_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        d[H4(i)] = INT32_MAX;
+    } else {
+        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+}
+
+RVPR(kmmwt2_u, 1, 4);
+
+static inline void do_kmmawb2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2, 1, 4);
+
+static inline void do_kmmawb2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawb2_u, 1, 4);
+
+static inline void do_kmmawt2(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2, 1, 4);
+
+static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *c = vc, result;
+    int16_t *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        env->vxsat = 0x1;
+        result = INT32_MAX;
+    } else {
+        result = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
+    }
+    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
+}
+
+RVPR_ACC(kmmawt2_u, 1, 4);
-- 
2.17.1




* [PATCH 18/38] target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  19 ++
 target/riscv/insn32.decode              |  19 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  20 ++
 target/riscv/packed_helper.c            | 268 ++++++++++++++++++++++++
 4 files changed, 326 insertions(+)
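
A note on the saturating dual-product helpers in this patch (kmda, kmxda
and the accumulating variants), offered as a minimal sketch and not as the
QEMU implementation. Each 16x16 product fits in 32 bits, and the sum of two
such products can exceed INT32_MAX only when all four halfwords are
INT16_MIN (2^30 + 2^30 = 2^31). The stand-alone model below does the sum in
64 bits to make that visible. The names kmda_lane and sat are hypothetical
and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone model of one 32-bit lane of KMDA.
 * a_lo/a_hi and b_lo/b_hi are the bottom/top halfwords of the lane.
 * *sat is set to 1 when the result saturates (models the vxsat flag).
 */
static int32_t kmda_lane(int16_t a_lo, int16_t a_hi,
                         int16_t b_lo, int16_t b_hi, int *sat)
{
    /* Widen before multiplying so the sum cannot overflow. */
    int64_t sum = (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;

    /* The most negative sum is 2 * (INT16_MIN * INT16_MAX) > INT32_MIN. */
    assert(sum >= INT32_MIN);

    if (sum > INT32_MAX) {
        /* Reached only when all four inputs are INT16_MIN. */
        *sat = 1;
        return INT32_MAX;
    }
    return (int32_t)sum;
}
```

With all four halfwords set to INT16_MIN the model returns INT32_MAX and
sets the flag; changing any one input to another value keeps the sum in
range, which is why a four-way equality test is enough to detect overflow.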

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 25aa07a7ff..b1f831bb02 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1294,3 +1294,22 @@ DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smbb16, tl, env, tl, tl)
+DEF_HELPER_3(smbt16, tl, env, tl, tl)
+DEF_HELPER_3(smtt16, tl, env, tl, tl)
+DEF_HELPER_3(kmda, tl, env, tl, tl)
+DEF_HELPER_3(kmxda, tl, env, tl, tl)
+DEF_HELPER_3(smds, tl, env, tl, tl)
+DEF_HELPER_3(smdrs, tl, env, tl, tl)
+DEF_HELPER_3(smxds, tl, env, tl, tl)
+DEF_HELPER_4(kmabb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmabt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmatt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmada, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxda, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmads, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmadrs, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxds, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6e63bab2d9..4e5cdbb928 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -762,3 +762,22 @@ kmmawb2    1100111  ..... ..... 001 ..... 1111111 @r
 kmmawb2_u  1101111  ..... ..... 001 ..... 1111111 @r
 kmmawt2    1110111  ..... ..... 001 ..... 1111111 @r
 kmmawt2_u  1111111  ..... ..... 001 ..... 1111111 @r
+
+smbb16     0000100  ..... ..... 001 ..... 1111111 @r
+smbt16     0001100  ..... ..... 001 ..... 1111111 @r
+smtt16     0010100  ..... ..... 001 ..... 1111111 @r
+kmda       0011100  ..... ..... 001 ..... 1111111 @r
+kmxda      0011101  ..... ..... 001 ..... 1111111 @r
+smds       0101100  ..... ..... 001 ..... 1111111 @r
+smdrs      0110100  ..... ..... 001 ..... 1111111 @r
+smxds      0111100  ..... ..... 001 ..... 1111111 @r
+kmabb      0101101  ..... ..... 001 ..... 1111111 @r
+kmabt      0110101  ..... ..... 001 ..... 1111111 @r
+kmatt      0111101  ..... ..... 001 ..... 1111111 @r
+kmada      0100100  ..... ..... 001 ..... 1111111 @r
+kmaxda     0100101  ..... ..... 001 ..... 1111111 @r
+kmads      0101110  ..... ..... 001 ..... 1111111 @r
+kmadrs     0110110  ..... ..... 001 ..... 1111111 @r
+kmaxds     0111110  ..... ..... 001 ..... 1111111 @r
+kmsda      0100110  ..... ..... 001 ..... 1111111 @r
+kmsxda     0100111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index e708ae7a6a..261aab402a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -582,3 +582,23 @@ GEN_RVP_R_ACC_OOL(kmmawb2);
 GEN_RVP_R_ACC_OOL(kmmawb2_u);
 GEN_RVP_R_ACC_OOL(kmmawt2);
 GEN_RVP_R_ACC_OOL(kmmawt2_u);
+
+/* Signed 16-bit Multiply with 32-bit Add/Subtract Instructions */
+GEN_RVP_R_OOL(smbb16);
+GEN_RVP_R_OOL(smbt16);
+GEN_RVP_R_OOL(smtt16);
+GEN_RVP_R_OOL(kmda);
+GEN_RVP_R_OOL(kmxda);
+GEN_RVP_R_OOL(smds);
+GEN_RVP_R_OOL(smdrs);
+GEN_RVP_R_OOL(smxds);
+GEN_RVP_R_ACC_OOL(kmabb);
+GEN_RVP_R_ACC_OOL(kmabt);
+GEN_RVP_R_ACC_OOL(kmatt);
+GEN_RVP_R_ACC_OOL(kmada);
+GEN_RVP_R_ACC_OOL(kmaxda);
+GEN_RVP_R_ACC_OOL(kmads);
+GEN_RVP_R_ACC_OOL(kmadrs);
+GEN_RVP_R_ACC_OOL(kmaxds);
+GEN_RVP_R_ACC_OOL(kmsda);
+GEN_RVP_R_ACC_OOL(kmsxda);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index ea3c9f6dd8..b3673a33ee 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1685,3 +1685,271 @@ static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmmawt2_u, 1, 4);
+
+/* Signed 16-bit Multiply with 32-bit Add/Subtract Instructions */
+static inline void do_smbb16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+}
+
+RVPR(smbb16, 1, 4);
+
+static inline void do_smbt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smbt16, 1, 4);
+
+static inline void do_smtt16(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smtt16, 1, 4);
+
+static inline void do_kmda(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        d[H4(i)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)] +
+                   (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+    }
+}
+
+RVPR(kmda, 1, 4);
+
+static inline void do_kmxda(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        d[H4(i)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)] +
+                   (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+    }
+}
+
+RVPR(kmxda, 1, 4);
+
+static inline void do_smds(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)] -
+               (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+}
+
+RVPR(smds, 1, 4);
+
+static inline void do_smdrs(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i)] * b[H2(2 * i)] -
+               (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smdrs, 1, 4);
+
+static inline void do_smxds(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)] -
+               (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+}
+
+RVPR(smxds, 1, 4);
+
+static inline void do_kmabb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i)] * b[H2(2 * i)], c[H4(i)]);
+}
+
+RVPR_ACC(kmabb, 1, 4);
+
+static inline void do_kmabt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)],
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmabt, 1, 4);
+
+static inline void do_kmatt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)],
+                      c[H4(i)]);
+}
+
+RVPR_ACC(kmatt, 1, 4);
+
+static inline void do_kmada(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            d[H4(i)] = INT32_MAX + c[H4(i)] + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = sadd32(env, 0, p1 + p2, c[H4(i)]);
+    }
+}
+
+RVPR_ACC(kmada, 1, 4);
+
+static inline void do_kmaxda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            d[H4(i)] = INT32_MAX + c[H4(i)] + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = sadd32(env, 0, p1 + p2, c[H4(i)]);
+    }
+}
+
+RVPR_ACC(kmaxda, 1, 4);
+
+static inline void do_kmads(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmads, 1, 4);
+
+static inline void do_kmadrs(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmadrs, 1, 4);
+
+static inline void do_kmaxds(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+
+    d[H4(i)] = sadd32(env, 0, p1 - p2, c[H4(i)]);
+}
+
+RVPR_ACC(kmaxds, 1, 4);
+
+static inline void do_kmsda(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i + 1)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MIN;
+        } else {
+            d[H4(i)] = c[H4(i)] - 1ll - INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = ssub32(env, 0, c[H4(i)], p1 + p2);
+    }
+}
+
+RVPR_ACC(kmsda, 1, 4);
+
+static inline void do_kmsxda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+    int32_t p1, p2;
+    p1 = (int32_t)a[H2(2 * i)] * b[H2(2 * i + 1)];
+    p2 = (int32_t)a[H2(2 * i + 1)] * b[H2(2 * i)];
+
+    if (a[H2(2 * i)] == INT16_MIN && a[H2(2 * i + 1)] == INT16_MIN &&
+        b[H2(2 * i)] == INT16_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
+        if (c[H4(i)] < 0) {
+            env->vxsat = 0x1;
+            d[H4(i)] = INT32_MIN;
+        } else {
+            d[H4(i)] = c[H4(i)] - 1ll - INT32_MAX;
+        }
+    } else {
+        d[H4(i)] = ssub32(env, 0, c[H4(i)], p1 + p2);
+    }
+}
+
+RVPR_ACC(kmsxda, 1, 4);
-- 
2.17.1




* [PATCH 19/38] target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  2 +
 target/riscv/insn32.decode              |  2 +
 target/riscv/insn_trans/trans_rvp.c.inc | 54 +++++++++++++++++++++++++
 target/riscv/packed_helper.c            | 25 ++++++++++++
 4 files changed, 83 insertions(+)
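
A note on what smal computes, as I read the helper added below: a 64-bit
accumulator plus, for every 32-bit word of rs2, the product of that word's
two signed halfwords. The following is a minimal reference sketch for the
RV64 case only, not the QEMU code; smal_rv64 is a hypothetical name, and
the wrap-around on 64-bit overflow is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of SMAL on RV64.
 * acc is the 64-bit source operand, rs2 holds two 32-bit words,
 * each made of a bottom and a top signed 16-bit halfword.
 */
static int64_t smal_rv64(int64_t acc, uint64_t rs2)
{
    /* Accumulate in unsigned arithmetic so overflow wraps (assumed). */
    uint64_t sum = (uint64_t)acc;

    for (int w = 0; w < 2; w++) {
        /* Truncating casts pick out the two halfwords of word w. */
        int16_t lo = (int16_t)(rs2 >> (32 * w));
        int16_t hi = (int16_t)(rs2 >> (32 * w + 16));
        int32_t prod = (int32_t)lo * hi;

        /* Sign-extend the product to 64 bits before adding. */
        sum += (uint64_t)(int64_t)prod;
    }
    return (int64_t)sum;
}
```

For example, acc = 10 with rs2 = 0x0003000200050004 gives
10 + 4 * 5 + 2 * 3 = 36. On RV32 there is a single word in rs2 and the
64-bit operands live in even/odd register pairs, which is what the
translator code in this patch assembles and splits.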

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b1f831bb02..2511134610 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1313,3 +1313,5 @@ DEF_HELPER_4(kmadrs, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmaxds, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smal, i64, env, i64, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4e5cdbb928..a022f660b7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -781,3 +781,5 @@ kmadrs     0110110  ..... ..... 001 ..... 1111111 @r
 kmaxds     0111110  ..... ..... 001 ..... 1111111 @r
 kmsda      0100110  ..... ..... 001 ..... 1111111 @r
 kmsxda     0100111  ..... ..... 001 ..... 1111111 @r
+
+smal       0101111  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 261aab402a..73a26bbfbd 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -602,3 +602,57 @@ GEN_RVP_R_ACC_OOL(kmadrs);
 GEN_RVP_R_ACC_OOL(kmaxds);
 GEN_RVP_R_ACC_OOL(kmsda);
 GEN_RVP_R_ACC_OOL(kmsxda);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static bool
+r_d64_s64_ool(DisasContext *ctx, arg_r *a,
+              void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return r_ool(ctx, a, fn);
+#else
+    TCGv_i32 src2, a0, a1, d0, d1;
+    TCGv_i64 src1, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i32();
+    dst = tcg_temp_new_i64();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    gen_get_gpr(src2, a->rs2);
+
+    fn(dst, cpu_env, src1, src2);
+
+    d0 = tcg_temp_new_i32();
+    d1 = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(d0, dst);
+    tcg_gen_extrh_i64_i32(d1, dst);
+    gen_set_gpr(a->rd, d0);
+    gen_set_gpr(a->rd + 1, d1);
+    tcg_temp_free_i32(d0);
+    tcg_temp_free_i32(d1);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i32(src2);
+    tcg_temp_free_i64(dst);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_D64_S64_OOL(NAME)                    \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_s64_ool(s, a, gen_helper_##NAME);     \
+}
+
+GEN_RVP_R_D64_S64_OOL(smal);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index b3673a33ee..8ad7ea8354 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1953,3 +1953,28 @@ static inline void do_kmsxda(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmsxda, 1, 4);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline void do_smal(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va;
+    int16_t *b = vb;
+
+    if (i == 0) {
+        *d = *a;
+    }
+
+    *d += b[H2(i)] * b[H2(i + 1)];
+}
+
+uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
+{
+    int i;
+    int64_t result = 0;
+
+    for (i = 0; i < sizeof(target_ulong) / 2; i += 2) {
+        do_smal(env, &result, &a, &b, i);
+    }
+    return result;
+}
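
Reader's note (not part of the patch): the arithmetic of smal, as read from
do_smal above, can be sketched as a standalone reference model. This is a
minimal sketch under assumptions, not the QEMU implementation: the names
lane16, smal_model and the xlen_bytes parameter are hypothetical, and the
model simply adds the product of each adjacent pair of signed 16-bit lanes
of rs2 to the 64-bit accumulator.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16-bit lane n of a register value (lane 0 is least significant). */
static int16_t lane16(uint64_t reg, unsigned n)
{
    /* Narrowing to int16_t wraps on mainstream compilers
     * (implementation-defined in ISO C). */
    return (int16_t)(uint16_t)(reg >> (16 * n));
}

/*
 * Hypothetical reference model of smal:
 *   result = acc + sum over the 32-bit words of rs2 of (H[1] * H[0]).
 * xlen_bytes is 4 for RV32 and 8 for RV64.
 */
static int64_t smal_model(int64_t acc, uint64_t rs2, unsigned xlen_bytes)
{
    assert(xlen_bytes == 4 || xlen_bytes == 8);

    unsigned lanes = xlen_bytes / 2;      /* number of 16-bit lanes in rs2 */
    uint64_t sum = (uint64_t)acc;         /* unsigned, so the add wraps */

    for (unsigned i = 0; i < lanes; i += 2) {
        /* int16 * int16 always fits in 32 bits. */
        int32_t prod = (int32_t)lane16(rs2, i) * lane16(rs2, i + 1);
        sum += (uint64_t)(int64_t)prod;   /* sign-extend product to 64 bits */
    }
    return (int64_t)sum;
}
```

For example, smal_model(10, 0x00030002, 4) adds 3 * 2 to the accumulator 10.
On RV64 the model consumes two products, one per 32-bit word of rs2.
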
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 20/38] target/riscv: Partial-SIMD Miscellaneous Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  8 +++
 target/riscv/insn32-64.decode           |  4 --
 target/riscv/insn32.decode              | 10 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  9 +++
 target/riscv/packed_helper.c            | 75 +++++++++++++++++++++++++
 5 files changed, 102 insertions(+), 4 deletions(-)
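
Reader's note (not part of the patch): a minimal sketch of what sclip32,
uclip32 and pbsad compute per element, written as plain C. The clip bounds
are an assumption: sat64 and satu64 are defined earlier in the series and
are not shown here, so the ranges [-2^imm, 2^imm - 1] and [0, 2^imm - 1]
are how I read the intent. All function names below are hypothetical; the
sat flag stands in for env->vxsat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clip a signed 32-bit value to [-2^imm, 2^imm - 1]; flag saturation. */
static int32_t sclip32_model(int32_t x, unsigned imm, bool *sat)
{
    assert(imm < 32);
    int64_t max = ((int64_t)1 << imm) - 1;
    int64_t min = -((int64_t)1 << imm);

    if (x > max) {
        *sat = true;
        return (int32_t)max;
    }
    if (x < min) {
        *sat = true;
        return (int32_t)min;
    }
    return x;
}

/* Clip a signed 32-bit value to [0, 2^imm - 1]; flag saturation. */
static int32_t uclip32_model(int32_t x, unsigned imm, bool *sat)
{
    assert(imm < 32);
    int64_t max = ((int64_t)1 << imm) - 1;

    if (x < 0) {
        *sat = true;
        return 0;
    }
    if (x > max) {
        *sat = true;
        return (int32_t)max;
    }
    return x;
}

/* Sum of absolute differences over the unsigned bytes of a and b.
 * xlen_bytes is 4 for RV32 and 8 for RV64. */
static uint64_t pbsad_model(uint64_t a, uint64_t b, unsigned xlen_bytes)
{
    assert(xlen_bytes == 4 || xlen_bytes == 8);
    uint64_t sum = 0;

    for (unsigned i = 0; i < xlen_bytes; i++) {
        int x = (uint8_t)(a >> (8 * i));
        int y = (uint8_t)(b >> (8 * i));
        sum += (uint64_t)(x > y ? x - y : y - x);
    }
    return sum;
}
```

pbsada would be the same sum added to the old value of rd, which matches
the "if (i == 0) *d += *c" step in do_pbsada below.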

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 2511134610..7c3a0654d6 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1315,3 +1315,11 @@ DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
 
 DEF_HELPER_3(smal, i64, env, i64, tl)
+
+DEF_HELPER_3(sclip32, tl, env, tl, tl)
+DEF_HELPER_3(uclip32, tl, env, tl, tl)
+DEF_HELPER_2(clrs32, tl, env, tl)
+DEF_HELPER_2(clz32, tl, env, tl)
+DEF_HELPER_2(clo32, tl, env, tl)
+DEF_HELPER_3(pbsad, tl, env, tl, tl)
+DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 8157dee8b7..1094172210 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -19,10 +19,6 @@
 # This is concatenated with insn32.decode for risc64 targets.
 # Most of the fields and formats are there.
 
-%sh5    20:5
-
-@sh5     .......  ..... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
-
 # *** RV64I Base Instruction Set (in addition to RV32I) ***
 lwu      ............   ..... 110 ..... 0000011 @i
 ld       ............   ..... 011 ..... 0000011 @i
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a022f660b7..12e95f9c5f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -25,6 +25,7 @@
 %sh10    20:10
 %sh4    20:4
 %sh3    20:3
+%sh5    20:5
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -64,6 +65,7 @@
 @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
+@sh5     ......  ...... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -783,3 +785,11 @@ kmsda      0100110  ..... ..... 001 ..... 1111111 @r
 kmsxda     0100111  ..... ..... 001 ..... 1111111 @r
 
 smal       0101111  ..... ..... 001 ..... 1111111 @r
+
+sclip32    1110010  ..... ..... 000 ..... 1111111 @sh5
+uclip32    1111010  ..... ..... 000 ..... 1111111 @sh5
+clrs32     1010111  11000 ..... 000 ..... 1111111 @r2
+clz32      1010111  11001 ..... 000 ..... 1111111 @r2
+clo32      1010111  11011 ..... 000 ..... 1111111 @r2
+pbsad      1111110  ..... ..... 000 ..... 1111111 @r
+pbsada     1111111  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 73a26bbfbd..42656682c6 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -656,3 +656,12 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)    \
 }
 
 GEN_RVP_R_D64_S64_OOL(smal);
+
+/* Partial-SIMD Miscellaneous Instructions */
+GEN_RVP_SHIFTI(sclip32, sclip32, NULL);
+GEN_RVP_SHIFTI(uclip32, uclip32, NULL);
+GEN_RVP_R2_OOL(clrs32);
+GEN_RVP_R2_OOL(clz32);
+GEN_RVP_R2_OOL(clo32);
+GEN_RVP_R_OOL(pbsad);
+GEN_RVP_R_ACC_OOL(pbsada);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 8ad7ea8354..96e73c045b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -1978,3 +1978,78 @@ uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
     }
     return result;
 }
+
+/* Partial-SIMD Miscellaneous Instructions */
+static inline void do_sclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = sat64(env, a[i], shift);
+}
+
+RVPR(sclip32, 1, 4);
+
+static inline void do_uclip32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    if (a[i] < 0) {
+        d[i] = 0;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = satu64(env, a[i], shift);
+    }
+}
+
+RVPR(uclip32, 1, 4);
+
+static inline void do_clrs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clrsb32(a[i]);
+}
+
+RVPR2(clrs32, 1, 4);
+
+static inline void do_clz32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clz32(a[i]);
+}
+
+RVPR2(clz32, 1, 4);
+
+static inline void do_clo32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    d[i] = clo32(a[i]);
+}
+
+RVPR2(clo32, 1, 4);
+
+static inline void do_pbsad(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_ulong *d = vd;
+    uint8_t *a = va, *b = vb;
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR(pbsad, 1, 1);
+
+static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    target_ulong *d = vd, *c = vc;
+    uint8_t *a = va, *b = vb;
+    if (i == 0) {
+        *d += *c;
+    }
+    *d += abs(a[i] - b[i]);
+}
+
+RVPR_ACC(pbsada, 1, 1);
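
Reader's note (not part of the patch): the three counting instructions rely
on QEMU's clz32, clo32 and clrsb32 host utilities. A portable sketch of the
same per-element results follows, assuming the usual definitions (clz of 0
is 32, and clrs counts the copies of the sign bit that follow the sign bit
itself). The *_model names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits; returns 32 for x == 0. */
static unsigned clz32_model(uint32_t x)
{
    unsigned n = 0;

    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Count leading one bits: leading zeros of the complement. */
static unsigned clo32_model(uint32_t x)
{
    return clz32_model(~x);
}

/* Count leading redundant sign bits (sign bit itself excluded). */
static unsigned clrs32_model(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t sign_fill = (u >> 31) ? UINT32_MAX : 0;

    /* XOR with the sign fill turns the leading sign run into zeros. */
    unsigned run = clz32_model(u ^ sign_fill);

    assert(run >= 1);   /* the run always includes the sign bit */
    return run - 1;
}
```

With these definitions, both 0 and -1 have 31 redundant sign bits, while
INT32_MIN has none.
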
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 21/38] target/riscv: 8-bit Multiply with 32-bit Add Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  4 +++
 target/riscv/insn32.decode              |  4 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  5 +++
 target/riscv/packed_helper.c            | 44 +++++++++++++++++++++++++
 4 files changed, 57 insertions(+)
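
Reader's note (not part of the patch): a minimal sketch of the per-word
arithmetic of smaqa, umaqa and smaqa_su, as read from the helpers below.
Each 32-bit word of rd accumulates four 8-bit products; the variants differ
only in how the bytes are extended. I am assuming a is rs1, b is rs2 and the
accumulator is the old rd, which is how RVPR_ACC appears to be used. All
names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* smaqa on one word: signed bytes of a times signed bytes of b. */
static int32_t smaqa_lane(int32_t acc, uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)acc;          /* unsigned, so the add wraps */

    for (unsigned i = 0; i < 4; i++) {
        int32_t x = (int8_t)(a >> (8 * i));
        int32_t y = (int8_t)(b >> (8 * i));
        sum += (uint32_t)(x * y);
    }
    return (int32_t)sum;
}

/* umaqa on one word: unsigned bytes of a times unsigned bytes of b. */
static uint32_t umaqa_lane(uint32_t acc, uint32_t a, uint32_t b)
{
    for (unsigned i = 0; i < 4; i++) {
        uint32_t x = (uint8_t)(a >> (8 * i));
        uint32_t y = (uint8_t)(b >> (8 * i));
        acc += x * y;
    }
    return acc;
}

/* smaqa_su on one word: signed bytes of a times unsigned bytes of b. */
static int32_t smaqa_su_lane(int32_t acc, uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)acc;

    for (unsigned i = 0; i < 4; i++) {
        int32_t x = (int8_t)(a >> (8 * i));
        int32_t y = (uint8_t)(b >> (8 * i));
        sum += (uint32_t)(x * y);
    }
    return (int32_t)sum;
}

/* Whole-register smaqa; xlen_bytes is 4 for RV32 and 8 for RV64. */
static uint64_t smaqa_model(uint64_t acc, uint64_t a, uint64_t b,
                            unsigned xlen_bytes)
{
    assert(xlen_bytes == 4 || xlen_bytes == 8);
    uint64_t out = 0;

    for (unsigned w = 0; w < xlen_bytes / 4; w++) {
        int32_t lane = smaqa_lane((int32_t)(uint32_t)(acc >> (32 * w)),
                                  (uint32_t)(a >> (32 * w)),
                                  (uint32_t)(b >> (32 * w)));
        out |= (uint64_t)(uint32_t)lane << (32 * w);
    }
    return out;
}
```

The byte 0xFF times itself shows the difference between the variants: it
contributes 1 for smaqa, 65025 for umaqa and -255 for smaqa_su.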

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7c3a0654d6..0ddd07b305 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1323,3 +1323,7 @@ DEF_HELPER_2(clz32, tl, env, tl)
 DEF_HELPER_2(clo32, tl, env, tl)
 DEF_HELPER_3(pbsad, tl, env, tl, tl)
 DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
+
+DEF_HELPER_4(smaqa, tl, env, tl, tl, tl)
+DEF_HELPER_4(umaqa, tl, env, tl, tl, tl)
+DEF_HELPER_4(smaqa_su, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 12e95f9c5f..6a50abca21 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -793,3 +793,7 @@ clz32      1010111  11001 ..... 000 ..... 1111111 @r2
 clo32      1010111  11011 ..... 000 ..... 1111111 @r2
 pbsad      1111110  ..... ..... 000 ..... 1111111 @r
 pbsada     1111111  ..... ..... 000 ..... 1111111 @r
+
+smaqa      1100100  ..... ..... 000 ..... 1111111 @r
+umaqa      1100110  ..... ..... 000 ..... 1111111 @r
+smaqa_su   1100101  ..... ..... 000 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 42656682c6..0877cd04b4 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -665,3 +665,8 @@ GEN_RVP_R2_OOL(clz32);
 GEN_RVP_R2_OOL(clo32);
 GEN_RVP_R_OOL(pbsad);
 GEN_RVP_R_ACC_OOL(pbsada);
+
+/* 8-bit Multiply with 32-bit Add Instructions */
+GEN_RVP_R_ACC_OOL(smaqa);
+GEN_RVP_R_ACC_OOL(umaqa);
+GEN_RVP_R_ACC_OOL(smaqa_su);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 96e73c045b..02a0f912e9 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2053,3 +2053,47 @@ static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(pbsada, 1, 1);
+
+/* 8-bit Multiply with 32-bit Add Instructions */
+static inline void do_smaqa(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    int8_t *a = va, *b = vb;
+    int32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(smaqa, 1, 4);
+
+static inline void do_umaqa(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    uint8_t *a = va, *b = vb;
+    uint32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(umaqa, 1, 4);
+
+static inline void do_smaqa_su(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int8_t *a = va;
+    uint8_t *b = vb;
+    int32_t *d = vd, *c = vc;
+
+    d[H4(i)] = c[H4(i)] + a[H1(i * 4)] * b[H1(i * 4)] +
+               a[H1(i * 4 + 1)] * b[H1(i * 4 + 1)] +
+               a[H1(i * 4 + 2)] * b[H1(i * 4 + 2)] +
+               a[H1(i * 4 + 3)] * b[H1(i * 4 + 3)];
+}
+
+RVPR_ACC(smaqa_su, 1, 4);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 22/38] target/riscv: 64-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  11 ++
 target/riscv/insn32.decode              |  11 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  72 +++++++++++++
 target/riscv/packed_helper.c            | 132 ++++++++++++++++++++++++
 4 files changed, 226 insertions(+)
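
Reader's note (not part of the patch): the halving adds (radd64, uradd64)
return (a + b) / 2 computed as if with 65 bits, and the saturating add
(kadd64) clamps instead of wrapping. The sketch below reaches the same
results without ever forming an overflowing sum; it is an alternative
formulation for illustration, not the code used by hadd64/haddu64/sadd64.
All *_model names are hypothetical, and the sat flag stands in for
env->vxsat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signed halving add: floor((a + b) / 2) without overflow.
 * Relies on arithmetic right shift of negative values, which is what
 * mainstream compilers provide (implementation-defined in ISO C). */
static int64_t radd64_model(int64_t a, int64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

/* Unsigned halving add: floor((a + b) / 2) without overflow. */
static uint64_t uradd64_model(uint64_t a, uint64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

/* Signed saturating add; sets *sat when the result was clamped. */
static int64_t kadd64_model(int64_t a, int64_t b, bool *sat)
{
    assert(sat != NULL);

    if (b > 0 && a > INT64_MAX - b) {
        *sat = true;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {
        *sat = true;
        return INT64_MIN;
    }
    return a + b;
}
```

The carry term (a & b & 1) restores the unit lost when both operands are
odd, so radd64_model(INT64_MAX, 1) yields 2^62 rather than a wrapped value.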

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0ddd07b305..cce4c8cbcc 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1327,3 +1327,14 @@ DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
 DEF_HELPER_4(smaqa, tl, env, tl, tl, tl)
 DEF_HELPER_4(umaqa, tl, env, tl, tl, tl)
 DEF_HELPER_4(smaqa_su, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(add64, i64, env, i64, i64)
+DEF_HELPER_3(radd64, i64, env, i64, i64)
+DEF_HELPER_3(uradd64, i64, env, i64, i64)
+DEF_HELPER_3(kadd64, i64, env, i64, i64)
+DEF_HELPER_3(ukadd64, i64, env, i64, i64)
+DEF_HELPER_3(sub64, i64, env, i64, i64)
+DEF_HELPER_3(rsub64, i64, env, i64, i64)
+DEF_HELPER_3(ursub64, i64, env, i64, i64)
+DEF_HELPER_3(ksub64, i64, env, i64, i64)
+DEF_HELPER_3(uksub64, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6a50abca21..b52e1c1142 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -797,3 +797,14 @@ pbsada     1111111  ..... ..... 000 ..... 1111111 @r
 smaqa      1100100  ..... ..... 000 ..... 1111111 @r
 umaqa      1100110  ..... ..... 000 ..... 1111111 @r
 smaqa_su   1100101  ..... ..... 000 ..... 1111111 @r
+
+add64      1100000  ..... ..... 001 ..... 1111111 @r
+radd64     1000000  ..... ..... 001 ..... 1111111 @r
+uradd64    1010000  ..... ..... 001 ..... 1111111 @r
+kadd64     1001000  ..... ..... 001 ..... 1111111 @r
+ukadd64    1011000  ..... ..... 001 ..... 1111111 @r
+sub64      1100001  ..... ..... 001 ..... 1111111 @r
+rsub64     1000001  ..... ..... 001 ..... 1111111 @r
+ursub64    1010001  ..... ..... 001 ..... 1111111 @r
+ksub64     1001001  ..... ..... 001 ..... 1111111 @r
+uksub64    1011001  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 0877cd04b4..94e5e09425 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -670,3 +670,75 @@ GEN_RVP_R_ACC_OOL(pbsada);
 GEN_RVP_R_ACC_OOL(smaqa);
 GEN_RVP_R_ACC_OOL(umaqa);
 GEN_RVP_R_ACC_OOL(smaqa_su);
+
+/*
+ *** 64-bit Profile Instructions
+ */
+/* 64-bit Addition & Subtraction Instructions */
+static bool
+r_d64_s64_s64_ool(DisasContext *ctx, arg_r *a,
+                  void (* fn)(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64))
+{
+#ifdef TARGET_RISCV64
+    return r_ool(ctx, a, fn);
+#else
+    TCGv a0, a1, b0, b1, d0, d1;
+    TCGv_i64 src1, src2, dst;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+    b0 = tcg_temp_new_i32();
+    b1 = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    gen_get_gpr(b0, a->rs2);
+    gen_get_gpr(b1, a->rs2 + 1);
+    tcg_gen_concat_i32_i64(src2, b0, b1);
+
+    fn(dst, cpu_env, src1, src2);
+
+    d0 = tcg_temp_new_i32();
+    d1 = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(d0, dst);
+    tcg_gen_extrh_i64_i32(d1, dst);
+    gen_set_gpr(a->rd, d0);
+    gen_set_gpr(a->rd + 1, d1);
+    tcg_temp_free_i32(d0);
+    tcg_temp_free_i32(d1);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(src2);
+    tcg_temp_free_i64(dst);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    tcg_temp_free_i32(b0);
+    tcg_temp_free_i32(b1);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_D64_S64_S64_OOL(NAME)                   \
+static bool trans_##NAME(DisasContext *s, arg_r *a)       \
+{                                                         \
+    return r_d64_s64_s64_ool(s, a, gen_helper_##NAME);    \
+}
+
+GEN_RVP_R_D64_S64_S64_OOL(add64);
+GEN_RVP_R_D64_S64_S64_OOL(radd64);
+GEN_RVP_R_D64_S64_S64_OOL(uradd64);
+GEN_RVP_R_D64_S64_S64_OOL(kadd64);
+GEN_RVP_R_D64_S64_S64_OOL(ukadd64);
+GEN_RVP_R_D64_S64_S64_OOL(sub64);
+GEN_RVP_R_D64_S64_S64_OOL(rsub64);
+GEN_RVP_R_D64_S64_S64_OOL(ursub64);
+GEN_RVP_R_D64_S64_S64_OOL(ksub64);
+GEN_RVP_R_D64_S64_S64_OOL(uksub64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 02a0f912e9..0629c5178b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2097,3 +2097,135 @@ static inline void do_smaqa_su(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(smaqa_su, 1, 4);
+
+/*
+ *** 64-bit Profile Instructions
+ */
+/* 64-bit Addition & Subtraction Instructions */
+
+/* Define a common function to loop elements in packed register */
+static inline uint64_t
+rvpr64_64_64(CPURISCVState *env, uint64_t a, uint64_t b,
+             uint8_t step, uint8_t size, PackedFn3i *fn)
+{
+    int i, passes = sizeof(uint64_t) / size;
+    uint64_t result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, i);
+    }
+    return result;
+}
+
+#define RVPR64_64_64(NAME, STEP, SIZE)                                    \
+uint64_t HELPER(NAME)(CPURISCVState *env, uint64_t a, uint64_t b)         \
+{                                                                         \
+    return rvpr64_64_64(env, a, b, STEP, SIZE, (PackedFn3i *)do_##NAME);  \
+}
+
+static inline void do_add64(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = *a + *b;
+}
+
+RVPR64_64_64(add64, 1, 8);
+
+static inline int64_t hadd64(int64_t a, int64_t b)
+{
+    int64_t res = a + b;
+    int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;
+
+    /* With signed overflow, bit 64 is inverse of bit 63. */
+    return (res >> 1) ^ over;
+}
+
+static inline void do_radd64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = hadd64(*a, *b);
+}
+
+RVPR64_64_64(radd64, 1, 8);
+
+static inline uint64_t haddu64(uint64_t a, uint64_t b)
+{
+    uint64_t res = a + b;
+    bool over = res < a;
+
+    return over ? ((res >> 1) | INT64_MIN) : (res >> 1);
+}
+
+static inline void do_uradd64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = haddu64(*a, *b);
+}
+
+RVPR64_64_64(uradd64, 1, 8);
+
+static inline void do_kadd64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = sadd64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(kadd64, 1, 8);
+
+static inline void do_ukadd64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = saddu64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(ukadd64, 1, 8);
+
+static inline void do_sub64(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = *a - *b;
+}
+
+RVPR64_64_64(sub64, 1, 8);
+
+static inline void do_rsub64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = hsub64(*a, *b);
+}
+
+RVPR64_64_64(rsub64, 1, 8);
+
+static inline void do_ursub64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = hsubu64(*a, *b);
+}
+
+RVPR64_64_64(ursub64, 1, 8);
+
+static inline void do_ksub64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd, *a = va, *b = vb;
+    *d = ssub64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(ksub64, 1, 8);
+
+static inline void do_uksub64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint64_t *d = vd, *a = va, *b = vb;
+    *d = ssubu64(env, 0, *a, *b);
+}
+
+RVPR64_64_64(uksub64, 1, 8);
-- 
2.17.1
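
A note on the halving add in this patch: hadd64 computes the wrapped
64-bit sum, shifts it right by one, and flips the top bit when the add
overflowed. Below is a minimal standalone sketch of that trick, not the
QEMU code itself. The names hadd64_sketch and hadd64_ref are made up
for illustration. It assumes a compiler with two's complement
conversion and arithmetic right shift of signed values (as GCC and
Clang provide), and it does the addition in unsigned arithmetic so the
wrap-around does not depend on -fwrapv.

```c
#include <stdint.h>

/* Same overflow trick as hadd64 in the patch: compute the wrapped sum,
 * shift right, then repair the sign bit if the add overflowed. */
static int64_t hadd64_sketch(int64_t a, int64_t b)
{
    /* Wrapping add done on unsigned values (assumption: two's complement). */
    int64_t res = (int64_t)((uint64_t)a + (uint64_t)b);
    /* Sign bit set only when a and b share a sign that res does not. */
    int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;

    /* Assumes >> on a negative int64_t is an arithmetic shift. */
    return (res >> 1) ^ over;
}

/* Hypothetical cross-check that never forms the full sum. */
static int64_t hadd64_ref(int64_t a, int64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}
```

Both functions should agree for every input pair. The second form is
only a reference for testing and is not part of the patch.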



* [PATCH 23/38] target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
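On RV32 the translation code in this patch reads the 64-bit accumulator
from the register pair (rd, rd + 1), with rd holding the low word, calls
the helper, and writes the result back to the same pair. The sketch
below models that data flow for smar64 in plain C. It is an
illustration under stated assumptions, not the QEMU implementation: the
reg_pair type and the function name smar64_rv32_sketch are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of an RV32 register pair: lo is rd, hi is rd + 1. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} reg_pair;

static reg_pair smar64_rv32_sketch(reg_pair rd, int32_t rs1, int32_t rs2)
{
    /* Join the pair, low word first, as tcg_gen_concat_i32_i64 does. */
    uint64_t acc = ((uint64_t)rd.hi << 32) | rd.lo;
    /* Widen one operand first so the product keeps all 64 bits. */
    int64_t prod = (int64_t)rs1 * rs2;
    reg_pair out;

    /* smar64 wraps on overflow; the saturating variant is kmar64. */
    acc += (uint64_t)prod;

    /* Split the result back into low and high words. */
    out.lo = (uint32_t)acc;
    out.hi = (uint32_t)(acc >> 32);
    return out;
}
```

For example, an accumulator of 0x00000000ffffffff plus 1 * 1 carries
into the high word, giving lo = 0 and hi = 1.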
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32.decode              |   9 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  63 ++++++++++
 target/riscv/packed_helper.c            | 155 ++++++++++++++++++++++++
 4 files changed, 236 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index cce4c8cbcc..4d89417287 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1338,3 +1338,12 @@ DEF_HELPER_3(rsub64, i64, env, i64, i64)
 DEF_HELPER_3(ursub64, i64, env, i64, i64)
 DEF_HELPER_3(ksub64, i64, env, i64, i64)
 DEF_HELPER_3(uksub64, i64, env, i64, i64)
+
+DEF_HELPER_4(smar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(smsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(umar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(umsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(kmar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(kmsr64, i64, env, tl, tl, i64)
+DEF_HELPER_4(ukmar64, i64, env, tl, tl, i64)
+DEF_HELPER_4(ukmsr64, i64, env, tl, tl, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b52e1c1142..60b8b3617b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -808,3 +808,12 @@ rsub64     1000001  ..... ..... 001 ..... 1111111 @r
 ursub64    1010001  ..... ..... 001 ..... 1111111 @r
 ksub64     1001001  ..... ..... 001 ..... 1111111 @r
 uksub64    1011001  ..... ..... 001 ..... 1111111 @r
+
+smar64     1000010  ..... ..... 001 ..... 1111111 @r
+smsr64     1000011  ..... ..... 001 ..... 1111111 @r
+umar64     1010010  ..... ..... 001 ..... 1111111 @r
+umsr64     1010011  ..... ..... 001 ..... 1111111 @r
+kmar64     1001010  ..... ..... 001 ..... 1111111 @r
+kmsr64     1001011  ..... ..... 001 ..... 1111111 @r
+ukmar64    1011010  ..... ..... 001 ..... 1111111 @r
+ukmsr64    1011011  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 94e5e09425..3e62024aac 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -742,3 +742,66 @@ GEN_RVP_R_D64_S64_S64_OOL(rsub64);
 GEN_RVP_R_D64_S64_S64_OOL(ursub64);
 GEN_RVP_R_D64_S64_S64_OOL(ksub64);
 GEN_RVP_R_D64_S64_S64_OOL(uksub64);
+
+/* 32-bit Multiply with 64-bit Add/Subtract Instructions */
+
+/* Accumulate into a 64-bit destination register (rd, rd + 1 pair on RV32) */
+static bool
+r_d64_acc_ool(DisasContext *ctx, arg_r *a,
+              void (* fn)(TCGv_i64, TCGv_ptr, TCGv, TCGv, TCGv_i64))
+{
+#ifdef TARGET_RISCV64
+    return r_acc_ool(ctx, a, fn);
+#else
+    TCGv_i32 src1, src2;
+    TCGv_i64 dst, src3;
+    TCGv_i32 d0, d1;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i32();
+    src2 = tcg_temp_new_i32();
+    src3 = tcg_temp_new_i64();
+    dst = tcg_temp_new_i64();
+    d0 = tcg_temp_new_i32();
+    d1 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(d0, a->rd);
+    gen_get_gpr(d1, a->rd + 1);
+    tcg_gen_concat_i32_i64(src3, d0, d1);
+
+    fn(dst, cpu_env, src1, src2, src3);
+
+    tcg_gen_extrl_i64_i32(d0, dst);
+    tcg_gen_extrh_i64_i32(d1, dst);
+    gen_set_gpr(a->rd, d0);
+    gen_set_gpr(a->rd + 1, d1);
+
+    tcg_temp_free_i32(d0);
+    tcg_temp_free_i32(d1);
+    tcg_temp_free_i32(src1);
+    tcg_temp_free_i32(src2);
+    tcg_temp_free_i64(src3);
+    tcg_temp_free_i64(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_D64_ACC_OOL(NAME)                    \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_d64_acc_ool(s, a, gen_helper_##NAME);     \
+}
+
+GEN_RVP_R_D64_ACC_OOL(smar64);
+GEN_RVP_R_D64_ACC_OOL(smsr64);
+GEN_RVP_R_D64_ACC_OOL(umar64);
+GEN_RVP_R_D64_ACC_OOL(umsr64);
+GEN_RVP_R_D64_ACC_OOL(kmar64);
+GEN_RVP_R_D64_ACC_OOL(kmsr64);
+GEN_RVP_R_D64_ACC_OOL(ukmar64);
+GEN_RVP_R_D64_ACC_OOL(ukmsr64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 0629c5178b..3cbe9e51cc 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2229,3 +2229,158 @@ static inline void do_uksub64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_64_64(uksub64, 1, 8);
+
+/* 32-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline uint64_t
+rvpr64_acc(CPURISCVState *env, target_ulong a,
+           target_ulong b, uint64_t c,
+           uint8_t step, uint8_t size, PackedFn4i *fn)
+{
+    int i, passes = sizeof(target_ulong) / size;
+    uint64_t result = 0;
+
+    for (i = 0; i < passes; i += step) {
+        fn(env, &result, &a, &b, &c, i);
+    }
+    return result;
+}
+
+#define RVPR64_ACC(NAME, STEP, SIZE)                                     \
+uint64_t HELPER(NAME)(CPURISCVState *env, target_ulong a,                \
+                      target_ulong b, uint64_t c)                        \
+{                                                                        \
+    return rvpr64_acc(env, a, b, c, STEP, SIZE, (PackedFn4i *)do_##NAME);\
+}
+
+static inline void do_smar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d += (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(smar64, 1, 4);
+
+static inline void do_smsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d -= (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(smsr64, 1, 4);
+
+static inline void do_umar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d += (uint64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(umar64, 1, 4);
+
+static inline void do_umsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+    if (i == 0) {
+        *d = *c;
+    }
+    *d -= (uint64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR64_ACC(umsr64, 1, 4);
+
+static inline void do_kmar64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+    int64_t m0 =  (int64_t)a[H4(i)] * b[H4(i)];
+#ifdef TARGET_RISCV64
+    int64_t m1 =  (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c >= 0) {
+            *d = INT64_MAX;
+            env->vxsat = 1;
+        } else {
+            *d = sadd64(env, 0, *c + m0, m1);
+        }
+    } else {
+        *d = sadd64(env, 0, *c, m0 + m1);
+    }
+#else
+    *d = sadd64(env, 0, *c, m0);
+#endif
+}
+
+RVPR64_ACC(kmar64, 1, sizeof(target_ulong));
+
+static inline void do_kmsr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    int64_t *d = vd, *c = vc;
+
+    int64_t m0 =  (int64_t)a[H4(i)] * b[H4(i)];
+#ifdef TARGET_RISCV64
+    int64_t m1 =  (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c <= 0) {
+            *d = INT64_MIN;
+            env->vxsat = 1;
+        } else {
+            *d = ssub64(env, 0, *c - m0, m1);
+        }
+    } else {
+        *d = ssub64(env, 0, *c, m0 + m1);
+    }
+#else
+    *d = ssub64(env, 0, *c, m0);
+#endif
+}
+
+RVPR64_ACC(kmsr64, 1, sizeof(target_ulong));
+
+static inline void do_ukmar64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+
+    if (i == 0) {
+        *d = *c;
+    }
+    *d = saddu64(env, 0, *d, (uint64_t)a[H4(i)] * b[H4(i)]);
+}
+
+RVPR64_ACC(ukmar64, 1, 4);
+
+static inline void do_ukmsr64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    uint64_t *d = vd, *c = vc;
+
+    if (i == 0) {
+        *d = *c;
+    }
+    *d = ssubu64(env, 0, *d, (uint64_t)a[H4(i)] * b[H4(i)]);
+}
+
+RVPR64_ACC(ukmsr64, 1, 4);
-- 
2.17.1



* [PATCH 24/38] target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
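The helpers in this patch treat each 32-bit word of rs1 and rs2 as two
signed 16-bit lanes and add the lane products to a 64-bit accumulator.
Here is a minimal sketch of what smalda does for a single 32-bit word
(the RV32 case, where the helper loop runs once). The function name
smalda_word_sketch is hypothetical, and the sketch extracts lanes with
shifts instead of the H2() indexing used in packed_helper.c. It assumes
the usual two's complement conversion when narrowing to int16_t.

```c
#include <stdint.h>

/* acc + (low lane product) + (high lane product), all in 64 bits. */
static int64_t smalda_word_sketch(int64_t acc, uint32_t rs1, uint32_t rs2)
{
    /* Bottom and top signed 16-bit lanes of each source word. */
    int16_t a_lo = (int16_t)(rs1 & 0xFFFFu);
    int16_t a_hi = (int16_t)(rs1 >> 16);
    int16_t b_lo = (int16_t)(rs2 & 0xFFFFu);
    int16_t b_hi = (int16_t)(rs2 >> 16);

    /* Widen before multiplying so the sum of products cannot wrap. */
    return acc + (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
}
```

With all four lanes at -32768 the two products sum to 2^31, which no
longer fits in a signed 32-bit value. That is one reason the
accumulator is 64 bits wide.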
 target/riscv/helper.h                   |  11 ++
 target/riscv/insn32.decode              |  11 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  12 ++
 target/riscv/packed_helper.c            | 151 ++++++++++++++++++++++++
 4 files changed, 185 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4d89417287..3ec4477ce8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1347,3 +1347,14 @@ DEF_HELPER_4(kmar64, i64, env, tl, tl, i64)
 DEF_HELPER_4(kmsr64, i64, env, tl, tl, i64)
 DEF_HELPER_4(ukmar64, i64, env, tl, tl, i64)
 DEF_HELPER_4(ukmsr64, i64, env, tl, tl, i64)
+
+DEF_HELPER_4(smalbb, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalbt, i64, env, tl, tl, i64)
+DEF_HELPER_4(smaltt, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalxda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalds, i64, env, tl, tl, i64)
+DEF_HELPER_4(smalxds, i64, env, tl, tl, i64)
+DEF_HELPER_4(smaldrs, i64, env, tl, tl, i64)
+DEF_HELPER_4(smslda, i64, env, tl, tl, i64)
+DEF_HELPER_4(smslxda, i64, env, tl, tl, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 60b8b3617b..82ee24c563 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -817,3 +817,14 @@ kmar64     1001010  ..... ..... 001 ..... 1111111 @r
 kmsr64     1001011  ..... ..... 001 ..... 1111111 @r
 ukmar64    1011010  ..... ..... 001 ..... 1111111 @r
 ukmsr64    1011011  ..... ..... 001 ..... 1111111 @r
+
+smalbb     1000100  ..... ..... 001 ..... 1111111 @r
+smalbt     1001100  ..... ..... 001 ..... 1111111 @r
+smaltt     1010100  ..... ..... 001 ..... 1111111 @r
+smalda     1000110  ..... ..... 001 ..... 1111111 @r
+smalxda    1001110  ..... ..... 001 ..... 1111111 @r
+smalds     1000101  ..... ..... 001 ..... 1111111 @r
+smaldrs    1001101  ..... ..... 001 ..... 1111111 @r
+smalxds    1010101  ..... ..... 001 ..... 1111111 @r
+smslda     1010110  ..... ..... 001 ..... 1111111 @r
+smslxda    1011110  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 3e62024aac..ddaca3d20b 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -805,3 +805,15 @@ GEN_RVP_R_D64_ACC_OOL(kmar64);
 GEN_RVP_R_D64_ACC_OOL(kmsr64);
 GEN_RVP_R_D64_ACC_OOL(ukmar64);
 GEN_RVP_R_D64_ACC_OOL(ukmsr64);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+GEN_RVP_R_D64_ACC_OOL(smalbb);
+GEN_RVP_R_D64_ACC_OOL(smalbt);
+GEN_RVP_R_D64_ACC_OOL(smaltt);
+GEN_RVP_R_D64_ACC_OOL(smalda);
+GEN_RVP_R_D64_ACC_OOL(smalxda);
+GEN_RVP_R_D64_ACC_OOL(smalds);
+GEN_RVP_R_D64_ACC_OOL(smaldrs);
+GEN_RVP_R_D64_ACC_OOL(smalxds);
+GEN_RVP_R_D64_ACC_OOL(smslda);
+GEN_RVP_R_D64_ACC_OOL(smslxda);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 3cbe9e51cc..4e4722c20e 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2384,3 +2384,154 @@ static inline void do_ukmsr64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_ACC(ukmsr64, 1, 4);
+
+/* Signed 16-bit Multiply with 64-bit Add/Subtract Instructions */
+static inline void do_smalbb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalbb, 2, 2);
+
+static inline void do_smalbt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalbt, 2, 2);
+
+static inline void do_smaltt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smaltt, 2, 2);
+
+static inline void do_smalda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)] + (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalda, 2, 2);
+
+static inline void do_smalxda(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i + 1)] + (int64_t)a[H2(i + 1)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalxda, 2, 2);
+
+static inline void do_smalds(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i + 1)] - (int64_t)a[H2(i)] * b[H2(i)];
+}
+
+RVPR64_ACC(smalds, 2, 2);
+
+static inline void do_smaldrs(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i)] * b[H2(i)] - (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smaldrs, 2, 2);
+
+static inline void do_smalxds(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d += (int64_t)a[H2(i + 1)] * b[H2(i)] - (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smalxds, 2, 2);
+
+static inline void do_smslda(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d -= (int64_t)a[H2(i)] * b[H2(i)] + (int64_t)a[H2(i + 1)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smslda, 2, 2);
+
+static inline void do_smslxda(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int16_t *a = va, *b = vb;
+
+    if (i == 0) {
+        *d = *c;
+    }
+
+    *d -= (int64_t)a[H2(i + 1)] * b[H2(i)] + (int64_t)a[H2(i)] * b[H2(i + 1)];
+}
+
+RVPR64_ACC(smslxda, 2, 2);
-- 
2.17.1
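
Note for archive readers (not part of the patch): every helper above adds or
subtracts products of signed 16-bit halves to a 64-bit accumulator. The sketch
below is a minimal standalone model of that arithmetic for one 32-bit source
word. It assumes a little-endian host, where H2(i) selects the bottom half and
H2(i + 1) the top half. The names bot16, top16 and model_* are hypothetical and
do not appear in the series; the RVPR64_ACC wrapper and env plumbing are
omitted.

```c
#include <stdint.h>

/*
 * Signed bottom/top 16-bit halves of a 32-bit word.
 * The narrowing casts rely on two's-complement wrap-around, which is
 * implementation-defined in ISO C but is what GCC and Clang do.
 */
static inline int16_t bot16(uint32_t w) { return (int16_t)(w & 0xffff); }
static inline int16_t top16(uint32_t w) { return (int16_t)(w >> 16); }

/* acc + bottom(a) * bottom(b), mirroring do_smalbb */
static int64_t model_smalbb(int64_t acc, uint32_t a, uint32_t b)
{
    return acc + (int64_t)bot16(a) * bot16(b);
}

/* acc + bottom(a) * top(b), mirroring do_smalbt */
static int64_t model_smalbt(int64_t acc, uint32_t a, uint32_t b)
{
    return acc + (int64_t)bot16(a) * top16(b);
}

/* acc + both straight products, mirroring do_smalda */
static int64_t model_smalda(int64_t acc, uint32_t a, uint32_t b)
{
    return acc + (int64_t)bot16(a) * bot16(b) + (int64_t)top16(a) * top16(b);
}

/* acc + both crossed products, mirroring do_smalxda */
static int64_t model_smalxda(int64_t acc, uint32_t a, uint32_t b)
{
    return acc + (int64_t)bot16(a) * top16(b) + (int64_t)top16(a) * bot16(b);
}

/* acc + (top product - bottom product), mirroring do_smalds */
static int64_t model_smalds(int64_t acc, uint32_t a, uint32_t b)
{
    return acc + (int64_t)top16(a) * top16(b) - (int64_t)bot16(a) * bot16(b);
}

/* acc - both straight products, mirroring do_smslda */
static int64_t model_smslda(int64_t acc, uint32_t a, uint32_t b)
{
    return acc - ((int64_t)bot16(a) * bot16(b) + (int64_t)top16(a) * top16(b));
}
```

With a = 0x00030002 (top 3, bottom 2), b = 0xffff0005 (top -1, bottom 5) and an
accumulator of 100, the model gives 107 for smalda and 113 for smalxda. No
saturation is modelled because the helpers above do not saturate either.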




* [PATCH 25/38] target/riscv: Non-SIMD Q15 saturation ALU Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  8 +++
 target/riscv/insn32.decode              |  8 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 12 ++++
 target/riscv/packed_helper.c            | 78 +++++++++++++++++++++++++
 4 files changed, 106 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3ec4477ce8..fdfd3177db 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1358,3 +1358,11 @@ DEF_HELPER_4(smalxds, i64, env, tl, tl, i64)
 DEF_HELPER_4(smaldrs, i64, env, tl, tl, i64)
 DEF_HELPER_4(smslda, i64, env, tl, tl, i64)
 DEF_HELPER_4(smslxda, i64, env, tl, tl, i64)
+
+DEF_HELPER_3(kaddh, tl, env, tl, tl)
+DEF_HELPER_3(ksubh, tl, env, tl, tl)
+DEF_HELPER_3(khmbb, tl, env, tl, tl)
+DEF_HELPER_3(khmbt, tl, env, tl, tl)
+DEF_HELPER_3(khmtt, tl, env, tl, tl)
+DEF_HELPER_3(ukaddh, tl, env, tl, tl)
+DEF_HELPER_3(uksubh, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 82ee24c563..b31bec9c75 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -828,3 +828,11 @@ smaldrs    1001101  ..... ..... 001 ..... 1111111 @r
 smalxds    1010101  ..... ..... 001 ..... 1111111 @r
 smslda     1010110  ..... ..... 001 ..... 1111111 @r
 smslxda    1011110  ..... ..... 001 ..... 1111111 @r
+
+kaddh      0000010  ..... ..... 001 ..... 1111111 @r
+ksubh      0000011  ..... ..... 001 ..... 1111111 @r
+khmbb      0000110  ..... ..... 001 ..... 1111111 @r
+khmbt      0001110  ..... ..... 001 ..... 1111111 @r
+khmtt      0010110  ..... ..... 001 ..... 1111111 @r
+ukaddh     0001010  ..... ..... 001 ..... 1111111 @r
+uksubh     0001011  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index ddaca3d20b..b4f6b74b70 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -817,3 +817,15 @@ GEN_RVP_R_D64_ACC_OOL(smaldrs);
 GEN_RVP_R_D64_ACC_OOL(smalxds);
 GEN_RVP_R_D64_ACC_OOL(smslda);
 GEN_RVP_R_D64_ACC_OOL(smslxda);
+
+/*
+ *** Non-SIMD Instructions
+ */
+/* Non-SIMD Q15 saturation ALU Instructions */
+GEN_RVP_R_OOL(kaddh);
+GEN_RVP_R_OOL(ksubh);
+GEN_RVP_R_OOL(khmbb);
+GEN_RVP_R_OOL(khmbt);
+GEN_RVP_R_OOL(khmtt);
+GEN_RVP_R_OOL(ukaddh);
+GEN_RVP_R_OOL(uksubh);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 4e4722c20e..68db0b1f61 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2535,3 +2535,81 @@ static inline void do_smslxda(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64_ACC(smslxda, 2, 2);
+
+/* Q15 saturation instructions */
+static inline void do_kaddh(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H4(i)] + b[H4(i)], 15);
+}
+
+RVPR(kaddh, 2, 4);
+
+static inline void do_ksubh(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H4(i)] - b[H4(i)], 15);
+}
+
+RVPR(ksubh, 2, 4);
+
+static inline void do_khmbb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i)] * b[H2(i)] >> 15, 15);
+}
+
+RVPR(khmbb, 4, 2);
+
+static inline void do_khmbt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmbt, 4, 2);
+
+static inline void do_khmtt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    *d = sat64(env, (int64_t)a[H2(i + 1)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmtt, 4, 2);
+
+static inline void do_ukaddh(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int16_t)satu64(env, saddu32(env, 0, a[H4(i)], b[H4(i)]), 16);
+}
+
+RVPR(ukaddh, 2, 4);
+
+static inline void do_uksubh(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int16_t)satu64(env, ssubu32(env, 0, a[H4(i)], b[H4(i)]), 16);
+}
+
+RVPR(uksubh, 2, 4);
-- 
2.17.1
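
Note for archive readers (not part of the patch): the Q15 helpers above compute
a wide intermediate result and then clamp it to the signed 16-bit range. The
sketch below models that for kaddh and khmbb. It assumes that sat64(env, x, 15),
which is defined elsewhere in the series, clamps to [-2^15, 2^15 - 1] and
records overflow in env->vxsat. Here sat_q15 and the ov flag are hypothetical
stand-ins for that helper and that flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for sat64(env, x, 15): clamp to the signed
 * 16-bit range and report overflow through *ov instead of env->vxsat.
 */
static int64_t sat_q15(int64_t x, bool *ov)
{
    if (x > INT16_MAX) {
        *ov = true;
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        *ov = true;
        return INT16_MIN;
    }
    return x;
}

/* 32-bit add carried out in 64 bits, then clamped to Q15 (do_kaddh). */
static int64_t model_kaddh(int32_t a, int32_t b, bool *ov)
{
    return sat_q15((int64_t)a + b, ov);
}

/*
 * Q15 x Q15 multiply of the bottom halves (do_khmbb). The product is Q30,
 * so shifting right by 15 returns it to Q15. Right-shifting a negative
 * value is implementation-defined in ISO C; an arithmetic shift, as done
 * by GCC and Clang, is assumed.
 */
static int64_t model_khmbb(int16_t a_bot, int16_t b_bot, bool *ov)
{
    return sat_q15(((int64_t)a_bot * b_bot) >> 15, ov);
}
```

In this model the only khmbb input that saturates is -32768 * -32768, since
every other product shifts down to a value that already fits in 16 bits.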




* [PATCH 26/38] target/riscv: Non-SIMD Q31 saturation ALU Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  15 ++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  17 ++
 target/riscv/packed_helper.c            | 214 ++++++++++++++++++++++++
 4 files changed, 262 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index fdfd3177db..a6f62295e9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1366,3 +1366,18 @@ DEF_HELPER_3(khmbt, tl, env, tl, tl)
 DEF_HELPER_3(khmtt, tl, env, tl, tl)
 DEF_HELPER_3(ukaddh, tl, env, tl, tl)
 DEF_HELPER_3(uksubh, tl, env, tl, tl)
+
+DEF_HELPER_3(kaddw, tl, env, tl, tl)
+DEF_HELPER_3(ukaddw, tl, env, tl, tl)
+DEF_HELPER_3(ksubw, tl, env, tl, tl)
+DEF_HELPER_3(uksubw, tl, env, tl, tl)
+DEF_HELPER_3(kdmbb, tl, env, tl, tl)
+DEF_HELPER_3(kdmbt, tl, env, tl, tl)
+DEF_HELPER_3(kdmtt, tl, env, tl, tl)
+DEF_HELPER_3(kslraw, tl, env, tl, tl)
+DEF_HELPER_3(kslraw_u, tl, env, tl, tl)
+DEF_HELPER_3(ksllw, tl, env, tl, tl)
+DEF_HELPER_4(kdmabb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmabt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmatt, tl, env, tl, tl, tl)
+DEF_HELPER_2(kabsw, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b31bec9c75..0b8f8d4c42 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -836,3 +836,19 @@ khmbt      0001110  ..... ..... 001 ..... 1111111 @r
 khmtt      0010110  ..... ..... 001 ..... 1111111 @r
 ukaddh     0001010  ..... ..... 001 ..... 1111111 @r
 uksubh     0001011  ..... ..... 001 ..... 1111111 @r
+
+kaddw      0000000  ..... ..... 001 ..... 1111111 @r
+ukaddw     0001000  ..... ..... 001 ..... 1111111 @r
+ksubw      0000001  ..... ..... 001 ..... 1111111 @r
+uksubw     0001001  ..... ..... 001 ..... 1111111 @r
+kdmbb      0000101  ..... ..... 001 ..... 1111111 @r
+kdmbt      0001101  ..... ..... 001 ..... 1111111 @r
+kdmtt      0010101  ..... ..... 001 ..... 1111111 @r
+kslraw     0110111  ..... ..... 001 ..... 1111111 @r
+kslraw_u   0111111  ..... ..... 001 ..... 1111111 @r
+ksllw      0010011  ..... ..... 001 ..... 1111111 @r
+kslliw     0011011  ..... ..... 001 ..... 1111111 @sh5
+kdmabb     1101001  ..... ..... 001 ..... 1111111 @r
+kdmabt     1110001  ..... ..... 001 ..... 1111111 @r
+kdmatt     1111001  ..... ..... 001 ..... 1111111 @r
+kabsw      1010110  10100 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b4f6b74b70..a57776303a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -829,3 +829,20 @@ GEN_RVP_R_OOL(khmbt);
 GEN_RVP_R_OOL(khmtt);
 GEN_RVP_R_OOL(ukaddh);
 GEN_RVP_R_OOL(uksubh);
+
+/* Non-SIMD Q31 saturation ALU Instructions */
+GEN_RVP_R_OOL(kaddw);
+GEN_RVP_R_OOL(ukaddw);
+GEN_RVP_R_OOL(ksubw);
+GEN_RVP_R_OOL(uksubw);
+GEN_RVP_R_OOL(kdmbb);
+GEN_RVP_R_OOL(kdmbt);
+GEN_RVP_R_OOL(kdmtt);
+GEN_RVP_R_OOL(kslraw);
+GEN_RVP_R_OOL(kslraw_u);
+GEN_RVP_R_OOL(ksllw);
+GEN_RVP_SHIFTI(kslliw, ksllw, NULL);
+GEN_RVP_R_ACC_OOL(kdmabb);
+GEN_RVP_R_ACC_OOL(kdmabt);
+GEN_RVP_R_ACC_OOL(kdmatt);
+GEN_RVP_R2_OOL(kabsw);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 68db0b1f61..d2f7ec26f9 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2613,3 +2613,217 @@ static inline void do_uksubh(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksubh, 2, 4);
+
+/* Q31 saturation Instructions */
+static inline void do_kaddw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(kaddw, 2, 4);
+
+static inline void do_ukaddw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)saddu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ukaddw, 2, 4);
+
+static inline void do_ksubw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ksubw, 2, 4);
+
+static inline void do_uksubw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(uksubw, 2, 4);
+
+static inline void do_kdmbb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+}
+
+RVPR(kdmbb, 4, 2);
+
+static inline void do_kdmbt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmbt, 4, 2);
+
+static inline void do_kdmtt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmtt, 4, 2);
+
+static inline void do_kslraw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = a[H4(i)] >> shift;
+    }
+}
+
+RVPR(kslraw, 2, 4);
+
+static inline void do_kslraw_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = vssra32(env, 0, a[H4(i)], shift);
+    }
+}
+
+RVPR(kslraw_u, 2, 4);
+
+static inline void do_ksllw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+}
+
+RVPR(ksllw, 2, 4);
+
+static inline void do_kdmabb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabb, 4, 2);
+
+static inline void do_kdmabt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabt, 4, 2);
+
+static inline void do_kdmatt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmatt, 4, 2);
+
+static inline void do_kabsw(CPURISCVState *env, void *vd, void *va, uint8_t i)
+
+{
+    target_long *d = vd;
+    int32_t *a = va;
+
+    if (a[H4(i)] == INT32_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int32_t)abs(a[H4(i)]);
+    }
+}
+
+RVPR2(kabsw, 2, 4);
-- 
2.17.1
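
Note for archive readers (not part of the patch): three of the Q31 helpers above
have corner cases that are easy to miss, namely the doubling multiply (kdmbb),
the bidirectional shift (kslraw) and the saturating absolute value (kabsw). The
sketch below is a minimal standalone model of them. The model_* names, clamp_q31
and the ov flag are hypothetical; ov stands in for env->vxsat, and clamp_q31 for
sat64(env, x, 31), which is assumed to clamp to the signed 32-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sat64(env, x, 31). */
static int32_t clamp_q31(int64_t x, bool *ov)
{
    if (x > INT32_MAX) {
        *ov = true;
        return INT32_MAX;
    }
    if (x < INT32_MIN) {
        *ov = true;
        return INT32_MIN;
    }
    return (int32_t)x;
}

/* Doubling multiply: only INT16_MIN * INT16_MIN * 2 overflows 32 bits. */
static int32_t model_kdmbb(int16_t a, int16_t b, bool *ov)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *ov = true;
        return INT32_MAX;
    }
    return (int32_t)((int64_t)a * b * 2);
}

/*
 * The low 6 bits of rs2 are a signed shift amount in [-32, 31]:
 * non-negative means saturating left shift, negative means arithmetic
 * right shift, with a shift of 32 treated as 31.
 */
static int32_t model_kslraw(int32_t a, uint32_t rs2, bool *ov)
{
    int shift = (int)(rs2 & 0x3f);

    if (shift >= 32) {
        shift -= 64; /* sign-extend the 6-bit field */
    }
    if (shift >= 0) {
        /* Multiply instead of shifting so negative inputs stay defined. */
        return clamp_q31((int64_t)a * ((int64_t)1 << shift), ov);
    }
    shift = -shift;
    if (shift == 32) {
        shift = 31;
    }
    return a >> shift; /* arithmetic shift assumed, as with GCC and Clang */
}

/* |a|, except that INT32_MIN has no positive counterpart and saturates. */
static int32_t model_kabsw(int32_t a, bool *ov)
{
    if (a == INT32_MIN) {
        *ov = true;
        return INT32_MAX;
    }
    return a < 0 ? -a : a;
}
```

The rounding variant kslraw_u and the accumulating kdmabb/kdmabt/kdmatt helpers
are left out, since they depend on vssra32 and sadd32 from earlier patches.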




* [PATCH 26/38] target/riscv: Non-SIMD Q31 saturation ALU Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  15 ++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  17 ++
 target/riscv/packed_helper.c            | 214 ++++++++++++++++++++++++
 4 files changed, 262 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index fdfd3177db..a6f62295e9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1366,3 +1366,18 @@ DEF_HELPER_3(khmbt, tl, env, tl, tl)
 DEF_HELPER_3(khmtt, tl, env, tl, tl)
 DEF_HELPER_3(ukaddh, tl, env, tl, tl)
 DEF_HELPER_3(uksubh, tl, env, tl, tl)
+
+DEF_HELPER_3(kaddw, tl, env, tl, tl)
+DEF_HELPER_3(ukaddw, tl, env, tl, tl)
+DEF_HELPER_3(ksubw, tl, env, tl, tl)
+DEF_HELPER_3(uksubw, tl, env, tl, tl)
+DEF_HELPER_3(kdmbb, tl, env, tl, tl)
+DEF_HELPER_3(kdmbt, tl, env, tl, tl)
+DEF_HELPER_3(kdmtt, tl, env, tl, tl)
+DEF_HELPER_3(kslraw, tl, env, tl, tl)
+DEF_HELPER_3(kslraw_u, tl, env, tl, tl)
+DEF_HELPER_3(ksllw, tl, env, tl, tl)
+DEF_HELPER_4(kdmabb, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmabt, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmatt, tl, env, tl, tl, tl)
+DEF_HELPER_2(kabsw, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b31bec9c75..0b8f8d4c42 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -836,3 +836,19 @@ khmbt      0001110  ..... ..... 001 ..... 1111111 @r
 khmtt      0010110  ..... ..... 001 ..... 1111111 @r
 ukaddh     0001010  ..... ..... 001 ..... 1111111 @r
 uksubh     0001011  ..... ..... 001 ..... 1111111 @r
+
+kaddw      0000000  ..... ..... 001 ..... 1111111 @r
+ukaddw     0001000  ..... ..... 001 ..... 1111111 @r
+ksubw      0000001  ..... ..... 001 ..... 1111111 @r
+uksubw     0001001  ..... ..... 001 ..... 1111111 @r
+kdmbb      0000101  ..... ..... 001 ..... 1111111 @r
+kdmbt      0001101  ..... ..... 001 ..... 1111111 @r
+kdmtt      0010101  ..... ..... 001 ..... 1111111 @r
+kslraw     0110111  ..... ..... 001 ..... 1111111 @r
+kslraw_u   0111111  ..... ..... 001 ..... 1111111 @r
+ksllw      0010011  ..... ..... 001 ..... 1111111 @r
+kslliw     0011011  ..... ..... 001 ..... 1111111 @sh5
+kdmabb     1101001  ..... ..... 001 ..... 1111111 @r
+kdmabt     1110001  ..... ..... 001 ..... 1111111 @r
+kdmatt     1111001  ..... ..... 001 ..... 1111111 @r
+kabsw      1010110  10100 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b4f6b74b70..a57776303a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -829,3 +829,20 @@ GEN_RVP_R_OOL(khmbt);
 GEN_RVP_R_OOL(khmtt);
 GEN_RVP_R_OOL(ukaddh);
 GEN_RVP_R_OOL(uksubh);
+
+/* Non-SIMD Q31 saturation ALU Instructions */
+GEN_RVP_R_OOL(kaddw);
+GEN_RVP_R_OOL(ukaddw);
+GEN_RVP_R_OOL(ksubw);
+GEN_RVP_R_OOL(uksubw);
+GEN_RVP_R_OOL(kdmbb);
+GEN_RVP_R_OOL(kdmbt);
+GEN_RVP_R_OOL(kdmtt);
+GEN_RVP_R_OOL(kslraw);
+GEN_RVP_R_OOL(kslraw_u);
+GEN_RVP_R_OOL(ksllw);
+GEN_RVP_SHIFTI(kslliw, ksllw, NULL);
+GEN_RVP_R_ACC_OOL(kdmabb);
+GEN_RVP_R_ACC_OOL(kdmabt);
+GEN_RVP_R_ACC_OOL(kdmatt);
+GEN_RVP_R2_OOL(kabsw);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 68db0b1f61..d2f7ec26f9 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2613,3 +2613,217 @@ static inline void do_uksubh(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(uksubh, 2, 4);
+
+/* Non-SIMD Q31 saturation ALU Instructions */
+static inline void do_kaddw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(kaddw, 2, 4);
+
+static inline void do_ukaddw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)saddu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ukaddw, 2, 4);
+
+static inline void do_ksubw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ksubw, 2, 4);
+
+static inline void do_uksubw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (int32_t)ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+}
+
+RVPR(uksubw, 2, 4);
+
+static inline void do_kdmbb(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+}
+
+RVPR(kdmbb, 4, 2);
+
+static inline void do_kdmbt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmbt, 4, 2);
+
+static inline void do_kdmtt(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmtt, 4, 2);
+
+static inline void do_kslraw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = a[H4(i)] >> shift;
+    }
+}
+
+RVPR(kslraw, 2, 4);
+
+static inline void do_kslraw_u(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        *d = vssra32(env, 0, a[H4(i)], shift);
+    }
+}
+
+RVPR(kslraw_u, 2, 4);
+
+static inline void do_ksllw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    *d = (int32_t)sat64(env, (int64_t)a[H4(i)] << shift, 31);
+}
+
+RVPR(ksllw, 2, 4);
+
+static inline void do_kdmabb(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabb, 4, 2);
+
+static inline void do_kdmabt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmabt, 4, 2);
+
+static inline void do_kdmatt(CPURISCVState *env, void *vd, void *va,
+                             void *vb, void *vc, uint8_t i)
+
+{
+    target_long *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+    *d = sadd32(env, 0, c[H4(i)], m0);
+}
+
+RVPR_ACC(kdmatt, 4, 2);
+
+static inline void do_kabsw(CPURISCVState *env, void *vd, void *va, uint8_t i)
+
+{
+    target_long *d = vd;
+    int32_t *a = va;
+
+    if (a[H4(i)] == INT32_MIN) {
+        *d = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int32_t)abs(a[H4(i)]);
+    }
+}
+
+RVPR2(kabsw, 2, 4);
-- 
2.17.1
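
Note on the Q31 helpers above: the kdm* and kdma* helpers special-case
INT16_MIN * INT16_MIN because doubling that product gives +1.0 in Q31,
which does not fit in 32 bits. The standalone sketch below illustrates
that reasoning outside QEMU. The names sketch_kdm, sketch_sat_add32 and
sketch_ov are hypothetical stand-ins (sketch_ov plays the role of
env->vxsat), and the saturating add is only an assumed reading of what
sadd32() does, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for env->vxsat: set when a result saturates. */
static bool sketch_ov;

/* Q15 x Q15 -> Q31 doubling multiply, same branch structure as do_kdmbb. */
static int32_t sketch_kdm(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        /* (-1.0) * (-1.0) = +1.0 cannot be represented in Q31. */
        sketch_ov = true;
        return INT32_MAX;
    }
    int64_t prod = (int64_t)a * b * 2;
    /* Every other operand pair stays inside the int32_t range. */
    assert(prod >= INT32_MIN && prod <= INT32_MAX);
    return (int32_t)prod;
}

/* Saturating 32-bit add (assumed behaviour of the accumulate step). */
static int32_t sketch_sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;

    if (sum > INT32_MAX) {
        sketch_ov = true;
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        sketch_ov = true;
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```

For example, sketch_sat_add32(c, sketch_kdm(a, b)) mirrors the shape of
the kdmabb data path: multiply, double, then accumulate with saturation.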




* [PATCH 27/38] target/riscv: 32-bit Computation Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
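Reviewer-style note, not part of the commit: the halving instructions
(raddw, uraddw) and the 64-bit multiplies (mulr64, mulsr64) both rely on
widening to 64 bits before the final step. A minimal sketch of that idea
follows, assuming hadd32()/haddu32() compute the sum in a wider type and
shift right by one. All sketch_* names are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Signed halving add: widen to 64 bits so the sum cannot overflow.
 * Relies on >> being an arithmetic shift for negative values, which is
 * implementation-defined in ISO C but is what GCC and Clang provide.
 */
static int32_t sketch_raddw(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a + b) >> 1);
}

/* Unsigned halving add: same idea with zero extension. */
static uint32_t sketch_uraddw(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 1);
}

/* Signed 32x32 -> 64 multiply, returned as the two register halves. */
static void sketch_mulsr64(int32_t a, int32_t b, uint32_t *lo, uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    uint64_t prod = (uint64_t)((int64_t)a * b);

    *lo = (uint32_t)prod;          /* low word of the register pair */
    *hi = (uint32_t)(prod >> 32);  /* high word of the register pair */
}
```

Splitting the product into lo/hi corresponds to what do_mulsr64 writes
to d[H4(0)] and d[H4(1)] in the diff below.
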
 target/riscv/helper.h                   |  9 +++
 target/riscv/insn32.decode              |  9 +++
 target/riscv/insn_trans/trans_rvp.c.inc | 10 +++
 target/riscv/packed_helper.c            | 92 +++++++++++++++++++++++++
 4 files changed, 120 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a6f62295e9..93bb26d207 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1381,3 +1381,12 @@ DEF_HELPER_4(kdmabb, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmabt, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmatt, tl, env, tl, tl, tl)
 DEF_HELPER_2(kabsw, tl, env, tl)
+
+DEF_HELPER_3(raddw, tl, env, tl, tl)
+DEF_HELPER_3(uraddw, tl, env, tl, tl)
+DEF_HELPER_3(rsubw, tl, env, tl, tl)
+DEF_HELPER_3(ursubw, tl, env, tl, tl)
+DEF_HELPER_3(maxw, tl, env, tl, tl)
+DEF_HELPER_3(minw, tl, env, tl, tl)
+DEF_HELPER_3(mulr64, i64, env, tl, tl)
+DEF_HELPER_3(mulsr64, i64, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0b8f8d4c42..342b6d64c3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -852,3 +852,12 @@ kdmabb     1101001  ..... ..... 001 ..... 1111111 @r
 kdmabt     1110001  ..... ..... 001 ..... 1111111 @r
 kdmatt     1111001  ..... ..... 001 ..... 1111111 @r
 kabsw      1010110  10100 ..... 000 ..... 1111111 @r2
+
+raddw      0010000  ..... ..... 001 ..... 1111111 @r
+uraddw     0011000  ..... ..... 001 ..... 1111111 @r
+rsubw      0010001  ..... ..... 001 ..... 1111111 @r
+ursubw     0011001  ..... ..... 001 ..... 1111111 @r
+maxw       1111001  ..... ..... 000 ..... 1111111 @r
+minw       1111000  ..... ..... 000 ..... 1111111 @r
+mulr64     1111000  ..... ..... 001 ..... 1111111 @r
+mulsr64    1110000  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index a57776303a..676c193f07 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -846,3 +846,13 @@ GEN_RVP_R_ACC_OOL(kdmabb);
 GEN_RVP_R_ACC_OOL(kdmabt);
 GEN_RVP_R_ACC_OOL(kdmatt);
 GEN_RVP_R2_OOL(kabsw);
+
+/* 32-bit Computation Instructions */
+GEN_RVP_R_OOL(raddw);
+GEN_RVP_R_OOL(uraddw);
+GEN_RVP_R_OOL(rsubw);
+GEN_RVP_R_OOL(ursubw);
+GEN_RVP_R_OOL(minw);
+GEN_RVP_R_OOL(maxw);
+GEN_RVP_R_D64_OOL(mulr64);
+GEN_RVP_R_D64_OOL(mulsr64);
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index d2f7ec26f9..34af713020 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2827,3 +2827,95 @@ static inline void do_kabsw(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(kabsw, 2, 4);
+
+/* 32-bit Computation Instructions */
+static inline void do_raddw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = hadd32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(raddw, 2, 4);
+
+static inline void do_uraddw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = (int32_t)haddu32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(uraddw, 2, 4);
+
+static inline void do_rsubw(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = hsub32(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(rsubw, 2, 4);
+
+static inline void do_ursubw(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *a = va, *b = vb;
+    target_long *d = vd;
+
+    *d = (int32_t)hsubu64(a[H4(i)], b[H4(i)]);
+}
+
+RVPR(ursubw, 2, 4);
+
+static inline void do_maxw(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = (a[H4(i)] > b[H4(i)]) ? a[H4(i)] : b[H4(i)];
+}
+
+RVPR(maxw, 2, 4);
+
+static inline void do_minw(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int32_t *a = va, *b = vb;
+
+    *d = (a[H4(i)] < b[H4(i)]) ? a[H4(i)] : b[H4(i)];
+}
+
+RVPR(minw, 2, 4);
+
+static inline void do_mulr64(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint64_t *d = vd;
+    uint32_t *a = va, *b = vb;
+
+    *d = (uint64_t)a[H4(0)] * b[H4(0)];
+}
+
+RVPR64(mulr64);
+
+static inline void do_mulsr64(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int64_t result;
+    int32_t *a = va, *b = vb;
+
+    result = (int64_t)a[H4(0)] * b[H4(0)];
+    d[H4(1)] = result >> 32;
+    d[H4(0)] = result & UINT32_MAX;
+}
+
+RVPR64(mulsr64);
-- 
2.17.1




* [PATCH 28/38] target/riscv: Non-SIMD Miscellaneous Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
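Reviewer-style note, not part of the commit: do_ave computes a rounded
average at full XLEN, so on RV64 there is no wider type to widen into.
The sketch below shows one overflow-free way to do that and is an
assumption about how hadd64() behaves, not a copy of it. do_bitrev is
illustrated with a plain loop in place of revbit64(). All sketch_* names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * floor((a + b) / 2) without forming a + b.
 * Assumes >> on negative values is an arithmetic shift (GCC/Clang).
 */
static int64_t sketch_hadd64(int64_t a, int64_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

/* Rounded average (a + b + 1) >> 1, in the same shape as do_ave. */
static int64_t sketch_ave(int64_t a, int64_t b)
{
    int64_t half = sketch_hadd64(a, b);

    if ((a ^ b) & 1) {
        half++;  /* the sum is odd, so round up */
    }
    return half;
}

/* Reverse all 64 bits; loop-based stand-in for revbit64(). */
static uint64_t sketch_revbit64(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Reverse bits [msb:0] of a and zero the rest, as do_bitrev does. */
static uint64_t sketch_bitrev(uint64_t a, unsigned msb)
{
    assert(msb < 64);
    return sketch_revbit64(a) >> (63 - msb);
}
```

The rounding test (a ^ b) & 1 works because the sum is odd exactly when
the two low bits differ.
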
 target/riscv/helper.h                   |   6 +
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc | 234 ++++++++++++++++++++++++
 target/riscv/packed_helper.c            |  77 ++++++++
 4 files changed, 333 insertions(+)
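
Reviewer-style note, not part of the commit: bpick, wext and insb are
pure bit-field operations, so their intended results are easy to state
in plain C. The sketch below is written for a 64-bit register and is
only an illustration of the semantics as the helpers in this patch read.
The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-wise select: take bits from a where c is 1, from b where c is 0. */
static uint64_t sketch_bpick(uint64_t a, uint64_t b, uint64_t c)
{
    return (c & a) | (~c & b);
}

/*
 * Extract the 32-bit field starting at bit lsb of a 64-bit value and
 * sign-extend it. The uint32_t -> int32_t conversion of large values is
 * implementation-defined before C23; GCC and Clang wrap as expected.
 */
static int32_t sketch_wext(uint64_t a, unsigned lsb)
{
    assert(lsb < 32);
    return (int32_t)(uint32_t)(a >> lsb);
}

/* Insert the low byte of src into byte lane n of dst. */
static uint64_t sketch_insb(uint64_t dst, uint64_t src, unsigned n)
{
    assert(n < 8);
    uint64_t mask = (uint64_t)0xff << (8 * n);

    return (dst & ~mask) | ((src & 0xff) << (8 * n));
}
```

sketch_insb corresponds to the tcg_gen_deposit_tl() call in trans_insb,
with the deposit position given by shift * 8 and a field width of 8.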

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 93bb26d207..7b3f41866e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1390,3 +1390,9 @@ DEF_HELPER_3(maxw, tl, env, tl, tl)
 DEF_HELPER_3(minw, tl, env, tl, tl)
 DEF_HELPER_3(mulr64, i64, env, tl, tl)
 DEF_HELPER_3(mulsr64, i64, env, tl, tl)
+
+DEF_HELPER_3(ave, tl, env, tl, tl)
+DEF_HELPER_3(sra_u, tl, env, tl, tl)
+DEF_HELPER_3(bitrev, tl, env, tl, tl)
+DEF_HELPER_3(wext, tl, env, i64, tl)
+DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 342b6d64c3..16bf3c945b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -26,6 +26,7 @@
 %sh4    20:4
 %sh3    20:3
 %sh5    20:5
+%sh6    20:6
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -44,6 +45,7 @@
 &j    imm rd
 &r    rd rs1 rs2
 &r2   rd rs1
+&r4   rd rs1 rs2 rs3
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
@@ -66,6 +68,7 @@
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
 @sh5     ......  ...... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
+@sh6     ......  ...... .....  ... ..... ....... &shift  shamt=%sh6      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -75,6 +78,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r4      ..... ..  ..... ..... ... ..... ....... %rs3 %rs2 %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
@@ -861,3 +865,15 @@ maxw       1111001  ..... ..... 000 ..... 1111111 @r
 minw       1111000  ..... ..... 000 ..... 1111111 @r
 mulr64     1111000  ..... ..... 001 ..... 1111111 @r
 mulsr64    1110000  ..... ..... 001 ..... 1111111 @r
+
+ave        1110000  ..... ..... 000 ..... 1111111 @r
+sra_u      0010010  ..... ..... 001 ..... 1111111 @r
+srai_u     110101  ...... ..... 001 ..... 1111111 @sh6
+bitrev     1110011  ..... ..... 000 ..... 1111111 @r
+bitrevi    111010  ...... ..... 000 ..... 1111111 @sh6
+wext       1100111  ..... ..... 000 ..... 1111111 @r
+wexti      1101111  ..... ..... 000 ..... 1111111 @sh5
+bpick      .....00  ..... ..... 011 ..... 1111111 @r4
+insb       1010110  00 ... ..... 000 ..... 1111111 @sh3
+maddr32    1100010  ..... ..... 001 ..... 1111111 @r
+msubr32    1100011  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 676c193f07..8c47fd562b 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -856,3 +856,237 @@ GEN_RVP_R_OOL(minw);
 GEN_RVP_R_OOL(maxw);
 GEN_RVP_R_D64_OOL(mulr64);
 GEN_RVP_R_D64_OOL(mulsr64);
+
+/* Non-SIMD Miscellaneous Instructions */
+GEN_RVP_R_OOL(ave);
+GEN_RVP_R_OOL(sra_u);
+GEN_RVP_SHIFTI(srai_u, sra_u, NULL);
+GEN_RVP_R_OOL(bitrev);
+GEN_RVP_SHIFTI(bitrevi, bitrev, NULL);
+
+static bool
+r_s64_ool(DisasContext *ctx, arg_r *a,
+          void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return r_ool(ctx, a, fn);
+#else
+    TCGv_i64 src1;
+    TCGv_i32 src2, dst;
+    TCGv_i32 a0, a1;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i32();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+    dst = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    tcg_temp_free_i32(src2);
+    tcg_temp_free_i32(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_S64_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_s64_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_S64_OOL(wext);
+
+static bool rvp_shifti_s64_ool(DisasContext *ctx, arg_shift *a,
+                               void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return rvp_shifti_ool(ctx, a, fn);
+#else
+    TCGv_i64 src1;
+    TCGv_i32 shift, dst;
+    TCGv_i32 a0, a1;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+    dst = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_i32(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_SHIFTI_S64_OOL(NAME, OP)                    \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)     \
+{                                                           \
+    return rvp_shifti_s64_ool(s, a, gen_helper_##OP);       \
+}
+
+GEN_RVP_SHIFTI_S64_OOL(wexti, wext);
+
+typedef void gen_helper_rvp_r4(TCGv, TCGv_ptr, TCGv, TCGv, TCGv);
+
+static bool r4_ool(DisasContext *ctx, arg_r4 *a, gen_helper_rvp_r4 *fn)
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rs3);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R4_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r4 *a)   \
+{                                                      \
+    return r4_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R4_OOL(bpick);
+
+static bool trans_insb(DisasContext *ctx, arg_shift *a)
+{
+    TCGv src1, dst, b0;
+    uint8_t shift;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+#ifdef TARGET_RISCV32
+    shift = a->shamt & 0x3;
+#else
+    shift = a->shamt;
+#endif
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+    b0 = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_andi_tl(b0, src1, 0xff);
+    tcg_gen_deposit_tl(dst, dst, b0, shift * 8, 8);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(b0);
+    return true;
+}
+
+static bool trans_maddr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_add_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
+
+static bool trans_msubr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_sub_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 34af713020..95e60da70b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2919,3 +2919,80 @@ static inline void do_mulsr64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64(mulsr64);
+
+/* Non-SIMD Miscellaneous Instructions */
+static inline void do_ave(CPURISCVState *env, void *vd, void *va,
+                          void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, half;
+
+    half = hadd64(*a, *b);
+    if ((*a ^ *b) & 0x1) {
+        half++;
+    }
+    *d = half;
+}
+
+RVPR(ave, 1, sizeof(target_ulong));
+
+static inline void do_sra_u(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = vssra64(env, 0, *a, shift);
+}
+
+RVPR(sra_u, 1, sizeof(target_ulong));
+
+static inline void do_bitrev(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_ulong *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = revbit64(*a) >> (64 - shift - 1);
+}
+
+RVPR(bitrev, 1, sizeof(target_ulong));
+
+static inline target_ulong
+rvpr_64(CPURISCVState *env, uint64_t a, target_ulong b, PackedFn3 *fn)
+{
+    target_ulong result = 0;
+
+    fn(env, &result, &a, &b);
+    return result;
+}
+
+#define RVPR_64(NAME)                                       \
+target_ulong HELPER(NAME)(CPURISCVState *env, uint64_t a,   \
+                          target_ulong b)                   \
+{                                                           \
+    return rvpr_64(env, a, b, (PackedFn3 *)do_##NAME);      \
+}
+
+static inline void do_wext(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int64_t *a = va;
+    uint8_t b = *(uint8_t *)vb & 0x1f;
+
+    *d = sextract64(*a, b, 32);
+}
+
+RVPR_64(wext);
+
+static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, *c = vc;
+
+    *d = (*c & *a) | (~*c & *b);
+}
+
+RVPR_ACC(bpick, 1, sizeof(target_ulong));
-- 
2.17.1




* [PATCH 28/38] target/riscv: Non-SIMD Miscellaneous Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   6 +
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvp.c.inc | 234 ++++++++++++++++++++++++
 target/riscv/packed_helper.c            |  77 ++++++++
 4 files changed, 333 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 93bb26d207..7b3f41866e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1390,3 +1390,9 @@ DEF_HELPER_3(maxw, tl, env, tl, tl)
 DEF_HELPER_3(minw, tl, env, tl, tl)
 DEF_HELPER_3(mulr64, i64, env, tl, tl)
 DEF_HELPER_3(mulsr64, i64, env, tl, tl)
+
+DEF_HELPER_3(ave, tl, env, tl, tl)
+DEF_HELPER_3(sra_u, tl, env, tl, tl)
+DEF_HELPER_3(bitrev, tl, env, tl, tl)
+DEF_HELPER_3(wext, tl, env, i64, tl)
+DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 342b6d64c3..16bf3c945b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -26,6 +26,7 @@
 %sh4    20:4
 %sh3    20:3
 %sh5    20:5
+%sh6    20:6
 %csr    20:12
 %rm     12:3
 %nf     29:3                     !function=ex_plus_1
@@ -44,6 +45,7 @@
 &j    imm rd
 &r    rd rs1 rs2
 &r2   rd rs1
+&r4   rd rs1 rs2 rs3
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
@@ -66,6 +68,7 @@
 @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
 @sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
 @sh5     ......  ...... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
+@sh6     ......  ...... .....  ... ..... ....... &shift  shamt=%sh6      %rs1 %rd
 @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
 
 @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
@@ -75,6 +78,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r4      ..... ..  ..... ..... ... ..... ....... %rs3 %rs2 %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
@@ -861,3 +865,15 @@ maxw       1111001  ..... ..... 000 ..... 1111111 @r
 minw       1111000  ..... ..... 000 ..... 1111111 @r
 mulr64     1111000  ..... ..... 001 ..... 1111111 @r
 mulsr64    1110000  ..... ..... 001 ..... 1111111 @r
+
+ave        1110000  ..... ..... 000 ..... 1111111 @r
+sra_u      0010010  ..... ..... 001 ..... 1111111 @r
+srai_u     110101  ...... ..... 001 ..... 1111111 @sh6
+bitrev     1110011  ..... ..... 000 ..... 1111111 @r
+bitrevi    111010  ...... ..... 000 ..... 1111111 @sh6
+wext       1100111  ..... ..... 000 ..... 1111111 @r
+wexti      1101111  ..... ..... 000 ..... 1111111 @sh5
+bpick      .....00  ..... ..... 011 ..... 1111111 @r4
+insb       1010110  00 ... ..... 000 ..... 1111111 @sh3
+maddr32    1100010  ..... ..... 001 ..... 1111111 @r
+msubr32    1100011  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 676c193f07..8c47fd562b 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -856,3 +856,237 @@ GEN_RVP_R_OOL(minw);
 GEN_RVP_R_OOL(maxw);
 GEN_RVP_R_D64_OOL(mulr64);
 GEN_RVP_R_D64_OOL(mulsr64);
+
+/* Non-SIMD Miscellaneous Instructions */
+GEN_RVP_R_OOL(ave);
+GEN_RVP_R_OOL(sra_u);
+GEN_RVP_SHIFTI(srai_u, sra_u, NULL);
+GEN_RVP_R_OOL(bitrev);
+GEN_RVP_SHIFTI(bitrevi, bitrev, NULL);
+
+static bool
+r_s64_ool(DisasContext *ctx, arg_r *a,
+          void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return r_ool(ctx, a, fn);
+#else
+    TCGv_i64 src1;
+    TCGv_i32 src2, dst;
+    TCGv_i32 a0, a1;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i32();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+    dst = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    gen_get_gpr(src2, a->rs2);
+    fn(dst, cpu_env, src1, src2);
+
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    tcg_temp_free_i32(src2);
+    tcg_temp_free_i32(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_R_S64_OOL(NAME)                        \
+static bool trans_##NAME(DisasContext *s, arg_r *a)    \
+{                                                      \
+    return r_s64_ool(s, a, gen_helper_##NAME);         \
+}
+
+GEN_RVP_R_S64_OOL(wext);
+
+static bool rvp_shifti_s64_ool(DisasContext *ctx, arg_shift *a,
+                               void (* fn)(TCGv, TCGv_ptr, TCGv_i64, TCGv))
+{
+#ifdef TARGET_RISCV64
+    return rvp_shifti_ool(ctx, a, fn);
+#else
+    TCGv_i64 src1;
+    TCGv_i32 shift, dst;
+    TCGv_i32 a0, a1;
+
+    if (!has_ext(ctx, RVP) || !ctx->ext_p64) {
+        return false;
+    }
+
+    src1 = tcg_temp_new_i64();
+    a0 = tcg_temp_new_i32();
+    a1 = tcg_temp_new_i32();
+    dst = tcg_temp_new_i32();
+
+    gen_get_gpr(a0, a->rs1);
+    gen_get_gpr(a1, a->rs1 + 1);
+    tcg_gen_concat_i32_i64(src1, a0, a1);
+    shift = tcg_const_tl(a->shamt);
+    fn(dst, cpu_env, src1, shift);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i32(a0);
+    tcg_temp_free_i32(a1);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_i32(dst);
+    return true;
+#endif
+}
+
+#define GEN_RVP_SHIFTI_S64_OOL(NAME, OP)                    \
+static bool trans_##NAME(DisasContext *s, arg_shift *a)     \
+{                                                           \
+    return rvp_shifti_s64_ool(s, a, gen_helper_##OP);       \
+}
+
+GEN_RVP_SHIFTI_S64_OOL(wexti, wext);
+
+typedef void gen_helper_rvp_r4(TCGv, TCGv_ptr, TCGv, TCGv, TCGv);
+
+static bool r4_ool(DisasContext *ctx, arg_r4 *a, gen_helper_rvp_r4 *fn)
+{
+    TCGv src1, src2, src3, dst;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    src3 = tcg_temp_new();
+    dst = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(src3, a->rs3);
+    fn(dst, cpu_env, src1, src2, src3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(src3);
+    tcg_temp_free(dst);
+    return true;
+}
+
+#define GEN_RVP_R4_OOL(NAME)                           \
+static bool trans_##NAME(DisasContext *s, arg_r4 *a)   \
+{                                                      \
+    return r4_ool(s, a, gen_helper_##NAME);            \
+}
+
+GEN_RVP_R4_OOL(bpick);
+
+static bool trans_insb(DisasContext *ctx, arg_shift *a)
+{
+    TCGv src1, dst, b0;
+    uint8_t shift;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+#ifdef TARGET_RISCV32
+    shift = a->shamt & 0x3;
+#else
+    shift = a->shamt;
+#endif
+    src1 = tcg_temp_new();
+    dst = tcg_temp_new();
+    b0 = tcg_temp_new();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_andi_tl(b0, src1, 0xff);
+    tcg_gen_deposit_tl(dst, dst, b0, shift * 8, 8);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(dst);
+    tcg_temp_free(b0);
+    return true;
+}
+
+static bool trans_maddr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_add_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
+
+static bool trans_msubr32(DisasContext *ctx, arg_r *a)
+{
+    TCGv src1, src2, dst;
+    TCGv_i32 w1, w2, w3;
+    if (!has_ext(ctx, RVP)) {
+        return false;
+    }
+
+    src1 = tcg_temp_new();
+    src2 = tcg_temp_new();
+    dst = tcg_temp_new();
+    w1 = tcg_temp_new_i32();
+    w2 = tcg_temp_new_i32();
+    w3 = tcg_temp_new_i32();
+
+    gen_get_gpr(src1, a->rs1);
+    gen_get_gpr(src2, a->rs2);
+    gen_get_gpr(dst, a->rd);
+
+    tcg_gen_trunc_tl_i32(w1, src1);
+    tcg_gen_trunc_tl_i32(w2, src2);
+    tcg_gen_trunc_tl_i32(w3, dst);
+
+    tcg_gen_mul_i32(w1, w1, w2);
+    tcg_gen_sub_i32(w3, w3, w1);
+    tcg_gen_ext_i32_tl(dst, w3);
+    gen_set_gpr(a->rd, dst);
+
+    tcg_temp_free(src1);
+    tcg_temp_free(src2);
+    tcg_temp_free(dst);
+    tcg_temp_free_i32(w1);
+    tcg_temp_free_i32(w2);
+    tcg_temp_free_i32(w3);
+    return true;
+}
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 34af713020..95e60da70b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2919,3 +2919,80 @@ static inline void do_mulsr64(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR64(mulsr64);
+
+/* Miscellaneous Instructions */
+static inline void do_ave(CPURISCVState *env, void *vd, void *va,
+                          void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, half;
+
+    half = hadd64(*a, *b);
+    if ((*a ^ *b) & 0x1) {
+        half++;
+    }
+    *d = half;
+}
+
+RVPR(ave, 1, sizeof(target_ulong));
+
+static inline void do_sra_u(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    target_long *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = vssra64(env, 0, *a, shift);
+}
+
+RVPR(sra_u, 1, sizeof(target_ulong));
+
+static inline void do_bitrev(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    target_ulong *d = vd, *a = va;
+    uint8_t *b = vb;
+    uint8_t shift = riscv_has_ext(env, RV32) ? (*b & 0x1f) : (*b & 0x3f);
+
+    *d = revbit64(*a) >> (64 - shift - 1);
+}
+
+RVPR(bitrev, 1, sizeof(target_ulong));
+
+static inline target_ulong
+rvpr_64(CPURISCVState *env, uint64_t a, target_ulong b, PackedFn3 *fn)
+{
+    target_ulong result = 0;
+
+    fn(env, &result, &a, &b);
+    return result;
+}
+
+#define RVPR_64(NAME)                                       \
+target_ulong HELPER(NAME)(CPURISCVState *env, uint64_t a,   \
+                          target_ulong b)                   \
+{                                                           \
+    return rvpr_64(env, a, b, (PackedFn3 *)do_##NAME);      \
+}
+
+static inline void do_wext(CPURISCVState *env, void *vd, void *va,
+                           void *vb, uint8_t i)
+{
+    target_long *d = vd;
+    int64_t *a = va;
+    uint8_t b = *(uint8_t *)vb & 0x1f;
+
+    *d = sextract64(*a, b, 32);
+}
+
+RVPR_64(wext);
+
+static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
+                            void *vb, void *vc, uint8_t i)
+{
+    target_long *d = vd, *a = va, *b = vb, *c = vc;
+
+    *d = (*c & *a) | (~*c & *b);
+}
+
+RVPR_ACC(bpick, 1, sizeof(target_ulong));
-- 
2.17.1
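
For readers following the helper code in this patch, the non-SIMD
miscellaneous operations (ave, bitrev, bpick, wext) can be modelled in a
few lines of portable C. This is only a sketch for XLEN = 64, not the QEMU
implementation. The model_* names are invented for illustration.
model_ave() uses a carry-free identity in place of hadd64(), whose body is
not part of this patch. Right shift of a negative value is assumed to be
arithmetic, and uint32_t to int32_t conversion is assumed to wrap.

```c
#include <assert.h>
#include <stdint.h>

/* ave: rounding average (a + b + 1) >> 1 without 64-bit overflow. */
static int64_t model_ave(int64_t a, int64_t b)
{
    /* floor((a + b) / 2); assumes >> on a negative value is arithmetic */
    int64_t half = (a & b) + ((a ^ b) >> 1);

    /* a + b is odd exactly when the low bits differ, so round up then */
    return half + ((a ^ b) & 1);
}

/* bitrev: reverse bits [msb:0] of a and clear everything above msb. */
static uint64_t model_bitrev(uint64_t a, unsigned msb)
{
    uint64_t r = 0;
    unsigned i;

    msb &= 0x3f;
    for (i = 0; i <= msb; i++) {
        r |= ((a >> i) & 1) << (msb - i);
    }
    return r;
}

/* bpick: take bits of a where c is 1 and bits of b where c is 0. */
static uint64_t model_bpick(uint64_t a, uint64_t b, uint64_t c)
{
    return (c & a) | (~c & b);
}

/* wext: sign-extended 32-bit field of a 64-bit value, starting at lsb. */
static int64_t model_wext(uint64_t a, unsigned lsb)
{
    lsb &= 0x1f;
    return (int64_t)(int32_t)(uint32_t)(a >> lsb);
}

/* Small self-check that exercises each model once. */
static void model_misc_selfcheck(void)
{
    assert(model_ave(3, 4) == 4);
    assert(model_ave(-3, -4) == -3);
    assert(model_bitrev(0x1, 3) == 0x8);
    assert(model_bpick(0xff00, 0x00ff, 0xf0f0) == 0xf00f);
    assert(model_wext(0x0000000f00000000ull, 4) == -268435456);
}
```

Calling model_misc_selfcheck() from a test harness is enough to compare
these models against the helper results for the same inputs.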



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 29/38] target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  30 +++
 target/riscv/insn32-64.decode           |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  67 ++++++
 target/riscv/packed_helper.c            | 278 ++++++++++++++++++++++++
 4 files changed, 407 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7b3f41866e..0ade207de6 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1396,3 +1396,33 @@ DEF_HELPER_3(sra_u, tl, env, tl, tl)
 DEF_HELPER_3(bitrev, tl, env, tl, tl)
 DEF_HELPER_3(wext, tl, env, i64, tl)
 DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
+#ifdef TARGET_RISCV64
+DEF_HELPER_3(radd32, tl, env, tl, tl)
+DEF_HELPER_3(uradd32, tl, env, tl, tl)
+DEF_HELPER_3(kadd32, tl, env, tl, tl)
+DEF_HELPER_3(ukadd32, tl, env, tl, tl)
+DEF_HELPER_3(rsub32, tl, env, tl, tl)
+DEF_HELPER_3(ursub32, tl, env, tl, tl)
+DEF_HELPER_3(ksub32, tl, env, tl, tl)
+DEF_HELPER_3(uksub32, tl, env, tl, tl)
+DEF_HELPER_3(cras32, tl, env, tl, tl)
+DEF_HELPER_3(rcras32, tl, env, tl, tl)
+DEF_HELPER_3(urcras32, tl, env, tl, tl)
+DEF_HELPER_3(kcras32, tl, env, tl, tl)
+DEF_HELPER_3(ukcras32, tl, env, tl, tl)
+DEF_HELPER_3(crsa32, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(stas32, tl, env, tl, tl)
+DEF_HELPER_3(rstas32, tl, env, tl, tl)
+DEF_HELPER_3(urstas32, tl, env, tl, tl)
+DEF_HELPER_3(kstas32, tl, env, tl, tl)
+DEF_HELPER_3(ukstas32, tl, env, tl, tl)
+DEF_HELPER_3(stsa32, tl, env, tl, tl)
+DEF_HELPER_3(rstsa32, tl, env, tl, tl)
+DEF_HELPER_3(urstsa32, tl, env, tl, tl)
+DEF_HELPER_3(kstsa32, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa32, tl, env, tl, tl)
+#endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 1094172210..66eec1a44a 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -82,3 +82,35 @@ fmv_d_x    1111001  00000 ..... 000 ..... 1010011 @r2
 hlv_wu    0110100  00001   ..... 100 ..... 1110011 @r2
 hlv_d     0110110  00000   ..... 100 ..... 1110011 @r2
 hsv_d     0110111  .....   ..... 100 00000 1110011 @r2_s
+
+# *** RV64P Standard Extension (in addition to RV32P) ***
+add32      0100000  ..... ..... 010 ..... 1111111 @r
+radd32     0000000  ..... ..... 010 ..... 1111111 @r
+uradd32    0010000  ..... ..... 010 ..... 1111111 @r
+kadd32     0001000  ..... ..... 010 ..... 1111111 @r
+ukadd32    0011000  ..... ..... 010 ..... 1111111 @r
+sub32      0100001  ..... ..... 010 ..... 1111111 @r
+rsub32     0000001  ..... ..... 010 ..... 1111111 @r
+ursub32    0010001  ..... ..... 010 ..... 1111111 @r
+ksub32     0001001  ..... ..... 010 ..... 1111111 @r
+uksub32    0011001  ..... ..... 010 ..... 1111111 @r
+cras32     0100010  ..... ..... 010 ..... 1111111 @r
+rcras32    0000010  ..... ..... 010 ..... 1111111 @r
+urcras32   0010010  ..... ..... 010 ..... 1111111 @r
+kcras32    0001010  ..... ..... 010 ..... 1111111 @r
+ukcras32   0011010  ..... ..... 010 ..... 1111111 @r
+crsa32     0100011  ..... ..... 010 ..... 1111111 @r
+rcrsa32    0000011  ..... ..... 010 ..... 1111111 @r
+urcrsa32   0010011  ..... ..... 010 ..... 1111111 @r
+kcrsa32    0001011  ..... ..... 010 ..... 1111111 @r
+ukcrsa32   0011011  ..... ..... 010 ..... 1111111 @r
+stas32     1111000  ..... ..... 010 ..... 1111111 @r
+rstas32    1011000  ..... ..... 010 ..... 1111111 @r
+urstas32   1101000  ..... ..... 010 ..... 1111111 @r
+kstas32    1100000  ..... ..... 010 ..... 1111111 @r
+ukstas32   1110000  ..... ..... 010 ..... 1111111 @r
+stsa32     1111001  ..... ..... 010 ..... 1111111 @r
+rstsa32    1011001  ..... ..... 010 ..... 1111111 @r
+urstsa32   1101001  ..... ..... 010 ..... 1111111 @r
+kstsa32    1100001  ..... ..... 010 ..... 1111111 @r
+ukstsa32   1110001  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 8c47fd562b..ea673b3aca 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1090,3 +1090,70 @@ static bool trans_msubr32(DisasContext *ctx, arg_r *a)
     tcg_temp_free_i32(w3);
     return true;
 }
+
+#ifdef TARGET_RISCV64
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+static void tcg_gen_simd_add32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, a, ~0xffffffffull);
+    tcg_gen_add_i64(t2, a, b);
+    tcg_gen_add_i64(t1, t1, b);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+GEN_RVP_R_INLINE(add32, add, 2, trans_add);
+
+static void tcg_gen_simd_sub32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_sub_i64(t2, a, b);
+    tcg_gen_sub_i64(t1, a, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+GEN_RVP_R_INLINE(sub32, sub, 2, trans_sub);
+
+GEN_RVP_R_OOL(radd32);
+GEN_RVP_R_OOL(uradd32);
+GEN_RVP_R_OOL(kadd32);
+GEN_RVP_R_OOL(ukadd32);
+GEN_RVP_R_OOL(rsub32);
+GEN_RVP_R_OOL(ursub32);
+GEN_RVP_R_OOL(ksub32);
+GEN_RVP_R_OOL(uksub32);
+GEN_RVP_R_OOL(cras32);
+GEN_RVP_R_OOL(rcras32);
+GEN_RVP_R_OOL(urcras32);
+GEN_RVP_R_OOL(kcras32);
+GEN_RVP_R_OOL(ukcras32);
+GEN_RVP_R_OOL(crsa32);
+GEN_RVP_R_OOL(rcrsa32);
+GEN_RVP_R_OOL(urcrsa32);
+GEN_RVP_R_OOL(kcrsa32);
+GEN_RVP_R_OOL(ukcrsa32);
+GEN_RVP_R_OOL(stas32);
+GEN_RVP_R_OOL(rstas32);
+GEN_RVP_R_OOL(urstas32);
+GEN_RVP_R_OOL(kstas32);
+GEN_RVP_R_OOL(ukstas32);
+GEN_RVP_R_OOL(stsa32);
+GEN_RVP_R_OOL(rstsa32);
+GEN_RVP_R_OOL(urstsa32);
+GEN_RVP_R_OOL(kstsa32);
+GEN_RVP_R_OOL(ukstsa32);
+#endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 95e60da70b..bb56933c39 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2996,3 +2996,281 @@ static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(bpick, 1, sizeof(target_ulong));
+
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+#ifdef TARGET_RISCV64
+static inline void do_radd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd32, 1, 4);
+
+static inline void do_uradd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd32, 1, 4);
+
+static inline void do_kadd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd32(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd32, 1, 4);
+
+static inline void do_ukadd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu32(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd32, 1, 4);
+
+static inline void do_rsub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub32, 1, 4);
+
+static inline void do_ursub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub32, 1, 4);
+
+static inline void do_ksub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub32(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub32, 1, 4);
+
+static inline void do_uksub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu32(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub32, 1, 4);
+
+static inline void do_cras32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i)];
+}
+
+RVPR(cras32, 2, 4);
+
+static inline void do_rcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(rcras32, 2, 4);
+
+static inline void do_urcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(urcras32, 2, 4);
+
+static inline void do_kcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(kcras32, 2, 4);
+
+static inline void do_ukcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(ukcras32, 2, 4);
+
+static inline void do_crsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i)];
+}
+
+RVPR(crsa32, 2, 4);
+
+static inline void do_rcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(rcrsa32, 2, 4);
+
+static inline void do_urcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(urcrsa32, 2, 4);
+
+static inline void do_kcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(kcrsa32, 2, 4);
+
+static inline void do_ukcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(ukcrsa32, 2, 4);
+
+static inline void do_stas32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i + 1)];
+}
+
+RVPR(stas32, 2, 4);
+
+static inline void do_rstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(rstas32, 2, 4);
+
+static inline void do_urstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(urstas32, 2, 4);
+
+static inline void do_kstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(kstas32, 2, 4);
+
+static inline void do_ukstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(ukstas32, 2, 4);
+
+static inline void do_stsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i + 1)];
+}
+
+RVPR(stsa32, 2, 4);
+
+static inline void do_rstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(rstsa32, 2, 4);
+
+static inline void do_urstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(urstsa32, 2, 4);
+
+static inline void do_kstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(kstsa32, 2, 4);
+
+static inline void do_ukstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(ukstsa32, 2, 4);
+#endif
-- 
2.17.1
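
The inline add32/sub32 generators in this patch isolate the two 32-bit
lanes of a 64-bit register so that a carry or borrow out of the lower lane
cannot reach the upper lane. A plain C rendering of the same steps, as a
sketch only: the simd_* names are hypothetical, and the TCG deposit
operation is modelled with explicit masks.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the upper 32-bit lane; must be a 64-bit constant. */
#define HI_LANE (~0xffffffffull)

/* Lane-wise 32-bit add of two 64-bit values. */
static uint64_t simd_add32(uint64_t a, uint64_t b)
{
    uint64_t t1 = a & HI_LANE;   /* upper lane of a, lower lane cleared */
    uint64_t t2 = a + b;         /* low 32 bits hold the lower-lane sum */

    t1 += b;                     /* lower lane of t1 was 0: no carry up */
    /* deposit t2[31:0] into t1, keeping t1[63:32] */
    return (t1 & HI_LANE) | (t2 & 0xffffffffull);
}

/* Lane-wise 32-bit subtract of two 64-bit values. */
static uint64_t simd_sub32(uint64_t a, uint64_t b)
{
    uint64_t t1 = b & HI_LANE;   /* upper lane of b, lower lane cleared */
    uint64_t t2 = a - b;         /* low 32 bits hold the lower-lane diff */

    t1 = a - t1;                 /* subtracting 0 below: no borrow up */
    return (t1 & HI_LANE) | (t2 & 0xffffffffull);
}

/* Small self-check with a lower-lane carry and a lower-lane borrow. */
static void simd32_selfcheck(void)
{
    assert(simd_add32(0x00000001ffffffffull, 0x0000000100000001ull)
           == 0x0000000200000000ull);
    assert(simd_sub32(0x0000000500000000ull, 0x0000000200000001ull)
           == 0x00000003ffffffffull);
}
```

Note that the mask has to be formed as a 64-bit constant. In C,
`~0xffffffff` is evaluated as a 32-bit unsigned int on the usual ABIs and
yields 0, which would drop the upper lane of the masked operand.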



^ permalink raw reply related	[flat|nested] 150+ messages in thread
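
The cross (cras32/crsa32) and straight (stas32/stsa32) helpers in the patch
above differ only in which lane of the second operand feeds each lane of
the result. The sketch below restates the four plain (wrapping) variants on
a 64-bit value. It assumes lane 0 is bits [31:0] and lane 1 is bits
[63:32], which is what H4() amounts to on a little-endian host. The lane(),
pack32() and model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 32-bit lane n (0 = low word, 1 = high word). */
static uint32_t lane(uint64_t v, unsigned n)
{
    return (uint32_t)(v >> (32 * n));
}

/* Build a 64-bit value from high word w1 and low word w0. */
static uint64_t pack32(uint32_t w1, uint32_t w0)
{
    return ((uint64_t)w1 << 32) | w0;
}

/* cras32: W1 = a.W1 + b.W0, W0 = a.W0 - b.W1 (cross add, sub) */
static uint64_t model_cras32(uint64_t a, uint64_t b)
{
    return pack32(lane(a, 1) + lane(b, 0), lane(a, 0) - lane(b, 1));
}

/* crsa32: W1 = a.W1 - b.W0, W0 = a.W0 + b.W1 (cross sub, add) */
static uint64_t model_crsa32(uint64_t a, uint64_t b)
{
    return pack32(lane(a, 1) - lane(b, 0), lane(a, 0) + lane(b, 1));
}

/* stas32: W1 = a.W1 + b.W1, W0 = a.W0 - b.W0 (straight add, sub) */
static uint64_t model_stas32(uint64_t a, uint64_t b)
{
    return pack32(lane(a, 1) + lane(b, 1), lane(a, 0) - lane(b, 0));
}

/* stsa32: W1 = a.W1 - b.W1, W0 = a.W0 + b.W0 (straight sub, add) */
static uint64_t model_stsa32(uint64_t a, uint64_t b)
{
    return pack32(lane(a, 1) - lane(b, 1), lane(a, 0) + lane(b, 0));
}

/* Small self-check using a = {10, 20} and b = {3, 4} as {W1, W0}. */
static void lanes_selfcheck(void)
{
    uint64_t a = pack32(10, 20), b = pack32(3, 4);

    assert(model_cras32(a, b) == pack32(14, 17));
    assert(model_crsa32(a, b) == pack32(6, 23));
    assert(model_stas32(a, b) == pack32(13, 16));
    assert(model_stsa32(a, b) == pack32(7, 24));
}
```

The halving (r/ur) and saturating (k/uk) variants keep the same lane
routing and only replace the plain add and subtract with the corresponding
helper.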

* [PATCH 29/38] target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
@ 2021-02-12 15:02   ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-riscv, richard.henderson, alistair23, palmer, LIU Zhiwei

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  30 +++
 target/riscv/insn32-64.decode           |  32 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  67 ++++++
 target/riscv/packed_helper.c            | 278 ++++++++++++++++++++++++
 4 files changed, 407 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7b3f41866e..0ade207de6 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1396,3 +1396,33 @@ DEF_HELPER_3(sra_u, tl, env, tl, tl)
 DEF_HELPER_3(bitrev, tl, env, tl, tl)
 DEF_HELPER_3(wext, tl, env, i64, tl)
 DEF_HELPER_4(bpick, tl, env, tl, tl, tl)
+#ifdef TARGET_RISCV64
+DEF_HELPER_3(radd32, tl, env, tl, tl)
+DEF_HELPER_3(uradd32, tl, env, tl, tl)
+DEF_HELPER_3(kadd32, tl, env, tl, tl)
+DEF_HELPER_3(ukadd32, tl, env, tl, tl)
+DEF_HELPER_3(rsub32, tl, env, tl, tl)
+DEF_HELPER_3(ursub32, tl, env, tl, tl)
+DEF_HELPER_3(ksub32, tl, env, tl, tl)
+DEF_HELPER_3(uksub32, tl, env, tl, tl)
+DEF_HELPER_3(cras32, tl, env, tl, tl)
+DEF_HELPER_3(rcras32, tl, env, tl, tl)
+DEF_HELPER_3(urcras32, tl, env, tl, tl)
+DEF_HELPER_3(kcras32, tl, env, tl, tl)
+DEF_HELPER_3(ukcras32, tl, env, tl, tl)
+DEF_HELPER_3(crsa32, tl, env, tl, tl)
+DEF_HELPER_3(rcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(urcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(kcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(ukcrsa32, tl, env, tl, tl)
+DEF_HELPER_3(stas32, tl, env, tl, tl)
+DEF_HELPER_3(rstas32, tl, env, tl, tl)
+DEF_HELPER_3(urstas32, tl, env, tl, tl)
+DEF_HELPER_3(kstas32, tl, env, tl, tl)
+DEF_HELPER_3(ukstas32, tl, env, tl, tl)
+DEF_HELPER_3(stsa32, tl, env, tl, tl)
+DEF_HELPER_3(rstsa32, tl, env, tl, tl)
+DEF_HELPER_3(urstsa32, tl, env, tl, tl)
+DEF_HELPER_3(kstsa32, tl, env, tl, tl)
+DEF_HELPER_3(ukstsa32, tl, env, tl, tl)
+#endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 1094172210..66eec1a44a 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -82,3 +82,35 @@ fmv_d_x    1111001  00000 ..... 000 ..... 1010011 @r2
 hlv_wu    0110100  00001   ..... 100 ..... 1110011 @r2
 hlv_d     0110110  00000   ..... 100 ..... 1110011 @r2
 hsv_d     0110111  .....   ..... 100 00000 1110011 @r2_s
+
+# *** RV64P Standard Extension (in addition to RV32P) ***
+add32      0100000  ..... ..... 010 ..... 1111111 @r
+radd32     0000000  ..... ..... 010 ..... 1111111 @r
+uradd32    0010000  ..... ..... 010 ..... 1111111 @r
+kadd32     0001000  ..... ..... 010 ..... 1111111 @r
+ukadd32    0011000  ..... ..... 010 ..... 1111111 @r
+sub32      0100001  ..... ..... 010 ..... 1111111 @r
+rsub32     0000001  ..... ..... 010 ..... 1111111 @r
+ursub32    0010001  ..... ..... 010 ..... 1111111 @r
+ksub32     0001001  ..... ..... 010 ..... 1111111 @r
+uksub32    0011001  ..... ..... 010 ..... 1111111 @r
+cras32     0100010  ..... ..... 010 ..... 1111111 @r
+rcras32    0000010  ..... ..... 010 ..... 1111111 @r
+urcras32   0010010  ..... ..... 010 ..... 1111111 @r
+kcras32    0001010  ..... ..... 010 ..... 1111111 @r
+ukcras32   0011010  ..... ..... 010 ..... 1111111 @r
+crsa32     0100011  ..... ..... 010 ..... 1111111 @r
+rcrsa32    0000011  ..... ..... 010 ..... 1111111 @r
+urcrsa32   0010011  ..... ..... 010 ..... 1111111 @r
+kcrsa32    0001011  ..... ..... 010 ..... 1111111 @r
+ukcrsa32   0011011  ..... ..... 010 ..... 1111111 @r
+stas32     1111000  ..... ..... 010 ..... 1111111 @r
+rstas32    1011000  ..... ..... 010 ..... 1111111 @r
+urstas32   1101000  ..... ..... 010 ..... 1111111 @r
+kstas32    1100000  ..... ..... 010 ..... 1111111 @r
+ukstas32   1110000  ..... ..... 010 ..... 1111111 @r
+stsa32     1111001  ..... ..... 010 ..... 1111111 @r
+rstsa32    1011001  ..... ..... 010 ..... 1111111 @r
+urstsa32   1101001  ..... ..... 010 ..... 1111111 @r
+kstsa32    1100001  ..... ..... 010 ..... 1111111 @r
+ukstsa32   1110001  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 8c47fd562b..ea673b3aca 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1090,3 +1090,70 @@ static bool trans_msubr32(DisasContext *ctx, arg_r *a)
     tcg_temp_free_i32(w3);
     return true;
 }
+
+#ifdef TARGET_RISCV64
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+static void tcg_gen_simd_add32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, a, ~0xffffffffull);
+    tcg_gen_add_i64(t2, a, b);
+    tcg_gen_add_i64(t1, t1, b);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+GEN_RVP_R_INLINE(add32, add, 2, trans_add);
+
+static void tcg_gen_simd_sub32(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_sub_i64(t2, a, b);
+    tcg_gen_sub_i64(t1, a, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+GEN_RVP_R_INLINE(sub32, sub, 2, trans_sub);
+
+GEN_RVP_R_OOL(radd32);
+GEN_RVP_R_OOL(uradd32);
+GEN_RVP_R_OOL(kadd32);
+GEN_RVP_R_OOL(ukadd32);
+GEN_RVP_R_OOL(rsub32);
+GEN_RVP_R_OOL(ursub32);
+GEN_RVP_R_OOL(ksub32);
+GEN_RVP_R_OOL(uksub32);
+GEN_RVP_R_OOL(cras32);
+GEN_RVP_R_OOL(rcras32);
+GEN_RVP_R_OOL(urcras32);
+GEN_RVP_R_OOL(kcras32);
+GEN_RVP_R_OOL(ukcras32);
+GEN_RVP_R_OOL(crsa32);
+GEN_RVP_R_OOL(rcrsa32);
+GEN_RVP_R_OOL(urcrsa32);
+GEN_RVP_R_OOL(kcrsa32);
+GEN_RVP_R_OOL(ukcrsa32);
+GEN_RVP_R_OOL(stas32);
+GEN_RVP_R_OOL(rstas32);
+GEN_RVP_R_OOL(urstas32);
+GEN_RVP_R_OOL(kstas32);
+GEN_RVP_R_OOL(ukstas32);
+GEN_RVP_R_OOL(stsa32);
+GEN_RVP_R_OOL(rstsa32);
+GEN_RVP_R_OOL(urstsa32);
+GEN_RVP_R_OOL(kstsa32);
+GEN_RVP_R_OOL(ukstsa32);
+#endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 95e60da70b..bb56933c39 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -2996,3 +2996,281 @@ static inline void do_bpick(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(bpick, 1, sizeof(target_ulong));
+
+/*
+ *** RV64 Only Instructions
+ */
+/* (RV64 Only) SIMD 32-bit Add/Subtract Instructions */
+#ifdef TARGET_RISCV64
+static inline void do_radd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hadd32(a[i], b[i]);
+}
+
+RVPR(radd32, 1, 4);
+
+static inline void do_uradd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = haddu32(a[i], b[i]);
+}
+
+RVPR(uradd32, 1, 4);
+
+static inline void do_kadd32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = sadd32(env, 0, a[i], b[i]);
+}
+
+RVPR(kadd32, 1, 4);
+
+static inline void do_ukadd32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = saddu32(env, 0, a[i], b[i]);
+}
+
+RVPR(ukadd32, 1, 4);
+
+static inline void do_rsub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsub32(a[i], b[i]);
+}
+
+RVPR(rsub32, 1, 4);
+
+static inline void do_ursub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = hsubu64(a[i], b[i]);
+}
+
+RVPR(ursub32, 1, 4);
+
+static inline void do_ksub32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssub32(env, 0, a[i], b[i]);
+}
+
+RVPR(ksub32, 1, 4);
+
+static inline void do_uksub32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint16_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[i] = ssubu32(env, 0, a[i], b[i]);
+}
+
+RVPR(uksub32, 1, 4);
+
+static inline void do_cras32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i)];
+}
+
+RVPR(cras32, 2, 4);
+
+static inline void do_rcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(rcras32, 2, 4);
+
+static inline void do_urcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(urcras32, 2, 4);
+
+static inline void do_kcras32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(kcras32, 2, 4);
+
+static inline void do_ukcras32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(ukcras32, 2, 4);
+
+static inline void do_crsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i)];
+}
+
+RVPR(crsa32, 2, 4);
+
+static inline void do_rcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(rcrsa32, 2, 4);
+
+static inline void do_urcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(urcrsa32, 2, 4);
+
+static inline void do_kcrsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(kcrsa32, 2, 4);
+
+static inline void do_ukcrsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i + 1)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i)]);
+}
+
+RVPR(ukcrsa32, 2, 4);
+
+static inline void do_stas32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] - b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] + b[H4(i + 1)];
+}
+
+RVPR(stas32, 2, 4);
+
+static inline void do_rstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsub32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hadd32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(rstas32, 2, 4);
+
+static inline void do_urstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hsubu64(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = haddu32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(urstas32, 2, 4);
+
+static inline void do_kstas32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssub32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = sadd32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(kstas32, 2, 4);
+
+static inline void do_ukstas32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = ssubu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = saddu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(ukstas32, 2, 4);
+
+static inline void do_stsa32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = a[H4(i)] + b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)] - b[H4(i + 1)];
+}
+
+RVPR(stsa32, 2, 4);
+
+static inline void do_rstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = hadd32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsub32(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(rstsa32, 2, 4);
+
+static inline void do_urstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = haddu32(a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = hsubu64(a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(urstsa32, 2, 4);
+
+static inline void do_kstsa32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = sadd32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssub32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(kstsa32, 2, 4);
+
+static inline void do_ukstsa32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = saddu32(env, 0, a[H4(i)], b[H4(i)]);
+    d[H4(i + 1)] = ssubu32(env, 0, a[H4(i + 1)], b[H4(i + 1)]);
+}
+
+RVPR(ukstsa32, 2, 4);
+#endif
-- 
2.17.1




* [PATCH 30/38] target/riscv: RV64 Only SIMD 32-bit Shift Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |   9 ++
 target/riscv/insn32-64.decode           |  15 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
 target/riscv/packed_helper.c            | 104 ++++++++++++++++++++++++
 4 files changed, 144 insertions(+)
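
Reader's note: the saturating and bidirectional shifts added in this
patch can be modelled outside QEMU. What follows is a minimal sketch under
stated assumptions, not the helper code itself. The names
redundant_sign_bits32, sat_shl32, sra32 and kslra32_lane are hypothetical,
a bool flag stands in for env->vxsat, and a portable loop stands in for
clrsb32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the leading bits that merely repeat the sign bit (0..31). */
static int redundant_sign_bits32(int32_t x)
{
    uint32_t u = (uint32_t)x;
    int n = 0;

    /* A negative value has the same count as its bitwise complement. */
    if (x < 0) {
        u = ~u;
    }
    for (uint32_t mask = UINT32_C(1) << 30; mask != 0 && (u & mask) == 0;
         mask >>= 1) {
        n++;
    }
    return n;
}

/* Saturating left shift of one signed 32-bit lane; *sat is set on overflow. */
static int32_t sat_shl32(int32_t x, unsigned shift, bool *sat)
{
    assert(shift < 32);
    if ((int)shift > redundant_sign_bits32(x)) {
        *sat = true;
        return x < 0 ? INT32_MIN : INT32_MAX;
    }
    /* Shift as unsigned so that no signed overflow can occur. */
    return (int32_t)((uint32_t)x << shift);
}

/* Arithmetic right shift that does not rely on how the compiler treats >>. */
static int32_t sra32(int32_t x, unsigned n)
{
    assert(n < 32);
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* One lane of a KSLRA32-style shift: the low 6 bits of rs2 are signed. */
static int32_t kslra32_lane(int32_t x, uint32_t rs2, bool *sat)
{
    int amount = (int)(rs2 & 0x3f);

    if (amount >= 32) {
        amount -= 64;               /* sign-extend into [-32, 31] */
    }
    if (amount >= 0) {
        return sat_shl32(x, (unsigned)amount, sat);
    }
    amount = -amount;               /* 1..32 */
    if (amount == 32) {
        amount = 31;                /* a shift by 32 behaves like 31 */
    }
    return sra32(x, (unsigned)amount);
}
```

A negative shift amount selects the arithmetic right shift, so the sign of
the lane must be preserved on that path; the left-shift path saturates
instead of wrapping.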

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0ade207de6..673bc4f628 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1425,4 +1425,13 @@ DEF_HELPER_3(rstsa32, tl, env, tl, tl)
 DEF_HELPER_3(urstsa32, tl, env, tl, tl)
 DEF_HELPER_3(kstsa32, tl, env, tl, tl)
 DEF_HELPER_3(ukstsa32, tl, env, tl, tl)
+
+DEF_HELPER_3(sra32, tl, env, tl, tl)
+DEF_HELPER_3(sra32_u, tl, env, tl, tl)
+DEF_HELPER_3(srl32, tl, env, tl, tl)
+DEF_HELPER_3(srl32_u, tl, env, tl, tl)
+DEF_HELPER_3(sll32, tl, env, tl, tl)
+DEF_HELPER_3(ksll32, tl, env, tl, tl)
+DEF_HELPER_3(kslra32, tl, env, tl, tl)
+DEF_HELPER_3(kslra32_u, tl, env, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 66eec1a44a..6f0f2923ca 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -114,3 +114,18 @@ rstsa32    1011001  ..... ..... 010 ..... 1111111 @r
 urstsa32   1101001  ..... ..... 010 ..... 1111111 @r
 kstsa32    1100001  ..... ..... 010 ..... 1111111 @r
 ukstsa32   1110001  ..... ..... 010 ..... 1111111 @r
+
+sra32      0101000  ..... ..... 010 ..... 1111111 @r
+sra32_u    0110000  ..... ..... 010 ..... 1111111 @r
+srai32     0111000  ..... ..... 010 ..... 1111111 @sh5
+srai32_u   1000000  ..... ..... 010 ..... 1111111 @sh5
+srl32      0101001  ..... ..... 010 ..... 1111111 @r
+srl32_u    0110001  ..... ..... 010 ..... 1111111 @r
+srli32     0111001  ..... ..... 010 ..... 1111111 @sh5
+srli32_u   1000001  ..... ..... 010 ..... 1111111 @sh5
+sll32      0101010  ..... ..... 010 ..... 1111111 @r
+slli32     0111010  ..... ..... 010 ..... 1111111 @sh5
+ksll32     0110010  ..... ..... 010 ..... 1111111 @r
+kslli32    1000010  ..... ..... 010 ..... 1111111 @sh5
+kslra32    0101011  ..... ..... 010 ..... 1111111 @r
+kslra32_u  0110011  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index ea673b3aca..e52f268a57 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1156,4 +1156,20 @@ GEN_RVP_R_OOL(rstsa32);
 GEN_RVP_R_OOL(urstsa32);
 GEN_RVP_R_OOL(kstsa32);
 GEN_RVP_R_OOL(ukstsa32);
+
+/* (RV64 Only) SIMD 32-bit Shift Instructions */
+GEN_RVP_SHIFT(sra32, tcg_gen_gvec_sars, 2);
+GEN_RVP_SHIFTI(srai32, sra32, NULL);
+GEN_RVP_R_OOL(sra32_u);
+GEN_RVP_SHIFTI(srai32_u, sra32_u, NULL);
+GEN_RVP_SHIFT(srl32, tcg_gen_gvec_shrs, 2);
+GEN_RVP_SHIFTI(srli32, srl32, NULL);
+GEN_RVP_R_OOL(srl32_u);
+GEN_RVP_SHIFTI(srli32_u, srl32_u, NULL);
+GEN_RVP_SHIFT(sll32, tcg_gen_gvec_shls, 2);
+GEN_RVP_SHIFTI(slli32, sll32, NULL);
+GEN_RVP_R_OOL(ksll32);
+GEN_RVP_SHIFTI(kslli32, ksll32, NULL);
+GEN_RVP_R_OOL(kslra32);
+GEN_RVP_R_OOL(kslra32_u);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index bb56933c39..c168c51eff 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3273,4 +3273,108 @@ static inline void do_ukstsa32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(ukstsa32, 2, 4);
+
+/* (RV64 Only) SIMD 32-bit Shift Instructions */
+static inline void do_sra32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(sra32, 1, 4);
+
+static inline void do_srl32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] >> shift;
+}
+
+RVPR(srl32, 1, 4);
+
+static inline void do_sll32(CPURISCVState *env, void *vd, void *va,
+                            void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+    d[i] = a[i] << shift;
+}
+
+RVPR(sll32, 1, 4);
+
+static inline void do_sra32_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = vssra32(env, 0, a[i], shift);
+}
+
+RVPR(sra32_u, 1, 4);
+
+static inline void do_srl32_u(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    d[i] = vssrl32(env, 0, a[i], shift);
+}
+
+RVPR(srl32_u, 1, 4);
+
+static inline void do_ksll32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, result;
+    uint8_t shift = *(uint8_t *)vb & 0x1f;
+
+    result = a[i] << shift;
+    if (shift > clrsb32(a[i])) {
+        env->vxsat = 0x1;
+        d[i] = (a[i] & INT32_MIN) ? INT32_MIN : INT32_MAX;
+    } else {
+        d[i] = result;
+    }
+}
+
+RVPR(ksll32, 1, 4);
+
+static inline void do_kslra32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        do_ksll32(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        d[i] = a[i] >> shift;
+    }
+}
+
+RVPR(kslra32, 1, 4);
+
+static inline void do_kslra32_u(CPURISCVState *env, void *vd, void *va,
+                                void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va;
+    int32_t shift = sextract32((*(uint32_t *)vb), 0, 6);
+
+    if (shift >= 0) {
+        do_ksll32(env, vd, va, vb, i);
+    } else {
+        shift = -shift;
+        shift = (shift == 32) ? 31 : shift;
+        d[i] = vssra32(env, 0, a[i], shift);
+    }
+}
+
+RVPR(kslra32_u, 1, 4);
 #endif
-- 
2.17.1




* [PATCH 31/38] target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  6 +++
 target/riscv/insn32-64.decode           |  6 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
 target/riscv/packed_helper.c            | 55 +++++++++++++++++++++++++
 4 files changed, 74 insertions(+)
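
Reader's note: the saturating absolute value added in this patch can be
tried outside QEMU. A minimal sketch, assuming a 64-bit register that holds
two 32-bit lanes. The names sat_abs32 and kabs32_model are hypothetical,
and a bool flag stands in for env->vxsat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* |x| for one lane; INT32_MIN has no positive counterpart, so saturate. */
static int32_t sat_abs32(int32_t x, bool *sat)
{
    assert(sat != NULL);
    if (x == INT32_MIN) {
        *sat = true;
        return INT32_MAX;
    }
    return x < 0 ? -x : x;
}

/* Apply sat_abs32 to both 32-bit lanes of a 64-bit register value. */
static uint64_t kabs32_model(uint64_t rs1, bool *sat)
{
    uint64_t rd = 0;

    for (int lane = 0; lane < 2; lane++) {
        int32_t x = (int32_t)(uint32_t)(rs1 >> (32 * lane));
        uint32_t r = (uint32_t)sat_abs32(x, sat);

        rd |= (uint64_t)r << (32 * lane);
    }
    return rd;
}
```

The flag is sticky in this model: once any lane saturates it stays set,
which mirrors how the helpers only ever write 1 to vxsat.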

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 673bc4f628..384b42ce90 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1434,4 +1434,10 @@ DEF_HELPER_3(sll32, tl, env, tl, tl)
 DEF_HELPER_3(ksll32, tl, env, tl, tl)
 DEF_HELPER_3(kslra32, tl, env, tl, tl)
 DEF_HELPER_3(kslra32_u, tl, env, tl, tl)
+
+DEF_HELPER_3(smin32, tl, env, tl, tl)
+DEF_HELPER_3(umin32, tl, env, tl, tl)
+DEF_HELPER_3(smax32, tl, env, tl, tl)
+DEF_HELPER_3(umax32, tl, env, tl, tl)
+DEF_HELPER_2(kabs32, tl, env, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 6f0f2923ca..a2b8831467 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -129,3 +129,9 @@ ksll32     0110010  ..... ..... 010 ..... 1111111 @r
 kslli32    1000010  ..... ..... 010 ..... 1111111 @sh5
 kslra32    0101011  ..... ..... 010 ..... 1111111 @r
 kslra32_u  0110011  ..... ..... 010 ..... 1111111 @r
+
+smin32     1001000  ..... ..... 010 ..... 1111111 @r
+umin32     1010000  ..... ..... 010 ..... 1111111 @r
+smax32     1001001  ..... ..... 010 ..... 1111111 @r
+umax32     1010001  ..... ..... 010 ..... 1111111 @r
+kabs32     1010110  10010 ..... 000 ..... 1111111 @r2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index e52f268a57..ce144ee5c0 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1172,4 +1172,11 @@ GEN_RVP_R_OOL(ksll32);
 GEN_RVP_SHIFTI(kslli32, ksll32, NULL);
 GEN_RVP_R_OOL(kslra32);
 GEN_RVP_R_OOL(kslra32_u);
+
+/* (RV64 Only) SIMD 32-bit Miscellaneous Instructions */
+GEN_RVP_R_OOL(smin32);
+GEN_RVP_R_OOL(umin32);
+GEN_RVP_R_OOL(smax32);
+GEN_RVP_R_OOL(umax32);
+GEN_RVP_R2_OOL(kabs32);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index c168c51eff..c8a92f5b7d 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3377,4 +3377,59 @@ static inline void do_kslra32_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(kslra32_u, 1, 4);
+
+/* (RV64 Only) SIMD 32-bit Miscellaneous Instructions */
+static inline void do_smin32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(smin32, 1, 4);
+
+static inline void do_umin32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] < b[i]) ? a[i] : b[i];
+}
+
+RVPR(umin32, 1, 4);
+
+static inline void do_smax32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(smax32, 1, 4);
+
+static inline void do_umax32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+
+    d[i] = (a[i] > b[i]) ? a[i] : b[i];
+}
+
+RVPR(umax32, 1, 4);
+
+static inline void do_kabs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
+{
+    int32_t *d = vd, *a = va;
+
+    if (a[i] == INT32_MIN) {
+        d[i] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[i] = abs(a[i]);
+    }
+}
+
+RVPR2(kabs32, 1, 4);
 #endif
-- 
2.17.1




* [PATCH 32/38] target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  10 ++
 target/riscv/insn32-64.decode           |  10 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  11 ++
 target/riscv/packed_helper.c            | 141 ++++++++++++++++++++++++
 4 files changed, 172 insertions(+)
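
Reader's note: the two Q15 multiply flavours in this patch differ only in
how the 32-bit product is scaled. A minimal sketch, not the helper code:
the names q15_mul_high and q15_mul_double are hypothetical, a bool flag
stands in for env->vxsat, and the overflow case is tested explicitly
rather than through sat64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* KHM-style: Q15 x Q15 -> Q15, i.e. (a * b) >> 15 with saturation. */
static int16_t q15_mul_high(int16_t a, int16_t b, bool *sat)
{
    int32_t p;

    assert(sat != NULL);
    /* -1.0 * -1.0 = +1.0 is the only product that does not fit in Q15. */
    if (a == INT16_MIN && b == INT16_MIN) {
        *sat = true;
        return INT16_MAX;
    }
    p = (int32_t)a * b;
    /* Floor shift written so it does not depend on signed >> behaviour. */
    return (int16_t)(p < 0 ? ~(~p >> 15) : p >> 15);
}

/* KDM-style: Q15 x Q15 -> Q31, i.e. (a * b) << 1 with saturation. */
static int32_t q15_mul_double(int16_t a, int16_t b, bool *sat)
{
    assert(sat != NULL);
    /* Again only -1.0 * -1.0 overflows, this time past INT32_MAX. */
    if (a == INT16_MIN && b == INT16_MIN) {
        *sat = true;
        return INT32_MAX;
    }
    return (int32_t)a * b * 2;
}
```

For example, 0.5 * 0.5 (0x4000 * 0x4000) gives 0x2000 in Q15 and
0x20000000 in Q31, both representing 0.25.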

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 384b42ce90..f8521a5388 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1440,4 +1440,14 @@ DEF_HELPER_3(umin32, tl, env, tl, tl)
 DEF_HELPER_3(smax32, tl, env, tl, tl)
 DEF_HELPER_3(umax32, tl, env, tl, tl)
 DEF_HELPER_2(kabs32, tl, env, tl)
+
+DEF_HELPER_3(khmbb16, tl, env, tl, tl)
+DEF_HELPER_3(khmbt16, tl, env, tl, tl)
+DEF_HELPER_3(khmtt16, tl, env, tl, tl)
+DEF_HELPER_3(kdmbb16, tl, env, tl, tl)
+DEF_HELPER_3(kdmbt16, tl, env, tl, tl)
+DEF_HELPER_3(kdmtt16, tl, env, tl, tl)
+DEF_HELPER_4(kdmabb16, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmabt16, tl, env, tl, tl, tl)
+DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index a2b8831467..2e1c1817e4 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -135,3 +135,13 @@ umin32     1010000  ..... ..... 010 ..... 1111111 @r
 smax32     1001001  ..... ..... 010 ..... 1111111 @r
 umax32     1010001  ..... ..... 010 ..... 1111111 @r
 kabs32     1010110  10010 ..... 000 ..... 1111111 @r2
+
+khmbb16    1101110  ..... ..... 001 ..... 1111111 @r
+khmbt16    1110110  ..... ..... 001 ..... 1111111 @r
+khmtt16    1111110  ..... ..... 001 ..... 1111111 @r
+kdmbb16    1101101  ..... ..... 001 ..... 1111111 @r
+kdmbt16    1110101  ..... ..... 001 ..... 1111111 @r
+kdmtt16    1111101  ..... ..... 001 ..... 1111111 @r
+kdmabb16   1101100  ..... ..... 001 ..... 1111111 @r
+kdmabt16   1110100  ..... ..... 001 ..... 1111111 @r
+kdmatt16   1111100  ..... ..... 001 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index ce144ee5c0..2b4418abd8 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1179,4 +1179,15 @@ GEN_RVP_R_OOL(umin32);
 GEN_RVP_R_OOL(smax32);
 GEN_RVP_R_OOL(umax32);
 GEN_RVP_R2_OOL(kabs32);
+
+/* (RV64 Only) SIMD Q15 saturating Multiply Instructions */
+GEN_RVP_R_OOL(khmbb16);
+GEN_RVP_R_OOL(khmbt16);
+GEN_RVP_R_OOL(khmtt16);
+GEN_RVP_R_OOL(kdmbb16);
+GEN_RVP_R_OOL(kdmbt16);
+GEN_RVP_R_OOL(kdmtt16);
+GEN_RVP_R_ACC_OOL(kdmabb16);
+GEN_RVP_R_ACC_OOL(kdmabt16);
+GEN_RVP_R_ACC_OOL(kdmatt16);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index c8a92f5b7d..5636848aaf 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3432,4 +3432,145 @@ static inline void do_kabs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
 }
 
 RVPR2(kabs32, 1, 4);
+
+/* (RV64 Only) SIMD Q15 saturating Multiply Instructions */
+static inline void do_khmbb16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i)] * b[H2(i)] >> 15, 15);
+}
+
+RVPR(khmbb16, 2, 2);
+
+static inline void do_khmbt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmbt16, 2, 2);
+
+static inline void do_khmtt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    d[H4(i / 2)] = sat64(env, (int64_t)a[H2(i + 1)] * b[H2(i + 1)] >> 15, 15);
+}
+
+RVPR(khmtt16, 2, 2);
+
+static inline void do_kdmbb16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+}
+
+RVPR(kdmbb16, 2, 2);
+
+static inline void do_kdmbt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmbt16, 2, 2);
+
+static inline void do_kdmtt16(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        d[H4(i / 2)] = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        d[H4(i / 2)] = (int64_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+}
+
+RVPR(kdmtt16, 2, 2);
+
+static inline void do_kdmabb16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmabb16, 2, 2);
+
+static inline void do_kdmabt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i)] * b[H2(i + 1)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmabt16, 2, 2);
+
+static inline void do_kdmatt16(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+
+{
+    int32_t *d = vd;
+    int16_t *a = va, *b = vb;
+    int32_t *c = vc, m0;
+
+    if (a[H2(i + 1)] == INT16_MIN && b[H2(i + 1)] == INT16_MIN) {
+        m0 = INT32_MAX;
+        env->vxsat = 0x1;
+    } else {
+        m0 = (int32_t)a[H2(i + 1)] * b[H2(i + 1)] << 1;
+    }
+    d[H4(i / 2)] = sadd32(env, 0, c[H4(i / 2)], m0);
+}
+
+RVPR_ACC(kdmatt16, 2, 2);
+
+
 #endif
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 33/38] target/riscv: RV64 Only 32-bit Multiply Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
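For readers of the archive, the two operations added here can be sketched
in standalone C. This is only a reference model under stated assumptions,
not the QEMU helper: smbt32_ref, smtt32_ref, bottom32, top32 and
smul32_demo are hypothetical names, and the semantics are inferred from
the helper code in this patch (SMBT32 multiplies the low 32-bit word of
rs1 by the high word of rs2, SMTT32 multiplies the two high words, and
both produce a signed 64-bit product).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical accessors: signed low/high 32-bit word of a 64-bit register.
 * The unsigned-to-signed conversion is implementation-defined in C; this
 * sketch assumes the usual two's-complement wrap-around.
 */
static inline int32_t bottom32(uint64_t r)
{
    return (int32_t)(uint32_t)r;
}

static inline int32_t top32(uint64_t r)
{
    return (int32_t)(uint32_t)(r >> 32);
}

/* Assumed SMBT32 semantics: rs1.W[0] * rs2.W[1], signed 64-bit result. */
int64_t smbt32_ref(uint64_t rs1, uint64_t rs2)
{
    return (int64_t)bottom32(rs1) * top32(rs2);
}

/* Assumed SMTT32 semantics: rs1.W[1] * rs2.W[1], signed 64-bit result. */
int64_t smtt32_ref(uint64_t rs1, uint64_t rs2)
{
    return (int64_t)top32(rs1) * top32(rs2);
}

/* Small self-check with hand-computed values. */
void smul32_demo(void)
{
    uint64_t rs1 = 0x00000005FFFFFFFEull; /* W[1] = 5, W[0] = -2 */
    uint64_t rs2 = 0x0000000300000007ull; /* W[1] = 3, W[0] = 7  */

    assert(smbt32_ref(rs1, rs2) == -6);   /* -2 * 3 */
    assert(smtt32_ref(rs1, rs2) == 15);   /*  5 * 3 */
}
```

A 32x32 signed product always fits in 64 bits, which would explain why
these two helpers need no saturation.
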
 target/riscv/helper.h                   |  3 +++
 target/riscv/insn32-64.decode           |  3 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  4 ++++
 target/riscv/packed_helper.c            | 19 +++++++++++++++++++
 4 files changed, 29 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f8521a5388..198b010601 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1450,4 +1450,7 @@ DEF_HELPER_3(kdmtt16, tl, env, tl, tl)
 DEF_HELPER_4(kdmabb16, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmabt16, tl, env, tl, tl, tl)
 DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(smbt32, tl, env, tl, tl)
+DEF_HELPER_3(smtt32, tl, env, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 2e1c1817e4..46a4e5d080 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -145,3 +145,6 @@ kdmtt16    1111101  ..... ..... 001 ..... 1111111 @r
 kdmabb16   1101100  ..... ..... 001 ..... 1111111 @r
 kdmabt16   1110100  ..... ..... 001 ..... 1111111 @r
 kdmatt16   1111100  ..... ..... 001 ..... 1111111 @r
+
+smbt32     0001100  ..... ..... 010 ..... 1111111 @r
+smtt32     0010100  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 2b4418abd8..33435c3a9e 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1190,4 +1190,8 @@ GEN_RVP_R_OOL(kdmtt16);
 GEN_RVP_R_ACC_OOL(kdmabb16);
 GEN_RVP_R_ACC_OOL(kdmabt16);
 GEN_RVP_R_ACC_OOL(kdmatt16);
+
+/* (RV64 Only) 32-bit Multiply Instructions */
+GEN_RVP_R_OOL(smbt32);
+GEN_RVP_R_OOL(smtt32);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 5636848aaf..11b41637a1 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3572,5 +3572,24 @@ static inline void do_kdmatt16(CPURISCVState *env, void *vd, void *va,
 
 RVPR_ACC(kdmatt16, 2, 2);
 
+/* (RV64 Only) 32-bit Multiply Instructions */
+static inline void do_smbt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(2 * i)] * b[H4(2 * i + 1)];
+}
+
+RVPR(smbt32, 1, sizeof(target_ulong));
+
+static inline void do_smtt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(2 * i + 1)] * b[H4(2 * i + 1)];
+}
 
+RVPR(smtt32, 1, sizeof(target_ulong));
 #endif
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 34/38] target/riscv: RV64 Only 32-bit Multiply & Add Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
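The multiply-and-add helpers in this patch accumulate a 32x32 product
into a 64-bit value with signed saturation. A minimal standalone sketch
of that idea follows. It is not the QEMU code: sat_add64, kmabb32_ref and
kma32_demo are hypothetical names, the bool flag stands in for
env->vxsat, and sat_add64 is only assumed to behave like the sadd64()
helper used by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed 64-bit add that clamps on overflow and records that it did so.
 * The flag is sticky: it is set on saturation and never cleared here.
 */
int64_t sat_add64(int64_t a, int64_t b, bool *sat)
{
    if (b > 0 && a > INT64_MAX - b) {
        *sat = true;
        return INT64_MAX;
    }
    if (b < 0 && a < INT64_MIN - b) {
        *sat = true;
        return INT64_MIN;
    }
    return a + b;
}

/* Assumed KMABB32 semantics: rd + rs1.W[0] * rs2.W[0], saturated. */
int64_t kmabb32_ref(int64_t rd, uint64_t rs1, uint64_t rs2, bool *sat)
{
    int32_t a = (int32_t)(uint32_t)rs1; /* low word of rs1 */
    int32_t b = (int32_t)(uint32_t)rs2; /* low word of rs2 */

    return sat_add64(rd, (int64_t)a * b, sat);
}

/* Small self-check with hand-computed values. */
void kma32_demo(void)
{
    bool sat = false;

    assert(kmabb32_ref(10, 3, 4, &sat) == 22 && !sat);
    assert(kmabb32_ref(INT64_MAX, 2, 2, &sat) == INT64_MAX && sat);
}
```

The "bt" and "tt" variants would differ only in which 32-bit words are
selected before the multiply.
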
 target/riscv/helper.h                   |  4 ++++
 target/riscv/insn32-64.decode           |  4 ++++
 target/riscv/insn_trans/trans_rvp.c.inc |  5 ++++
 target/riscv/packed_helper.c            | 31 +++++++++++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 198b010601..05f7c1d811 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1453,4 +1453,8 @@ DEF_HELPER_4(kdmatt16, tl, env, tl, tl, tl)
 
 DEF_HELPER_3(smbt32, tl, env, tl, tl)
 DEF_HELPER_3(smtt32, tl, env, tl, tl)
+
+DEF_HELPER_4(kmabb32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmabt32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmatt32, tl, env, tl, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 46a4e5d080..c5b07a2667 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -148,3 +148,7 @@ kdmatt16   1111100  ..... ..... 001 ..... 1111111 @r
 
 smbt32     0001100  ..... ..... 010 ..... 1111111 @r
 smtt32     0010100  ..... ..... 010 ..... 1111111 @r
+
+kmabb32    0101101  ..... ..... 010 ..... 1111111 @r
+kmabt32    0110101  ..... ..... 010 ..... 1111111 @r
+kmatt32    0111101  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 33435c3a9e..da6a4ba14a 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1194,4 +1194,9 @@ GEN_RVP_R_ACC_OOL(kdmatt16);
 /* (RV64 Only) 32-bit Multiply Instructions */
 GEN_RVP_R_OOL(smbt32);
 GEN_RVP_R_OOL(smtt32);
+
+/* (RV64 Only) 32-bit Multiply & Add Instructions */
+GEN_RVP_R_ACC_OOL(kmabb32);
+GEN_RVP_R_ACC_OOL(kmabt32);
+GEN_RVP_R_ACC_OOL(kmatt32);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 11b41637a1..99da28a4b3 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3592,4 +3592,35 @@ static inline void do_smtt32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(smtt32, 1, sizeof(target_ulong));
+
+/* (RV64 Only) 32-bit Multiply & Add Instructions */
+static inline void do_kmabb32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i)] * b[H4(2 * i)], *c);
+}
+
+RVPR_ACC(kmabb32, 1, sizeof(target_ulong));
+
+static inline void do_kmabt32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i)] * b[H4(2 * i + 1)], *c);
+}
+
+RVPR_ACC(kmabt32, 1, sizeof(target_ulong));
+
+static inline void do_kmatt32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    *d = sadd64(env, 0, (int64_t)a[H4(2 * i + 1)] * b[H4(2 * i + 1)], *c);
+}
+
+RVPR_ACC(kmatt32, 1, sizeof(target_ulong));
 #endif
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 35/38] target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
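A note on the saturation checks in this patch: the sum of two signed
32x32 products can exceed INT64_MAX only when all four operands are
INT32_MIN, since 2^62 + 2^62 = 2^63. The standalone sketch below
illustrates that for KMDA32 and KMXDA32. It is a reference model under
assumptions, not the QEMU helper: word32, kmda32_ref, kmxda32_ref and
kmda32_demo are hypothetical names, and the bool flag stands in for
env->vxsat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: signed 32-bit word n (0 = low, 1 = high). */
static inline int32_t word32(uint64_t r, int n)
{
    return (int32_t)(uint32_t)(r >> (32 * n));
}

/* True only in the single case where the two-product sum overflows. */
static inline bool all_int32_min(uint64_t rs1, uint64_t rs2)
{
    return word32(rs1, 0) == INT32_MIN && word32(rs1, 1) == INT32_MIN &&
           word32(rs2, 0) == INT32_MIN && word32(rs2, 1) == INT32_MIN;
}

/* Assumed KMDA32 semantics: W[0]*W[0] + W[1]*W[1], saturated. */
int64_t kmda32_ref(uint64_t rs1, uint64_t rs2, bool *sat)
{
    if (all_int32_min(rs1, rs2)) {
        *sat = true;
        return INT64_MAX;
    }
    return (int64_t)word32(rs1, 0) * word32(rs2, 0) +
           (int64_t)word32(rs1, 1) * word32(rs2, 1);
}

/* Assumed KMXDA32 semantics: crossed products W[0]*W[1] + W[1]*W[0]. */
int64_t kmxda32_ref(uint64_t rs1, uint64_t rs2, bool *sat)
{
    if (all_int32_min(rs1, rs2)) {
        *sat = true;
        return INT64_MAX;
    }
    return (int64_t)word32(rs1, 0) * word32(rs2, 1) +
           (int64_t)word32(rs1, 1) * word32(rs2, 0);
}

/* Small self-check with hand-computed values. */
void kmda32_demo(void)
{
    bool sat = false;
    uint64_t rs1 = 0x0000000200000003ull; /* W[1] = 2, W[0] = 3 */
    uint64_t rs2 = 0x0000000400000005ull; /* W[1] = 4, W[0] = 5 */
    uint64_t min = 0x8000000080000000ull; /* both words INT32_MIN */

    assert(kmda32_ref(rs1, rs2, &sat) == 23 && !sat);  /* 3*5 + 2*4 */
    assert(kmxda32_ref(rs1, rs2, &sat) == 22 && !sat); /* 3*4 + 2*5 */
    assert(kmda32_ref(min, min, &sat) == INT64_MAX && sat);
}
```

The accumulating forms additionally fold in the old rd value, so in the
all-INT32_MIN case the outcome depends on the sign of the accumulator.
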
 target/riscv/helper.h                   |  12 ++
 target/riscv/insn32-64.decode           |  12 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  13 ++
 target/riscv/packed_helper.c            | 182 ++++++++++++++++++++++++
 4 files changed, 219 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 05f7c1d811..85290a2b05 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1457,4 +1457,16 @@ DEF_HELPER_3(smtt32, tl, env, tl, tl)
 DEF_HELPER_4(kmabb32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmabt32, tl, env, tl, tl, tl)
 DEF_HELPER_4(kmatt32, tl, env, tl, tl, tl)
+
+DEF_HELPER_3(kmda32, tl, env, tl, tl)
+DEF_HELPER_3(kmxda32, tl, env, tl, tl)
+DEF_HELPER_4(kmaxda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmads32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmadrs32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmaxds32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsda32, tl, env, tl, tl, tl)
+DEF_HELPER_4(kmsxda32, tl, env, tl, tl, tl)
+DEF_HELPER_3(smds32, tl, env, tl, tl)
+DEF_HELPER_3(smdrs32, tl, env, tl, tl)
+DEF_HELPER_3(smxds32, tl, env, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index c5b07a2667..ccdd965963 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -152,3 +152,15 @@ smtt32     0010100  ..... ..... 010 ..... 1111111 @r
 kmabb32    0101101  ..... ..... 010 ..... 1111111 @r
 kmabt32    0110101  ..... ..... 010 ..... 1111111 @r
 kmatt32    0111101  ..... ..... 010 ..... 1111111 @r
+
+kmda32     0011100  ..... ..... 010 ..... 1111111 @r
+kmxda32    0011101  ..... ..... 010 ..... 1111111 @r
+kmaxda32   0100101  ..... ..... 010 ..... 1111111 @r
+kmads32    0101110  ..... ..... 010 ..... 1111111 @r
+kmadrs32   0110110  ..... ..... 010 ..... 1111111 @r
+kmaxds32   0111110  ..... ..... 010 ..... 1111111 @r
+kmsda32    0100110  ..... ..... 010 ..... 1111111 @r
+kmsxda32   0100111  ..... ..... 010 ..... 1111111 @r
+smds32     0101100  ..... ..... 010 ..... 1111111 @r
+smdrs32    0110100  ..... ..... 010 ..... 1111111 @r
+smxds32    0111100  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index da6a4ba14a..d2000bcfb5 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1199,4 +1199,17 @@ GEN_RVP_R_OOL(smtt32);
 GEN_RVP_R_ACC_OOL(kmabb32);
 GEN_RVP_R_ACC_OOL(kmabt32);
 GEN_RVP_R_ACC_OOL(kmatt32);
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+GEN_RVP_R_OOL(kmda32);
+GEN_RVP_R_OOL(kmxda32);
+GEN_RVP_R_ACC_OOL(kmaxda32);
+GEN_RVP_R_ACC_OOL(kmads32);
+GEN_RVP_R_ACC_OOL(kmadrs32);
+GEN_RVP_R_ACC_OOL(kmaxds32);
+GEN_RVP_R_ACC_OOL(kmsda32);
+GEN_RVP_R_ACC_OOL(kmsxda32);
+GEN_RVP_R_OOL(smds32);
+GEN_RVP_R_OOL(smdrs32);
+GEN_RVP_R_OOL(smxds32);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 99da28a4b3..bd24d5145a 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3623,4 +3623,186 @@ static inline void do_kmatt32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR_ACC(kmatt32, 1, sizeof(target_ulong));
+
+/* (RV64 Only) 32-bit Parallel Multiply & Add Instructions */
+static inline void do_kmda32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    }
+}
+
+RVPR(kmda32, 1, sizeof(target_ulong));
+
+static inline void do_kmxda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    if (a[H4(i)] == INT32_MIN && b[H4(i)] == INT32_MIN &&
+        a[H4(i + 1)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        *d = INT64_MAX;
+        env->vxsat = 0x1;
+    } else {
+        *d = (int64_t)a[H4(i)] * b[H4(i + 1)] +
+             (int64_t)a[H4(i + 1)] * b[H4(i)];
+    }
+}
+
+RVPR(kmxda32, 1, sizeof(target_ulong));
+
+static inline void do_kmaxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t p1, p2;
+    p1 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    p2 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            *d = (INT64_MAX + *c) + 1ll;
+        } else {
+            env->vxsat = 0x1;
+            *d = INT64_MAX;
+        }
+    } else {
+        *d = sadd64(env, 0, p1 + p2, *c);
+    }
+}
+
+RVPR_ACC(kmaxda32, 1, sizeof(target_ulong));
+
+static inline void do_kmads32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t1 - t0, *c);
+}
+
+RVPR_ACC(kmads32, 1, sizeof(target_ulong));
+
+static inline void do_kmadrs32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t0 - t1, *c);
+}
+
+RVPR_ACC(kmadrs32, 1, sizeof(target_ulong));
+
+static inline void do_kmaxds32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+
+    *d = sadd64(env, 0, t10 - t01, *c);
+}
+
+RVPR_ACC(kmaxds32, 1, sizeof(target_ulong));
+
+static inline void do_kmsda32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t0, t1;
+    t0 = (int64_t)a[H4(i)] * b[H4(i)];
+    t1 = (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, t0 + t1, *c);
+    }
+}
+
+RVPR_ACC(kmsda32, 1, sizeof(target_ulong));
+
+static inline void do_kmsxda32(CPURISCVState *env, void *vd, void *va,
+                               void *vb, void *vc, uint8_t i)
+{
+    int64_t *d = vd, *c = vc;
+    int32_t *a = va, *b = vb;
+    int64_t t01, t10;
+    t10 = (int64_t)a[H4(i + 1)] * b[H4(i)];
+    t01 = (int64_t)a[H4(i)] * b[H4(i + 1)];
+
+    if (a[H4(i)] == INT32_MIN && a[H4(i + 1)] == INT32_MIN &&
+        b[H4(i)] == INT32_MIN && b[H4(i + 1)] == INT32_MIN) {
+        if (*c < 0) {
+            env->vxsat = 0x1;
+            *d = INT64_MIN;
+        } else {
+            *d = *c - 1ll - INT64_MAX;
+        }
+    } else {
+        *d = ssub64(env, 0, t10 + t01, *c);
+    }
+}
+
+RVPR_ACC(kmsxda32, 1, sizeof(target_ulong));
+
+static inline void do_smds32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i + 1)] -
+         (int64_t)a[H4(i)] * b[H4(i)];
+}
+
+RVPR(smds32, 1, sizeof(target_ulong));
+
+static inline void do_smdrs32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i)] * b[H4(i)] -
+         (int64_t)a[H4(i + 1)] * b[H4(i + 1)];
+}
+
+RVPR(smdrs32, 1, sizeof(target_ulong));
+
+static inline void do_smxds32(CPURISCVState *env, void *vd, void *va,
+                              void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va, *b = vb;
+    *d = (int64_t)a[H4(i + 1)] * b[H4(i)] -
+         (int64_t)a[H4(i)] * b[H4(i + 1)];
+}
+
+RVPR(smxds32, 1, sizeof(target_ulong));
 #endif
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 36/38] target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  2 ++
 target/riscv/insn32-64.decode           |  2 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  3 +++
 target/riscv/packed_helper.c            | 13 +++++++++++++
 4 files changed, 20 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 85290a2b05..d3dd1fb248 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1469,4 +1469,6 @@ DEF_HELPER_4(kmsxda32, tl, env, tl, tl, tl)
 DEF_HELPER_3(smds32, tl, env, tl, tl)
 DEF_HELPER_3(smdrs32, tl, env, tl, tl)
 DEF_HELPER_3(smxds32, tl, env, tl, tl)
+
+DEF_HELPER_3(sraiw_u, tl, env, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index ccdd965963..32066d3ac2 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -164,3 +164,5 @@ kmsxda32   0100111  ..... ..... 010 ..... 1111111 @r
 smds32     0101100  ..... ..... 010 ..... 1111111 @r
 smdrs32    0110100  ..... ..... 010 ..... 1111111 @r
 smxds32    0111100  ..... ..... 010 ..... 1111111 @r
+
+sraiw_u    0011010  ..... ..... 001 ..... 1111111 @sh5
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index d2000bcfb5..57827d2e15 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1212,4 +1212,7 @@ GEN_RVP_R_ACC_OOL(kmsxda32);
 GEN_RVP_R_OOL(smds32);
 GEN_RVP_R_OOL(smdrs32);
 GEN_RVP_R_OOL(smxds32);
+
+/* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
+GEN_RVP_SHIFTI(sraiw_u, sraiw_u, NULL);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index bd24d5145a..69a7788e99 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3805,4 +3805,17 @@ static inline void do_smxds32(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(smxds32, 1, sizeof(target_ulong));
+
+/* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
+static inline void do_sraiw_u(CPURISCVState *env, void *vd, void *va,
+                         void *vb, uint8_t i)
+{
+    int64_t *d = vd;
+    int32_t *a = va;
+    uint8_t shift = *(uint8_t *)vb;
+
+    *d = vssra32(env, 0, a[H4(i)], shift);
+}
+
+RVPR(sraiw_u, 1, sizeof(target_ulong));
 #endif
-- 
2.17.1
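
The rounding shift that do_sraiw_u delegates to vssra32 can be modeled in a few lines. This is a sketch under assumptions: `model_sraiw_u` is a hypothetical name, the `0` passed to vssra32 is taken to mean round-to-nearest-up, and signed right shift is assumed to be arithmetic (implementation-defined in ISO C, but what mainstream compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Rounding arithmetic shift right of a 32-bit value, sign-extended to 64
 * bits. Half of the discarded range is added before shifting; the addition
 * is done in 64 bits so it cannot overflow. */
static int64_t model_sraiw_u(int32_t a, unsigned shift)
{
    assert(shift < 32);                 /* imm5 field: 0..31 */
    if (shift == 0) {
        return a;                       /* nothing is discarded, no rounding */
    }
    int64_t wide = (int64_t)a + ((int64_t)1 << (shift - 1));
    return wide >> shift;               /* assumes arithmetic shift */
}
```

For example, shifting 5 right by 1 rounds 2.5 up to 3, and shifting -5 right by 1 rounds -2.5 up to -2.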



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 37/38] target/riscv: RV64 Only 32-bit Packing Instructions
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  5 +++
 target/riscv/insn32-64.decode           |  5 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  6 ++++
 target/riscv/packed_helper.c            | 41 +++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d3dd1fb248..6e9c205481 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1471,4 +1471,9 @@ DEF_HELPER_3(smdrs32, tl, env, tl, tl)
 DEF_HELPER_3(smxds32, tl, env, tl, tl)
 
 DEF_HELPER_3(sraiw_u, tl, env, tl, tl)
+
+DEF_HELPER_3(pkbb32, tl, env, tl, tl)
+DEF_HELPER_3(pkbt32, tl, env, tl, tl)
+DEF_HELPER_3(pktt32, tl, env, tl, tl)
+DEF_HELPER_3(pktb32, tl, env, tl, tl)
 #endif
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 32066d3ac2..62cfd74830 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -166,3 +166,8 @@ smdrs32    0110100  ..... ..... 010 ..... 1111111 @r
 smxds32    0111100  ..... ..... 010 ..... 1111111 @r
 
 sraiw_u    0011010  ..... ..... 001 ..... 1111111 @sh5
+
+pkbb32     0000111  ..... ..... 010 ..... 1111111 @r
+pkbt32     0001111  ..... ..... 010 ..... 1111111 @r
+pktt32     0010111  ..... ..... 010 ..... 1111111 @r
+pktb32     0011111  ..... ..... 010 ..... 1111111 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 57827d2e15..868b308ed6 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -1215,4 +1215,10 @@ GEN_RVP_R_OOL(smxds32);
 
 /* (RV64 Only) Non-SIMD 32-bit Shift Instructions */
 GEN_RVP_SHIFTI(sraiw_u, sraiw_u, NULL);
+
+/* (RV64 Only) 32-bit Packing Instructions */
+GEN_RVP_R_OOL(pkbb32);
+GEN_RVP_R_OOL(pkbt32);
+GEN_RVP_R_OOL(pktt32);
+GEN_RVP_R_OOL(pktb32);
 #endif
diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
index 69a7788e99..e9add8fe5b 100644
--- a/target/riscv/packed_helper.c
+++ b/target/riscv/packed_helper.c
@@ -3818,4 +3818,45 @@ static inline void do_sraiw_u(CPURISCVState *env, void *vd, void *va,
 }
 
 RVPR(sraiw_u, 1, sizeof(target_ulong));
+
+/* (RV64 Only)  32-bit packing instructions here */
+static inline void do_pkbb32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i)];
+    d[H4(i + 1)] = a[H4(i)];
+}
+
+RVPR(pkbb32, 2, 4);
+
+static inline void do_pkbt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i)];
+}
+
+RVPR(pkbt32, 2, 4);
+
+static inline void do_pktb32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i)];
+    d[H4(i + 1)] = a[H4(i + 1)];
+}
+
+RVPR(pktb32, 2, 4);
+
+static inline void do_pktt32(CPURISCVState *env, void *vd, void *va,
+                             void *vb, uint8_t i)
+{
+    uint32_t *d = vd, *a = va, *b = vb;
+    d[H4(i)] = b[H4(i + 1)];
+    d[H4(i + 1)] = a[H4(i + 1)];
+}
+
+RVPR(pktt32, 2, 4);
 #endif
-- 
2.17.1
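
The four packing helpers in patch 37 differ only in which 32-bit half of each source they select. The sketch below restates that selection on plain uint64_t values. The `model_*`, `lo32`, `hi32`, and `pack32` names are hypothetical, and it assumes element i is the low half of the register and element i + 1 the high half.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lo32(uint64_t x) { return (uint32_t)x; }
static uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }

/* Build a 64-bit value from a top half and a bottom half. */
static uint64_t pack32(uint32_t top, uint32_t bottom)
{
    return ((uint64_t)top << 32) | bottom;
}

/* In every variant the first source supplies the top half of the result
 * and the second source supplies the bottom half; the two letters say
 * which half (b = bottom, t = top) is taken from each source. */
static uint64_t model_pkbb32(uint64_t a, uint64_t b) { return pack32(lo32(a), lo32(b)); }
static uint64_t model_pkbt32(uint64_t a, uint64_t b) { return pack32(lo32(a), hi32(b)); }
static uint64_t model_pktb32(uint64_t a, uint64_t b) { return pack32(hi32(a), lo32(b)); }
static uint64_t model_pktt32(uint64_t a, uint64_t b) { return pack32(hi32(a), hi32(b)); }
```

With a = 0x1111111122222222 and b = 0x3333333344444444, the pkbt32 model yields 0x2222222233333333: the bottom of a on top, the top of b on the bottom.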



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* [PATCH 38/38] target/riscv: configure and turn on packed extension from command line
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-02-12 15:02   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, LIU Zhiwei, qemu-riscv, palmer, alistair23

The packed extension is off by default. To use it:
1. select the rv32 or rv64 cpu model
2. turn it on from the command line, e.g.
   "-cpu rv64,x-p=true,Zp64=true,pext_spec=v0.9.2".

Zp64 controls whether the Zp64 extension is supported; it defaults to true.
pext_spec is the packed specification version; it defaults to v0.9.2.
These properties can be set to other values.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1b99f629ec..a94cef1cd1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -573,13 +573,16 @@ static Property riscv_cpu_properties[] = {
     /* This is experimental so mark with 'x-' */
     DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
     DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false),
+    DEFINE_PROP_BOOL("x-p", RISCVCPU, cfg.ext_p, false),
     DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
     DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
     DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
     DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
+    DEFINE_PROP_STRING("pext_spec", RISCVCPU, cfg.pext_spec),
     DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
     DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
     DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
+    DEFINE_PROP_BOOL("Zp64", RISCVCPU, cfg.ext_p64, true),
     DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
     DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
     DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC),
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-02-12 18:03     ` Richard Henderson
  -1 siblings, 0 replies; 150+ messages in thread
From: Richard Henderson @ 2021-02-12 18:03 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2/12/21 7:02 AM, LIU Zhiwei wrote:
> +    if (a->rd && a->rs1 && a->rs2) {
> +#ifdef TARGET_RISCV64
> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
> +            offsetof(CPURISCVState, gpr[a->rs1]),
> +            offsetof(CPURISCVState, gpr[a->rs2]),
> +            8, 8);
> +#else

This is not legal tcg.

You cannot reference as memory anything which has an associated tcg_global_mem,
which is true for all of the gprs -- see riscv_translate_init.


r~


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 03/38] target/riscv: Fixup saturate subtract function
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-02-12 18:52     ` Richard Henderson
  -1 siblings, 0 replies; 150+ messages in thread
From: Richard Henderson @ 2021-02-12 18:52 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2/12/21 7:02 AM, LIU Zhiwei wrote:
> The overflow predicate ((a - b) ^ a) & (a ^ b) & INT64_MIN is right.
> However, when the predicate is true and a is 0, it should return maximum.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/vector_helper.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
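
The corner case described in the quoted commit message can be reproduced with a standalone saturating subtract. This is a sketch, not the code from vector_helper.c: `model_ssub64` is a hypothetical name, and the wrapped difference is computed in unsigned arithmetic to avoid signed-overflow undefined behavior (the conversion back to int64_t assumes two's complement). The point it illustrates is that the saturation direction must be chosen with `a >= 0`, so that a == 0 saturates to the maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Saturating 64-bit subtract a - b. Overflow occurs exactly when a and b
 * have different signs and the wrapped result's sign differs from a's. */
static int64_t model_ssub64(int64_t a, int64_t b, bool *sat)
{
    assert(sat);
    int64_t res = (int64_t)((uint64_t)a - (uint64_t)b); /* wrapped a - b */
    if ((res ^ a) & (a ^ b) & INT64_MIN) {
        /* a >= 0 (including a == 0) means the true result is too large. */
        res = a >= 0 ? INT64_MAX : INT64_MIN;
        *sat = true;                                     /* sticky flag */
    }
    return res;
}
```

The case a = 0, b = INT64_MIN is the one the fix targets: the true difference is 2^63, which must saturate to INT64_MAX rather than INT64_MIN.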


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-02-12 19:02     ` Richard Henderson
  -1 siblings, 0 replies; 150+ messages in thread
From: Richard Henderson @ 2021-02-12 19:02 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2/12/21 7:02 AM, LIU Zhiwei wrote:
> +static void tcg_gen_simd_add16(TCGv d, TCGv a, TCGv b)
> +{
> +    TCGv t1 = tcg_temp_new();
> +    TCGv t2 = tcg_temp_new();
> +
> +    tcg_gen_andi_tl(t1, a, ~0xffff);
> +    tcg_gen_add_tl(t2, a, b);
> +    tcg_gen_add_tl(t1, t1, b);
> +    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
> +
> +    tcg_temp_free(t1);
> +    tcg_temp_free(t2);
> +}

I will note that there are some helper functions, e.g. tcg_gen_vec_add16_i64
(see the end of include/tcg/tcg-op-gvec.h), but those are explicitly i64, and
you'll still need these for rv32.


r~
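
The lane-wise add discussed in this message can be tried on plain integers. The sketch below has two parts, both with hypothetical names: `add16x2` mirrors the quoted andi/add/add/deposit sequence for two 16-bit lanes in a 32-bit word, and `add16x4` is the classic masked add for four lanes in a 64-bit word. The latter is offered as an illustration of the general technique; whether tcg_gen_vec_add16_i64 is implemented exactly this way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit lanes in 32 bits, following the quoted TCG sequence. */
static uint32_t add16x2(uint32_t a, uint32_t b)
{
    /* a's low lane is cleared first, so adding b cannot carry into the
     * upper lane: the upper 16 bits of hi are the correct upper sum. */
    uint32_t hi = (a & ~0xffffu) + b;
    /* The low 16 bits of a plain add are the correct lower sum. */
    uint32_t lo = a + b;
    /* Equivalent of deposit(hi, lo, 0, 16). */
    return (hi & ~0xffffu) | (lo & 0xffffu);
}

/* Four 16-bit lanes in 64 bits, using the masked-add technique. */
static uint64_t add16x4(uint64_t a, uint64_t b)
{
    const uint64_t msb = 0x8000800080008000ULL; /* top bit of each lane */
    /* With the top bits cleared, carries stop inside each lane. */
    uint64_t sum = (a & ~msb) + (b & ~msb);
    /* Add the top bits back without carry, i.e. with XOR. */
    return sum ^ ((a ^ b) & msb);
}
```

In both functions a lane that overflows wraps around on its own, for example 0xffff + 0x0001 gives 0x0000 in that lane and leaves the neighbouring lane untouched.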


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-12 18:03     ` Richard Henderson
@ 2021-02-18  8:39       ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-18  8:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2021/2/13 2:03, Richard Henderson wrote:
> On 2/12/21 7:02 AM, LIU Zhiwei wrote:
>> +    if (a->rd && a->rs1 && a->rs2) {
>> +#ifdef TARGET_RISCV64
>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
>> +            offsetof(CPURISCVState, gpr[a->rs1]),
>> +            offsetof(CPURISCVState, gpr[a->rs2]),
>> +            8, 8);
>> +#else
> This is not legal tcg.
>
> You cannot reference as memory anything which has an associated tcg_global_mem.
Thanks.

Do you mean that referencing a global TCGTemp as memory will cause an
inconsistency between TCGContext::temps and the corresponding
CPUArchState field?

Zhiwei
>   Which is true for all of the gprs -- see riscv_translate_init.
>
>
> r~



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-12 19:02     ` Richard Henderson
@ 2021-02-18  8:47       ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-02-18  8:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alistair23, qemu-riscv, palmer



On 2021/2/13 3:02, Richard Henderson wrote:
> On 2/12/21 7:02 AM, LIU Zhiwei wrote:
>> +static void tcg_gen_simd_add16(TCGv d, TCGv a, TCGv b)
>> +{
>> +    TCGv t1 = tcg_temp_new();
>> +    TCGv t2 = tcg_temp_new();
>> +
>> +    tcg_gen_andi_tl(t1, a, ~0xffff);
>> +    tcg_gen_add_tl(t2, a, b);
>> +    tcg_gen_add_tl(t1, t1, b);
>> +    tcg_gen_deposit_tl(d, t1, t2, 0, 16);
>> +
>> +    tcg_temp_free(t1);
>> +    tcg_temp_free(t2);
>> +}
> I will note that there are some helper functions, e.g. tcg_gen_vec_add16_i64
> (see the end of include/tcg/tcg-op-gvec.h), but those are explicitly i64, and
> you'll still need these for rv32.
Hi Richard,

Yes, that's really what I need.
Would you mind continuing to review the other patches in v1? Or should I
first send a v2 to fix the current errors?

Zhiwei
>
> r~



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-18  8:39       ` LIU Zhiwei
@ 2021-02-18 16:20         ` Richard Henderson
  -1 siblings, 0 replies; 150+ messages in thread
From: Richard Henderson @ 2021-02-18 16:20 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2/18/21 12:39 AM, LIU Zhiwei wrote:
> On 2021/2/13 2:03, Richard Henderson wrote:
>> On 2/12/21 7:02 AM, LIU Zhiwei wrote:
>>> +    if (a->rd && a->rs1 && a->rs2) {
>>> +#ifdef TARGET_RISCV64
>>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
>>> +            offsetof(CPURISCVState, gpr[a->rs1]),
>>> +            offsetof(CPURISCVState, gpr[a->rs2]),
>>> +            8, 8);
>>> +#else
>> This is not legal tcg.
>>
>> You cannot reference as memory anything which has an associated tcg_global_mem.
> Thanks.
> 
> Do you mean that referencing a global TCGTemp as memory can leave
> TCGContext::temps and the CPUArchState field inconsistent?

Yes, there is nothing that will keep them in sync.
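
As a toy illustration of the hazard (not QEMU code, and assuming only that a
value tracked as a global temp may live in a host register and be written
back to its CPUArchState slot lazily), consider this hypothetical model. The
type ToyGlobal and the toy_* functions are invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest register tracked by a code generator. */
typedef struct {
    uint32_t env_slot;  /* backing field in the CPU state structure */
    uint32_t cached;    /* value currently held in a host register */
    bool dirty;         /* cached value not yet written back */
} ToyGlobal;

/* Write through the temp: only the cached copy is updated. */
static void toy_write_via_temp(ToyGlobal *g, uint32_t v)
{
    g->cached = v;
    g->dirty = true;
}

/* Access by offset into the state structure: bypasses the cached copy. */
static uint32_t toy_read_via_memory(const ToyGlobal *g)
{
    return g->env_slot;
}

/* Explicit write-back, which the memory-style access never triggers. */
static void toy_sync(ToyGlobal *g)
{
    if (g->dirty) {
        g->env_slot = g->cached;
        g->dirty = false;
    }
}
```

In this model, a write through the temp followed by a memory-style read
returns the stale slot contents until toy_sync runs, which is the kind of
mismatch described above.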


r~


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions
  2021-02-18  8:47       ` LIU Zhiwei
@ 2021-02-18 16:21         ` Richard Henderson
  -1 siblings, 0 replies; 150+ messages in thread
From: Richard Henderson @ 2021-02-18 16:21 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: alistair23, qemu-riscv, palmer

On 2/18/21 12:47 AM, LIU Zhiwei wrote:
> Would you mind continuing to review the other patches in v1? Or should I first
> send a v2 to fix the current errors?

Yes, I can have a look through the others.

r~



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 00/38] target/riscv: support packed extension v0.9.2
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-03-05  6:14   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-03-05  6:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, qemu-riscv, palmer, alistair23

ping

On 2021/2/12 23:02, LIU Zhiwei wrote:
> This patchset implements the packed extension for RISC-V on QEMU.
>
> This patchset have passed all my direct Linux user mode cases(RV64) and
> bare metal cases(RV32) on X86-64 Ubuntu host machine. I will later push
> these test cases to my repo(https://github.com/romanheros/qemu.git
> branch:packed-upstream-v1).
>
> I have ported packed extension on RISU, but I didn't find a simulator or
> hardware to compare with. If anyone have one, please let me know.
>
> Features:
>    * support specification packed extension v0.9.2(https://github.com/riscv/riscv-p-spec/)
>    * support basic packed extension.
>    * support Zp64.
>
> LIU Zhiwei (38):
>    target/riscv: implementation-defined constant parameters
>    target/riscv: Hoist vector functions
>    target/riscv: Fixup saturate subtract function
>    target/riscv: 16-bit Addition & Subtraction Instructions
>    target/riscv: 8-bit Addition & Subtraction Instruction
>    target/riscv: SIMD 16-bit Shift Instructions
>    target/riscv: SIMD 8-bit Shift Instructions
>    target/riscv: SIMD 16-bit Compare Instructions
>    target/riscv: SIMD 8-bit Compare Instructions
>    target/riscv: SIMD 16-bit Multiply Instructions
>    target/riscv: SIMD 8-bit Multiply Instructions
>    target/riscv: SIMD 16-bit Miscellaneous Instructions
>    target/riscv: SIMD 8-bit Miscellaneous Instructions
>    target/riscv: 8-bit Unpacking Instructions
>    target/riscv: 16-bit Packing Instructions
>    target/riscv: Signed MSW 32x32 Multiply and Add Instructions
>    target/riscv: Signed MSW 32x16 Multiply and Add Instructions
>    target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
>    target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
>    target/riscv: Partial-SIMD Miscellaneous Instructions
>    target/riscv: 8-bit Multiply with 32-bit Add Instructions
>    target/riscv: 64-bit Add/Subtract Instructions
>    target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
>    target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
>      Instructions
>    target/riscv: Non-SIMD Q15 saturation ALU Instructions
>    target/riscv: Non-SIMD Q31 saturation ALU Instructions
>    target/riscv: 32-bit Computation Instructions
>    target/riscv: Non-SIMD Miscellaneous Instructions
>    target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
>    target/riscv: RV64 Only SIMD 32-bit Shift Instructions
>    target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
>    target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
>    target/riscv: RV64 Only 32-bit Multiply Instructions
>    target/riscv: RV64 Only 32-bit Multiply & Add Instructions
>    target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
>    target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
>    target/riscv: RV64 Only 32-bit Packing Instructions
>    target/riscv: configure and turn on packed extension from command line
>
>   target/riscv/cpu.c                      |   32 +
>   target/riscv/cpu.h                      |    6 +
>   target/riscv/helper.h                   |  332 ++
>   target/riscv/insn32-64.decode           |   93 +-
>   target/riscv/insn32.decode              |  285 ++
>   target/riscv/insn_trans/trans_rvp.c.inc | 1224 +++++++
>   target/riscv/internals.h                |   50 +
>   target/riscv/meson.build                |    1 +
>   target/riscv/packed_helper.c            | 3862 +++++++++++++++++++++++
>   target/riscv/translate.c                |    3 +
>   target/riscv/vector_helper.c            |   90 +-
>   11 files changed, 5912 insertions(+), 66 deletions(-)
>   create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
>   create mode 100644 target/riscv/packed_helper.c
>



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 01/38] target/riscv: implementation-defined constant parameters
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-09 14:08     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-09 14:08 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:05 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> ext_p64 controls whether the Zp64 extension is supported on RV32; it defaults to true.
> pext_ver is the packed extension specification version; it defaults to v0.9.2.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/cpu.c       | 29 +++++++++++++++++++++++++++++
>  target/riscv/cpu.h       |  6 ++++++
>  target/riscv/translate.c |  2 ++
>  3 files changed, 37 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 16f1a34238..1b99f629ec 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -132,6 +132,11 @@ static void set_vext_version(CPURISCVState *env, int vext_ver)
>      env->vext_ver = vext_ver;
>  }
>
> +static void set_pext_version(CPURISCVState *env, int pext_ver)
> +{
> +    env->pext_ver = pext_ver;
> +}
> +
>  static void set_feature(CPURISCVState *env, int feature)
>  {
>      env->features |= (1ULL << feature);
> @@ -380,6 +385,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>      RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
>      int priv_version = PRIV_VERSION_1_11_0;
>      int vext_version = VEXT_VERSION_0_07_1;
> +    int pext_version = PEXT_VERSION_0_09_2;
>      target_ulong target_misa = env->misa;
>      Error *local_err = NULL;
>
> @@ -404,6 +410,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>
>      set_priv_version(env, priv_version);
>      set_vext_version(env, vext_version);
> +    set_pext_version(env, pext_version);
>
>      if (cpu->cfg.mmu) {
>          set_feature(env, RISCV_FEATURE_MMU);
> @@ -511,6 +518,28 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
>              }
>              set_vext_version(env, vext_version);
>          }
> +        if (cpu->cfg.ext_p) {
> +            target_misa |= RVP;
> +            if (cpu->cfg.pext_spec) {
> +                if (!g_strcmp0(cpu->cfg.pext_spec, "v0.9.2")) {
> +                    pext_version = PEXT_VERSION_0_09_2;
> +                } else {
> +                    error_setg(errp,
> +                               "Unsupported packed spec version '%s'",
> +                               cpu->cfg.pext_spec);
> +                    return;
> +                }
> +            } else {
> +                qemu_log("packed version is not specified, "
> +                         "use the default value v0.9.2\n");
> +            }
> +            if (!cpu->cfg.ext_p64 && env->misa == RV64) {
> +                error_setg(errp, "For RV64, the Zp64 instructions will be "
> +                                 "included in the baseline P extension.");
> +                return;
> +            }
> +            set_pext_version(env, pext_version);
> +        }
>
>          set_misa(env, target_misa);
>      }
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 02758ae0eb..f458722646 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -68,6 +68,7 @@
>  #define RVF RV('F')
>  #define RVD RV('D')
>  #define RVV RV('V')
> +#define RVP RV('P')
>  #define RVC RV('C')
>  #define RVS RV('S')
>  #define RVU RV('U')
> @@ -87,6 +88,7 @@ enum {
>  #define PRIV_VERSION_1_11_0 0x00011100
>
>  #define VEXT_VERSION_0_07_1 0x00000701
> +#define PEXT_VERSION_0_09_2 0x00000902
>
>  enum {
>      TRANSLATE_SUCCESS,
> @@ -134,6 +136,7 @@ struct CPURISCVState {
>
>      target_ulong priv_ver;
>      target_ulong vext_ver;
> +    target_ulong pext_ver;
>      target_ulong misa;
>      target_ulong misa_mask;
>
> @@ -288,13 +291,16 @@ struct RISCVCPU {
>          bool ext_u;
>          bool ext_h;
>          bool ext_v;
> +        bool ext_p;
>          bool ext_counters;
>          bool ext_ifencei;
>          bool ext_icsr;
> +        bool ext_p64;
>
>          char *priv_spec;
>          char *user_spec;
>          char *vext_spec;
> +        char *pext_spec;
>          uint16_t vlen;
>          uint16_t elen;
>          bool mmu;
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 0f28b5f41e..eb810efec6 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -56,6 +56,7 @@ typedef struct DisasContext {
>         to reset this known value.  */
>      int frm;
>      bool ext_ifencei;
> +    bool ext_p64;
>      bool hlsx;
>      /* vector extension */
>      bool vill;
> @@ -824,6 +825,7 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>      ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL);
>      ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
>      ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
> +    ctx->ext_p64 = cpu->cfg.ext_p64;
>      ctx->cs = cs;
>  }
>
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 02/38] target/riscv: Hoist vector functions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-09 14:10     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-09 14:10 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:07 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The saturating add, subtract, and shift functions can also
> be used by the packed extension, so hoist them up.

A better title might be:

target/riscv: Make the vector helper functions public

Otherwise:

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair
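
For readers following along, the overflow checks visible in the quoted hunks
below (res < a for unsigned, the sign-bit XOR test for signed) can be sketched
in standalone C. This is an illustrative model only: the names model_saddu8
and model_sadd8 and the bool *sat flag are hypothetical stand-ins, and the
clamping values are an assumption about the elided helper bodies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned saturating add: a wrapped sum is smaller than either operand. */
static uint8_t model_saddu8(uint8_t a, uint8_t b, bool *sat)
{
    uint8_t res = (uint8_t)(a + b);

    if (res < a) {
        res = UINT8_MAX;   /* assumed clamp value */
        *sat = true;       /* hypothetical stand-in for a saturation flag */
    }
    return res;
}

/*
 * Signed saturating add: overflow happened when the result's sign
 * differs from the sign of both operands. The narrowing conversion of
 * an out-of-range sum is implementation-defined in C; this sketch
 * assumes the usual two's-complement wraparound.
 */
static int8_t model_sadd8(int8_t a, int8_t b, bool *sat)
{
    int8_t res = (int8_t)(a + b);

    if ((res ^ a) & (res ^ b) & INT8_MIN) {
        res = a > 0 ? INT8_MAX : INT8_MIN;   /* assumed clamp values */
        *sat = true;
    }
    return res;
}
```

Under those assumptions, model_saddu8(200, 100, &sat) clamps to 255 and
model_sadd8(-100, -100, &sat) clamps to -128, each setting the flag.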

>
> The endianness handling macros are hoisted as well.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/internals.h     | 50 ++++++++++++++++++++++
>  target/riscv/vector_helper.c | 82 +++++++++++-------------------------
>  2 files changed, 74 insertions(+), 58 deletions(-)
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index b15ad394bb..698158e116 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -58,4 +58,54 @@ static inline float32 check_nanbox_s(uint64_t f)
>      }
>  }
>
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that needs a host-endian fixup.
> + */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif
> +
> +/* share functions about saturation */
> +int8_t sadd8(CPURISCVState *, int vxrm, int8_t, int8_t);
> +int16_t sadd16(CPURISCVState *, int vxrm, int16_t, int16_t);
> +int32_t sadd32(CPURISCVState *, int vxrm, int32_t, int32_t);
> +int64_t sadd64(CPURISCVState *, int vxrm, int64_t, int64_t);
> +
> +uint8_t saddu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
> +uint16_t saddu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
> +uint32_t saddu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
> +uint64_t saddu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
> +
> +int8_t ssub8(CPURISCVState *, int vxrm, int8_t, int8_t);
> +int16_t ssub16(CPURISCVState *, int vxrm, int16_t, int16_t);
> +int32_t ssub32(CPURISCVState *, int vxrm, int32_t, int32_t);
> +int64_t ssub64(CPURISCVState *, int vxrm, int64_t, int64_t);
> +
> +uint8_t ssubu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
> +uint16_t ssubu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
> +uint32_t ssubu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
> +uint64_t ssubu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
> +
> +/* share shift functions */
> +int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b);
> +int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b);
> +int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b);
> +int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b);
> +uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b);
> +uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b);
> +uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b);
> +uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b);
>  #endif
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index a156573d28..9371d70f6b 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -56,26 +56,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      return vl;
>  }
>
> -/*
> - * Note that vector data is stored in host-endian 64-bit chunks,
> - * so addressing units smaller than that needs a host-endian fixup.
> - */
> -#ifdef HOST_WORDS_BIGENDIAN
> -#define H1(x)   ((x) ^ 7)
> -#define H1_2(x) ((x) ^ 6)
> -#define H1_4(x) ((x) ^ 4)
> -#define H2(x)   ((x) ^ 3)
> -#define H4(x)   ((x) ^ 1)
> -#define H8(x)   ((x))
> -#else
> -#define H1(x)   (x)
> -#define H1_2(x) (x)
> -#define H1_4(x) (x)
> -#define H2(x)   (x)
> -#define H4(x)   (x)
> -#define H8(x)   (x)
> -#endif
> -
>  static inline uint32_t vext_nf(uint32_t desc)
>  {
>      return FIELD_EX32(simd_data(desc), VDATA, NF);
> @@ -2199,7 +2179,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
>                   do_##NAME, CLEAR_FN);                          \
>  }
>
> -static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t res = a + b;
>      if (res < a) {
> @@ -2209,8 +2189,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      return res;
>  }
>
> -static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
> -                               uint16_t b)
> +uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint16_t res = a + b;
>      if (res < a) {
> @@ -2220,8 +2199,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
>      return res;
>  }
>
> -static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
> -                               uint32_t b)
> +uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint32_t res = a + b;
>      if (res < a) {
> @@ -2231,8 +2209,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
>      return res;
>  }
>
> -static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
> -                               uint64_t b)
> +uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint64_t res = a + b;
>      if (res < a) {
> @@ -2328,7 +2305,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
>
> -static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      int8_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT8_MIN) {
> @@ -2338,7 +2315,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      return res;
>  }
>
> -static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      int16_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT16_MIN) {
> @@ -2348,7 +2325,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      return res;
>  }
>
> -static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      int32_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT32_MIN) {
> @@ -2358,7 +2335,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      return res;
>  }
>
> -static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      int64_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT64_MIN) {
> @@ -2386,7 +2363,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
>
> -static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t res = a - b;
>      if (res > a) {
> @@ -2396,8 +2373,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      return res;
>  }
>
> -static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
> -                               uint16_t b)
> +uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint16_t res = a - b;
>      if (res > a) {
> @@ -2407,8 +2383,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
>      return res;
>  }
>
> -static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
> -                               uint32_t b)
> +uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint32_t res = a - b;
>      if (res > a) {
> @@ -2418,8 +2393,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
>      return res;
>  }
>
> -static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
> -                               uint64_t b)
> +uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint64_t res = a - b;
>      if (res > a) {
> @@ -2447,7 +2421,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
>
> -static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      int8_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT8_MIN) {
> @@ -2457,7 +2431,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      return res;
>  }
>
> -static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      int16_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT16_MIN) {
> @@ -2467,7 +2441,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      return res;
>  }
>
> -static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      int32_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT32_MIN) {
> @@ -2477,7 +2451,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      return res;
>  }
>
> -static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      int64_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT64_MIN) {
> @@ -2918,8 +2892,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
>  GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
>
>  /* Vector Single-Width Scaling Shift Instructions */
> -static inline uint8_t
> -vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t round, shift = b & 0x7;
>      uint8_t res;
> @@ -2928,8 +2901,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint16_t
> -vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
> +uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint8_t round, shift = b & 0xf;
>      uint16_t res;
> @@ -2938,8 +2910,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint32_t
> -vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
> +uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint8_t round, shift = b & 0x1f;
>      uint32_t res;
> @@ -2948,8 +2919,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint64_t
> -vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
> +uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint8_t round, shift = b & 0x3f;
>      uint64_t res;
> @@ -2976,8 +2946,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
>
> -static inline int8_t
> -vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      uint8_t round, shift = b & 0x7;
>      int8_t res;
> @@ -2986,8 +2955,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int16_t
> -vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      uint8_t round, shift = b & 0xf;
>      int16_t res;
> @@ -2996,8 +2964,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int32_t
> -vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      uint8_t round, shift = b & 0x1f;
>      int32_t res;
> @@ -3006,8 +2973,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int64_t
> -vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      uint8_t round, shift = b & 0x3f;
>      int64_t res;
> --
> 2.17.1
>
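
The overflow tests in the saturating-add helpers quoted above can be tried outside QEMU. The following is only a sketch under stated assumptions: the names sat_addu8 and sat_add8 are hypothetical, a plain bool flag stands in for env->vxsat, the unused vxrm argument is dropped, and the clamp values are assumed because the diff context elides the bodies of the if branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned: the wrapped sum is smaller than an operand iff a carry was lost. */
static uint8_t sat_addu8(uint8_t a, uint8_t b, bool *sat)
{
    uint8_t res = a + b;               /* wraps modulo 256 */
    if (res < a) {
        res = UINT8_MAX;               /* assumed clamp value */
        *sat = true;                   /* stand-in for env->vxsat */
    }
    return res;
}

/* Signed: overflow iff the result's sign differs from both operands' signs. */
static int8_t sat_add8(int8_t a, int8_t b, bool *sat)
{
    /* Narrowing wraps on GCC and Clang (implementation-defined before C23). */
    int8_t res = a + b;
    if ((res ^ a) & (res ^ b) & INT8_MIN) {
        res = a > 0 ? INT8_MAX : INT8_MIN;   /* assumed clamp values */
        *sat = true;
    }
    return res;
}

/* Small usage example covering both overflow directions. */
static void sat_add_demo(void)
{
    bool sat = false;
    assert(sat_add8(100, 100, &sat) == INT8_MAX && sat);
    sat = false;
    assert(sat_add8(-100, -100, &sat) == INT8_MIN && sat);
    sat = false;
    assert(sat_add8(100, -100, &sat) == 0 && !sat);
    assert(sat_addu8(200, 100, &sat) == UINT8_MAX && sat);
}
```

For addition, overflow requires both operands to have the same sign, so a is never 0 when the signed test fires; the choice between a > 0 and a >= 0 only matters for subtraction.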


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 02/38] target/riscv: Hoist vector functions
@ 2021-03-09 14:10     ` Alistair Francis
  0 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-09 14:10 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, Palmer Dabbelt

On Fri, Feb 12, 2021 at 10:07 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The saturating add, subtract, and shift functions can also be used by
> the packed extension, so hoist them up.

A better title might be:

target/riscv: Make the vector helper functions public

Otherwise:

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

>
> The endianness fixup macros are also hoisted.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/internals.h     | 50 ++++++++++++++++++++++
>  target/riscv/vector_helper.c | 82 +++++++++++-------------------------
>  2 files changed, 74 insertions(+), 58 deletions(-)
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index b15ad394bb..698158e116 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -58,4 +58,54 @@ static inline float32 check_nanbox_s(uint64_t f)
>      }
>  }
>
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that needs a host-endian fixup.
> + */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif
> +
> +/* share functions about saturation */
> +int8_t sadd8(CPURISCVState *, int vxrm, int8_t, int8_t);
> +int16_t sadd16(CPURISCVState *, int vxrm, int16_t, int16_t);
> +int32_t sadd32(CPURISCVState *, int vxrm, int32_t, int32_t);
> +int64_t sadd64(CPURISCVState *, int vxrm, int64_t, int64_t);
> +
> +uint8_t saddu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
> +uint16_t saddu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
> +uint32_t saddu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
> +uint64_t saddu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
> +
> +int8_t ssub8(CPURISCVState *, int vxrm, int8_t, int8_t);
> +int16_t ssub16(CPURISCVState *, int vxrm, int16_t, int16_t);
> +int32_t ssub32(CPURISCVState *, int vxrm, int32_t, int32_t);
> +int64_t ssub64(CPURISCVState *, int vxrm, int64_t, int64_t);
> +
> +uint8_t ssubu8(CPURISCVState *, int vxrm, uint8_t, uint8_t);
> +uint16_t ssubu16(CPURISCVState *, int vxrm, uint16_t, uint16_t);
> +uint32_t ssubu32(CPURISCVState *, int vxrm, uint32_t, uint32_t);
> +uint64_t ssubu64(CPURISCVState *, int vxrm, uint64_t, uint64_t);
> +
> +/* share shift functions */
> +int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b);
> +int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b);
> +int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b);
> +int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b);
> +uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b);
> +uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b);
> +uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b);
> +uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b);
>  #endif
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index a156573d28..9371d70f6b 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -56,26 +56,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      return vl;
>  }
>
> -/*
> - * Note that vector data is stored in host-endian 64-bit chunks,
> - * so addressing units smaller than that needs a host-endian fixup.
> - */
> -#ifdef HOST_WORDS_BIGENDIAN
> -#define H1(x)   ((x) ^ 7)
> -#define H1_2(x) ((x) ^ 6)
> -#define H1_4(x) ((x) ^ 4)
> -#define H2(x)   ((x) ^ 3)
> -#define H4(x)   ((x) ^ 1)
> -#define H8(x)   ((x))
> -#else
> -#define H1(x)   (x)
> -#define H1_2(x) (x)
> -#define H1_4(x) (x)
> -#define H2(x)   (x)
> -#define H4(x)   (x)
> -#define H8(x)   (x)
> -#endif
> -
>  static inline uint32_t vext_nf(uint32_t desc)
>  {
>      return FIELD_EX32(simd_data(desc), VDATA, NF);
> @@ -2199,7 +2179,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,     \
>                   do_##NAME, CLEAR_FN);                          \
>  }
>
> -static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t res = a + b;
>      if (res < a) {
> @@ -2209,8 +2189,7 @@ static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      return res;
>  }
>
> -static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
> -                               uint16_t b)
> +uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint16_t res = a + b;
>      if (res < a) {
> @@ -2220,8 +2199,7 @@ static inline uint16_t saddu16(CPURISCVState *env, int vxrm, uint16_t a,
>      return res;
>  }
>
> -static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
> -                               uint32_t b)
> +uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint32_t res = a + b;
>      if (res < a) {
> @@ -2231,8 +2209,7 @@ static inline uint32_t saddu32(CPURISCVState *env, int vxrm, uint32_t a,
>      return res;
>  }
>
> -static inline uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a,
> -                               uint64_t b)
> +uint64_t saddu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint64_t res = a + b;
>      if (res < a) {
> @@ -2328,7 +2305,7 @@ GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8, clearq)
>
> -static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      int8_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT8_MIN) {
> @@ -2338,7 +2315,7 @@ static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      return res;
>  }
>
> -static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      int16_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT16_MIN) {
> @@ -2348,7 +2325,7 @@ static inline int16_t sadd16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      return res;
>  }
>
> -static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      int32_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT32_MIN) {
> @@ -2358,7 +2335,7 @@ static inline int32_t sadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      return res;
>  }
>
> -static inline int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t sadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      int64_t res = a + b;
>      if ((res ^ a) & (res ^ b) & INT64_MIN) {
> @@ -2386,7 +2363,7 @@ GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8, clearq)
>
> -static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t res = a - b;
>      if (res > a) {
> @@ -2396,8 +2373,7 @@ static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      return res;
>  }
>
> -static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
> -                               uint16_t b)
> +uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint16_t res = a - b;
>      if (res > a) {
> @@ -2407,8 +2383,7 @@ static inline uint16_t ssubu16(CPURISCVState *env, int vxrm, uint16_t a,
>      return res;
>  }
>
> -static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
> -                               uint32_t b)
> +uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint32_t res = a - b;
>      if (res > a) {
> @@ -2418,8 +2393,7 @@ static inline uint32_t ssubu32(CPURISCVState *env, int vxrm, uint32_t a,
>      return res;
>  }
>
> -static inline uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a,
> -                               uint64_t b)
> +uint64_t ssubu64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint64_t res = a - b;
>      if (res > a) {
> @@ -2447,7 +2421,7 @@ GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8, clearq)
>
> -static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      int8_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT8_MIN) {
> @@ -2457,7 +2431,7 @@ static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      return res;
>  }
>
> -static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      int16_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT16_MIN) {
> @@ -2467,7 +2441,7 @@ static inline int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      return res;
>  }
>
> -static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      int32_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT32_MIN) {
> @@ -2477,7 +2451,7 @@ static inline int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      return res;
>  }
>
> -static inline int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      int64_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT64_MIN) {
> @@ -2918,8 +2892,7 @@ GEN_VEXT_VX_RM(vwsmaccus_vx_h, 2, 4, clearl)
>  GEN_VEXT_VX_RM(vwsmaccus_vx_w, 4, 8, clearq)
>
>  /* Vector Single-Width Scaling Shift Instructions */
> -static inline uint8_t
> -vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
> +uint8_t vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>  {
>      uint8_t round, shift = b & 0x7;
>      uint8_t res;
> @@ -2928,8 +2901,7 @@ vssrl8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint16_t
> -vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
> +uint16_t vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>  {
>      uint8_t round, shift = b & 0xf;
>      uint16_t res;
> @@ -2938,8 +2910,7 @@ vssrl16(CPURISCVState *env, int vxrm, uint16_t a, uint16_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint32_t
> -vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
> +uint32_t vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>  {
>      uint8_t round, shift = b & 0x1f;
>      uint32_t res;
> @@ -2948,8 +2919,7 @@ vssrl32(CPURISCVState *env, int vxrm, uint32_t a, uint32_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline uint64_t
> -vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
> +uint64_t vssrl64(CPURISCVState *env, int vxrm, uint64_t a, uint64_t b)
>  {
>      uint8_t round, shift = b & 0x3f;
>      uint64_t res;
> @@ -2976,8 +2946,7 @@ GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2, clearh)
>  GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4, clearl)
>  GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8, clearq)
>
> -static inline int8_t
> -vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
> +int8_t vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      uint8_t round, shift = b & 0x7;
>      int8_t res;
> @@ -2986,8 +2955,7 @@ vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int16_t
> -vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
> +int16_t vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      uint8_t round, shift = b & 0xf;
>      int16_t res;
> @@ -2996,8 +2964,7 @@ vssra16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int32_t
> -vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
> +int32_t vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      uint8_t round, shift = b & 0x1f;
>      int32_t res;
> @@ -3006,8 +2973,7 @@ vssra32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>      res   = (a >> shift)  + round;
>      return res;
>  }
> -static inline int64_t
> -vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
> +int64_t vssra64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      uint8_t round, shift = b & 0x3f;
>      int64_t res;
> --
> 2.17.1
>
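
The H1/H2 macros moved by this patch remap element indices so that element 0 is always the least significant lane of a host-endian 64-bit chunk. Below is a minimal sketch of that idea, not QEMU code: lane8, lane16 and lane_demo are hypothetical names, and the GCC/Clang predefined __BYTE_ORDER__ macro is assumed as a stand-in for HOST_WORDS_BIGENDIAN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: compiler-provided macros replace QEMU's HOST_WORDS_BIGENDIAN. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define H1(x)   ((x) ^ 7)
#define H2(x)   ((x) ^ 3)
#else
#define H1(x)   (x)
#define H2(x)   (x)
#endif

/* Read logical 8-bit element i from a register stored as 64-bit chunks. */
static uint8_t lane8(const uint64_t *reg, unsigned i)
{
    const uint8_t *bytes = (const uint8_t *)reg;
    return bytes[H1(i)];
}

/* Read logical 16-bit element i; memcpy sidesteps aliasing concerns. */
static uint16_t lane16(const uint64_t *reg, unsigned i)
{
    uint16_t v;
    memcpy(&v, (const uint8_t *)reg + 2 * H2(i), sizeof(v));
    return v;
}

/* The same logical lanes come back on little- and big-endian hosts. */
static void lane_demo(void)
{
    uint64_t reg[2] = { 0x0807060504030201ULL, 0x100f0e0d0c0b0a09ULL };
    assert(lane8(reg, 0) == 0x01);      /* lowest byte of chunk 0 */
    assert(lane8(reg, 7) == 0x08);      /* highest byte of chunk 0 */
    assert(lane8(reg, 8) == 0x09);      /* lowest byte of chunk 1 */
    assert(lane16(reg, 0) == 0x0201);
    assert(lane16(reg, 4) == 0x0a09);
}
```

The XOR only flips the low index bits, so an index stays inside its own 64-bit chunk; that is why H8 needs no adjustment at all.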


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 03/38] target/riscv: Fixup saturate subtract function
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-09 14:11     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-09 14:11 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:10 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> The overflow predicate ((a - b) ^ a) & (a ^ b) & INT64_MIN is correct.
> However, when the predicate is true and a is 0, the result should
> saturate to the maximum value.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/vector_helper.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 9371d70f6b..9786f630b4 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -2425,7 +2425,7 @@ int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b)
>  {
>      int8_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT8_MIN) {
> -        res = a > 0 ? INT8_MAX : INT8_MIN;
> +        res = a >= 0 ? INT8_MAX : INT8_MIN;
>          env->vxsat = 0x1;
>      }
>      return res;
> @@ -2435,7 +2435,7 @@ int16_t ssub16(CPURISCVState *env, int vxrm, int16_t a, int16_t b)
>  {
>      int16_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT16_MIN) {
> -        res = a > 0 ? INT16_MAX : INT16_MIN;
> +        res = a >= 0 ? INT16_MAX : INT16_MIN;
>          env->vxsat = 0x1;
>      }
>      return res;
> @@ -2445,7 +2445,7 @@ int32_t ssub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
>  {
>      int32_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT32_MIN) {
> -        res = a > 0 ? INT32_MAX : INT32_MIN;
> +        res = a >= 0 ? INT32_MAX : INT32_MIN;
>          env->vxsat = 0x1;
>      }
>      return res;
> @@ -2455,7 +2455,7 @@ int64_t ssub64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
>  {
>      int64_t res = a - b;
>      if ((res ^ a) & (a ^ b) & INT64_MIN) {
> -        res = a > 0 ? INT64_MAX : INT64_MIN;
> +        res = a >= 0 ? INT64_MAX : INT64_MIN;
>          env->vxsat = 0x1;
>      }
>      return res;
> --
> 2.17.1
>
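
The a == 0 case fixed by this patch can be reproduced with a single input pair, 0 - INT8_MIN. The sketch below is an illustration rather than the QEMU helper: sat_sub8 is a hypothetical name, a bool flag stands in for env->vxsat, and the extra 'fixed' parameter exists only to switch between the old and new comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Saturating 8-bit signed subtract.
 * fixed == false uses the old test (a > 0), fixed == true the new one (a >= 0).
 */
static int8_t sat_sub8(int8_t a, int8_t b, bool fixed, bool *sat)
{
    /* Narrowing wraps on GCC and Clang (implementation-defined before C23). */
    int8_t res = a - b;
    /* Overflow iff a and b differ in sign and the result's sign differs from a. */
    if ((res ^ a) & (a ^ b) & INT8_MIN) {
        bool non_negative = fixed ? (a >= 0) : (a > 0);
        res = non_negative ? INT8_MAX : INT8_MIN;
        *sat = true;                    /* stand-in for env->vxsat */
    }
    return res;
}

/* 0 - (-128) is +128 mathematically, so it must clamp to INT8_MAX. */
static void sat_sub_demo(void)
{
    bool sat = false;
    assert(sat_sub8(0, INT8_MIN, true, &sat) == INT8_MAX && sat);
    sat = false;
    assert(sat_sub8(0, INT8_MIN, false, &sat) == INT8_MIN && sat);  /* old bug */
    sat = false;
    assert(sat_sub8(-1, INT8_MIN, true, &sat) == INT8_MAX && !sat); /* exact 127 */
}
```

With a == 0 the only overflowing operand is b == INT8_MIN, which is why the old comparison went unnoticed for every other input.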


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:22     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:22 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  9 +++
>  target/riscv/insn32.decode              | 11 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>  4 files changed, 172 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 6d622c732a..a69a6b4e84 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
> +
> +DEF_HELPER_3(radd8, tl, env, tl, tl)
> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 8815e90476..358dd1fa10 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
> +
> +add8       0100100  ..... ..... 000 ..... 1111111 @r
> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 0885a4fd45..109f560ec9 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>  GEN_RVP_R_OOL(urstsa16);
>  GEN_RVP_R_OOL(kstsa16);
>  GEN_RVP_R_OOL(ukstsa16);
> +
> +/* 8-bit Addition & Subtraction Instructions */
> +/*
> + *  Copied from tcg-op-gvec.c.
> + *
> + *  Perform a vector addition using normal addition and a mask.  The mask
> + *  should be the sign bit of each lane.  This 6-operation form is more
> + *  efficient than separate additions when there are 4 or more lanes in
> + *  the 64-bit operation.
> + */
> +
> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
> +{
> +    TCGv t1 = tcg_temp_new();
> +    TCGv t2 = tcg_temp_new();
> +    TCGv t3 = tcg_temp_new();
> +
> +    tcg_gen_andc_tl(t1, a, m);
> +    tcg_gen_andc_tl(t2, b, m);
> +    tcg_gen_xor_tl(t3, a, b);
> +    tcg_gen_add_tl(d, t1, t2);
> +    tcg_gen_and_tl(t3, t3, m);
> +    tcg_gen_xor_tl(d, d, t3);
> +
> +    tcg_temp_free(t1);
> +    tcg_temp_free(t2);
> +    tcg_temp_free(t3);
> +}
> +
> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
> +{
> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
> +    gen_simd_add_mask(d, a, b, m);
> +    tcg_temp_free(m);
> +}
> +
> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
> +
> +/*
> + *  Copied from tcg-op-gvec.c.
> + *
> + *  Perform a vector subtraction using normal subtraction and a mask.
> + *  Compare gen_addv_mask above.
> + */
> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
> +{
> +    TCGv t1 = tcg_temp_new();
> +    TCGv t2 = tcg_temp_new();
> +    TCGv t3 = tcg_temp_new();
> +
> +    tcg_gen_or_tl(t1, a, m);
> +    tcg_gen_andc_tl(t2, b, m);
> +    tcg_gen_eqv_tl(t3, a, b);
> +    tcg_gen_sub_tl(d, t1, t2);
> +    tcg_gen_and_tl(t3, t3, m);
> +    tcg_gen_xor_tl(d, d, t3);
> +
> +    tcg_temp_free(t1);
> +    tcg_temp_free(t2);
> +    tcg_temp_free(t3);
> +}
> +
> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
> +{
> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
> +    gen_simd_sub_mask(d, a, b, m);
> +    tcg_temp_free(m);
> +}
> +
> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
> +
> +GEN_RVP_R_OOL(radd8);
> +GEN_RVP_R_OOL(uradd8);
> +GEN_RVP_R_OOL(kadd8);
> +GEN_RVP_R_OOL(ukadd8);
> +GEN_RVP_R_OOL(rsub8);
> +GEN_RVP_R_OOL(ursub8);
> +GEN_RVP_R_OOL(ksub8);
> +GEN_RVP_R_OOL(uksub8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index b84abaaf25..62db072204 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(ukstsa16, 2, 2);
> +
> +/* 8-bit Addition & Subtraction Instructions */
> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hadd32(a[i], b[i]);
> +}
> +
> +RVPR(radd8, 1, 1);
> +
> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
> +                                  void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = haddu32(a[i], b[i]);
> +}
> +
> +RVPR(uradd8, 1, 1);
> +
> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = sadd8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(kadd8, 1, 1);
> +
> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = saddu8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(ukadd8, 1, 1);
> +
> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hsub32(a[i], b[i]);
> +}
> +
> +RVPR(rsub8, 1, 1);
> +
> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hsubu64(a[i], b[i]);
> +}
> +
> +RVPR(ursub8, 1, 1);
> +
> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = ssub8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(ksub8, 1, 1);
> +
> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = ssubu8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(uksub8, 1, 1);
> --
> 2.17.1
>
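
The masked add and subtract used for add8 and sub8 in this patch can be checked on plain integers. What follows is a sketch of the same six-operation sequences in ordinary C, assuming 64-bit operands (the patch works on target_ulong, which may be 32 bits); swar_add8, swar_sub8, ref_add8 and swar_demo are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SIGN8 0x8080808080808080ULL    /* sign bit of every 8-bit lane */

/* Lane-wise add: clear the sign bits so no carry can cross a lane boundary,
 * then restore each lane's top bit with XOR. */
static uint64_t swar_add8(uint64_t a, uint64_t b)
{
    uint64_t t1 = a & ~SIGN8;
    uint64_t t2 = b & ~SIGN8;
    uint64_t t3 = (a ^ b) & SIGN8;
    return (t1 + t2) ^ t3;
}

/* Lane-wise subtract: set the sign bits in a so no borrow can leave a lane. */
static uint64_t swar_sub8(uint64_t a, uint64_t b)
{
    uint64_t t1 = a | SIGN8;
    uint64_t t2 = b & ~SIGN8;
    uint64_t t3 = ~(a ^ b) & SIGN8;    /* eqv(a, b) & mask */
    return (t1 - t2) ^ t3;
}

/* Straightforward per-lane reference used only for comparison. */
static uint64_t ref_add8(uint64_t a, uint64_t b)
{
    uint64_t d = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        d |= (uint64_t)(uint8_t)(x + y) << (8 * i);
    }
    return d;
}

static void swar_demo(void)
{
    /* 0xFF + 0x01 wraps to 0x00 without carrying into the next lane. */
    assert(swar_add8(0x01FFULL, 0x0101ULL) == 0x0200ULL);
    /* 0x00 - 0x01 wraps to 0xFF without borrowing from the next lane. */
    assert(swar_sub8(0x0100ULL, 0x0001ULL) == 0x01FFULL);
    assert(swar_add8(0x80FF7F0112345678ULL, 0x80017F0FFEDCBA98ULL) ==
           ref_add8(0x80FF7F0112345678ULL, 0x80017F0FFEDCBA98ULL));
}
```

Both versions perform one full-width add or subtract, which is what makes the trick cheaper than handling the lanes one at a time.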


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
@ 2021-03-15 21:22     ` Alistair Francis
  0 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:22 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, Palmer Dabbelt

On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  9 +++
>  target/riscv/insn32.decode              | 11 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>  4 files changed, 172 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 6d622c732a..a69a6b4e84 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
> +
> +DEF_HELPER_3(radd8, tl, env, tl, tl)
> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 8815e90476..358dd1fa10 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
> +
> +add8       0100100  ..... ..... 000 ..... 1111111 @r
> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 0885a4fd45..109f560ec9 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>  GEN_RVP_R_OOL(urstsa16);
>  GEN_RVP_R_OOL(kstsa16);
>  GEN_RVP_R_OOL(ukstsa16);
> +
> +/* 8-bit Addition & Subtraction Instructions */
> +/*
> + *  Copied from tcg-op-gvec.c.
> + *
> + *  Perform a vector addition using normal addition and a mask.  The mask
> + *  should be the sign bit of each lane.  This 6-operation form is more
> + *  efficient than separate additions when there are 4 or more lanes in
> + *  the 64-bit operation.
> + */
> +
> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
> +{
> +    TCGv t1 = tcg_temp_new();
> +    TCGv t2 = tcg_temp_new();
> +    TCGv t3 = tcg_temp_new();
> +
> +    tcg_gen_andc_tl(t1, a, m);
> +    tcg_gen_andc_tl(t2, b, m);
> +    tcg_gen_xor_tl(t3, a, b);
> +    tcg_gen_add_tl(d, t1, t2);
> +    tcg_gen_and_tl(t3, t3, m);
> +    tcg_gen_xor_tl(d, d, t3);
> +
> +    tcg_temp_free(t1);
> +    tcg_temp_free(t2);
> +    tcg_temp_free(t3);
> +}
> +
> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
> +{
> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
> +    gen_simd_add_mask(d, a, b, m);
> +    tcg_temp_free(m);
> +}
> +
> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
> +
> +/*
> + *  Copied from tcg-op-gvec.c.
> + *
> + *  Perform a vector subtraction using normal subtraction and a mask.
> + *  Compare gen_simd_add_mask above.
> + */
> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
> +{
> +    TCGv t1 = tcg_temp_new();
> +    TCGv t2 = tcg_temp_new();
> +    TCGv t3 = tcg_temp_new();
> +
> +    tcg_gen_or_tl(t1, a, m);
> +    tcg_gen_andc_tl(t2, b, m);
> +    tcg_gen_eqv_tl(t3, a, b);
> +    tcg_gen_sub_tl(d, t1, t2);
> +    tcg_gen_and_tl(t3, t3, m);
> +    tcg_gen_xor_tl(d, d, t3);
> +
> +    tcg_temp_free(t1);
> +    tcg_temp_free(t2);
> +    tcg_temp_free(t3);
> +}
> +
> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
> +{
> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
> +    gen_simd_sub_mask(d, a, b, m);
> +    tcg_temp_free(m);
> +}
> +
> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
> +
> +GEN_RVP_R_OOL(radd8);
> +GEN_RVP_R_OOL(uradd8);
> +GEN_RVP_R_OOL(kadd8);
> +GEN_RVP_R_OOL(ukadd8);
> +GEN_RVP_R_OOL(rsub8);
> +GEN_RVP_R_OOL(ursub8);
> +GEN_RVP_R_OOL(ksub8);
> +GEN_RVP_R_OOL(uksub8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index b84abaaf25..62db072204 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(ukstsa16, 2, 2);
> +
> +/* 8-bit Addition & Subtraction Instructions */
> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hadd32(a[i], b[i]);
> +}
> +
> +RVPR(radd8, 1, 1);
> +
> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = haddu32(a[i], b[i]);
> +}
> +
> +RVPR(uradd8, 1, 1);
> +
> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = sadd8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(kadd8, 1, 1);
> +
> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = saddu8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(ukadd8, 1, 1);
> +
> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hsub32(a[i], b[i]);
> +}
> +
> +RVPR(rsub8, 1, 1);
> +
> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = hsubu64(a[i], b[i]);
> +}
> +
> +RVPR(ursub8, 1, 1);
> +
> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = ssub8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(ksub8, 1, 1);
> +
> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = ssubu8(env, 0, a[i], b[i]);
> +}
> +
> +RVPR(uksub8, 1, 1);
> --
> 2.17.1
>
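
The mask trick used by gen_simd_add_mask and gen_simd_sub_mask in the patch above can be sketched on plain 64-bit integers as follows. This is an illustrative rewrite, not the TCG code itself: `SIGN8`, `simd_add8` and `simd_sub8` are names chosen here, and the mask is the per-lane sign bit that the patch builds with dup_const(MO_8, 0x80).

```c
#include <stdint.h>

/* Sign bit of every 8-bit lane in a 64-bit word. */
#define SIGN8 UINT64_C(0x8080808080808080)

/*
 * Lane-wise 8-bit add: clear each lane's top bit so the ordinary add
 * cannot carry into the neighbouring lane, then patch the top bit back
 * in with XOR.
 */
static inline uint64_t simd_add8(uint64_t a, uint64_t b)
{
    uint64_t t1 = a & ~SIGN8;
    uint64_t t2 = b & ~SIGN8;
    uint64_t t3 = (a ^ b) & SIGN8;

    return (t1 + t2) ^ t3;
}

/*
 * Lane-wise 8-bit subtract: force the minuend's top bit to 1 so a lane
 * never borrows from its neighbour, then correct the top bit with the
 * complement of (a ^ b).
 */
static inline uint64_t simd_sub8(uint64_t a, uint64_t b)
{
    uint64_t t1 = a | SIGN8;
    uint64_t t2 = b & ~SIGN8;
    uint64_t t3 = ~(a ^ b) & SIGN8;

    return (t1 - t2) ^ t3;
}
```

For example, adding 0x01 to a lane holding 0xFF wraps that lane to 0x00 without disturbing the lane above it, which a single 64-bit add would not guarantee.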


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:25     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:25 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |   9 ++
>  target/riscv/insn32.decode              |  17 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
>  target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
>  4 files changed, 245 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index a69a6b4e84..20bf400ac2 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
>  DEF_HELPER_3(ursub8, tl, env, tl, tl)
>  DEF_HELPER_3(ksub8, tl, env, tl, tl)
>  DEF_HELPER_3(uksub8, tl, env, tl, tl)
> +
> +DEF_HELPER_3(sra16, tl, env, tl, tl)
> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
> +DEF_HELPER_3(srl16, tl, env, tl, tl)
> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
> +DEF_HELPER_3(sll16, tl, env, tl, tl)
> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 358dd1fa10..6f053bfeb7 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -23,6 +23,7 @@
>  %rd        7:5
>
>  %sh10    20:10
> +%sh4    20:4
>  %csr    20:12
>  %rm     12:3
>  %nf     29:3                     !function=ex_plus_1
> @@ -59,6 +60,7 @@
>  @j       ....................      ..... ....... &j      imm=%imm_j          %rd
>
>  @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
>  @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>
>  @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>  ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>  ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>  uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> +
> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 109f560ec9..848edab7e5 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
>  GEN_RVP_R_OOL(ursub8);
>  GEN_RVP_R_OOL(ksub8);
>  GEN_RVP_R_OOL(uksub8);
> +
> +/* 16-bit Shift Instructions */
> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
> +                          gen_helper_rvp_r *fn, target_ulong mask)
> +{
> +    TCGv src1, src2, dst;
> +
> +    src1 = tcg_temp_new();
> +    src2 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    gen_get_gpr(src1, a->rs1);
> +    gen_get_gpr(src2, a->rs2);
> +    tcg_gen_andi_tl(src2, src2, mask);
> +
> +    fn(dst, cpu_env, src1, src2);
> +    gen_set_gpr(a->rd, dst);
> +
> +    tcg_temp_free(src1);
> +    tcg_temp_free(src2);
> +    tcg_temp_free(dst);
> +    return true;
> +}
> +
> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
> +                          uint32_t, uint32_t);
> +static inline bool
> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
> +          uint8_t mask)
> +{
> +    if (!has_ext(ctx, RVP)) {
> +        return false;
> +    }
> +
> +#ifdef TARGET_RISCV64

Hmm....

I don't want to add any more #defines on the RISC-V xlen. We are
trying to make the QEMU RISC-V implementation xlen independent.

Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
else you add an #ifdef TARGET... ?

Alistair
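
As a rough illustration of the request above, the choice between the inline 64-bit path and the out-of-line helper can be made by a run-time test rather than by the preprocessor. Everything in this sketch is a hypothetical stand-in (`ToyCPUState`, `toy_cpu_is_32bit`, `toy_pick_shift_path`); it does not use QEMU's types and does not show how the TCG calls inside the guarded block would have to change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state; not QEMU's CPURISCVState. */
typedef struct {
    unsigned xlen;          /* 32 or 64, fixed when the CPU model is created */
} ToyCPUState;

/* Hypothetical analogue of a run-time "is this a 32-bit CPU?" query. */
static bool toy_cpu_is_32bit(const ToyCPUState *env)
{
    return env->xlen == 32;
}

typedef enum {
    PATH_INLINE_64,         /* vectorised 64-bit path */
    PATH_HELPER             /* out-of-line helper, works for any xlen */
} ToyPath;

/*
 * Same conditions as the quoted code, but the xlen test is an ordinary
 * branch instead of an #ifdef, so one binary can serve both widths.
 */
static ToyPath toy_pick_shift_path(const ToyCPUState *env,
                                   int rd, int rs1, int rs2)
{
    if (!toy_cpu_is_32bit(env) && rd && rs1 && rs2) {
        return PATH_INLINE_64;
    }
    return PATH_HELPER;
}
```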

> +    if (a->rd && a->rs1 && a->rs2) {
> +        TCGv_i32 shift = tcg_temp_new_i32();
> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
> +        tcg_gen_andi_i32(shift, shift, mask);
> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
> +            offsetof(CPURISCVState, gpr[a->rs1]),
> +            shift, 8, 8);
> +        tcg_temp_free_i32(shift);
> +        return true;
> +    }
> +#endif
> +    return rvp_shift_ool(ctx, a, fn, mask);
> +}
> +
> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
> +{                                                           \
> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
> +                     (8 << VECE) - 1);                      \
> +}
> +
> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
> +GEN_RVP_R_OOL(sra16_u);
> +GEN_RVP_R_OOL(srl16_u);
> +GEN_RVP_R_OOL(ksll16);
> +GEN_RVP_R_OOL(kslra16);
> +GEN_RVP_R_OOL(kslra16_u);
> +
> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
> +                           gen_helper_rvp_r *fn)
> +{
> +    TCGv src1, dst, shift;
> +
> +    src1 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    gen_get_gpr(src1, a->rs1);
> +    shift = tcg_const_tl(a->shamt);
> +    fn(dst, cpu_env, src1, shift);
> +    gen_set_gpr(a->rd, dst);
> +
> +    tcg_temp_free(src1);
> +    tcg_temp_free(dst);
> +    tcg_temp_free(shift);
> +    return true;
> +}
> +
> +static inline bool
> +rvp_shifti(DisasContext *ctx, arg_shift *a,
> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
> +           gen_helper_rvp_r *fn)
> +{
> +    if (!has_ext(ctx, RVP)) {
> +        return false;
> +    }
> +
> +#ifdef TARGET_RISCV64
> +    if (a->rd && a->rs1 && f64) {
> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
> +        return true;
> +    }
> +#endif
> +    return rvp_shifti_ool(ctx, a, fn);
> +}
> +
> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
> +{                                                        \
> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
> +}
> +
> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 62db072204..7e31c2fe46 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(uksub8, 1, 1);
> +
> +/* 16-bit Shift Instructions */
> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(sra16, 1, 2);
> +
> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(srl16, 1, 2);
> +
> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +    d[i] = a[i] << shift;
> +}
> +
> +RVPR(sll16, 1, 2);
> +
> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +
> +    d[i] = vssra16(env, 0, a[i], shift);
> +}
> +
> +RVPR(sra16_u, 1, 2);
> +
> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +
> +    d[i] = vssrl16(env, 0, a[i], shift);
> +}
> +
> +RVPR(srl16_u, 1, 2);
> +
> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, result;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +
> +    result = a[i] << shift;
> +    if (shift > (clrsb32(a[i]) - 16)) {
> +        env->vxsat = 0x1;
> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
> +    } else {
> +        d[i] = result;
> +    }
> +}
> +
> +RVPR(ksll16, 1, 2);
> +
> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
> +
> +    if (shift >= 0) {
> +        do_ksll16(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 16) ? 15 : shift;
> +        d[i] = a[i] >> shift;
> +    }
> +}
> +
> +RVPR(kslra16, 1, 2);
> +
> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
> +                                void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
> +
> +    if (shift >= 0) {
> +        do_ksll16(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 16) ? 15 : shift;
> +        d[i] = vssra16(env, 0, a[i], shift);
> +    }
> +}
> +
> +RVPR(kslra16_u, 1, 2);
> --
> 2.17.1
>
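
The saturating and bidirectional shifts at the end of the patch above (ksll16, kslra16) can be sketched per lane in plain C like this. It is a simplified model under stated assumptions: the function names and the `sat` out-parameter are invented here (`sat` stands in for the flag the patch sets via env->vxsat), saturation is detected by widening rather than by counting redundant sign bits as the patch does, and `>>` on a negative value is assumed to be an arithmetic shift.

```c
#include <stdint.h>

/* Saturating left shift of one 16-bit lane; shift must be in 0..15. */
static inline int16_t sat_shl_s16(int16_t a, unsigned shift, int *sat)
{
    /* Multiply instead of shifting so a negative value is well defined. */
    int32_t r = (int32_t)a * ((int32_t)1 << shift);

    if (r > INT16_MAX) {
        *sat = 1;
        return INT16_MAX;
    }
    if (r < INT16_MIN) {
        *sat = 1;
        return INT16_MIN;
    }
    return (int16_t)r;
}

/* Sign-extend the low 5 bits of a register value to the range -16..15. */
static inline int sext5(uint32_t x)
{
    int v = (int)(x & 0x1f);

    return (v & 0x10) ? v - 32 : v;
}

/*
 * Bidirectional shift: a non-negative amount shifts left with saturation,
 * a negative amount shifts right arithmetically, with -16 clamped to 15.
 */
static inline int16_t kslra16_lane(int16_t a, uint32_t rs2, int *sat)
{
    int shift = sext5(rs2);

    if (shift >= 0) {
        return sat_shl_s16(a, (unsigned)shift, sat);
    }
    shift = -shift;
    if (shift == 16) {
        shift = 15;
    }
    return (int16_t)(a >> shift);
}
```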


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 07/38] target/riscv: SIMD 8-bit Shift Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:27     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:27 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:18 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair
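
In the decode hunk quoted below, a field written as `20:3` (or `20:4` in the 16-bit patch) names a start bit and a length. A minimal sketch of that extraction in plain C, assuming this reading of the notation; `extract_field` is a name chosen here and is not part of the patch:

```c
#include <stdint.h>

/* Extract `len` bits starting at bit `pos` (0 < len < 32). */
static inline uint32_t extract_field(uint32_t insn, unsigned pos, unsigned len)
{
    return (insn >> pos) & ((UINT32_C(1) << len) - 1);
}
```

With this, the shift amount of an 8-bit immediate shift is `extract_field(insn, 20, 3)`, while rs1 and rd sit at `15:5` and `7:5`.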

> ---
>  target/riscv/helper.h                   |   9 +++
>  target/riscv/insn32.decode              |  17 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
>  target/riscv/packed_helper.c            | 102 ++++++++++++++++++++++++
>  4 files changed, 144 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 20bf400ac2..0ecd4d53f9 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1193,3 +1193,12 @@ DEF_HELPER_3(sll16, tl, env, tl, tl)
>  DEF_HELPER_3(ksll16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(sra8, tl, env, tl, tl)
> +DEF_HELPER_3(sra8_u, tl, env, tl, tl)
> +DEF_HELPER_3(srl8, tl, env, tl, tl)
> +DEF_HELPER_3(srl8_u, tl, env, tl, tl)
> +DEF_HELPER_3(sll8, tl, env, tl, tl)
> +DEF_HELPER_3(ksll8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 6f053bfeb7..cc782fcde5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -24,6 +24,7 @@
>
>  %sh10    20:10
>  %sh4    20:4
> +%sh3    20:3
>  %csr    20:12
>  %rm     12:3
>  %nf     29:3                     !function=ex_plus_1
> @@ -61,6 +62,7 @@
>
>  @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>  @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> +@sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
>  @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>
>  @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> @@ -652,3 +654,18 @@ ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>  kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>  kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>  kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> +
> +sra8       0101100  ..... ..... 000 ..... 1111111 @r
> +sra8_u     0110100  ..... ..... 000 ..... 1111111 @r
> +srai8      0111100  00... ..... 000 ..... 1111111 @sh3
> +srai8_u    0111100  01... ..... 000 ..... 1111111 @sh3
> +srl8       0101101  ..... ..... 000 ..... 1111111 @r
> +srl8_u     0110101  ..... ..... 000 ..... 1111111 @r
> +srli8      0111101  00... ..... 000 ..... 1111111 @sh3
> +srli8_u    0111101  01... ..... 000 ..... 1111111 @sh3
> +sll8       0101110  ..... ..... 000 ..... 1111111 @r
> +slli8      0111110  00... ..... 000 ..... 1111111 @sh3
> +ksll8      0110110  ..... ..... 000 ..... 1111111 @r
> +kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
> +kslra8     0101111  ..... ..... 000 ..... 1111111 @r
> +kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 848edab7e5..12a64849eb 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -353,3 +353,19 @@ GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>  GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>  GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>  GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> +
> +/* SIMD 8-bit Shift Instructions */
> +GEN_RVP_SHIFT(sra8, tcg_gen_gvec_sars, 0);
> +GEN_RVP_SHIFT(srl8, tcg_gen_gvec_shrs, 0);
> +GEN_RVP_SHIFT(sll8, tcg_gen_gvec_shls, 0);
> +GEN_RVP_R_OOL(sra8_u);
> +GEN_RVP_R_OOL(srl8_u);
> +GEN_RVP_R_OOL(ksll8);
> +GEN_RVP_R_OOL(kslra8);
> +GEN_RVP_R_OOL(kslra8_u);
> +GEN_RVP_SHIFTI(srai8, sra8, tcg_gen_vec_sar8i_i64);
> +GEN_RVP_SHIFTI(srli8, srl8, tcg_gen_vec_shr8i_i64);
> +GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
> +GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
> +GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
> +GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 7e31c2fe46..ab9ebc472b 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -529,3 +529,105 @@ static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kslra16_u, 1, 2);
> +
> +/* SIMD 8-bit Shift Instructions */
> +static inline void do_sra8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(sra8, 1, 1);
> +
> +static inline void do_srl8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(srl8, 1, 1);
> +
> +static inline void do_sll8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] << shift;
> +}
> +
> +RVPR(sll8, 1, 1);
> +
> +static inline void do_sra8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssra8(env, 0, a[i], shift);
> +}
> +
> +RVPR(sra8_u, 1, 1);
> +
> +static inline void do_srl8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssrl8(env, 0, a[i], shift);
> +}
> +
> +RVPR(srl8_u, 1, 1);
> +
> +static inline void do_ksll8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, result;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    result = a[i] << shift;
> +    if (shift > (clrsb32(a[i]) - 24)) {
> +        env->vxsat = 0x1;
> +        d[i] = (a[i] & INT8_MIN) ? INT8_MIN : INT8_MAX;
> +    } else {
> +        d[i] = result;
> +    }
> +}
> +
> +RVPR(ksll8, 1, 1);
> +
> +static inline void do_kslra8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] = a[i] >> shift;
> +    }
> +}
> +
> +RVPR(kslra8, 1, 1);
> +
> +static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] =  vssra8(env, 0, a[i], shift);
> +    }
> +}
> +
> +RVPR(kslra8_u, 1, 1);
> --
> 2.17.1
>
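
Note on the saturation and rounding helpers quoted above: the following is a minimal standalone sketch, not part of the patch, of what one 8-bit lane does in do_ksll8 and do_sra8_u. The names count_redundant_sign_bits32, sat_shl8 and round_sra8 are hypothetical; the first only stands in for QEMU's clrsb32. The rounding function assumes that the 0 passed to vssra8 selects round-to-nearest-up, which this excerpt does not show, and that the host compiler shifts negative values right arithmetically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for clrsb32(): number of bits below the sign bit
 * that are equal to the sign bit. */
static int count_redundant_sign_bits32(int32_t v)
{
    uint32_t u = (uint32_t)v;
    uint32_t sign = u >> 31;
    int n = 0;

    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; bit--) {
        n++;
    }
    return n;
}

/* Saturating left shift of one signed 8-bit lane.
 * A sign-extended int8_t has at least 24 redundant sign bits, so
 * (count - 24) is the largest shift that still fits in 8 bits. */
static int8_t sat_shl8(int8_t a, unsigned shift, int *sat)
{
    assert(shift < 8);
    if ((int)shift > count_redundant_sign_bits32(a) - 24) {
        *sat = 1;                       /* mirrors env->vxsat = 1 */
        return a < 0 ? INT8_MIN : INT8_MAX;
    }
    /* Multiply rather than shift so negative inputs stay well defined. */
    return (int8_t)(a * (1 << shift));
}

/* Arithmetic right shift that rounds by adding back the last bit shifted
 * out (assumed round-to-nearest-up behaviour). */
static int8_t round_sra8(int8_t a, unsigned shift)
{
    assert(shift < 8);
    if (shift == 0) {
        return a;
    }
    int round = (a >> (shift - 1)) & 1;
    return (int8_t)((a >> shift) + round);
}
```

With these definitions, sat_shl8(1, 7, &sat) saturates to INT8_MAX and sets sat, while sat_shl8(-1, 7, &sat) returns INT8_MIN without saturating; round_sra8(5, 1) gives 3 and round_sra8(-5, 1) gives -2.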



* Re: [PATCH 08/38] target/riscv: SIMD 16-bit Compare Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:28     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:28 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:20 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  6 ++++
>  target/riscv/insn32.decode              |  6 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
>  target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
>  4 files changed, 65 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 0ecd4d53f9..f41f9acccc 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1202,3 +1202,9 @@ DEF_HELPER_3(sll8, tl, env, tl, tl)
>  DEF_HELPER_3(ksll8, tl, env, tl, tl)
>  DEF_HELPER_3(kslra8, tl, env, tl, tl)
>  DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(cmpeq16, tl, env, tl, tl)
> +DEF_HELPER_3(scmplt16, tl, env, tl, tl)
> +DEF_HELPER_3(scmple16, tl, env, tl, tl)
> +DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
> +DEF_HELPER_3(ucmple16, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index cc782fcde5..f3cd508396 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -669,3 +669,9 @@ ksll8      0110110  ..... ..... 000 ..... 1111111 @r
>  kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
>  kslra8     0101111  ..... ..... 000 ..... 1111111 @r
>  kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
> +
> +cmpeq16    0100110  ..... ..... 000 ..... 1111111 @r
> +scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
> +scmple16   0001110  ..... ..... 000 ..... 1111111 @r
> +ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
> +ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 12a64849eb..6438dfb776 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -369,3 +369,10 @@ GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
>  GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
>  GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
>  GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
> +
> +/* SIMD 16-bit Compare Instructions */
> +GEN_RVP_R_OOL(cmpeq16);
> +GEN_RVP_R_OOL(scmplt16);
> +GEN_RVP_R_OOL(scmple16);
> +GEN_RVP_R_OOL(ucmplt16);
> +GEN_RVP_R_OOL(ucmple16);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index ab9ebc472b..30b916b5ad 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -631,3 +631,49 @@ static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kslra8_u, 1, 1);
> +
> +/* SIMD 16-bit Compare Instructions */
> +static inline void do_cmpeq16(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] == b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(cmpeq16, 1, 2);
> +
> +static inline void do_scmplt16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(scmplt16, 1, 2);
> +
> +static inline void do_scmple16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(scmple16, 1, 2);
> +
> +static inline void do_ucmplt16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(ucmplt16, 1, 2);
> +
> +static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(ucmple16, 1, 2);
> --
> 2.17.1
>
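
Note on the compare helpers quoted above: each lane of the result is either all ones or all zeros, and the signed and unsigned variants differ only in how the lane bits are interpreted. The sketch below is not part of the patch; it applies the same per-lane rule to the two 16-bit lanes of a 32-bit word, and the function names lane16, scmplt16_word and ucmplt16_word are hypothetical. It assumes the usual two's-complement conversion from uint16_t to int16_t.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 16-bit lane 0 (low half) or 1 (high half) of a 32-bit word. */
static uint16_t lane16(uint32_t word, int lane)
{
    assert(lane >= 0 && lane < 2);
    return (uint16_t)(word >> (16 * lane));
}

/* Signed less-than per lane: a true lane becomes 0xffff, a false one 0. */
static uint32_t scmplt16_word(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (int lane = 0; lane < 2; lane++) {
        int16_t x = (int16_t)lane16(a, lane);
        int16_t y = (int16_t)lane16(b, lane);
        if (x < y) {
            result |= 0xffffu << (16 * lane);
        }
    }
    return result;
}

/* Unsigned less-than per lane, same mask convention. */
static uint32_t ucmplt16_word(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (int lane = 0; lane < 2; lane++) {
        if (lane16(a, lane) < lane16(b, lane)) {
            result |= 0xffffu << (16 * lane);
        }
    }
    return result;
}
```

For a = 0x80000001 and b = 0x00010002, the high lane 0x8000 is negative when read as signed but large when read as unsigned, so scmplt16_word returns 0xffffffff while ucmplt16_word returns 0x0000ffff.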



* Re: [PATCH 09/38] target/riscv: SIMD 8-bit Compare Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:31     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:31 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:22 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  6 ++++
>  target/riscv/insn32.decode              |  6 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
>  target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
>  4 files changed, 65 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index f41f9acccc..4d9c36609c 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1208,3 +1208,9 @@ DEF_HELPER_3(scmplt16, tl, env, tl, tl)
>  DEF_HELPER_3(scmple16, tl, env, tl, tl)
>  DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
>  DEF_HELPER_3(ucmple16, tl, env, tl, tl)
> +
> +DEF_HELPER_3(cmpeq8, tl, env, tl, tl)
> +DEF_HELPER_3(scmplt8, tl, env, tl, tl)
> +DEF_HELPER_3(scmple8, tl, env, tl, tl)
> +DEF_HELPER_3(ucmplt8, tl, env, tl, tl)
> +DEF_HELPER_3(ucmple8, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index f3cd508396..7519df7e20 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -675,3 +675,9 @@ scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
>  scmple16   0001110  ..... ..... 000 ..... 1111111 @r
>  ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
>  ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
> +
> +cmpeq8     0100111  ..... ..... 000 ..... 1111111 @r
> +scmplt8    0000111  ..... ..... 000 ..... 1111111 @r
> +scmple8    0001111  ..... ..... 000 ..... 1111111 @r
> +ucmplt8    0010111  ..... ..... 000 ..... 1111111 @r
> +ucmple8    0011111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 6438dfb776..6eb9e83c6f 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -376,3 +376,10 @@ GEN_RVP_R_OOL(scmplt16);
>  GEN_RVP_R_OOL(scmple16);
>  GEN_RVP_R_OOL(ucmplt16);
>  GEN_RVP_R_OOL(ucmple16);
> +
> +/* SIMD 8-bit Compare Instructions */
> +GEN_RVP_R_OOL(cmpeq8);
> +GEN_RVP_R_OOL(scmplt8);
> +GEN_RVP_R_OOL(scmple8);
> +GEN_RVP_R_OOL(ucmplt8);
> +GEN_RVP_R_OOL(ucmple8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 30b916b5ad..ff86e015e4 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -677,3 +677,49 @@ static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(ucmple16, 1, 2);
> +
> +/* SIMD 8-bit Compare Instructions */
> +static inline void do_cmpeq8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] == b[i]) ? 0xff : 0x0;
> +}
> +
> +RVPR(cmpeq8, 1, 1);
> +
> +static inline void do_scmplt8(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
> +}
> +
> +RVPR(scmplt8, 1, 1);
> +
> +static inline void do_scmple8(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
> +}
> +
> +RVPR(scmple8, 1, 1);
> +
> +static inline void do_ucmplt8(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xff : 0x0;
> +}
> +
> +RVPR(ucmplt8, 1, 1);
> +
> +static inline void do_ucmple8(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xff : 0x0;
> +}
> +
> +RVPR(ucmple8, 1, 1);
> --
> 2.17.1
>



* Re: [PATCH 11/38] target/riscv: SIMD 8-bit Multiply Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:33     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:33 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:26 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  7 ++
>  target/riscv/insn32.decode              |  7 ++
>  target/riscv/insn_trans/trans_rvp.c.inc |  8 +++
>  target/riscv/packed_helper.c            | 93 +++++++++++++++++++++++++
>  4 files changed, 115 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index bc60712bd9..6bb601b436 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1221,3 +1221,10 @@ DEF_HELPER_3(umul16, i64, env, tl, tl)
>  DEF_HELPER_3(umulx16, i64, env, tl, tl)
>  DEF_HELPER_3(khm16, tl, env, tl, tl)
>  DEF_HELPER_3(khmx16, tl, env, tl, tl)
> +
> +DEF_HELPER_3(smul8, i64, env, tl, tl)
> +DEF_HELPER_3(smulx8, i64, env, tl, tl)
> +DEF_HELPER_3(umul8, i64, env, tl, tl)
> +DEF_HELPER_3(umulx8, i64, env, tl, tl)
> +DEF_HELPER_3(khm8, tl, env, tl, tl)
> +DEF_HELPER_3(khmx8, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 38519a477c..9d165efba9 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -688,3 +688,10 @@ umul16     1011000  ..... ..... 000 ..... 1111111 @r
>  umulx16    1011001  ..... ..... 000 ..... 1111111 @r
>  khm16      1000011  ..... ..... 000 ..... 1111111 @r
>  khmx16     1001011  ..... ..... 000 ..... 1111111 @r
> +
> +smul8      1010100  ..... ..... 000 ..... 1111111 @r
> +smulx8     1010101  ..... ..... 000 ..... 1111111 @r
> +umul8      1011100  ..... ..... 000 ..... 1111111 @r
> +umulx8     1011101  ..... ..... 000 ..... 1111111 @r
> +khm8       1000111  ..... ..... 000 ..... 1111111 @r
> +khmx8      1001111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 7e5bf9041d..336f3418b1 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -436,3 +436,11 @@ GEN_RVP_R_D64_OOL(umul16);
>  GEN_RVP_R_D64_OOL(umulx16);
>  GEN_RVP_R_OOL(khm16);
>  GEN_RVP_R_OOL(khmx16);
> +
> +/* SIMD 8-bit Multiply Instructions */
> +GEN_RVP_R_D64_OOL(smul8);
> +GEN_RVP_R_D64_OOL(smulx8);
> +GEN_RVP_R_D64_OOL(umul8);
> +GEN_RVP_R_D64_OOL(umulx8);
> +GEN_RVP_R_OOL(khm8);
> +GEN_RVP_R_OOL(khmx8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 13fed2c4d1..56baefeb8e 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -827,3 +827,96 @@ static inline void do_khmx16(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(khmx16, 2, 2);
> +
> +/* SIMD 8-bit Multiply Instructions */
> +static inline void do_smul8(CPURISCVState *env, void *vd, void *va, void *vb)
> +{
> +    int16_t *d = vd;
> +    int8_t *a = va, *b = vb;
> +    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(0)];
> +    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(1)];
> +    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(2)];
> +    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(3)];
> +}
> +
> +RVPR64(smul8);
> +
> +static inline void do_smulx8(CPURISCVState *env, void *vd, void *va, void *vb)
> +{
> +    int16_t *d = vd;
> +    int8_t *a = va, *b = vb;
> +    d[H2(0)] = (int16_t)a[H1(0)] * b[H1(1)];
> +    d[H2(1)] = (int16_t)a[H1(1)] * b[H1(0)];
> +    d[H2(2)] = (int16_t)a[H1(2)] * b[H1(3)];
> +    d[H2(3)] = (int16_t)a[H1(3)] * b[H1(2)];
> +}
> +
> +RVPR64(smulx8);
> +
> +static inline void do_umul8(CPURISCVState *env, void *vd, void *va, void *vb)
> +{
> +    uint16_t *d = vd;
> +    uint8_t *a = va, *b = vb;
> +    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(0)];
> +    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(1)];
> +    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(2)];
> +    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(3)];
> +}
> +
> +RVPR64(umul8);
> +
> +static inline void do_umulx8(CPURISCVState *env, void *vd, void *va, void *vb)
> +{
> +    uint16_t *d = vd;
> +    uint8_t *a = va, *b = vb;
> +    d[H2(0)] = (uint16_t)a[H1(0)] * b[H1(1)];
> +    d[H2(1)] = (uint16_t)a[H1(1)] * b[H1(0)];
> +    d[H2(2)] = (uint16_t)a[H1(2)] * b[H1(3)];
> +    d[H2(3)] = (uint16_t)a[H1(3)] * b[H1(2)];
> +}
> +
> +RVPR64(umulx8);
> +
> +static inline void do_khm8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +
> +    if (a[i] == INT8_MIN && b[i] == INT8_MIN) {
> +        env->vxsat = 1;
> +        d[i] = INT8_MAX;
> +    } else {
> +        d[i] = (int16_t)a[i] * b[i] >> 7;
> +    }
> +}
> +
> +RVPR(khm8, 1, 1);
> +
> +static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +    /*
> +     * t[x] = ra.B[x] s* rb.B[y];
> +     * rt.B[x] = SAT.Q7(t[x] s>> 7);
> +     *
> +     * (RV32: (x,y)=(3,2),(2,3),
> +     *              (1,0),(0,1),
> +     * (RV64: (x,y)=(7,6),(6,7),(5,4),(4,5),
> +     *              (3,2),(2,3),(1,0),(0,1))
> +     */
> +    if (a[H1(i)] == INT8_MIN && b[H1(i + 1)] == INT8_MIN) {
> +        env->vxsat = 1;
> +        d[H1(i)] = INT8_MAX;
> +    } else {
> +        d[H1(i)] = (int16_t)a[H1(i)] * b[H1(i + 1)] >> 7;
> +    }
> +    if (a[H1(i + 1)] == INT8_MIN && b[H1(i)] == INT8_MIN) {
> +        env->vxsat = 1;
> +        d[H1(i + 1)] = INT8_MAX;
> +    } else {
> +        d[H1(i + 1)] = (int16_t)a[H1(i + 1)] * b[H1(i)] >> 7;
> +    }
> +}
> +
> +RVPR(khmx8, 2, 1);
> --
> 2.17.1
>


* Re: [PATCH 12/38] target/riscv: SIMD 16-bit Miscellaneous Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-15 21:35     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-15 21:35 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:28 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  12 ++
>  target/riscv/insn32.decode              |  13 ++
>  target/riscv/insn_trans/trans_rvp.c.inc |  42 ++++++
>  target/riscv/packed_helper.c            | 167 ++++++++++++++++++++++++
>  4 files changed, 234 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 6bb601b436..866484e37d 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1228,3 +1228,15 @@ DEF_HELPER_3(umul8, i64, env, tl, tl)
>  DEF_HELPER_3(umulx8, i64, env, tl, tl)
>  DEF_HELPER_3(khm8, tl, env, tl, tl)
>  DEF_HELPER_3(khmx8, tl, env, tl, tl)
> +
> +DEF_HELPER_3(smin16, tl, env, tl, tl)
> +DEF_HELPER_3(umin16, tl, env, tl, tl)
> +DEF_HELPER_3(smax16, tl, env, tl, tl)
> +DEF_HELPER_3(umax16, tl, env, tl, tl)
> +DEF_HELPER_3(sclip16, tl, env, tl, tl)
> +DEF_HELPER_3(uclip16, tl, env, tl, tl)
> +DEF_HELPER_2(kabs16, tl, env, tl)
> +DEF_HELPER_2(clrs16, tl, env, tl)
> +DEF_HELPER_2(clz16, tl, env, tl)
> +DEF_HELPER_2(clo16, tl, env, tl)
> +DEF_HELPER_2(swap16, tl, env, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 9d165efba9..bc9d5fc967 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -42,6 +42,7 @@
>  &i    imm rs1 rd
>  &j    imm rd
>  &r    rd rs1 rs2
> +&r2   rd rs1
>  &s    imm rs1 rs2
>  &u    imm rd
>  &shift     shamt rs1 rd
> @@ -695,3 +696,15 @@ umul8      1011100  ..... ..... 000 ..... 1111111 @r
>  umulx8     1011101  ..... ..... 000 ..... 1111111 @r
>  khm8       1000111  ..... ..... 000 ..... 1111111 @r
>  khmx8      1001111  ..... ..... 000 ..... 1111111 @r
> +
> +smin16     1000000  ..... ..... 000 ..... 1111111 @r
> +umin16     1001000  ..... ..... 000 ..... 1111111 @r
> +smax16     1000001  ..... ..... 000 ..... 1111111 @r
> +umax16     1001001  ..... ..... 000 ..... 1111111 @r
> +sclip16    1000010  0.... ..... 000 ..... 1111111 @sh4
> +uclip16    1000010  1.... ..... 000 ..... 1111111 @sh4
> +kabs16     1010110  10001 ..... 000 ..... 1111111 @r2
> +clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
> +clz16      1010111  01001 ..... 000 ..... 1111111 @r2
> +clo16      1010111  01011 ..... 000 ..... 1111111 @r2
> +swap16     1010110  11001 ..... 000 ..... 1111111 @r2
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 336f3418b1..56fb8b2523 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -444,3 +444,45 @@ GEN_RVP_R_D64_OOL(umul8);
>  GEN_RVP_R_D64_OOL(umulx8);
>  GEN_RVP_R_OOL(khm8);
>  GEN_RVP_R_OOL(khmx8);
> +
> +/* SIMD 16-bit Miscellaneous Instructions */
> +GEN_RVP_R_OOL(smin16);
> +GEN_RVP_R_OOL(umin16);
> +GEN_RVP_R_OOL(smax16);
> +GEN_RVP_R_OOL(umax16);
> +GEN_RVP_SHIFTI(sclip16, sclip16, NULL);
> +GEN_RVP_SHIFTI(uclip16, uclip16, NULL);
> +
> +/* Out of line helpers for R2 format */
> +static bool
> +r2_ool(DisasContext *ctx, arg_r2 *a,
> +       void (* fn)(TCGv, TCGv_ptr, TCGv))
> +{
> +    TCGv src1, dst;
> +    if (!has_ext(ctx, RVP)) {
> +        return false;
> +    }
> +
> +    src1 = tcg_temp_new();
> +    dst = tcg_temp_new();
> +
> +    gen_get_gpr(src1, a->rs1);
> +    fn(dst, cpu_env, src1);
> +    gen_set_gpr(a->rd, dst);
> +
> +    tcg_temp_free(src1);
> +    tcg_temp_free(dst);
> +    return true;
> +}
> +
> +#define GEN_RVP_R2_OOL(NAME)                           \
> +static bool trans_##NAME(DisasContext *s, arg_r2 *a)   \
> +{                                                      \
> +    return r2_ool(s, a, gen_helper_##NAME);            \
> +}
> +
> +GEN_RVP_R2_OOL(kabs16);
> +GEN_RVP_R2_OOL(clrs16);
> +GEN_RVP_R2_OOL(clz16);
> +GEN_RVP_R2_OOL(clo16);
> +GEN_RVP_R2_OOL(swap16);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 56baefeb8e..a6ab011ace 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -920,3 +920,170 @@ static inline void do_khmx8(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(khmx8, 2, 1);
> +
> +/* SIMD 16-bit Miscellaneous Instructions */
> +static inline void do_smin16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smin16, 1, 2);
> +
> +static inline void do_umin16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umin16, 1, 2);
> +
> +static inline void do_smax16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smax16, 1, 2);
> +
> +static inline void do_umax16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umax16, 1, 2);
> +
> +static int64_t sat64(CPURISCVState *env, int64_t a, uint8_t shift)
> +{
> +    int64_t max = shift >= 64 ? INT64_MAX : (1ull << shift) - 1;
> +    int64_t min = shift >= 64 ? INT64_MIN : -(1ull << shift);
> +    int64_t result;
> +
> +    if (a > max) {
> +        result = max;
> +        env->vxsat = 0x1;
> +    } else if (a < min) {
> +        result = min;
> +        env->vxsat = 0x1;
> +    } else {
> +        result = a;
> +    }
> +    return result;
> +}
> +
> +static inline void do_sclip16(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +
> +    d[i] = sat64(env, a[i], shift);
> +}
> +
> +RVPR(sclip16, 1, 2);
> +
> +static uint64_t satu64(CPURISCVState *env, uint64_t a, uint8_t shift)
> +{
> +    uint64_t max = shift >= 64 ? UINT64_MAX : (1ull << shift) - 1;
> +    uint64_t result;
> +
> +    if (a > max) {
> +        result = max;
> +        env->vxsat = 0x1;
> +    } else {
> +        result = a;
> +    }
> +    return result;
> +}
> +
> +static inline void do_uclip16(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> +
> +    if (a[i] < 0) {
> +        d[i] = 0;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = satu64(env, a[i], shift);
> +    }
> +}
> +
> +RVPR(uclip16, 1, 2);
> +
> +typedef void PackedFn2i(CPURISCVState *, void *, void *, uint8_t);
> +
> +static inline target_ulong rvpr2(CPURISCVState *env, target_ulong a,
> +                                 uint8_t step, uint8_t size, PackedFn2i *fn)
> +{
> +    int i, passes = sizeof(target_ulong) / size;
> +    target_ulong result;
> +
> +    for (i = 0; i < passes; i += step) {
> +        fn(env, &result, &a, i);
> +    }
> +    return result;
> +}
> +
> +#define RVPR2(NAME, STEP, SIZE)                                  \
> +target_ulong HELPER(NAME)(CPURISCVState *env, target_ulong a)    \
> +{                                                                \
> +    return rvpr2(env, a, STEP, SIZE, (PackedFn2i *)do_##NAME);   \
> +}
> +
> +static inline void do_kabs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +
> +    if (a[i] == INT16_MIN) {
> +        d[i] = INT16_MAX;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = abs(a[i]);
> +    }
> +}
> +
> +RVPR2(kabs16, 1, 2);
> +
> +static inline void do_clrs16(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    d[i] = clrsb32(a[i]) - 16;
> +}
> +
> +RVPR2(clrs16, 1, 2);
> +
> +static inline void do_clz16(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 16);
> +}
> +
> +RVPR2(clz16, 1, 2);
> +
> +static inline void do_clo16(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 16);
> +}
> +
> +RVPR2(clo16, 1, 2);
> +
> +static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va;
> +    d[H2(i)] = a[H2(i + 1)];
> +    d[H2(i + 1)] = a[H2(i)];
> +}
> +
> +RVPR2(swap16, 2, 2);
> --
> 2.17.1
>


* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-03-15 21:25     ` Alistair Francis
@ 2021-03-16  2:40       ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-03-16  2:40 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers



On 2021/3/16 5:25, Alistair Francis wrote:
> On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   target/riscv/helper.h                   |   9 ++
>>   target/riscv/insn32.decode              |  17 ++++
>>   target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
>>   target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
>>   4 files changed, 245 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index a69a6b4e84..20bf400ac2 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>   DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>   DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>   DEF_HELPER_3(uksub8, tl, env, tl, tl)
>> +
>> +DEF_HELPER_3(sra16, tl, env, tl, tl)
>> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
>> +DEF_HELPER_3(srl16, tl, env, tl, tl)
>> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
>> +DEF_HELPER_3(sll16, tl, env, tl, tl)
>> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
>> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
>> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 358dd1fa10..6f053bfeb7 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -23,6 +23,7 @@
>>   %rd        7:5
>>
>>   %sh10    20:10
>> +%sh4    20:4
>>   %csr    20:12
>>   %rm     12:3
>>   %nf     29:3                     !function=ex_plus_1
>> @@ -59,6 +60,7 @@
>>   @j       ....................      ..... ....... &j      imm=%imm_j          %rd
>>
>>   @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
>>   @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>>
>>   @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
>> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>   ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>   ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>   uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>> +
>> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
>> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
>> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
>> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
>> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
>> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
>> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
>> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
>> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
>> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
>> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>> index 109f560ec9..848edab7e5 100644
>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
>>   GEN_RVP_R_OOL(ursub8);
>>   GEN_RVP_R_OOL(ksub8);
>>   GEN_RVP_R_OOL(uksub8);
>> +
>> +/* 16-bit Shift Instructions */
>> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
>> +                          gen_helper_rvp_r *fn, target_ulong mask)
>> +{
>> +    TCGv src1, src2, dst;
>> +
>> +    src1 = tcg_temp_new();
>> +    src2 = tcg_temp_new();
>> +    dst = tcg_temp_new();
>> +
>> +    gen_get_gpr(src1, a->rs1);
>> +    gen_get_gpr(src2, a->rs2);
>> +    tcg_gen_andi_tl(src2, src2, mask);
>> +
>> +    fn(dst, cpu_env, src1, src2);
>> +    gen_set_gpr(a->rd, dst);
>> +
>> +    tcg_temp_free(src1);
>> +    tcg_temp_free(src2);
>> +    tcg_temp_free(dst);
>> +    return true;
>> +}
>> +
>> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
>> +                          uint32_t, uint32_t);
>> +static inline bool
>> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
>> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
>> +          uint8_t mask)
>> +{
>> +    if (!has_ext(ctx, RVP)) {
>> +        return false;
>> +    }
>> +
>> +#ifdef TARGET_RISCV64
> Hmm....
>
> I don't want to add any more #defines on the RISC-V xlen. We are
> trying to make the QEMU RISC-V implementation xlen independent.
I noticed the change, but I am not quite clear about its benefit.

Could you give a brief explanation?
> Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
> else you add a #define TARGET... ?
Sure. I think there are two ways.

1) Get env from current_cpu, then call riscv_cpu_is_32bit(env).

This seems somewhat strange, because I can't find references to
current_cpu in many archs.

I don't know whether it has side effects.

2) Add a similar function, cpu_is_32bit(DisasContext *ctx).

In this case, the type of the misa field in struct DisasContext should be
target_ulong.
Currently, the type of the misa field is uint32_t.

Which one do you think is better? Thanks very much.

Zhiwei
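
For illustration only, option 2 could look roughly like the standalone
sketch below. It is not QEMU code: the context struct, the typedef and every
sketch_* name are hypothetical stand-ins, and it assumes a 64-bit build where
misa.MXL sits in the top two bits of the register (1 = RV32, 2 = RV64, per
the privileged spec). It also shows why a uint32_t misa field would not be
enough in that case: the MXL bits are lost when the value is narrowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for target_ulong on a 64-bit build (assumption). */
typedef uint64_t sketch_target_ulong;

/* Hypothetical, cut-down translation context; not QEMU's real struct. */
typedef struct {
    sketch_target_ulong misa;
} SketchDisasContext;

/* misa.MXL occupies the top two bits of an XLEN-wide register. */
#define SKETCH_XLEN      64
#define SKETCH_MXL_SHIFT (SKETCH_XLEN - 2)

/* True when the MXL field says the base ISA is RV32. */
static bool sketch_cpu_is_32bit(const SketchDisasContext *ctx)
{
    return (ctx->misa >> SKETCH_MXL_SHIFT) == 1;
}

/* Reports whether MXL is still readable after narrowing misa to 32 bits. */
static bool sketch_mxl_survives_u32(sketch_target_ulong misa)
{
    uint32_t narrowed = (uint32_t)misa;

    return (((sketch_target_ulong)narrowed) >> SKETCH_MXL_SHIFT)
           == (misa >> SKETCH_MXL_SHIFT);
}

/* Small usage example. */
static void sketch_misa_demo(void)
{
    SketchDisasContext rv64 = {
        .misa = (sketch_target_ulong)2 << SKETCH_MXL_SHIFT
    };
    SketchDisasContext rv32 = {
        .misa = (sketch_target_ulong)1 << SKETCH_MXL_SHIFT
    };

    assert(!sketch_cpu_is_32bit(&rv64));
    assert(sketch_cpu_is_32bit(&rv32));
    /* The MXL bits are dropped when misa is kept in a uint32_t. */
    assert(!sketch_mxl_survives_u32(rv64.misa));
}
```

Compared with option 1, a helper of this shape only reads state already
cached in the translation context, so it would not need current_cpu at all.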
>
> Alistair
>
>> +    if (a->rd && a->rs1 && a->rs2) {
>> +        TCGv_i32 shift = tcg_temp_new_i32();
>> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
>> +        tcg_gen_andi_i32(shift, shift, mask);
>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
>> +            offsetof(CPURISCVState, gpr[a->rs1]),
>> +            shift, 8, 8);
>> +        tcg_temp_free_i32(shift);
>> +        return true;
>> +    }
>> +#endif
>> +    return rvp_shift_ool(ctx, a, fn, mask);
>> +}
>> +
>> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
>> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
>> +{                                                           \
>> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
>> +                     (8 << VECE) - 1);                      \
>> +}
>> +
>> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
>> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
>> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
>> +GEN_RVP_R_OOL(sra16_u);
>> +GEN_RVP_R_OOL(srl16_u);
>> +GEN_RVP_R_OOL(ksll16);
>> +GEN_RVP_R_OOL(kslra16);
>> +GEN_RVP_R_OOL(kslra16_u);
>> +
>> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
>> +                           gen_helper_rvp_r *fn)
>> +{
>> +    TCGv src1, dst, shift;
>> +
>> +    src1 = tcg_temp_new();
>> +    dst = tcg_temp_new();
>> +
>> +    gen_get_gpr(src1, a->rs1);
>> +    shift = tcg_const_tl(a->shamt);
>> +    fn(dst, cpu_env, src1, shift);
>> +    gen_set_gpr(a->rd, dst);
>> +
>> +    tcg_temp_free(src1);
>> +    tcg_temp_free(dst);
>> +    tcg_temp_free(shift);
>> +    return true;
>> +}
>> +
>> +static inline bool
>> +rvp_shifti(DisasContext *ctx, arg_shift *a,
>> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
>> +           gen_helper_rvp_r *fn)
>> +{
>> +    if (!has_ext(ctx, RVP)) {
>> +        return false;
>> +    }
>> +
>> +#ifdef TARGET_RISCV64
>> +    if (a->rd && a->rs1 && f64) {
>> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
>> +        return true;
>> +    }
>> +#endif
>> +    return rvp_shifti_ool(ctx, a, fn);
>> +}
>> +
>> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
>> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
>> +{                                                        \
>> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
>> +}
>> +
>> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
>> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
>> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>> index 62db072204..7e31c2fe46 100644
>> --- a/target/riscv/packed_helper.c
>> +++ b/target/riscv/packed_helper.c
>> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>   }
>>
>>   RVPR(uksub8, 1, 1);
>> +
>> +/* 16-bit Shift Instructions */
>> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +    d[i] = a[i] >> shift;
>> +}
>> +
>> +RVPR(sra16, 1, 2);
>> +
>> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +    d[i] = a[i] >> shift;
>> +}
>> +
>> +RVPR(srl16, 1, 2);
>> +
>> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +    d[i] = a[i] << shift;
>> +}
>> +
>> +RVPR(sll16, 1, 2);
>> +
>> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
>> +                              void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +
>> +    d[i] = vssra16(env, 0, a[i], shift);
>> +}
>> +
>> +RVPR(sra16_u, 1, 2);
>> +
>> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
>> +                              void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +
>> +    d[i] = vssrl16(env, 0, a[i], shift);
>> +}
>> +
>> +RVPR(srl16_u, 1, 2);
>> +
>> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
>> +                             void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va, result;
>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>> +
>> +    result = a[i] << shift;
>> +    if (shift > (clrsb32(a[i]) - 16)) {
>> +        env->vxsat = 0x1;
>> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
>> +    } else {
>> +        d[i] = result;
>> +    }
>> +}
>> +
>> +RVPR(ksll16, 1, 2);
>> +
>> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
>> +                              void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va;
>> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
>> +
>> +    if (shift >= 0) {
>> +        do_ksll16(env, vd, va, vb, i);
>> +    } else {
>> +        shift = -shift;
>> +        shift = (shift == 16) ? 15 : shift;
>> +        d[i] = a[i] >> shift;
>> +    }
>> +}
>> +
>> +RVPR(kslra16, 1, 2);
>> +
>> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>> +                                void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va;
>> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
>> +
>> +    if (shift >= 0) {
>> +        do_ksll16(env, vd, va, vb, i);
>> +    } else {
>> +        shift = -shift;
>> +        shift = (shift == 16) ? 15 : shift;
>> +        d[i] = vssra16(env, 0, a[i], shift);
>> +    }
>> +}
>> +
>> +RVPR(kslra16_u, 1, 2);
>> --
>> 2.17.1
>>
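
As a reading aid for the quoted do_ksll16 helper, the saturating left shift
of a single 16-bit lane can be sketched standalone as below. This is an
assumption-laden illustration, not the patch's code: the sketch_* names are
hypothetical, the sticky bool stands in for env->vxsat, and the overflow test
widens to 32 bits instead of counting redundant sign bits with clrsb32().
As far as I can tell the two formulations clamp in the same cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Saturating left shift of one signed 16-bit lane.
 * *sat is set (never cleared) when the result is clamped, mirroring a
 * sticky overflow flag.
 */
static int16_t sketch_ksll16_lane(int16_t a, unsigned shift, bool *sat)
{
    /* Widen first: a * 2^shift fits in 32 bits for any shift <= 15. */
    int32_t wide = (int32_t)a * ((int32_t)1 << (shift & 0xf));

    if (wide > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (wide < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)wide;
}

/* Small usage example. */
static void sketch_ksll16_demo(void)
{
    bool sat = false;

    /* In range: no clamping, flag stays clear. */
    assert(sketch_ksll16_lane(1, 14, &sat) == 0x4000 && !sat);
    assert(sketch_ksll16_lane(-1, 15, &sat) == INT16_MIN && !sat);
    /* 1 << 15 does not fit in int16_t: clamp and set the flag. */
    assert(sketch_ksll16_lane(1, 15, &sat) == INT16_MAX && sat);
}
```

Multiplying by a power of two rather than shifting avoids left-shifting a
negative value, which is undefined behaviour in ISO C.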



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 13/38] target/riscv: SIMD 8-bit Miscellaneous Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-16 14:38     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 14:38 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:30 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  12 +++
>  target/riscv/insn32.decode              |  12 +++
>  target/riscv/insn_trans/trans_rvp.c.inc |  13 +++
>  target/riscv/packed_helper.c            | 115 ++++++++++++++++++++++++
>  4 files changed, 152 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 866484e37d..83778b532a 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1240,3 +1240,15 @@ DEF_HELPER_2(clrs16, tl, env, tl)
>  DEF_HELPER_2(clz16, tl, env, tl)
>  DEF_HELPER_2(clo16, tl, env, tl)
>  DEF_HELPER_2(swap16, tl, env, tl)
> +
> +DEF_HELPER_3(smin8, tl, env, tl, tl)
> +DEF_HELPER_3(umin8, tl, env, tl, tl)
> +DEF_HELPER_3(smax8, tl, env, tl, tl)
> +DEF_HELPER_3(umax8, tl, env, tl, tl)
> +DEF_HELPER_3(sclip8, tl, env, tl, tl)
> +DEF_HELPER_3(uclip8, tl, env, tl, tl)
> +DEF_HELPER_2(kabs8, tl, env, tl)
> +DEF_HELPER_2(clrs8, tl, env, tl)
> +DEF_HELPER_2(clz8, tl, env, tl)
> +DEF_HELPER_2(clo8, tl, env, tl)
> +DEF_HELPER_2(swap8, tl, env, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index bc9d5fc967..e158066353 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -708,3 +708,15 @@ clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
>  clz16      1010111  01001 ..... 000 ..... 1111111 @r2
>  clo16      1010111  01011 ..... 000 ..... 1111111 @r2
>  swap16     1010110  11001 ..... 000 ..... 1111111 @r2
> +
> +smin8      1000100  ..... ..... 000 ..... 1111111 @r
> +umin8      1001100  ..... ..... 000 ..... 1111111 @r
> +smax8      1000101  ..... ..... 000 ..... 1111111 @r
> +umax8      1001101  ..... ..... 000 ..... 1111111 @r
> +sclip8     1000110  00... ..... 000 ..... 1111111 @sh3
> +uclip8     1000110  10... ..... 000 ..... 1111111 @sh3
> +kabs8      1010110  10000 ..... 000 ..... 1111111 @r2
> +clrs8      1010111  00000 ..... 000 ..... 1111111 @r2
> +clz8       1010111  00001 ..... 000 ..... 1111111 @r2
> +clo8       1010111  00011 ..... 000 ..... 1111111 @r2
> +swap8      1010110  11000 ..... 000 ..... 1111111 @r2
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 56fb8b2523..5ad057d7ac 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -486,3 +486,16 @@ GEN_RVP_R2_OOL(clrs16);
>  GEN_RVP_R2_OOL(clz16);
>  GEN_RVP_R2_OOL(clo16);
>  GEN_RVP_R2_OOL(swap16);
> +
> +/* SIMD 8-bit Miscellaneous Instructions */
> +GEN_RVP_R_OOL(smin8);
> +GEN_RVP_R_OOL(umin8);
> +GEN_RVP_R_OOL(smax8);
> +GEN_RVP_R_OOL(umax8);
> +GEN_RVP_SHIFTI(sclip8, sclip8, NULL);
> +GEN_RVP_SHIFTI(uclip8, uclip8, NULL);
> +GEN_RVP_R2_OOL(kabs8);
> +GEN_RVP_R2_OOL(clrs8);
> +GEN_RVP_R2_OOL(clz8);
> +GEN_RVP_R2_OOL(clo8);
> +GEN_RVP_R2_OOL(swap8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index a6ab011ace..be91d308e5 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1087,3 +1087,118 @@ static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
>  }
>
>  RVPR2(swap16, 2, 2);
> +
> +/* SIMD 8-bit Miscellaneous Instructions */
> +static inline void do_smin8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smin8, 1, 1);
> +
> +static inline void do_umin8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umin8, 1, 1);
> +
> +static inline void do_smax8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smax8, 1, 1);
> +
> +static inline void do_umax8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umax8, 1, 1);
> +
> +static inline void do_sclip8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    d[i] = sat64(env, a[i], shift);
> +}
> +
> +RVPR(sclip8, 1, 1);
> +
> +static inline void do_uclip8(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    if (a[i] < 0) {
> +        d[i] = 0;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = satu64(env, a[i], shift);
> +    }
> +}
> +
> +RVPR(uclip8, 1, 1);
> +
> +static inline void do_kabs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +
> +    if (a[i] == INT8_MIN) {
> +        d[i] = INT8_MAX;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = abs(a[i]);
> +    }
> +}
> +
> +RVPR2(kabs8, 1, 1);
> +
> +static inline void do_clrs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = clrsb32(a[i]) - 24;
> +}
> +
> +RVPR2(clrs8, 1, 1);
> +
> +static inline void do_clz8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 24);
> +}
> +
> +RVPR2(clz8, 1, 1);
> +
> +static inline void do_clo8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 24);
> +}
> +
> +RVPR2(clo8, 1, 1);
> +
> +static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[H1(i)] = a[H1(i + 1)];
> +    d[H1(i + 1)] = a[H1(i)];
> +}
> +
> +RVPR2(swap8, 2, 1);
> --
> 2.17.1
>
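
The "- 24" in the quoted do_clz8 follows from reusing a 32-bit count on an
8-bit lane. A minimal standalone sketch, assuming a clz32 that returns 32 for
a zero input; all sketch_* names are hypothetical, and the portable loop
merely stands in for whatever clz32 the tree provides:

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros on 32 bits; returns 32 for x == 0. */
static unsigned sketch_clz32(uint32_t x)
{
    unsigned n = 0;

    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Leading zeros of one 8-bit lane, derived from the 32-bit count. */
static uint8_t sketch_clz8(uint8_t x)
{
    /* Zero-extended to 32 bits, the byte gains exactly 24 leading zeros. */
    return (uint8_t)(sketch_clz32(x) - 24);
}

/* Small usage example. */
static void sketch_clz8_demo(void)
{
    assert(sketch_clz8(0x00) == 8);
    assert(sketch_clz8(0x01) == 7);
    assert(sketch_clz8(0x80) == 0);
}
```

The quoted helper reads the lane as int8_t, so it has to special-case
negative values (sign extension would otherwise fill the upper 24 bits with
ones); taking the lane as uint8_t, as here, sidesteps that.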


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 13/38] target/riscv: SIMD 8-bit Miscellaneous Instructions
@ 2021-03-16 14:38     ` Alistair Francis
  0 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 14:38 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, Palmer Dabbelt

On Fri, Feb 12, 2021 at 10:30 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  12 +++
>  target/riscv/insn32.decode              |  12 +++
>  target/riscv/insn_trans/trans_rvp.c.inc |  13 +++
>  target/riscv/packed_helper.c            | 115 ++++++++++++++++++++++++
>  4 files changed, 152 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 866484e37d..83778b532a 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1240,3 +1240,15 @@ DEF_HELPER_2(clrs16, tl, env, tl)
>  DEF_HELPER_2(clz16, tl, env, tl)
>  DEF_HELPER_2(clo16, tl, env, tl)
>  DEF_HELPER_2(swap16, tl, env, tl)
> +
> +DEF_HELPER_3(smin8, tl, env, tl, tl)
> +DEF_HELPER_3(umin8, tl, env, tl, tl)
> +DEF_HELPER_3(smax8, tl, env, tl, tl)
> +DEF_HELPER_3(umax8, tl, env, tl, tl)
> +DEF_HELPER_3(sclip8, tl, env, tl, tl)
> +DEF_HELPER_3(uclip8, tl, env, tl, tl)
> +DEF_HELPER_2(kabs8, tl, env, tl)
> +DEF_HELPER_2(clrs8, tl, env, tl)
> +DEF_HELPER_2(clz8, tl, env, tl)
> +DEF_HELPER_2(clo8, tl, env, tl)
> +DEF_HELPER_2(swap8, tl, env, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index bc9d5fc967..e158066353 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -708,3 +708,15 @@ clrs16     1010111  01000 ..... 000 ..... 1111111 @r2
>  clz16      1010111  01001 ..... 000 ..... 1111111 @r2
>  clo16      1010111  01011 ..... 000 ..... 1111111 @r2
>  swap16     1010110  11001 ..... 000 ..... 1111111 @r2
> +
> +smin8      1000100  ..... ..... 000 ..... 1111111 @r
> +umin8      1001100  ..... ..... 000 ..... 1111111 @r
> +smax8      1000101  ..... ..... 000 ..... 1111111 @r
> +umax8      1001101  ..... ..... 000 ..... 1111111 @r
> +sclip8     1000110  00... ..... 000 ..... 1111111 @sh3
> +uclip8     1000110  10... ..... 000 ..... 1111111 @sh3
> +kabs8      1010110  10000 ..... 000 ..... 1111111 @r2
> +clrs8      1010111  00000 ..... 000 ..... 1111111 @r2
> +clz8       1010111  00001 ..... 000 ..... 1111111 @r2
> +clo8       1010111  00011 ..... 000 ..... 1111111 @r2
> +swap8      1010110  11000 ..... 000 ..... 1111111 @r2
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 56fb8b2523..5ad057d7ac 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -486,3 +486,16 @@ GEN_RVP_R2_OOL(clrs16);
>  GEN_RVP_R2_OOL(clz16);
>  GEN_RVP_R2_OOL(clo16);
>  GEN_RVP_R2_OOL(swap16);
> +
> +/* SIMD 8-bit Miscellaneous Instructions */
> +GEN_RVP_R_OOL(smin8);
> +GEN_RVP_R_OOL(umin8);
> +GEN_RVP_R_OOL(smax8);
> +GEN_RVP_R_OOL(umax8);
> +GEN_RVP_SHIFTI(sclip8, sclip8, NULL);
> +GEN_RVP_SHIFTI(uclip8, uclip8, NULL);
> +GEN_RVP_R2_OOL(kabs8);
> +GEN_RVP_R2_OOL(clrs8);
> +GEN_RVP_R2_OOL(clz8);
> +GEN_RVP_R2_OOL(clo8);
> +GEN_RVP_R2_OOL(swap8);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index a6ab011ace..be91d308e5 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1087,3 +1087,118 @@ static inline void do_swap16(CPURISCVState *env, void *vd, void *va, uint8_t i)
>  }
>
>  RVPR2(swap16, 2, 2);
> +
> +/* SIMD 8-bit Miscellaneous Instructions */
> +static inline void do_smin8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smin8, 1, 1);
> +
> +static inline void do_umin8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] < b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umin8, 1, 1);
> +
> +static inline void do_smax8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(smax8, 1, 1);
> +
> +static inline void do_umax8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va, *b = vb;
> +
> +    d[i] = (a[i] > b[i]) ? a[i] : b[i];
> +}
> +
> +RVPR(umax8, 1, 1);
> +
> +static inline void do_sclip8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    d[i] = sat64(env, a[i], shift);
> +}
> +
> +RVPR(sclip8, 1, 1);
> +
> +static inline void do_uclip8(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    if (a[i] < 0) {
> +        d[i] = 0;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = satu64(env, a[i], shift);
> +    }
> +}
> +
> +RVPR(uclip8, 1, 1);
> +
> +static inline void do_kabs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +
> +    if (a[i] == INT8_MIN) {
> +        d[i] = INT8_MAX;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = abs(a[i]);
> +    }
> +}
> +
> +RVPR2(kabs8, 1, 1);
> +
> +static inline void do_clrs8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = clrsb32(a[i]) - 24;
> +}
> +
> +RVPR2(clrs8, 1, 1);
> +
> +static inline void do_clz8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = (a[i] < 0) ? 0 : (clz32(a[i]) - 24);
> +}
> +
> +RVPR2(clz8, 1, 1);
> +
> +static inline void do_clo8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[i] = (a[i] >= 0) ? 0 : (clo32(a[i]) - 24);
> +}
> +
> +RVPR2(clo8, 1, 1);
> +
> +static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    d[H1(i)] = a[H1(i + 1)];
> +    d[H1(i + 1)] = a[H1(i)];
> +}
> +
> +RVPR2(swap8, 2, 1);
> --
> 2.17.1
>
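
For readers following the 8-bit miscellaneous helpers quoted above, here is a minimal standalone sketch of the per-lane behaviour of kabs8 and clz8 on a 32-bit register value. It is not the QEMU code: the names kabs8_lane, clz8_lane and kabs8_word are hypothetical, the RVPR2 loop is replaced by explicit shifts, and the `sat` pointer stands in for env->vxsat.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Saturating absolute value of one signed byte (hypothetical helper).
 * INT8_MIN has no positive counterpart in int8_t, so it clamps to
 * INT8_MAX and sets *sat, mirroring the env->vxsat update in do_kabs8.
 */
static int8_t kabs8_lane(int8_t a, int *sat)
{
    assert(sat != 0);
    if (a == INT8_MIN) {
        *sat = 1;
        return INT8_MAX;
    }
    return (int8_t)(a < 0 ? -a : a);
}

/* Count leading zero bits of one byte; returns 8 for a zero byte. */
static uint8_t clz8_lane(uint8_t a)
{
    uint8_t n = 0;
    for (uint8_t mask = 0x80; mask != 0 && !(a & mask); mask >>= 1) {
        n++;
    }
    return n;
}

/* Apply kabs8_lane to each of the four bytes of a 32-bit value. */
static uint32_t kabs8_word(uint32_t rs1, int *sat)
{
    uint32_t rd = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Narrowing to int8_t assumes the usual two's-complement wrap. */
        int8_t a = (int8_t)(rs1 >> (8 * lane));
        rd |= (uint32_t)(uint8_t)kabs8_lane(a, sat) << (8 * lane);
    }
    return rd;
}
```

Working on the register value with shifts sidesteps the host-endianness question that the H1() macro handles in the helpers.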



* Re: [PATCH 14/38] target/riscv: 8-bit Unpacking Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-16 14:40     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 14:40 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:32 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  11 +++
>  target/riscv/insn32.decode              |  11 +++
>  target/riscv/insn_trans/trans_rvp.c.inc |  12 +++
>  target/riscv/packed_helper.c            | 121 ++++++++++++++++++++++++
>  4 files changed, 155 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 83778b532a..585905a689 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1252,3 +1252,14 @@ DEF_HELPER_2(clrs8, tl, env, tl)
>  DEF_HELPER_2(clz8, tl, env, tl)
>  DEF_HELPER_2(clo8, tl, env, tl)
>  DEF_HELPER_2(swap8, tl, env, tl)
> +
> +DEF_HELPER_2(sunpkd810, tl, env, tl)
> +DEF_HELPER_2(sunpkd820, tl, env, tl)
> +DEF_HELPER_2(sunpkd830, tl, env, tl)
> +DEF_HELPER_2(sunpkd831, tl, env, tl)
> +DEF_HELPER_2(sunpkd832, tl, env, tl)
> +DEF_HELPER_2(zunpkd810, tl, env, tl)
> +DEF_HELPER_2(zunpkd820, tl, env, tl)
> +DEF_HELPER_2(zunpkd830, tl, env, tl)
> +DEF_HELPER_2(zunpkd831, tl, env, tl)
> +DEF_HELPER_2(zunpkd832, tl, env, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index e158066353..fa4a02c9db 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -720,3 +720,14 @@ clrs8      1010111  00000 ..... 000 ..... 1111111 @r2
>  clz8       1010111  00001 ..... 000 ..... 1111111 @r2
>  clo8       1010111  00011 ..... 000 ..... 1111111 @r2
>  swap8      1010110  11000 ..... 000 ..... 1111111 @r2
> +
> +sunpkd810  1010110  01000 ..... 000 ..... 1111111 @r2
> +sunpkd820  1010110  01001 ..... 000 ..... 1111111 @r2
> +sunpkd830  1010110  01010 ..... 000 ..... 1111111 @r2
> +sunpkd831  1010110  01011 ..... 000 ..... 1111111 @r2
> +sunpkd832  1010110  10011 ..... 000 ..... 1111111 @r2
> +zunpkd810  1010110  01100 ..... 000 ..... 1111111 @r2
> +zunpkd820  1010110  01101 ..... 000 ..... 1111111 @r2
> +zunpkd830  1010110  01110 ..... 000 ..... 1111111 @r2
> +zunpkd831  1010110  01111 ..... 000 ..... 1111111 @r2
> +zunpkd832  1010110  10111 ..... 000 ..... 1111111 @r2
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 5ad057d7ac..b69e964cb4 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -499,3 +499,15 @@ GEN_RVP_R2_OOL(clrs8);
>  GEN_RVP_R2_OOL(clz8);
>  GEN_RVP_R2_OOL(clo8);
>  GEN_RVP_R2_OOL(swap8);
> +
> +/* 8-bit Unpacking Instructions */
> +GEN_RVP_R2_OOL(sunpkd810);
> +GEN_RVP_R2_OOL(sunpkd820);
> +GEN_RVP_R2_OOL(sunpkd830);
> +GEN_RVP_R2_OOL(sunpkd831);
> +GEN_RVP_R2_OOL(sunpkd832);
> +GEN_RVP_R2_OOL(zunpkd810);
> +GEN_RVP_R2_OOL(zunpkd820);
> +GEN_RVP_R2_OOL(zunpkd830);
> +GEN_RVP_R2_OOL(zunpkd831);
> +GEN_RVP_R2_OOL(zunpkd832);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index be91d308e5..d0dcb692f5 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1202,3 +1202,124 @@ static inline void do_swap8(CPURISCVState *env, void *vd, void *va, uint8_t i)
>  }
>
>  RVPR2(swap8, 2, 1);
> +
> +/* 8-bit Unpacking Instructions */
> +static inline void
> +do_sunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *a = va;
> +    int16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 1)];
> +}
> +
> +RVPR2(sunpkd810, 4, 1);
> +
> +static inline void
> +do_sunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *a = va;
> +    int16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 2)];
> +}
> +
> +RVPR2(sunpkd820, 4, 1);
> +
> +static inline void
> +do_sunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *a = va;
> +    int16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(sunpkd830, 4, 1);
> +
> +static inline void
> +do_sunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *a = va;
> +    int16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i + 1)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(sunpkd831, 4, 1);
> +
> +static inline void
> +do_sunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int8_t *a = va;
> +    int16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i + 2)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(sunpkd832, 4, 1);
> +
> +static inline void
> +do_zunpkd810(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    uint8_t *a = va;
> +    uint16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 1)];
> +}
> +
> +RVPR2(zunpkd810, 4, 1);
> +
> +static inline void
> +do_zunpkd820(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    uint8_t *a = va;
> +    uint16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 2)];
> +}
> +
> +RVPR2(zunpkd820, 4, 1);
> +
> +static inline void
> +do_zunpkd830(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    uint8_t *a = va;
> +    uint16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(zunpkd830, 4, 1);
> +
> +static inline void
> +do_zunpkd831(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    uint8_t *a = va;
> +    uint16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i + 1)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(zunpkd831, 4, 1);
> +
> +static inline void
> +do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    uint8_t *a = va;
> +    uint16_t *d = vd;
> +
> +    d[H2(i / 2)] = a[H1(i + 2)];
> +    d[H2(i / 2 + 1)] = a[H1(i + 3)];
> +}
> +
> +RVPR2(zunpkd832, 4, 1);
> --
> 2.17.1
>
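
As a reading aid for the unpacking helpers quoted above, the following sketch models one 32-bit word of sunpkd8xy/zunpkd8xy, assuming (as the helper bodies suggest) that the two trailing digits name the source bytes for the high and low result halfwords. The function name unpkd8 and its parameters are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unpack bytes `hi` and `lo` of rs1 into the high and low 16-bit
 * halves of the result. sign_extend != 0 models the sunpkd8xy forms,
 * sign_extend == 0 the zunpkd8xy forms.
 */
static uint32_t unpkd8(uint32_t rs1, unsigned hi, unsigned lo, int sign_extend)
{
    assert(hi < 4 && lo < 4);

    uint8_t bhi = (uint8_t)(rs1 >> (8 * hi));
    uint8_t blo = (uint8_t)(rs1 >> (8 * lo));

    /* Going through int8_t assumes the usual two's-complement wrap. */
    uint16_t hhi = sign_extend ? (uint16_t)(int16_t)(int8_t)bhi : bhi;
    uint16_t hlo = sign_extend ? (uint16_t)(int16_t)(int8_t)blo : blo;

    return ((uint32_t)hhi << 16) | hlo;
}
```

Under that reading, unpkd8(x, 3, 1, 1) corresponds to sunpkd831 and unpkd8(x, 1, 0, 0) to zunpkd810. Note that the byte selected is always B[i + k] within the word, which in the helpers means indexing as a[H1(i + k)] so that the host-endian adjustment applies to the whole index.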



* Re: [PATCH 15/38] target/riscv: 16-bit Packing Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-16 14:42     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 14:42 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:34 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  5 +++
>  target/riscv/insn32.decode              |  5 +++
>  target/riscv/insn_trans/trans_rvp.c.inc |  9 +++++
>  target/riscv/packed_helper.c            | 45 +++++++++++++++++++++++++
>  4 files changed, 64 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 585905a689..4dc66cf4cc 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1263,3 +1263,8 @@ DEF_HELPER_2(zunpkd820, tl, env, tl)
>  DEF_HELPER_2(zunpkd830, tl, env, tl)
>  DEF_HELPER_2(zunpkd831, tl, env, tl)
>  DEF_HELPER_2(zunpkd832, tl, env, tl)
> +
> +DEF_HELPER_3(pkbb16, tl, env, tl, tl)
> +DEF_HELPER_3(pkbt16, tl, env, tl, tl)
> +DEF_HELPER_3(pktt16, tl, env, tl, tl)
> +DEF_HELPER_3(pktb16, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index fa4a02c9db..a4d9ff2282 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -731,3 +731,8 @@ zunpkd820  1010110  01101 ..... 000 ..... 1111111 @r2
>  zunpkd830  1010110  01110 ..... 000 ..... 1111111 @r2
>  zunpkd831  1010110  01111 ..... 000 ..... 1111111 @r2
>  zunpkd832  1010110  10111 ..... 000 ..... 1111111 @r2
> +
> +pkbb16     0000111  ..... ..... 001 ..... 1111111 @r
> +pkbt16     0001111  ..... ..... 001 ..... 1111111 @r
> +pktt16     0010111  ..... ..... 001 ..... 1111111 @r
> +pktb16     0011111  ..... ..... 001 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index b69e964cb4..99a19019eb 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -511,3 +511,12 @@ GEN_RVP_R2_OOL(zunpkd820);
>  GEN_RVP_R2_OOL(zunpkd830);
>  GEN_RVP_R2_OOL(zunpkd831);
>  GEN_RVP_R2_OOL(zunpkd832);
> +
> +/*
> + *** Partial-SIMD Data Processing Instructions
> + */
> +/* 16-bit Packing Instructions */
> +GEN_RVP_R_OOL(pkbb16);
> +GEN_RVP_R_OOL(pkbt16);
> +GEN_RVP_R_OOL(pktt16);
> +GEN_RVP_R_OOL(pktb16);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index d0dcb692f5..fe1b48c86d 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1323,3 +1323,48 @@ do_zunpkd832(CPURISCVState *env, void *vd, void *va, uint8_t i)
>  }
>
>  RVPR2(zunpkd832, 4, 1);
> +
> +/*
> + *** Partial-SIMD Data Processing Instructions
> + */
> +
> +/* 16-bit Packing Instructions */
> +static inline void do_pkbb16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[H2(i + 1)] = a[H2(i)];
> +    d[H2(i)] = b[H2(i)];
> +}
> +
> +RVPR(pkbb16, 2, 2);
> +
> +static inline void do_pkbt16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[H2(i + 1)] = a[H2(i)];
> +    d[H2(i)] = b[H2(i + 1)];
> +}
> +
> +RVPR(pkbt16, 2, 2);
> +
> +static inline void do_pktt16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[H2(i + 1)] = a[H2(i + 1)];
> +    d[H2(i)] = b[H2(i + 1)];
> +}
> +
> +RVPR(pktt16, 2, 2);
> +
> +static inline void do_pktb16(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[H2(i + 1)] = a[H2(i + 1)];
> +    d[H2(i)] = b[H2(i)];
> +}
> +
> +RVPR(pktb16, 2, 2);
> --
> 2.17.1
>
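
The four packing helpers quoted above differ only in which halfword of each source they pick. A minimal sketch for one 32-bit word, assuming the mapping read off the helper bodies (rs1 supplies the high result halfword, rs2 the low one); the name pk16 and its flag parameters are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one halfword from rs1 (into the high half of the result) and
 * one from rs2 (into the low half). a_top / b_top select the top (1)
 * or bottom (0) halfword of the respective source:
 *   pkbb16 -> (0, 0), pkbt16 -> (0, 1), pktt16 -> (1, 1), pktb16 -> (1, 0)
 */
static uint32_t pk16(uint32_t rs1, uint32_t rs2, int a_top, int b_top)
{
    assert((a_top == 0 || a_top == 1) && (b_top == 0 || b_top == 1));

    uint16_t from_a = (uint16_t)(a_top ? rs1 >> 16 : rs1);
    uint16_t from_b = (uint16_t)(b_top ? rs2 >> 16 : rs2);

    return ((uint32_t)from_a << 16) | from_b;
}
```

On RV64 the helpers repeat this per 32-bit word, which is what the RVPR(..., 2, 2) step of two halfwords expresses.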



* Re: [PATCH 17/38] target/riscv: Signed MSW 32x16 Multiply and Add Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-16 16:01     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 16:01 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:38 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  17 ++
>  target/riscv/insn32.decode              |  17 ++
>  target/riscv/insn_trans/trans_rvp.c.inc |  18 ++
>  target/riscv/packed_helper.c            | 208 ++++++++++++++++++++++++
>  4 files changed, 260 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 0bd21c8514..25aa07a7ff 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1277,3 +1277,20 @@ DEF_HELPER_4(kmmsb, tl, env, tl, tl, tl)
>  DEF_HELPER_4(kmmsb_u, tl, env, tl, tl, tl)
>  DEF_HELPER_3(kwmmul, tl, env, tl, tl)
>  DEF_HELPER_3(kwmmul_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(smmwb, tl, env, tl, tl)
> +DEF_HELPER_3(smmwb_u, tl, env, tl, tl)
> +DEF_HELPER_3(smmwt, tl, env, tl, tl)
> +DEF_HELPER_3(smmwt_u, tl, env, tl, tl)
> +DEF_HELPER_4(kmmawb, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawb_u, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawt, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawt_u, tl, env, tl, tl, tl)
> +DEF_HELPER_3(kmmwb2, tl, env, tl, tl)
> +DEF_HELPER_3(kmmwb2_u, tl, env, tl, tl)
> +DEF_HELPER_3(kmmwt2, tl, env, tl, tl)
> +DEF_HELPER_3(kmmwt2_u, tl, env, tl, tl)
> +DEF_HELPER_4(kmmawb2, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawb2_u, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawt2, tl, env, tl, tl, tl)
> +DEF_HELPER_4(kmmawt2_u, tl, env, tl, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index e0be2790dc..6e63bab2d9 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -745,3 +745,20 @@ kmmsb      0100001  ..... ..... 001 ..... 1111111 @r
>  kmmsb_u    0101001  ..... ..... 001 ..... 1111111 @r
>  kwmmul     0110001  ..... ..... 001 ..... 1111111 @r
>  kwmmul_u   0111001  ..... ..... 001 ..... 1111111 @r
> +
> +smmwb      0100010  ..... ..... 001 ..... 1111111 @r
> +smmwb_u    0101010  ..... ..... 001 ..... 1111111 @r
> +smmwt      0110010  ..... ..... 001 ..... 1111111 @r
> +smmwt_u    0111010  ..... ..... 001 ..... 1111111 @r
> +kmmawb     0100011  ..... ..... 001 ..... 1111111 @r
> +kmmawb_u   0101011  ..... ..... 001 ..... 1111111 @r
> +kmmawt     0110011  ..... ..... 001 ..... 1111111 @r
> +kmmawt_u   0111011  ..... ..... 001 ..... 1111111 @r
> +kmmwb2     1000111  ..... ..... 001 ..... 1111111 @r
> +kmmwb2_u   1001111  ..... ..... 001 ..... 1111111 @r
> +kmmwt2     1010111  ..... ..... 001 ..... 1111111 @r
> +kmmwt2_u   1011111  ..... ..... 001 ..... 1111111 @r
> +kmmawb2    1100111  ..... ..... 001 ..... 1111111 @r
> +kmmawb2_u  1101111  ..... ..... 001 ..... 1111111 @r
> +kmmawt2    1110111  ..... ..... 001 ..... 1111111 @r
> +kmmawt2_u  1111111  ..... ..... 001 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index fbc9c0b57b..e708ae7a6a 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -564,3 +564,21 @@ GEN_RVP_R_ACC_OOL(kmmsb);
>  GEN_RVP_R_ACC_OOL(kmmsb_u);
>  GEN_RVP_R_OOL(kwmmul);
>  GEN_RVP_R_OOL(kwmmul_u);
> +
> +/* Most Significant Word "32x16" Multiply & Add Instructions */
> +GEN_RVP_R_OOL(smmwb);
> +GEN_RVP_R_OOL(smmwb_u);
> +GEN_RVP_R_OOL(smmwt);
> +GEN_RVP_R_OOL(smmwt_u);
> +GEN_RVP_R_ACC_OOL(kmmawb);
> +GEN_RVP_R_ACC_OOL(kmmawb_u);
> +GEN_RVP_R_ACC_OOL(kmmawt);
> +GEN_RVP_R_ACC_OOL(kmmawt_u);
> +GEN_RVP_R_OOL(kmmwb2);
> +GEN_RVP_R_OOL(kmmwb2_u);
> +GEN_RVP_R_OOL(kmmwt2);
> +GEN_RVP_R_OOL(kmmwt2_u);
> +GEN_RVP_R_ACC_OOL(kmmawb2);
> +GEN_RVP_R_ACC_OOL(kmmawb2_u);
> +GEN_RVP_R_ACC_OOL(kmmawt2);
> +GEN_RVP_R_ACC_OOL(kmmawt2_u);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index c1322d2fac..ea3c9f6dd8 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1477,3 +1477,211 @@ static inline void do_kwmmul_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kwmmul_u, 1, 4);
> +
> +/* Most Significant Word "32x16" Multiply & Add Instructions */
> +static inline void do_smmwb(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16;
> +}
> +
> +RVPR(smmwb, 1, 4);
> +
> +static inline void do_smmwb_u(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 15)) >> 16;
> +}
> +
> +RVPR(smmwb_u, 1, 4);
> +
> +static inline void do_smmwt(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16;
> +}
> +
> +RVPR(smmwt, 1, 4);
> +
> +static inline void do_smmwt_u(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 15)) >> 16;
> +}
> +
> +RVPR(smmwt_u, 1, 4);
> +
> +static inline void do_kmmawb(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc;
> +    int16_t *b = vb;
> +    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i)] >> 16, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawb, 1, 4);
> +
> +static inline void do_kmmawb_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc;
> +    int16_t *b = vb;
> +    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i)] +
> +                               (1ull << 15)) >> 16, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawb_u, 1, 4);
> +
> +static inline void do_kmmawt(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc;
> +    int16_t *b = vb;
> +    d[H4(i)] = sadd32(env, 0, (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 16,
> +                      c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawt, 1, 4);
> +
> +static inline void do_kmmawt_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc;
> +    int16_t *b = vb;
> +    d[H4(i)] = sadd32(env, 0, ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] +
> +                               (1ull << 15)) >> 16, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawt_u, 1, 4);
> +
> +static inline void do_kmmwb2(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        d[H4(i)] = INT32_MAX;
> +    } else {
> +        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
> +    }
> +}
> +
> +RVPR(kmmwb2, 1, 4);
> +
> +static inline void do_kmmwb2_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        d[H4(i)] = INT32_MAX;
> +    } else {
> +        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
> +    }
> +}
> +
> +RVPR(kmmwb2_u, 1, 4);
> +
> +static inline void do_kmmwt2(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        d[H4(i)] = INT32_MAX;
> +    } else {
> +        d[H4(i)] = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
> +    }
> +}
> +
> +RVPR(kmmwt2, 1, 4);
> +
> +static inline void do_kmmwt2_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        d[H4(i)] = INT32_MAX;
> +    } else {
> +        d[H4(i)] = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
> +    }
> +}
> +
> +RVPR(kmmwt2_u, 1, 4);
> +
> +static inline void do_kmmawb2(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc, result;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        result = INT32_MAX;
> +    } else {
> +        result = (int64_t)a[H4(i)] * b[H2(2 * i)] >> 15;
> +    }
> +    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawb2, 1, 4);
> +
> +static inline void do_kmmawb2_u(CPURISCVState *env, void *vd, void *va,
> +                                void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc, result;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        result = INT32_MAX;
> +    } else {
> +        result = ((int64_t)a[H4(i)] * b[H2(2 * i)] + (1ull << 14)) >> 15;
> +    }
> +    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawb2_u, 1, 4);
> +
> +static inline void do_kmmawt2(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc, result;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        result = INT32_MAX;
> +    } else {
> +        result = (int64_t)a[H4(i)] * b[H2(2 * i + 1)] >> 15;
> +    }
> +    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawt2, 1, 4);
> +
> +static inline void do_kmmawt2_u(CPURISCVState *env, void *vd, void *va,
> +                                void *vb, void *vc, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va, *c = vc, result;
> +    int16_t *b = vb;
> +    if (a[H4(i)] == INT32_MIN && b[H2(2 * i + 1)] == INT16_MIN) {
> +        env->vxsat = 0x1;
> +        result = INT32_MAX;
> +    } else {
> +        result = ((int64_t)a[H4(i)] * b[H2(2 * i + 1)] + (1ull << 14)) >> 15;
> +    }
> +    d[H4(i)] = sadd32(env, 0, result, c[H4(i)]);
> +}
> +
> +RVPR_ACC(kmmawt2_u, 1, 4);
> --
> 2.17.1
>
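
For anyone checking the arithmetic in the helpers quoted above, here is a
minimal standalone sketch of the three patterns they use: truncating
(>> 16), rounding (add 1 << 15, then >> 16), and the Q15 form (>> 15) with
its single saturating case. The function names are made up for this note
and are not part of the patch. It uses a signed rounding constant, and it
assumes the compiler implements >> on negative values as an arithmetic
shift.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating form, as in the quoted do_smmwb: drop the low 16 bits. */
static int32_t mul32x16_trunc(int32_t a, int16_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}

/* Rounding form, as in do_smmwb_u: add half an LSB (1 << 15) first. */
static int32_t mul32x16_round(int32_t a, int16_t b)
{
    return (int32_t)(((int64_t)a * b + (1ll << 15)) >> 16);
}

/*
 * Q15 form, as in do_kmmwb2: shift by 15 instead of 16. The only input
 * pair whose result does not fit in 32 bits is INT32_MIN * INT16_MIN
 * (2^46 >> 15 == 2^31), so that pair saturates and sets the flag.
 */
static int32_t mul32x16_q15_sat(int32_t a, int16_t b, int *sat)
{
    if (a == INT32_MIN && b == INT16_MIN) {
        *sat = 1;
        return INT32_MAX;
    }
    return (int32_t)(((int64_t)a * b) >> 15);
}

/* Hypothetical self-check; the values are worked out by hand. */
static void check_mul32x16(void)
{
    int sat = 0;

    assert(mul32x16_trunc(0x00018000, 1) == 1);  /* 1.5 truncates to 1 */
    assert(mul32x16_round(0x00018000, 1) == 2);  /* 1.5 rounds to 2 */
    assert(mul32x16_trunc(-1, 1) == -1);         /* floor, not toward 0 */
    assert(mul32x16_q15_sat(0x40000000, 0x4000, &sat) == 0x20000000);
    assert(sat == 0);
    assert(mul32x16_q15_sat(INT32_MIN, INT16_MIN, &sat) == INT32_MAX);
    assert(sat == 1);
}
```

Calling check_mul32x16() from a small main() runs the assertions.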


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 20/38] target/riscv: Partial-SIMD Miscellaneous Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-03-16 19:44     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 19:44 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Fri, Feb 12, 2021 at 10:44 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/helper.h                   |  8 +++
>  target/riscv/insn32-64.decode           |  4 --
>  target/riscv/insn32.decode              | 10 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  9 +++
>  target/riscv/packed_helper.c            | 75 +++++++++++++++++++++++++
>  5 files changed, 102 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 2511134610..7c3a0654d6 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1315,3 +1315,11 @@ DEF_HELPER_4(kmsda, tl, env, tl, tl, tl)
>  DEF_HELPER_4(kmsxda, tl, env, tl, tl, tl)
>
>  DEF_HELPER_3(smal, i64, env, i64, tl)
> +
> +DEF_HELPER_3(sclip32, tl, env, tl, tl)
> +DEF_HELPER_3(uclip32, tl, env, tl, tl)
> +DEF_HELPER_2(clrs32, tl, env, tl)
> +DEF_HELPER_2(clz32, tl, env, tl)
> +DEF_HELPER_2(clo32, tl, env, tl)
> +DEF_HELPER_3(pbsad, tl, env, tl, tl)
> +DEF_HELPER_4(pbsada, tl, env, tl, tl, tl)
> diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
> index 8157dee8b7..1094172210 100644
> --- a/target/riscv/insn32-64.decode
> +++ b/target/riscv/insn32-64.decode
> @@ -19,10 +19,6 @@
>  # This is concatenated with insn32.decode for risc64 targets.
>  # Most of the fields and formats are there.
>
> -%sh5    20:5
> -
> -@sh5     .......  ..... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
> -
>  # *** RV64I Base Instruction Set (in addition to RV32I) ***
>  lwu      ............   ..... 110 ..... 0000011 @i
>  ld       ............   ..... 011 ..... 0000011 @i
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index a022f660b7..12e95f9c5f 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -25,6 +25,7 @@
>  %sh10    20:10
>  %sh4    20:4
>  %sh3    20:3
> +%sh5    20:5
>  %csr    20:12
>  %rm     12:3
>  %nf     29:3                     !function=ex_plus_1
> @@ -64,6 +65,7 @@
>  @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>  @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
>  @sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
> +@sh5     ......  ...... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
>  @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>
>  @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> @@ -783,3 +785,11 @@ kmsda      0100110  ..... ..... 001 ..... 1111111 @r
>  kmsxda     0100111  ..... ..... 001 ..... 1111111 @r
>
>  smal       0101111  ..... ..... 001 ..... 1111111 @r
> +
> +sclip32    1110010  ..... ..... 000 ..... 1111111 @sh5
> +uclip32    1111010  ..... ..... 000 ..... 1111111 @sh5
> +clrs32     1010111  11000 ..... 000 ..... 1111111 @r2
> +clz32      1010111  11001 ..... 000 ..... 1111111 @r2
> +clo32      1010111  11011 ..... 000 ..... 1111111 @r2
> +pbsad      1111110  ..... ..... 000 ..... 1111111 @r
> +pbsada     1111111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 73a26bbfbd..42656682c6 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -656,3 +656,12 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)    \
>  }
>
>  GEN_RVP_R_D64_S64_OOL(smal);
> +
> +/* Partial-SIMD Miscellaneous Instructions */
> +GEN_RVP_SHIFTI(sclip32, sclip32, NULL);
> +GEN_RVP_SHIFTI(uclip32, uclip32, NULL);
> +GEN_RVP_R2_OOL(clrs32);
> +GEN_RVP_R2_OOL(clz32);
> +GEN_RVP_R2_OOL(clo32);
> +GEN_RVP_R_OOL(pbsad);
> +GEN_RVP_R_ACC_OOL(pbsada);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 8ad7ea8354..96e73c045b 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -1978,3 +1978,78 @@ uint64_t helper_smal(CPURISCVState *env, uint64_t a, target_ulong b)
>      }
>      return result;
>  }
> +
> +/* Partial-SIMD Miscellaneous Instructions */
> +static inline void do_sclip32(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x1f;
> +
> +    d[i] = sat64(env, a[i], shift);
> +}
> +
> +RVPR(sclip32, 1, 4);
> +
> +static inline void do_uclip32(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x1f;
> +
> +    if (a[i] < 0) {
> +        d[i] = 0;
> +        env->vxsat = 0x1;
> +    } else {
> +        d[i] = satu64(env, a[i], shift);
> +    }
> +}
> +
> +RVPR(uclip32, 1, 4);
> +
> +static inline void do_clrs32(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    d[i] = clrsb32(a[i]);
> +}
> +
> +RVPR2(clrs32, 1, 4);
> +
> +static inline void do_clz32(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    d[i] = clz32(a[i]);
> +}
> +
> +RVPR2(clz32, 1, 4);
> +
> +static inline void do_clo32(CPURISCVState *env, void *vd, void *va, uint8_t i)
> +{
> +    int32_t *d = vd, *a = va;
> +    d[i] = clo32(a[i]);
> +}
> +
> +RVPR2(clo32, 1, 4);
> +
> +static inline void do_pbsad(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    target_ulong *d = vd;
> +    uint8_t *a = va, *b = vb;
> +    *d += abs(a[i] - b[i]);
> +}
> +
> +RVPR(pbsad, 1, 1);
> +
> +static inline void do_pbsada(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, void *vc, uint8_t i)
> +{
> +    target_ulong *d = vd, *c = vc;
> +    uint8_t *a = va, *b = vb;
> +    if (i == 0) {
> +        *d += *c;
> +    }
> +    *d += abs(a[i] - b[i]);
> +}
> +
> +RVPR_ACC(pbsada, 1, 1);
> --
> 2.17.1
>
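
As a standalone illustration of what pbsad and pbsada compute, as I read
the quoted helpers: a sum of absolute differences over the bytes of two
registers, optionally added to a previous value. This is only a sketch.
The names are hypothetical, it assumes a 64-bit register, and it assumes
the destination starts from zero for pbsad (the RVPR macro that would do
this is not visible in this message).

```c
#include <assert.h>
#include <stdint.h>

/* Sum of absolute differences over the eight bytes of two 64-bit values. */
static uint64_t pbsad64_sketch(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;

    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xff);
        int y = (int)((b >> (8 * i)) & 0xff);
        sum += (uint64_t)(x > y ? x - y : y - x);
    }
    return sum;
}

/* Accumulating variant, mirroring pbsada: start from the old value c. */
static uint64_t pbsada64_sketch(uint64_t c, uint64_t a, uint64_t b)
{
    return c + pbsad64_sketch(a, b);
}

/* Hypothetical self-check; the values are worked out by hand. */
static void check_pbsad(void)
{
    /* bytes 8,7,...,1 against zero: 1 + 2 + ... + 8 == 36 */
    assert(pbsad64_sketch(0x0102030405060708ULL, 0) == 36);
    /* two byte lanes each differ by 255 */
    assert(pbsad64_sketch(0x00ffULL, 0xff00ULL) == 510);
    assert(pbsada64_sketch(10, 0x05, 0x03) == 12);
}
```

Calling check_pbsad() from a small main() runs the assertions.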


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-03-16  2:40       ` LIU Zhiwei
@ 2021-03-16 19:54         ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-16 19:54 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Mon, Mar 15, 2021 at 10:40 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2021/3/16 5:25, Alistair Francis wrote:
> > On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >> ---
> >>   target/riscv/helper.h                   |   9 ++
> >>   target/riscv/insn32.decode              |  17 ++++
> >>   target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
> >>   target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
> >>   4 files changed, 245 insertions(+)
> >>
> >> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >> index a69a6b4e84..20bf400ac2 100644
> >> --- a/target/riscv/helper.h
> >> +++ b/target/riscv/helper.h
> >> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
> >>   DEF_HELPER_3(ursub8, tl, env, tl, tl)
> >>   DEF_HELPER_3(ksub8, tl, env, tl, tl)
> >>   DEF_HELPER_3(uksub8, tl, env, tl, tl)
> >> +
> >> +DEF_HELPER_3(sra16, tl, env, tl, tl)
> >> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
> >> +DEF_HELPER_3(srl16, tl, env, tl, tl)
> >> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
> >> +DEF_HELPER_3(sll16, tl, env, tl, tl)
> >> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
> >> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
> >> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> >> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> >> index 358dd1fa10..6f053bfeb7 100644
> >> --- a/target/riscv/insn32.decode
> >> +++ b/target/riscv/insn32.decode
> >> @@ -23,6 +23,7 @@
> >>   %rd        7:5
> >>
> >>   %sh10    20:10
> >> +%sh4    20:4
> >>   %csr    20:12
> >>   %rm     12:3
> >>   %nf     29:3                     !function=ex_plus_1
> >> @@ -59,6 +60,7 @@
> >>   @j       ....................      ..... ....... &j      imm=%imm_j          %rd
> >>
> >>   @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
> >> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> >>   @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
> >>
> >>   @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> >> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
> >>   ursub8     0010101  ..... ..... 000 ..... 1111111 @r
> >>   ksub8      0001101  ..... ..... 000 ..... 1111111 @r
> >>   uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> >> +
> >> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
> >> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
> >> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
> >> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
> >> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
> >> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
> >> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
> >> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
> >> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
> >> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
> >> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
> >> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
> >> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
> >> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> >> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> >> index 109f560ec9..848edab7e5 100644
> >> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> >> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> >> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
> >>   GEN_RVP_R_OOL(ursub8);
> >>   GEN_RVP_R_OOL(ksub8);
> >>   GEN_RVP_R_OOL(uksub8);
> >> +
> >> +/* 16-bit Shift Instructions */
> >> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
> >> +                          gen_helper_rvp_r *fn, target_ulong mask)
> >> +{
> >> +    TCGv src1, src2, dst;
> >> +
> >> +    src1 = tcg_temp_new();
> >> +    src2 = tcg_temp_new();
> >> +    dst = tcg_temp_new();
> >> +
> >> +    gen_get_gpr(src1, a->rs1);
> >> +    gen_get_gpr(src2, a->rs2);
> >> +    tcg_gen_andi_tl(src2, src2, mask);
> >> +
> >> +    fn(dst, cpu_env, src1, src2);
> >> +    gen_set_gpr(a->rd, dst);
> >> +
> >> +    tcg_temp_free(src1);
> >> +    tcg_temp_free(src2);
> >> +    tcg_temp_free(dst);
> >> +    return true;
> >> +}
> >> +
> >> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
> >> +                          uint32_t, uint32_t);
> >> +static inline bool
> >> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
> >> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
> >> +          uint8_t mask)
> >> +{
> >> +    if (!has_ext(ctx, RVP)) {
> >> +        return false;
> >> +    }
> >> +
> >> +#ifdef TARGET_RISCV64
> > Hmm....
> >
> > I don't want to add any more #defines on the RISC-V xlen. We are
> > trying to make the QEMU RISC-V implementation xlen independent.
> I noticed the change, but was not quite clear about the benefit of it.
>
> Could you give a brief explanation?

Yep, there are a few reasons for it.

AFAIK every QEMU platform except RISC-V allows the 64-bit binary to
run the 32-bit guests. So for example in the ARM QEMU builds I can use
qemu-system-aarch64 to run 32-bit ARMv7 guests. This is a step towards
allowing us to do the same with RISC-V.

It's also a step towards allowing CPUs with different fixed XLENs to run
together. For example, four 64-bit application CPUs and a single 32-bit
power management CPU can all run in the same machine.

Also XLEN in RISC-V is configurable. A 64-bit Hypervisor can have
32-bit XLEN guests according to the spec. This is a step towards
allowing that as well.

> > Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
> > else you add a #define TARGET... ?
> Sure, I think there are two ways.
>
> 1) Get env from the current_cpu, then call riscv_cpu_is_32bit(env).
>
> That seems somewhat strange, because I can't find references to current_cpu
> in many archs.
>
> I don't know whether it has side effects.
>
> 2)  Add a similar function cpu_is_32bit(DisasContext *ctx).

This is probably a better option, but I'm open to either way if you
have a strong preference.
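
To make option 2 concrete, here is a rough standalone sketch of what a
translate-time check could look like in place of the #ifdef
TARGET_RISCV64 block. Everything ending in _sketch is a stand-in made up
for this mail, not QEMU's real types. It assumes the context keeps a
64-bit-wide copy of misa with the two MXL bits at the top (MXL == 1
meaning XLEN = 32); the real layout in DisasContext may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for target_ulong on a 64-bit build (assumption). */
typedef uint64_t target_ulong_sketch;

/* Stand-in for DisasContext; only the field under discussion. */
typedef struct DisasContextSketch {
    /* A uint32_t here could not hold bits 63:62, hence the wider type. */
    target_ulong_sketch misa;
} DisasContextSketch;

/* Hypothetical counterpart of cpu_is_32bit(DisasContext *ctx). */
static bool ctx_is_32bit_sketch(const DisasContextSketch *ctx)
{
    return (ctx->misa >> 62) == 1;
}

/*
 * Runtime replacement for the compile-time guard: take the inline
 * 64-bit vector path only on a 64-bit CPU and when none of the
 * registers is x0, otherwise fall back to the out-of-line helper.
 */
static bool use_gvec_path_sketch(const DisasContextSketch *ctx,
                                 int rd, int rs1, int rs2)
{
    return !ctx_is_32bit_sketch(ctx) && rd && rs1 && rs2;
}

/* Hypothetical self-check of the two predicates. */
static void check_ctx_sketch(void)
{
    DisasContextSketch rv64 = { .misa = (target_ulong_sketch)2 << 62 };
    DisasContextSketch rv32 = { .misa = (target_ulong_sketch)1 << 62 };

    assert(!ctx_is_32bit_sketch(&rv64));
    assert(ctx_is_32bit_sketch(&rv32));
    assert(use_gvec_path_sketch(&rv64, 1, 2, 3));
    assert(!use_gvec_path_sketch(&rv64, 0, 2, 3));  /* rd is x0 */
    assert(!use_gvec_path_sketch(&rv32, 1, 2, 3));  /* 32-bit CPU */
}
```

The point of the shape is that the decision moves from build time to the
per-CPU context, so one binary can translate for both XLENs.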

Alistair

>
> In this case, the type of the misa field in struct DisasContext should be
> target_ulong.
> Currently, the type of the misa field is uint32_t.
>
> Which one do you think is better? Thanks very much.
>
> Zhiwei
> >
> > Alistair
> >
> >> +    if (a->rd && a->rs1 && a->rs2) {
> >> +        TCGv_i32 shift = tcg_temp_new_i32();
> >> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
> >> +        tcg_gen_andi_i32(shift, shift, mask);
> >> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
> >> +            offsetof(CPURISCVState, gpr[a->rs1]),
> >> +            shift, 8, 8);
> >> +        tcg_temp_free_i32(shift);
> >> +        return true;
> >> +    }
> >> +#endif
> >> +    return rvp_shift_ool(ctx, a, fn, mask);
> >> +}
> >> +
> >> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
> >> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
> >> +{                                                           \
> >> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
> >> +                     (8 << VECE) - 1);                      \
> >> +}
> >> +
> >> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
> >> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
> >> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
> >> +GEN_RVP_R_OOL(sra16_u);
> >> +GEN_RVP_R_OOL(srl16_u);
> >> +GEN_RVP_R_OOL(ksll16);
> >> +GEN_RVP_R_OOL(kslra16);
> >> +GEN_RVP_R_OOL(kslra16_u);
> >> +
> >> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
> >> +                           gen_helper_rvp_r *fn)
> >> +{
> >> +    TCGv src1, dst, shift;
> >> +
> >> +    src1 = tcg_temp_new();
> >> +    dst = tcg_temp_new();
> >> +
> >> +    gen_get_gpr(src1, a->rs1);
> >> +    shift = tcg_const_tl(a->shamt);
> >> +    fn(dst, cpu_env, src1, shift);
> >> +    gen_set_gpr(a->rd, dst);
> >> +
> >> +    tcg_temp_free(src1);
> >> +    tcg_temp_free(dst);
> >> +    tcg_temp_free(shift);
> >> +    return true;
> >> +}
> >> +
> >> +static inline bool
> >> +rvp_shifti(DisasContext *ctx, arg_shift *a,
> >> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
> >> +           gen_helper_rvp_r *fn)
> >> +{
> >> +    if (!has_ext(ctx, RVP)) {
> >> +        return false;
> >> +    }
> >> +
> >> +#ifdef TARGET_RISCV64
> >> +    if (a->rd && a->rs1 && f64) {
> >> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
> >> +        return true;
> >> +    }
> >> +#endif
> >> +    return rvp_shifti_ool(ctx, a, fn);
> >> +}
> >> +
> >> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
> >> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
> >> +{                                                        \
> >> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
> >> +}
> >> +
> >> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
> >> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
> >> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
> >> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
> >> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
> >> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> >> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> >> index 62db072204..7e31c2fe46 100644
> >> --- a/target/riscv/packed_helper.c
> >> +++ b/target/riscv/packed_helper.c
> >> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
> >>   }
> >>
> >>   RVPR(uksub8, 1, 1);
> >> +
> >> +/* 16-bit Shift Instructions */
> >> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
> >> +                            void *vb, uint8_t i)
> >> +{
> >> +    int16_t *d = vd, *a = va;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +    d[i] = a[i] >> shift;
> >> +}
> >> +
> >> +RVPR(sra16, 1, 2);
> >> +
> >> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
> >> +                            void *vb, uint8_t i)
> >> +{
> >> +    uint16_t *d = vd, *a = va;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +    d[i] = a[i] >> shift;
> >> +}
> >> +
> >> +RVPR(srl16, 1, 2);
> >> +
> >> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
> >> +                            void *vb, uint8_t i)
> >> +{
> >> +    uint16_t *d = vd, *a = va;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +    d[i] = a[i] << shift;
> >> +}
> >> +
> >> +RVPR(sll16, 1, 2);
> >> +
> >> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
> >> +                              void *vb, uint8_t i)
> >> +{
> >> +    int16_t *d = vd, *a = va;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +
> >> +    d[i] = vssra16(env, 0, a[i], shift);
> >> +}
> >> +
> >> +RVPR(sra16_u, 1, 2);
> >> +
> >> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
> >> +                              void *vb, uint8_t i)
> >> +{
> >> +    uint16_t *d = vd, *a = va;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +
> >> +    d[i] = vssrl16(env, 0, a[i], shift);
> >> +}
> >> +
> >> +RVPR(srl16_u, 1, 2);
> >> +
> >> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
> >> +                             void *vb, uint8_t i)
> >> +{
> >> +    int16_t *d = vd, *a = va, result;
> >> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >> +
> >> +    result = a[i] << shift;
> >> +    if (shift > (clrsb32(a[i]) - 16)) {
> >> +        env->vxsat = 0x1;
> >> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
> >> +    } else {
> >> +        d[i] = result;
> >> +    }
> >> +}
> >> +
> >> +RVPR(ksll16, 1, 2);
> >> +
> >> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
> >> +                              void *vb, uint8_t i)
> >> +{
> >> +    int16_t *d = vd, *a = va;
> >> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
> >> +
> >> +    if (shift >= 0) {
> >> +        do_ksll16(env, vd, va, vb, i);
> >> +    } else {
> >> +        shift = -shift;
> >> +        shift = (shift == 16) ? 15 : shift;
> >> +        d[i] = a[i] >> shift;
> >> +    }
> >> +}
> >> +
> >> +RVPR(kslra16, 1, 2);
> >> +
> >> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
> >> +                                void *vb, uint8_t i)
> >> +{
> >> +    int16_t *d = vd, *a = va;
> >> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
> >> +
> >> +    if (shift >= 0) {
> >> +        do_ksll16(env, vd, va, vb, i);
> >> +    } else {
> >> +        shift = -shift;
> >> +        shift = (shift == 16) ? 15 : shift;
> >> +        d[i] = vssra16(env, 0, a[i], shift);
> >> +    }
> >> +}
> >> +
> >> +RVPR(kslra16_u, 1, 2);
> >> --
> >> 2.17.1
> >>
>
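
As a reading aid for the quoted do_ksll16() helper, here is a small
reference model of a saturating left shift on one 16-bit lane. It is a
sketch under assumptions, not the patch's method: the name
ksll16_lane_model() and the bool flag standing in for env->vxsat are made
up here, and the overflow test is done by widening to 32 bits rather than
by counting redundant sign bits with clrsb32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reference model for one 16-bit lane of a saturating left shift.
 * The result is formed in 32 bits, where it cannot overflow, and then
 * clamped to the int16_t range. Multiplying by a power of two avoids
 * left-shifting a negative value, which is undefined behaviour in C.
 * *sat is set (never cleared) when clamping happens, like a sticky flag.
 */
static int16_t ksll16_lane_model(int16_t a, unsigned shamt, bool *sat)
{
    assert(sat != NULL);
    int32_t wide = (int32_t)a * ((int32_t)1 << (shamt & 0xf));

    if (wide > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (wide < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)wide;
}
```

A model like this could be used to cross-check the helper on boundary
inputs such as 1 << 15, -1 << 15 and INT16_MIN << 1.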



* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-03-16 19:54         ` Alistair Francis
@ 2021-03-17  2:30           ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-03-17  2:30 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers



On 2021/3/17 3:54, Alistair Francis wrote:
> On Mon, Mar 15, 2021 at 10:40 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>>
>> On 2021/3/16 5:25, Alistair Francis wrote:
>>> On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>> ---
>>>>    target/riscv/helper.h                   |   9 ++
>>>>    target/riscv/insn32.decode              |  17 ++++
>>>>    target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
>>>>    target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
>>>>    4 files changed, 245 insertions(+)
>>>>
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index a69a6b4e84..20bf400ac2 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(uksub8, tl, env, tl, tl)
>>>> +
>>>> +DEF_HELPER_3(sra16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
>>>> +DEF_HELPER_3(srl16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
>>>> +DEF_HELPER_3(sll16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 358dd1fa10..6f053bfeb7 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -23,6 +23,7 @@
>>>>    %rd        7:5
>>>>
>>>>    %sh10    20:10
>>>> +%sh4    20:4
>>>>    %csr    20:12
>>>>    %rm     12:3
>>>>    %nf     29:3                     !function=ex_plus_1
>>>> @@ -59,6 +60,7 @@
>>>>    @j       ....................      ..... ....... &j      imm=%imm_j          %rd
>>>>
>>>>    @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>>>> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
>>>>    @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>>>>
>>>>    @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
>>>> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>>>    ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>>>    ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>>>    uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>>>> +
>>>> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
>>>> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
>>>> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
>>>> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
>>>> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
>>>> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
>>>> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
>>>> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
>>>> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
>>>> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
>>>> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>>>> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>>>> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>>>> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> index 109f560ec9..848edab7e5 100644
>>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
>>>>    GEN_RVP_R_OOL(ursub8);
>>>>    GEN_RVP_R_OOL(ksub8);
>>>>    GEN_RVP_R_OOL(uksub8);
>>>> +
>>>> +/* 16-bit Shift Instructions */
>>>> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
>>>> +                          gen_helper_rvp_r *fn, target_ulong mask)
>>>> +{
>>>> +    TCGv src1, src2, dst;
>>>> +
>>>> +    src1 = tcg_temp_new();
>>>> +    src2 = tcg_temp_new();
>>>> +    dst = tcg_temp_new();
>>>> +
>>>> +    gen_get_gpr(src1, a->rs1);
>>>> +    gen_get_gpr(src2, a->rs2);
>>>> +    tcg_gen_andi_tl(src2, src2, mask);
>>>> +
>>>> +    fn(dst, cpu_env, src1, src2);
>>>> +    gen_set_gpr(a->rd, dst);
>>>> +
>>>> +    tcg_temp_free(src1);
>>>> +    tcg_temp_free(src2);
>>>> +    tcg_temp_free(dst);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
>>>> +                          uint32_t, uint32_t);
>>>> +static inline bool
>>>> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
>>>> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
>>>> +          uint8_t mask)
>>>> +{
>>>> +    if (!has_ext(ctx, RVP)) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +#ifdef TARGET_RISCV64
>>> Hmm....
>>>
>>> I don't want to add any more #defines on the RISC-V xlen. We are
>>> trying to make the QEMU RISC-V implementation xlen independent.
>> I noticed the change, but was not quite clear about the benefit of it.
>>
>> Could you give a brief explanation?
> Yep, there are a few reasons for it.
>
> AFAIK every QEMU platform except RISC-V allows the 64-bit binary to
> run the 32-bit guests. So for example in the ARM QEMU builds I can use
> qemu-system-aarch64 to run 32-bit ARMv7 guests. This is a step towards
> allowing us to do the same with RISC-V.
Got it. By explicitly specifying a 32-bit CPU option on the command line,
qemu-system-riscv64 can run a 32-bit guest application. Is that right?
>
> It's also a step towards allowing fixed XLEN CPUS to run. For example
> 4 64-bit application CPUs and a single 32-bit power management CPU can
> all run together.
>
> Also XLEN in RISC-V is configurable. A 64-bit Hypervisor can have
> 32-bit XLEN guests according to the spec. This is a step towards
> allowing that as well.
Really interesting points.

I have not used QEMU in this way. Are these features ready now?
>>> Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
>>> else you add a #define TARGET... ?
>> Sure, I think there are two ways.
>>
>> 1) Get env from the current_cpu, then call riscv_cpu_is_32bit(env).
>>
>> That seems a bit strange, because I can't find references to current_cpu in
>> many archs.
>>
>> I don't know whether it has side effects.
>>
>> 2)  Add a similar function cpu_is_32bit(DisasContext *ctx).
> This is probably a better option, but I'm open to either way if you
> have a strong preference.
I will add a patch that does it this way in v2. Thanks a lot.

Zhiwei
>
> Alistair
>
>> In this way, the type of the misa field in struct DisasContext should be
>> target_ulong.
>> Currently, the type of the misa field is uint32_t.
>>
>> Which one do you think is better? Thanks very much.
>>
>> Zhiwei
>>> Alistair
>>>
>>>> +    if (a->rd && a->rs1 && a->rs2) {
>>>> +        TCGv_i32 shift = tcg_temp_new_i32();
>>>> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
>>>> +        tcg_gen_andi_i32(shift, shift, mask);
>>>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
>>>> +            offsetof(CPURISCVState, gpr[a->rs1]),
>>>> +            shift, 8, 8);
>>>> +        tcg_temp_free_i32(shift);
>>>> +        return true;
>>>> +    }
>>>> +#endif
>>>> +    return rvp_shift_ool(ctx, a, fn, mask);
>>>> +}
>>>> +
>>>> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
>>>> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
>>>> +{                                                           \
>>>> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
>>>> +                     (8 << VECE) - 1);                      \
>>>> +}
>>>> +
>>>> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
>>>> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
>>>> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
>>>> +GEN_RVP_R_OOL(sra16_u);
>>>> +GEN_RVP_R_OOL(srl16_u);
>>>> +GEN_RVP_R_OOL(ksll16);
>>>> +GEN_RVP_R_OOL(kslra16);
>>>> +GEN_RVP_R_OOL(kslra16_u);
>>>> +
>>>> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
>>>> +                           gen_helper_rvp_r *fn)
>>>> +{
>>>> +    TCGv src1, dst, shift;
>>>> +
>>>> +    src1 = tcg_temp_new();
>>>> +    dst = tcg_temp_new();
>>>> +
>>>> +    gen_get_gpr(src1, a->rs1);
>>>> +    shift = tcg_const_tl(a->shamt);
>>>> +    fn(dst, cpu_env, src1, shift);
>>>> +    gen_set_gpr(a->rd, dst);
>>>> +
>>>> +    tcg_temp_free(src1);
>>>> +    tcg_temp_free(dst);
>>>> +    tcg_temp_free(shift);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +rvp_shifti(DisasContext *ctx, arg_shift *a,
>>>> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
>>>> +           gen_helper_rvp_r *fn)
>>>> +{
>>>> +    if (!has_ext(ctx, RVP)) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +#ifdef TARGET_RISCV64
>>>> +    if (a->rd && a->rs1 && f64) {
>>>> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
>>>> +        return true;
>>>> +    }
>>>> +#endif
>>>> +    return rvp_shifti_ool(ctx, a, fn);
>>>> +}
>>>> +
>>>> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
>>>> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
>>>> +{                                                        \
>>>> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
>>>> +}
>>>> +
>>>> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
>>>> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
>>>> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>>>> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>>>> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>>>> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
>>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>>>> index 62db072204..7e31c2fe46 100644
>>>> --- a/target/riscv/packed_helper.c
>>>> +++ b/target/riscv/packed_helper.c
>>>> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>>>    }
>>>>
>>>>    RVPR(uksub8, 1, 1);
>>>> +
>>>> +/* 16-bit Shift Instructions */
>>>> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] >> shift;
>>>> +}
>>>> +
>>>> +RVPR(sra16, 1, 2);
>>>> +
>>>> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] >> shift;
>>>> +}
>>>> +
>>>> +RVPR(srl16, 1, 2);
>>>> +
>>>> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] << shift;
>>>> +}
>>>> +
>>>> +RVPR(sll16, 1, 2);
>>>> +
>>>> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    d[i] = vssra16(env, 0, a[i], shift);
>>>> +}
>>>> +
>>>> +RVPR(sra16_u, 1, 2);
>>>> +
>>>> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    d[i] = vssrl16(env, 0, a[i], shift);
>>>> +}
>>>> +
>>>> +RVPR(srl16_u, 1, 2);
>>>> +
>>>> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va, result;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    result = a[i] << shift;
>>>> +    if (shift > (clrsb32(a[i]) - 16)) {
>>>> +        env->vxsat = 0x1;
>>>> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
>>>> +    } else {
>>>> +        d[i] = result;
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(ksll16, 1, 2);
>>>> +
>>>> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
>>>> +
>>>> +    if (shift >= 0) {
>>>> +        do_ksll16(env, vd, va, vb, i);
>>>> +    } else {
>>>> +        shift = -shift;
>>>> +        shift = (shift == 16) ? 15 : shift;
>>>> +        d[i] = a[i] >> shift;
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(kslra16, 1, 2);
>>>> +
>>>> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                                void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
>>>> +
>>>> +    if (shift >= 0) {
>>>> +        do_ksll16(env, vd, va, vb, i);
>>>> +    } else {
>>>> +        shift = -shift;
>>>> +        shift = (shift == 16) ? 15 : shift;
>>>> +        d[i] = vssra16(env, 0, a[i], shift);
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(kslra16_u, 1, 2);
>>>> --
>>>> 2.17.1
>>>>
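
For readers following the quoted do_kslra16() helper, the sketch below
models how a 5-bit signed shift amount selects between a saturating left
shift and an arithmetic right shift on one 16-bit lane. The names sext5()
and kslra16_lane_model() and the int flag standing in for env->vxsat are
hypothetical; sext5() is assumed to behave like sextract32(v, 0, 5) in the
patch. The right shift is written without ">>" on a negative value, since
that is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 5 bits of v, giving a value in [-16, 15]. */
static int32_t sext5(uint32_t v)
{
    int32_t low = (int32_t)(v & 0x1f);
    return (low & 0x10) ? low - 32 : low;
}

/*
 * One lane: amount >= 0 means saturating left shift, amount < 0 means
 * arithmetic right shift by -amount, with a shift of 16 treated as 15.
 */
static int16_t kslra16_lane_model(int16_t a, uint32_t rs2, int *sat)
{
    assert(sat != NULL);
    int32_t amount = sext5(rs2);

    if (amount >= 0) {
        /* Widen to 32 bits so the product cannot overflow, then clamp. */
        int32_t wide = (int32_t)a * ((int32_t)1 << amount);
        if (wide > INT16_MAX) {
            *sat = 1;
            return INT16_MAX;
        }
        if (wide < INT16_MIN) {
            *sat = 1;
            return INT16_MIN;
        }
        return (int16_t)wide;
    }

    int32_t right = -amount;            /* 1..16 */
    if (right == 16) {
        right = 15;                     /* same result for a 16-bit lane */
    }
    int32_t x = a;
    /* Portable floor(x / 2^right): shift the complement when x < 0. */
    return (int16_t)(x < 0 ? ~(~x >> right) : x >> right);
}
```

Bits of rs2 above bit 4 are ignored by this model, so 0x3f and 0x1f both
mean "shift right by one".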




* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
@ 2021-03-17  2:30           ` LIU Zhiwei
  0 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-03-17  2:30 UTC (permalink / raw)
  To: Alistair Francis
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, Palmer Dabbelt



On 2021/3/17 3:54, Alistair Francis wrote:
> On Mon, Mar 15, 2021 at 10:40 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>>
>> On 2021/3/16 5:25, Alistair Francis wrote:
>>> On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>> ---
>>>>    target/riscv/helper.h                   |   9 ++
>>>>    target/riscv/insn32.decode              |  17 ++++
>>>>    target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
>>>>    target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
>>>>    4 files changed, 245 insertions(+)
>>>>
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index a69a6b4e84..20bf400ac2 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>>>    DEF_HELPER_3(uksub8, tl, env, tl, tl)
>>>> +
>>>> +DEF_HELPER_3(sra16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
>>>> +DEF_HELPER_3(srl16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
>>>> +DEF_HELPER_3(sll16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 358dd1fa10..6f053bfeb7 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -23,6 +23,7 @@
>>>>    %rd        7:5
>>>>
>>>>    %sh10    20:10
>>>> +%sh4    20:4
>>>>    %csr    20:12
>>>>    %rm     12:3
>>>>    %nf     29:3                     !function=ex_plus_1
>>>> @@ -59,6 +60,7 @@
>>>>    @j       ....................      ..... ....... &j      imm=%imm_j          %rd
>>>>
>>>>    @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>>>> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
>>>>    @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>>>>
>>>>    @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
>>>> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>>>    ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>>>    ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>>>    uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>>>> +
>>>> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
>>>> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
>>>> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
>>>> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
>>>> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
>>>> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
>>>> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
>>>> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
>>>> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
>>>> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
>>>> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>>>> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>>>> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>>>> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> index 109f560ec9..848edab7e5 100644
>>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
>>>>    GEN_RVP_R_OOL(ursub8);
>>>>    GEN_RVP_R_OOL(ksub8);
>>>>    GEN_RVP_R_OOL(uksub8);
>>>> +
>>>> +/* 16-bit Shift Instructions */
>>>> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
>>>> +                          gen_helper_rvp_r *fn, target_ulong mask)
>>>> +{
>>>> +    TCGv src1, src2, dst;
>>>> +
>>>> +    src1 = tcg_temp_new();
>>>> +    src2 = tcg_temp_new();
>>>> +    dst = tcg_temp_new();
>>>> +
>>>> +    gen_get_gpr(src1, a->rs1);
>>>> +    gen_get_gpr(src2, a->rs2);
>>>> +    tcg_gen_andi_tl(src2, src2, mask);
>>>> +
>>>> +    fn(dst, cpu_env, src1, src2);
>>>> +    gen_set_gpr(a->rd, dst);
>>>> +
>>>> +    tcg_temp_free(src1);
>>>> +    tcg_temp_free(src2);
>>>> +    tcg_temp_free(dst);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
>>>> +                          uint32_t, uint32_t);
>>>> +static inline bool
>>>> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
>>>> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
>>>> +          uint8_t mask)
>>>> +{
>>>> +    if (!has_ext(ctx, RVP)) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +#ifdef TARGET_RISCV64
>>> Hmm....
>>>
>>> I don't want to add any more #defines on the RISC-V xlen. We are
>>> trying to make the QEMU RISC-V implementation xlen independent.
>> I noticed the change, but was not quite clear about the benefit of it.
>>
>> Could you give a brief explanation?
> Yep, there are a few reasons for it.
>
> AFAIK every QEMU platform except RISC-V allows the 64-bit binary to
> run the 32-bit guests. So for example in the ARM QEMU builds I can use
> qemu-system-aarch64 to run 32-bit ARMv7 guests. This is a step towards
> allowing us to do the same with RISC-V.
Got it. By explicitly specifying a 32-bit CPU option on the command line,
qemu-system-riscv64 can run a 32-bit guest application. Is that right?
>
> It's also a step towards allowing fixed XLEN CPUS to run. For example
> 4 64-bit application CPUs and a single 32-bit power management CPU can
> all run together.
>
> Also XLEN in RISC-V is configurable. A 64-bit Hypervisor can have
> 32-bit XLEN guests according to the spec. This is a step towards
> allowing that as well.
Really interesting points.

I have not used QEMU in this way. Are these features ready now?
>>> Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
>>> else you add a #define TARGET... ?
>> Sure, I think there are two ways.
>>
>> 1) Get env from the current_cpu, then call riscv_cpu_is_32bit(env).
>>
>> It's somewhat strange, because I can't find references to current_cpu in
>> many archs.
>>
>> I don't know whether it has side effects.
>>
>> 2)  Add a similar function cpu_is_32bit(DisasContext *ctx).
> This is probably a better option, but I'm open to either way if you
> have a strong preference.
I will add a patch that does it this way in v2. Thanks a lot.

Zhiwei
>
> Alistair
>
>> In this way, the type of the misa field in struct DisasContext should be
>> target_ulong.
>> Currently, the type of the misa field is uint32_t.
>>
>> Which one do you think is better? Thanks very much.
>>
>> Zhiwei
>>> Alistair
>>>
>>>> +    if (a->rd && a->rs1 && a->rs2) {
>>>> +        TCGv_i32 shift = tcg_temp_new_i32();
>>>> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
>>>> +        tcg_gen_andi_i32(shift, shift, mask);
>>>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
>>>> +            offsetof(CPURISCVState, gpr[a->rs1]),
>>>> +            shift, 8, 8);
>>>> +        tcg_temp_free_i32(shift);
>>>> +        return true;
>>>> +    }
>>>> +#endif
>>>> +    return rvp_shift_ool(ctx, a, fn, mask);
>>>> +}
>>>> +
>>>> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
>>>> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
>>>> +{                                                           \
>>>> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
>>>> +                     (8 << VECE) - 1);                      \
>>>> +}
>>>> +
>>>> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
>>>> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
>>>> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
>>>> +GEN_RVP_R_OOL(sra16_u);
>>>> +GEN_RVP_R_OOL(srl16_u);
>>>> +GEN_RVP_R_OOL(ksll16);
>>>> +GEN_RVP_R_OOL(kslra16);
>>>> +GEN_RVP_R_OOL(kslra16_u);
>>>> +
>>>> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
>>>> +                           gen_helper_rvp_r *fn)
>>>> +{
>>>> +    TCGv src1, dst, shift;
>>>> +
>>>> +    src1 = tcg_temp_new();
>>>> +    dst = tcg_temp_new();
>>>> +
>>>> +    gen_get_gpr(src1, a->rs1);
>>>> +    shift = tcg_const_tl(a->shamt);
>>>> +    fn(dst, cpu_env, src1, shift);
>>>> +    gen_set_gpr(a->rd, dst);
>>>> +
>>>> +    tcg_temp_free(src1);
>>>> +    tcg_temp_free(dst);
>>>> +    tcg_temp_free(shift);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +rvp_shifti(DisasContext *ctx, arg_shift *a,
>>>> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
>>>> +           gen_helper_rvp_r *fn)
>>>> +{
>>>> +    if (!has_ext(ctx, RVP)) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +#ifdef TARGET_RISCV64
>>>> +    if (a->rd && a->rs1 && f64) {
>>>> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
>>>> +        return true;
>>>> +    }
>>>> +#endif
>>>> +    return rvp_shifti_ool(ctx, a, fn);
>>>> +}
>>>> +
>>>> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
>>>> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
>>>> +{                                                        \
>>>> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
>>>> +}
>>>> +
>>>> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
>>>> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
>>>> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>>>> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>>>> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>>>> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
>>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>>>> index 62db072204..7e31c2fe46 100644
>>>> --- a/target/riscv/packed_helper.c
>>>> +++ b/target/riscv/packed_helper.c
>>>> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>>>    }
>>>>
>>>>    RVPR(uksub8, 1, 1);
>>>> +
>>>> +/* 16-bit Shift Instructions */
>>>> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] >> shift;
>>>> +}
>>>> +
>>>> +RVPR(sra16, 1, 2);
>>>> +
>>>> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] >> shift;
>>>> +}
>>>> +
>>>> +RVPR(srl16, 1, 2);
>>>> +
>>>> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +    d[i] = a[i] << shift;
>>>> +}
>>>> +
>>>> +RVPR(sll16, 1, 2);
>>>> +
>>>> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    d[i] = vssra16(env, 0, a[i], shift);
>>>> +}
>>>> +
>>>> +RVPR(sra16_u, 1, 2);
>>>> +
>>>> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    uint16_t *d = vd, *a = va;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    d[i] = vssrl16(env, 0, a[i], shift);
>>>> +}
>>>> +
>>>> +RVPR(srl16_u, 1, 2);
>>>> +
>>>> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va, result;
>>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
>>>> +
>>>> +    result = a[i] << shift;
>>>> +    if (shift > (clrsb32(a[i]) - 16)) {
>>>> +        env->vxsat = 0x1;
>>>> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
>>>> +    } else {
>>>> +        d[i] = result;
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(ksll16, 1, 2);
>>>> +
>>>> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
>>>> +                              void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
>>>> +
>>>> +    if (shift >= 0) {
>>>> +        do_ksll16(env, vd, va, vb, i);
>>>> +    } else {
>>>> +        shift = -shift;
>>>> +        shift = (shift == 16) ? 15 : shift;
>>>> +        d[i] = a[i] >> shift;
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(kslra16, 1, 2);
>>>> +
>>>> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>>>> +                                void *vb, uint8_t i)
>>>> +{
>>>> +    int16_t *d = vd, *a = va;
>>>> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
>>>> +
>>>> +    if (shift >= 0) {
>>>> +        do_ksll16(env, vd, va, vb, i);
>>>> +    } else {
>>>> +        shift = -shift;
>>>> +        shift = (shift == 16) ? 15 : shift;
>>>> +        d[i] = vssra16(env, 0, a[i], shift);
>>>> +    }
>>>> +}
>>>> +
>>>> +RVPR(kslra16_u, 1, 2);
>>>> --
>>>> 2.17.1
>>>>
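
Option 2 from the discussion above, a cpu_is_32bit() that takes the translation context, could look roughly like this. It is a sketch under assumptions, not QEMU code: MockDisasContext, its mxl field, ShiftPath and pick_shift_path are hypothetical stand-ins, and only the MXL encoding (1 = RV32, 2 = RV64) comes from the RISC-V privileged spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DisasContext; only the field needed here. */
typedef struct {
    uint8_t mxl;    /* MXL field of misa: 1 = RV32, 2 = RV64 */
} MockDisasContext;

enum { MXL_RV32 = 1, MXL_RV64 = 2 };

/* Runtime replacement for a compile-time "#ifdef TARGET_RISCV64" test. */
static bool cpu_is_32bit(const MockDisasContext *ctx)
{
    assert(ctx != NULL);
    return ctx->mxl == MXL_RV32;
}

/* Which code path a translator would pick for a packed 16-bit shift. */
typedef enum { PATH_INLINE_GVEC, PATH_OUT_OF_LINE } ShiftPath;

static ShiftPath pick_shift_path(const MockDisasContext *ctx,
                                 unsigned rd, unsigned rs1, unsigned rs2)
{
    /* Same condition as the quoted rvp_shift(), with the #ifdef turned
     * into a runtime check: the inline path needs a 64-bit CPU and no
     * use of x0 as an operand. */
    if (!cpu_is_32bit(ctx) && rd && rs1 && rs2) {
        return PATH_INLINE_GVEC;
    }
    return PATH_OUT_OF_LINE;
}
```

With this shape one binary can take the out-of-line helper path for a 32-bit CPU and the inline path for a 64-bit one, which is the XLEN independence the review asks for.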



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
  2021-03-17  2:30           ` LIU Zhiwei
@ 2021-03-17 20:39             ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-17 20:39 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Tue, Mar 16, 2021 at 10:31 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2021/3/17 3:54, Alistair Francis wrote:
> > On Mon, Mar 15, 2021 at 10:40 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>
> >>
> >> On 2021/3/16 5:25, Alistair Francis wrote:
> >>> On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >>>> ---
> >>>>    target/riscv/helper.h                   |   9 ++
> >>>>    target/riscv/insn32.decode              |  17 ++++
> >>>>    target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
> >>>>    target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
> >>>>    4 files changed, 245 insertions(+)
> >>>>
> >>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >>>> index a69a6b4e84..20bf400ac2 100644
> >>>> --- a/target/riscv/helper.h
> >>>> +++ b/target/riscv/helper.h
> >>>> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(ursub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(ksub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(uksub8, tl, env, tl, tl)
> >>>> +
> >>>> +DEF_HELPER_3(sra16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(srl16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(sll16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> >>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> >>>> index 358dd1fa10..6f053bfeb7 100644
> >>>> --- a/target/riscv/insn32.decode
> >>>> +++ b/target/riscv/insn32.decode
> >>>> @@ -23,6 +23,7 @@
> >>>>    %rd        7:5
> >>>>
> >>>>    %sh10    20:10
> >>>> +%sh4    20:4
> >>>>    %csr    20:12
> >>>>    %rm     12:3
> >>>>    %nf     29:3                     !function=ex_plus_1
> >>>> @@ -59,6 +60,7 @@
> >>>>    @j       ....................      ..... ....... &j      imm=%imm_j          %rd
> >>>>
> >>>>    @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
> >>>> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> >>>>    @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
> >>>>
> >>>>    @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> >>>> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
> >>>>    ursub8     0010101  ..... ..... 000 ..... 1111111 @r
> >>>>    ksub8      0001101  ..... ..... 000 ..... 1111111 @r
> >>>>    uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> >>>> +
> >>>> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
> >>>> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
> >>>> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
> >>>> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
> >>>> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
> >>>> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
> >>>> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
> >>>> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
> >>>> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
> >>>> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
> >>>> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
> >>>> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
> >>>> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
> >>>> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> >>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> index 109f560ec9..848edab7e5 100644
> >>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
> >>>>    GEN_RVP_R_OOL(ursub8);
> >>>>    GEN_RVP_R_OOL(ksub8);
> >>>>    GEN_RVP_R_OOL(uksub8);
> >>>> +
> >>>> +/* 16-bit Shift Instructions */
> >>>> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
> >>>> +                          gen_helper_rvp_r *fn, target_ulong mask)
> >>>> +{
> >>>> +    TCGv src1, src2, dst;
> >>>> +
> >>>> +    src1 = tcg_temp_new();
> >>>> +    src2 = tcg_temp_new();
> >>>> +    dst = tcg_temp_new();
> >>>> +
> >>>> +    gen_get_gpr(src1, a->rs1);
> >>>> +    gen_get_gpr(src2, a->rs2);
> >>>> +    tcg_gen_andi_tl(src2, src2, mask);
> >>>> +
> >>>> +    fn(dst, cpu_env, src1, src2);
> >>>> +    gen_set_gpr(a->rd, dst);
> >>>> +
> >>>> +    tcg_temp_free(src1);
> >>>> +    tcg_temp_free(src2);
> >>>> +    tcg_temp_free(dst);
> >>>> +    return true;
> >>>> +}
> >>>> +
> >>>> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
> >>>> +                          uint32_t, uint32_t);
> >>>> +static inline bool
> >>>> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
> >>>> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
> >>>> +          uint8_t mask)
> >>>> +{
> >>>> +    if (!has_ext(ctx, RVP)) {
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +#ifdef TARGET_RISCV64
> >>> Hmm....
> >>>
> >>> I don't want to add any more #defines on the RISC-V xlen. We are
> >>> trying to make the QEMU RISC-V implementation xlen independent.
> >> I noticed the change, but was not quite clear about the benefit of it.
> >>
> >> Could you give a brief explanation?
> > Yep, there are a few reasons for it.
> >
> > AFAIK every QEMU platform except RISC-V allows the 64-bit binary to
> > run the 32-bit guests. So for example in the ARM QEMU builds I can use
> > qemu-system-aarch64 to run 32-bit ARMv7 guests. This is a step towards
> > allowing us to do the same with RISC-V.
> Got it. By explicitly specifying a 32-bit CPU option on the command line,
> qemu-system-riscv64 can run a 32-bit guest application. Is that right?

Yep, that's the idea.

> >
> > It's also a step towards allowing fixed XLEN CPUS to run. For example
> > 4 64-bit application CPUs and a single 32-bit power management CPU can
> > all run together.
> >
> > Also XLEN in RISC-V is configurable. A 64-bit Hypervisor can have
> > 32-bit XLEN guests according to the spec. This is a step towards
> > allowing that as well.
> Really interesting points.
>
> I have not used QEMU in this way. Are these features ready now?

Nope, we are still pretty far away unfortunately, but I don't want to
get even further away.

> >>> Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
> >>> else you add a #define TARGET... ?
> >> Sure, I think there are two ways.
> >>
> >> 1) Get env from the current_cpu, then call riscv_cpu_is_32bit(env).
> >>
> >> It's somewhat strange, because I can't find references to current_cpu in
> >> many archs.
> >>
> >> I don't know whether it has side effects.
> >>
> >> 2)  Add a similar function cpu_is_32bit(DisasContext *ctx).
> > This is probably a better option, but I'm open to either way if you
> > have a strong preference.
> I will add a patch that does it this way in v2. Thanks a lot.

Thanks!

Alistair

>
> Zhiwei
> >
> > Alistair
> >
> >> In this way, the type of the misa field in struct DisasContext should be
> >> target_ulong.
> >> Currently, the type of the misa field is uint32_t.
> >>
> >> Which one do you think is better? Thanks very much.
> >>
> >> Zhiwei
> >>> Alistair
> >>>
> >>>> +    if (a->rd && a->rs1 && a->rs2) {
> >>>> +        TCGv_i32 shift = tcg_temp_new_i32();
> >>>> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
> >>>> +        tcg_gen_andi_i32(shift, shift, mask);
> >>>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
> >>>> +            offsetof(CPURISCVState, gpr[a->rs1]),
> >>>> +            shift, 8, 8);
> >>>> +        tcg_temp_free_i32(shift);
> >>>> +        return true;
> >>>> +    }
> >>>> +#endif
> >>>> +    return rvp_shift_ool(ctx, a, fn, mask);
> >>>> +}
> >>>> +
> >>>> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
> >>>> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
> >>>> +{                                                           \
> >>>> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
> >>>> +                     (8 << VECE) - 1);                      \
> >>>> +}
> >>>> +
> >>>> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
> >>>> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
> >>>> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
> >>>> +GEN_RVP_R_OOL(sra16_u);
> >>>> +GEN_RVP_R_OOL(srl16_u);
> >>>> +GEN_RVP_R_OOL(ksll16);
> >>>> +GEN_RVP_R_OOL(kslra16);
> >>>> +GEN_RVP_R_OOL(kslra16_u);
> >>>> +
> >>>> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
> >>>> +                           gen_helper_rvp_r *fn)
> >>>> +{
> >>>> +    TCGv src1, dst, shift;
> >>>> +
> >>>> +    src1 = tcg_temp_new();
> >>>> +    dst = tcg_temp_new();
> >>>> +
> >>>> +    gen_get_gpr(src1, a->rs1);
> >>>> +    shift = tcg_const_tl(a->shamt);
> >>>> +    fn(dst, cpu_env, src1, shift);
> >>>> +    gen_set_gpr(a->rd, dst);
> >>>> +
> >>>> +    tcg_temp_free(src1);
> >>>> +    tcg_temp_free(dst);
> >>>> +    tcg_temp_free(shift);
> >>>> +    return true;
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +rvp_shifti(DisasContext *ctx, arg_shift *a,
> >>>> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
> >>>> +           gen_helper_rvp_r *fn)
> >>>> +{
> >>>> +    if (!has_ext(ctx, RVP)) {
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +#ifdef TARGET_RISCV64
> >>>> +    if (a->rd && a->rs1 && f64) {
> >>>> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
> >>>> +        return true;
> >>>> +    }
> >>>> +#endif
> >>>> +    return rvp_shifti_ool(ctx, a, fn);
> >>>> +}
> >>>> +
> >>>> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
> >>>> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
> >>>> +{                                                        \
> >>>> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
> >>>> +}
> >>>> +
> >>>> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
> >>>> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
> >>>> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
> >>>> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
> >>>> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
> >>>> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> >>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> >>>> index 62db072204..7e31c2fe46 100644
> >>>> --- a/target/riscv/packed_helper.c
> >>>> +++ b/target/riscv/packed_helper.c
> >>>> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
> >>>>    }
> >>>>
> >>>>    RVPR(uksub8, 1, 1);
> >>>> +
> >>>> +/* 16-bit Shift Instructions */
> >>>> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] >> shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(sra16, 1, 2);
> >>>> +
> >>>> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] >> shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(srl16, 1, 2);
> >>>> +
> >>>> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] << shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(sll16, 1, 2);
> >>>> +
> >>>> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    d[i] = vssra16(env, 0, a[i], shift);
> >>>> +}
> >>>> +
> >>>> +RVPR(sra16_u, 1, 2);
> >>>> +
> >>>> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    d[i] = vssrl16(env, 0, a[i], shift);
> >>>> +}
> >>>> +
> >>>> +RVPR(srl16_u, 1, 2);
> >>>> +
> >>>> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
> >>>> +                             void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va, result;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    result = a[i] << shift;
> >>>> +    if (shift > (clrsb32(a[i]) - 16)) {
> >>>> +        env->vxsat = 0x1;
> >>>> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
> >>>> +    } else {
> >>>> +        d[i] = result;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(ksll16, 1, 2);
> >>>> +
> >>>> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
> >>>> +
> >>>> +    if (shift >= 0) {
> >>>> +        do_ksll16(env, vd, va, vb, i);
> >>>> +    } else {
> >>>> +        shift = -shift;
> >>>> +        shift = (shift == 16) ? 15 : shift;
> >>>> +        d[i] = a[i] >> shift;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(kslra16, 1, 2);
> >>>> +
> >>>> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                                void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
> >>>> +
> >>>> +    if (shift >= 0) {
> >>>> +        do_ksll16(env, vd, va, vb, i);
> >>>> +    } else {
> >>>> +        shift = -shift;
> >>>> +        shift = (shift == 16) ? 15 : shift;
> >>>> +        d[i] = vssra16(env, 0, a[i], shift);
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(kslra16_u, 1, 2);
> >>>> --
> >>>> 2.17.1
> >>>>
>
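
The direction logic of the quoted do_kslra16 helper (a signed 5-bit shift amount, where non-negative means saturating left shift and negative means arithmetic right shift) can be sketched standalone as below. The names sext5 and kslra16_lane are hypothetical; sext5 is assumed to compute what sextract32(v, 0, 5) does in the patch, and the right shift is written so it does not rely on implementation-defined behaviour for negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 5 bits of v, giving a value in [-16, 15]. */
static int32_t sext5(uint32_t v)
{
    int32_t low = (int32_t)(v & 0x1f);
    return (low & 0x10) ? low - 32 : low;
}

/* One 16-bit lane of a "shift left with saturation or shift right
 * arithmetic" operation; *sat is a sticky overflow flag. */
static int16_t kslra16_lane(int16_t a, uint32_t rs2, bool *sat)
{
    assert(sat != NULL);
    int32_t shift = sext5(rs2);
    int32_t x = a;

    if (shift >= 0) {
        /* Left shift with saturation; the product fits in int32_t. */
        int32_t wide = x * (1 << shift);
        if (wide > INT16_MAX) {
            *sat = true;
            return INT16_MAX;
        }
        if (wide < INT16_MIN) {
            *sat = true;
            return INT16_MIN;
        }
        return (int16_t)wide;
    }

    /* Negative amount: shift right by -shift, with 16 clamped to 15
     * as in the quoted helper. */
    shift = -shift;
    if (shift == 16) {
        shift = 15;
    }
    /* Portable arithmetic right shift: for negative x, ~x is non-negative. */
    return (int16_t)(x < 0 ? ~(~x >> shift) : x >> shift);
}
```

An rs2 value of 0x1f therefore means "shift right by 1", and 0x10 means "shift right by 16", which the clamp turns into a shift by 15 so the lane becomes 0 or -1 depending on its sign.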


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions
@ 2021-03-17 20:39             ` Alistair Francis
  0 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-03-17 20:39 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, Palmer Dabbelt

On Tue, Mar 16, 2021 at 10:31 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>
>
> On 2021/3/17 3:54, Alistair Francis wrote:
> > On Mon, Mar 15, 2021 at 10:40 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>
> >>
> >> On 2021/3/16 5:25, Alistair Francis wrote:
> >>> On Fri, Feb 12, 2021 at 10:16 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
> >>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> >>>> ---
> >>>>    target/riscv/helper.h                   |   9 ++
> >>>>    target/riscv/insn32.decode              |  17 ++++
> >>>>    target/riscv/insn_trans/trans_rvp.c.inc | 115 ++++++++++++++++++++++++
> >>>>    target/riscv/packed_helper.c            | 104 +++++++++++++++++++++
> >>>>    4 files changed, 245 insertions(+)
> >>>>
> >>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> >>>> index a69a6b4e84..20bf400ac2 100644
> >>>> --- a/target/riscv/helper.h
> >>>> +++ b/target/riscv/helper.h
> >>>> @@ -1184,3 +1184,12 @@ DEF_HELPER_3(rsub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(ursub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(ksub8, tl, env, tl, tl)
> >>>>    DEF_HELPER_3(uksub8, tl, env, tl, tl)
> >>>> +
> >>>> +DEF_HELPER_3(sra16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(sra16_u, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(srl16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(srl16_u, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(sll16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(ksll16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(kslra16, tl, env, tl, tl)
> >>>> +DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> >>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> >>>> index 358dd1fa10..6f053bfeb7 100644
> >>>> --- a/target/riscv/insn32.decode
> >>>> +++ b/target/riscv/insn32.decode
> >>>> @@ -23,6 +23,7 @@
> >>>>    %rd        7:5
> >>>>
> >>>>    %sh10    20:10
> >>>> +%sh4    20:4
> >>>>    %csr    20:12
> >>>>    %rm     12:3
> >>>>    %nf     29:3                     !function=ex_plus_1
> >>>> @@ -59,6 +60,7 @@
> >>>>    @j       ....................      ..... ....... &j      imm=%imm_j          %rd
> >>>>
> >>>>    @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
> >>>> +@sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> >>>>    @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
> >>>>
> >>>>    @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> >>>> @@ -635,3 +637,18 @@ rsub8      0000101  ..... ..... 000 ..... 1111111 @r
> >>>>    ursub8     0010101  ..... ..... 000 ..... 1111111 @r
> >>>>    ksub8      0001101  ..... ..... 000 ..... 1111111 @r
> >>>>    uksub8     0011101  ..... ..... 000 ..... 1111111 @r
> >>>> +
> >>>> +sra16      0101000  ..... ..... 000 ..... 1111111 @r
> >>>> +sra16_u    0110000  ..... ..... 000 ..... 1111111 @r
> >>>> +srai16     0111000  0.... ..... 000 ..... 1111111 @sh4
> >>>> +srai16_u   0111000  1.... ..... 000 ..... 1111111 @sh4
> >>>> +srl16      0101001  ..... ..... 000 ..... 1111111 @r
> >>>> +srl16_u    0110001  ..... ..... 000 ..... 1111111 @r
> >>>> +srli16     0111001  0.... ..... 000 ..... 1111111 @sh4
> >>>> +srli16_u   0111001  1.... ..... 000 ..... 1111111 @sh4
> >>>> +sll16      0101010  ..... ..... 000 ..... 1111111 @r
> >>>> +slli16     0111010  0.... ..... 000 ..... 1111111 @sh4
> >>>> +ksll16     0110010  ..... ..... 000 ..... 1111111 @r
> >>>> +kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
> >>>> +kslra16    0101011  ..... ..... 000 ..... 1111111 @r
> >>>> +kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> >>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> index 109f560ec9..848edab7e5 100644
> >>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> >>>> @@ -238,3 +238,118 @@ GEN_RVP_R_OOL(rsub8);
> >>>>    GEN_RVP_R_OOL(ursub8);
> >>>>    GEN_RVP_R_OOL(ksub8);
> >>>>    GEN_RVP_R_OOL(uksub8);
> >>>> +
> >>>> +/* 16-bit Shift Instructions */
> >>>> +static bool rvp_shift_ool(DisasContext *ctx, arg_r *a,
> >>>> +                          gen_helper_rvp_r *fn, target_ulong mask)
> >>>> +{
> >>>> +    TCGv src1, src2, dst;
> >>>> +
> >>>> +    src1 = tcg_temp_new();
> >>>> +    src2 = tcg_temp_new();
> >>>> +    dst = tcg_temp_new();
> >>>> +
> >>>> +    gen_get_gpr(src1, a->rs1);
> >>>> +    gen_get_gpr(src2, a->rs2);
> >>>> +    tcg_gen_andi_tl(src2, src2, mask);
> >>>> +
> >>>> +    fn(dst, cpu_env, src1, src2);
> >>>> +    gen_set_gpr(a->rd, dst);
> >>>> +
> >>>> +    tcg_temp_free(src1);
> >>>> +    tcg_temp_free(src2);
> >>>> +    tcg_temp_free(dst);
> >>>> +    return true;
> >>>> +}
> >>>> +
> >>>> +typedef void GenGvecShift(unsigned, uint32_t, uint32_t, TCGv_i32,
> >>>> +                          uint32_t, uint32_t);
> >>>> +static inline bool
> >>>> +rvp_shift(DisasContext *ctx, arg_r *a, uint8_t vece,
> >>>> +          GenGvecShift *f64, gen_helper_rvp_r *fn,
> >>>> +          uint8_t mask)
> >>>> +{
> >>>> +    if (!has_ext(ctx, RVP)) {
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +#ifdef TARGET_RISCV64
> >>> Hmm....
> >>>
> >>> I don't want to add any more #defines on the RISC-V xlen. We are
> >>> trying to make the QEMU RISC-V implementation xlen independent.
> >> I noticed the change, but was not quite clear about the benefit of it.
> >>
> >> Could you give a brief explanation?
> > Yep, there are a few reasons for it.
> >
> > AFAIK every QEMU platform except RISC-V allows the 64-bit binary to
> > run the 32-bit guests. So for example in the ARM QEMU builds I can use
> > qemu-system-aarch64 to run 32-bit ARMv7 guests. This is a step towards
> > allowing us to do the same with RISC-V.
> Got it. By explicitly specifying a 32-bit CPU option on the command line,
> qemu-system-riscv64 can run a 32-bit guest application. Is that right?

Yep, that's the idea.

> >
> > It's also a step towards allowing CPUs with different fixed XLENs to run
> > side by side. For example, four 64-bit application CPUs and a single
> > 32-bit power management CPU can all run together.
> >
> > Also XLEN in RISC-V is configurable. A 64-bit Hypervisor can have
> > 32-bit XLEN guests according to the spec. This is a step towards
> > allowing that as well.
> Really interesting points.
>
> I have not used QEMU in this way. Are these features ready now?

Nope, we are still pretty far away unfortunately, but I don't want to
get even further away.

> >>> Can you use `riscv_cpu_is_32bit(env)` instead, here and everywhere
> >>> else you add a #define TARGET... ?
> >> Sure, I think there are two ways.
> >>
> >> 1) Get env from the current_cpu, then call riscv_cpu_is_32bit(env).
> >>
> >> This seems a little strange, because I can't find references to
> >> current_cpu in many archs.
> >>
> >> I don't know whether it has side effects.
> >>
> >> 2)  Add a similar function cpu_is_32bit(DisasContext *ctx).
> > This is probably a better option, but I'm open to either way if you
> > have a strong preference.
> I will add a patch in this way in v2. Thanks a lot.

Thanks!

Alistair
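
The second option discussed above, a predicate that takes the DisasContext, would let the translator choose the inline 64-bit path at run time instead of behind #ifdef TARGET_RISCV64. A minimal sketch under stated assumptions: DisasContextSketch, its mxl field, ctx_is_32bit() and can_use_inline64() are hypothetical stand-ins for illustration, not QEMU's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the base ISA width (mirrors misa.MXL: 1 = 32, 2 = 64). */
typedef enum { MXL_RV32 = 1, MXL_RV64 = 2 } MxlSketch;

/* Hypothetical stand-in for DisasContext; only the field the sketch needs. */
typedef struct {
    MxlSketch mxl;   /* assumed to be filled in when translation starts */
} DisasContextSketch;

/* Run-time replacement for a compile-time TARGET_RISCV64 test. */
static bool ctx_is_32bit(const DisasContextSketch *ctx)
{
    assert(ctx);
    return ctx->mxl == MXL_RV32;
}

/*
 * Decide whether the inline 64-bit vector path may be used.
 * As in the quoted patch, x0 as any operand falls back to the
 * out-of-line helper, and so does a 32-bit CPU.
 */
static bool can_use_inline64(const DisasContextSketch *ctx,
                             int rd, int rs1, int rs2)
{
    return !ctx_is_32bit(ctx) && rd != 0 && rs1 != 0 && rs2 != 0;
}
```

With a check of this shape, a single binary could translate for either width; the out-of-line helper remains the fallback whenever the predicate is false.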

>
> Zhiwei
> >
> > Alistair
> >
> >> With this approach, the type of the misa field in struct DisasContext
> >> should be target_ulong.
> >> Currently, the type of the misa field is uint32_t.
> >>
> >> Which one do you think is better? Thanks very much.
> >>
> >> Zhiwei
> >>> Alistair
> >>>
> >>>> +    if (a->rd && a->rs1 && a->rs2) {
> >>>> +        TCGv_i32 shift = tcg_temp_new_i32();
> >>>> +        tcg_gen_extrl_i64_i32(shift, cpu_gpr[a->rs2]);
> >>>> +        tcg_gen_andi_i32(shift, shift, mask);
> >>>> +        f64(vece, offsetof(CPURISCVState, gpr[a->rd]),
> >>>> +            offsetof(CPURISCVState, gpr[a->rs1]),
> >>>> +            shift, 8, 8);
> >>>> +        tcg_temp_free_i32(shift);
> >>>> +        return true;
> >>>> +    }
> >>>> +#endif
> >>>> +    return rvp_shift_ool(ctx, a, fn, mask);
> >>>> +}
> >>>> +
> >>>> +#define GEN_RVP_SHIFT(NAME, GVEC, VECE)                     \
> >>>> +static bool trans_##NAME(DisasContext *s, arg_r *a)         \
> >>>> +{                                                           \
> >>>> +    return rvp_shift(s, a, VECE, GVEC, gen_helper_##NAME,   \
> >>>> +                     (8 << VECE) - 1);                      \
> >>>> +}
> >>>> +
> >>>> +GEN_RVP_SHIFT(sra16, tcg_gen_gvec_sars, 1);
> >>>> +GEN_RVP_SHIFT(srl16, tcg_gen_gvec_shrs, 1);
> >>>> +GEN_RVP_SHIFT(sll16, tcg_gen_gvec_shls, 1);
> >>>> +GEN_RVP_R_OOL(sra16_u);
> >>>> +GEN_RVP_R_OOL(srl16_u);
> >>>> +GEN_RVP_R_OOL(ksll16);
> >>>> +GEN_RVP_R_OOL(kslra16);
> >>>> +GEN_RVP_R_OOL(kslra16_u);
> >>>> +
> >>>> +static bool rvp_shifti_ool(DisasContext *ctx, arg_shift *a,
> >>>> +                           gen_helper_rvp_r *fn)
> >>>> +{
> >>>> +    TCGv src1, dst, shift;
> >>>> +
> >>>> +    src1 = tcg_temp_new();
> >>>> +    dst = tcg_temp_new();
> >>>> +
> >>>> +    gen_get_gpr(src1, a->rs1);
> >>>> +    shift = tcg_const_tl(a->shamt);
> >>>> +    fn(dst, cpu_env, src1, shift);
> >>>> +    gen_set_gpr(a->rd, dst);
> >>>> +
> >>>> +    tcg_temp_free(src1);
> >>>> +    tcg_temp_free(dst);
> >>>> +    tcg_temp_free(shift);
> >>>> +    return true;
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +rvp_shifti(DisasContext *ctx, arg_shift *a,
> >>>> +           void (* f64)(TCGv_i64, TCGv_i64, int64_t),
> >>>> +           gen_helper_rvp_r *fn)
> >>>> +{
> >>>> +    if (!has_ext(ctx, RVP)) {
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +#ifdef TARGET_RISCV64
> >>>> +    if (a->rd && a->rs1 && f64) {
> >>>> +        f64(cpu_gpr[a->rd], cpu_gpr[a->rs1], a->shamt);
> >>>> +        return true;
> >>>> +    }
> >>>> +#endif
> >>>> +    return rvp_shifti_ool(ctx, a, fn);
> >>>> +}
> >>>> +
> >>>> +#define GEN_RVP_SHIFTI(NAME, OP, GVEC)                   \
> >>>> +static bool trans_##NAME(DisasContext *s, arg_shift *a)  \
> >>>> +{                                                        \
> >>>> +    return rvp_shifti(s, a, GVEC, gen_helper_##OP);      \
> >>>> +}
> >>>> +
> >>>> +GEN_RVP_SHIFTI(srai16, sra16, tcg_gen_vec_sar16i_i64);
> >>>> +GEN_RVP_SHIFTI(srli16, srl16, tcg_gen_vec_shr16i_i64);
> >>>> +GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
> >>>> +GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
> >>>> +GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
> >>>> +GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> >>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> >>>> index 62db072204..7e31c2fe46 100644
> >>>> --- a/target/riscv/packed_helper.c
> >>>> +++ b/target/riscv/packed_helper.c
> >>>> @@ -425,3 +425,107 @@ static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
> >>>>    }
> >>>>
> >>>>    RVPR(uksub8, 1, 1);
> >>>> +
> >>>> +/* 16-bit Shift Instructions */
> >>>> +static inline void do_sra16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] >> shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(sra16, 1, 2);
> >>>> +
> >>>> +static inline void do_srl16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] >> shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(srl16, 1, 2);
> >>>> +
> >>>> +static inline void do_sll16(CPURISCVState *env, void *vd, void *va,
> >>>> +                            void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +    d[i] = a[i] << shift;
> >>>> +}
> >>>> +
> >>>> +RVPR(sll16, 1, 2);
> >>>> +
> >>>> +static inline void do_sra16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    d[i] = vssra16(env, 0, a[i], shift);
> >>>> +}
> >>>> +
> >>>> +RVPR(sra16_u, 1, 2);
> >>>> +
> >>>> +static inline void do_srl16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    uint16_t *d = vd, *a = va;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    d[i] = vssrl16(env, 0, a[i], shift);
> >>>> +}
> >>>> +
> >>>> +RVPR(srl16_u, 1, 2);
> >>>> +
> >>>> +static inline void do_ksll16(CPURISCVState *env, void *vd, void *va,
> >>>> +                             void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va, result;
> >>>> +    uint8_t shift = *(uint8_t *)vb & 0xf;
> >>>> +
> >>>> +    result = a[i] << shift;
> >>>> +    if (shift > (clrsb32(a[i]) - 16)) {
> >>>> +        env->vxsat = 0x1;
> >>>> +        d[i] = (a[i] & INT16_MIN) ? INT16_MIN : INT16_MAX;
> >>>> +    } else {
> >>>> +        d[i] = result;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(ksll16, 1, 2);
> >>>> +
> >>>> +static inline void do_kslra16(CPURISCVState *env, void *vd, void *va,
> >>>> +                              void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    int32_t shift = sextract32((*(target_ulong *)vb), 0, 5);
> >>>> +
> >>>> +    if (shift >= 0) {
> >>>> +        do_ksll16(env, vd, va, vb, i);
> >>>> +    } else {
> >>>> +        shift = -shift;
> >>>> +        shift = (shift == 16) ? 15 : shift;
> >>>> +        d[i] = a[i] >> shift;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(kslra16, 1, 2);
> >>>> +
> >>>> +static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
> >>>> +                                void *vb, uint8_t i)
> >>>> +{
> >>>> +    int16_t *d = vd, *a = va;
> >>>> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 5);
> >>>> +
> >>>> +    if (shift >= 0) {
> >>>> +        do_ksll16(env, vd, va, vb, i);
> >>>> +    } else {
> >>>> +        shift = -shift;
> >>>> +        shift = (shift == 16) ? 15 : shift;
> >>>> +        d[i] = vssra16(env, 0, a[i], shift);
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +RVPR(kslra16_u, 1, 2);
> >>>> --
> >>>> 2.17.1
> >>>>
>
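
For readers following the helper semantics in the quoted patch: do_ksll16 saturates a left shift and sets vxsat, while do_kslra16 treats the low five bits of rs2 as a signed amount, shifting left with saturation when it is non-negative and right arithmetically otherwise. Below is a minimal standalone sketch of one 16-bit lane. It assumes a widened 32-bit intermediate in place of the patch's clrsb32() test, and the names sext5, sat_shl16 and kslra16_lane are hypothetical, not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 5 bits of x, giving a value in [-16, 15]. */
static int32_t sext5(uint32_t x)
{
    int32_t v = (int32_t)(x & 0x1f);
    return (v ^ 0x10) - 0x10;
}

/* Saturating left shift of one signed 16-bit lane; *sat is set on overflow. */
static int16_t sat_shl16(int16_t a, unsigned shift, bool *sat)
{
    assert(shift < 16);
    /* Multiply in 32 bits so a negative value is never left-shifted. */
    int32_t wide = (int32_t)a * ((int32_t)1 << shift);
    if (wide > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (wide < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)wide;
}

/* One lane of a kslra16-style operation (hypothetical name). */
static int16_t kslra16_lane(int16_t a, uint32_t rs2, bool *sat)
{
    int32_t shift = sext5(rs2);

    if (shift >= 0) {
        return sat_shl16(a, (unsigned)shift, sat);
    }
    shift = -shift;
    if (shift == 16) {
        shift = 15;            /* -16 is clamped to 15, as in the patch */
    }
    /* Assumes >> on a negative int is arithmetic, as QEMU itself does. */
    return (int16_t)(a >> shift);
}
```

For example, kslra16_lane(0x4000, 1, &sat) overflows and returns INT16_MAX with sat set, while kslra16_lane(-8, 0x1f, &sat) decodes a shift of -1 and returns -4.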

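The `_u` variants in the quoted patch (sra16_u, srl16_u) go through vssra16()/vssrl16() so that the shifted result is rounded rather than truncated. A minimal sketch of a rounding arithmetic right shift on one 16-bit lane follows. It assumes the rounding mode is round-to-nearest-up (add back the last bit shifted out); round_sra16 is a hypothetical name, and the exact rounding applied by the hoisted QEMU helpers should be checked against vector_helper.c.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of one signed 16-bit lane with round-to-nearest-up. */
static int16_t round_sra16(int16_t a, unsigned shift)
{
    assert(shift < 16);
    if (shift == 0) {
        return a;              /* nothing is shifted out, so nothing to round */
    }
    /* The last bit shifted out decides whether to round up. */
    int32_t round = (a >> (shift - 1)) & 1;
    /* Assumes >> on a negative int is arithmetic, as QEMU itself does. */
    return (int16_t)((a >> shift) + round);
}
```

Under this assumption, 5 shifted right by 1 gives 3 (2.5 rounded up) and -5 shifted right by 1 gives -2 (-2.5 rounded towards positive infinity).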

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 00/38] target/riscv: support packed extension v0.9.2
  2021-02-12 15:02 ` LIU Zhiwei
@ 2021-04-13  3:27   ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-04-13  3:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson, qemu-riscv, palmer, alistair23

ping +1.

On 2021/2/12 11:02 PM, LIU Zhiwei wrote:
> This patchset implements the packed extension for RISC-V on QEMU.
>
> This patchset has passed all my direct Linux user-mode cases (RV64) and
> bare-metal cases (RV32) on an x86-64 Ubuntu host machine. I will later push
> these test cases to my repo (https://github.com/romanheros/qemu.git
> branch:packed-upstream-v1).
>
> I have ported the packed extension to RISU, but I couldn't find a simulator
> or hardware to compare with. If anyone has one, please let me know.
>
> Features:
>    * support specification packed extension v0.9.2(https://github.com/riscv/riscv-p-spec/)
>    * support basic packed extension.
>    * support Zp64.
>
> LIU Zhiwei (38):
>    target/riscv: implementation-defined constant parameters
>    target/riscv: Hoist vector functions
>    target/riscv: Fixup saturate subtract function
>    target/riscv: 16-bit Addition & Subtraction Instructions
>    target/riscv: 8-bit Addition & Subtraction Instruction
>    target/riscv: SIMD 16-bit Shift Instructions
>    target/riscv: SIMD 8-bit Shift Instructions
>    target/riscv: SIMD 16-bit Compare Instructions
>    target/riscv: SIMD 8-bit Compare Instructions
>    target/riscv: SIMD 16-bit Multiply Instructions
>    target/riscv: SIMD 8-bit Multiply Instructions
>    target/riscv: SIMD 16-bit Miscellaneous Instructions
>    target/riscv: SIMD 8-bit Miscellaneous Instructions
>    target/riscv: 8-bit Unpacking Instructions
>    target/riscv: 16-bit Packing Instructions
>    target/riscv: Signed MSW 32x32 Multiply and Add Instructions
>    target/riscv: Signed MSW 32x16 Multiply and Add Instructions
>    target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
>    target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
>    target/riscv: Partial-SIMD Miscellaneous Instructions
>    target/riscv: 8-bit Multiply with 32-bit Add Instructions
>    target/riscv: 64-bit Add/Subtract Instructions
>    target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
>    target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
>      Instructions
>    target/riscv: Non-SIMD Q15 saturation ALU Instructions
>    target/riscv: Non-SIMD Q31 saturation ALU Instructions
>    target/riscv: 32-bit Computation Instructions
>    target/riscv: Non-SIMD Miscellaneous Instructions
>    target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
>    target/riscv: RV64 Only SIMD 32-bit Shift Instructions
>    target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
>    target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
>    target/riscv: RV64 Only 32-bit Multiply Instructions
>    target/riscv: RV64 Only 32-bit Multiply & Add Instructions
>    target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
>    target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
>    target/riscv: RV64 Only 32-bit Packing Instructions
>    target/riscv: configure and turn on packed extension from command line
>
>   target/riscv/cpu.c                      |   32 +
>   target/riscv/cpu.h                      |    6 +
>   target/riscv/helper.h                   |  332 ++
>   target/riscv/insn32-64.decode           |   93 +-
>   target/riscv/insn32.decode              |  285 ++
>   target/riscv/insn_trans/trans_rvp.c.inc | 1224 +++++++
>   target/riscv/internals.h                |   50 +
>   target/riscv/meson.build                |    1 +
>   target/riscv/packed_helper.c            | 3862 +++++++++++++++++++++++
>   target/riscv/translate.c                |    3 +
>   target/riscv/vector_helper.c            |   90 +-
>   11 files changed, 5912 insertions(+), 66 deletions(-)
>   create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
>   create mode 100644 target/riscv/packed_helper.c
>


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 00/38] target/riscv: support packed extension v0.9.2
  2021-04-13  3:27   ` LIU Zhiwei
@ 2021-04-15  4:46     ` Alistair Francis
  -1 siblings, 0 replies; 150+ messages in thread
From: Alistair Francis @ 2021-04-15  4:46 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers

On Tue, Apr 13, 2021 at 1:28 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
> ping +1.
>
> On 2021/2/12 11:02 PM, LIU Zhiwei wrote:
> > This patchset implements the packed extension for RISC-V on QEMU.
> >
> > This patchset has passed all my direct Linux user-mode cases (RV64) and
> > bare-metal cases (RV32) on an x86-64 Ubuntu host machine. I will later push
> > these test cases to my repo (https://github.com/romanheros/qemu.git
> > branch:packed-upstream-v1).
> >
> > I have ported the packed extension to RISU, but I couldn't find a simulator
> > or hardware to compare with. If anyone has one, please let me know.
> >
> > Features:
> >    * support specification packed extension v0.9.2(https://github.com/riscv/riscv-p-spec/)
> >    * support basic packed extension.
> >    * support Zp64.
> >
> > LIU Zhiwei (38):
> >    target/riscv: implementation-defined constant parameters
> >    target/riscv: Hoist vector functions
> >    target/riscv: Fixup saturate subtract function

Thanks for the patches and sorry for the long delay.

I have applied patch 3 as it fixes a bug.

As for the other patches, they are on both my review queue and Palmer's
review queue. It takes a lot of time to review these large patch
series, especially as I haven't been involved with the extension
development, so I have to both understand the extension and then
review the code.

If you would like to help speed things up you could review other
patches. That way I will have more time left to review your patches.

Alistair

> >    target/riscv: 16-bit Addition & Subtraction Instructions
> >    target/riscv: 8-bit Addition & Subtraction Instruction
> >    target/riscv: SIMD 16-bit Shift Instructions
> >    target/riscv: SIMD 8-bit Shift Instructions
> >    target/riscv: SIMD 16-bit Compare Instructions
> >    target/riscv: SIMD 8-bit Compare Instructions
> >    target/riscv: SIMD 16-bit Multiply Instructions
> >    target/riscv: SIMD 8-bit Multiply Instructions
> >    target/riscv: SIMD 16-bit Miscellaneous Instructions
> >    target/riscv: SIMD 8-bit Miscellaneous Instructions
> >    target/riscv: 8-bit Unpacking Instructions
> >    target/riscv: 16-bit Packing Instructions
> >    target/riscv: Signed MSW 32x32 Multiply and Add Instructions
> >    target/riscv: Signed MSW 32x16 Multiply and Add Instructions
> >    target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
> >    target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
> >    target/riscv: Partial-SIMD Miscellaneous Instructions
> >    target/riscv: 8-bit Multiply with 32-bit Add Instructions
> >    target/riscv: 64-bit Add/Subtract Instructions
> >    target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
> >    target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
> >      Instructions
> >    target/riscv: Non-SIMD Q15 saturation ALU Instructions
> >    target/riscv: Non-SIMD Q31 saturation ALU Instructions
> >    target/riscv: 32-bit Computation Instructions
> >    target/riscv: Non-SIMD Miscellaneous Instructions
> >    target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
> >    target/riscv: RV64 Only SIMD 32-bit Shift Instructions
> >    target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
> >    target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
> >    target/riscv: RV64 Only 32-bit Multiply Instructions
> >    target/riscv: RV64 Only 32-bit Multiply & Add Instructions
> >    target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
> >    target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
> >    target/riscv: RV64 Only 32-bit Packing Instructions
> >    target/riscv: configure and turn on packed extension from command line
> >
> >   target/riscv/cpu.c                      |   32 +
> >   target/riscv/cpu.h                      |    6 +
> >   target/riscv/helper.h                   |  332 ++
> >   target/riscv/insn32-64.decode           |   93 +-
> >   target/riscv/insn32.decode              |  285 ++
> >   target/riscv/insn_trans/trans_rvp.c.inc | 1224 +++++++
> >   target/riscv/internals.h                |   50 +
> >   target/riscv/meson.build                |    1 +
> >   target/riscv/packed_helper.c            | 3862 +++++++++++++++++++++++
> >   target/riscv/translate.c                |    3 +
> >   target/riscv/vector_helper.c            |   90 +-
> >   11 files changed, 5912 insertions(+), 66 deletions(-)
> >   create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
> >   create mode 100644 target/riscv/packed_helper.c
> >


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 00/38] target/riscv: support packed extension v0.9.2
  2021-04-15  4:46     ` Alistair Francis
@ 2021-04-15  5:50       ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-04-15  5:50 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Richard Henderson, Palmer Dabbelt, open list:RISC-V,
	qemu-devel@nongnu.org Developers


On 2021/4/15 12:46 PM, Alistair Francis wrote:
> On Tue, Apr 13, 2021 at 1:28 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>> ping +1.
>>
>> On 2021/2/12 11:02 PM, LIU Zhiwei wrote:
>>> This patchset implements the packed extension for RISC-V on QEMU.
>>>
>>> This patchset has passed all my direct Linux user-mode cases (RV64) and
>>> bare-metal cases (RV32) on an x86-64 Ubuntu host machine. I will later push
>>> these test cases to my repo (https://github.com/romanheros/qemu.git
>>> branch:packed-upstream-v1).
>>>
>>> I have ported the packed extension to RISU, but I couldn't find a simulator
>>> or hardware to compare with. If anyone has one, please let me know.
>>>
>>> Features:
>>>     * support specification packed extension v0.9.2(https://github.com/riscv/riscv-p-spec/)
>>>     * support basic packed extension.
>>>     * support Zp64.
>>>
>>> LIU Zhiwei (38):
>>>     target/riscv: implementation-defined constant parameters
>>>     target/riscv: Hoist vector functions
>>>     target/riscv: Fixup saturate subtract function
> Thanks for the patches and sorry for the long delay.
>
> I have applied patch 3 as it fixes a bug.
> As for the other patches, they are on both my review queue and Palmer's
> review queue. It takes a lot of time to review these large patch
> series, especially as I haven't been involved with the extension
> development, so I have to both understand the extension and then
> review the code.
>
> If you would like to help speed things up you could review other
> patches. That way I will have more time left to review your patches.

No worries. I fully understand the great effort needed to review so
many patches. To start with, I will try to review as many patches as I send.

Zhiwei

> Alistair
>
>>>     target/riscv: 16-bit Addition & Subtraction Instructions
>>>     target/riscv: 8-bit Addition & Subtraction Instruction
>>>     target/riscv: SIMD 16-bit Shift Instructions
>>>     target/riscv: SIMD 8-bit Shift Instructions
>>>     target/riscv: SIMD 16-bit Compare Instructions
>>>     target/riscv: SIMD 8-bit Compare Instructions
>>>     target/riscv: SIMD 16-bit Multiply Instructions
>>>     target/riscv: SIMD 8-bit Multiply Instructions
>>>     target/riscv: SIMD 16-bit Miscellaneous Instructions
>>>     target/riscv: SIMD 8-bit Miscellaneous Instructions
>>>     target/riscv: 8-bit Unpacking Instructions
>>>     target/riscv: 16-bit Packing Instructions
>>>     target/riscv: Signed MSW 32x32 Multiply and Add Instructions
>>>     target/riscv: Signed MSW 32x16 Multiply and Add Instructions
>>>     target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions
>>>     target/riscv: Signed 16-bit Multiply 64-bit Add/Subtract Instructions
>>>     target/riscv: Partial-SIMD Miscellaneous Instructions
>>>     target/riscv: 8-bit Multiply with 32-bit Add Instructions
>>>     target/riscv: 64-bit Add/Subtract Instructions
>>>     target/riscv: 32-bit Multiply 64-bit Add/Subtract Instructions
>>>     target/riscv: Signed 16-bit Multiply with 64-bit Add/Subtract
>>>       Instructions
>>>     target/riscv: Non-SIMD Q15 saturation ALU Instructions
>>>     target/riscv: Non-SIMD Q31 saturation ALU Instructions
>>>     target/riscv: 32-bit Computation Instructions
>>>     target/riscv: Non-SIMD Miscellaneous Instructions
>>>     target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions
>>>     target/riscv: RV64 Only SIMD 32-bit Shift Instructions
>>>     target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions
>>>     target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions
>>>     target/riscv: RV64 Only 32-bit Multiply Instructions
>>>     target/riscv: RV64 Only 32-bit Multiply & Add Instructions
>>>     target/riscv: RV64 Only 32-bit Parallel Multiply & Add Instructions
>>>     target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions
>>>     target/riscv: RV64 Only 32-bit Packing Instructions
>>>     target/riscv: configure and turn on packed extension from command line
>>>
>>>    target/riscv/cpu.c                      |   32 +
>>>    target/riscv/cpu.h                      |    6 +
>>>    target/riscv/helper.h                   |  332 ++
>>>    target/riscv/insn32-64.decode           |   93 +-
>>>    target/riscv/insn32.decode              |  285 ++
>>>    target/riscv/insn_trans/trans_rvp.c.inc | 1224 +++++++
>>>    target/riscv/internals.h                |   50 +
>>>    target/riscv/meson.build                |    1 +
>>>    target/riscv/packed_helper.c            | 3862 +++++++++++++++++++++++
>>>    target/riscv/translate.c                |    3 +
>>>    target/riscv/vector_helper.c            |   90 +-
>>>    11 files changed, 5912 insertions(+), 66 deletions(-)
>>>    create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
>>>    create mode 100644 target/riscv/packed_helper.c
>>>


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-03-15 21:22     ` Alistair Francis
@ 2021-05-24  1:00       ` Palmer Dabbelt
  -1 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-24  1:00 UTC (permalink / raw)
  To: alistair23; +Cc: richard.henderson, qemu-riscv, zhiwei_liu, qemu-devel

On Mon, 15 Mar 2021 14:22:58 PDT (-0700), alistair23@gmail.com wrote:
> On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>
> Acked-by: Alistair Francis <alistair.francis@wdc.com>

I saw some reviews on the other ones, but since others (like this) just 
have acks and haven't had any other traffic I'm going to start here.

It looks like the latest spec is 0.9.4, but the changelog is pretty 
minimal between 0.9.2 and 0.9.4:

[0.9.2 -> 0.9.3]

* Changed Zp64 name to Zpsfoperand.
* Added Zprvsfextra for RV64 only instructions.
* Removed SWAP16 encoding. It is an alias of PKBT16.
* Fixed few typos and enhanced precision descriptions on intermediate results.

[0.9.3 -> 0.9.4]

* Fixed few typos and enhanced precision descriptions on intermediate results.
* Fixed/Changed data types for some intrinsic functions.
* Removed "RV32 Only" for Zpsfoperand.

So I'm just going to stick with reviewing based on the latest spec 
<https://github.com/riscv/riscv-p-spec/blob/d33a761f805d3b7c84214e5654a511267985a0a0/P-ext-proposal.pdf> 
and try to keep those differences in mind, assuming we're just tracking 
the latest draft here.

> Alistair
>
>> ---
>>  target/riscv/helper.h                   |  9 +++
>>  target/riscv/insn32.decode              | 11 ++++
>>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>>  4 files changed, 172 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 6d622c732a..a69a6b4e84 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
>> +
>> +DEF_HELPER_3(radd8, tl, env, tl, tl)
>> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
>> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
>> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
>> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
>> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
>> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
>> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index 8815e90476..358dd1fa10 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
>>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
>> +
>> +add8       0100100  ..... ..... 000 ..... 1111111 @r
>> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
>> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
>> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
>> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
>> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
>> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>> index 0885a4fd45..109f560ec9 100644
>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>>  GEN_RVP_R_OOL(urstsa16);
>>  GEN_RVP_R_OOL(kstsa16);
>>  GEN_RVP_R_OOL(ukstsa16);
>> +
>> +/* 8-bit Addition & Subtraction Instructions */
>> +/*
>> + *  Copied from tcg-op-gvec.c.
>> + *
>> + *  Perform a vector addition using normal addition and a mask.  The mask
>> + *  should be the sign bit of each lane.  This 6-operation form is more
>> + *  efficient than separate additions when there are 4 or more lanes in
>> + *  the 64-bit operation.
>> + */
>> +
>> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>> +{
>> +    TCGv t1 = tcg_temp_new();
>> +    TCGv t2 = tcg_temp_new();
>> +    TCGv t3 = tcg_temp_new();
>> +
>> +    tcg_gen_andc_tl(t1, a, m);
>> +    tcg_gen_andc_tl(t2, b, m);
>> +    tcg_gen_xor_tl(t3, a, b);
>> +    tcg_gen_add_tl(d, t1, t2);
>> +    tcg_gen_and_tl(t3, t3, m);
>> +    tcg_gen_xor_tl(d, d, t3);
>> +
>> +    tcg_temp_free(t1);
>> +    tcg_temp_free(t2);
>> +    tcg_temp_free(t3);
>> +}
>> +
>> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
>> +{
>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>> +    gen_simd_add_mask(d, a, b, m);
>> +    tcg_temp_free(m);
>> +}
>> +
>> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
>> +
>> +/*
>> + *  Copied from tcg-op-gvec.c.
>> + *
>> + *  Perform a vector subtraction using normal subtraction and a mask.
>> + *  Compare gen_addv_mask above.
>> + */
>> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>> +{
>> +    TCGv t1 = tcg_temp_new();
>> +    TCGv t2 = tcg_temp_new();
>> +    TCGv t3 = tcg_temp_new();
>> +
>> +    tcg_gen_or_tl(t1, a, m);
>> +    tcg_gen_andc_tl(t2, b, m);
>> +    tcg_gen_eqv_tl(t3, a, b);
>> +    tcg_gen_sub_tl(d, t1, t2);
>> +    tcg_gen_and_tl(t3, t3, m);
>> +    tcg_gen_xor_tl(d, d, t3);
>> +
>> +    tcg_temp_free(t1);
>> +    tcg_temp_free(t2);
>> +    tcg_temp_free(t3);
>> +}
>> +
>> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
>> +{
>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>> +    gen_simd_sub_mask(d, a, b, m);
>> +    tcg_temp_free(m);
>> +}
>> +
>> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
>> +
>> +GEN_RVP_R_OOL(radd8);
>> +GEN_RVP_R_OOL(uradd8);
>> +GEN_RVP_R_OOL(kadd8);
>> +GEN_RVP_R_OOL(ukadd8);
>> +GEN_RVP_R_OOL(rsub8);
>> +GEN_RVP_R_OOL(ursub8);
>> +GEN_RVP_R_OOL(ksub8);
>> +GEN_RVP_R_OOL(uksub8);
>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>> index b84abaaf25..62db072204 100644
>> --- a/target/riscv/packed_helper.c
>> +++ b/target/riscv/packed_helper.c
>> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
>>  }
>>
>>  RVPR(ukstsa16, 2, 2);
>> +
>> +/* 8-bit Addition & Subtraction Instructions */
>> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    int8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = hadd32(a[i], b[i]);
>> +}
>> +
>> +RVPR(radd8, 1, 1);
>> +
>> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
>> +                                  void *vb, uint8_t i)
>> +{
>> +    uint8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = haddu32(a[i], b[i]);
>> +}
>> +
>> +RVPR(uradd8, 1, 1);
>> +
>> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    int8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = sadd8(env, 0, a[i], b[i]);
>> +}
>> +
>> +RVPR(kadd8, 1, 1);
>> +
>> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
>> +                             void *vb, uint8_t i)
>> +{
>> +    uint8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = saddu8(env, 0, a[i], b[i]);
>> +}
>> +
>> +RVPR(ukadd8, 1, 1);
>> +
>> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    int8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = hsub32(a[i], b[i]);
>> +}
>> +
>> +RVPR(rsub8, 1, 1);
>> +
>> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
>> +                             void *vb, uint8_t i)
>> +{
>> +    uint8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = hsubu64(a[i], b[i]);
>> +}
>> +
>> +RVPR(ursub8, 1, 1);
>> +
>> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
>> +                            void *vb, uint8_t i)
>> +{
>> +    int8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = ssub8(env, 0, a[i], b[i]);
>> +}
>> +
>> +RVPR(ksub8, 1, 1);
>> +
>> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>> +                             void *vb, uint8_t i)
>> +{
>> +    uint8_t *d = vd, *a = va, *b = vb;
>> +    d[i] = ssubu8(env, 0, a[i], b[i]);
>> +}
>> +
>> +RVPR(uksub8, 1, 1);
>> --
>> 2.17.1
>>

The naming on some of these helpers is a bit odd, but given that they're 
a mix of the V and P extensions it's probably fine to just leave them 
as-is.  

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
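
For anyone following the quoted patch, the mask trick used by
gen_simd_add_mask and gen_simd_sub_mask can be sketched in plain C on a
64-bit word. This is only a standalone illustration, not code from the
patch: dup8, simd_add8, simd_sub8 and ref_op8 are hypothetical names, and
dup8 merely stands in for what dup_const(MO_8, 0x80) is used for above.

```c
#include <stdint.h>

/* Hypothetical helper: replicate one byte into all eight lanes. */
static uint64_t dup8(uint8_t v)
{
    return 0x0101010101010101ULL * v;
}

/*
 * Lane-wise 8-bit add. Clearing the top bit of every lane guarantees
 * that the low 7-bit sums cannot carry into the neighbouring lane; the
 * top bit of each lane is then patched in with an xor.
 */
static uint64_t simd_add8(uint64_t a, uint64_t b)
{
    uint64_t m = dup8(0x80);
    uint64_t t1 = a & ~m;           /* andc a, m */
    uint64_t t2 = b & ~m;           /* andc b, m */
    uint64_t t3 = (a ^ b) & m;      /* top bits of a ^ b */
    return (t1 + t2) ^ t3;
}

/*
 * Lane-wise 8-bit subtract. Forcing the top bit of every lane of a to 1
 * and of b to 0 guarantees that no lane borrows from its neighbour.
 */
static uint64_t simd_sub8(uint64_t a, uint64_t b)
{
    uint64_t m = dup8(0x80);
    uint64_t t1 = a | m;            /* or a, m */
    uint64_t t2 = b & ~m;           /* andc b, m */
    uint64_t t3 = ~(a ^ b) & m;     /* eqv a, b, then mask */
    return (t1 - t2) ^ t3;
}

/* Byte-at-a-time reference, used only to cross-check the two above. */
static uint64_t ref_op8(uint64_t a, uint64_t b, int sub)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t z = sub ? (uint8_t)(x - y) : (uint8_t)(x + y);
        r |= (uint64_t)z << (8 * i);
    }
    return r;
}
```

Comparing simd_add8(a, b) with ref_op8(a, b, 0), and simd_sub8(a, b) with
ref_op8(a, b, 1), on inputs whose lanes overflow or underflow is a quick
way to convince yourself that nothing leaks across lane boundaries.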


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 07/38] target/riscv: SIMD 8-bit Shift Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-05-24  4:46     ` Palmer Dabbelt
  -1 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-24  4:46 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: richard.henderson, zhiwei_liu, qemu-riscv, qemu-devel, alistair23

On Fri, 12 Feb 2021 07:02:25 PST (-0800), zhiwei_liu@c-sky.com wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

I know it's always kind of awkward for this type of patch, but IIUC 
they're all supposed to have some sort of description.

> ---
>  target/riscv/helper.h                   |   9 +++
>  target/riscv/insn32.decode              |  17 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
>  target/riscv/packed_helper.c            | 102 ++++++++++++++++++++++++
>  4 files changed, 144 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 20bf400ac2..0ecd4d53f9 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1193,3 +1193,12 @@ DEF_HELPER_3(sll16, tl, env, tl, tl)
>  DEF_HELPER_3(ksll16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(sra8, tl, env, tl, tl)
> +DEF_HELPER_3(sra8_u, tl, env, tl, tl)
> +DEF_HELPER_3(srl8, tl, env, tl, tl)
> +DEF_HELPER_3(srl8_u, tl, env, tl, tl)
> +DEF_HELPER_3(sll8, tl, env, tl, tl)
> +DEF_HELPER_3(ksll8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 6f053bfeb7..cc782fcde5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -24,6 +24,7 @@
>
>  %sh10    20:10
>  %sh4    20:4
> +%sh3    20:3
>  %csr    20:12
>  %rm     12:3
>  %nf     29:3                     !function=ex_plus_1
> @@ -61,6 +62,7 @@
>
>  @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>  @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> +@sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
>  @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>
>  @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> @@ -652,3 +654,18 @@ ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>  kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>  kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>  kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> +
> +sra8       0101100  ..... ..... 000 ..... 1111111 @r
> +sra8_u     0110100  ..... ..... 000 ..... 1111111 @r
> +srai8      0111100  00... ..... 000 ..... 1111111 @sh3
> +srai8_u    0111100  01... ..... 000 ..... 1111111 @sh3
> +srl8       0101101  ..... ..... 000 ..... 1111111 @r
> +srl8_u     0110101  ..... ..... 000 ..... 1111111 @r
> +srli8      0111101  00... ..... 000 ..... 1111111 @sh3
> +srli8_u    0111101  01... ..... 000 ..... 1111111 @sh3
> +sll8       0101110  ..... ..... 000 ..... 1111111 @r
> +slli8      0111110  00... ..... 000 ..... 1111111 @sh3
> +ksll8      0110110  ..... ..... 000 ..... 1111111 @r
> +kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
> +kslra8     0101111  ..... ..... 000 ..... 1111111 @r
> +kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 848edab7e5..12a64849eb 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -353,3 +353,19 @@ GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>  GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>  GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>  GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> +
> +/* SIMD 8-bit Shift Instructions */
> +GEN_RVP_SHIFT(sra8, tcg_gen_gvec_sars, 0);
> +GEN_RVP_SHIFT(srl8, tcg_gen_gvec_shrs, 0);
> +GEN_RVP_SHIFT(sll8, tcg_gen_gvec_shls, 0);
> +GEN_RVP_R_OOL(sra8_u);
> +GEN_RVP_R_OOL(srl8_u);
> +GEN_RVP_R_OOL(ksll8);
> +GEN_RVP_R_OOL(kslra8);
> +GEN_RVP_R_OOL(kslra8_u);
> +GEN_RVP_SHIFTI(srai8, sra8, tcg_gen_vec_sar8i_i64);
> +GEN_RVP_SHIFTI(srli8, srl8, tcg_gen_vec_shr8i_i64);
> +GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
> +GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
> +GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
> +GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 7e31c2fe46..ab9ebc472b 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -529,3 +529,105 @@ static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kslra16_u, 1, 2);
> +
> +/* SIMD 8-bit Shift Instructions */
> +static inline void do_sra8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(sra8, 1, 1);
> +
> +static inline void do_srl8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(srl8, 1, 1);
> +
> +static inline void do_sll8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] << shift;
> +}
> +
> +RVPR(sll8, 1, 1);
> +
> +static inline void do_sra8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssra8(env, 0, a[i], shift);
> +}
> +
> +RVPR(sra8_u, 1, 1);
> +
> +static inline void do_srl8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssrl8(env, 0, a[i], shift);
> +}
> +
> +RVPR(srl8_u, 1, 1);
> +
> +static inline void do_ksll8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, result;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    result = a[i] << shift;
> +    if (shift > (clrsb32(a[i]) - 24)) {
> +        env->vxsat = 0x1;
> +        d[i] = (a[i] & INT8_MIN) ? INT8_MIN : INT8_MAX;
> +    } else {
> +        d[i] = result;
> +    }
> +}
> +
> +RVPR(ksll8, 1, 1);
> +
> +static inline void do_kslra8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] = a[i] >> shift;
> +    }
> +}
> +
> +RVPR(kslra8, 1, 1);
> +
> +static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] =  vssra8(env, 0, a[i], shift);
> +    }
> +}
> +
> +RVPR(kslra8_u, 1, 1);

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
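
For readers following the saturation logic in do_ksll8 above, here is a
minimal standalone sketch of the same rule for a single signed 8-bit lane.
It is not the patch's code: ksll8_lane and the sat out-parameter (standing
in for env->vxsat) are hypothetical names, and the range check on a widened
product replaces the clrsb32() test used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Saturating left shift of one signed 8-bit lane.
 * `sat` is a hypothetical stand-in for env->vxsat; it is only ever set,
 * never cleared, mirroring a sticky saturation flag.
 */
static int8_t ksll8_lane(int8_t a, unsigned shift, bool *sat)
{
    /* Widen first; multiplying avoids left-shifting a negative value. */
    int32_t wide = (int32_t)a * (int32_t)(1u << (shift & 0x7));

    if (wide > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    if (wide < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)wide;
}
```

With this sketch, shifting 1 left by 7 saturates to INT8_MAX and sets the
flag, while shifting -1 left by 7 gives exactly INT8_MIN without saturating.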


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 07/38] target/riscv: SIMD 8-bit Shift Instructions
@ 2021-05-24  4:46     ` Palmer Dabbelt
  0 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-24  4:46 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: qemu-devel, qemu-riscv, richard.henderson, alistair23, zhiwei_liu

On Fri, 12 Feb 2021 07:02:25 PST (-0800), zhiwei_liu@c-sky.com wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

I know it's always kind of awkward for this type of patch, but IIUC 
they're all supposed to have some sort of description.

> ---
>  target/riscv/helper.h                   |   9 +++
>  target/riscv/insn32.decode              |  17 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  16 ++++
>  target/riscv/packed_helper.c            | 102 ++++++++++++++++++++++++
>  4 files changed, 144 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 20bf400ac2..0ecd4d53f9 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1193,3 +1193,12 @@ DEF_HELPER_3(sll16, tl, env, tl, tl)
>  DEF_HELPER_3(ksll16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16, tl, env, tl, tl)
>  DEF_HELPER_3(kslra16_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(sra8, tl, env, tl, tl)
> +DEF_HELPER_3(sra8_u, tl, env, tl, tl)
> +DEF_HELPER_3(srl8, tl, env, tl, tl)
> +DEF_HELPER_3(srl8_u, tl, env, tl, tl)
> +DEF_HELPER_3(sll8, tl, env, tl, tl)
> +DEF_HELPER_3(ksll8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8, tl, env, tl, tl)
> +DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 6f053bfeb7..cc782fcde5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -24,6 +24,7 @@
>
>  %sh10    20:10
>  %sh4    20:4
> +%sh3    20:3
>  %csr    20:12
>  %rm     12:3
>  %nf     29:3                     !function=ex_plus_1
> @@ -61,6 +62,7 @@
>
>  @sh      ......  ...... .....  ... ..... ....... &shift  shamt=%sh10      %rs1 %rd
>  @sh4     ......  ...... .....  ... ..... ....... &shift  shamt=%sh4      %rs1 %rd
> +@sh3     ......  ...... .....  ... ..... ....... &shift  shamt=%sh3      %rs1 %rd
>  @csr     ............   .....  ... ..... .......               %csr     %rs1 %rd
>
>  @atom_ld ..... aq:1 rl:1 ..... ........ ..... ....... &atomic rs2=0     %rs1 %rd
> @@ -652,3 +654,18 @@ ksll16     0110010  ..... ..... 000 ..... 1111111 @r
>  kslli16    0111010  1.... ..... 000 ..... 1111111 @sh4
>  kslra16    0101011  ..... ..... 000 ..... 1111111 @r
>  kslra16_u  0110011  ..... ..... 000 ..... 1111111 @r
> +
> +sra8       0101100  ..... ..... 000 ..... 1111111 @r
> +sra8_u     0110100  ..... ..... 000 ..... 1111111 @r
> +srai8      0111100  00... ..... 000 ..... 1111111 @sh3
> +srai8_u    0111100  01... ..... 000 ..... 1111111 @sh3
> +srl8       0101101  ..... ..... 000 ..... 1111111 @r
> +srl8_u     0110101  ..... ..... 000 ..... 1111111 @r
> +srli8      0111101  00... ..... 000 ..... 1111111 @sh3
> +srli8_u    0111101  01... ..... 000 ..... 1111111 @sh3
> +sll8       0101110  ..... ..... 000 ..... 1111111 @r
> +slli8      0111110  00... ..... 000 ..... 1111111 @sh3
> +ksll8      0110110  ..... ..... 000 ..... 1111111 @r
> +kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
> +kslra8     0101111  ..... ..... 000 ..... 1111111 @r
> +kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 848edab7e5..12a64849eb 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -353,3 +353,19 @@ GEN_RVP_SHIFTI(slli16, sll16, tcg_gen_vec_shl16i_i64);
>  GEN_RVP_SHIFTI(srai16_u, sra16_u, NULL);
>  GEN_RVP_SHIFTI(srli16_u, srl16_u, NULL);
>  GEN_RVP_SHIFTI(kslli16, ksll16, NULL);
> +
> +/* SIMD 8-bit Shift Instructions */
> +GEN_RVP_SHIFT(sra8, tcg_gen_gvec_sars, 0);
> +GEN_RVP_SHIFT(srl8, tcg_gen_gvec_shrs, 0);
> +GEN_RVP_SHIFT(sll8, tcg_gen_gvec_shls, 0);
> +GEN_RVP_R_OOL(sra8_u);
> +GEN_RVP_R_OOL(srl8_u);
> +GEN_RVP_R_OOL(ksll8);
> +GEN_RVP_R_OOL(kslra8);
> +GEN_RVP_R_OOL(kslra8_u);
> +GEN_RVP_SHIFTI(srai8, sra8, tcg_gen_vec_sar8i_i64);
> +GEN_RVP_SHIFTI(srli8, srl8, tcg_gen_vec_shr8i_i64);
> +GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
> +GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
> +GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
> +GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index 7e31c2fe46..ab9ebc472b 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -529,3 +529,105 @@ static inline void do_kslra16_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kslra16_u, 1, 2);
> +
> +/* SIMD 8-bit Shift Instructions */
> +static inline void do_sra8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(sra8, 1, 1);
> +
> +static inline void do_srl8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] >> shift;
> +}
> +
> +RVPR(srl8, 1, 1);
> +
> +static inline void do_sll8(CPURISCVState *env, void *vd, void *va,
> +                           void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] = a[i] << shift;
> +}
> +
> +RVPR(sll8, 1, 1);
> +
> +static inline void do_sra8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssra8(env, 0, a[i], shift);
> +}
> +
> +RVPR(sra8_u, 1, 1);
> +
> +static inline void do_srl8_u(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    uint8_t *d = vd, *a = va;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +    d[i] =  vssrl8(env, 0, a[i], shift);
> +}
> +
> +RVPR(srl8_u, 1, 1);
> +
> +static inline void do_ksll8(CPURISCVState *env, void *vd, void *va,
> +                            void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va, result;
> +    uint8_t shift = *(uint8_t *)vb & 0x7;
> +
> +    result = a[i] << shift;
> +    if (shift > (clrsb32(a[i]) - 24)) {
> +        env->vxsat = 0x1;
> +        d[i] = (a[i] & INT8_MIN) ? INT8_MIN : INT8_MAX;
> +    } else {
> +        d[i] = result;
> +    }
> +}
> +
> +RVPR(ksll8, 1, 1);
> +
> +static inline void do_kslra8(CPURISCVState *env, void *vd, void *va,
> +                             void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] = a[i] >> shift;
> +    }
> +}
> +
> +RVPR(kslra8, 1, 1);
> +
> +static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int8_t *d = vd, *a = va;
> +    int32_t shift = sextract32((*(uint32_t *)vb), 0, 4);
> +
> +    if (shift >= 0) {
> +        do_ksll8(env, vd, va, vb, i);
> +    } else {
> +        shift = -shift;
> +        shift = (shift == 8) ? 7 : shift;
> +        d[i] =  vssra8(env, 0, a[i], shift);
> +    }
> +}
> +
> +RVPR(kslra8_u, 1, 1);

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
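
The "_u" variants above (sra8_u, kslra8_u) call vssra8() with a rounding
mode argument of 0. Assuming that mode 0 means round-to-nearest-up, as it
does for vxrm in the RISC-V vector spec, the per-lane behaviour can be
sketched as below. sra8_u_lane is a hypothetical name, not a function from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rounding arithmetic right shift of one signed 8-bit lane.
 * Assumption: rounding adds the last bit shifted out (round-to-nearest-up).
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes an arithmetic shift, as GCC and Clang provide.
 */
static int8_t sra8_u_lane(int8_t a, unsigned shift)
{
    int32_t wide = a;
    int32_t round;

    shift &= 0x7;
    if (shift == 0) {
        return a;               /* nothing shifted out, nothing to round */
    }
    round = (wide >> (shift - 1)) & 1;
    return (int8_t)((wide >> shift) + round);
}
```

Under that assumption, 5 >> 1 rounds up to 3 and -5 >> 1 rounds up to -2,
where a plain arithmetic shift would give 2 and -3.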


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 08/38] target/riscv: SIMD 16-bit Compare Instructions
  2021-02-12 15:02   ` LIU Zhiwei
@ 2021-05-26  5:30     ` Palmer Dabbelt
  -1 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-26  5:30 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: richard.henderson, zhiwei_liu, qemu-riscv, qemu-devel, alistair23

On Fri, 12 Feb 2021 07:02:26 PST (-0800), zhiwei_liu@c-sky.com wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/helper.h                   |  6 ++++
>  target/riscv/insn32.decode              |  6 ++++
>  target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
>  target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
>  4 files changed, 65 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 0ecd4d53f9..f41f9acccc 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1202,3 +1202,9 @@ DEF_HELPER_3(sll8, tl, env, tl, tl)
>  DEF_HELPER_3(ksll8, tl, env, tl, tl)
>  DEF_HELPER_3(kslra8, tl, env, tl, tl)
>  DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
> +
> +DEF_HELPER_3(cmpeq16, tl, env, tl, tl)
> +DEF_HELPER_3(scmplt16, tl, env, tl, tl)
> +DEF_HELPER_3(scmple16, tl, env, tl, tl)
> +DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
> +DEF_HELPER_3(ucmple16, tl, env, tl, tl)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index cc782fcde5..f3cd508396 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -669,3 +669,9 @@ ksll8      0110110  ..... ..... 000 ..... 1111111 @r
>  kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
>  kslra8     0101111  ..... ..... 000 ..... 1111111 @r
>  kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
> +
> +cmpeq16    0100110  ..... ..... 000 ..... 1111111 @r
> +scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
> +scmple16   0001110  ..... ..... 000 ..... 1111111 @r
> +ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
> +ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
> index 12a64849eb..6438dfb776 100644
> --- a/target/riscv/insn_trans/trans_rvp.c.inc
> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
> @@ -369,3 +369,10 @@ GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
>  GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
>  GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
>  GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
> +
> +/* SIMD 16-bit Compare Instructions */
> +GEN_RVP_R_OOL(cmpeq16);
> +GEN_RVP_R_OOL(scmplt16);
> +GEN_RVP_R_OOL(scmple16);
> +GEN_RVP_R_OOL(ucmplt16);
> +GEN_RVP_R_OOL(ucmple16);
> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
> index ab9ebc472b..30b916b5ad 100644
> --- a/target/riscv/packed_helper.c
> +++ b/target/riscv/packed_helper.c
> @@ -631,3 +631,49 @@ static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
>  }
>
>  RVPR(kslra8_u, 1, 1);
> +
> +/* SIMD 16-bit Compare Instructions */
> +static inline void do_cmpeq16(CPURISCVState *env, void *vd, void *va,
> +                              void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] == b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(cmpeq16, 1, 2);
> +
> +static inline void do_scmplt16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(scmplt16, 1, 2);
> +
> +static inline void do_scmple16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    int16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(scmple16, 1, 2);
> +
> +static inline void do_ucmplt16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(ucmplt16, 1, 2);
> +
> +static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
> +                               void *vb, uint8_t i)
> +{
> +    uint16_t *d = vd, *a = va, *b = vb;
> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
> +}
> +
> +RVPR(ucmple16, 1, 2);

Thanks, this is on for-next.
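
As a quick illustration of what the compare helpers above produce, the
sketch below applies the signed and unsigned "less than" rules to both
16-bit lanes of a 32-bit word. The function names are hypothetical, and the
explicit lane loop stands in for whatever the RVPR() macro expands to, which
is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Signed lane-wise a < b: each 16-bit lane becomes 0xffff or 0x0000. */
static uint32_t scmplt16_x2(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (unsigned lane = 0; lane < 2; lane++) {
        /* Reinterpret the lane as signed (two's complement assumed). */
        int16_t ea = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t eb = (int16_t)(uint16_t)(b >> (16 * lane));
        uint32_t mask = (ea < eb) ? 0xffffu : 0x0u;

        result |= mask << (16 * lane);
    }
    return result;
}

/* Unsigned lane-wise a < b over the same two lanes. */
static uint32_t ucmplt16_x2(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (unsigned lane = 0; lane < 2; lane++) {
        uint16_t ea = (uint16_t)(a >> (16 * lane));
        uint16_t eb = (uint16_t)(b >> (16 * lane));
        uint32_t mask = (ea < eb) ? 0xffffu : 0x0u;

        result |= mask << (16 * lane);
    }
    return result;
}
```

The two differ only in how the upper lane of 0xffff0001 is read: as -1 it
is less than 0, as 65535 it is not.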


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 08/38] target/riscv: SIMD 16-bit Compare Instructions
  2021-05-26  5:30     ` Palmer Dabbelt
@ 2021-05-26  5:31       ` Palmer Dabbelt
  -1 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-26  5:31 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: richard.henderson, zhiwei_liu, qemu-riscv, qemu-devel, alistair23

On Tue, 25 May 2021 22:30:14 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 12 Feb 2021 07:02:26 PST (-0800), zhiwei_liu@c-sky.com wrote:
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>  target/riscv/helper.h                   |  6 ++++
>>  target/riscv/insn32.decode              |  6 ++++
>>  target/riscv/insn_trans/trans_rvp.c.inc |  7 ++++
>>  target/riscv/packed_helper.c            | 46 +++++++++++++++++++++++++
>>  4 files changed, 65 insertions(+)
>>
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index 0ecd4d53f9..f41f9acccc 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -1202,3 +1202,9 @@ DEF_HELPER_3(sll8, tl, env, tl, tl)
>>  DEF_HELPER_3(ksll8, tl, env, tl, tl)
>>  DEF_HELPER_3(kslra8, tl, env, tl, tl)
>>  DEF_HELPER_3(kslra8_u, tl, env, tl, tl)
>> +
>> +DEF_HELPER_3(cmpeq16, tl, env, tl, tl)
>> +DEF_HELPER_3(scmplt16, tl, env, tl, tl)
>> +DEF_HELPER_3(scmple16, tl, env, tl, tl)
>> +DEF_HELPER_3(ucmplt16, tl, env, tl, tl)
>> +DEF_HELPER_3(ucmple16, tl, env, tl, tl)
>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>> index cc782fcde5..f3cd508396 100644
>> --- a/target/riscv/insn32.decode
>> +++ b/target/riscv/insn32.decode
>> @@ -669,3 +669,9 @@ ksll8      0110110  ..... ..... 000 ..... 1111111 @r
>>  kslli8     0111110  01... ..... 000 ..... 1111111 @sh3
>>  kslra8     0101111  ..... ..... 000 ..... 1111111 @r
>>  kslra8_u   0110111  ..... ..... 000 ..... 1111111 @r
>> +
>> +cmpeq16    0100110  ..... ..... 000 ..... 1111111 @r
>> +scmplt16   0000110  ..... ..... 000 ..... 1111111 @r
>> +scmple16   0001110  ..... ..... 000 ..... 1111111 @r
>> +ucmplt16   0010110  ..... ..... 000 ..... 1111111 @r
>> +ucmple16   0011110  ..... ..... 000 ..... 1111111 @r
>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>> index 12a64849eb..6438dfb776 100644
>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>> @@ -369,3 +369,10 @@ GEN_RVP_SHIFTI(slli8, sll8, tcg_gen_vec_shl8i_i64);
>>  GEN_RVP_SHIFTI(srai8_u, sra8_u, NULL);
>>  GEN_RVP_SHIFTI(srli8_u, srl8_u, NULL);
>>  GEN_RVP_SHIFTI(kslli8, ksll8, NULL);
>> +
>> +/* SIMD 16-bit Compare Instructions */
>> +GEN_RVP_R_OOL(cmpeq16);
>> +GEN_RVP_R_OOL(scmplt16);
>> +GEN_RVP_R_OOL(scmple16);
>> +GEN_RVP_R_OOL(ucmplt16);
>> +GEN_RVP_R_OOL(ucmple16);
>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>> index ab9ebc472b..30b916b5ad 100644
>> --- a/target/riscv/packed_helper.c
>> +++ b/target/riscv/packed_helper.c
>> @@ -631,3 +631,49 @@ static inline void do_kslra8_u(CPURISCVState *env, void *vd, void *va,
>>  }
>>
>>  RVPR(kslra8_u, 1, 1);
>> +
>> +/* SIMD 16-bit Compare Instructions */
>> +static inline void do_cmpeq16(CPURISCVState *env, void *vd, void *va,
>> +                              void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va, *b = vb;
>> +    d[i] = (a[i] == b[i]) ? 0xffff : 0x0;
>> +}
>> +
>> +RVPR(cmpeq16, 1, 2);
>> +
>> +static inline void do_scmplt16(CPURISCVState *env, void *vd, void *va,
>> +                               void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va, *b = vb;
>> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
>> +}
>> +
>> +RVPR(scmplt16, 1, 2);
>> +
>> +static inline void do_scmple16(CPURISCVState *env, void *vd, void *va,
>> +                               void *vb, uint8_t i)
>> +{
>> +    int16_t *d = vd, *a = va, *b = vb;
>> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
>> +}
>> +
>> +RVPR(scmple16, 1, 2);
>> +
>> +static inline void do_ucmplt16(CPURISCVState *env, void *vd, void *va,
>> +                               void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va, *b = vb;
>> +    d[i] = (a[i] < b[i]) ? 0xffff : 0x0;
>> +}
>> +
>> +RVPR(ucmplt16, 1, 2);
>> +
>> +static inline void do_ucmple16(CPURISCVState *env, void *vd, void *va,
>> +                               void *vb, uint8_t i)
>> +{
>> +    uint16_t *d = vd, *a = va, *b = vb;
>> +    d[i] = (a[i] <= b[i]) ? 0xffff : 0x0;
>> +}
>> +
>> +RVPR(ucmple16, 1, 2);
>
> Thanks, this is on for-next.

Oops, got my threads crossed.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-05-24  1:00       ` Palmer Dabbelt
@ 2021-05-26  5:43         ` LIU Zhiwei
  -1 siblings, 0 replies; 150+ messages in thread
From: LIU Zhiwei @ 2021-05-26  5:43 UTC (permalink / raw)
  To: Palmer Dabbelt, alistair23; +Cc: richard.henderson, qemu-riscv, qemu-devel


On 5/24/21 9:00 AM, Palmer Dabbelt wrote:
> On Mon, 15 Mar 2021 14:22:58 PDT (-0700), alistair23@gmail.com wrote:
>> On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com> 
>> wrote:
>>>
>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>
>> Acked-by: Alistair Francis <alistair.francis@wdc.com>
>
> I saw some reviews on the other ones, but since others (like this) 
> just have acks and haven't had any other traffic I'm going to start here.
>
> It looks like the latest spec is 0.9.4, but the changelog is pretty 
> minimal between 0.9.2 and 0.9.4:
>
> [0.9.2 -> 0.9.3]
>
> * Changed Zp64 name to Zpsfoperand.
> * Added Zprvsfextra for RV64 only instructions.
> * Removed SWAP16 encoding. It is an alias of PKBT16.
> * Fixed few typos and enhanced precision descriptions on intermediate 
> results.
>
> [0.9.3 -> 0.9.4]
>
> * Fixed few typos and enhanced precision descriptions on intermediate 
> results.
> * Fixed/Changed data types for some intrinsic functions.
> * Removed "RV32 Only" for Zpsfoperand.
>
> So I'm just going to stick with reviewing based on the latest spec 
> <https://github.com/riscv/riscv-p-spec/blob/d33a761f805d3b7c84214e5654a511267985a0a0/P-ext-proposal.pdf> 
> and try to keep those differences in mind, assuming we're just 
> tracking the latest draft here.
>
Hi Palmer,

That's good news.

I plan to rebase the patch set and update to the latest specification.

A v2 patch set should probably be ready before next week.

Zhiwei
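
For anyone reading the quoted patch below, the masked-add trick that
gen_simd_add_mask implements with TCG ops can be sketched in plain C. This
is only an illustration on a fixed 64-bit word: simd_add8 here is a
standalone hypothetical function, not the TCG generator of the same
purpose in the patch, and the mask constant is what dup_const(MO_8, 0x80)
would be expected to yield for 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add eight 8-bit lanes packed in a 64-bit word without letting carries
 * cross lane boundaries. Mirrors the six-operation sequence in the patch:
 * two andc, one xor, one add, one and, one xor.
 */
static uint64_t simd_add8(uint64_t a, uint64_t b)
{
    const uint64_t m = 0x8080808080808080ull; /* sign bit of each lane */
    uint64_t t1 = a & ~m;          /* clear sign bits so the add cannot */
    uint64_t t2 = b & ~m;          /* carry out of a lane               */
    uint64_t t3 = (a ^ b) & m;     /* sum of the sign bits, carry-less  */

    return (t1 + t2) ^ t3;         /* fold the sign-bit sum back in     */
}
```

A carry out of the low lane (0xff + 0x01) leaves the neighbouring lane
untouched, which is the whole point of masking off the sign bits first.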

>> Alistair
>>
>>> ---
>>>  target/riscv/helper.h                   |  9 +++
>>>  target/riscv/insn32.decode              | 11 ++++
>>>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>>>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>>>  4 files changed, 172 insertions(+)
>>>
>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>> index 6d622c732a..a69a6b4e84 100644
>>> --- a/target/riscv/helper.h
>>> +++ b/target/riscv/helper.h
>>> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>>>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>>>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>>>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
>>> +
>>> +DEF_HELPER_3(radd8, tl, env, tl, tl)
>>> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
>>> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
>>> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
>>> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>> index 8815e90476..358dd1fa10 100644
>>> --- a/target/riscv/insn32.decode
>>> +++ b/target/riscv/insn32.decode
>>> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 
>>> 1111111 @r
>>>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>>>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>>>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
>>> +
>>> +add8       0100100  ..... ..... 000 ..... 1111111 @r
>>> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
>>> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
>>> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
>>> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
>>> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
>>> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc 
>>> b/target/riscv/insn_trans/trans_rvp.c.inc
>>> index 0885a4fd45..109f560ec9 100644
>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>>> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>>>  GEN_RVP_R_OOL(urstsa16);
>>>  GEN_RVP_R_OOL(kstsa16);
>>>  GEN_RVP_R_OOL(ukstsa16);
>>> +
>>> +/* 8-bit Addition & Subtraction Instructions */
>>> +/*
>>> + *  Copied from tcg-op-gvec.c.
>>> + *
>>> + *  Perform a vector addition using normal addition and a mask.  
>>> The mask
>>> + *  should be the sign bit of each lane.  This 6-operation form is 
>>> more
>>> + *  efficient than separate additions when there are 4 or more 
>>> lanes in
>>> + *  the 64-bit operation.
>>> + */
>>> +
>>> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>> +{
>>> +    TCGv t1 = tcg_temp_new();
>>> +    TCGv t2 = tcg_temp_new();
>>> +    TCGv t3 = tcg_temp_new();
>>> +
>>> +    tcg_gen_andc_tl(t1, a, m);
>>> +    tcg_gen_andc_tl(t2, b, m);
>>> +    tcg_gen_xor_tl(t3, a, b);
>>> +    tcg_gen_add_tl(d, t1, t2);
>>> +    tcg_gen_and_tl(t3, t3, m);
>>> +    tcg_gen_xor_tl(d, d, t3);
>>> +
>>> +    tcg_temp_free(t1);
>>> +    tcg_temp_free(t2);
>>> +    tcg_temp_free(t3);
>>> +}
>>> +
>>> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
>>> +{
>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>> +    gen_simd_add_mask(d, a, b, m);
>>> +    tcg_temp_free(m);
>>> +}
>>> +
>>> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
>>> +
>>> +/*
>>> + *  Copied from tcg-op-gvec.c.
>>> + *
>>> + *  Perform a vector subtraction using normal subtraction and a mask.
>>> + *  Compare gen_addv_mask above.
>>> + */
>>> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>> +{
>>> +    TCGv t1 = tcg_temp_new();
>>> +    TCGv t2 = tcg_temp_new();
>>> +    TCGv t3 = tcg_temp_new();
>>> +
>>> +    tcg_gen_or_tl(t1, a, m);
>>> +    tcg_gen_andc_tl(t2, b, m);
>>> +    tcg_gen_eqv_tl(t3, a, b);
>>> +    tcg_gen_sub_tl(d, t1, t2);
>>> +    tcg_gen_and_tl(t3, t3, m);
>>> +    tcg_gen_xor_tl(d, d, t3);
>>> +
>>> +    tcg_temp_free(t1);
>>> +    tcg_temp_free(t2);
>>> +    tcg_temp_free(t3);
>>> +}
>>> +
>>> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
>>> +{
>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>> +    gen_simd_sub_mask(d, a, b, m);
>>> +    tcg_temp_free(m);
>>> +}
>>> +
>>> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
>>> +
>>> +GEN_RVP_R_OOL(radd8);
>>> +GEN_RVP_R_OOL(uradd8);
>>> +GEN_RVP_R_OOL(kadd8);
>>> +GEN_RVP_R_OOL(ukadd8);
>>> +GEN_RVP_R_OOL(rsub8);
>>> +GEN_RVP_R_OOL(ursub8);
>>> +GEN_RVP_R_OOL(ksub8);
>>> +GEN_RVP_R_OOL(uksub8);
>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>>> index b84abaaf25..62db072204 100644
>>> --- a/target/riscv/packed_helper.c
>>> +++ b/target/riscv/packed_helper.c
>>> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
>>>  }
>>>
>>>  RVPR(ukstsa16, 2, 2);
>>> +
>>> +/* 8-bit Addition & Subtraction Instructions */
>>> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
>>> +                            void *vb, uint8_t i)
>>> +{
>>> +    int8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = hadd32(a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(radd8, 1, 1);
>>> +
>>> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
>>> +                             void *vb, uint8_t i)
>>> +{
>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = haddu32(a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(uradd8, 1, 1);
>>> +
>>> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
>>> +                            void *vb, uint8_t i)
>>> +{
>>> +    int8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = sadd8(env, 0, a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(kadd8, 1, 1);
>>> +
>>> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
>>> +                             void *vb, uint8_t i)
>>> +{
>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = saddu8(env, 0, a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(ukadd8, 1, 1);
>>> +
>>> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
>>> +                            void *vb, uint8_t i)
>>> +{
>>> +    int8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = hsub32(a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(rsub8, 1, 1);
>>> +
>>> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
>>> +                             void *vb, uint8_t i)
>>> +{
>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = hsubu64(a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(ursub8, 1, 1);
>>> +
>>> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
>>> +                            void *vb, uint8_t i)
>>> +{
>>> +    int8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = ssub8(env, 0, a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(ksub8, 1, 1);
>>> +
>>> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>> +                             void *vb, uint8_t i)
>>> +{
>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>> +    d[i] = ssubu8(env, 0, a[i], b[i]);
>>> +}
>>> +
>>> +RVPR(uksub8, 1, 1);
>>> -- 
>>> 2.17.1
>>>
>
> The naming on some of these helpers is a bit odd, but given that 
> they're a mix of the V and P extensions it's probably fine to just 
> leave them as-is.
> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
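
A note on the masked addition in the quoted gen_simd_add_mask: the block below is a minimal stand-alone C sketch of the same six-operation sequence on a plain 64-bit word. It is not code from the patch. The names swar_add8 and ref_add8 are hypothetical, and a fixed 64-bit width is assumed in place of target_ulong.

```c
#include <stdint.h>

/* Hypothetical model of the masked add on eight 8-bit lanes. */
static uint64_t swar_add8(uint64_t a, uint64_t b)
{
    const uint64_t m = 0x8080808080808080ULL; /* sign bit of each lane */
    uint64_t t1 = a & ~m;       /* andc: drop sign bits so carries stay in-lane */
    uint64_t t2 = b & ~m;       /* andc */
    uint64_t t3 = (a ^ b) & m;  /* xor + and: sign-bit sum without carry-in */
    uint64_t d = t1 + t2;       /* add the low 7 bits of every lane at once */
    return d ^ t3;              /* xor: fold the sign bits back in */
}

/* Reference: add each byte on its own and reassemble the word. */
static uint64_t ref_add8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * i);
    }
    return r;
}
```

Adding 0x01 to every byte of 0x00ff00ff00ff00ff gives 0x0100010001000100: the 0xff lanes wrap to 0x00 without carrying into their neighbours.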


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
  2021-05-26  5:43         ` LIU Zhiwei
@ 2021-05-26  6:15           ` Palmer Dabbelt
  -1 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-26  6:15 UTC (permalink / raw)
  To: zhiwei_liu; +Cc: alistair23, richard.henderson, qemu-riscv, qemu-devel

On Tue, 25 May 2021 22:43:27 PDT (-0700), zhiwei_liu@c-sky.com wrote:
>
> On 5/24/21 9:00 AM, Palmer Dabbelt wrote:
>> On Mon, 15 Mar 2021 14:22:58 PDT (-0700), alistair23@gmail.com wrote:
>>> On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com>
>>> wrote:
>>>>
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>
>>> Acked-by: Alistair Francis <alistair.francis@wdc.com>
>>
>> I saw some reviews on the other ones, but since others (like this)
>> just have acks and haven't had any other traffic I'm going to start here.
>>
>> It looks like the latest spec is 0.9.4, but the changelog is pretty
>> minimal between 0.9.4 and 0.9.2:
>>
>> [0.9.2 -> 0.9.3]
>>
>> * Changed Zp64 name to Zpsfoperand.
>> * Added Zprvsfextra for RV64 only instructions.
>> * Removed SWAP16 encoding. It is an alias of PKBT16.
>> * Fixed few typos and enhanced precision descriptions on intermediate
>> results.
>>
>> [0.9.3 -> 0.9.4]
>>
>> * Fixed few typos and enhanced precision descriptions on intermediate
>> results.
>> * Fixed/Changed data types for some intrinsic functions.
>> * Removed "RV32 Only" for Zpsfoperand.
>>
>> So I'm just going to stick with reviewing based on the latest spec
>> <https://github.com/riscv/riscv-p-spec/blob/d33a761f805d3b7c84214e5654a511267985a0a0/P-ext-proposal.pdf>
>> and try to keep those differences in mind, assuming we're just
>> tracking the latest draft here.
>>
> Hi Palmer,
>
> That's good news.
>
> I plan to rebase the patch set and update to the latest specification.
>
> Probably before next week, we can get a v2 patch set.

Sounds good.  I'll keep slowly going through these until the v2 shows up 
and then jump over there.

>
> Zhiwei
>
>>> Alistair
>>>
>>>> ---
>>>>  target/riscv/helper.h                   |  9 +++
>>>>  target/riscv/insn32.decode              | 11 ++++
>>>>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>>>>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>>>>  4 files changed, 172 insertions(+)
>>>>
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index 6d622c732a..a69a6b4e84 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
>>>> +
>>>> +DEF_HELPER_3(radd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 8815e90476..358dd1fa10 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 ..... 1111111 @r
>>>>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>>>>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>>>>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
>>>> +
>>>> +add8       0100100  ..... ..... 000 ..... 1111111 @r
>>>> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
>>>> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
>>>> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
>>>> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
>>>> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
>>>> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>>> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>>> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>>> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> index 0885a4fd45..109f560ec9 100644
>>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>>>>  GEN_RVP_R_OOL(urstsa16);
>>>>  GEN_RVP_R_OOL(kstsa16);
>>>>  GEN_RVP_R_OOL(ukstsa16);
>>>> +
>>>> +/* 8-bit Addition & Subtraction Instructions */
>>>> +/*
>>>> + *  Copied from tcg-op-gvec.c.
>>>> + *
>>>> + *  Perform a vector addition using normal addition and a mask.  The mask
>>>> + *  should be the sign bit of each lane.  This 6-operation form is more
>>>> + *  efficient than separate additions when there are 4 or more lanes in
>>>> + *  the 64-bit operation.
>>>> + */
>>>> +
>>>> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>>> +{
>>>> +    TCGv t1 = tcg_temp_new();
>>>> +    TCGv t2 = tcg_temp_new();
>>>> +    TCGv t3 = tcg_temp_new();
>>>> +
>>>> +    tcg_gen_andc_tl(t1, a, m);
>>>> +    tcg_gen_andc_tl(t2, b, m);
>>>> +    tcg_gen_xor_tl(t3, a, b);
>>>> +    tcg_gen_add_tl(d, t1, t2);
>>>> +    tcg_gen_and_tl(t3, t3, m);
>>>> +    tcg_gen_xor_tl(d, d, t3);
>>>> +
>>>> +    tcg_temp_free(t1);
>>>> +    tcg_temp_free(t2);
>>>> +    tcg_temp_free(t3);
>>>> +}
>>>> +
>>>> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
>>>> +{
>>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>>> +    gen_simd_add_mask(d, a, b, m);
>>>> +    tcg_temp_free(m);
>>>> +}
>>>> +
>>>> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
>>>> +
>>>> +/*
>>>> + *  Copied from tcg-op-gvec.c.
>>>> + *
>>>> + *  Perform a vector subtraction using normal subtraction and a mask.
>>>> + *  Compare gen_simd_add_mask above.
>>>> + */
>>>> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>>> +{
>>>> +    TCGv t1 = tcg_temp_new();
>>>> +    TCGv t2 = tcg_temp_new();
>>>> +    TCGv t3 = tcg_temp_new();
>>>> +
>>>> +    tcg_gen_or_tl(t1, a, m);
>>>> +    tcg_gen_andc_tl(t2, b, m);
>>>> +    tcg_gen_eqv_tl(t3, a, b);
>>>> +    tcg_gen_sub_tl(d, t1, t2);
>>>> +    tcg_gen_and_tl(t3, t3, m);
>>>> +    tcg_gen_xor_tl(d, d, t3);
>>>> +
>>>> +    tcg_temp_free(t1);
>>>> +    tcg_temp_free(t2);
>>>> +    tcg_temp_free(t3);
>>>> +}
>>>> +
>>>> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
>>>> +{
>>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>>> +    gen_simd_sub_mask(d, a, b, m);
>>>> +    tcg_temp_free(m);
>>>> +}
>>>> +
>>>> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
>>>> +
>>>> +GEN_RVP_R_OOL(radd8);
>>>> +GEN_RVP_R_OOL(uradd8);
>>>> +GEN_RVP_R_OOL(kadd8);
>>>> +GEN_RVP_R_OOL(ukadd8);
>>>> +GEN_RVP_R_OOL(rsub8);
>>>> +GEN_RVP_R_OOL(ursub8);
>>>> +GEN_RVP_R_OOL(ksub8);
>>>> +GEN_RVP_R_OOL(uksub8);
>>>> diff --git a/target/riscv/packed_helper.c b/target/riscv/packed_helper.c
>>>> index b84abaaf25..62db072204 100644
>>>> --- a/target/riscv/packed_helper.c
>>>> +++ b/target/riscv/packed_helper.c
>>>> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState *env, void *vd, void *va,
>>>>  }
>>>>
>>>>  RVPR(ukstsa16, 2, 2);
>>>> +
>>>> +/* 8-bit Addition & Subtraction Instructions */
>>>> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hadd32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(radd8, 1, 1);
>>>> +
>>>> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = haddu32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(uradd8, 1, 1);
>>>> +
>>>> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = sadd8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(kadd8, 1, 1);
>>>> +
>>>> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = saddu8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ukadd8, 1, 1);
>>>> +
>>>> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hsub32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(rsub8, 1, 1);
>>>> +
>>>> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hsubu64(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ursub8, 1, 1);
>>>> +
>>>> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = ssub8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ksub8, 1, 1);
>>>> +
>>>> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = ssubu8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(uksub8, 1, 1);
>>>> --
>>>> 2.17.1
>>>>
>>
>> The naming on some of these helpers is a bit odd, but given that
>> they're a mix of the V and P extensions it's probably fine to just
>> leave them as-is.
>> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
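
A note on the masked subtraction in the quoted gen_simd_sub_mask: the block below is a minimal stand-alone C sketch of that sequence on a plain 64-bit word. It is not code from the patch. The names swar_sub8 and ref_sub8 are hypothetical, a fixed 64-bit width is assumed in place of target_ulong, and the TCG eqv op is written out as ~(a ^ b).

```c
#include <stdint.h>

/* Hypothetical model of the masked subtract on eight 8-bit lanes. */
static uint64_t swar_sub8(uint64_t a, uint64_t b)
{
    const uint64_t m = 0x8080808080808080ULL; /* sign bit of each lane */
    uint64_t t1 = a | m;         /* or: force sign bits on so no lane borrows */
    uint64_t t2 = b & ~m;        /* andc: at most 0x7f per lane */
    uint64_t t3 = ~(a ^ b) & m;  /* eqv + and: correction for the sign bits */
    uint64_t d = t1 - t2;        /* every lane of t1 >= 0x80 > lane of t2 */
    return d ^ t3;               /* xor: restore the true sign bits */
}

/* Reference: subtract each byte on its own and reassemble the word. */
static uint64_t ref_sub8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x - y) << (8 * i);
    }
    return r;
}
```

Subtracting 0x01 from every byte of zero gives 0xff in every lane, because each lane wraps independently and no borrow reaches its neighbour.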


^ permalink raw reply	[flat|nested] 150+ messages in thread
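
A note on the out-of-line helpers in the quoted patch: hadd32, haddu32, sadd8, saddu8 and ssubu8 are defined elsewhere in the series and are not shown in this message. The block below is only a sketch of the per-lane results that halving and saturating 8-bit operations are generally expected to produce. It is not the QEMU implementation. The model_* names are hypothetical, and the saturation flag that the real helpers presumably record through env is left out.

```c
#include <stdint.h>

/* Signed halving add: full-precision sum, then shift right by one. */
static int8_t model_radd8(int8_t a, int8_t b)
{
    /* Assumes >> on a negative int is an arithmetic shift (GCC, Clang). */
    return (int8_t)(((int32_t)a + b) >> 1);
}

/* Unsigned halving add: the 9-bit sum never overflows 32 bits. */
static uint8_t model_uradd8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint32_t)a + b) >> 1);
}

/* Signed saturating add: clamp the sum to [-128, 127]. */
static int8_t model_kadd8(int8_t a, int8_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT8_MAX) {
        return INT8_MAX;
    }
    if (s < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)s;
}

/* Unsigned saturating add: clamp the sum to 255. */
static uint8_t model_ukadd8(uint8_t a, uint8_t b)
{
    uint32_t s = (uint32_t)a + b;
    return s > UINT8_MAX ? UINT8_MAX : (uint8_t)s;
}

/* Unsigned saturating subtract: clamp the difference to 0. */
static uint8_t model_uksub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```

Widening to 32 bits before adding is what keeps the halving forms from losing the top bit of the sum, which matches the patch's use of 32-bit helpers for 8-bit lanes.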

* Re: [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction
@ 2021-05-26  6:15           ` Palmer Dabbelt
  0 siblings, 0 replies; 150+ messages in thread
From: Palmer Dabbelt @ 2021-05-26  6:15 UTC (permalink / raw)
  To: zhiwei_liu; +Cc: alistair23, qemu-devel, qemu-riscv, richard.henderson

On Tue, 25 May 2021 22:43:27 PDT (-0700), zhiwei_liu@c-sky.com wrote:
>
> On 5/24/21 9:00 AM, Palmer Dabbelt wrote:
>> On Mon, 15 Mar 2021 14:22:58 PDT (-0700), alistair23@gmail.com wrote:
>>> On Fri, Feb 12, 2021 at 10:14 AM LIU Zhiwei <zhiwei_liu@c-sky.com>
>>> wrote:
>>>>
>>>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>>>
>>> Acked-by: Alistair Francis <alistair.francis@wdc.com>
>>
>> I saw some reviews on the other ones, but since others (like this)
>> just have acks and haven't had any other traffic I'm going to start here.
>>
>> It looks like the latest spec is 0.9.4, but the changelog is pretty
>> minimal between 0.9.5 and 0.9.2:
>>
>> [0.9.2 -> 0.9.3]
>>
>> * Changed Zp64 name to Zpsfoperand.
>> * Added Zprvsfextra for RV64 only instructions.
>> * Removed SWAP16 encoding. It is an alias of PKBT16.
>> * Fixed few typos and enhanced precision descriptions on imtermediate
>> results.
>>
>> [0.9.3 -> 0.9.4]
>>
>> * Fixed few typos and enhanced precision descriptions on imtermediate
>> results.
>> * Fixed/Changed data types for some intrinsic functions.
>> * Removed "RV32 Only" for Zpsfoperand.
>>
>> So I'm just going to stick with reviewing based on the latest spec
>> <https://github.com/riscv/riscv-p-spec/blob/d33a761f805d3b7c84214e5654a511267985a0a0/P-ext-proposal.pdf>
>> and try to keep those differences in mind, assuming we're just
>> tracking the latest draft here.
>>
> Hi Palmer,
>
> It's a good news.
>
> I plan to rebase the patch set and update to the latest specification.
>
> Probably before next week, we can get a v2 patch set.

Sounds good.  I'll keep slowly going through these until the v2 shows up 
and then jump over there.

>
> Zhiwei
>
>>> Alistair
>>>
>>>> ---
>>>>  target/riscv/helper.h                   |  9 +++
>>>>  target/riscv/insn32.decode              | 11 ++++
>>>>  target/riscv/insn_trans/trans_rvp.c.inc | 79 +++++++++++++++++++++++++
>>>>  target/riscv/packed_helper.c            | 73 +++++++++++++++++++++++
>>>>  4 files changed, 172 insertions(+)
>>>>
>>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>>> index 6d622c732a..a69a6b4e84 100644
>>>> --- a/target/riscv/helper.h
>>>> +++ b/target/riscv/helper.h
>>>> @@ -1175,3 +1175,12 @@ DEF_HELPER_3(rstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(urstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(kstsa16, tl, env, tl, tl)
>>>>  DEF_HELPER_3(ukstsa16, tl, env, tl, tl)
>>>> +
>>>> +DEF_HELPER_3(radd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(uradd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(kadd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ukadd8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(rsub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ursub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(ksub8, tl, env, tl, tl)
>>>> +DEF_HELPER_3(uksub8, tl, env, tl, tl)
>>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>>> index 8815e90476..358dd1fa10 100644
>>>> --- a/target/riscv/insn32.decode
>>>> +++ b/target/riscv/insn32.decode
>>>> @@ -624,3 +624,14 @@ rstsa16    1011011  ..... ..... 010 .....
>>>> 1111111 @r
>>>>  urstsa16   1101011  ..... ..... 010 ..... 1111111 @r
>>>>  kstsa16    1100011  ..... ..... 010 ..... 1111111 @r
>>>>  ukstsa16   1110011  ..... ..... 010 ..... 1111111 @r
>>>> +
>>>> +add8       0100100  ..... ..... 000 ..... 1111111 @r
>>>> +radd8      0000100  ..... ..... 000 ..... 1111111 @r
>>>> +uradd8     0010100  ..... ..... 000 ..... 1111111 @r
>>>> +kadd8      0001100  ..... ..... 000 ..... 1111111 @r
>>>> +ukadd8     0011100  ..... ..... 000 ..... 1111111 @r
>>>> +sub8       0100101  ..... ..... 000 ..... 1111111 @r
>>>> +rsub8      0000101  ..... ..... 000 ..... 1111111 @r
>>>> +ursub8     0010101  ..... ..... 000 ..... 1111111 @r
>>>> +ksub8      0001101  ..... ..... 000 ..... 1111111 @r
>>>> +uksub8     0011101  ..... ..... 000 ..... 1111111 @r
>>>> diff --git a/target/riscv/insn_trans/trans_rvp.c.inc
>>>> b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> index 0885a4fd45..109f560ec9 100644
>>>> --- a/target/riscv/insn_trans/trans_rvp.c.inc
>>>> +++ b/target/riscv/insn_trans/trans_rvp.c.inc
>>>> @@ -159,3 +159,82 @@ GEN_RVP_R_OOL(rstsa16);
>>>>  GEN_RVP_R_OOL(urstsa16);
>>>>  GEN_RVP_R_OOL(kstsa16);
>>>>  GEN_RVP_R_OOL(ukstsa16);
>>>> +
>>>> +/* 8-bit Addition & Subtraction Instructions */
>>>> +/*
>>>> + *  Copied from tcg-op-gvec.c.
>>>> + *
>>>> + *  Perform a vector addition using normal addition and a mask. 
>>>> The mask
>>>> + *  should be the sign bit of each lane.  This 6-operation form is
>>>> more
>>>> + *  efficient than separate additions when there are 4 or more
>>>> lanes in
>>>> + *  the 64-bit operation.
>>>> + */
>>>> +
>>>> +static void gen_simd_add_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>>> +{
>>>> +    TCGv t1 = tcg_temp_new();
>>>> +    TCGv t2 = tcg_temp_new();
>>>> +    TCGv t3 = tcg_temp_new();
>>>> +
>>>> +    tcg_gen_andc_tl(t1, a, m);
>>>> +    tcg_gen_andc_tl(t2, b, m);
>>>> +    tcg_gen_xor_tl(t3, a, b);
>>>> +    tcg_gen_add_tl(d, t1, t2);
>>>> +    tcg_gen_and_tl(t3, t3, m);
>>>> +    tcg_gen_xor_tl(d, d, t3);
>>>> +
>>>> +    tcg_temp_free(t1);
>>>> +    tcg_temp_free(t2);
>>>> +    tcg_temp_free(t3);
>>>> +}
>>>> +
>>>> +static void tcg_gen_simd_add8(TCGv d, TCGv a, TCGv b)
>>>> +{
>>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>>> +    gen_simd_add_mask(d, a, b, m);
>>>> +    tcg_temp_free(m);
>>>> +}
>>>> +
>>>> +GEN_RVP_R_INLINE(add8, add, 0, trans_add);
>>>> +
>>>> +/*
>>>> + *  Copied from tcg-op-gvec.c.
>>>> + *
>>>> + *  Perform a vector subtraction using normal subtraction and a mask.
>>>> + *  Compare gen_addv_mask above.
>>>> + */
>>>> +static void gen_simd_sub_mask(TCGv d, TCGv a, TCGv b, TCGv m)
>>>> +{
>>>> +    TCGv t1 = tcg_temp_new();
>>>> +    TCGv t2 = tcg_temp_new();
>>>> +    TCGv t3 = tcg_temp_new();
>>>> +
>>>> +    tcg_gen_or_tl(t1, a, m);
>>>> +    tcg_gen_andc_tl(t2, b, m);
>>>> +    tcg_gen_eqv_tl(t3, a, b);
>>>> +    tcg_gen_sub_tl(d, t1, t2);
>>>> +    tcg_gen_and_tl(t3, t3, m);
>>>> +    tcg_gen_xor_tl(d, d, t3);
>>>> +
>>>> +    tcg_temp_free(t1);
>>>> +    tcg_temp_free(t2);
>>>> +    tcg_temp_free(t3);
>>>> +}
>>>> +
>>>> +static void tcg_gen_simd_sub8(TCGv d, TCGv a, TCGv b)
>>>> +{
>>>> +    TCGv m = tcg_const_tl((target_ulong)dup_const(MO_8, 0x80));
>>>> +    gen_simd_sub_mask(d, a, b, m);
>>>> +    tcg_temp_free(m);
>>>> +}
>>>> +
>>>> +GEN_RVP_R_INLINE(sub8, sub, 0, trans_sub);
>>>> +
>>>> +GEN_RVP_R_OOL(radd8);
>>>> +GEN_RVP_R_OOL(uradd8);
>>>> +GEN_RVP_R_OOL(kadd8);
>>>> +GEN_RVP_R_OOL(ukadd8);
>>>> +GEN_RVP_R_OOL(rsub8);
>>>> +GEN_RVP_R_OOL(ursub8);
>>>> +GEN_RVP_R_OOL(ksub8);
>>>> +GEN_RVP_R_OOL(uksub8);
>>>> diff --git a/target/riscv/packed_helper.c
>>>> b/target/riscv/packed_helper.c
>>>> index b84abaaf25..62db072204 100644
>>>> --- a/target/riscv/packed_helper.c
>>>> +++ b/target/riscv/packed_helper.c
>>>> @@ -352,3 +352,76 @@ static inline void do_ukstsa16(CPURISCVState
>>>> *env, void *vd, void *va,
>>>>  }
>>>>
>>>>  RVPR(ukstsa16, 2, 2);
>>>> +
>>>> +/* 8-bit Addition & Subtraction Instructions */
>>>> +static inline void do_radd8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hadd32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(radd8, 1, 1);
>>>> +
>>>> +static inline void do_uradd8(CPURISCVState *env, void *vd, void *va,
>>>> +                                  void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = haddu32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(uradd8, 1, 1);
>>>> +
>>>> +static inline void do_kadd8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = sadd8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(kadd8, 1, 1);
>>>> +
>>>> +static inline void do_ukadd8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = saddu8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ukadd8, 1, 1);
>>>> +
>>>> +static inline void do_rsub8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hsub32(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(rsub8, 1, 1);
>>>> +
>>>> +static inline void do_ursub8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = hsubu64(a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ursub8, 1, 1);
>>>> +
>>>> +static inline void do_ksub8(CPURISCVState *env, void *vd, void *va,
>>>> +                            void *vb, uint8_t i)
>>>> +{
>>>> +    int8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = ssub8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(ksub8, 1, 1);
>>>> +
>>>> +static inline void do_uksub8(CPURISCVState *env, void *vd, void *va,
>>>> +                             void *vb, uint8_t i)
>>>> +{
>>>> +    uint8_t *d = vd, *a = va, *b = vb;
>>>> +    d[i] = ssubu8(env, 0, a[i], b[i]);
>>>> +}
>>>> +
>>>> +RVPR(uksub8, 1, 1);
>>>> --
>>>> 2.17.1
>>>>
>>
>> The naming on some of these helpers is a bit odd, but given that
>> they're a mix of the V and P extensions it's probably fine to just
>> leave them as-is.
>> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
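
For readers following the helpers quoted above, here is a minimal standalone sketch of what the 8-bit halving and saturating lane operations compute. The function names and the `sat` out-parameter are hypothetical stand-ins for illustration only. They are not the QEMU helpers (`haddu32`, `hsub32`, `sadd8`, `ssubu8`) used in the patch, and the flag handling is an assumption rather than the patch's actual overflow mechanism. The sketch also suggests why calling a 32-bit or 64-bit halving helper on 8-bit elements is harmless: the operands are widened first, so the intermediate sum or difference cannot overflow.

```c
#include <stdint.h>

/* Hypothetical per-lane helpers; names are illustrative, not from QEMU. */

/* Unsigned halving add: widen first so the carry bit survives the shift. */
static inline uint8_t uradd8_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint32_t)a + b) >> 1);
}

/*
 * Signed halving subtract. Right-shifting a negative value is
 * implementation-defined in C; this assumes an arithmetic shift,
 * which mainstream compilers provide.
 */
static inline int8_t rsub8_lane(int8_t a, int8_t b)
{
    return (int8_t)(((int32_t)a - b) >> 1);
}

/* Signed saturating add: clamp to the int8_t range and record overflow. */
static inline int8_t kadd8_lane(int8_t a, int8_t b, int *sat)
{
    int32_t r = (int32_t)a + b;

    if (r > INT8_MAX) {
        *sat = 1;
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        *sat = 1;
        return INT8_MIN;
    }
    return (int8_t)r;
}

/* Unsigned saturating subtract: clamp at zero and record the underflow. */
static inline uint8_t uksub8_lane(uint8_t a, uint8_t b, int *sat)
{
    if (a < b) {
        *sat = 1;
        return 0;
    }
    return (uint8_t)(a - b);
}
```

With these definitions, `uradd8_lane(255, 255)` stays at 255 instead of wrapping, and `kadd8_lane(100, 100, &sat)` clamps to 127 while setting `sat`.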




Thread overview: 150+ messages
2021-02-12 15:02 [PATCH 00/38] target/riscv: support packed extension v0.9.2 LIU Zhiwei
2021-02-12 15:02 ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 01/38] target/riscv: implementation-defined constant parameters LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-09 14:08   ` Alistair Francis
2021-03-09 14:08     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 02/38] target/riscv: Hoist vector functions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-09 14:10   ` Alistair Francis
2021-03-09 14:10     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 03/38] target/riscv: Fixup saturate subtract function LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 18:52   ` Richard Henderson
2021-02-12 18:52     ` Richard Henderson
2021-03-09 14:11   ` Alistair Francis
2021-03-09 14:11     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 04/38] target/riscv: 16-bit Addition & Subtraction Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 18:03   ` Richard Henderson
2021-02-12 18:03     ` Richard Henderson
2021-02-18  8:39     ` LIU Zhiwei
2021-02-18  8:39       ` LIU Zhiwei
2021-02-18 16:20       ` Richard Henderson
2021-02-18 16:20         ` Richard Henderson
2021-02-12 19:02   ` Richard Henderson
2021-02-12 19:02     ` Richard Henderson
2021-02-18  8:47     ` LIU Zhiwei
2021-02-18  8:47       ` LIU Zhiwei
2021-02-18 16:21       ` Richard Henderson
2021-02-18 16:21         ` Richard Henderson
2021-02-12 15:02 ` [PATCH 05/38] target/riscv: 8-bit Addition & Subtraction Instruction LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:22   ` Alistair Francis
2021-03-15 21:22     ` Alistair Francis
2021-05-24  1:00     ` Palmer Dabbelt
2021-05-24  1:00       ` Palmer Dabbelt
2021-05-26  5:43       ` LIU Zhiwei
2021-05-26  5:43         ` LIU Zhiwei
2021-05-26  6:15         ` Palmer Dabbelt
2021-05-26  6:15           ` Palmer Dabbelt
2021-02-12 15:02 ` [PATCH 06/38] target/riscv: SIMD 16-bit Shift Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:25   ` Alistair Francis
2021-03-15 21:25     ` Alistair Francis
2021-03-16  2:40     ` LIU Zhiwei
2021-03-16  2:40       ` LIU Zhiwei
2021-03-16 19:54       ` Alistair Francis
2021-03-16 19:54         ` Alistair Francis
2021-03-17  2:30         ` LIU Zhiwei
2021-03-17  2:30           ` LIU Zhiwei
2021-03-17 20:39           ` Alistair Francis
2021-03-17 20:39             ` Alistair Francis
2021-02-12 15:02 ` [PATCH 07/38] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:27   ` Alistair Francis
2021-03-15 21:27     ` Alistair Francis
2021-05-24  4:46   ` Palmer Dabbelt
2021-05-24  4:46     ` Palmer Dabbelt
2021-02-12 15:02 ` [PATCH 08/38] target/riscv: SIMD 16-bit Compare Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:28   ` Alistair Francis
2021-03-15 21:28     ` Alistair Francis
2021-05-26  5:30   ` Palmer Dabbelt
2021-05-26  5:30     ` Palmer Dabbelt
2021-05-26  5:31     ` Palmer Dabbelt
2021-05-26  5:31       ` Palmer Dabbelt
2021-02-12 15:02 ` [PATCH 09/38] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:31   ` Alistair Francis
2021-03-15 21:31     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 10/38] target/riscv: SIMD 16-bit Multiply Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 11/38] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:33   ` Alistair Francis
2021-03-15 21:33     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 12/38] target/riscv: SIMD 16-bit Miscellaneous Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-15 21:35   ` Alistair Francis
2021-03-15 21:35     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 13/38] target/riscv: SIMD 8-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-16 14:38   ` Alistair Francis
2021-03-16 14:38     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 14/38] target/riscv: 8-bit Unpacking Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-16 14:40   ` Alistair Francis
2021-03-16 14:40     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 15/38] target/riscv: 16-bit Packing Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-16 14:42   ` Alistair Francis
2021-03-16 14:42     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 16/38] target/riscv: Signed MSW 32x32 Multiply and Add Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 17/38] target/riscv: Signed MSW 32x16 " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-16 16:01   ` Alistair Francis
2021-03-16 16:01     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 18/38] target/riscv: Signed 16-bit Multiply 32-bit Add/Subtract Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 19/38] target/riscv: Signed 16-bit Multiply 64-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 20/38] target/riscv: Partial-SIMD Miscellaneous Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-16 19:44   ` Alistair Francis
2021-03-16 19:44     ` Alistair Francis
2021-02-12 15:02 ` [PATCH 21/38] target/riscv: 8-bit Multiply with 32-bit Add Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 22/38] target/riscv: 64-bit Add/Subtract Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 23/38] target/riscv: 32-bit Multiply " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 24/38] target/riscv: Signed 16-bit Multiply with " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 25/38] target/riscv: Non-SIMD Q15 saturation ALU Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 26/38] target/riscv: Non-SIMD Q31 " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 27/38] target/riscv: 32-bit Computation Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 28/38] target/riscv: Non-SIMD Miscellaneous Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 29/38] target/riscv: RV64 Only SIMD 32-bit Add/Subtract Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 30/38] target/riscv: RV64 Only SIMD 32-bit Shift Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 31/38] target/riscv: RV64 Only SIMD 32-bit Miscellaneous Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 32/38] target/riscv: RV64 Only SIMD Q15 saturating Multiply Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 33/38] target/riscv: RV64 Only 32-bit " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 34/38] target/riscv: RV64 Only 32-bit Multiply & Add Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 35/38] target/riscv: RV64 Only 32-bit Parallel " LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 36/38] target/riscv: RV64 Only Non-SIMD 32-bit Shift Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 37/38] target/riscv: RV64 Only 32-bit Packing Instructions LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-02-12 15:02 ` [PATCH 38/38] target/riscv: configure and turn on packed extension from command line LIU Zhiwei
2021-02-12 15:02   ` LIU Zhiwei
2021-03-05  6:14 ` [PATCH 00/38] target/riscv: support packed extension v0.9.2 LIU Zhiwei
2021-03-05  6:14   ` LIU Zhiwei
2021-04-13  3:27 ` LIU Zhiwei
2021-04-13  3:27   ` LIU Zhiwei
2021-04-15  4:46   ` Alistair Francis
2021-04-15  4:46     ` Alistair Francis
2021-04-15  5:50     ` LIU Zhiwei
2021-04-15  5:50       ` LIU Zhiwei
