All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/43] Add LoongArch LSX instructions
@ 2022-12-24  8:15 Song Gao
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
                   ` (43 more replies)
  0 siblings, 44 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Hi, Merry Christmas!

This series adds LoongArch LSX instructions, Since the LoongArch
Vol2 is not open, So we use 'RFC' title. 
 
About Test:
The new-abi gcc which support LSX is also not open, We use the old-abi gcc [1] build 
test code[2](tests/tcg/loongarch/vec/test_XXX*.c).

[1]:http://ftp.loongnix.cn/toolchain/gcc/release/loongarch/gcc8/loongson-gnu-toolchain-8.3-x86_64-loongarch64-linux-gnu-rc1.1.tar.xz 
[2]:https://github.com/loongson/qemu/commit/a4f03d68c0c60fcc5bf62114fd1f7a6a7cdf1070

e.g
   build:
   loongarch64-linux-gnu-gcc -mlsx tests/tcg/loongarch64/vec/test_bit.c -o test_bit
   run:
   ./build/qemu-loongarch64  test_bit   (qemu branch [2]: tcg-old-abi-support-lsx)

Thanks.
SOng Gao

Song Gao (43):
  target/loongarch: Add vector data type vec_t
  target/loongarch: CPUCFG support LSX
  target/loongarch: meson.build support build LSX
  target/loongarch: Add CHECK_SXE maccro for check LSX enable
  target/loongarch: Implement vadd/vsub
  target/loongarch: Implement vaddi/vsubi
  target/loongarch: Implement vneg
  target/loongarch: Implement vsadd/vssub
  target/loongarch: Implement vhaddw/vhsubw
  target/loongarch: Implement vaddw/vsubw
  target/loongarch: Implement vavg/vavgr
  target/loongarch: Implement vabsd
  target/loongarch: Implement vadda
  target/loongarch: Implement vmax/vmin
  target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  target/loongarch: Implement vdiv/vmod
  target/loongarch: Implement vsat
  target/loongarch: Implement vexth
  target/loongarch: Implement vsigncov
  target/loongarch: Implement vmskltz/vmskgez/vmsknz
  target/loongarch: Implement LSX logic instructions
  target/loongarch: Implement vsll vsrl vsra vrotr
  target/loongarch: Implement vsllwil vextl
  target/loongarch: Implement vsrlr vsrar
  target/loongarch: Implement vsrln vsran
  target/loongarch: Implement vsrlrn vsrarn
  target/loongarch: Implement vssrln vssran
  target/loongarch: Implement vssrlrn vssrarn
  target/loongarch: Implement vclo vclz
  target/loongarch: Implement vpcnt
  target/loongarch: Implement vbitclr vbitset vbitrev
  target/loongarch: Implement vfrstp
  target/loongarch: Implement LSX fpu arith instructions
  target/loongarch: Implement LSX fpu fcvt instructions
  target/loongarch: Implement vseq vsle vslt
  target/loongarch: Implement vfcmp
  target/loongarch: Implement vbitsel vset
  target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  target/loongarch: Implement vreplve vpack vpick
  target/loongarch: Implement vilvl vilvh vextrins vshuf
  target/loongarch: Implement vld vst
  target/loongarch: Implement vldi

 fpu/softfloat.c                             |   55 +
 include/fpu/softfloat.h                     |   27 +
 linux-user/loongarch64/signal.c             |    4 +-
 target/loongarch/cpu.c                      |    5 +-
 target/loongarch/cpu.h                      |   20 +-
 target/loongarch/disas.c                    |  911 ++++
 target/loongarch/fpu_helper.c               |    2 +-
 target/loongarch/gdbstub.c                  |    4 +-
 target/loongarch/helper.h                   |  748 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  960 ++++
 target/loongarch/insns.decode               |  811 +++
 target/loongarch/internals.h                |    1 +
 target/loongarch/lsx_helper.c               | 5375 +++++++++++++++++++
 target/loongarch/machine.c                  |    2 +-
 target/loongarch/meson.build                |    1 +
 target/loongarch/translate.c                |   11 +
 16 files changed, 8929 insertions(+), 8 deletions(-)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

-- 
2.31.1



^ permalink raw reply	[flat|nested] 100+ messages in thread

* [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:07   ` Richard Henderson
                     ` (2 more replies)
  2022-12-24  8:15 ` [RFC PATCH 02/43] target/loongarch: CPUCFG support LSX Song Gao
                   ` (42 subsequent siblings)
  43 siblings, 3 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 linux-user/loongarch64/signal.c |  4 ++--
 target/loongarch/cpu.c          |  2 +-
 target/loongarch/cpu.h          | 18 +++++++++++++++++-
 target/loongarch/gdbstub.c      |  4 ++--
 target/loongarch/machine.c      |  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 7c7afb652e..40dba974d0 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
 
     fpu_ctx = (struct target_fpu_context *)(info + 1);
     for (i = 0; i < 32; ++i) {
-        __put_user(env->fpr[i], &fpu_ctx->regs[i]);
+        __put_user(env->fpr[i].d, &fpu_ctx->regs[i]);
     }
     __put_user(read_fcc(env), &fpu_ctx->fcc);
     __put_user(env->fcsr0, &fpu_ctx->fcsr);
@@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
         uint64_t fcc;
 
         for (i = 0; i < 32; ++i) {
-            __get_user(env->fpr[i], &fpu_ctx->regs[i]);
+            __get_user(env->fpr[i].d, &fpu_ctx->regs[i]);
         }
         __get_user(fcc, &fpu_ctx->fcc);
         write_fcc(env, fcc);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 290ab4d526..59ae29a3b4 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -653,7 +653,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     /* fpr */
     if (flags & CPU_DUMP_FPU) {
         for (i = 0; i < 32; i++) {
-            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
+            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].d);
             if ((i & 3) == 3) {
                 qemu_fprintf(f, "\n");
             }
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e35cf65597..d37df63bde 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -239,6 +239,22 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
 
+#define LSX_LEN   (128)
+typedef union vec_t vec_t;
+union vec_t {
+    int8_t   B[LSX_LEN / 8];
+    int16_t  H[LSX_LEN / 16];
+    int32_t  W[LSX_LEN / 32];
+    int64_t  D[LSX_LEN / 64];
+    __int128 Q[LSX_LEN / 128];
+};
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+    uint64_t d;
+    vec_t vec;
+};
+
 struct LoongArchTLB {
     uint64_t tlb_misc;
     /* Fields corresponding to CSR_TLBELO0/1 */
@@ -251,7 +267,7 @@ typedef struct CPUArchState {
     uint64_t gpr[32];
     uint64_t pc;
 
-    uint64_t fpr[32];
+    fpr_t fpr[32];
     float_status fp_status;
     bool cf[8];
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index a4d1e28e36..18cba6f8f3 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -68,7 +68,7 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
                                  GByteArray *mem_buf, int n)
 {
     if (0 <= n && n < 32) {
-        return gdb_get_reg64(mem_buf, env->fpr[n]);
+        return gdb_get_reg64(mem_buf, env->fpr[n].d);
     } else if (n == 32) {
         uint64_t val = read_fcc(env);
         return gdb_get_reg64(mem_buf, val);
@@ -84,7 +84,7 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
     int length = 0;
 
     if (0 <= n && n < 32) {
-        env->fpr[n] = ldq_p(mem_buf);
+        env->fpr[n].d = ldq_p(mem_buf);
         length = 8;
     } else if (n == 32) {
         uint64_t val = ldq_p(mem_buf);
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index b1e523ea72..b3598cce3f 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -33,7 +33,7 @@ const VMStateDescription vmstate_loongarch_cpu = {
 
         VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
         VMSTATE_UINTTL(env.pc, LoongArchCPU),
-        VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
+        VMSTATE_UINT64_ARRAY(env.fpr.d, LoongArchCPU, 32),
         VMSTATE_UINT32(env.fcsr0, LoongArchCPU),
         VMSTATE_BOOL_ARRAY(env.cf, LoongArchCPU, 8),
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 02/43] target/loongarch: CPUCFG support LSX
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24  8:15 ` [RFC PATCH 03/43] target/loongarch: meson.build support build LSX Song Gao
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 59ae29a3b4..698778ce7f 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -385,6 +385,7 @@ static void loongarch_la464_initfn(Object *obj)
     data = FIELD_DP32(data, CPUCFG2, FP_SP, 1);
     data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
     data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
+    data = FIELD_DP32(data, CPUCFG2, LSX, 1),
     data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
     data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
     data = FIELD_DP32(data, CPUCFG2, LAM, 1);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 03/43] target/loongarch: meson.build support build LSX
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
  2022-12-24  8:15 ` [RFC PATCH 02/43] target/loongarch: CPUCFG support LSX Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24  8:15 ` [RFC PATCH 04/43] target/loongarch: Add CHECK_SXE maccro for check LSX enable Song Gao
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insn_trans/trans_lsx.c.inc | 5 +++++
 target/loongarch/lsx_helper.c               | 6 ++++++
 target/loongarch/meson.build                | 1 +
 target/loongarch/translate.c                | 1 +
 4 files changed, 13 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
new file mode 100644
index 0000000000..5a8c53c6c7
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LSX translate functions
+ * Copyright (c) 2022 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
new file mode 100644
index 0000000000..325574a026
--- /dev/null
+++ b/target/loongarch/lsx_helper.c
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch LSX helper functions.
+ *
+ * Copyright (c) 2022 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 6376f9e84b..fc5f03c6da 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -12,6 +12,7 @@ loongarch_tcg_ss.add(files(
   'op_helper.c',
   'translate.c',
   'gdbstub.c',
+  'lsx_helper.c',
 ))
 loongarch_tcg_ss.add(zlib)
 
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 38ced69803..fa43473738 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -181,6 +181,7 @@ static void gen_set_gpr(int reg_num, TCGv t, DisasExtend dst_ext)
 #include "insn_trans/trans_fmemory.c.inc"
 #include "insn_trans/trans_branch.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
+#include "insn_trans/trans_lsx.c.inc"
 
 static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 04/43] target/loongarch: Add CHECK_SXE maccro for check LSX enable
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (2 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 03/43] target/loongarch: meson.build support build LSX Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24  8:15 ` [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub Song Gao
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c                      |  2 ++
 target/loongarch/cpu.h                      |  2 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 11 +++++++++++
 3 files changed, 15 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 698778ce7f..d2c03c578f 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -52,6 +52,7 @@ static const char * const excp_names[] = {
     [EXCCODE_FPE] = "Floating Point Exception",
     [EXCCODE_DBP] = "Debug breakpoint",
     [EXCCODE_BCE] = "Bound Check Exception",
+    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
 };
 
 const char *loongarch_exception_name(int32_t exception)
@@ -187,6 +188,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
     case EXCCODE_FPD:
     case EXCCODE_FPE:
     case EXCCODE_BCE:
+    case EXCCODE_ASXD:
         env->CSR_BADV = env->pc;
         QEMU_FALLTHROUGH;
     case EXCCODE_ADEM:
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index d37df63bde..c2afeca3d0 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -414,6 +414,7 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
 #define HW_FLAGS_PLV_MASK   R_CSR_CRMD_PLV_MASK  /* 0x03 */
 #define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
+#define HW_FLAGS_EUEN_SXE   0x08
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
                                         target_ulong *pc,
@@ -424,6 +425,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
     *cs_base = 0;
     *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
     *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
+    *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5a8c53c6c7..d0bc9f561e 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3,3 +3,14 @@
  * LSX translate functions
  * Copyright (c) 2022 Loongson Technology Corporation Limited
  */
+
+#ifndef CONFIG_USER_ONLY
+#define CHECK_SXE do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
+        generate_exception(ctx, EXCCODE_SXD); \
+        return true; \
+    } \
+} while (0)
+#else
+#define CHECK_SXE
+#endif
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (3 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 04/43] target/loongarch: Add CHECK_SXE maccro for check LSX enable Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:16   ` Richard Henderson
  2022-12-24  8:15 ` [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi Song Gao
                   ` (38 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADD.{B/H/W/D/Q};
- VSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 23 ++++++
 target/loongarch/helper.h                   | 12 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 23 ++++++
 target/loongarch/insns.decode               | 22 ++++++
 target/loongarch/lsx_helper.c               | 81 +++++++++++++++++++++
 5 files changed, 161 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 858dfcc53a..51c597603e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -755,3 +755,26 @@ static bool trans_fcmp_cond_##suffix(DisasContext *ctx, \
 
 FCMP_INSN(s)
 FCMP_INSN(d)
+
+#define INSN_LSX(insn, type)                                \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{                                                           \
+    output_##type(ctx, a, #insn);                           \
+    return true;                                            \
+}
+
+static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
+}
+
+INSN_LSX(vadd_b,           vvv)
+INSN_LSX(vadd_h,           vvv)
+INSN_LSX(vadd_w,           vvv)
+INSN_LSX(vadd_d,           vvv)
+INSN_LSX(vadd_q,           vvv)
+INSN_LSX(vsub_b,           vvv)
+INSN_LSX(vsub_h,           vvv)
+INSN_LSX(vsub_w,           vvv)
+INSN_LSX(vsub_d,           vvv)
+INSN_LSX(vsub_q,           vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 9c01823a26..465bc36cb8 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -130,3 +130,15 @@ DEF_HELPER_4(ldpte, void, env, tl, tl, i32)
 DEF_HELPER_1(ertn, void, env)
 DEF_HELPER_1(idle, void, env)
 #endif
+
+/* LoongArch LSX  */
+DEF_HELPER_4(vadd_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vadd_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vadd_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vadd_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index d0bc9f561e..b2276ae688 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -14,3 +14,26 @@
 #else
 #define CHECK_SXE
 #endif
+
+static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, vk);
+    return true;
+}
+
+TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
+TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
+TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
+TRANS(vadd_d, gen_vvv, gen_helper_vadd_d)
+TRANS(vadd_q, gen_vvv, gen_helper_vadd_q)
+TRANS(vsub_b, gen_vvv, gen_helper_vsub_b)
+TRANS(vsub_h, gen_vvv, gen_helper_vsub_h)
+TRANS(vsub_w, gen_vvv, gen_helper_vsub_w)
+TRANS(vsub_d, gen_vvv, gen_helper_vsub_d)
+TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3fdc6e148c..0dd6ab20a2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -484,3 +484,25 @@ ldpte            0000 01100100 01 ........ ..... 00000    @j_i
 ertn             0000 01100100 10000 01110 00000 00000    @empty
 idle             0000 01100100 10001 ...............      @i15
 dbcl             0000 00000010 10101 ...............      @i15
+
+#
+# LSX Argument sets
+#
+
+&vvv          vd vj vk
+
+#
+# LSX Formats
+#
+@vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+
+vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
+vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
+vadd_w           0111 00000000 10110 ..... ..... .....    @vvv
+vadd_d           0111 00000000 10111 ..... ..... .....    @vvv
+vadd_q           0111 00010010 11010 ..... ..... .....    @vvv
+vsub_b           0111 00000000 11000 ..... ..... .....    @vvv
+vsub_h           0111 00000000 11001 ..... ..... .....    @vvv
+vsub_w           0111 00000000 11010 ..... ..... .....    @vvv
+vsub_d           0111 00000000 11011 ..... ..... .....    @vvv
+vsub_q           0111 00010010 11011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 325574a026..195b2ffa8d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4,3 +4,84 @@
  *
  * Copyright (c) 2022 Loongson Technology Corporation Limited
  */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+
+#define DO_HELPER_VVV(NAME, BIT, FUNC, ...)                   \
+    void helper_##NAME(CPULoongArchState *env,                \
+                       uint32_t vd, uint32_t vj, uint32_t vk) \
+    { FUNC(env, vd, vj, vk, BIT, __VA_ARGS__); }
+
+static void helper_vvv(CPULoongArchState *env,
+                       uint32_t vd, uint32_t vj, uint32_t vk, int bit,
+                       void (*func)(vec_t*, vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, Vk, bit, i);
+    }
+}
+
+static void do_vadd(vec_t *Vd, vec_t *Vj, vec_t *Vk,  int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] + Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] + Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] + Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] + Vk->D[n];
+        break;
+    case 128:
+        Vd->Q[n] = Vj->Q[n] + Vk->Q[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsub(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] - Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] - Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] - Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] - Vk->D[n];
+        break;
+    case 128:
+        Vd->Q[n] = Vj->Q[n] - Vk->Q[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vadd_b, 8, helper_vvv, do_vadd)
+DO_HELPER_VVV(vadd_h, 16, helper_vvv, do_vadd)
+DO_HELPER_VVV(vadd_w, 32, helper_vvv, do_vadd)
+DO_HELPER_VVV(vadd_d, 64, helper_vvv, do_vadd)
+DO_HELPER_VVV(vadd_q, 128, helper_vvv, do_vadd)
+DO_HELPER_VVV(vsub_b, 8, helper_vvv, do_vsub)
+DO_HELPER_VVV(vsub_h, 16, helper_vvv, do_vsub)
+DO_HELPER_VVV(vsub_w, 32, helper_vvv, do_vsub)
+DO_HELPER_VVV(vsub_d, 64, helper_vvv, do_vsub)
+DO_HELPER_VVV(vsub_q, 128, helper_vvv, do_vsub)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (4 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:27   ` Richard Henderson
  2022-12-24  8:15 ` [RFC PATCH 07/43] target/loongarch: Implement vneg Song Gao
                   ` (37 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDI.{B/H/W/D}U;
- VSUBI.{B/H/W/D}U.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 14 +++++
 target/loongarch/helper.h                   |  9 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 21 +++++++
 target/loongarch/insns.decode               | 11 ++++
 target/loongarch/lsx_helper.c               | 67 +++++++++++++++++++++
 5 files changed, 122 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 51c597603e..13a503951a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -768,6 +768,11 @@ static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
 }
 
+static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -778,3 +783,12 @@ INSN_LSX(vsub_h,           vvv)
 INSN_LSX(vsub_w,           vvv)
 INSN_LSX(vsub_d,           vvv)
 INSN_LSX(vsub_q,           vvv)
+
+INSN_LSX(vaddi_bu,         vv_i)
+INSN_LSX(vaddi_hu,         vv_i)
+INSN_LSX(vaddi_wu,         vv_i)
+INSN_LSX(vaddi_du,         vv_i)
+INSN_LSX(vsubi_bu,         vv_i)
+INSN_LSX(vsubi_hu,         vv_i)
+INSN_LSX(vsubi_wu,         vv_i)
+INSN_LSX(vsubi_du,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 465bc36cb8..d6d50f6771 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -142,3 +142,12 @@ DEF_HELPER_4(vsub_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsub_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsub_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vaddi_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddi_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddi_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddi_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubi_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubi_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubi_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubi_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index b2276ae688..9485a03a08 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -27,6 +27,18 @@ static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
     return true;
 }
 
+static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
+                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, imm);
+    return true;
+}
+
 TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
 TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
 TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
@@ -37,3 +49,12 @@ TRANS(vsub_h, gen_vvv, gen_helper_vsub_h)
 TRANS(vsub_w, gen_vvv, gen_helper_vsub_w)
 TRANS(vsub_d, gen_vvv, gen_helper_vsub_d)
 TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
+
+TRANS(vaddi_bu, gen_vv_i, gen_helper_vaddi_bu)
+TRANS(vaddi_hu, gen_vv_i, gen_helper_vaddi_hu)
+TRANS(vaddi_wu, gen_vv_i, gen_helper_vaddi_wu)
+TRANS(vaddi_du, gen_vv_i, gen_helper_vaddi_du)
+TRANS(vsubi_bu, gen_vv_i, gen_helper_vsubi_bu)
+TRANS(vsubi_hu, gen_vv_i, gen_helper_vsubi_hu)
+TRANS(vsubi_wu, gen_vv_i, gen_helper_vsubi_wu)
+TRANS(vsubi_du, gen_vv_i, gen_helper_vsubi_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0dd6ab20a2..4f8226060a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -490,11 +490,13 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 
 &vvv          vd vj vk
+&vv_i         vd vj imm
 
 #
 # LSX Formats
 #
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -506,3 +508,12 @@ vsub_h           0111 00000000 11001 ..... ..... .....    @vvv
 vsub_w           0111 00000000 11010 ..... ..... .....    @vvv
 vsub_d           0111 00000000 11011 ..... ..... .....    @vvv
 vsub_q           0111 00010010 11011 ..... ..... .....    @vvv
+
+vaddi_bu         0111 00101000 10100 ..... ..... .....    @vv_ui5
+vaddi_hu         0111 00101000 10101 ..... ..... .....    @vv_ui5
+vaddi_wu         0111 00101000 10110 ..... ..... .....    @vv_ui5
+vaddi_du         0111 00101000 10111 ..... ..... .....    @vv_ui5
+vsubi_bu         0111 00101000 11000 ..... ..... .....    @vv_ui5
+vsubi_hu         0111 00101000 11001 ..... ..... .....    @vv_ui5
+vsubi_wu         0111 00101000 11010 ..... ..... .....    @vv_ui5
+vsubi_du         0111 00101000 11011 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 195b2ffa8d..e227db20d3 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -15,6 +15,11 @@
                        uint32_t vd, uint32_t vj, uint32_t vk) \
     { FUNC(env, vd, vj, vk, BIT, __VA_ARGS__); }
 
+#define DO_HELPER_VV_I(NAME, BIT, FUNC, ...)                   \
+    void helper_##NAME(CPULoongArchState *env,                 \
+                       uint32_t vd, uint32_t vj, uint32_t imm) \
+    { FUNC(env, vd, vj, imm, BIT, __VA_ARGS__ ); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -29,6 +34,19 @@ static void helper_vvv(CPULoongArchState *env,
     }
 }
 
+static  void helper_vv_i(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm, int bit,
+                         void (*func)(vec_t*, vec_t*, uint32_t, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, imm, bit, i);
+    }
+}
+
 static void do_vadd(vec_t *Vd, vec_t *Vj, vec_t *Vk,  int bit, int n)
 {
     switch (bit) {
@@ -85,3 +103,52 @@ DO_HELPER_VVV(vsub_h, 16, helper_vvv, do_vsub)
 DO_HELPER_VVV(vsub_w, 32, helper_vvv, do_vsub)
 DO_HELPER_VVV(vsub_d, 64, helper_vvv, do_vsub)
 DO_HELPER_VVV(vsub_q, 128, helper_vvv, do_vsub)
+
+static void do_vaddi(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] + imm;
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] + imm;
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] + imm;
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] + imm;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsubi(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] - imm;
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] - imm;
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] - imm;
+        break;
+    case 64:
+        Vd->D[n] = Vd->D[n] - imm;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vaddi_bu, 8, helper_vv_i, do_vaddi)
+DO_HELPER_VV_I(vaddi_hu, 16, helper_vv_i, do_vaddi)
+DO_HELPER_VV_I(vaddi_wu, 32, helper_vv_i, do_vaddi)
+DO_HELPER_VV_I(vaddi_du, 64, helper_vv_i, do_vaddi)
+DO_HELPER_VV_I(vsubi_bu, 8, helper_vv_i, do_vsubi)
+DO_HELPER_VV_I(vsubi_hu, 16, helper_vv_i, do_vsubi)
+DO_HELPER_VV_I(vsubi_wu, 32, helper_vv_i, do_vsubi)
+DO_HELPER_VV_I(vsubi_du, 64, helper_vv_i, do_vsubi)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 07/43] target/loongarch: Implement vneg
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (5 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:29   ` Richard Henderson
  2022-12-24  8:15 ` [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub Song Gao
                   ` (36 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes;
- VNEG.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 10 +++++
 target/loongarch/helper.h                   |  5 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 16 ++++++++
 target/loongarch/insns.decode               |  7 ++++
 target/loongarch/lsx_helper.c               | 42 +++++++++++++++++++++
 5 files changed, 80 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 13a503951a..53e299b4ba 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -773,6 +773,11 @@ static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
 }
 
+static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -792,3 +797,8 @@ INSN_LSX(vsubi_bu,         vv_i)
 INSN_LSX(vsubi_hu,         vv_i)
 INSN_LSX(vsubi_wu,         vv_i)
 INSN_LSX(vsubi_du,         vv_i)
+
+INSN_LSX(vneg_b,           vv)
+INSN_LSX(vneg_h,           vv)
+INSN_LSX(vneg_w,           vv)
+INSN_LSX(vneg_d,           vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d6d50f6771..847950011e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -151,3 +151,8 @@ DEF_HELPER_4(vsubi_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsubi_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsubi_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsubi_du, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vneg_b, void, env, i32, i32)
+DEF_HELPER_3(vneg_h, void, env, i32, i32)
+DEF_HELPER_3(vneg_w, void, env, i32, i32)
+DEF_HELPER_3(vneg_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 9485a03a08..00514709c1 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -39,6 +39,17 @@ static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
     return true;
 }
 
+static bool gen_vv(DisasContext *ctx, arg_vv *a,
+                   void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj);
+    return true;
+}
+
 TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
 TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
 TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
@@ -58,3 +69,8 @@ TRANS(vsubi_bu, gen_vv_i, gen_helper_vsubi_bu)
 TRANS(vsubi_hu, gen_vv_i, gen_helper_vsubi_hu)
 TRANS(vsubi_wu, gen_vv_i, gen_helper_vsubi_wu)
 TRANS(vsubi_du, gen_vv_i, gen_helper_vsubi_du)
+
+TRANS(vneg_b, gen_vv, gen_helper_vneg_b)
+TRANS(vneg_h, gen_vv, gen_helper_vneg_h)
+TRANS(vneg_w, gen_vv, gen_helper_vneg_w)
+TRANS(vneg_d, gen_vv, gen_helper_vneg_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4f8226060a..3da5ed17ed 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -489,12 +489,14 @@ dbcl             0000 00000010 10101 ...............      @i15
 # LSX Argument sets
 #
 
+&vv           vd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
 
 #
 # LSX Formats
 #
+@vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 
@@ -517,3 +519,8 @@ vsubi_bu         0111 00101000 11000 ..... ..... .....    @vv_ui5
 vsubi_hu         0111 00101000 11001 ..... ..... .....    @vv_ui5
 vsubi_wu         0111 00101000 11010 ..... ..... .....    @vv_ui5
 vsubi_du         0111 00101000 11011 ..... ..... .....    @vv_ui5
+
+vneg_b           0111 00101001 11000 01100 ..... .....    @vv
+vneg_h           0111 00101001 11000 01101 ..... .....    @vv
+vneg_w           0111 00101001 11000 01110 ..... .....    @vv
+vneg_d           0111 00101001 11000 01111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e227db20d3..0fd17bf08f 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -20,6 +20,10 @@
                        uint32_t vd, uint32_t vj, uint32_t imm) \
     { FUNC(env, vd, vj, imm, BIT, __VA_ARGS__ ); }
 
+#define DO_HELPER_VV(NAME, BIT, FUNC, ...)                               \
+    void helper_##NAME(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+    { FUNC(env, vd, vj, BIT, __VA_ARGS__); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -47,6 +51,19 @@ static  void helper_vv_i(CPULoongArchState *env,
     }
 }
 
+static void helper_vv(CPULoongArchState *env,
+                      uint32_t vd, uint32_t vj, int bit,
+                      void (*func)(vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, bit, i);
+    }
+}
+
 static void do_vadd(vec_t *Vd, vec_t *Vj, vec_t *Vk,  int bit, int n)
 {
     switch (bit) {
@@ -152,3 +169,28 @@ DO_HELPER_VV_I(vsubi_bu, 8, helper_vv_i, do_vsubi)
 DO_HELPER_VV_I(vsubi_hu, 16, helper_vv_i, do_vsubi)
 DO_HELPER_VV_I(vsubi_wu, 32, helper_vv_i, do_vsubi)
 DO_HELPER_VV_I(vsubi_du, 64, helper_vv_i, do_vsubi)
+
+static void do_vneg(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = -Vj->B[n];
+        break;
+    case 16:
+        Vd->H[n] = -Vj->H[n];
+        break;
+    case 32:
+        Vd->W[n] = -Vj->W[n];
+        break;
+    case 64:
+        Vd->D[n] = -Vj->D[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV(vneg_b, 8, helper_vv, do_vneg)
+DO_HELPER_VV(vneg_h, 16, helper_vv, do_vneg)
+DO_HELPER_VV(vneg_w, 32, helper_vv, do_vneg)
+DO_HELPER_VV(vneg_d, 64, helper_vv, do_vneg)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (6 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 07/43] target/loongarch: Implement vneg Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:31   ` Richard Henderson
  2022-12-24  8:15 ` [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw Song Gao
                   ` (35 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSADD.{B/H/W/D}[U];
- VSSUB.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  17 +++
 target/loongarch/helper.h                   |  17 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  17 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 138 ++++++++++++++++++++
 5 files changed, 206 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 53e299b4ba..8f7db8b6db 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -802,3 +802,20 @@ INSN_LSX(vneg_b,           vv)
 INSN_LSX(vneg_h,           vv)
 INSN_LSX(vneg_w,           vv)
 INSN_LSX(vneg_d,           vv)
+
+INSN_LSX(vsadd_b,          vvv)
+INSN_LSX(vsadd_h,          vvv)
+INSN_LSX(vsadd_w,          vvv)
+INSN_LSX(vsadd_d,          vvv)
+INSN_LSX(vsadd_bu,         vvv)
+INSN_LSX(vsadd_hu,         vvv)
+INSN_LSX(vsadd_wu,         vvv)
+INSN_LSX(vsadd_du,         vvv)
+INSN_LSX(vssub_b,          vvv)
+INSN_LSX(vssub_h,          vvv)
+INSN_LSX(vssub_w,          vvv)
+INSN_LSX(vssub_d,          vvv)
+INSN_LSX(vssub_bu,         vvv)
+INSN_LSX(vssub_hu,         vvv)
+INSN_LSX(vssub_wu,         vvv)
+INSN_LSX(vssub_du,         vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 847950011e..d13bc77d8a 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -156,3 +156,20 @@ DEF_HELPER_3(vneg_b, void, env, i32, i32)
 DEF_HELPER_3(vneg_h, void, env, i32, i32)
 DEF_HELPER_3(vneg_w, void, env, i32, i32)
 DEF_HELPER_3(vneg_d, void, env, i32, i32)
+
+DEF_HELPER_4(vsadd_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsadd_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vssub_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 00514709c1..e9a8e3ae18 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -74,3 +74,20 @@ TRANS(vneg_b, gen_vv, gen_helper_vneg_b)
 TRANS(vneg_h, gen_vv, gen_helper_vneg_h)
 TRANS(vneg_w, gen_vv, gen_helper_vneg_w)
 TRANS(vneg_d, gen_vv, gen_helper_vneg_d)
+
+TRANS(vsadd_b, gen_vvv, gen_helper_vsadd_b)
+TRANS(vsadd_h, gen_vvv, gen_helper_vsadd_h)
+TRANS(vsadd_w, gen_vvv, gen_helper_vsadd_w)
+TRANS(vsadd_d, gen_vvv, gen_helper_vsadd_d)
+TRANS(vsadd_bu, gen_vvv, gen_helper_vsadd_bu)
+TRANS(vsadd_hu, gen_vvv, gen_helper_vsadd_hu)
+TRANS(vsadd_wu, gen_vvv, gen_helper_vsadd_wu)
+TRANS(vsadd_du, gen_vvv, gen_helper_vsadd_du)
+TRANS(vssub_b, gen_vvv, gen_helper_vssub_b)
+TRANS(vssub_h, gen_vvv, gen_helper_vssub_h)
+TRANS(vssub_w, gen_vvv, gen_helper_vssub_w)
+TRANS(vssub_d, gen_vvv, gen_helper_vssub_d)
+TRANS(vssub_bu, gen_vvv, gen_helper_vssub_bu)
+TRANS(vssub_hu, gen_vvv, gen_helper_vssub_hu)
+TRANS(vssub_wu, gen_vvv, gen_helper_vssub_wu)
+TRANS(vssub_du, gen_vvv, gen_helper_vssub_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3da5ed17ed..9176de3ed2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -524,3 +524,20 @@ vneg_b           0111 00101001 11000 01100 ..... .....    @vv
 vneg_h           0111 00101001 11000 01101 ..... .....    @vv
 vneg_w           0111 00101001 11000 01110 ..... .....    @vv
 vneg_d           0111 00101001 11000 01111 ..... .....    @vv
+
+vsadd_b          0111 00000100 01100 ..... ..... .....    @vvv
+vsadd_h          0111 00000100 01101 ..... ..... .....    @vvv
+vsadd_w          0111 00000100 01110 ..... ..... .....    @vvv
+vsadd_d          0111 00000100 01111 ..... ..... .....    @vvv
+vsadd_bu         0111 00000100 10100 ..... ..... .....    @vvv
+vsadd_hu         0111 00000100 10101 ..... ..... .....    @vvv
+vsadd_wu         0111 00000100 10110 ..... ..... .....    @vvv
+vsadd_du         0111 00000100 10111 ..... ..... .....    @vvv
+vssub_b          0111 00000100 10000 ..... ..... .....    @vvv
+vssub_h          0111 00000100 10001 ..... ..... .....    @vvv
+vssub_w          0111 00000100 10010 ..... ..... .....    @vvv
+vssub_d          0111 00000100 10011 ..... ..... .....    @vvv
+vssub_bu         0111 00000100 11000 ..... ..... .....    @vvv
+vssub_hu         0111 00000100 11001 ..... ..... .....    @vvv
+vssub_wu         0111 00000100 11010 ..... ..... .....    @vvv
+vssub_du         0111 00000100 11011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 0fd17bf08f..944823986f 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -194,3 +194,141 @@ DO_HELPER_VV(vneg_b, 8, helper_vv, do_vneg)
 DO_HELPER_VV(vneg_h, 16, helper_vv, do_vneg)
 DO_HELPER_VV(vneg_w, 32, helper_vv, do_vneg)
 DO_HELPER_VV(vneg_d, 64, helper_vv, do_vneg)
+
+static int64_t s_add_s(int64_t s1, int64_t s2, int bit)
+{
+    int64_t smax = MAKE_64BIT_MASK(0, (bit -1));
+    int64_t smin = MAKE_64BIT_MASK((bit -1), 64);
+
+    if (s1 < 0) {
+        return (smin - s1 < s2) ? s1 + s2 : smin;
+    } else {
+        return (s2 < smax - s1) ? s1 + s2 : smax;
+    }
+}
+
+static void do_vsadd(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = s_add_s(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = s_add_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_add_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_add_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_add_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return (u1 <  umax - u2) ? u1 + u2 : umax;
+}
+
+static void do_vsadd_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = u_add_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = u_add_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_add_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_add_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t s_sub_s(int64_t s1, int64_t s2, int bit)
+{
+    int64_t smax = MAKE_64BIT_MASK(0, (bit -1));
+    int64_t smin = MAKE_64BIT_MASK((bit -1), 64);
+
+    if (s2 > 0) {
+        return (smin + s2 < s1) ? s1 - s2 : smin;
+    } else {
+        return (s1 < smax + s2) ? s1 - s2 : smax;
+    }
+}
+
+static void do_vssub(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = s_sub_s(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = s_sub_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_sub_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_sub_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_sub_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t u1 = s1 & MAKE_64BIT_MASK(0, bit);
+    uint64_t u2 = s2 & MAKE_64BIT_MASK(0, bit);
+
+    return (u1 > u2) ?  u1 -u2 : 0;
+}
+
+static void do_vssub_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = u_sub_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = u_sub_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_sub_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_sub_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsadd_b, 8, helper_vvv, do_vsadd)
+DO_HELPER_VVV(vsadd_h, 16, helper_vvv, do_vsadd)
+DO_HELPER_VVV(vsadd_w, 32, helper_vvv, do_vsadd)
+DO_HELPER_VVV(vsadd_d, 64, helper_vvv, do_vsadd)
+DO_HELPER_VVV(vsadd_bu, 8, helper_vvv, do_vsadd_u)
+DO_HELPER_VVV(vsadd_hu, 16, helper_vvv, do_vsadd_u)
+DO_HELPER_VVV(vsadd_wu, 32, helper_vvv, do_vsadd_u)
+DO_HELPER_VVV(vsadd_du, 64, helper_vvv, do_vsadd_u)
+DO_HELPER_VVV(vssub_b, 8, helper_vvv, do_vssub)
+DO_HELPER_VVV(vssub_h, 16, helper_vvv, do_vssub)
+DO_HELPER_VVV(vssub_w, 32, helper_vvv, do_vssub)
+DO_HELPER_VVV(vssub_d, 64, helper_vvv, do_vssub)
+DO_HELPER_VVV(vssub_bu, 8, helper_vvv, do_vssub_u)
+DO_HELPER_VVV(vssub_hu, 16, helper_vvv, do_vssub_u)
+DO_HELPER_VVV(vssub_wu, 32, helper_vvv, do_vssub_u)
+DO_HELPER_VVV(vssub_du, 64, helper_vvv, do_vssub_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (7 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub Song Gao
@ 2022-12-24  8:15 ` Song Gao
  2022-12-24 17:41   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw Song Gao
                   ` (34 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VHADDW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU};
- VHSUBW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  17 +++
 target/loongarch/helper.h                   |  17 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  17 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 141 ++++++++++++++++++++
 5 files changed, 209 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8f7db8b6db..1a906e8714 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -819,3 +819,20 @@ INSN_LSX(vssub_bu,         vvv)
 INSN_LSX(vssub_hu,         vvv)
 INSN_LSX(vssub_wu,         vvv)
 INSN_LSX(vssub_du,         vvv)
+
+INSN_LSX(vhaddw_h_b,       vvv)
+INSN_LSX(vhaddw_w_h,       vvv)
+INSN_LSX(vhaddw_d_w,       vvv)
+INSN_LSX(vhaddw_q_d,       vvv)
+INSN_LSX(vhaddw_hu_bu,     vvv)
+INSN_LSX(vhaddw_wu_hu,     vvv)
+INSN_LSX(vhaddw_du_wu,     vvv)
+INSN_LSX(vhaddw_qu_du,     vvv)
+INSN_LSX(vhsubw_h_b,       vvv)
+INSN_LSX(vhsubw_w_h,       vvv)
+INSN_LSX(vhsubw_d_w,       vvv)
+INSN_LSX(vhsubw_q_d,       vvv)
+INSN_LSX(vhsubw_hu_bu,     vvv)
+INSN_LSX(vhsubw_wu_hu,     vvv)
+INSN_LSX(vhsubw_du_wu,     vvv)
+INSN_LSX(vhsubw_qu_du,     vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d13bc77d8a..4db8ca599e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -173,3 +173,20 @@ DEF_HELPER_4(vssub_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vssub_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vssub_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vssub_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vhaddw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index e9a8e3ae18..f278a3cd00 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -91,3 +91,20 @@ TRANS(vssub_bu, gen_vvv, gen_helper_vssub_bu)
 TRANS(vssub_hu, gen_vvv, gen_helper_vssub_hu)
 TRANS(vssub_wu, gen_vvv, gen_helper_vssub_wu)
 TRANS(vssub_du, gen_vvv, gen_helper_vssub_du)
+
+TRANS(vhaddw_h_b, gen_vvv, gen_helper_vhaddw_h_b)
+TRANS(vhaddw_w_h, gen_vvv, gen_helper_vhaddw_w_h)
+TRANS(vhaddw_d_w, gen_vvv, gen_helper_vhaddw_d_w)
+TRANS(vhaddw_q_d, gen_vvv, gen_helper_vhaddw_q_d)
+TRANS(vhaddw_hu_bu, gen_vvv, gen_helper_vhaddw_hu_bu)
+TRANS(vhaddw_wu_hu, gen_vvv, gen_helper_vhaddw_wu_hu)
+TRANS(vhaddw_du_wu, gen_vvv, gen_helper_vhaddw_du_wu)
+TRANS(vhaddw_qu_du, gen_vvv, gen_helper_vhaddw_qu_du)
+TRANS(vhsubw_h_b, gen_vvv, gen_helper_vhsubw_h_b)
+TRANS(vhsubw_w_h, gen_vvv, gen_helper_vhsubw_w_h)
+TRANS(vhsubw_d_w, gen_vvv, gen_helper_vhsubw_d_w)
+TRANS(vhsubw_q_d, gen_vvv, gen_helper_vhsubw_q_d)
+TRANS(vhsubw_hu_bu, gen_vvv, gen_helper_vhsubw_hu_bu)
+TRANS(vhsubw_wu_hu, gen_vvv, gen_helper_vhsubw_wu_hu)
+TRANS(vhsubw_du_wu, gen_vvv, gen_helper_vhsubw_du_wu)
+TRANS(vhsubw_qu_du, gen_vvv, gen_helper_vhsubw_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9176de3ed2..77f9ab5a36 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -541,3 +541,20 @@ vssub_bu         0111 00000100 11000 ..... ..... .....    @vvv
 vssub_hu         0111 00000100 11001 ..... ..... .....    @vvv
 vssub_wu         0111 00000100 11010 ..... ..... .....    @vvv
 vssub_du         0111 00000100 11011 ..... ..... .....    @vvv
+
+vhaddw_h_b       0111 00000101 01000 ..... ..... .....    @vvv
+vhaddw_w_h       0111 00000101 01001 ..... ..... .....    @vvv
+vhaddw_d_w       0111 00000101 01010 ..... ..... .....    @vvv
+vhaddw_q_d       0111 00000101 01011 ..... ..... .....    @vvv
+vhaddw_hu_bu     0111 00000101 10000 ..... ..... .....    @vvv
+vhaddw_wu_hu     0111 00000101 10001 ..... ..... .....    @vvv
+vhaddw_du_wu     0111 00000101 10010 ..... ..... .....    @vvv
+vhaddw_qu_du     0111 00000101 10011 ..... ..... .....    @vvv
+vhsubw_h_b       0111 00000101 01100 ..... ..... .....    @vvv
+vhsubw_w_h       0111 00000101 01101 ..... ..... .....    @vvv
+vhsubw_d_w       0111 00000101 01110 ..... ..... .....    @vvv
+vhsubw_q_d       0111 00000101 01111 ..... ..... .....    @vvv
+vhsubw_hu_bu     0111 00000101 10100 ..... ..... .....    @vvv
+vhsubw_wu_hu     0111 00000101 10101 ..... ..... .....    @vvv
+vhsubw_du_wu     0111 00000101 10110 ..... ..... .....    @vvv
+vhsubw_qu_du     0111 00000101 10111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 944823986f..cb9b691dc7 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -332,3 +332,144 @@ DO_HELPER_VVV(vssub_bu, 8, helper_vvv, do_vssub_u)
 DO_HELPER_VVV(vssub_hu, 16, helper_vvv, do_vssub_u)
 DO_HELPER_VVV(vssub_wu, 32, helper_vvv, do_vssub_u)
 DO_HELPER_VVV(vssub_du, 64, helper_vvv, do_vssub_u)
+
+#define S_EVEN(a, bit) \
+        ((((int64_t)(a)) << (64 - bit / 2)) >> (64 - bit / 2))
+
+#define U_EVEN(a, bit) \
+        ((((uint64_t)(a)) << (64 - bit / 2)) >> (64 - bit / 2))
+
+#define S_ODD(a, bit) \
+        ((((int64_t)(a)) << (64 - bit)) >> (64 - bit/ 2))
+
+#define U_ODD(a, bit) \
+        ((((uint64_t)(a)) << (64 - bit)) >> (64 - bit / 2))
+
+#define S_EVEN_Q(a, bit) \
+        ((((__int128)(a)) << (128 - bit / 2)) >> (128 - bit / 2))
+
+#define U_EVEN_Q(a, bit) \
+        ((((unsigned __int128)(a)) << (128 - bit / 2)) >> (128 - bit / 2))
+
+#define S_ODD_Q(a, bit) \
+        ((((__int128)(a)) << (128 - bit)) >> (128 - bit/ 2))
+
+#define U_ODD_Q(a, bit) \
+        ((((unsigned __int128)(a)) << (128 - bit)) >> (128 - bit / 2))
+
+static int64_t s_haddw_s(int64_t s1, int64_t s2,  int bit)
+{
+    return S_ODD(s1, bit) + S_EVEN(s2, bit);
+}
+
+static void do_vhaddw_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = s_haddw_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_haddw_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_haddw_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    case 128:
+        Vd->Q[n] = S_ODD_Q(Vj->Q[n], bit) + S_EVEN_Q(Vk->Q[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_haddw_u(int64_t s1, int64_t s2, int bit)
+{
+    return U_ODD(s1, bit) + U_EVEN(s2, bit);
+}
+
+static void do_vhaddw_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = u_haddw_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_haddw_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_haddw_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    case 128:
+        Vd->Q[n] = U_ODD_Q(Vj->Q[n], bit) + U_EVEN_Q(Vk->Q[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t s_hsubw_s(int64_t s1, int64_t s2, int bit)
+{
+    return S_ODD(s1, bit) - S_EVEN(s2, bit);
+}
+
+static void do_vhsubw_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = s_hsubw_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_hsubw_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_hsubw_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    case 128:
+        Vd->Q[n] = S_ODD_Q(Vj->Q[n], bit) - S_EVEN_Q(Vk->Q[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_hsubw_u(int64_t s1, int64_t s2, int bit)
+{
+    return U_ODD(s1, bit) - U_EVEN(s2, bit);
+}
+
+static void do_vhsubw_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = u_hsubw_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_hsubw_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_hsubw_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    case 128:
+        Vd->Q[n] = U_ODD_Q(Vj->Q[n], bit) - U_EVEN_Q(Vk->Q[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vhaddw_h_b, 16, helper_vvv, do_vhaddw_s)
+DO_HELPER_VVV(vhaddw_w_h, 32, helper_vvv, do_vhaddw_s)
+DO_HELPER_VVV(vhaddw_d_w, 64, helper_vvv, do_vhaddw_s)
+DO_HELPER_VVV(vhaddw_q_d, 128, helper_vvv, do_vhaddw_s)
+DO_HELPER_VVV(vhaddw_hu_bu, 16, helper_vvv, do_vhaddw_u)
+DO_HELPER_VVV(vhaddw_wu_hu, 32, helper_vvv, do_vhaddw_u)
+DO_HELPER_VVV(vhaddw_du_wu, 64, helper_vvv, do_vhaddw_u)
+DO_HELPER_VVV(vhaddw_qu_du, 128, helper_vvv, do_vhaddw_u)
+DO_HELPER_VVV(vhsubw_h_b, 16, helper_vvv, do_vhsubw_s)
+DO_HELPER_VVV(vhsubw_w_h, 32, helper_vvv, do_vhsubw_s)
+DO_HELPER_VVV(vhsubw_d_w, 64, helper_vvv, do_vhsubw_s)
+DO_HELPER_VVV(vhsubw_q_d, 128, helper_vvv, do_vhsubw_s)
+DO_HELPER_VVV(vhsubw_hu_bu, 16, helper_vvv, do_vhsubw_u)
+DO_HELPER_VVV(vhsubw_wu_hu, 32, helper_vvv, do_vhsubw_u)
+DO_HELPER_VVV(vhsubw_du_wu, 64, helper_vvv, do_vhsubw_u)
+DO_HELPER_VVV(vhsubw_qu_du, 128, helper_vvv, do_vhsubw_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (8 preceding siblings ...)
  2022-12-24  8:15 ` [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 17:48   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr Song Gao
                   ` (33 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VSUBW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  43 ++++
 target/loongarch/helper.h                   |  43 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  43 ++++
 target/loongarch/insns.decode               |  43 ++++
 target/loongarch/lsx_helper.c               | 243 ++++++++++++++++++++
 5 files changed, 415 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1a906e8714..81253f00e9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -836,3 +836,46 @@ INSN_LSX(vhsubw_hu_bu,     vvv)
 INSN_LSX(vhsubw_wu_hu,     vvv)
 INSN_LSX(vhsubw_du_wu,     vvv)
 INSN_LSX(vhsubw_qu_du,     vvv)
+
+INSN_LSX(vaddwev_h_b,      vvv)
+INSN_LSX(vaddwev_w_h,      vvv)
+INSN_LSX(vaddwev_d_w,      vvv)
+INSN_LSX(vaddwev_q_d,      vvv)
+INSN_LSX(vaddwod_h_b,      vvv)
+INSN_LSX(vaddwod_w_h,      vvv)
+INSN_LSX(vaddwod_d_w,      vvv)
+INSN_LSX(vaddwod_q_d,      vvv)
+INSN_LSX(vsubwev_h_b,      vvv)
+INSN_LSX(vsubwev_w_h,      vvv)
+INSN_LSX(vsubwev_d_w,      vvv)
+INSN_LSX(vsubwev_q_d,      vvv)
+INSN_LSX(vsubwod_h_b,      vvv)
+INSN_LSX(vsubwod_w_h,      vvv)
+INSN_LSX(vsubwod_d_w,      vvv)
+INSN_LSX(vsubwod_q_d,      vvv)
+
+INSN_LSX(vaddwev_h_bu,     vvv)
+INSN_LSX(vaddwev_w_hu,     vvv)
+INSN_LSX(vaddwev_d_wu,     vvv)
+INSN_LSX(vaddwev_q_du,     vvv)
+INSN_LSX(vaddwod_h_bu,     vvv)
+INSN_LSX(vaddwod_w_hu,     vvv)
+INSN_LSX(vaddwod_d_wu,     vvv)
+INSN_LSX(vaddwod_q_du,     vvv)
+INSN_LSX(vsubwev_h_bu,     vvv)
+INSN_LSX(vsubwev_w_hu,     vvv)
+INSN_LSX(vsubwev_d_wu,     vvv)
+INSN_LSX(vsubwev_q_du,     vvv)
+INSN_LSX(vsubwod_h_bu,     vvv)
+INSN_LSX(vsubwod_w_hu,     vvv)
+INSN_LSX(vsubwod_d_wu,     vvv)
+INSN_LSX(vsubwod_q_du,     vvv)
+
+INSN_LSX(vaddwev_h_bu_b,   vvv)
+INSN_LSX(vaddwev_w_hu_h,   vvv)
+INSN_LSX(vaddwev_d_wu_w,   vvv)
+INSN_LSX(vaddwev_q_du_d,   vvv)
+INSN_LSX(vaddwod_h_bu_b,   vvv)
+INSN_LSX(vaddwod_w_hu_h,   vvv)
+INSN_LSX(vaddwod_d_wu_w,   vvv)
+INSN_LSX(vaddwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4db8ca599e..ff16626381 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -190,3 +190,46 @@ DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vaddwev_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_q_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vaddwev_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwev_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsubwod_q_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vaddwev_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwev_q_du_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vaddwod_q_du_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index f278a3cd00..69111c498c 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -108,3 +108,46 @@ TRANS(vhsubw_hu_bu, gen_vvv, gen_helper_vhsubw_hu_bu)
 TRANS(vhsubw_wu_hu, gen_vvv, gen_helper_vhsubw_wu_hu)
 TRANS(vhsubw_du_wu, gen_vvv, gen_helper_vhsubw_du_wu)
 TRANS(vhsubw_qu_du, gen_vvv, gen_helper_vhsubw_qu_du)
+
+TRANS(vaddwev_h_b, gen_vvv, gen_helper_vaddwev_h_b)
+TRANS(vaddwev_w_h, gen_vvv, gen_helper_vaddwev_w_h)
+TRANS(vaddwev_d_w, gen_vvv, gen_helper_vaddwev_d_w)
+TRANS(vaddwev_q_d, gen_vvv, gen_helper_vaddwev_q_d)
+TRANS(vaddwod_h_b, gen_vvv, gen_helper_vaddwod_h_b)
+TRANS(vaddwod_w_h, gen_vvv, gen_helper_vaddwod_w_h)
+TRANS(vaddwod_d_w, gen_vvv, gen_helper_vaddwod_d_w)
+TRANS(vaddwod_q_d, gen_vvv, gen_helper_vaddwod_q_d)
+TRANS(vsubwev_h_b, gen_vvv, gen_helper_vsubwev_h_b)
+TRANS(vsubwev_w_h, gen_vvv, gen_helper_vsubwev_w_h)
+TRANS(vsubwev_d_w, gen_vvv, gen_helper_vsubwev_d_w)
+TRANS(vsubwev_q_d, gen_vvv, gen_helper_vsubwev_q_d)
+TRANS(vsubwod_h_b, gen_vvv, gen_helper_vsubwod_h_b)
+TRANS(vsubwod_w_h, gen_vvv, gen_helper_vsubwod_w_h)
+TRANS(vsubwod_d_w, gen_vvv, gen_helper_vsubwod_d_w)
+TRANS(vsubwod_q_d, gen_vvv, gen_helper_vsubwod_q_d)
+
+TRANS(vaddwev_h_bu, gen_vvv, gen_helper_vaddwev_h_bu)
+TRANS(vaddwev_w_hu, gen_vvv, gen_helper_vaddwev_w_hu)
+TRANS(vaddwev_d_wu, gen_vvv, gen_helper_vaddwev_d_wu)
+TRANS(vaddwev_q_du, gen_vvv, gen_helper_vaddwev_q_du)
+TRANS(vaddwod_h_bu, gen_vvv, gen_helper_vaddwod_h_bu)
+TRANS(vaddwod_w_hu, gen_vvv, gen_helper_vaddwod_w_hu)
+TRANS(vaddwod_d_wu, gen_vvv, gen_helper_vaddwod_d_wu)
+TRANS(vaddwod_q_du, gen_vvv, gen_helper_vaddwod_q_du)
+TRANS(vsubwev_h_bu, gen_vvv, gen_helper_vsubwev_h_bu)
+TRANS(vsubwev_w_hu, gen_vvv, gen_helper_vsubwev_w_hu)
+TRANS(vsubwev_d_wu, gen_vvv, gen_helper_vsubwev_d_wu)
+TRANS(vsubwev_q_du, gen_vvv, gen_helper_vsubwev_q_du)
+TRANS(vsubwod_h_bu, gen_vvv, gen_helper_vsubwod_h_bu)
+TRANS(vsubwod_w_hu, gen_vvv, gen_helper_vsubwod_w_hu)
+TRANS(vsubwod_d_wu, gen_vvv, gen_helper_vsubwod_d_wu)
+TRANS(vsubwod_q_du, gen_vvv, gen_helper_vsubwod_q_du)
+
+TRANS(vaddwev_h_bu_b, gen_vvv, gen_helper_vaddwev_h_bu_b)
+TRANS(vaddwev_w_hu_h, gen_vvv, gen_helper_vaddwev_w_hu_h)
+TRANS(vaddwev_d_wu_w, gen_vvv, gen_helper_vaddwev_d_wu_w)
+TRANS(vaddwev_q_du_d, gen_vvv, gen_helper_vaddwev_q_du_d)
+TRANS(vaddwod_h_bu_b, gen_vvv, gen_helper_vaddwod_h_bu_b)
+TRANS(vaddwod_w_hu_h, gen_vvv, gen_helper_vaddwod_w_hu_h)
+TRANS(vaddwod_d_wu_w, gen_vvv, gen_helper_vaddwod_d_wu_w)
+TRANS(vaddwod_q_du_d, gen_vvv, gen_helper_vaddwod_q_du_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 77f9ab5a36..7e99ead2de 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -558,3 +558,46 @@ vhsubw_hu_bu     0111 00000101 10100 ..... ..... .....    @vvv
 vhsubw_wu_hu     0111 00000101 10101 ..... ..... .....    @vvv
 vhsubw_du_wu     0111 00000101 10110 ..... ..... .....    @vvv
 vhsubw_qu_du     0111 00000101 10111 ..... ..... .....    @vvv
+
+vaddwev_h_b      0111 00000001 11100 ..... ..... .....    @vvv
+vaddwev_w_h      0111 00000001 11101 ..... ..... .....    @vvv
+vaddwev_d_w      0111 00000001 11110 ..... ..... .....    @vvv
+vaddwev_q_d      0111 00000001 11111 ..... ..... .....    @vvv
+vaddwod_h_b      0111 00000010 00100 ..... ..... .....    @vvv
+vaddwod_w_h      0111 00000010 00101 ..... ..... .....    @vvv
+vaddwod_d_w      0111 00000010 00110 ..... ..... .....    @vvv
+vaddwod_q_d      0111 00000010 00111 ..... ..... .....    @vvv
+vsubwev_h_b      0111 00000010 00000 ..... ..... .....    @vvv
+vsubwev_w_h      0111 00000010 00001 ..... ..... .....    @vvv
+vsubwev_d_w      0111 00000010 00010 ..... ..... .....    @vvv
+vsubwev_q_d      0111 00000010 00011 ..... ..... .....    @vvv
+vsubwod_h_b      0111 00000010 01000 ..... ..... .....    @vvv
+vsubwod_w_h      0111 00000010 01001 ..... ..... .....    @vvv
+vsubwod_d_w      0111 00000010 01010 ..... ..... .....    @vvv
+vsubwod_q_d      0111 00000010 01011 ..... ..... .....    @vvv
+
+vaddwev_h_bu     0111 00000010 11100 ..... ..... .....    @vvv
+vaddwev_w_hu     0111 00000010 11101 ..... ..... .....    @vvv
+vaddwev_d_wu     0111 00000010 11110 ..... ..... .....    @vvv
+vaddwev_q_du     0111 00000010 11111 ..... ..... .....    @vvv
+vaddwod_h_bu     0111 00000011 00100 ..... ..... .....    @vvv
+vaddwod_w_hu     0111 00000011 00101 ..... ..... .....    @vvv
+vaddwod_d_wu     0111 00000011 00110 ..... ..... .....    @vvv
+vaddwod_q_du     0111 00000011 00111 ..... ..... .....    @vvv
+vsubwev_h_bu     0111 00000011 00000 ..... ..... .....    @vvv
+vsubwev_w_hu     0111 00000011 00001 ..... ..... .....    @vvv
+vsubwev_d_wu     0111 00000011 00010 ..... ..... .....    @vvv
+vsubwev_q_du     0111 00000011 00011 ..... ..... .....    @vvv
+vsubwod_h_bu     0111 00000011 01000 ..... ..... .....    @vvv
+vsubwod_w_hu     0111 00000011 01001 ..... ..... .....    @vvv
+vsubwod_d_wu     0111 00000011 01010 ..... ..... .....    @vvv
+vsubwod_q_du     0111 00000011 01011 ..... ..... .....    @vvv
+
+vaddwev_h_bu_b   0111 00000011 11100 ..... ..... .....    @vvv
+vaddwev_w_hu_h   0111 00000011 11101 ..... ..... .....    @vvv
+vaddwev_d_wu_w   0111 00000011 11110 ..... ..... .....    @vvv
+vaddwev_q_du_d   0111 00000011 11111 ..... ..... .....    @vvv
+vaddwod_h_bu_b   0111 00000100 00000 ..... ..... .....    @vvv
+vaddwod_w_hu_h   0111 00000100 00001 ..... ..... .....    @vvv
+vaddwod_d_wu_w   0111 00000100 00010 ..... ..... .....    @vvv
+vaddwod_q_du_d   0111 00000100 00011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index cb9b691dc7..9e3131af1b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -473,3 +473,246 @@ DO_HELPER_VVV(vhsubw_hu_bu, 16, helper_vvv, do_vhsubw_u)
 DO_HELPER_VVV(vhsubw_wu_hu, 32, helper_vvv, do_vhsubw_u)
 DO_HELPER_VVV(vhsubw_du_wu, 64, helper_vvv, do_vhsubw_u)
 DO_HELPER_VVV(vhsubw_qu_du, 128, helper_vvv, do_vhsubw_u)
+
+static void do_vaddwev_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (int16_t)Vj->B[2 * n] + (int16_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (int32_t)Vj->H[2 * n] + (int32_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n] + (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128)Vj->D[2 * n] + (__int128)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vaddwod_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (int16_t)Vj->B[2 * n + 1] + (int16_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (int32_t)Vj->H[2 * n + 1] + (int32_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n + 1] + (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128)Vj->D[2 * n + 1] + (__int128)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsubwev_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (int16_t)Vj->B[2 * n] - (int16_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (int32_t)Vj->H[2 * n] - (int32_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n] - (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128)Vj->D[2 * n] - (__int128)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsubwod_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (int16_t)Vj->B[2 * n + 1] - (int16_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (int32_t)Vj->H[2 * n + 1] - (int32_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n + 1] - (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128)Vj->D[2 * n + 1] - (__int128)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vaddwev_h_b, 16, helper_vvv, do_vaddwev_s)
+DO_HELPER_VVV(vaddwev_w_h, 32, helper_vvv, do_vaddwev_s)
+DO_HELPER_VVV(vaddwev_d_w, 64, helper_vvv, do_vaddwev_s)
+DO_HELPER_VVV(vaddwev_q_d, 128, helper_vvv, do_vaddwev_s)
+DO_HELPER_VVV(vaddwod_h_b, 16, helper_vvv, do_vaddwod_s)
+DO_HELPER_VVV(vaddwod_w_h, 32, helper_vvv, do_vaddwod_s)
+DO_HELPER_VVV(vaddwod_d_w, 64, helper_vvv, do_vaddwod_s)
+DO_HELPER_VVV(vaddwod_q_d, 128, helper_vvv, do_vaddwod_s)
+DO_HELPER_VVV(vsubwev_h_b, 16, helper_vvv, do_vsubwev_s)
+DO_HELPER_VVV(vsubwev_w_h, 32, helper_vvv, do_vsubwev_s)
+DO_HELPER_VVV(vsubwev_d_w, 64, helper_vvv, do_vsubwev_s)
+DO_HELPER_VVV(vsubwev_q_d, 128, helper_vvv, do_vsubwev_s)
+DO_HELPER_VVV(vsubwod_h_b, 16, helper_vvv, do_vsubwod_s)
+DO_HELPER_VVV(vsubwod_w_h, 32, helper_vvv, do_vsubwod_s)
+DO_HELPER_VVV(vsubwod_d_w, 64, helper_vvv, do_vsubwod_s)
+DO_HELPER_VVV(vsubwod_q_d, 128, helper_vvv, do_vsubwod_s)
+
+static void do_vaddwev_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n] + (uint16_t)(uint8_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n] + (uint32_t)(uint16_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n] + (uint64_t)(uint32_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n] + (__uint128_t)(uint64_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vaddwod_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n + 1] + (uint16_t)(uint8_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n + 1] + (uint32_t)(uint16_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n + 1] + (uint64_t)(uint32_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n + 1] + (__uint128_t)(uint64_t )Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsubwev_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n] - (uint16_t)(uint8_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n] - (uint32_t)(uint16_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n] - (uint64_t)(uint32_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n] - (__uint128_t)(uint64_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsubwod_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n + 1] - (uint16_t)(uint8_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n + 1] - (uint32_t)(uint16_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n + 1] - (uint64_t)(uint32_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n + 1] - (__uint128_t)(uint64_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vaddwev_h_bu, 16, helper_vvv, do_vaddwev_u)
+DO_HELPER_VVV(vaddwev_w_hu, 32, helper_vvv, do_vaddwev_u)
+DO_HELPER_VVV(vaddwev_d_wu, 64, helper_vvv, do_vaddwev_u)
+DO_HELPER_VVV(vaddwev_q_du, 128, helper_vvv, do_vaddwev_u)
+DO_HELPER_VVV(vaddwod_h_bu, 16, helper_vvv, do_vaddwod_u)
+DO_HELPER_VVV(vaddwod_w_hu, 32, helper_vvv, do_vaddwod_u)
+DO_HELPER_VVV(vaddwod_d_wu, 64, helper_vvv, do_vaddwod_u)
+DO_HELPER_VVV(vaddwod_q_du, 128, helper_vvv, do_vaddwod_u)
+DO_HELPER_VVV(vsubwev_h_bu, 16, helper_vvv, do_vsubwev_u)
+DO_HELPER_VVV(vsubwev_w_hu, 32, helper_vvv, do_vsubwev_u)
+DO_HELPER_VVV(vsubwev_d_wu, 64, helper_vvv, do_vsubwev_u)
+DO_HELPER_VVV(vsubwev_q_du, 128, helper_vvv, do_vsubwev_u)
+DO_HELPER_VVV(vsubwod_h_bu, 16, helper_vvv, do_vsubwod_u)
+DO_HELPER_VVV(vsubwod_w_hu, 32, helper_vvv, do_vsubwod_u)
+DO_HELPER_VVV(vsubwod_d_wu, 64, helper_vvv, do_vsubwod_u)
+DO_HELPER_VVV(vsubwod_q_du, 128, helper_vvv, do_vsubwod_u)
+
+static void do_vaddwev_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n] + (int16_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n] + (int32_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n] + (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n] + (__int128)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vaddwod_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint16_t)(uint8_t)Vj->B[2 * n + 1] + (int16_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (uint32_t)(uint16_t)Vj->H[2 * n + 1] + (int32_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n + 1] + (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n + 1] + (__int128)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vaddwev_h_bu_b, 16, helper_vvv, do_vaddwev_u_s)
+DO_HELPER_VVV(vaddwev_w_hu_h, 32, helper_vvv, do_vaddwev_u_s)
+DO_HELPER_VVV(vaddwev_d_wu_w, 64, helper_vvv, do_vaddwev_u_s)
+DO_HELPER_VVV(vaddwev_q_du_d, 128, helper_vvv, do_vaddwev_u_s)
+DO_HELPER_VVV(vaddwod_h_bu_b, 16, helper_vvv, do_vaddwod_u_s)
+DO_HELPER_VVV(vaddwod_w_hu_h, 32, helper_vvv, do_vaddwod_u_s)
+DO_HELPER_VVV(vaddwod_d_wu_w, 64, helper_vvv, do_vaddwod_u_s)
+DO_HELPER_VVV(vaddwod_q_du_d, 128, helper_vvv, do_vaddwod_u_s)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (9 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 17:52   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 12/43] target/loongarch: Implement vabsd Song Gao
                   ` (32 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VAVG.{B/H/W/D}[U];
- VAVGR.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  17 +++
 target/loongarch/helper.h                   |  17 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  17 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 125 ++++++++++++++++++++
 5 files changed, 193 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 81253f00e9..b0a491033e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -879,3 +879,20 @@ INSN_LSX(vaddwod_h_bu_b,   vvv)
 INSN_LSX(vaddwod_w_hu_h,   vvv)
 INSN_LSX(vaddwod_d_wu_w,   vvv)
 INSN_LSX(vaddwod_q_du_d,   vvv)
+
+INSN_LSX(vavg_b,           vvv)
+INSN_LSX(vavg_h,           vvv)
+INSN_LSX(vavg_w,           vvv)
+INSN_LSX(vavg_d,           vvv)
+INSN_LSX(vavg_bu,          vvv)
+INSN_LSX(vavg_hu,          vvv)
+INSN_LSX(vavg_wu,          vvv)
+INSN_LSX(vavg_du,          vvv)
+INSN_LSX(vavgr_b,          vvv)
+INSN_LSX(vavgr_h,          vvv)
+INSN_LSX(vavgr_w,          vvv)
+INSN_LSX(vavgr_d,          vvv)
+INSN_LSX(vavgr_bu,         vvv)
+INSN_LSX(vavgr_hu,         vvv)
+INSN_LSX(vavgr_wu,         vvv)
+INSN_LSX(vavgr_du,         vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ff16626381..c6a387c54d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -233,3 +233,20 @@ DEF_HELPER_4(vaddwod_h_bu_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vaddwod_w_hu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vaddwod_d_wu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vaddwod_q_du_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vavg_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavg_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vavgr_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 69111c498c..2d43a88d74 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -151,3 +151,20 @@ TRANS(vaddwod_h_bu_b, gen_vvv, gen_helper_vaddwod_h_bu_b)
 TRANS(vaddwod_w_hu_h, gen_vvv, gen_helper_vaddwod_w_hu_h)
 TRANS(vaddwod_d_wu_w, gen_vvv, gen_helper_vaddwod_d_wu_w)
 TRANS(vaddwod_q_du_d, gen_vvv, gen_helper_vaddwod_q_du_d)
+
+TRANS(vavg_b, gen_vvv, gen_helper_vavg_b)
+TRANS(vavg_h, gen_vvv, gen_helper_vavg_h)
+TRANS(vavg_w, gen_vvv, gen_helper_vavg_w)
+TRANS(vavg_d, gen_vvv, gen_helper_vavg_d)
+TRANS(vavg_bu, gen_vvv, gen_helper_vavg_bu)
+TRANS(vavg_hu, gen_vvv, gen_helper_vavg_hu)
+TRANS(vavg_wu, gen_vvv, gen_helper_vavg_wu)
+TRANS(vavg_du, gen_vvv, gen_helper_vavg_du)
+TRANS(vavgr_b, gen_vvv, gen_helper_vavgr_b)
+TRANS(vavgr_h, gen_vvv, gen_helper_vavgr_h)
+TRANS(vavgr_w, gen_vvv, gen_helper_vavgr_w)
+TRANS(vavgr_d, gen_vvv, gen_helper_vavgr_d)
+TRANS(vavgr_bu, gen_vvv, gen_helper_vavgr_bu)
+TRANS(vavgr_hu, gen_vvv, gen_helper_vavgr_hu)
+TRANS(vavgr_wu, gen_vvv, gen_helper_vavgr_wu)
+TRANS(vavgr_du, gen_vvv, gen_helper_vavgr_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7e99ead2de..de6e8a72a9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -601,3 +601,20 @@ vaddwod_h_bu_b   0111 00000100 00000 ..... ..... .....    @vvv
 vaddwod_w_hu_h   0111 00000100 00001 ..... ..... .....    @vvv
 vaddwod_d_wu_w   0111 00000100 00010 ..... ..... .....    @vvv
 vaddwod_q_du_d   0111 00000100 00011 ..... ..... .....    @vvv
+
+vavg_b           0111 00000110 01000 ..... ..... .....    @vvv
+vavg_h           0111 00000110 01001 ..... ..... .....    @vvv
+vavg_w           0111 00000110 01010 ..... ..... .....    @vvv
+vavg_d           0111 00000110 01011 ..... ..... .....    @vvv
+vavg_bu          0111 00000110 01100 ..... ..... .....    @vvv
+vavg_hu          0111 00000110 01101 ..... ..... .....    @vvv
+vavg_wu          0111 00000110 01110 ..... ..... .....    @vvv
+vavg_du          0111 00000110 01111 ..... ..... .....    @vvv
+vavgr_b          0111 00000110 10000 ..... ..... .....    @vvv
+vavgr_h          0111 00000110 10001 ..... ..... .....    @vvv
+vavgr_w          0111 00000110 10010 ..... ..... .....    @vvv
+vavgr_d          0111 00000110 10011 ..... ..... .....    @vvv
+vavgr_bu         0111 00000110 10100 ..... ..... .....    @vvv
+vavgr_hu         0111 00000110 10101 ..... ..... .....    @vvv
+vavgr_wu         0111 00000110 10110 ..... ..... .....    @vvv
+vavgr_du         0111 00000110 10111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9e3131af1b..63161ecd1a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -716,3 +716,128 @@ DO_HELPER_VVV(vaddwod_h_bu_b, 16, helper_vvv, do_vaddwod_u_s)
 DO_HELPER_VVV(vaddwod_w_hu_h, 32, helper_vvv, do_vaddwod_u_s)
 DO_HELPER_VVV(vaddwod_d_wu_w, 64, helper_vvv, do_vaddwod_u_s)
 DO_HELPER_VVV(vaddwod_q_du_d, 128, helper_vvv, do_vaddwod_u_s)
+
+
+static int64_t vavg_s(int64_t s1, int64_t s2)
+{
+    return (s1 >> 1) + (s2 >> 1) + (s1 & s2 & 1);
+}
+
+static void do_vavg_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vavg_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vavg_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vavg_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vavg_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t vavg_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+    return (u1 >> 1) + (u2 >> 1) + (u1 & u2 & 1);
+}
+
+static void do_vavg_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vavg_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vavg_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vavg_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vavg_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t vavgr_s(int64_t s1, int64_t s2)
+{
+    return (s1 >> 1) + (s2 >> 1) + ((s1 | s2) & 1);
+}
+
+static void do_vavgr_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vavgr_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vavgr_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vavgr_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vavgr_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t vavgr_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return (u1 >> 1) + (u2 >> 1) + ((u1 | u2) & 1);
+}
+
+static void do_vavgr_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vavgr_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vavgr_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vavgr_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vavgr_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vavg_b, 8, helper_vvv, do_vavg_s)
+DO_HELPER_VVV(vavg_h, 16, helper_vvv, do_vavg_s)
+DO_HELPER_VVV(vavg_w, 32, helper_vvv, do_vavg_s)
+DO_HELPER_VVV(vavg_d, 64, helper_vvv, do_vavg_s)
+DO_HELPER_VVV(vavg_bu, 8, helper_vvv, do_vavg_u)
+DO_HELPER_VVV(vavg_hu, 16, helper_vvv, do_vavg_u)
+DO_HELPER_VVV(vavg_wu, 32, helper_vvv, do_vavg_u)
+DO_HELPER_VVV(vavg_du, 64, helper_vvv, do_vavg_u)
+DO_HELPER_VVV(vavgr_b, 8, helper_vvv, do_vavgr_s)
+DO_HELPER_VVV(vavgr_h, 16, helper_vvv, do_vavgr_s)
+DO_HELPER_VVV(vavgr_w, 32, helper_vvv, do_vavgr_s)
+DO_HELPER_VVV(vavgr_d, 64, helper_vvv, do_vavgr_s)
+DO_HELPER_VVV(vavgr_bu, 8, helper_vvv, do_vavgr_u)
+DO_HELPER_VVV(vavgr_hu, 16, helper_vvv, do_vavgr_u)
+DO_HELPER_VVV(vavgr_wu, 32, helper_vvv, do_vavgr_u)
+DO_HELPER_VVV(vavgr_du, 64, helper_vvv, do_vavgr_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 12/43] target/loongarch: Implement vabsd
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (10 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 17:55   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 13/43] target/loongarch: Implement vadda Song Gao
                   ` (31 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VABSD.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 +++
 target/loongarch/helper.h                   |  9 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 +++
 target/loongarch/insns.decode               |  9 +++
 target/loongarch/lsx_helper.c               | 63 +++++++++++++++++++++
 5 files changed, 99 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b0a491033e..8ec612446c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -896,3 +896,12 @@ INSN_LSX(vavgr_bu,         vvv)
 INSN_LSX(vavgr_hu,         vvv)
 INSN_LSX(vavgr_wu,         vvv)
 INSN_LSX(vavgr_du,         vvv)
+
+INSN_LSX(vabsd_b,          vvv)
+INSN_LSX(vabsd_h,          vvv)
+INSN_LSX(vabsd_w,          vvv)
+INSN_LSX(vabsd_d,          vvv)
+INSN_LSX(vabsd_bu,         vvv)
+INSN_LSX(vabsd_hu,         vvv)
+INSN_LSX(vabsd_wu,         vvv)
+INSN_LSX(vabsd_du,         vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c6a387c54d..8298af2d40 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -250,3 +250,12 @@ DEF_HELPER_4(vavgr_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vavgr_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vavgr_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vavgr_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vabsd_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vabsd_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2d43a88d74..00a921a935 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -168,3 +168,12 @@ TRANS(vavgr_bu, gen_vvv, gen_helper_vavgr_bu)
 TRANS(vavgr_hu, gen_vvv, gen_helper_vavgr_hu)
 TRANS(vavgr_wu, gen_vvv, gen_helper_vavgr_wu)
 TRANS(vavgr_du, gen_vvv, gen_helper_vavgr_du)
+
+TRANS(vabsd_b, gen_vvv, gen_helper_vabsd_b)
+TRANS(vabsd_h, gen_vvv, gen_helper_vabsd_h)
+TRANS(vabsd_w, gen_vvv, gen_helper_vabsd_w)
+TRANS(vabsd_d, gen_vvv, gen_helper_vabsd_d)
+TRANS(vabsd_bu, gen_vvv, gen_helper_vabsd_bu)
+TRANS(vabsd_hu, gen_vvv, gen_helper_vabsd_hu)
+TRANS(vabsd_wu, gen_vvv, gen_helper_vabsd_wu)
+TRANS(vabsd_du, gen_vvv, gen_helper_vabsd_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index de6e8a72a9..a770f37b99 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -618,3 +618,12 @@ vavgr_bu         0111 00000110 10100 ..... ..... .....    @vvv
 vavgr_hu         0111 00000110 10101 ..... ..... .....    @vvv
 vavgr_wu         0111 00000110 10110 ..... ..... .....    @vvv
 vavgr_du         0111 00000110 10111 ..... ..... .....    @vvv
+
+vabsd_b          0111 00000110 00000 ..... ..... .....    @vvv
+vabsd_h          0111 00000110 00001 ..... ..... .....    @vvv
+vabsd_w          0111 00000110 00010 ..... ..... .....    @vvv
+vabsd_d          0111 00000110 00011 ..... ..... .....    @vvv
+vabsd_bu         0111 00000110 00100 ..... ..... .....    @vvv
+vabsd_hu         0111 00000110 00101 ..... ..... .....    @vvv
+vabsd_wu         0111 00000110 00110 ..... ..... .....    @vvv
+vabsd_du         0111 00000110 00111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 63161ecd1a..61dc92059e 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -841,3 +841,66 @@ DO_HELPER_VVV(vavgr_bu, 8, helper_vvv, do_vavgr_u)
 DO_HELPER_VVV(vavgr_hu, 16, helper_vvv, do_vavgr_u)
 DO_HELPER_VVV(vavgr_wu, 32, helper_vvv, do_vavgr_u)
 DO_HELPER_VVV(vavgr_du, 64, helper_vvv, do_vavgr_u)
+
+static int64_t vabsd_s(int64_t s1, int64_t s2)
+{
+    return s1 < s2 ? s2- s1 : s1 -s2;
+}
+
+static void do_vabsd_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vabsd_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vabsd_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vabsd_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vabsd_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t vabsd_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return u1 < u2 ? u2 - u1 : u1 -u2;
+}
+
+static void do_vabsd_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vabsd_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vabsd_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vabsd_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vabsd_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vabsd_b, 8, helper_vvv, do_vabsd_s)
+DO_HELPER_VVV(vabsd_h, 16, helper_vvv, do_vabsd_s)
+DO_HELPER_VVV(vabsd_w, 32, helper_vvv, do_vabsd_s)
+DO_HELPER_VVV(vabsd_d, 64, helper_vvv, do_vabsd_s)
+DO_HELPER_VVV(vabsd_bu, 8, helper_vvv, do_vabsd_u)
+DO_HELPER_VVV(vabsd_hu, 16, helper_vvv, do_vabsd_u)
+DO_HELPER_VVV(vabsd_wu, 32, helper_vvv, do_vabsd_u)
+DO_HELPER_VVV(vabsd_du, 64, helper_vvv, do_vabsd_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 13/43] target/loongarch: Implement vadda
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (11 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 12/43] target/loongarch: Implement vabsd Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 17:56   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin Song Gao
                   ` (30 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDA.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++++
 target/loongarch/helper.h                   |  5 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 ++++
 target/loongarch/insns.decode               |  5 ++++
 target/loongarch/lsx_helper.c               | 32 +++++++++++++++++++++
 5 files changed, 52 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8ec612446c..ff5d9e0e5b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -905,3 +905,8 @@ INSN_LSX(vabsd_bu,         vvv)
 INSN_LSX(vabsd_hu,         vvv)
 INSN_LSX(vabsd_wu,         vvv)
 INSN_LSX(vabsd_du,         vvv)
+
+INSN_LSX(vadda_b,          vvv)
+INSN_LSX(vadda_h,          vvv)
+INSN_LSX(vadda_w,          vvv)
+INSN_LSX(vadda_d,          vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 8298af2d40..85321c8874 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -259,3 +259,8 @@ DEF_HELPER_4(vabsd_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vabsd_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vabsd_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vabsd_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vadda_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vadda_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vadda_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vadda_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 00a921a935..a90fc44ba7 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -177,3 +177,8 @@ TRANS(vabsd_bu, gen_vvv, gen_helper_vabsd_bu)
 TRANS(vabsd_hu, gen_vvv, gen_helper_vabsd_hu)
 TRANS(vabsd_wu, gen_vvv, gen_helper_vabsd_wu)
 TRANS(vabsd_du, gen_vvv, gen_helper_vabsd_du)
+
+TRANS(vadda_b, gen_vvv, gen_helper_vadda_b)
+TRANS(vadda_h, gen_vvv, gen_helper_vadda_h)
+TRANS(vadda_w, gen_vvv, gen_helper_vadda_w)
+TRANS(vadda_d, gen_vvv, gen_helper_vadda_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a770f37b99..9529ffe970 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -627,3 +627,8 @@ vabsd_bu         0111 00000110 00100 ..... ..... .....    @vvv
 vabsd_hu         0111 00000110 00101 ..... ..... .....    @vvv
 vabsd_wu         0111 00000110 00110 ..... ..... .....    @vvv
 vabsd_du         0111 00000110 00111 ..... ..... .....    @vvv
+
+vadda_b          0111 00000101 11000 ..... ..... .....    @vvv
+vadda_h          0111 00000101 11001 ..... ..... .....    @vvv
+vadda_w          0111 00000101 11010 ..... ..... .....    @vvv
+vadda_d          0111 00000101 11011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 61dc92059e..a9a0b01fd7 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -904,3 +904,35 @@ DO_HELPER_VVV(vabsd_bu, 8, helper_vvv, do_vabsd_u)
 DO_HELPER_VVV(vabsd_hu, 16, helper_vvv, do_vabsd_u)
 DO_HELPER_VVV(vabsd_wu, 32, helper_vvv, do_vabsd_u)
 DO_HELPER_VVV(vabsd_du, 64, helper_vvv, do_vabsd_u)
+
+static int64_t vadda_s(int64_t s1, int64_t s2)
+{
+    int64_t abs_s1 = s1 >= 0 ? s1 : -s1;
+    int64_t abs_s2 = s2 >= 0 ? s2 : -s2;
+    return abs_s1 + abs_s2;
+}
+
+static void do_vadda_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vadda_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vadda_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vadda_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vadda_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vadda_b, 8, helper_vvv, do_vadda_s)
+DO_HELPER_VVV(vadda_h, 16, helper_vvv, do_vadda_s)
+DO_HELPER_VVV(vadda_w, 32, helper_vvv, do_vadda_s)
+DO_HELPER_VVV(vadda_d, 64, helper_vvv, do_vadda_s)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (12 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 13/43] target/loongarch: Implement vadda Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:01   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
                   ` (29 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMAX[I].{B/H/W/D}[U];
- VMIN[I].{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  33 +++
 target/loongarch/helper.h                   |  34 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  33 +++
 target/loongarch/insns.decode               |  35 ++++
 target/loongarch/lsx_helper.c               | 219 ++++++++++++++++++++
 5 files changed, 354 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ff5d9e0e5b..2e86c48f4d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -910,3 +910,36 @@ INSN_LSX(vadda_b,          vvv)
 INSN_LSX(vadda_h,          vvv)
 INSN_LSX(vadda_w,          vvv)
 INSN_LSX(vadda_d,          vvv)
+
+INSN_LSX(vmax_b,           vvv)
+INSN_LSX(vmax_h,           vvv)
+INSN_LSX(vmax_w,           vvv)
+INSN_LSX(vmax_d,           vvv)
+INSN_LSX(vmin_b,           vvv)
+INSN_LSX(vmin_h,           vvv)
+INSN_LSX(vmin_w,           vvv)
+INSN_LSX(vmin_d,           vvv)
+INSN_LSX(vmax_bu,          vvv)
+INSN_LSX(vmax_hu,          vvv)
+INSN_LSX(vmax_wu,          vvv)
+INSN_LSX(vmax_du,          vvv)
+INSN_LSX(vmin_bu,          vvv)
+INSN_LSX(vmin_hu,          vvv)
+INSN_LSX(vmin_wu,          vvv)
+INSN_LSX(vmin_du,          vvv)
+INSN_LSX(vmaxi_b,          vv_i)
+INSN_LSX(vmaxi_h,          vv_i)
+INSN_LSX(vmaxi_w,          vv_i)
+INSN_LSX(vmaxi_d,          vv_i)
+INSN_LSX(vmini_b,          vv_i)
+INSN_LSX(vmini_h,          vv_i)
+INSN_LSX(vmini_w,          vv_i)
+INSN_LSX(vmini_d,          vv_i)
+INSN_LSX(vmaxi_bu,         vv_i)
+INSN_LSX(vmaxi_hu,         vv_i)
+INSN_LSX(vmaxi_wu,         vv_i)
+INSN_LSX(vmaxi_du,         vv_i)
+INSN_LSX(vmini_bu,         vv_i)
+INSN_LSX(vmini_hu,         vv_i)
+INSN_LSX(vmini_wu,         vv_i)
+INSN_LSX(vmini_du,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 85321c8874..04afc93dc1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -264,3 +264,37 @@ DEF_HELPER_4(vadda_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vadda_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vadda_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vadda_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmax_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmax_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaxi_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmin_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_hu, void, env, i32, i32 ,i32)
+DEF_HELPER_4(vmin_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmin_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmini_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a90fc44ba7..8bece985f1 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -182,3 +182,36 @@ TRANS(vadda_b, gen_vvv, gen_helper_vadda_b)
 TRANS(vadda_h, gen_vvv, gen_helper_vadda_h)
 TRANS(vadda_w, gen_vvv, gen_helper_vadda_w)
 TRANS(vadda_d, gen_vvv, gen_helper_vadda_d)
+
+TRANS(vmax_b, gen_vvv, gen_helper_vmax_b)
+TRANS(vmax_h, gen_vvv, gen_helper_vmax_h)
+TRANS(vmax_w, gen_vvv, gen_helper_vmax_w)
+TRANS(vmax_d, gen_vvv, gen_helper_vmax_d)
+TRANS(vmaxi_b, gen_vv_i, gen_helper_vmaxi_b)
+TRANS(vmaxi_h, gen_vv_i, gen_helper_vmaxi_h)
+TRANS(vmaxi_w, gen_vv_i, gen_helper_vmaxi_w)
+TRANS(vmaxi_d, gen_vv_i, gen_helper_vmaxi_d)
+TRANS(vmax_bu, gen_vvv, gen_helper_vmax_bu)
+TRANS(vmax_hu, gen_vvv, gen_helper_vmax_hu)
+TRANS(vmax_wu, gen_vvv, gen_helper_vmax_wu)
+TRANS(vmax_du, gen_vvv, gen_helper_vmax_du)
+TRANS(vmaxi_bu, gen_vv_i, gen_helper_vmaxi_bu)
+TRANS(vmaxi_hu, gen_vv_i, gen_helper_vmaxi_hu)
+TRANS(vmaxi_wu, gen_vv_i, gen_helper_vmaxi_wu)
+TRANS(vmaxi_du, gen_vv_i, gen_helper_vmaxi_du)
+TRANS(vmin_b, gen_vvv, gen_helper_vmin_b)
+TRANS(vmin_h, gen_vvv, gen_helper_vmin_h)
+TRANS(vmin_w, gen_vvv, gen_helper_vmin_w)
+TRANS(vmin_d, gen_vvv, gen_helper_vmin_d)
+TRANS(vmini_b, gen_vv_i, gen_helper_vmini_b)
+TRANS(vmini_h, gen_vv_i, gen_helper_vmini_h)
+TRANS(vmini_w, gen_vv_i, gen_helper_vmini_w)
+TRANS(vmini_d, gen_vv_i, gen_helper_vmini_d)
+TRANS(vmin_bu, gen_vvv, gen_helper_vmin_bu)
+TRANS(vmin_hu, gen_vvv, gen_helper_vmin_hu)
+TRANS(vmin_wu, gen_vvv, gen_helper_vmin_wu)
+TRANS(vmin_du, gen_vvv, gen_helper_vmin_du)
+TRANS(vmini_bu, gen_vv_i, gen_helper_vmini_bu)
+TRANS(vmini_hu, gen_vv_i, gen_helper_vmini_hu)
+TRANS(vmini_wu, gen_vv_i, gen_helper_vmini_wu)
+TRANS(vmini_du, gen_vv_i, gen_helper_vmini_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9529ffe970..c5d8859db2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -499,6 +499,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
+@vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -632,3 +633,37 @@ vadda_b          0111 00000101 11000 ..... ..... .....    @vvv
 vadda_h          0111 00000101 11001 ..... ..... .....    @vvv
 vadda_w          0111 00000101 11010 ..... ..... .....    @vvv
 vadda_d          0111 00000101 11011 ..... ..... .....    @vvv
+
+vmax_b           0111 00000111 00000 ..... ..... .....    @vvv
+vmax_h           0111 00000111 00001 ..... ..... .....    @vvv
+vmax_w           0111 00000111 00010 ..... ..... .....    @vvv
+vmax_d           0111 00000111 00011 ..... ..... .....    @vvv
+vmaxi_b          0111 00101001 00000 ..... ..... .....    @vv_i5
+vmaxi_h          0111 00101001 00001 ..... ..... .....    @vv_i5
+vmaxi_w          0111 00101001 00010 ..... ..... .....    @vv_i5
+vmaxi_d          0111 00101001 00011 ..... ..... .....    @vv_i5
+vmax_bu          0111 00000111 01000 ..... ..... .....    @vvv
+vmax_hu          0111 00000111 01001 ..... ..... .....    @vvv
+vmax_wu          0111 00000111 01010 ..... ..... .....    @vvv
+vmax_du          0111 00000111 01011 ..... ..... .....    @vvv
+vmaxi_bu         0111 00101001 01000 ..... ..... .....    @vv_ui5
+vmaxi_hu         0111 00101001 01001 ..... ..... .....    @vv_ui5
+vmaxi_wu         0111 00101001 01010 ..... ..... .....    @vv_ui5
+vmaxi_du         0111 00101001 01011 ..... ..... .....    @vv_ui5
+
+vmin_b           0111 00000111 00100 ..... ..... .....    @vvv
+vmin_h           0111 00000111 00101 ..... ..... .....    @vvv
+vmin_w           0111 00000111 00110 ..... ..... .....    @vvv
+vmin_d           0111 00000111 00111 ..... ..... .....    @vvv
+vmini_b          0111 00101001 00100 ..... ..... .....    @vv_i5
+vmini_h          0111 00101001 00101 ..... ..... .....    @vv_i5
+vmini_w          0111 00101001 00110 ..... ..... .....    @vv_i5
+vmini_d          0111 00101001 00111 ..... ..... .....    @vv_i5
+vmin_bu          0111 00000111 01100 ..... ..... .....    @vvv
+vmin_hu          0111 00000111 01101 ..... ..... .....    @vvv
+vmin_wu          0111 00000111 01110 ..... ..... .....    @vvv
+vmin_du          0111 00000111 01111 ..... ..... .....    @vvv
+vmini_bu         0111 00101001 01100 ..... ..... .....    @vv_ui5
+vmini_hu         0111 00101001 01101 ..... ..... .....    @vv_ui5
+vmini_wu         0111 00101001 01110 ..... ..... .....    @vv_ui5
+vmini_du         0111 00101001 01111 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index a9a0b01fd7..5bccb3111b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -936,3 +936,222 @@ DO_HELPER_VVV(vadda_b, 8, helper_vvv, do_vadda_s)
 DO_HELPER_VVV(vadda_h, 16, helper_vvv, do_vadda_s)
 DO_HELPER_VVV(vadda_w, 32, helper_vvv, do_vadda_s)
 DO_HELPER_VVV(vadda_d, 64, helper_vvv, do_vadda_s)
+
+static int64_t vmax_s(int64_t s1, int64_t s2)
+{
+    return s1 > s2 ? s1 : s2;
+}
+
+static void do_vmax_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmax_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vmax_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vmax_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vmax_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaxi_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit , int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmax_s(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = vmax_s(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = vmax_s(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = vmax_s(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t vmax_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+    return u1 > u2 ? u1 : u2;
+}
+
+static void do_vmax_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmax_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vmax_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vmax_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vmax_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaxi_u(vec_t *Vd, vec_t *Vj, uint32_t imm , int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmax_u(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vmax_u(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vmax_u(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vmax_u(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t vmin_s(int64_t s1, int64_t s2)
+{
+    return s1 < s2 ? s1 : s2;
+}
+
+static void do_vmin_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmin_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vmin_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vmin_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vmin_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmini_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmin_s(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = vmin_s(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = vmin_s(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = vmin_s(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t vmin_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+    return u1 < u2 ? u1 : u2;
+}
+
+static void do_vmin_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmin_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vmin_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vmin_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vmin_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmini_u(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vmin_u(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vmin_u(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vmin_u(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vmin_u(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vmax_b, 8, helper_vvv, do_vmax_s)
+DO_HELPER_VVV(vmax_h, 16, helper_vvv, do_vmax_s)
+DO_HELPER_VVV(vmax_w, 32, helper_vvv, do_vmax_s)
+DO_HELPER_VVV(vmax_d, 64, helper_vvv, do_vmax_s)
+DO_HELPER_VV_I(vmaxi_b, 8, helper_vv_i, do_vmaxi_s)
+DO_HELPER_VV_I(vmaxi_h, 16, helper_vv_i, do_vmaxi_s)
+DO_HELPER_VV_I(vmaxi_w, 32, helper_vv_i, do_vmaxi_s)
+DO_HELPER_VV_I(vmaxi_d, 64, helper_vv_i, do_vmaxi_s)
+DO_HELPER_VVV(vmax_bu, 8, helper_vvv, do_vmax_u)
+DO_HELPER_VVV(vmax_hu, 16, helper_vvv, do_vmax_u)
+DO_HELPER_VVV(vmax_wu, 32, helper_vvv, do_vmax_u)
+DO_HELPER_VVV(vmax_du, 64, helper_vvv, do_vmax_u)
+DO_HELPER_VV_I(vmaxi_bu, 8, helper_vv_i, do_vmaxi_u)
+DO_HELPER_VV_I(vmaxi_hu, 16, helper_vv_i, do_vmaxi_u)
+DO_HELPER_VV_I(vmaxi_wu, 32, helper_vv_i, do_vmaxi_u)
+DO_HELPER_VV_I(vmaxi_du, 64, helper_vv_i, do_vmaxi_u)
+DO_HELPER_VVV(vmin_b, 8, helper_vvv, do_vmin_s)
+DO_HELPER_VVV(vmin_h, 16, helper_vvv, do_vmin_s)
+DO_HELPER_VVV(vmin_w, 32, helper_vvv, do_vmin_s)
+DO_HELPER_VVV(vmin_d, 64, helper_vvv, do_vmin_s)
+DO_HELPER_VV_I(vmini_b, 8, helper_vv_i, do_vmini_s)
+DO_HELPER_VV_I(vmini_h, 16, helper_vv_i, do_vmini_s)
+DO_HELPER_VV_I(vmini_w, 32, helper_vv_i, do_vmini_s)
+DO_HELPER_VV_I(vmini_d, 64, helper_vv_i, do_vmini_s)
+DO_HELPER_VVV(vmin_bu, 8, helper_vvv, do_vmin_u)
+DO_HELPER_VVV(vmin_hu, 16, helper_vvv, do_vmin_u)
+DO_HELPER_VVV(vmin_wu, 32, helper_vvv, do_vmin_u)
+DO_HELPER_VVV(vmin_du, 64, helper_vvv, do_vmin_u)
+DO_HELPER_VV_I(vmini_bu, 8, helper_vv_i, do_vmini_u)
+DO_HELPER_VV_I(vmini_hu, 16, helper_vv_i, do_vmini_u)
+DO_HELPER_VV_I(vmini_wu, 32, helper_vv_i, do_vmini_u)
+DO_HELPER_VV_I(vmini_du, 64, helper_vv_i, do_vmini_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (13 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:07   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
                   ` (28 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMUL.{B/H/W/D};
- VMUH.{B/H/W/D}[U];
- VMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  38 ++++
 target/loongarch/helper.h                   |  38 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  37 ++++
 target/loongarch/insns.decode               |  38 ++++
 target/loongarch/lsx_helper.c               | 218 ++++++++++++++++++++
 5 files changed, 369 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2e86c48f4d..8818e078c1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -943,3 +943,41 @@ INSN_LSX(vmini_bu,         vv_i)
 INSN_LSX(vmini_hu,         vv_i)
 INSN_LSX(vmini_wu,         vv_i)
 INSN_LSX(vmini_du,         vv_i)
+
+INSN_LSX(vmul_b,           vvv)
+INSN_LSX(vmul_h,           vvv)
+INSN_LSX(vmul_w,           vvv)
+INSN_LSX(vmul_d,           vvv)
+INSN_LSX(vmuh_b,           vvv)
+INSN_LSX(vmuh_h,           vvv)
+INSN_LSX(vmuh_w,           vvv)
+INSN_LSX(vmuh_d,           vvv)
+INSN_LSX(vmuh_bu,          vvv)
+INSN_LSX(vmuh_hu,          vvv)
+INSN_LSX(vmuh_wu,          vvv)
+INSN_LSX(vmuh_du,          vvv)
+
+INSN_LSX(vmulwev_h_b,      vvv)
+INSN_LSX(vmulwev_w_h,      vvv)
+INSN_LSX(vmulwev_d_w,      vvv)
+INSN_LSX(vmulwev_q_d,      vvv)
+INSN_LSX(vmulwod_h_b,      vvv)
+INSN_LSX(vmulwod_w_h,      vvv)
+INSN_LSX(vmulwod_d_w,      vvv)
+INSN_LSX(vmulwod_q_d,      vvv)
+INSN_LSX(vmulwev_h_bu,     vvv)
+INSN_LSX(vmulwev_w_hu,     vvv)
+INSN_LSX(vmulwev_d_wu,     vvv)
+INSN_LSX(vmulwev_q_du,     vvv)
+INSN_LSX(vmulwod_h_bu,     vvv)
+INSN_LSX(vmulwod_w_hu,     vvv)
+INSN_LSX(vmulwod_d_wu,     vvv)
+INSN_LSX(vmulwod_q_du,     vvv)
+INSN_LSX(vmulwev_h_bu_b,   vvv)
+INSN_LSX(vmulwev_w_hu_h,   vvv)
+INSN_LSX(vmulwev_d_wu_w,   vvv)
+INSN_LSX(vmulwev_q_du_d,   vvv)
+INSN_LSX(vmulwod_h_bu_b,   vvv)
+INSN_LSX(vmulwod_w_hu_h,   vvv)
+INSN_LSX(vmulwod_d_wu_w,   vvv)
+INSN_LSX(vmulwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 04afc93dc1..568a89eec1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -298,3 +298,41 @@ DEF_HELPER_4(vmini_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmini_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmini_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmini_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmul_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmul_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmul_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmul_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmuh_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmulwev_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwev_q_du_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmulwod_q_du_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 8bece985f1..7d27f574ed 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -215,3 +215,40 @@ TRANS(vmini_bu, gen_vv_i, gen_helper_vmini_bu)
 TRANS(vmini_hu, gen_vv_i, gen_helper_vmini_hu)
 TRANS(vmini_wu, gen_vv_i, gen_helper_vmini_wu)
 TRANS(vmini_du, gen_vv_i, gen_helper_vmini_du)
+
+TRANS(vmul_b, gen_vvv, gen_helper_vmul_b)
+TRANS(vmul_h, gen_vvv, gen_helper_vmul_h)
+TRANS(vmul_w, gen_vvv, gen_helper_vmul_w)
+TRANS(vmul_d, gen_vvv, gen_helper_vmul_d)
+TRANS(vmuh_b, gen_vvv, gen_helper_vmuh_b)
+TRANS(vmuh_h, gen_vvv, gen_helper_vmuh_h)
+TRANS(vmuh_w, gen_vvv, gen_helper_vmuh_w)
+TRANS(vmuh_d, gen_vvv, gen_helper_vmuh_d)
+TRANS(vmuh_bu, gen_vvv, gen_helper_vmuh_bu)
+TRANS(vmuh_hu, gen_vvv, gen_helper_vmuh_hu)
+TRANS(vmuh_wu, gen_vvv, gen_helper_vmuh_wu)
+TRANS(vmuh_du, gen_vvv, gen_helper_vmuh_du)
+TRANS(vmulwev_h_b, gen_vvv, gen_helper_vmulwev_h_b)
+TRANS(vmulwev_w_h, gen_vvv, gen_helper_vmulwev_w_h)
+TRANS(vmulwev_d_w, gen_vvv, gen_helper_vmulwev_d_w)
+TRANS(vmulwev_q_d, gen_vvv, gen_helper_vmulwev_q_d)
+TRANS(vmulwod_h_b, gen_vvv, gen_helper_vmulwod_h_b)
+TRANS(vmulwod_w_h, gen_vvv, gen_helper_vmulwod_w_h)
+TRANS(vmulwod_d_w, gen_vvv, gen_helper_vmulwod_d_w)
+TRANS(vmulwod_q_d, gen_vvv, gen_helper_vmulwod_q_d)
+TRANS(vmulwev_h_bu, gen_vvv, gen_helper_vmulwev_h_bu)
+TRANS(vmulwev_w_hu, gen_vvv, gen_helper_vmulwev_w_hu)
+TRANS(vmulwev_d_wu, gen_vvv, gen_helper_vmulwev_d_wu)
+TRANS(vmulwev_q_du, gen_vvv, gen_helper_vmulwev_q_du)
+TRANS(vmulwod_h_bu, gen_vvv, gen_helper_vmulwod_h_bu)
+TRANS(vmulwod_w_hu, gen_vvv, gen_helper_vmulwod_w_hu)
+TRANS(vmulwod_d_wu, gen_vvv, gen_helper_vmulwod_d_wu)
+TRANS(vmulwod_q_du, gen_vvv, gen_helper_vmulwod_q_du)
+TRANS(vmulwev_h_bu_b, gen_vvv, gen_helper_vmulwev_h_bu_b)
+TRANS(vmulwev_w_hu_h, gen_vvv, gen_helper_vmulwev_w_hu_h)
+TRANS(vmulwev_d_wu_w, gen_vvv, gen_helper_vmulwev_d_wu_w)
+TRANS(vmulwev_q_du_d, gen_vvv, gen_helper_vmulwev_q_du_d)
+TRANS(vmulwod_h_bu_b, gen_vvv, gen_helper_vmulwod_h_bu_b)
+TRANS(vmulwod_w_hu_h, gen_vvv, gen_helper_vmulwod_w_hu_h)
+TRANS(vmulwod_d_wu_w, gen_vvv, gen_helper_vmulwod_d_wu_w)
+TRANS(vmulwod_q_du_d, gen_vvv, gen_helper_vmulwod_q_du_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c5d8859db2..6f32fd290e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -667,3 +667,41 @@ vmini_bu         0111 00101001 01100 ..... ..... .....    @vv_ui5
 vmini_hu         0111 00101001 01101 ..... ..... .....    @vv_ui5
 vmini_wu         0111 00101001 01110 ..... ..... .....    @vv_ui5
 vmini_du         0111 00101001 01111 ..... ..... .....    @vv_ui5
+
+vmul_b           0111 00001000 01000 ..... ..... .....    @vvv
+vmul_h           0111 00001000 01001 ..... ..... .....    @vvv
+vmul_w           0111 00001000 01010 ..... ..... .....    @vvv
+vmul_d           0111 00001000 01011 ..... ..... .....    @vvv
+vmuh_b           0111 00001000 01100 ..... ..... .....    @vvv
+vmuh_h           0111 00001000 01101 ..... ..... .....    @vvv
+vmuh_w           0111 00001000 01110 ..... ..... .....    @vvv
+vmuh_d           0111 00001000 01111 ..... ..... .....    @vvv
+vmuh_bu          0111 00001000 10000 ..... ..... .....    @vvv
+vmuh_hu          0111 00001000 10001 ..... ..... .....    @vvv
+vmuh_wu          0111 00001000 10010 ..... ..... .....    @vvv
+vmuh_du          0111 00001000 10011 ..... ..... .....    @vvv
+
+vmulwev_h_b      0111 00001001 00000 ..... ..... .....    @vvv
+vmulwev_w_h      0111 00001001 00001 ..... ..... .....    @vvv
+vmulwev_d_w      0111 00001001 00010 ..... ..... .....    @vvv
+vmulwev_q_d      0111 00001001 00011 ..... ..... .....    @vvv
+vmulwod_h_b      0111 00001001 00100 ..... ..... .....    @vvv
+vmulwod_w_h      0111 00001001 00101 ..... ..... .....    @vvv
+vmulwod_d_w      0111 00001001 00110 ..... ..... .....    @vvv
+vmulwod_q_d      0111 00001001 00111 ..... ..... .....    @vvv
+vmulwev_h_bu     0111 00001001 10000 ..... ..... .....    @vvv
+vmulwev_w_hu     0111 00001001 10001 ..... ..... .....    @vvv
+vmulwev_d_wu     0111 00001001 10010 ..... ..... .....    @vvv
+vmulwev_q_du     0111 00001001 10011 ..... ..... .....    @vvv
+vmulwod_h_bu     0111 00001001 10100 ..... ..... .....    @vvv
+vmulwod_w_hu     0111 00001001 10101 ..... ..... .....    @vvv
+vmulwod_d_wu     0111 00001001 10110 ..... ..... .....    @vvv
+vmulwod_q_du     0111 00001001 10111 ..... ..... .....    @vvv
+vmulwev_h_bu_b   0111 00001010 00000 ..... ..... .....    @vvv
+vmulwev_w_hu_h   0111 00001010 00001 ..... ..... .....    @vvv
+vmulwev_d_wu_w   0111 00001010 00010 ..... ..... .....    @vvv
+vmulwev_q_du_d   0111 00001010 00011 ..... ..... .....    @vvv
+vmulwod_h_bu_b   0111 00001010 00100 ..... ..... .....    @vvv
+vmulwod_w_hu_h   0111 00001010 00101 ..... ..... .....    @vvv
+vmulwod_d_wu_w   0111 00001010 00110 ..... ..... .....    @vvv
+vmulwod_q_du_d   0111 00001010 00111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 5bccb3111b..d55d2350dc 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1155,3 +1155,221 @@ DO_HELPER_VV_I(vmini_bu, 8, helper_vv_i, do_vmini_u)
 DO_HELPER_VV_I(vmini_hu, 16, helper_vv_i, do_vmini_u)
 DO_HELPER_VV_I(vmini_wu, 32, helper_vv_i, do_vmini_u)
 DO_HELPER_VV_I(vmini_du, 64, helper_vv_i, do_vmini_u)
+
+static void do_vmul(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] * Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] * Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] * Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] * Vk->D[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmuh_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = ((int16_t)(Vj->B[n] * Vk->B[n])) >> 8;
+        break;
+    case 16:
+        Vd->H[n] = ((int32_t)(Vj->H[n] * Vk->H[n])) >> 16;
+        break;
+    case 32:
+        Vd->W[n] = ((int64_t)(Vj->W[n] * (int64_t)Vk->W[n])) >> 32;
+        break;
+    case 64:
+        Vd->D[n] = ((__int128_t)(Vj->D[n] * (__int128_t)Vk->D[n])) >> 64;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmuh_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = ((uint16_t)(((uint8_t)Vj->B[n]) * ((uint8_t)Vk->B[n]))) >> 8;
+        break;
+    case 16:
+        Vd->H[n] = ((uint32_t)(((uint16_t)Vj->H[n]) * ((uint16_t)Vk->H[n]))) >> 16;
+        break;
+    case 32:
+        Vd->W[n] = ((uint64_t)(((uint32_t)Vj->W[n]) * ((uint64_t)(uint32_t)Vk->W[n]))) >> 32;
+        break;
+    case 64:
+        Vd->D[n] = ((__int128_t)(((uint64_t)Vj->D[n]) * ((__int128_t)(uint64_t)Vk->D[n]))) >> 64;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vmul_b, 8, helper_vvv, do_vmul)
+DO_HELPER_VVV(vmul_h, 16, helper_vvv, do_vmul)
+DO_HELPER_VVV(vmul_w, 32, helper_vvv, do_vmul)
+DO_HELPER_VVV(vmul_d, 64, helper_vvv, do_vmul)
+DO_HELPER_VVV(vmuh_b, 8, helper_vvv, do_vmuh_s)
+DO_HELPER_VVV(vmuh_h, 16, helper_vvv, do_vmuh_s)
+DO_HELPER_VVV(vmuh_w, 32, helper_vvv, do_vmuh_s)
+DO_HELPER_VVV(vmuh_d, 64, helper_vvv, do_vmuh_s)
+DO_HELPER_VVV(vmuh_bu, 8, helper_vvv, do_vmuh_u)
+DO_HELPER_VVV(vmuh_hu, 16, helper_vvv, do_vmuh_u)
+DO_HELPER_VVV(vmuh_wu, 32, helper_vvv, do_vmuh_u)
+DO_HELPER_VVV(vmuh_du, 64, helper_vvv, do_vmuh_u)
+
+static void do_vmulwev_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = Vj->B[2 * n] * Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = Vj->H[2 * n] * Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n] * (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128_t)Vj->D[2 * n] * (__int128_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmulwod_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = Vj->B[2 * n + 1] * Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = Vj->H[2 * n + 1] * Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)Vj->W[2 * n + 1] * (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128_t)Vj->D[2 * n + 1] * (__int128_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmulwev_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint8_t)Vj->B[2 * n] * (uint8_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (uint16_t)Vj->H[2 * n] * (uint16_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n] * (uint64_t)(uint32_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n] * (__uint128_t)(uint64_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmulwod_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint8_t)Vj->B[2 * n + 1] * (uint8_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (uint16_t)Vj->H[2 * n + 1] * (uint16_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (uint64_t)(uint32_t)Vj->W[2 * n + 1] * (uint64_t)(uint32_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__uint128_t)(uint64_t)Vj->D[2 * n + 1] * (__uint128_t)(uint64_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmulwev_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint8_t)Vj->B[2 * n] * Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] = (uint16_t)Vj->H[2 * n] * Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)(uint32_t)Vj->W[2 * n] * (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128_t)(uint64_t)Vj->D[2 * n] * (__int128_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmulwod_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint8_t)Vj->B[2 * n + 1] * Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] = (uint16_t)Vj->H[2 * n + 1] * Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)(uint32_t)Vj->W[2 * n + 1] * (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] = (__int128_t)(uint64_t)Vj->D[2 * n + 1] * (__int128_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vmulwev_h_b, 16, helper_vvv, do_vmulwev_s)
+DO_HELPER_VVV(vmulwev_w_h, 32, helper_vvv, do_vmulwev_s)
+DO_HELPER_VVV(vmulwev_d_w, 64, helper_vvv, do_vmulwev_s)
+DO_HELPER_VVV(vmulwev_q_d, 128, helper_vvv, do_vmulwev_s)
+DO_HELPER_VVV(vmulwod_h_b, 16, helper_vvv, do_vmulwod_s)
+DO_HELPER_VVV(vmulwod_w_h, 32, helper_vvv, do_vmulwod_s)
+DO_HELPER_VVV(vmulwod_d_w, 64, helper_vvv, do_vmulwod_s)
+DO_HELPER_VVV(vmulwod_q_d, 128, helper_vvv, do_vmulwod_s)
+DO_HELPER_VVV(vmulwev_h_bu, 16, helper_vvv, do_vmulwev_u)
+DO_HELPER_VVV(vmulwev_w_hu, 32, helper_vvv, do_vmulwev_u)
+DO_HELPER_VVV(vmulwev_d_wu, 64, helper_vvv, do_vmulwev_u)
+DO_HELPER_VVV(vmulwev_q_du, 128, helper_vvv, do_vmulwev_u)
+DO_HELPER_VVV(vmulwod_h_bu, 16, helper_vvv, do_vmulwod_u)
+DO_HELPER_VVV(vmulwod_w_hu, 32, helper_vvv, do_vmulwod_u)
+DO_HELPER_VVV(vmulwod_d_wu, 64, helper_vvv, do_vmulwod_u)
+DO_HELPER_VVV(vmulwod_q_du, 128, helper_vvv, do_vmulwod_u)
+DO_HELPER_VVV(vmulwev_h_bu_b, 16, helper_vvv, do_vmulwev_u_s)
+DO_HELPER_VVV(vmulwev_w_hu_h, 32, helper_vvv, do_vmulwev_u_s)
+DO_HELPER_VVV(vmulwev_d_wu_w, 64, helper_vvv, do_vmulwev_u_s)
+DO_HELPER_VVV(vmulwev_q_du_d, 128, helper_vvv, do_vmulwev_u_s)
+DO_HELPER_VVV(vmulwod_h_bu_b, 16, helper_vvv, do_vmulwod_u_s)
+DO_HELPER_VVV(vmulwod_w_hu_h, 32, helper_vvv, do_vmulwod_u_s)
+DO_HELPER_VVV(vmulwod_d_wu_w, 64, helper_vvv, do_vmulwod_u_s)
+DO_HELPER_VVV(vmulwod_q_du_d, 128, helper_vvv, do_vmulwod_u_s)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (14 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:09   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 17/43] target/loongarch: Implement vdiv/vmod Song Gao
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMADD.{B/H/W/D};
- VMSUB.{B/H/W/D};
- VMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  34 ++++
 target/loongarch/helper.h                   |  34 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  34 ++++
 target/loongarch/insns.decode               |  34 ++++
 target/loongarch/lsx_helper.c               | 202 ++++++++++++++++++++
 5 files changed, 338 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8818e078c1..3c11c6d5d2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -981,3 +981,37 @@ INSN_LSX(vmulwod_h_bu_b,   vvv)
 INSN_LSX(vmulwod_w_hu_h,   vvv)
 INSN_LSX(vmulwod_d_wu_w,   vvv)
 INSN_LSX(vmulwod_q_du_d,   vvv)
+
+INSN_LSX(vmadd_b,          vvv)
+INSN_LSX(vmadd_h,          vvv)
+INSN_LSX(vmadd_w,          vvv)
+INSN_LSX(vmadd_d,          vvv)
+INSN_LSX(vmsub_b,          vvv)
+INSN_LSX(vmsub_h,          vvv)
+INSN_LSX(vmsub_w,          vvv)
+INSN_LSX(vmsub_d,          vvv)
+
+INSN_LSX(vmaddwev_h_b,     vvv)
+INSN_LSX(vmaddwev_w_h,     vvv)
+INSN_LSX(vmaddwev_d_w,     vvv)
+INSN_LSX(vmaddwev_q_d,     vvv)
+INSN_LSX(vmaddwod_h_b,     vvv)
+INSN_LSX(vmaddwod_w_h,     vvv)
+INSN_LSX(vmaddwod_d_w,     vvv)
+INSN_LSX(vmaddwod_q_d,     vvv)
+INSN_LSX(vmaddwev_h_bu,    vvv)
+INSN_LSX(vmaddwev_w_hu,    vvv)
+INSN_LSX(vmaddwev_d_wu,    vvv)
+INSN_LSX(vmaddwev_q_du,    vvv)
+INSN_LSX(vmaddwod_h_bu,    vvv)
+INSN_LSX(vmaddwod_w_hu,    vvv)
+INSN_LSX(vmaddwod_d_wu,    vvv)
+INSN_LSX(vmaddwod_q_du,    vvv)
+INSN_LSX(vmaddwev_h_bu_b,  vvv)
+INSN_LSX(vmaddwev_w_hu_h,  vvv)
+INSN_LSX(vmaddwev_d_wu_w,  vvv)
+INSN_LSX(vmaddwev_q_du_d,  vvv)
+INSN_LSX(vmaddwod_h_bu_b,  vvv)
+INSN_LSX(vmaddwod_w_hu_h,  vvv)
+INSN_LSX(vmaddwod_d_wu_w,  vvv)
+INSN_LSX(vmaddwod_q_du_d,  vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 568a89eec1..4d71b45fe0 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -336,3 +336,37 @@ DEF_HELPER_4(vmulwod_h_bu_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vmulwod_w_hu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vmulwod_d_wu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vmulwod_q_du_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmadd_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmadd_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmadd_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmsub_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmsub_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmsub_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmsub_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vmaddwev_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_h_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_w_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_d_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_q_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwev_q_du_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_h_bu_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_w_hu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_d_wu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmaddwod_q_du_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7d27f574ed..e9674af1bd 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -252,3 +252,37 @@ TRANS(vmulwod_h_bu_b, gen_vvv, gen_helper_vmulwod_h_bu_b)
 TRANS(vmulwod_w_hu_h, gen_vvv, gen_helper_vmulwod_w_hu_h)
 TRANS(vmulwod_d_wu_w, gen_vvv, gen_helper_vmulwod_d_wu_w)
 TRANS(vmulwod_q_du_d, gen_vvv, gen_helper_vmulwod_q_du_d)
+
+TRANS(vmadd_b, gen_vvv, gen_helper_vmadd_b)
+TRANS(vmadd_h, gen_vvv, gen_helper_vmadd_h)
+TRANS(vmadd_w, gen_vvv, gen_helper_vmadd_w)
+TRANS(vmadd_d, gen_vvv, gen_helper_vmadd_d)
+TRANS(vmsub_b, gen_vvv, gen_helper_vmsub_b)
+TRANS(vmsub_h, gen_vvv, gen_helper_vmsub_h)
+TRANS(vmsub_w, gen_vvv, gen_helper_vmsub_w)
+TRANS(vmsub_d, gen_vvv, gen_helper_vmsub_d)
+
+TRANS(vmaddwev_h_b, gen_vvv, gen_helper_vmaddwev_h_b)
+TRANS(vmaddwev_w_h, gen_vvv, gen_helper_vmaddwev_w_h)
+TRANS(vmaddwev_d_w, gen_vvv, gen_helper_vmaddwev_d_w)
+TRANS(vmaddwev_q_d, gen_vvv, gen_helper_vmaddwev_q_d)
+TRANS(vmaddwod_h_b, gen_vvv, gen_helper_vmaddwod_h_b)
+TRANS(vmaddwod_w_h, gen_vvv, gen_helper_vmaddwod_w_h)
+TRANS(vmaddwod_d_w, gen_vvv, gen_helper_vmaddwod_d_w)
+TRANS(vmaddwod_q_d, gen_vvv, gen_helper_vmaddwod_q_d)
+TRANS(vmaddwev_h_bu, gen_vvv, gen_helper_vmaddwev_h_bu)
+TRANS(vmaddwev_w_hu, gen_vvv, gen_helper_vmaddwev_w_hu)
+TRANS(vmaddwev_d_wu, gen_vvv, gen_helper_vmaddwev_d_wu)
+TRANS(vmaddwev_q_du, gen_vvv, gen_helper_vmaddwev_q_du)
+TRANS(vmaddwod_h_bu, gen_vvv, gen_helper_vmaddwod_h_bu)
+TRANS(vmaddwod_w_hu, gen_vvv, gen_helper_vmaddwod_w_hu)
+TRANS(vmaddwod_d_wu, gen_vvv, gen_helper_vmaddwod_d_wu)
+TRANS(vmaddwod_q_du, gen_vvv, gen_helper_vmaddwod_q_du)
+TRANS(vmaddwev_h_bu_b, gen_vvv, gen_helper_vmaddwev_h_bu_b)
+TRANS(vmaddwev_w_hu_h, gen_vvv, gen_helper_vmaddwev_w_hu_h)
+TRANS(vmaddwev_d_wu_w, gen_vvv, gen_helper_vmaddwev_d_wu_w)
+TRANS(vmaddwev_q_du_d, gen_vvv, gen_helper_vmaddwev_q_du_d)
+TRANS(vmaddwod_h_bu_b, gen_vvv, gen_helper_vmaddwod_h_bu_b)
+TRANS(vmaddwod_w_hu_h, gen_vvv, gen_helper_vmaddwod_w_hu_h)
+TRANS(vmaddwod_d_wu_w, gen_vvv, gen_helper_vmaddwod_d_wu_w)
+TRANS(vmaddwod_q_du_d, gen_vvv, gen_helper_vmaddwod_q_du_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6f32fd290e..73390a07ce 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -705,3 +705,37 @@ vmulwod_h_bu_b   0111 00001010 00100 ..... ..... .....    @vvv
 vmulwod_w_hu_h   0111 00001010 00101 ..... ..... .....    @vvv
 vmulwod_d_wu_w   0111 00001010 00110 ..... ..... .....    @vvv
 vmulwod_q_du_d   0111 00001010 00111 ..... ..... .....    @vvv
+
+vmadd_b          0111 00001010 10000 ..... ..... .....    @vvv
+vmadd_h          0111 00001010 10001 ..... ..... .....    @vvv
+vmadd_w          0111 00001010 10010 ..... ..... .....    @vvv
+vmadd_d          0111 00001010 10011 ..... ..... .....    @vvv
+vmsub_b          0111 00001010 10100 ..... ..... .....    @vvv
+vmsub_h          0111 00001010 10101 ..... ..... .....    @vvv
+vmsub_w          0111 00001010 10110 ..... ..... .....    @vvv
+vmsub_d          0111 00001010 10111 ..... ..... .....    @vvv
+
+vmaddwev_h_b     0111 00001010 11000 ..... ..... .....    @vvv
+vmaddwev_w_h     0111 00001010 11001 ..... ..... .....    @vvv
+vmaddwev_d_w     0111 00001010 11010 ..... ..... .....    @vvv
+vmaddwev_q_d     0111 00001010 11011 ..... ..... .....    @vvv
+vmaddwod_h_b     0111 00001010 11100 ..... ..... .....    @vvv
+vmaddwod_w_h     0111 00001010 11101 ..... ..... .....    @vvv
+vmaddwod_d_w     0111 00001010 11110 ..... ..... .....    @vvv
+vmaddwod_q_d     0111 00001010 11111 ..... ..... .....    @vvv
+vmaddwev_h_bu    0111 00001011 01000 ..... ..... .....    @vvv
+vmaddwev_w_hu    0111 00001011 01001 ..... ..... .....    @vvv
+vmaddwev_d_wu    0111 00001011 01010 ..... ..... .....    @vvv
+vmaddwev_q_du    0111 00001011 01011 ..... ..... .....    @vvv
+vmaddwod_h_bu    0111 00001011 01100 ..... ..... .....    @vvv
+vmaddwod_w_hu    0111 00001011 01101 ..... ..... .....    @vvv
+vmaddwod_d_wu    0111 00001011 01110 ..... ..... .....    @vvv
+vmaddwod_q_du    0111 00001011 01111 ..... ..... .....    @vvv
+vmaddwev_h_bu_b  0111 00001011 11000 ..... ..... .....    @vvv
+vmaddwev_w_hu_h  0111 00001011 11001 ..... ..... .....    @vvv
+vmaddwev_d_wu_w  0111 00001011 11010 ..... ..... .....    @vvv
+vmaddwev_q_du_d  0111 00001011 11011 ..... ..... .....    @vvv
+vmaddwod_h_bu_b  0111 00001011 11100 ..... ..... .....    @vvv
+vmaddwod_w_hu_h  0111 00001011 11101 ..... ..... .....    @vvv
+vmaddwod_d_wu_w  0111 00001011 11110 ..... ..... .....    @vvv
+vmaddwod_q_du_d  0111 00001011 11111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d55d2350dc..aea2e34292 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1373,3 +1373,205 @@ DO_HELPER_VVV(vmulwod_h_bu_b, 16, helper_vvv, do_vmulwod_u_s)
 DO_HELPER_VVV(vmulwod_w_hu_h, 32, helper_vvv, do_vmulwod_u_s)
 DO_HELPER_VVV(vmulwod_d_wu_w, 64, helper_vvv, do_vmulwod_u_s)
 DO_HELPER_VVV(vmulwod_q_du_d, 128, helper_vvv, do_vmulwod_u_s)
+
+static void do_vmadd(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] += Vj->B[n] * Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] += Vj->H[n] * Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] += Vj->W[n] * Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] += Vj->D[n] * Vk->D[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmsub(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] -= Vj->B[n] * Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] -= Vj->H[n] * Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] -= Vj->W[n] * Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] -= Vj->D[n] * Vk->D[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vmadd_b, 8, helper_vvv, do_vmadd)
+DO_HELPER_VVV(vmadd_h, 16, helper_vvv, do_vmadd)
+DO_HELPER_VVV(vmadd_w, 32, helper_vvv, do_vmadd)
+DO_HELPER_VVV(vmadd_d, 64, helper_vvv, do_vmadd)
+DO_HELPER_VVV(vmsub_b, 8, helper_vvv, do_vmsub)
+DO_HELPER_VVV(vmsub_h, 16, helper_vvv, do_vmsub)
+DO_HELPER_VVV(vmsub_w, 32, helper_vvv, do_vmsub)
+DO_HELPER_VVV(vmsub_d, 64, helper_vvv, do_vmsub)
+
+static void do_vmaddwev_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += Vj->B[2 * n] * Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] += Vj->H[2 * n] * Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] += (int64_t)Vj->W[2 * n] * (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] += (__int128_t)Vj->D[2 * n] * (__int128_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaddwod_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += Vj->B[2 * n + 1] * Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] += Vj->H[2 * n + 1] * Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] += (int64_t)Vj->W[2 * n + 1] * (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] += (__int128_t)((__int128_t)Vj->D[2 * n + 1] *
+                    (__int128_t)Vk->D[2 * n + 1]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaddwev_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += (uint8_t)Vj->B[2 * n] * (uint8_t)Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] += (uint16_t)Vj->H[2 * n] * (uint16_t)Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] += (uint64_t)(uint32_t)Vj->W[2 * n] *
+                    (uint64_t)(uint32_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] += (__uint128_t)(uint64_t)Vj->D[2 * n] *
+                    (__uint128_t)(uint64_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaddwod_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += (uint8_t)Vj->B[2 * n + 1] * (uint8_t)Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] += (uint16_t)Vj->H[2 * n + 1] * (uint16_t)Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] += (uint64_t)(uint32_t)Vj->W[2 * n + 1] *
+                    (uint64_t)(uint32_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] += (__uint128_t)(uint64_t)Vj->D[2 * n + 1] *
+                    (__uint128_t)(uint64_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaddwev_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += (uint8_t)Vj->B[2 * n] * Vk->B[2 * n];
+        break;
+    case 32:
+        Vd->W[n] += (uint16_t)Vj->H[2 * n] * Vk->H[2 * n];
+        break;
+    case 64:
+        Vd->D[n] += (int64_t)(uint32_t)Vj->W[2 * n] * (int64_t)Vk->W[2 * n];
+        break;
+    case 128:
+        Vd->Q[n] += (__int128_t)(uint64_t)Vj->D[2 * n] *
+                    (__int128_t)Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmaddwod_u_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] += (uint8_t)Vj->B[2 * n + 1] * Vk->B[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n] += (uint16_t)Vj->H[2 * n + 1] * Vk->H[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n] += (int64_t)(uint32_t)Vj->W[2 * n + 1] *
+                    (int64_t)Vk->W[2 * n + 1];
+        break;
+    case 128:
+        Vd->Q[n] += (__int128_t)(uint64_t)Vj->D[2 * n + 1] *
+                    (__int128_t)Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vmaddwev_h_b, 16, helper_vvv, do_vmaddwev_s)
+DO_HELPER_VVV(vmaddwev_w_h, 32, helper_vvv, do_vmaddwev_s)
+DO_HELPER_VVV(vmaddwev_d_w, 64, helper_vvv, do_vmaddwev_s)
+DO_HELPER_VVV(vmaddwev_q_d, 128, helper_vvv, do_vmaddwev_s)
+DO_HELPER_VVV(vmaddwod_h_b, 16, helper_vvv, do_vmaddwod_s)
+DO_HELPER_VVV(vmaddwod_w_h, 32, helper_vvv, do_vmaddwod_s)
+DO_HELPER_VVV(vmaddwod_d_w, 64, helper_vvv, do_vmaddwod_s)
+DO_HELPER_VVV(vmaddwod_q_d, 128, helper_vvv, do_vmaddwod_s)
+DO_HELPER_VVV(vmaddwev_h_bu, 16, helper_vvv, do_vmaddwev_u)
+DO_HELPER_VVV(vmaddwev_w_hu, 32, helper_vvv, do_vmaddwev_u)
+DO_HELPER_VVV(vmaddwev_d_wu, 64, helper_vvv, do_vmaddwev_u)
+DO_HELPER_VVV(vmaddwev_q_du, 128, helper_vvv, do_vmaddwev_u)
+DO_HELPER_VVV(vmaddwod_h_bu, 16, helper_vvv, do_vmaddwod_u)
+DO_HELPER_VVV(vmaddwod_w_hu, 32, helper_vvv, do_vmaddwod_u)
+DO_HELPER_VVV(vmaddwod_d_wu, 64, helper_vvv, do_vmaddwod_u)
+DO_HELPER_VVV(vmaddwod_q_du, 128, helper_vvv, do_vmaddwod_u)
+DO_HELPER_VVV(vmaddwev_h_bu_b, 16, helper_vvv, do_vmaddwev_u_s)
+DO_HELPER_VVV(vmaddwev_w_hu_h, 32, helper_vvv, do_vmaddwev_u_s)
+DO_HELPER_VVV(vmaddwev_d_wu_w, 64, helper_vvv, do_vmaddwev_u_s)
+DO_HELPER_VVV(vmaddwev_q_du_d, 128, helper_vvv, do_vmaddwev_u_s)
+DO_HELPER_VVV(vmaddwod_h_bu_b, 16, helper_vvv, do_vmaddwod_u_s)
+DO_HELPER_VVV(vmaddwod_w_hu_h, 32, helper_vvv, do_vmaddwod_u_s)
+DO_HELPER_VVV(vmaddwod_d_wu_w, 64, helper_vvv, do_vmaddwod_u_s)
+DO_HELPER_VVV(vmaddwod_q_du_d, 128, helper_vvv, do_vmaddwod_u_s)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 17/43] target/loongarch: Implement vdiv/vmod
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (15 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 18/43] target/loongarch: Implement vsat Song Gao
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VDIV.{B/H/W/D}[U];
- VMOD.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  17 +++
 target/loongarch/helper.h                   |  17 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  17 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 135 ++++++++++++++++++++
 5 files changed, 203 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3c11c6d5d2..f50a1051b9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1015,3 +1015,20 @@ INSN_LSX(vmaddwod_h_bu_b,  vvv)
 INSN_LSX(vmaddwod_w_hu_h,  vvv)
 INSN_LSX(vmaddwod_d_wu_w,  vvv)
 INSN_LSX(vmaddwod_q_du_d,  vvv)
+
+INSN_LSX(vdiv_b,           vvv)
+INSN_LSX(vdiv_h,           vvv)
+INSN_LSX(vdiv_w,           vvv)
+INSN_LSX(vdiv_d,           vvv)
+INSN_LSX(vdiv_bu,          vvv)
+INSN_LSX(vdiv_hu,          vvv)
+INSN_LSX(vdiv_wu,          vvv)
+INSN_LSX(vdiv_du,          vvv)
+INSN_LSX(vmod_b,           vvv)
+INSN_LSX(vmod_h,           vvv)
+INSN_LSX(vmod_w,           vvv)
+INSN_LSX(vmod_d,           vvv)
+INSN_LSX(vmod_bu,          vvv)
+INSN_LSX(vmod_hu,          vvv)
+INSN_LSX(vmod_wu,          vvv)
+INSN_LSX(vmod_du,          vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4d71b45fe0..e5ee9260ad 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -370,3 +370,20 @@ DEF_HELPER_4(vmaddwod_h_bu_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vmaddwod_w_hu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vmaddwod_d_wu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vmaddwod_q_du_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vdiv_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index e9674af1bd..2d12470a0b 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -286,3 +286,20 @@ TRANS(vmaddwod_h_bu_b, gen_vvv, gen_helper_vmaddwod_h_bu_b)
 TRANS(vmaddwod_w_hu_h, gen_vvv, gen_helper_vmaddwod_w_hu_h)
 TRANS(vmaddwod_d_wu_w, gen_vvv, gen_helper_vmaddwod_d_wu_w)
 TRANS(vmaddwod_q_du_d, gen_vvv, gen_helper_vmaddwod_q_du_d)
+
+TRANS(vdiv_b, gen_vvv, gen_helper_vdiv_b)
+TRANS(vdiv_h, gen_vvv, gen_helper_vdiv_h)
+TRANS(vdiv_w, gen_vvv, gen_helper_vdiv_w)
+TRANS(vdiv_d, gen_vvv, gen_helper_vdiv_d)
+TRANS(vdiv_bu, gen_vvv, gen_helper_vdiv_bu)
+TRANS(vdiv_hu, gen_vvv, gen_helper_vdiv_hu)
+TRANS(vdiv_wu, gen_vvv, gen_helper_vdiv_wu)
+TRANS(vdiv_du, gen_vvv, gen_helper_vdiv_du)
+TRANS(vmod_b, gen_vvv, gen_helper_vmod_b)
+TRANS(vmod_h, gen_vvv, gen_helper_vmod_h)
+TRANS(vmod_w, gen_vvv, gen_helper_vmod_w)
+TRANS(vmod_d, gen_vvv, gen_helper_vmod_d)
+TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
+TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
+TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
+TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 73390a07ce..cbd955a9e9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -739,3 +739,20 @@ vmaddwod_h_bu_b  0111 00001011 11100 ..... ..... .....    @vvv
 vmaddwod_w_hu_h  0111 00001011 11101 ..... ..... .....    @vvv
 vmaddwod_d_wu_w  0111 00001011 11110 ..... ..... .....    @vvv
 vmaddwod_q_du_d  0111 00001011 11111 ..... ..... .....    @vvv
+
+vdiv_b           0111 00001110 00000 ..... ..... .....    @vvv
+vdiv_h           0111 00001110 00001 ..... ..... .....    @vvv
+vdiv_w           0111 00001110 00010 ..... ..... .....    @vvv
+vdiv_d           0111 00001110 00011 ..... ..... .....    @vvv
+vdiv_bu          0111 00001110 01000 ..... ..... .....    @vvv
+vdiv_hu          0111 00001110 01001 ..... ..... .....    @vvv
+vdiv_wu          0111 00001110 01010 ..... ..... .....    @vvv
+vdiv_du          0111 00001110 01011 ..... ..... .....    @vvv
+vmod_b           0111 00001110 00100 ..... ..... .....    @vvv
+vmod_h           0111 00001110 00101 ..... ..... .....    @vvv
+vmod_w           0111 00001110 00110 ..... ..... .....    @vvv
+vmod_d           0111 00001110 00111 ..... ..... .....    @vvv
+vmod_bu          0111 00001110 01100 ..... ..... .....    @vvv
+vmod_hu          0111 00001110 01101 ..... ..... .....    @vvv
+vmod_wu          0111 00001110 01110 ..... ..... .....    @vvv
+vmod_du          0111 00001110 01111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index aea2e34292..99bdf4eb02 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1575,3 +1575,138 @@ DO_HELPER_VVV(vmaddwod_h_bu_b, 16, helper_vvv, do_vmaddwod_u_s)
 DO_HELPER_VVV(vmaddwod_w_hu_h, 32, helper_vvv, do_vmaddwod_u_s)
 DO_HELPER_VVV(vmaddwod_d_wu_w, 64, helper_vvv, do_vmaddwod_u_s)
 DO_HELPER_VVV(vmaddwod_q_du_d, 128, helper_vvv, do_vmaddwod_u_s)
+
+static int64_t s_div_s(int64_t s1, int64_t s2, int bit)
+{
+    int64_t smin = MAKE_64BIT_MASK((bit -1), 64);
+
+    if (s1 == smin && s2 == -1) {
+        return smin;
+    }
+    return s2 ? s1 / s2 : s1 >= 0 ? -1 : 1;
+}
+
+static void do_vdiv_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = s_div_s(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = s_div_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_div_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_div_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_div_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return u2 ? u1 / u2 : -1;
+}
+
+static void do_vdiv_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = u_div_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = u_div_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_div_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_div_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t s_mod_s(int64_t s1, int64_t s2, int bit)
+{
+    int64_t smin = MAKE_64BIT_MASK((bit -1), 64);
+
+    if (s1 == smin && s2 == -1) {
+        return 0;
+    }
+    return s2 ? s1 % s2 : s1;
+}
+
+static void do_vmod_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = s_mod_s(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = s_mod_s(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = s_mod_s(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = s_mod_s(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t u_mod_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return u2 ? u1 % u2 : u1;
+}
+
+static void do_vmod_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = u_mod_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = u_mod_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = u_mod_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = u_mod_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vdiv_b, 8, helper_vvv, do_vdiv_s)
+DO_HELPER_VVV(vdiv_h, 16, helper_vvv, do_vdiv_s)
+DO_HELPER_VVV(vdiv_w, 32, helper_vvv, do_vdiv_s)
+DO_HELPER_VVV(vdiv_d, 64, helper_vvv, do_vdiv_s)
+DO_HELPER_VVV(vdiv_bu, 8, helper_vvv, do_vdiv_u)
+DO_HELPER_VVV(vdiv_hu, 16, helper_vvv, do_vdiv_u)
+DO_HELPER_VVV(vdiv_wu, 32, helper_vvv, do_vdiv_u)
+DO_HELPER_VVV(vdiv_du, 64, helper_vvv, do_vdiv_u)
+DO_HELPER_VVV(vmod_b, 8, helper_vvv, do_vmod_s)
+DO_HELPER_VVV(vmod_h, 16, helper_vvv, do_vmod_s)
+DO_HELPER_VVV(vmod_w, 32, helper_vvv, do_vmod_s)
+DO_HELPER_VVV(vmod_d, 64, helper_vvv, do_vmod_s)
+DO_HELPER_VVV(vmod_bu, 8, helper_vvv, do_vmod_u)
+DO_HELPER_VVV(vmod_hu, 16, helper_vvv, do_vmod_u)
+DO_HELPER_VVV(vmod_wu, 32, helper_vvv, do_vmod_u)
+DO_HELPER_VVV(vmod_du, 64, helper_vvv, do_vmod_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 18/43] target/loongarch: Implement vsat
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (16 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 17/43] target/loongarch: Implement vdiv/vmod Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:13   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 19/43] target/loongarch: Implement vexth Song Gao
                   ` (25 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSAT.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 +++
 target/loongarch/helper.h                   |  9 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 +++
 target/loongarch/insns.decode               | 12 ++++
 target/loongarch/lsx_helper.c               | 70 +++++++++++++++++++++
 5 files changed, 109 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f50a1051b9..1ae085e192 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1032,3 +1032,12 @@ INSN_LSX(vmod_bu,          vvv)
 INSN_LSX(vmod_hu,          vvv)
 INSN_LSX(vmod_wu,          vvv)
 INSN_LSX(vmod_du,          vvv)
+
+INSN_LSX(vsat_b,           vv_i)
+INSN_LSX(vsat_h,           vv_i)
+INSN_LSX(vsat_w,           vv_i)
+INSN_LSX(vsat_d,           vv_i)
+INSN_LSX(vsat_bu,          vv_i)
+INSN_LSX(vsat_hu,          vv_i)
+INSN_LSX(vsat_wu,          vv_i)
+INSN_LSX(vsat_du,          vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e5ee9260ad..fc8044db51 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -387,3 +387,12 @@ DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsat_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsat_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2d12470a0b..09924343b2 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -303,3 +303,12 @@ TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
 TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
 TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
 TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
+
+TRANS(vsat_b, gen_vv_i, gen_helper_vsat_b)
+TRANS(vsat_h, gen_vv_i, gen_helper_vsat_h)
+TRANS(vsat_w, gen_vv_i, gen_helper_vsat_w)
+TRANS(vsat_d, gen_vv_i, gen_helper_vsat_d)
+TRANS(vsat_bu, gen_vv_i, gen_helper_vsat_bu)
+TRANS(vsat_hu, gen_vv_i, gen_helper_vsat_hu)
+TRANS(vsat_wu, gen_vv_i, gen_helper_vsat_wu)
+TRANS(vsat_du, gen_vv_i, gen_helper_vsat_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cbd955a9e9..cae67533fd 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -498,7 +498,10 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
+@vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
+@vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
@@ -756,3 +759,12 @@ vmod_bu          0111 00001110 01100 ..... ..... .....    @vvv
 vmod_hu          0111 00001110 01101 ..... ..... .....    @vvv
 vmod_wu          0111 00001110 01110 ..... ..... .....    @vvv
 vmod_du          0111 00001110 01111 ..... ..... .....    @vvv
+
+vsat_b           0111 00110010 01000 01 ... ..... .....   @vv_ui3
+vsat_h           0111 00110010 01000 1 .... ..... .....   @vv_ui4
+vsat_w           0111 00110010 01001 ..... ..... .....    @vv_ui5
+vsat_d           0111 00110010 0101 ...... ..... .....    @vv_ui6
+vsat_bu          0111 00110010 10000 01 ... ..... .....   @vv_ui3
+vsat_hu          0111 00110010 10000 1 .... ..... .....   @vv_ui4
+vsat_wu          0111 00110010 10001 ..... ..... .....    @vv_ui5
+vsat_du          0111 00110010 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 99bdf4eb02..62ab14051e 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1710,3 +1710,73 @@ DO_HELPER_VVV(vmod_bu, 8, helper_vvv, do_vmod_u)
 DO_HELPER_VVV(vmod_hu, 16, helper_vvv, do_vmod_u)
 DO_HELPER_VVV(vmod_wu, 32, helper_vvv, do_vmod_u)
 DO_HELPER_VVV(vmod_du, 64, helper_vvv, do_vmod_u)
+
+static int64_t sat_s(int64_t s1, uint32_t imm)
+{
+    int64_t max = MAKE_64BIT_MASK(0, imm);
+    int64_t min = MAKE_64BIT_MASK(imm, 64);
+
+    if (s1 > max -1) {
+        return max;
+    } else if (s1 < - max) {
+        return min;
+    } else {
+        return s1;
+    }
+}
+
+static void do_vsat_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = sat_s(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = sat_s(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = sat_s(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = sat_s(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t sat_u(uint64_t u1, uint32_t imm)
+{
+    uint64_t umax_imm = MAKE_64BIT_MASK(0, imm + 1);
+
+    return u1 < umax_imm ? u1 : umax_imm;
+}
+
+static void do_vsat_u(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = sat_u((uint8_t)Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = sat_u((uint16_t)Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = sat_u((uint32_t)Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = sat_u((uint64_t)Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vsat_b, 8, helper_vv_i, do_vsat_s)
+DO_HELPER_VV_I(vsat_h, 16, helper_vv_i, do_vsat_s)
+DO_HELPER_VV_I(vsat_w, 32, helper_vv_i, do_vsat_s)
+DO_HELPER_VV_I(vsat_d, 64, helper_vv_i, do_vsat_s)
+DO_HELPER_VV_I(vsat_bu, 8, helper_vv_i, do_vsat_u)
+DO_HELPER_VV_I(vsat_hu, 16, helper_vv_i, do_vsat_u)
+DO_HELPER_VV_I(vsat_wu, 32, helper_vv_i, do_vsat_u)
+DO_HELPER_VV_I(vsat_du, 64, helper_vv_i, do_vsat_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 19/43] target/loongarch: Implement vexth
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (17 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 18/43] target/loongarch: Implement vsat Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 20/43] target/loongarch: Implement vsigncov Song Gao
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VEXTH.{H.B/W.H/D.W/Q.D};
- VEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 ++++
 target/loongarch/helper.h                   |  9 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 ++++
 target/loongarch/insns.decode               |  9 ++++
 target/loongarch/lsx_helper.c               | 49 +++++++++++++++++++++
 5 files changed, 85 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1ae085e192..3187f87bbe 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1041,3 +1041,12 @@ INSN_LSX(vsat_bu,          vv_i)
 INSN_LSX(vsat_hu,          vv_i)
 INSN_LSX(vsat_wu,          vv_i)
 INSN_LSX(vsat_du,          vv_i)
+
+INSN_LSX(vexth_h_b,        vv)
+INSN_LSX(vexth_w_h,        vv)
+INSN_LSX(vexth_d_w,        vv)
+INSN_LSX(vexth_q_d,        vv)
+INSN_LSX(vexth_hu_bu,      vv)
+INSN_LSX(vexth_wu_hu,      vv)
+INSN_LSX(vexth_du_wu,      vv)
+INSN_LSX(vexth_qu_du,      vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index fc8044db51..7a9d4f125d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -396,3 +396,12 @@ DEF_HELPER_4(vsat_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsat_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsat_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsat_du, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vexth_h_b, void, env, i32, i32)
+DEF_HELPER_3(vexth_w_h, void, env, i32, i32)
+DEF_HELPER_3(vexth_d_w, void, env, i32, i32)
+DEF_HELPER_3(vexth_q_d, void, env, i32, i32)
+DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
+DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
+DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
+DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 09924343b2..48ea07b645 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -312,3 +312,12 @@ TRANS(vsat_bu, gen_vv_i, gen_helper_vsat_bu)
 TRANS(vsat_hu, gen_vv_i, gen_helper_vsat_hu)
 TRANS(vsat_wu, gen_vv_i, gen_helper_vsat_wu)
 TRANS(vsat_du, gen_vv_i, gen_helper_vsat_du)
+
+TRANS(vexth_h_b, gen_vv, gen_helper_vexth_h_b)
+TRANS(vexth_w_h, gen_vv, gen_helper_vexth_w_h)
+TRANS(vexth_d_w, gen_vv, gen_helper_vexth_d_w)
+TRANS(vexth_q_d, gen_vv, gen_helper_vexth_q_d)
+TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
+TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
+TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
+TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cae67533fd..8ae9ca608e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -768,3 +768,12 @@ vsat_bu          0111 00110010 10000 01 ... ..... .....   @vv_ui3
 vsat_hu          0111 00110010 10000 1 .... ..... .....   @vv_ui4
 vsat_wu          0111 00110010 10001 ..... ..... .....    @vv_ui5
 vsat_du          0111 00110010 1001 ...... ..... .....    @vv_ui6
+
+vexth_h_b        0111 00101001 11101 11000 ..... .....    @vv
+vexth_w_h        0111 00101001 11101 11001 ..... .....    @vv
+vexth_d_w        0111 00101001 11101 11010 ..... .....    @vv
+vexth_q_d        0111 00101001 11101 11011 ..... .....    @vv
+vexth_hu_bu      0111 00101001 11101 11100 ..... .....    @vv
+vexth_wu_hu      0111 00101001 11101 11101 ..... .....    @vv
+vexth_du_wu      0111 00101001 11101 11110 ..... .....    @vv
+vexth_qu_du      0111 00101001 11101 11111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 62ab14051e..a094d7d382 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1780,3 +1780,52 @@ DO_HELPER_VV_I(vsat_bu, 8, helper_vv_i, do_vsat_u)
 DO_HELPER_VV_I(vsat_hu, 16, helper_vv_i, do_vsat_u)
 DO_HELPER_VV_I(vsat_wu, 32, helper_vv_i, do_vsat_u)
 DO_HELPER_VV_I(vsat_du, 64, helper_vv_i, do_vsat_u)
+
+static void do_vexth_s(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = Vj->B[n + LSX_LEN/bit];
+        break;
+    case 32:
+        Vd->W[n] = Vj->H[n + LSX_LEN/bit];
+        break;
+    case 64:
+        Vd->D[n] = Vj->W[n + LSX_LEN/bit];
+        break;
+    case 128:
+        Vd->Q[n] = Vj->D[n + LSX_LEN/bit];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vexth_u(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = (uint8_t)Vj->B[n + LSX_LEN/bit];
+        break;
+    case 32:
+        Vd->W[n] = (uint16_t)Vj->H[n + LSX_LEN/bit];
+        break;
+    case 64:
+        Vd->D[n] = (uint32_t)Vj->W[n + LSX_LEN/bit];
+        break;
+    case 128:
+        Vd->Q[n] = (uint64_t)Vj->D[n + LSX_LEN/bit];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV(vexth_h_b, 16, helper_vv, do_vexth_s)
+DO_HELPER_VV(vexth_w_h, 32, helper_vv, do_vexth_s)
+DO_HELPER_VV(vexth_d_w, 64, helper_vv, do_vexth_s)
+DO_HELPER_VV(vexth_q_d, 128, helper_vv, do_vexth_s)
+DO_HELPER_VV(vexth_hu_bu, 16, helper_vv, do_vexth_u)
+DO_HELPER_VV(vexth_wu_hu, 32, helper_vv, do_vexth_u)
+DO_HELPER_VV(vexth_du_wu, 64, helper_vv, do_vexth_u)
+DO_HELPER_VV(vexth_qu_du, 128, helper_vv, do_vexth_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 20/43] target/loongarch: Implement vsigncov
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (18 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 19/43] target/loongarch: Implement vexth Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:18   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
                   ` (23 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSIGNCOV.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++++
 target/loongarch/helper.h                   |  5 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 ++++
 target/loongarch/insns.decode               |  5 ++++
 target/loongarch/lsx_helper.c               | 29 +++++++++++++++++++++
 5 files changed, 49 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3187f87bbe..34a459410b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1050,3 +1050,8 @@ INSN_LSX(vexth_hu_bu,      vv)
 INSN_LSX(vexth_wu_hu,      vv)
 INSN_LSX(vexth_du_wu,      vv)
 INSN_LSX(vexth_qu_du,      vv)
+
+INSN_LSX(vsigncov_b,       vvv)
+INSN_LSX(vsigncov_h,       vvv)
+INSN_LSX(vsigncov_w,       vvv)
+INSN_LSX(vsigncov_d,       vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 7a9d4f125d..c2b4407663 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -405,3 +405,8 @@ DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
 DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
 DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
 DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
+
+DEF_HELPER_4(vsigncov_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsigncov_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsigncov_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsigncov_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 48ea07b645..ce207eda05 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -321,3 +321,8 @@ TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
 TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
 TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
 TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
+
+TRANS(vsigncov_b, gen_vvv, gen_helper_vsigncov_b)
+TRANS(vsigncov_h, gen_vvv, gen_helper_vsigncov_h)
+TRANS(vsigncov_w, gen_vvv, gen_helper_vsigncov_w)
+TRANS(vsigncov_d, gen_vvv, gen_helper_vsigncov_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8ae9ca608e..c7237730d3 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -777,3 +777,8 @@ vexth_hu_bu      0111 00101001 11101 11100 ..... .....    @vv
 vexth_wu_hu      0111 00101001 11101 11101 ..... .....    @vv
 vexth_du_wu      0111 00101001 11101 11110 ..... .....    @vv
 vexth_qu_du      0111 00101001 11101 11111 ..... .....    @vv
+
+vsigncov_b       0111 00010010 11100 ..... ..... .....    @vvv
+vsigncov_h       0111 00010010 11101 ..... ..... .....    @vvv
+vsigncov_w       0111 00010010 11110 ..... ..... .....    @vvv
+vsigncov_d       0111 00010010 11111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index a094d7d382..73360e45e2 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1829,3 +1829,32 @@ DO_HELPER_VV(vexth_hu_bu, 16, helper_vv, do_vexth_u)
 DO_HELPER_VV(vexth_wu_hu, 32, helper_vv, do_vexth_u)
 DO_HELPER_VV(vexth_du_wu, 64, helper_vv, do_vexth_u)
 DO_HELPER_VV(vexth_qu_du, 128, helper_vv, do_vexth_u)
+
+static void do_vsigncov(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = (Vj->B[n] == 0x0) ? 0 :
+                   (Vj->B[n] < 0) ? -Vk->B[n] : Vk->B[n];
+        break;
+    case 16:
+        Vd->H[n] = (Vj->H[n] == 0x0) ? 0 :
+                   (Vj->H[n] < 0) ? -Vk->H[n] : Vk->H[n];
+        break;
+    case 32:
+        Vd->W[n] = (Vj->W[n] == 0x0) ? 0 :
+                   (Vj->W[n] < 0) ? -Vk->W[n] : Vk->W[n];
+        break;
+    case 64:
+        Vd->D[n] = (Vj->D[n] == 0x0) ? 0 :
+                   (Vj->D[n] < 0) ? -Vk->D[n] : Vk->W[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsigncov_b, 8, helper_vvv, do_vsigncov)
+DO_HELPER_VVV(vsigncov_h, 16, helper_vvv, do_vsigncov)
+DO_HELPER_VVV(vsigncov_w, 32, helper_vvv, do_vsigncov)
+DO_HELPER_VVV(vsigncov_d, 64, helper_vvv, do_vsigncov)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (19 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 20/43] target/loongarch: Implement vsigncov Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:31   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions Song Gao
                   ` (22 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMSKLTZ.{B/H/W/D};
- VMSKGEZ.B;
- VMSKNZ.B.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  7 +++
 target/loongarch/helper.h                   |  7 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  7 +++
 target/loongarch/insns.decode               |  7 +++
 target/loongarch/lsx_helper.c               | 54 +++++++++++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 34a459410b..b674167120 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1055,3 +1055,10 @@ INSN_LSX(vsigncov_b,       vvv)
 INSN_LSX(vsigncov_h,       vvv)
 INSN_LSX(vsigncov_w,       vvv)
 INSN_LSX(vsigncov_d,       vvv)
+
+INSN_LSX(vmskltz_b,        vv)
+INSN_LSX(vmskltz_h,        vv)
+INSN_LSX(vmskltz_w,        vv)
+INSN_LSX(vmskltz_d,        vv)
+INSN_LSX(vmskgez_b,        vv)
+INSN_LSX(vmsknz_b,         vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c2b4407663..ae9351f513 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -410,3 +410,10 @@ DEF_HELPER_4(vsigncov_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vsigncov_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsigncov_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsigncov_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vmskltz_b, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_h, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
+DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
+DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ce207eda05..c02602c409 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -326,3 +326,10 @@ TRANS(vsigncov_b, gen_vvv, gen_helper_vsigncov_b)
 TRANS(vsigncov_h, gen_vvv, gen_helper_vsigncov_h)
 TRANS(vsigncov_w, gen_vvv, gen_helper_vsigncov_w)
 TRANS(vsigncov_d, gen_vvv, gen_helper_vsigncov_d)
+
+TRANS(vmskltz_b, gen_vv, gen_helper_vmskltz_b)
+TRANS(vmskltz_h, gen_vv, gen_helper_vmskltz_h)
+TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
+TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
+TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
+TRANS(vmsknz_b,  gen_vv, gen_helper_vmsknz_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c7237730d3..864a524fe6 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -782,3 +782,10 @@ vsigncov_b       0111 00010010 11100 ..... ..... .....    @vvv
 vsigncov_h       0111 00010010 11101 ..... ..... .....    @vvv
 vsigncov_w       0111 00010010 11110 ..... ..... .....    @vvv
 vsigncov_d       0111 00010010 11111 ..... ..... .....    @vvv
+
+vmskltz_b        0111 00101001 11000 10000 ..... .....    @vv
+vmskltz_h        0111 00101001 11000 10001 ..... .....    @vv
+vmskltz_w        0111 00101001 11000 10010 ..... .....    @vv
+vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
+vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
+vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 73360e45e2..cea1d99eb6 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1858,3 +1858,57 @@ DO_HELPER_VVV(vsigncov_b, 8, helper_vvv, do_vsigncov)
 DO_HELPER_VVV(vsigncov_h, 16, helper_vvv, do_vsigncov)
 DO_HELPER_VVV(vsigncov_w, 32, helper_vvv, do_vsigncov)
 DO_HELPER_VVV(vsigncov_d, 64, helper_vvv, do_vsigncov)
+
+/* Vd, Vj, vd = 0 */
+static void helper_vv_z(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, int bit,
+                        void (*func)(vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    Vd->D[0] = 0;
+    Vd->D[1] = 0;
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, bit, i);
+    }
+}
+
+static void do_vmskltz(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->H[0] |= ((0x80 & Vj->B[n]) == 0) << n;
+        break;
+    case 16:
+        Vd->H[0] |= ((0x8000 & Vj->H[n]) == 0) << n;
+        break;
+    case 32:
+        Vd->H[0] |= ((0x80000000 & Vj->W[n]) == 0) << n;
+        break;
+    case 64:
+        Vd->H[0] |= ((0x8000000000000000 & Vj->D[n]) == 0) << n;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vmskgez(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    Vd->H[0] |= !((0x80 & Vj->B[n]) == 0) << n;
+}
+
+static void do_vmsknz(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    Vd->H[0] |=  (Vj->B[n] == 0) << n;
+}
+
+DO_HELPER_VV(vmskltz_b, 8, helper_vv_z, do_vmskltz)
+DO_HELPER_VV(vmskltz_h, 16, helper_vv_z, do_vmskltz)
+DO_HELPER_VV(vmskltz_w, 32, helper_vv_z, do_vmskltz)
+DO_HELPER_VV(vmskltz_d, 64, helper_vv_z, do_vmskltz)
+DO_HELPER_VV(vmskgez_b, 8, helper_vv_z, do_vmskgez)
+DO_HELPER_VV(vmsknz_b, 8, helper_vv_z, do_vmsknz)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (20 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:34   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
                   ` (21 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- V{AND/OR/XOR/NOR/ANDN/ORN}.V;
- V{AND/OR/XOR/NOR}I.B.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 12 ++++
 target/loongarch/helper.h                   | 12 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 12 ++++
 target/loongarch/insns.decode               | 13 +++++
 target/loongarch/lsx_helper.c               | 62 +++++++++++++++++++++
 5 files changed, 111 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b674167120..3e8015ac0e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1062,3 +1062,15 @@ INSN_LSX(vmskltz_w,        vv)
 INSN_LSX(vmskltz_d,        vv)
 INSN_LSX(vmskgez_b,        vv)
 INSN_LSX(vmsknz_b,         vv)
+
+INSN_LSX(vand_v,           vvv)
+INSN_LSX(vor_v,            vvv)
+INSN_LSX(vxor_v,           vvv)
+INSN_LSX(vnor_v,           vvv)
+INSN_LSX(vandn_v,          vvv)
+INSN_LSX(vorn_v,           vvv)
+
+INSN_LSX(vandi_b,          vv_i)
+INSN_LSX(vori_b,           vv_i)
+INSN_LSX(vxori_b,          vv_i)
+INSN_LSX(vnori_b,          vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ae9351f513..77b576f22f 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -417,3 +417,15 @@ DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
 DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
 DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
+
+DEF_HELPER_4(vand_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vor_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vxor_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vnor_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vandn_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vorn_v, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vandi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vori_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vxori_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vnori_b, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c02602c409..c12de1d3d4 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -333,3 +333,15 @@ TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
 TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b,  gen_vv, gen_helper_vmsknz_b)
+
+TRANS(vand_v, gen_vvv, gen_helper_vand_v)
+TRANS(vor_v, gen_vvv, gen_helper_vor_v)
+TRANS(vxor_v, gen_vvv, gen_helper_vxor_v)
+TRANS(vnor_v, gen_vvv, gen_helper_vnor_v)
+TRANS(vandn_v, gen_vvv, gen_helper_vandn_v)
+TRANS(vorn_v, gen_vvv, gen_helper_vorn_v)
+
+TRANS(vandi_b, gen_vv_i, gen_helper_vandi_b)
+TRANS(vori_b, gen_vv_i, gen_helper_vori_b)
+TRANS(vxori_b, gen_vv_i, gen_helper_vxori_b)
+TRANS(vnori_b, gen_vv_i, gen_helper_vnori_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 864a524fe6..03b7f76712 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -502,6 +502,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 @vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
+@vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
@@ -789,3 +790,15 @@ vmskltz_w        0111 00101001 11000 10010 ..... .....    @vv
 vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
 vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
 vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
+
+vand_v           0111 00010010 01100 ..... ..... .....    @vvv
+vor_v            0111 00010010 01101 ..... ..... .....    @vvv
+vxor_v           0111 00010010 01110 ..... ..... .....    @vvv
+vnor_v           0111 00010010 01111 ..... ..... .....    @vvv
+vandn_v          0111 00010010 10000 ..... ..... .....    @vvv
+vorn_v           0111 00010010 10001 ..... ..... .....    @vvv
+
+vandi_b          0111 00111101 00 ........ ..... .....    @vv_ui8
+vori_b           0111 00111101 01 ........ ..... .....    @vv_ui8
+vxori_b          0111 00111101 10 ........ ..... .....    @vv_ui8
+vnori_b          0111 00111101 11 ........ ..... .....    @vv_ui8
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index cea1d99eb6..c61479bf74 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1912,3 +1912,65 @@ DO_HELPER_VV(vmskltz_w, 32, helper_vv_z, do_vmskltz)
 DO_HELPER_VV(vmskltz_d, 64, helper_vv_z, do_vmskltz)
 DO_HELPER_VV(vmskgez_b, 8, helper_vv_z, do_vmskgez)
 DO_HELPER_VV(vmsknz_b, 8, helper_vv_z, do_vmsknz)
+
+static void do_vand_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = Vj->D[n] & Vk->D[n];
+}
+
+static void do_vor_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = Vj->D[n] | Vk->D[n];
+}
+
+static void do_vxor_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = Vj->D[n] ^ Vk->D[n];
+}
+
+static void do_vnor_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = ~(Vj->D[n] | Vk->D[n]);
+}
+
+static void do_vandn_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = ~Vj->D[n] & Vk->D[n];
+}
+
+static void do_vorn_v(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    Vd->D[n] = Vj->D[n] | ~Vk->D[n];
+}
+
+DO_HELPER_VVV(vand_v, 64, helper_vvv, do_vand_v)
+DO_HELPER_VVV(vor_v, 64, helper_vvv, do_vor_v)
+DO_HELPER_VVV(vxor_v, 64, helper_vvv, do_vxor_v)
+DO_HELPER_VVV(vnor_v, 64, helper_vvv, do_vnor_v)
+DO_HELPER_VVV(vandn_v, 64, helper_vvv, do_vandn_v)
+DO_HELPER_VVV(vorn_v, 64, helper_vvv, do_vorn_v)
+
+static void do_vandi_b(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    Vd->B[n] = Vj->B[n] & imm;
+}
+
+static void do_vori_b(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    Vd->B[n] = Vj->B[n] | imm;
+}
+
+static void do_vxori_b(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    Vd->B[n] = Vj->B[n] ^ imm;
+}
+
+static void do_vnori_b(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    Vd->B[n] = ~(Vj->B[n] | imm);
+}
+
+DO_HELPER_VV_I(vandi_b, 8, helper_vv_i, do_vandi_b)
+DO_HELPER_VV_I(vori_b, 8, helper_vv_i, do_vori_b)
+DO_HELPER_VV_I(vxori_b, 8, helper_vv_i, do_vxori_b)
+DO_HELPER_VV_I(vnori_b, 8, helper_vv_i, do_vnori_b)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (21 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:36   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 24/43] target/loongarch: Implement vsllwil vextl Song Gao
                   ` (20 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSLL[I].{B/H/W/D};
- VSRL[I].{B/H/W/D};
- VSRA[I].{B/H/W/D};
- VROTR[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  36 ++++
 target/loongarch/helper.h                   |  36 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  36 ++++
 target/loongarch/insns.decode               |  36 ++++
 target/loongarch/lsx_helper.c               | 213 ++++++++++++++++++++
 5 files changed, 357 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3e8015ac0e..a422c9dfc8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1074,3 +1074,39 @@ INSN_LSX(vandi_b,          vv_i)
 INSN_LSX(vori_b,           vv_i)
 INSN_LSX(vxori_b,          vv_i)
 INSN_LSX(vnori_b,          vv_i)
+
+INSN_LSX(vsll_b,           vvv)
+INSN_LSX(vsll_h,           vvv)
+INSN_LSX(vsll_w,           vvv)
+INSN_LSX(vsll_d,           vvv)
+INSN_LSX(vslli_b,          vv_i)
+INSN_LSX(vslli_h,          vv_i)
+INSN_LSX(vslli_w,          vv_i)
+INSN_LSX(vslli_d,          vv_i)
+
+INSN_LSX(vsrl_b,           vvv)
+INSN_LSX(vsrl_h,           vvv)
+INSN_LSX(vsrl_w,           vvv)
+INSN_LSX(vsrl_d,           vvv)
+INSN_LSX(vsrli_b,          vv_i)
+INSN_LSX(vsrli_h,          vv_i)
+INSN_LSX(vsrli_w,          vv_i)
+INSN_LSX(vsrli_d,          vv_i)
+
+INSN_LSX(vsra_b,           vvv)
+INSN_LSX(vsra_h,           vvv)
+INSN_LSX(vsra_w,           vvv)
+INSN_LSX(vsra_d,           vvv)
+INSN_LSX(vsrai_b,          vv_i)
+INSN_LSX(vsrai_h,          vv_i)
+INSN_LSX(vsrai_w,          vv_i)
+INSN_LSX(vsrai_d,          vv_i)
+
+INSN_LSX(vrotr_b,          vvv)
+INSN_LSX(vrotr_h,          vvv)
+INSN_LSX(vrotr_w,          vvv)
+INSN_LSX(vrotr_d,          vvv)
+INSN_LSX(vrotri_b,         vv_i)
+INSN_LSX(vrotri_h,         vv_i)
+INSN_LSX(vrotri_w,         vv_i)
+INSN_LSX(vrotri_d,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 77b576f22f..c7733a7180 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -429,3 +429,39 @@ DEF_HELPER_4(vandi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vori_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vxori_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vnori_b, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsll_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsll_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsll_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsll_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vslli_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vslli_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vslli_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vslli_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrl_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrl_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrl_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrl_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrli_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrli_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrli_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrli_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsra_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsra_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsra_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsra_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrai_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrai_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrai_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrai_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vrotr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vrotri_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c12de1d3d4..62aac7713b 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -345,3 +345,39 @@ TRANS(vandi_b, gen_vv_i, gen_helper_vandi_b)
 TRANS(vori_b, gen_vv_i, gen_helper_vori_b)
 TRANS(vxori_b, gen_vv_i, gen_helper_vxori_b)
 TRANS(vnori_b, gen_vv_i, gen_helper_vnori_b)
+
+TRANS(vsll_b, gen_vvv, gen_helper_vsll_b)
+TRANS(vsll_h, gen_vvv, gen_helper_vsll_h)
+TRANS(vsll_w, gen_vvv, gen_helper_vsll_w)
+TRANS(vsll_d, gen_vvv, gen_helper_vsll_d)
+TRANS(vslli_b, gen_vv_i, gen_helper_vslli_b)
+TRANS(vslli_h, gen_vv_i, gen_helper_vslli_h)
+TRANS(vslli_w, gen_vv_i, gen_helper_vslli_w)
+TRANS(vslli_d, gen_vv_i, gen_helper_vslli_d)
+
+TRANS(vsrl_b, gen_vvv, gen_helper_vsrl_b)
+TRANS(vsrl_h, gen_vvv, gen_helper_vsrl_h)
+TRANS(vsrl_w, gen_vvv, gen_helper_vsrl_w)
+TRANS(vsrl_d, gen_vvv, gen_helper_vsrl_d)
+TRANS(vsrli_b, gen_vv_i, gen_helper_vsrli_b)
+TRANS(vsrli_h, gen_vv_i, gen_helper_vsrli_h)
+TRANS(vsrli_w, gen_vv_i, gen_helper_vsrli_w)
+TRANS(vsrli_d, gen_vv_i, gen_helper_vsrli_d)
+
+TRANS(vsra_b, gen_vvv, gen_helper_vsra_b)
+TRANS(vsra_h, gen_vvv, gen_helper_vsra_h)
+TRANS(vsra_w, gen_vvv, gen_helper_vsra_w)
+TRANS(vsra_d, gen_vvv, gen_helper_vsra_d)
+TRANS(vsrai_b, gen_vv_i, gen_helper_vsrai_b)
+TRANS(vsrai_h, gen_vv_i, gen_helper_vsrai_h)
+TRANS(vsrai_w, gen_vv_i, gen_helper_vsrai_w)
+TRANS(vsrai_d, gen_vv_i, gen_helper_vsrai_d)
+
+TRANS(vrotr_b, gen_vvv, gen_helper_vrotr_b)
+TRANS(vrotr_h, gen_vvv, gen_helper_vrotr_h)
+TRANS(vrotr_w, gen_vvv, gen_helper_vrotr_w)
+TRANS(vrotr_d, gen_vvv, gen_helper_vrotr_d)
+TRANS(vrotri_b, gen_vv_i, gen_helper_vrotri_b)
+TRANS(vrotri_h, gen_vv_i, gen_helper_vrotri_h)
+TRANS(vrotri_w, gen_vv_i, gen_helper_vrotri_w)
+TRANS(vrotri_d, gen_vv_i, gen_helper_vrotri_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 03b7f76712..aca3267206 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -802,3 +802,39 @@ vandi_b          0111 00111101 00 ........ ..... .....    @vv_ui8
 vori_b           0111 00111101 01 ........ ..... .....    @vv_ui8
 vxori_b          0111 00111101 10 ........ ..... .....    @vv_ui8
 vnori_b          0111 00111101 11 ........ ..... .....    @vv_ui8
+
+vsll_b           0111 00001110 10000 ..... ..... .....    @vvv
+vsll_h           0111 00001110 10001 ..... ..... .....    @vvv
+vsll_w           0111 00001110 10010 ..... ..... .....    @vvv
+vsll_d           0111 00001110 10011 ..... ..... .....    @vvv
+vslli_b          0111 00110010 11000 01 ... ..... .....   @vv_ui3
+vslli_h          0111 00110010 11000 1 .... ..... .....   @vv_ui4
+vslli_w          0111 00110010 11001 ..... ..... .....    @vv_ui5
+vslli_d          0111 00110010 1101 ...... ..... .....    @vv_ui6
+
+vsrl_b           0111 00001110 10100 ..... ..... .....    @vvv
+vsrl_h           0111 00001110 10101 ..... ..... .....    @vvv
+vsrl_w           0111 00001110 10110 ..... ..... .....    @vvv
+vsrl_d           0111 00001110 10111 ..... ..... .....    @vvv
+vsrli_b          0111 00110011 00000 01 ... ..... .....   @vv_ui3
+vsrli_h          0111 00110011 00000 1 .... ..... .....   @vv_ui4
+vsrli_w          0111 00110011 00001 ..... ..... .....    @vv_ui5
+vsrli_d          0111 00110011 0001 ...... ..... .....    @vv_ui6
+
+vsra_b           0111 00001110 11000 ..... ..... .....    @vvv
+vsra_h           0111 00001110 11001 ..... ..... .....    @vvv
+vsra_w           0111 00001110 11010 ..... ..... .....    @vvv
+vsra_d           0111 00001110 11011 ..... ..... .....    @vvv
+vsrai_b          0111 00110011 01000 01 ... ..... .....   @vv_ui3
+vsrai_h          0111 00110011 01000 1 .... ..... .....   @vv_ui4
+vsrai_w          0111 00110011 01001 ..... ..... .....    @vv_ui5
+vsrai_d          0111 00110011 0101 ...... ..... .....    @vv_ui6
+
+vrotr_b          0111 00001110 11100 ..... ..... .....    @vvv
+vrotr_h          0111 00001110 11101 ..... ..... .....    @vvv
+vrotr_w          0111 00001110 11110 ..... ..... .....    @vvv
+vrotr_d          0111 00001110 11111 ..... ..... .....    @vvv
+vrotri_b         0111 00101010 00000 01 ... ..... .....   @vv_ui3
+vrotri_h         0111 00101010 00000 1 .... ..... .....   @vv_ui4
+vrotri_w         0111 00101010 00001 ..... ..... .....    @vv_ui5
+vrotri_d         0111 00101010 0001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index c61479bf74..d8282b670e 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1974,3 +1974,216 @@ DO_HELPER_VV_I(vandi_b, 8, helper_vv_i, do_vandi_b)
 DO_HELPER_VV_I(vori_b, 8, helper_vv_i, do_vori_b)
 DO_HELPER_VV_I(vxori_b, 8, helper_vv_i, do_vxori_b)
 DO_HELPER_VV_I(vnori_b, 8, helper_vv_i, do_vnori_b)
+
+static void do_vsll(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] << ((uint64_t)(Vk->B[n]) % bit);
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] << ((uint64_t)(Vk->H[n]) % bit);
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] << ((uint64_t)(Vk->W[n]) % bit);
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] << ((uint64_t)(Vk->D[n]) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vslli(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] << ((uint64_t)(imm) % bit);
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] << ((uint64_t)(imm) % bit);
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] << ((uint64_t)(imm) % bit);
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] << ((uint64_t)(imm) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsll_b, 8, helper_vvv, do_vsll)
+DO_HELPER_VVV(vsll_h, 16, helper_vvv, do_vsll)
+DO_HELPER_VVV(vsll_w, 32, helper_vvv, do_vsll)
+DO_HELPER_VVV(vsll_d, 64, helper_vvv, do_vsll)
+DO_HELPER_VV_I(vslli_b, 8, helper_vv_i, do_vslli)
+DO_HELPER_VV_I(vslli_h, 16, helper_vv_i, do_vslli)
+DO_HELPER_VV_I(vslli_w, 32, helper_vv_i, do_vslli)
+DO_HELPER_VV_I(vslli_d, 64, helper_vv_i, do_vslli)
+
+static int64_t vsrl(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+
+    return u1 >> ((uint64_t)(s2) % bit);
+}
+
+static void do_vsrl(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrl(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrl(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrl(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrl(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrli(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrl(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrl(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrl(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrl(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsrl_b, 8, helper_vvv, do_vsrl)
+DO_HELPER_VVV(vsrl_h, 16, helper_vvv, do_vsrl)
+DO_HELPER_VVV(vsrl_w, 32, helper_vvv, do_vsrl)
+DO_HELPER_VVV(vsrl_d, 64, helper_vvv, do_vsrl)
+DO_HELPER_VV_I(vsrli_b, 8, helper_vv_i, do_vsrli)
+DO_HELPER_VV_I(vsrli_h, 16, helper_vv_i, do_vsrli)
+DO_HELPER_VV_I(vsrli_w, 32, helper_vv_i, do_vsrli)
+DO_HELPER_VV_I(vsrli_d, 64, helper_vv_i, do_vsrli)
+
+static void do_vsra(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] >> ((uint64_t)(Vk->B[n]) % bit);
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] >> ((uint64_t)(Vk->H[n]) % bit);
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] >> ((uint64_t)(Vk->W[n]) % bit);
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] >> ((uint64_t)(Vk->D[n]) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrai(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[n] >> ((uint64_t)(imm) % bit);
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[n] >> ((uint64_t)(imm) % bit);
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[n] >> ((uint64_t)(imm) % bit);
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[n] >> ((uint64_t)(imm) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsra_b, 8, helper_vvv, do_vsra)
+DO_HELPER_VVV(vsra_h, 16, helper_vvv, do_vsra)
+DO_HELPER_VVV(vsra_w, 32, helper_vvv, do_vsra)
+DO_HELPER_VVV(vsra_d, 64, helper_vvv, do_vsra)
+DO_HELPER_VV_I(vsrai_b, 8, helper_vv_i, do_vsrai)
+DO_HELPER_VV_I(vsrai_h, 16, helper_vv_i, do_vsrai)
+DO_HELPER_VV_I(vsrai_w, 32, helper_vv_i, do_vsrai)
+DO_HELPER_VV_I(vsrai_d, 64, helper_vv_i, do_vsrai)
+
+static uint64_t vrotr(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    int32_t n = (uint64_t)(s2) % bit;
+
+    return u1 >> n | u1 << (bit - n);
+}
+
+static void do_vrotr(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vrotr(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vrotr(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vrotr(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vrotr(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vrotri(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vrotr(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vrotr(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vrotr(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vrotr(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vrotr_b, 8, helper_vvv, do_vrotr)
+DO_HELPER_VVV(vrotr_h, 16, helper_vvv, do_vrotr)
+DO_HELPER_VVV(vrotr_w, 32, helper_vvv, do_vrotr)
+DO_HELPER_VVV(vrotr_d, 64, helper_vvv, do_vrotr)
+DO_HELPER_VV_I(vrotri_b, 8, helper_vv_i, do_vrotri)
+DO_HELPER_VV_I(vrotri_h, 16, helper_vv_i, do_vrotri)
+DO_HELPER_VV_I(vrotri_w, 32, helper_vv_i, do_vrotri)
+DO_HELPER_VV_I(vrotri_d, 64, helper_vv_i, do_vrotri)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 24/43] target/loongarch: Implement vsllwil vextl
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (22 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 25/43] target/loongarch: Implement vsrlr vsrar Song Gao
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSLLWIL.{H.B/W.H/D.W};
- VSLLWIL.{HU.BU/WU.HU/DU.WU};
- VEXTL.Q.D, VEXTL.QU.DU.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 +++
 target/loongarch/helper.h                   |  9 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 +++
 target/loongarch/insns.decode               |  9 +++
 target/loongarch/lsx_helper.c               | 71 +++++++++++++++++++++
 5 files changed, 107 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index a422c9dfc8..18c4fd521a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1110,3 +1110,12 @@ INSN_LSX(vrotri_b,         vv_i)
 INSN_LSX(vrotri_h,         vv_i)
 INSN_LSX(vrotri_w,         vv_i)
 INSN_LSX(vrotri_d,         vv_i)
+
+INSN_LSX(vsllwil_h_b,      vv_i)
+INSN_LSX(vsllwil_w_h,      vv_i)
+INSN_LSX(vsllwil_d_w,      vv_i)
+INSN_LSX(vextl_q_d,        vv)
+INSN_LSX(vsllwil_hu_bu,    vv_i)
+INSN_LSX(vsllwil_wu_hu,    vv_i)
+INSN_LSX(vsllwil_du_wu,    vv_i)
+INSN_LSX(vextl_qu_du,      vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c7733a7180..e3ec216b14 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -465,3 +465,12 @@ DEF_HELPER_4(vrotri_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vrotri_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vrotri_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vrotri_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsllwil_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_d_w, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_q_d, void, env, i32, i32)
+DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 62aac7713b..8193e66fff 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -381,3 +381,12 @@ TRANS(vrotri_b, gen_vv_i, gen_helper_vrotri_b)
 TRANS(vrotri_h, gen_vv_i, gen_helper_vrotri_h)
 TRANS(vrotri_w, gen_vv_i, gen_helper_vrotri_w)
 TRANS(vrotri_d, gen_vv_i, gen_helper_vrotri_d)
+
+TRANS(vsllwil_h_b, gen_vv_i, gen_helper_vsllwil_h_b)
+TRANS(vsllwil_w_h, gen_vv_i, gen_helper_vsllwil_w_h)
+TRANS(vsllwil_d_w, gen_vv_i, gen_helper_vsllwil_d_w)
+TRANS(vextl_q_d, gen_vv, gen_helper_vextl_q_d)
+TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
+TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
+TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
+TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index aca3267206..29609b834e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -838,3 +838,12 @@ vrotri_b         0111 00101010 00000 01 ... ..... .....   @vv_ui3
 vrotri_h         0111 00101010 00000 1 .... ..... .....   @vv_ui4
 vrotri_w         0111 00101010 00001 ..... ..... .....    @vv_ui5
 vrotri_d         0111 00101010 0001 ...... ..... .....    @vv_ui6
+
+vsllwil_h_b      0111 00110000 10000 01 ... ..... .....   @vv_ui3
+vsllwil_w_h      0111 00110000 10000 1 .... ..... .....   @vv_ui4
+vsllwil_d_w      0111 00110000 10001 ..... ..... .....    @vv_ui5
+vextl_q_d        0111 00110000 10010 00000 ..... .....    @vv
+vsllwil_hu_bu    0111 00110000 11000 01 ... ..... .....   @vv_ui3
+vsllwil_wu_hu    0111 00110000 11000 1 .... ..... .....   @vv_ui4
+vsllwil_du_wu    0111 00110000 11001 ..... ..... .....    @vv_ui5
+vextl_qu_du      0111 00110000 11010 00000 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d8282b670e..91c1964d81 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -51,6 +51,24 @@ static  void helper_vv_i(CPULoongArchState *env,
     }
 }
 
+static void helper_vv_i_c(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm, int bit,
+                         void (*func)(vec_t*, vec_t*, uint32_t, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < LSX_LEN/bit; i++) {
+         func(&dest, Vj, imm, bit, i);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+}
+
 static void helper_vv(CPULoongArchState *env,
                       uint32_t vd, uint32_t vj, int bit,
                       void (*func)(vec_t*, vec_t*, int, int))
@@ -2187,3 +2205,56 @@ DO_HELPER_VV_I(vrotri_b, 8, helper_vv_i, do_vrotri)
 DO_HELPER_VV_I(vrotri_h, 16, helper_vv_i, do_vrotri)
 DO_HELPER_VV_I(vrotri_w, 32, helper_vv_i, do_vrotri)
 DO_HELPER_VV_I(vrotri_d, 64, helper_vv_i, do_vrotri)
+
+static void do_vsllwil_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = ((int8_t)Vj->B[n]) << ((uint64_t)(imm) % bit);
+        break;
+    case 32:
+        Vd->W[n] = ((int16_t)Vj->H[n]) << ((uint64_t)(imm) % bit);
+        break;
+    case 64:
+        Vd->D[n] = ((int64_t)(int32_t)Vj->W[n]) << ((uint64_t)(imm) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vextl_q_d(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    Vd->Q[0] = (__int128_t)Vj->D[0];
+}
+
+static void do_vsllwil_u(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->H[n] = ((uint8_t)Vj->B[n]) << ((uint64_t)(imm) % bit);
+        break;
+    case 32:
+        Vd->W[n] = ((uint16_t)Vj->H[n]) << ((uint64_t)(imm) % bit);
+        break;
+    case 64:
+        Vd->D[n] = ((uint64_t)(uint32_t)Vj->W[n]) << ((uint64_t)(imm) % bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vextl_qu_du(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+     Vd->Q[0] = (uint64_t)Vj->D[0];
+}
+
+DO_HELPER_VV_I(vsllwil_h_b, 16, helper_vv_i_c, do_vsllwil_s)
+DO_HELPER_VV_I(vsllwil_w_h, 32, helper_vv_i_c, do_vsllwil_s)
+DO_HELPER_VV_I(vsllwil_d_w, 64, helper_vv_i_c, do_vsllwil_s)
+DO_HELPER_VV(vextl_q_d, 128, helper_vv, do_vextl_q_d)
+DO_HELPER_VV_I(vsllwil_hu_bu, 16, helper_vv_i_c, do_vsllwil_u)
+DO_HELPER_VV_I(vsllwil_wu_hu, 32, helper_vv_i_c, do_vsllwil_u)
+DO_HELPER_VV_I(vsllwil_du_wu, 64, helper_vv_i_c, do_vsllwil_u)
+DO_HELPER_VV(vextl_qu_du, 128, helper_vv, do_vextl_qu_du)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 25/43] target/loongarch: Implement vsrlr vsrar
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (23 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 24/43] target/loongarch: Implement vsllwil vextl Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 26/43] target/loongarch: Implement vsrln vsran Song Gao
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLR[I].{B/H/W/D};
- VSRAR[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  18 +++
 target/loongarch/helper.h                   |  18 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  18 +++
 target/loongarch/insns.decode               |  18 +++
 target/loongarch/lsx_helper.c               | 124 ++++++++++++++++++++
 5 files changed, 196 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 18c4fd521a..766d934705 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1119,3 +1119,21 @@ INSN_LSX(vsllwil_hu_bu,    vv_i)
 INSN_LSX(vsllwil_wu_hu,    vv_i)
 INSN_LSX(vsllwil_du_wu,    vv_i)
 INSN_LSX(vextl_qu_du,      vv)
+
+INSN_LSX(vsrlr_b,          vvv)
+INSN_LSX(vsrlr_h,          vvv)
+INSN_LSX(vsrlr_w,          vvv)
+INSN_LSX(vsrlr_d,          vvv)
+INSN_LSX(vsrlri_b,         vv_i)
+INSN_LSX(vsrlri_h,         vv_i)
+INSN_LSX(vsrlri_w,         vv_i)
+INSN_LSX(vsrlri_d,         vv_i)
+
+INSN_LSX(vsrar_b,          vvv)
+INSN_LSX(vsrar_h,          vvv)
+INSN_LSX(vsrar_w,          vvv)
+INSN_LSX(vsrar_d,          vvv)
+INSN_LSX(vsrari_b,         vv_i)
+INSN_LSX(vsrari_h,         vv_i)
+INSN_LSX(vsrari_w,         vv_i)
+INSN_LSX(vsrari_d,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e3ec216b14..65438c00f1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -474,3 +474,21 @@ DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
+
+DEF_HELPER_4(vsrlr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrar_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 8193e66fff..9196ec3ed7 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -390,3 +390,21 @@ TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
 TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
 TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
 TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
+
+TRANS(vsrlr_b, gen_vvv, gen_helper_vsrlr_b)
+TRANS(vsrlr_h, gen_vvv, gen_helper_vsrlr_h)
+TRANS(vsrlr_w, gen_vvv, gen_helper_vsrlr_w)
+TRANS(vsrlr_d, gen_vvv, gen_helper_vsrlr_d)
+TRANS(vsrlri_b, gen_vv_i, gen_helper_vsrlri_b)
+TRANS(vsrlri_h, gen_vv_i, gen_helper_vsrlri_h)
+TRANS(vsrlri_w, gen_vv_i, gen_helper_vsrlri_w)
+TRANS(vsrlri_d, gen_vv_i, gen_helper_vsrlri_d)
+
+TRANS(vsrar_b, gen_vvv, gen_helper_vsrar_b)
+TRANS(vsrar_h, gen_vvv, gen_helper_vsrar_h)
+TRANS(vsrar_w, gen_vvv, gen_helper_vsrar_w)
+TRANS(vsrar_d, gen_vvv, gen_helper_vsrar_d)
+TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
+TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
+TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
+TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 29609b834e..eef25e2eef 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -847,3 +847,21 @@ vsllwil_hu_bu    0111 00110000 11000 01 ... ..... .....   @vv_ui3
 vsllwil_wu_hu    0111 00110000 11000 1 .... ..... .....   @vv_ui4
 vsllwil_du_wu    0111 00110000 11001 ..... ..... .....    @vv_ui5
 vextl_qu_du      0111 00110000 11010 00000 ..... .....    @vv
+
+vsrlr_b          0111 00001111 00000 ..... ..... .....    @vvv
+vsrlr_h          0111 00001111 00001 ..... ..... .....    @vvv
+vsrlr_w          0111 00001111 00010 ..... ..... .....    @vvv
+vsrlr_d          0111 00001111 00011 ..... ..... .....    @vvv
+vsrlri_b         0111 00101010 01000 01 ... ..... .....   @vv_ui3
+vsrlri_h         0111 00101010 01000 1 .... ..... .....   @vv_ui4
+vsrlri_w         0111 00101010 01001 ..... ..... .....    @vv_ui5
+vsrlri_d         0111 00101010 0101 ...... ..... .....    @vv_ui6
+
+vsrar_b          0111 00001111 00100 ..... ..... .....    @vvv
+vsrar_h          0111 00001111 00101 ..... ..... .....    @vvv
+vsrar_w          0111 00001111 00110 ..... ..... .....    @vvv
+vsrar_d          0111 00001111 00111 ..... ..... .....    @vvv
+vsrari_b         0111 00101010 10000 01 ... ..... .....   @vv_ui3
+vsrari_h         0111 00101010 10000 1 .... ..... .....   @vv_ui4
+vsrari_w         0111 00101010 10001 ..... ..... .....    @vv_ui5
+vsrari_d         0111 00101010 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 91c1964d81..529a81372b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2258,3 +2258,127 @@ DO_HELPER_VV_I(vsllwil_hu_bu, 16, helper_vv_i_c, do_vsllwil_u)
 DO_HELPER_VV_I(vsllwil_wu_hu, 32, helper_vv_i_c, do_vsllwil_u)
 DO_HELPER_VV_I(vsllwil_du_wu, 64, helper_vv_i_c, do_vsllwil_u)
 DO_HELPER_VV(vextl_qu_du, 128, helper_vv, do_vextl_qu_du)
+
+static int64_t vsrlr(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    int32_t n = (uint64_t)(s2 % bit);
+
+    if (n == 0) {
+        return u1;
+    } else {
+        uint64_t r_bit = (u1 >> (n  -1)) & 1;
+        return (u1 >> n) + r_bit;
+    }
+}
+
+static void do_vsrlr(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrlr(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrlr(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrlr(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrlr(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrlri(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrlr(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrlr(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrlr(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrlr(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsrlr_b, 8, helper_vvv, do_vsrlr)
+DO_HELPER_VVV(vsrlr_h, 16, helper_vvv, do_vsrlr)
+DO_HELPER_VVV(vsrlr_w, 32, helper_vvv, do_vsrlr)
+DO_HELPER_VVV(vsrlr_d, 64, helper_vvv, do_vsrlr)
+DO_HELPER_VVV(vsrlri_b, 8, helper_vv_i, do_vsrlri)
+DO_HELPER_VVV(vsrlri_h, 16, helper_vv_i, do_vsrlri)
+DO_HELPER_VVV(vsrlri_w, 32, helper_vv_i, do_vsrlri)
+DO_HELPER_VVV(vsrlri_d, 64, helper_vv_i, do_vsrlri)
+
+static int64_t vsrar(int64_t s1, int64_t s2, int bit)
+{
+    int32_t n = (uint64_t)(s2 % bit);
+
+    if (n == 0) {
+        return s1;
+    } else {
+        uint64_t r_bit = (s1 >> (n  -1)) & 1;
+        return (s1 >> n) + r_bit;
+    }
+}
+
+static void do_vsrar(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrar(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrar(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrar(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrar(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrari(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsrar(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vsrar(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vsrar(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vsrar(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsrar_b, 8, helper_vvv, do_vsrar)
+DO_HELPER_VVV(vsrar_h, 16, helper_vvv, do_vsrar)
+DO_HELPER_VVV(vsrar_w, 32, helper_vvv, do_vsrar)
+DO_HELPER_VVV(vsrar_d, 64, helper_vvv, do_vsrar)
+DO_HELPER_VVV(vsrari_b, 8, helper_vv_i, do_vsrari)
+DO_HELPER_VVV(vsrari_h, 16, helper_vv_i, do_vsrari)
+DO_HELPER_VVV(vsrari_w, 32, helper_vv_i, do_vsrari)
+DO_HELPER_VVV(vsrari_d, 64, helper_vv_i, do_vsrari)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 26/43] target/loongarch: Implement vsrln vsran
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (24 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 25/43] target/loongarch: Implement vsrlr vsrar Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 27/43] target/loongarch: Implement vsrlrn vsrarn Song Gao
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLN.{B.H/H.W/W.D};
- VSRAN.{B.H/H.W/W.D};
- VSRLNI.{B.H/H.W/W.D/D.Q};
- VSRANI.{B.H/H.W/W.D/D.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  16 +++
 target/loongarch/helper.h                   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 134 ++++++++++++++++++++
 5 files changed, 199 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 766d934705..e6f4411b43 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1137,3 +1137,19 @@ INSN_LSX(vsrari_b,         vv_i)
 INSN_LSX(vsrari_h,         vv_i)
 INSN_LSX(vsrari_w,         vv_i)
 INSN_LSX(vsrari_d,         vv_i)
+
+INSN_LSX(vsrln_b_h,       vvv)
+INSN_LSX(vsrln_h_w,       vvv)
+INSN_LSX(vsrln_w_d,       vvv)
+INSN_LSX(vsran_b_h,       vvv)
+INSN_LSX(vsran_h_w,       vvv)
+INSN_LSX(vsran_w_d,       vvv)
+
+INSN_LSX(vsrlni_b_h,       vv_i)
+INSN_LSX(vsrlni_h_w,       vv_i)
+INSN_LSX(vsrlni_w_d,       vv_i)
+INSN_LSX(vsrlni_d_q,       vv_i)
+INSN_LSX(vsrani_b_h,       vv_i)
+INSN_LSX(vsrani_h_w,       vv_i)
+INSN_LSX(vsrani_w_d,       vv_i)
+INSN_LSX(vsrani_d_q,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 65438c00f1..eccfbfbb3e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -492,3 +492,19 @@ DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 9196ec3ed7..5b4410852d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -408,3 +408,19 @@ TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
 TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
 TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
 TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
+
+TRANS(vsrln_b_h, gen_vvv, gen_helper_vsrln_b_h)
+TRANS(vsrln_h_w, gen_vvv, gen_helper_vsrln_h_w)
+TRANS(vsrln_w_d, gen_vvv, gen_helper_vsrln_w_d)
+TRANS(vsran_b_h, gen_vvv, gen_helper_vsran_b_h)
+TRANS(vsran_h_w, gen_vvv, gen_helper_vsran_h_w)
+TRANS(vsran_w_d, gen_vvv, gen_helper_vsran_w_d)
+
+TRANS(vsrlni_b_h, gen_vv_i, gen_helper_vsrlni_b_h)
+TRANS(vsrlni_h_w, gen_vv_i, gen_helper_vsrlni_h_w)
+TRANS(vsrlni_w_d, gen_vv_i, gen_helper_vsrlni_w_d)
+TRANS(vsrlni_d_q, gen_vv_i, gen_helper_vsrlni_d_q)
+TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
+TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
+TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
+TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index eef25e2eef..859def6752 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -502,6 +502,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 @vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
+@vv_ui7             .... ........ ... imm:7 vj:5 vd:5    &vv_i
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
@@ -865,3 +866,19 @@ vsrari_b         0111 00101010 10000 01 ... ..... .....   @vv_ui3
 vsrari_h         0111 00101010 10000 1 .... ..... .....   @vv_ui4
 vsrari_w         0111 00101010 10001 ..... ..... .....    @vv_ui5
 vsrari_d         0111 00101010 1001 ...... ..... .....    @vv_ui6
+
+vsrln_b_h        0111 00001111 01001 ..... ..... .....    @vvv
+vsrln_h_w        0111 00001111 01010 ..... ..... .....    @vvv
+vsrln_w_d        0111 00001111 01011 ..... ..... .....    @vvv
+vsran_b_h        0111 00001111 01101 ..... ..... .....    @vvv
+vsran_h_w        0111 00001111 01110 ..... ..... .....    @vvv
+vsran_w_d        0111 00001111 01111 ..... ..... .....    @vvv
+
+vsrlni_b_h       0111 00110100 00000 1 .... ..... .....   @vv_ui4
+vsrlni_h_w       0111 00110100 00001 ..... ..... .....    @vv_ui5
+vsrlni_w_d       0111 00110100 0001 ...... ..... .....    @vv_ui6
+vsrlni_d_q       0111 00110100 001 ....... ..... .....    @vv_ui7
+vsrani_b_h       0111 00110101 10000 1 .... ..... .....   @vv_ui4
+vsrani_h_w       0111 00110101 10001 ..... ..... .....    @vv_ui5
+vsrani_w_d       0111 00110101 1001 ...... ..... .....    @vv_ui6
+vsrani_d_q       0111 00110101 101 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 529a81372b..30b8da837a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2382,3 +2382,137 @@ DO_HELPER_VVV(vsrari_b, 8, helper_vv_i, do_vsrari)
 DO_HELPER_VVV(vsrari_h, 16, helper_vv_i, do_vsrari)
 DO_HELPER_VVV(vsrari_w, 32, helper_vv_i, do_vsrari)
 DO_HELPER_VVV(vsrari_d, 64, helper_vv_i, do_vsrari)
+
+static void helper_vvv_hz(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t vk, int bit,
+                          void (*func)(vec_t*, vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, Vk, bit, i);
+    }
+    Vd->D[1] = 0;
+}
+
+static void do_vsrln(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = (uint16_t)Vj->H[n] >> (Vk->H[n] & 0xf);
+        break;
+    case 32:
+        Vd->H[n] = (uint32_t)Vj->W[n] >> (Vk->W[n] & 0x1f);
+        break;
+    case 64:
+        Vd->W[n] = (uint64_t)Vj->D[n] >> (Vk->D[n] & 0x3f);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsran(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = Vj->H[n] >> (Vk->H[n] & 0xf);
+        break;
+    case 32:
+        Vd->H[n] = Vj->W[n] >> (Vk->W[n] & 0x1f);
+        break;
+    case 64:
+        Vd->W[n] = Vj->D[n] >> (Vk->D[n] & 0x3f);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsrln_b_h, 16, helper_vvv_hz, do_vsrln)
+DO_HELPER_VVV(vsrln_h_w, 32, helper_vvv_hz, do_vsrln)
+DO_HELPER_VVV(vsrln_w_d, 64, helper_vvv_hz, do_vsrln)
+DO_HELPER_VVV(vsran_b_h, 16, helper_vvv_hz, do_vsran)
+DO_HELPER_VVV(vsran_h_w, 32, helper_vvv_hz, do_vsran)
+DO_HELPER_VVV(vsran_w_d, 64, helper_vvv_hz, do_vsran)
+
+static void helper_vv_ni_c(CPULoongArchState *env,
+                           uint32_t vd, uint32_t vj, uint32_t imm, int bit,
+                           void (*func)(vec_t*, vec_t*, vec_t*,
+                                        uint32_t, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < LSX_LEN/bit; i++) {
+         func(&dest, Vd, Vj, imm, bit, i);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+}
+
+static void do_vsrlni(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                      uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = (uint16_t)Vj->H[n] >> imm;
+        dest->B[n + 128/bit] = (uint16_t)Vd->H[n] >> imm;
+        break;
+    case 32:
+        dest->H[n] = (uint32_t)Vj->W[n] >> imm;
+        dest->H[n + 128/bit] = (uint32_t)Vd->W[n] >> imm;
+        break;
+    case 64:
+        dest->W[n] = (uint64_t)Vj->D[n] >> imm;
+        dest->W[n + 128/bit] = (uint64_t)Vd->D[n] >> imm;
+        break;
+    case 128:
+        dest->D[n] = (__uint128_t)Vj->Q[n] >> imm;
+        dest->D[n + 128/bit] = (__uint128_t)Vd->Q[n] >> imm;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrani(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                      uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = Vj->H[n] >> imm;
+        dest->B[n + 128/bit] = Vd->H[n] >> imm;
+        break;
+    case 32:
+        dest->H[n] = Vj->W[n] >> imm;
+        dest->H[n + 128/bit] = Vd->W[n] >> imm;
+        break;
+    case 64:
+        dest->W[n] = Vj->D[n] >> imm;
+        dest->W[n + 128/bit] = Vd->D[n] >> imm;
+        break;
+    case 128:
+        dest->D[n] = (__int128_t)Vj->Q[n] >> imm;
+        dest->D[n + 128/bit] = (__int128_t)Vd->Q[n] >> imm;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vsrlni_b_h, 16, helper_vv_ni_c, do_vsrlni)
+DO_HELPER_VV_I(vsrlni_h_w, 32, helper_vv_ni_c, do_vsrlni)
+DO_HELPER_VV_I(vsrlni_w_d, 64, helper_vv_ni_c, do_vsrlni)
+DO_HELPER_VV_I(vsrlni_d_q, 128, helper_vv_ni_c, do_vsrlni)
+DO_HELPER_VV_I(vsrani_b_h, 16, helper_vv_ni_c, do_vsrani)
+DO_HELPER_VV_I(vsrani_h_w, 32, helper_vv_ni_c, do_vsrani)
+DO_HELPER_VV_I(vsrani_w_d, 64, helper_vv_ni_c, do_vsrani)
+DO_HELPER_VV_I(vsrani_d_q, 128, helper_vv_ni_c, do_vsrani)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 27/43] target/loongarch: Implement vsrlrn vsrarn
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (25 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 26/43] target/loongarch: Implement vsrln vsran Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 28/43] target/loongarch: Implement vssrln vssran Song Gao
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLRN.{B.H/H.W/W.D};
- VSRARN.{B.H/H.W/W.D};
- VSRLRNI.{B.H/H.W/W.D/D.Q};
- VSRARNI.{B.H/H.W/W.D/D.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  16 +++
 target/loongarch/helper.h                   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode               |  16 +++
 target/loongarch/lsx_helper.c               | 108 ++++++++++++++++++++
 5 files changed, 172 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e6f4411b43..507f34feaa 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1153,3 +1153,19 @@ INSN_LSX(vsrani_b_h,       vv_i)
 INSN_LSX(vsrani_h_w,       vv_i)
 INSN_LSX(vsrani_w_d,       vv_i)
 INSN_LSX(vsrani_d_q,       vv_i)
+
+INSN_LSX(vsrlrn_b_h,       vvv)
+INSN_LSX(vsrlrn_h_w,       vvv)
+INSN_LSX(vsrlrn_w_d,       vvv)
+INSN_LSX(vsrarn_b_h,       vvv)
+INSN_LSX(vsrarn_h_w,       vvv)
+INSN_LSX(vsrarn_w_d,       vvv)
+
+INSN_LSX(vsrlrni_b_h,      vv_i)
+INSN_LSX(vsrlrni_h_w,      vv_i)
+INSN_LSX(vsrlrni_w_d,      vv_i)
+INSN_LSX(vsrlrni_d_q,      vv_i)
+INSN_LSX(vsrarni_b_h,      vv_i)
+INSN_LSX(vsrarni_h_w,      vv_i)
+INSN_LSX(vsrarni_w_d,      vv_i)
+INSN_LSX(vsrarni_d_q,      vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index eccfbfbb3e..bb868961d1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -508,3 +508,19 @@ DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5b4410852d..d3ab0a4a6a 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -424,3 +424,19 @@ TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
 TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
 TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
 TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
+
+TRANS(vsrlrn_b_h, gen_vvv, gen_helper_vsrlrn_b_h)
+TRANS(vsrlrn_h_w, gen_vvv, gen_helper_vsrlrn_h_w)
+TRANS(vsrlrn_w_d, gen_vvv, gen_helper_vsrlrn_w_d)
+TRANS(vsrarn_b_h, gen_vvv, gen_helper_vsrarn_b_h)
+TRANS(vsrarn_h_w, gen_vvv, gen_helper_vsrarn_h_w)
+TRANS(vsrarn_w_d, gen_vvv, gen_helper_vsrarn_w_d)
+
+TRANS(vsrlrni_b_h, gen_vv_i, gen_helper_vsrlrni_b_h)
+TRANS(vsrlrni_h_w, gen_vv_i, gen_helper_vsrlrni_h_w)
+TRANS(vsrlrni_w_d, gen_vv_i, gen_helper_vsrlrni_w_d)
+TRANS(vsrlrni_d_q, gen_vv_i, gen_helper_vsrlrni_d_q)
+TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
+TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
+TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
+TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 859def6752..0b30175f6b 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -882,3 +882,19 @@ vsrani_b_h       0111 00110101 10000 1 .... ..... .....   @vv_ui4
 vsrani_h_w       0111 00110101 10001 ..... ..... .....    @vv_ui5
 vsrani_w_d       0111 00110101 1001 ...... ..... .....    @vv_ui6
 vsrani_d_q       0111 00110101 101 ....... ..... .....    @vv_ui7
+
+vsrlrn_b_h       0111 00001111 10001 ..... ..... .....    @vvv
+vsrlrn_h_w       0111 00001111 10010 ..... ..... .....    @vvv
+vsrlrn_w_d       0111 00001111 10011 ..... ..... .....    @vvv
+vsrarn_b_h       0111 00001111 10101 ..... ..... .....    @vvv
+vsrarn_h_w       0111 00001111 10110 ..... ..... .....    @vvv
+vsrarn_w_d       0111 00001111 10111 ..... ..... .....    @vvv
+
+vsrlrni_b_h      0111 00110100 01000 1 .... ..... .....   @vv_ui4
+vsrlrni_h_w      0111 00110100 01001 ..... ..... .....    @vv_ui5
+vsrlrni_w_d      0111 00110100 0101 ...... ..... .....    @vv_ui6
+vsrlrni_d_q      0111 00110100 011 ....... ..... .....    @vv_ui7
+vsrarni_b_h      0111 00110101 11000 1 .... ..... .....   @vv_ui4
+vsrarni_h_w      0111 00110101 11001 ..... ..... .....    @vv_ui5
+vsrarni_w_d      0111 00110101 1101 ...... ..... .....    @vv_ui6
+vsrarni_d_q      0111 00110101 111 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 30b8da837a..8ccfa75fe3 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2516,3 +2516,111 @@ DO_HELPER_VV_I(vsrani_b_h, 16, helper_vv_ni_c, do_vsrani)
 DO_HELPER_VV_I(vsrani_h_w, 32, helper_vv_ni_c, do_vsrani)
 DO_HELPER_VV_I(vsrani_w_d, 64, helper_vv_ni_c, do_vsrani)
 DO_HELPER_VV_I(vsrani_d_q, 128, helper_vv_ni_c, do_vsrani)
+
+static void do_vsrlrn(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = vsrlr((uint16_t)Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->H[n] = vsrlr((uint32_t)Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->W[n] = vsrlr((uint64_t)Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrarn(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = vsrar(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->H[n] = vsrar(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->W[n] = vsrar(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsrlrn_b_h, 16, helper_vvv_hz, do_vsrlrn)
+DO_HELPER_VVV(vsrlrn_h_w, 32, helper_vvv_hz, do_vsrlrn)
+DO_HELPER_VVV(vsrlrn_w_d, 64, helper_vvv_hz, do_vsrlrn)
+DO_HELPER_VVV(vsrarn_b_h, 16, helper_vvv_hz, do_vsrarn)
+DO_HELPER_VVV(vsrarn_h_w, 32, helper_vvv_hz, do_vsrarn)
+DO_HELPER_VVV(vsrarn_w_d, 64, helper_vvv_hz, do_vsrarn)
+
+static __int128_t vsrlrn(__int128_t s1, uint32_t imm)
+{
+    if (imm == 0) {
+        return s1;
+    } else {
+        __uint128_t t1 = (__uint128_t)1 << (imm -1);
+        return (s1 + t1) >> imm;
+    }
+}
+
+static void do_vsrlrni(vec_t *dest, vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = vsrlrn((uint16_t)Vj->H[n], imm);
+        dest->B[n + 128 / bit] = vsrlrn((uint16_t)Vd->H[n], imm);
+        break;
+    case 32:
+        dest->H[n] = vsrlrn((uint32_t)Vj->W[n], imm);
+        dest->H[n + 128 / bit] = vsrlrn((uint32_t)Vd->W[n], imm);
+        break;
+    case 64:
+        dest->W[n] = vsrlrn((uint64_t)Vj->D[n], imm);
+        dest->W[n + 128 / bit] = vsrlrn((uint64_t)Vd->D[n], imm);
+        break;
+    case 128:
+        dest->D[n] = vsrlrn((__uint128_t)Vj->Q[n], imm);
+        dest->D[n + 128 / bit] = vsrlrn((__uint128_t)Vd->Q[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vsrarni(vec_t *dest, vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = vsrlrn(Vj->H[n], imm);
+        dest->B[n + 128 / bit] = vsrlrn(Vd->H[n], imm);
+        break;
+    case 32:
+        dest->H[n] = vsrlrn(Vj->W[n], imm);
+        dest->H[n + 128 / bit] = vsrlrn(Vd->W[n], imm);
+        break;
+    case 64:
+        dest->W[n] = vsrlrn(Vj->D[n], imm);
+        dest->W[n + 128 / bit] = vsrlrn(Vd->D[n], imm);
+        break;
+    case 128:
+        dest->D[n] = vsrlrn(Vj->Q[n], imm);
+        dest->D[n + 128 / bit] = vsrlrn(Vd->Q[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vsrlrni_b_h, 16, helper_vv_ni_c, do_vsrlrni)
+DO_HELPER_VV_I(vsrlrni_h_w, 32, helper_vv_ni_c, do_vsrlrni)
+DO_HELPER_VV_I(vsrlrni_w_d, 64, helper_vv_ni_c, do_vsrlrni)
+DO_HELPER_VV_I(vsrlrni_d_q, 128, helper_vv_ni_c, do_vsrlrni)
+DO_HELPER_VV_I(vsrarni_b_h, 16, helper_vv_ni_c, do_vsrarni)
+DO_HELPER_VV_I(vsrarni_h_w, 32, helper_vv_ni_c, do_vsrarni)
+DO_HELPER_VV_I(vsrarni_w_d, 64, helper_vv_ni_c, do_vsrarni)
+DO_HELPER_VV_I(vsrarni_d_q, 128, helper_vv_ni_c, do_vsrarni)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 28/43] target/loongarch: Implement vssrln vssran
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (26 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 27/43] target/loongarch: Implement vsrlrn vsrarn Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 29/43] target/loongarch: Implement vssrlrn vssrarn Song Gao
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSSRLN.{B.H/H.W/W.D};
- VSSRAN.{B.H/H.W/W.D};
- VSSRLN.{BU.H/HU.W/WU.D};
- VSSRAN.{BU.H/HU.W/WU.D};
- VSSRLNI.{B.H/H.W/W.D/D.Q};
- VSSRANI.{B.H/H.W/W.D/D.Q};
- VSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRANI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  30 +++
 target/loongarch/helper.h                   |  30 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 +++
 target/loongarch/insns.decode               |  30 +++
 target/loongarch/lsx_helper.c               | 267 ++++++++++++++++++++
 5 files changed, 387 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 507f34feaa..1b9bd6bb86 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1169,3 +1169,33 @@ INSN_LSX(vsrarni_b_h,      vv_i)
 INSN_LSX(vsrarni_h_w,      vv_i)
 INSN_LSX(vsrarni_w_d,      vv_i)
 INSN_LSX(vsrarni_d_q,      vv_i)
+
+INSN_LSX(vssrln_b_h,       vvv)
+INSN_LSX(vssrln_h_w,       vvv)
+INSN_LSX(vssrln_w_d,       vvv)
+INSN_LSX(vssran_b_h,       vvv)
+INSN_LSX(vssran_h_w,       vvv)
+INSN_LSX(vssran_w_d,       vvv)
+INSN_LSX(vssrln_bu_h,      vvv)
+INSN_LSX(vssrln_hu_w,      vvv)
+INSN_LSX(vssrln_wu_d,      vvv)
+INSN_LSX(vssran_bu_h,      vvv)
+INSN_LSX(vssran_hu_w,      vvv)
+INSN_LSX(vssran_wu_d,      vvv)
+
+INSN_LSX(vssrlni_b_h,      vv_i)
+INSN_LSX(vssrlni_h_w,      vv_i)
+INSN_LSX(vssrlni_w_d,      vv_i)
+INSN_LSX(vssrlni_d_q,      vv_i)
+INSN_LSX(vssrani_b_h,      vv_i)
+INSN_LSX(vssrani_h_w,      vv_i)
+INSN_LSX(vssrani_w_d,      vv_i)
+INSN_LSX(vssrani_d_q,      vv_i)
+INSN_LSX(vssrlni_bu_h,     vv_i)
+INSN_LSX(vssrlni_hu_w,     vv_i)
+INSN_LSX(vssrlni_wu_d,     vv_i)
+INSN_LSX(vssrlni_du_q,     vv_i)
+INSN_LSX(vssrani_bu_h,     vv_i)
+INSN_LSX(vssrani_hu_w,     vv_i)
+INSN_LSX(vssrani_wu_d,     vv_i)
+INSN_LSX(vssrani_du_q,     vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index bb868961d1..4585f0eb55 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -524,3 +524,33 @@ DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index d3ab0a4a6a..39e0e53677 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -440,3 +440,33 @@ TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
 TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
 TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
 TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
+
+TRANS(vssrln_b_h, gen_vvv, gen_helper_vssrln_b_h)
+TRANS(vssrln_h_w, gen_vvv, gen_helper_vssrln_h_w)
+TRANS(vssrln_w_d, gen_vvv, gen_helper_vssrln_w_d)
+TRANS(vssran_b_h, gen_vvv, gen_helper_vssran_b_h)
+TRANS(vssran_h_w, gen_vvv, gen_helper_vssran_h_w)
+TRANS(vssran_w_d, gen_vvv, gen_helper_vssran_w_d)
+TRANS(vssrln_bu_h, gen_vvv, gen_helper_vssrln_bu_h)
+TRANS(vssrln_hu_w, gen_vvv, gen_helper_vssrln_hu_w)
+TRANS(vssrln_wu_d, gen_vvv, gen_helper_vssrln_wu_d)
+TRANS(vssran_bu_h, gen_vvv, gen_helper_vssran_bu_h)
+TRANS(vssran_hu_w, gen_vvv, gen_helper_vssran_hu_w)
+TRANS(vssran_wu_d, gen_vvv, gen_helper_vssran_wu_d)
+
+TRANS(vssrlni_b_h, gen_vv_i, gen_helper_vssrlni_b_h)
+TRANS(vssrlni_h_w, gen_vv_i, gen_helper_vssrlni_h_w)
+TRANS(vssrlni_w_d, gen_vv_i, gen_helper_vssrlni_w_d)
+TRANS(vssrlni_d_q, gen_vv_i, gen_helper_vssrlni_d_q)
+TRANS(vssrani_b_h, gen_vv_i, gen_helper_vssrani_b_h)
+TRANS(vssrani_h_w, gen_vv_i, gen_helper_vssrani_h_w)
+TRANS(vssrani_w_d, gen_vv_i, gen_helper_vssrani_w_d)
+TRANS(vssrani_d_q, gen_vv_i, gen_helper_vssrani_d_q)
+TRANS(vssrlni_bu_h, gen_vv_i, gen_helper_vssrlni_bu_h)
+TRANS(vssrlni_hu_w, gen_vv_i, gen_helper_vssrlni_hu_w)
+TRANS(vssrlni_wu_d, gen_vv_i, gen_helper_vssrlni_wu_d)
+TRANS(vssrlni_du_q, gen_vv_i, gen_helper_vssrlni_du_q)
+TRANS(vssrani_bu_h, gen_vv_i, gen_helper_vssrani_bu_h)
+TRANS(vssrani_hu_w, gen_vv_i, gen_helper_vssrani_hu_w)
+TRANS(vssrani_wu_d, gen_vv_i, gen_helper_vssrani_wu_d)
+TRANS(vssrani_du_q, gen_vv_i, gen_helper_vssrani_du_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0b30175f6b..3e1b4084bb 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -898,3 +898,33 @@ vsrarni_b_h      0111 00110101 11000 1 .... ..... .....   @vv_ui4
 vsrarni_h_w      0111 00110101 11001 ..... ..... .....    @vv_ui5
 vsrarni_w_d      0111 00110101 1101 ...... ..... .....    @vv_ui6
 vsrarni_d_q      0111 00110101 111 ....... ..... .....    @vv_ui7
+
+vssrln_b_h       0111 00001111 11001 ..... ..... .....    @vvv
+vssrln_h_w       0111 00001111 11010 ..... ..... .....    @vvv
+vssrln_w_d       0111 00001111 11011 ..... ..... .....    @vvv
+vssran_b_h       0111 00001111 11101 ..... ..... .....    @vvv
+vssran_h_w       0111 00001111 11110 ..... ..... .....    @vvv
+vssran_w_d       0111 00001111 11111 ..... ..... .....    @vvv
+vssrln_bu_h      0111 00010000 01001 ..... ..... .....    @vvv
+vssrln_hu_w      0111 00010000 01010 ..... ..... .....    @vvv
+vssrln_wu_d      0111 00010000 01011 ..... ..... .....    @vvv
+vssran_bu_h      0111 00010000 01101 ..... ..... .....    @vvv
+vssran_hu_w      0111 00010000 01110 ..... ..... .....    @vvv
+vssran_wu_d      0111 00010000 01111 ..... ..... .....    @vvv
+
+vssrlni_b_h      0111 00110100 10000 1 .... ..... .....   @vv_ui4
+vssrlni_h_w      0111 00110100 10001 ..... ..... .....    @vv_ui5
+vssrlni_w_d      0111 00110100 1001 ...... ..... .....    @vv_ui6
+vssrlni_d_q      0111 00110100 101 ....... ..... .....    @vv_ui7
+vssrani_b_h      0111 00110110 00000 1 .... ..... .....   @vv_ui4
+vssrani_h_w      0111 00110110 00001 ..... ..... .....    @vv_ui5
+vssrani_w_d      0111 00110110 0001 ...... ..... .....    @vv_ui6
+vssrani_d_q      0111 00110110 001 ....... ..... .....    @vv_ui7
+vssrlni_bu_h     0111 00110100 11000 1 .... ..... .....   @vv_ui4
+vssrlni_hu_w     0111 00110100 11001 ..... ..... .....    @vv_ui5
+vssrlni_wu_d     0111 00110100 1101 ...... ..... .....    @vv_ui6
+vssrlni_du_q     0111 00110100 111 ....... ..... .....    @vv_ui7
+vssrani_bu_h     0111 00110110 01000 1 .... ..... .....   @vv_ui4
+vssrani_hu_w     0111 00110110 01001 ..... ..... .....    @vv_ui5
+vssrani_wu_d     0111 00110110 0101 ...... ..... .....    @vv_ui6
+vssrani_du_q     0111 00110110 011 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 8ccfa75fe3..6704eb4ea5 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2624,3 +2624,270 @@ DO_HELPER_VV_I(vsrarni_b_h, 16, helper_vv_ni_c, do_vsrarni)
 DO_HELPER_VV_I(vsrarni_h_w, 32, helper_vv_ni_c, do_vsrarni)
 DO_HELPER_VV_I(vsrarni_w_d, 64, helper_vv_ni_c, do_vsrarni)
 DO_HELPER_VV_I(vsrarni_d_q, 128, helper_vv_ni_c, do_vsrarni)
+
+static int64_t vsra(int64_t s1, int64_t s2, int bit)
+{
+    return (s1 >> ((uint64_t)(s2) % bit));
+}
+
+static void do_vssrln(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_s(vsrl((uint16_t)Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_s(vsrl((uint32_t)Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_s(vsrl((uint64_t)Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssran(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_s(vsra(Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_s(vsra(Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_s(vsra(Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vssrln_b_h, 16, helper_vvv_hz, do_vssrln)
+DO_HELPER_VVV(vssrln_h_w, 32, helper_vvv_hz, do_vssrln)
+DO_HELPER_VVV(vssrln_w_d, 64, helper_vvv_hz, do_vssrln)
+DO_HELPER_VVV(vssran_b_h, 16, helper_vvv_hz, do_vssran)
+DO_HELPER_VVV(vssran_h_w, 32, helper_vvv_hz, do_vssran)
+DO_HELPER_VVV(vssran_w_d, 64, helper_vvv_hz, do_vssran)
+
+static void do_vssrln_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_u(vsrl((uint16_t)Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_u(vsrl((uint32_t)Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_u(vsrl((uint64_t)Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssran_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_u(vsra(Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        if (Vd->B[n] < 0) {
+            Vd->B[n] = 0;
+        }
+        break;
+    case 32:
+        Vd->H[n] = sat_u(vsra(Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        if (Vd->H[n] < 0) {
+            Vd->H[n] = 0;
+        }
+        break;
+    case 64:
+        Vd->W[n] = sat_u(vsra(Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        if (Vd->W[n] < 0) {
+            Vd->W[n] = 0;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vssrln_bu_h, 16, helper_vvv_hz, do_vssrln_u)
+DO_HELPER_VVV(vssrln_hu_w, 32, helper_vvv_hz, do_vssrln_u)
+DO_HELPER_VVV(vssrln_wu_d, 64, helper_vvv_hz, do_vssrln_u)
+DO_HELPER_VVV(vssran_bu_h, 16, helper_vvv_hz, do_vssran_u)
+DO_HELPER_VVV(vssran_hu_w, 32, helper_vvv_hz, do_vssran_u)
+DO_HELPER_VVV(vssran_wu_d, 64, helper_vvv_hz, do_vssran_u)
+
+static int64_t sat_s_128u(__uint128_t u1, uint32_t imm)
+{
+    uint64_t max = MAKE_64BIT_MASK(0, imm);
+    return u1 < max ? u1: max;
+}
+
+static void do_vssrlni(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                       uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_s((((uint16_t)Vj->H[n]) >> imm), bit/2 - 1);
+        dest->B[n + 128/bit] = sat_s((((uint16_t)Vd->H[n]) >> imm), bit/2 -1);
+        break;
+    case 32:
+        dest->H[n] = sat_s((((uint32_t)Vj->W[n]) >> imm), bit/2 - 1);
+        dest->H[n + 128/bit] = sat_s((((uint32_t)Vd->W[n]) >> imm), bit/2 - 1);
+        break;
+    case 64:
+        dest->W[n] = sat_s((((uint64_t)Vj->D[n]) >> imm), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_s((((uint64_t)Vd->D[n]) >> imm), bit/2 - 1);
+        break;
+    case 128:
+        dest->D[n] = sat_s_128u((((__uint128_t)Vj->Q[n]) >> imm), bit/2 - 1);
+        dest->D[n + 128/bit] = sat_s_128u((((__uint128_t)Vd->Q[n]) >> imm),
+                                          bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int64_t sat_s_128(__int128_t s1, uint32_t imm)
+{
+    int64_t max = MAKE_64BIT_MASK(0, imm);
+    int64_t min = MAKE_64BIT_MASK(imm, 64);
+
+    if (s1 > max -1) {
+        return max;
+    } else if (s1 < - max) {
+        return min;
+    } else {
+        return s1;
+    }
+}
+
+static void do_vssrani(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                       uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_s((Vj->H[n] >> imm), bit/2 - 1);
+        dest->B[n + 128/bit] = sat_s((Vd->H[n] >> imm), bit/2 - 1);
+        break;
+    case 32:
+        dest->H[n] = sat_s((Vj->W[n] >> imm), bit/2 - 1);
+        dest->H[n + 128/bit] = sat_s((Vd->W[n] >> imm), bit/2 - 1);
+        break;
+    case 64:
+        dest->W[n] = sat_s((Vj->D[n] >> imm), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_s((Vd->D[n] >> imm), bit/2 - 1);
+        break;
+    case 128:
+        dest->D[n] = sat_s_128(((__int128_t)Vj->Q[n] >> imm), bit/2 - 1);
+        dest->D[n + 128/bit] = sat_s_128(((__int128_t)Vd->Q[n] >> imm),
+                                         bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vssrlni_b_h, 16, helper_vv_ni_c, do_vssrlni)
+DO_HELPER_VV_I(vssrlni_h_w, 32, helper_vv_ni_c, do_vssrlni)
+DO_HELPER_VV_I(vssrlni_w_d, 64, helper_vv_ni_c, do_vssrlni)
+DO_HELPER_VV_I(vssrlni_d_q, 128, helper_vv_ni_c, do_vssrlni)
+DO_HELPER_VV_I(vssrani_b_h, 16, helper_vv_ni_c, do_vssrani)
+DO_HELPER_VV_I(vssrani_h_w, 32, helper_vv_ni_c, do_vssrani)
+DO_HELPER_VV_I(vssrani_w_d, 64, helper_vv_ni_c, do_vssrani)
+DO_HELPER_VV_I(vssrani_d_q, 128, helper_vv_ni_c, do_vssrani)
+
+static int64_t sat_u_128(__uint128_t u1, uint32_t imm)
+{
+    uint64_t max = MAKE_64BIT_MASK(0, imm + 1);
+    return u1 < max ? u1 : max;
+}
+
+static void do_vssrlni_u(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                         uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_u((((uint16_t)Vj->H[n]) >> imm),  bit/2 -1);
+        dest->B[n + 128/bit] = sat_u((((uint16_t)Vd->H[n]) >> imm), bit/2 -1);
+        break;
+    case 32:
+        dest->H[n] = sat_u((((uint32_t)Vj->W[n]) >> imm), imm);
+        dest->H[n + 128/bit] = sat_u((((uint32_t)Vd->W[n]) >> imm), bit/2 -1);
+        break;
+    case 64:
+        dest->W[n] = sat_u((((uint64_t)Vj->D[n]) >> imm), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_u((((uint64_t)Vd->D[n]) >> imm), bit/2 -1);
+        break;
+    case 128:
+        dest->D[n] = sat_u_128((((__uint128_t)Vj->Q[n]) >> imm), bit/2 - 1);
+        dest->D[n + 128/bit] = sat_u_128((((__uint128_t)Vd->Q[n]) >> imm),
+                                         bit/2 -1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void  do_vssrani_u(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                          uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_u((Vj->H[n] >> imm), bit/2 - 1);
+        if (dest->B[n] < 0) {
+            dest->B[n] = 0;
+        }
+        dest->B[n + 128/bit] = sat_u((Vd->H[n] >> imm), bit/2 - 1);
+        if (dest->B[n + 128/bit] < 0) {
+            dest->B[n + 128/bit] = 0;
+        }
+        break;
+    case 32:
+        dest->H[n] = sat_u((Vj->W[n] >> imm), bit/2 - 1);
+        if (dest->H[n] < 0) {
+            dest->H[n] = 0;
+        }
+        dest->H[n + 128/bit] = sat_u((Vd->W[n] >> imm), bit/2 - 1);
+        if (dest->H[n + 128/bit] < 0) {
+            dest->H[n + 128/bit] = 0;
+        }
+        break;
+    case 64:
+        dest->W[n] = sat_u((Vj->D[n] >> imm), bit/2 - 1);
+        if (dest->W[n] < 0) {
+            dest->W[n] = 0;
+        }
+        dest->W[n + 128/bit] = sat_u((Vd->D[n] >> imm), bit/2 - 1);
+        if (dest->W[n + 128/bit] < 0) {
+            dest->W[n + 128/bit] = 0;
+        }
+        break;
+    case 128:
+        dest->D[n] = sat_u_128((Vj->Q[n] >> imm), bit/2 - 1);
+        if (dest->D[n] < 0) {
+            dest->D[n] = 0;
+        }
+        dest->D[n + 128/bit] = sat_u_128((Vd->Q[n] >> imm), bit/2 - 1);
+        if (dest->D[n + 128/bit] < 0) {
+            dest->D[n + 128/bit] = 0;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vssrlni_bu_h, 16, helper_vv_ni_c, do_vssrlni_u)
+DO_HELPER_VV_I(vssrlni_hu_w, 32, helper_vv_ni_c, do_vssrlni_u)
+DO_HELPER_VV_I(vssrlni_wu_d, 64, helper_vv_ni_c, do_vssrlni_u)
+DO_HELPER_VV_I(vssrlni_du_q, 128, helper_vv_ni_c, do_vssrlni_u)
+DO_HELPER_VV_I(vssrani_bu_h, 16, helper_vv_ni_c, do_vssrani_u)
+DO_HELPER_VV_I(vssrani_hu_w, 32, helper_vv_ni_c, do_vssrani_u)
+DO_HELPER_VV_I(vssrani_wu_d, 64, helper_vv_ni_c, do_vssrani_u)
+DO_HELPER_VV_I(vssrani_du_q, 128, helper_vv_ni_c, do_vssrani_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 29/43] target/loongarch: Implement vssrlrn vssrarn
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (27 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 28/43] target/loongarch: Implement vssrln vssran Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 30/43] target/loongarch: Implement vclo vclz Song Gao
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSSRLRN.{B.H/H.W/W.D};
- VSSRARN.{B.H/H.W/W.D};
- VSSRLRN.{BU.H/HU.W/WU.D};
- VSSRARN.{BU.H/HU.W/WU.D};
- VSSRLRNI.{B.H/H.W/W.D/D.Q};
- VSSRARNI.{B.H/H.W/W.D/D.Q};
- VSSRLRNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRARNI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  30 +++
 target/loongarch/helper.h                   |  30 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 +++
 target/loongarch/insns.decode               |  30 +++
 target/loongarch/lsx_helper.c               | 257 ++++++++++++++++++++
 5 files changed, 377 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1b9bd6bb86..c1d256d8b4 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1199,3 +1199,33 @@ INSN_LSX(vssrani_bu_h,     vv_i)
 INSN_LSX(vssrani_hu_w,     vv_i)
 INSN_LSX(vssrani_wu_d,     vv_i)
 INSN_LSX(vssrani_du_q,     vv_i)
+
+INSN_LSX(vssrlrn_b_h,      vvv)
+INSN_LSX(vssrlrn_h_w,      vvv)
+INSN_LSX(vssrlrn_w_d,      vvv)
+INSN_LSX(vssrarn_b_h,      vvv)
+INSN_LSX(vssrarn_h_w,      vvv)
+INSN_LSX(vssrarn_w_d,      vvv)
+INSN_LSX(vssrlrn_bu_h,     vvv)
+INSN_LSX(vssrlrn_hu_w,     vvv)
+INSN_LSX(vssrlrn_wu_d,     vvv)
+INSN_LSX(vssrarn_bu_h,     vvv)
+INSN_LSX(vssrarn_hu_w,     vvv)
+INSN_LSX(vssrarn_wu_d,     vvv)
+
+INSN_LSX(vssrlrni_b_h,     vv_i)
+INSN_LSX(vssrlrni_h_w,     vv_i)
+INSN_LSX(vssrlrni_w_d,     vv_i)
+INSN_LSX(vssrlrni_d_q,     vv_i)
+INSN_LSX(vssrlrni_bu_h,    vv_i)
+INSN_LSX(vssrlrni_hu_w,    vv_i)
+INSN_LSX(vssrlrni_wu_d,    vv_i)
+INSN_LSX(vssrlrni_du_q,    vv_i)
+INSN_LSX(vssrarni_b_h,     vv_i)
+INSN_LSX(vssrarni_h_w,     vv_i)
+INSN_LSX(vssrarni_w_d,     vv_i)
+INSN_LSX(vssrarni_d_q,     vv_i)
+INSN_LSX(vssrarni_bu_h,    vv_i)
+INSN_LSX(vssrarni_hu_w,    vv_i)
+INSN_LSX(vssrarni_wu_d,    vv_i)
+INSN_LSX(vssrarni_du_q,    vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4585f0eb55..e45eb211a6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -554,3 +554,33 @@ DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 39e0e53677..5473adc163 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -470,3 +470,33 @@ TRANS(vssrani_bu_h, gen_vv_i, gen_helper_vssrani_bu_h)
 TRANS(vssrani_hu_w, gen_vv_i, gen_helper_vssrani_hu_w)
 TRANS(vssrani_wu_d, gen_vv_i, gen_helper_vssrani_wu_d)
 TRANS(vssrani_du_q, gen_vv_i, gen_helper_vssrani_du_q)
+
+TRANS(vssrlrn_b_h, gen_vvv, gen_helper_vssrlrn_b_h)
+TRANS(vssrlrn_h_w, gen_vvv, gen_helper_vssrlrn_h_w)
+TRANS(vssrlrn_w_d, gen_vvv, gen_helper_vssrlrn_w_d)
+TRANS(vssrarn_b_h, gen_vvv, gen_helper_vssrarn_b_h)
+TRANS(vssrarn_h_w, gen_vvv, gen_helper_vssrarn_h_w)
+TRANS(vssrarn_w_d, gen_vvv, gen_helper_vssrarn_w_d)
+TRANS(vssrlrn_bu_h, gen_vvv, gen_helper_vssrlrn_bu_h)
+TRANS(vssrlrn_hu_w, gen_vvv, gen_helper_vssrlrn_hu_w)
+TRANS(vssrlrn_wu_d, gen_vvv, gen_helper_vssrlrn_wu_d)
+TRANS(vssrarn_bu_h, gen_vvv, gen_helper_vssrarn_bu_h)
+TRANS(vssrarn_hu_w, gen_vvv, gen_helper_vssrarn_hu_w)
+TRANS(vssrarn_wu_d, gen_vvv, gen_helper_vssrarn_wu_d)
+
+TRANS(vssrlrni_b_h, gen_vv_i, gen_helper_vssrlrni_b_h)
+TRANS(vssrlrni_h_w, gen_vv_i, gen_helper_vssrlrni_h_w)
+TRANS(vssrlrni_w_d, gen_vv_i, gen_helper_vssrlrni_w_d)
+TRANS(vssrlrni_d_q, gen_vv_i, gen_helper_vssrlrni_d_q)
+TRANS(vssrarni_b_h, gen_vv_i, gen_helper_vssrarni_b_h)
+TRANS(vssrarni_h_w, gen_vv_i, gen_helper_vssrarni_h_w)
+TRANS(vssrarni_w_d, gen_vv_i, gen_helper_vssrarni_w_d)
+TRANS(vssrarni_d_q, gen_vv_i, gen_helper_vssrarni_d_q)
+TRANS(vssrlrni_bu_h, gen_vv_i, gen_helper_vssrlrni_bu_h)
+TRANS(vssrlrni_hu_w, gen_vv_i, gen_helper_vssrlrni_hu_w)
+TRANS(vssrlrni_wu_d, gen_vv_i, gen_helper_vssrlrni_wu_d)
+TRANS(vssrlrni_du_q, gen_vv_i, gen_helper_vssrlrni_du_q)
+TRANS(vssrarni_bu_h, gen_vv_i, gen_helper_vssrarni_bu_h)
+TRANS(vssrarni_hu_w, gen_vv_i, gen_helper_vssrarni_hu_w)
+TRANS(vssrarni_wu_d, gen_vv_i, gen_helper_vssrarni_wu_d)
+TRANS(vssrarni_du_q, gen_vv_i, gen_helper_vssrarni_du_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3e1b4084bb..3b3c2520c3 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -928,3 +928,33 @@ vssrani_bu_h     0111 00110110 01000 1 .... ..... .....   @vv_ui4
 vssrani_hu_w     0111 00110110 01001 ..... ..... .....    @vv_ui5
 vssrani_wu_d     0111 00110110 0101 ...... ..... .....    @vv_ui6
 vssrani_du_q     0111 00110110 011 ....... ..... .....    @vv_ui7
+
+vssrlrn_b_h      0111 00010000 00001 ..... ..... .....    @vvv
+vssrlrn_h_w      0111 00010000 00010 ..... ..... .....    @vvv
+vssrlrn_w_d      0111 00010000 00011 ..... ..... .....    @vvv
+vssrarn_b_h      0111 00010000 00101 ..... ..... .....    @vvv
+vssrarn_h_w      0111 00010000 00110 ..... ..... .....    @vvv
+vssrarn_w_d      0111 00010000 00111 ..... ..... .....    @vvv
+vssrlrn_bu_h     0111 00010000 10001 ..... ..... .....    @vvv
+vssrlrn_hu_w     0111 00010000 10010 ..... ..... .....    @vvv
+vssrlrn_wu_d     0111 00010000 10011 ..... ..... .....    @vvv
+vssrarn_bu_h     0111 00010000 10101 ..... ..... .....    @vvv
+vssrarn_hu_w     0111 00010000 10110 ..... ..... .....    @vvv
+vssrarn_wu_d     0111 00010000 10111 ..... ..... .....    @vvv
+
+vssrlrni_b_h     0111 00110101 00000 1 .... ..... .....   @vv_ui4
+vssrlrni_h_w     0111 00110101 00001 ..... ..... .....    @vv_ui5
+vssrlrni_w_d     0111 00110101 0001 ...... ..... .....    @vv_ui6
+vssrlrni_d_q     0111 00110101 001 ....... ..... .....    @vv_ui7
+vssrarni_b_h     0111 00110110 10000 1 .... ..... .....   @vv_ui4
+vssrarni_h_w     0111 00110110 10001 ..... ..... .....    @vv_ui5
+vssrarni_w_d     0111 00110110 1001 ...... ..... .....    @vv_ui6
+vssrarni_d_q     0111 00110110 101 ....... ..... .....    @vv_ui7
+vssrlrni_bu_h    0111 00110101 01000 1 .... ..... .....   @vv_ui4
+vssrlrni_hu_w    0111 00110101 01001 ..... ..... .....    @vv_ui5
+vssrlrni_wu_d    0111 00110101 0101 ...... ..... .....    @vv_ui6
+vssrlrni_du_q    0111 00110101 011 ....... ..... .....    @vv_ui7
+vssrarni_bu_h    0111 00110110 11000 1 .... ..... .....   @vv_ui4
+vssrarni_hu_w    0111 00110110 11001 ..... ..... .....    @vv_ui5
+vssrarni_wu_d    0111 00110110 1101 ...... ..... .....    @vv_ui6
+vssrarni_du_q    0111 00110110 111 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 6704eb4ea5..d771ff953c 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2891,3 +2891,260 @@ DO_HELPER_VV_I(vssrani_bu_h, 16, helper_vv_ni_c, do_vssrani_u)
 DO_HELPER_VV_I(vssrani_hu_w, 32, helper_vv_ni_c, do_vssrani_u)
 DO_HELPER_VV_I(vssrani_wu_d, 64, helper_vv_ni_c, do_vssrani_u)
 DO_HELPER_VV_I(vssrani_du_q, 128, helper_vv_ni_c, do_vssrani_u)
+
+static void do_vssrlrn(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_s(vsrlr((uint16_t)Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_s(vsrlr((uint32_t)Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_s(vsrlr((uint64_t)Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssrarn(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_s(vsrar(Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_s(vsrar(Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_s(vsrar(Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vssrlrn_b_h, 16, helper_vvv_hz, do_vssrlrn)
+DO_HELPER_VVV(vssrlrn_h_w, 32, helper_vvv_hz, do_vssrlrn)
+DO_HELPER_VVV(vssrlrn_w_d, 64, helper_vvv_hz, do_vssrlrn)
+DO_HELPER_VVV(vssrarn_b_h, 16, helper_vvv_hz, do_vssrarn)
+DO_HELPER_VVV(vssrarn_h_w, 32, helper_vvv_hz, do_vssrarn)
+DO_HELPER_VVV(vssrarn_w_d, 64, helper_vvv_hz, do_vssrarn)
+
+static void do_vssrlrn_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_u(vsrlr((uint16_t)Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        break;
+    case 32:
+        Vd->H[n] = sat_u(vsrlr((uint32_t)Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        break;
+    case 64:
+        Vd->W[n] = sat_u(vsrlr((uint64_t)Vj->D[n], Vk->D[n], bit), bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssrarn_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        Vd->B[n] = sat_u(vsrar(Vj->H[n], Vk->H[n], bit), bit/2 - 1);
+        if (Vd->B[n] < 0) {
+            Vd->B[n] = 0;
+        }
+        break;
+    case 32:
+        Vd->H[n] = sat_u(vsrar(Vj->W[n], Vk->W[n], bit), bit/2 - 1);
+        if (Vd->H[n] < 0) {
+            Vd->H[n] = 0;
+        }
+        break;
+    case 64:
+        Vd->W[n] = sat_u(vsrar(Vj->D[n], Vk->W[n], bit), bit/2 - 1);
+        if (Vd->W[n] < 0) {
+            Vd->W[n] = 0;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vssrlrn_bu_h, 16, helper_vvv_hz, do_vssrlrn_u)
+DO_HELPER_VVV(vssrlrn_hu_w, 32, helper_vvv_hz, do_vssrlrn_u)
+DO_HELPER_VVV(vssrlrn_wu_d, 64, helper_vvv_hz, do_vssrlrn_u)
+DO_HELPER_VVV(vssrarn_bu_h, 16, helper_vvv_hz, do_vssrarn_u)
+DO_HELPER_VVV(vssrarn_hu_w, 32, helper_vvv_hz, do_vssrarn_u)
+DO_HELPER_VVV(vssrarn_wu_d, 64, helper_vvv_hz, do_vssrarn_u)
+
+static __int128_t vsrarn(__int128_t s1, int64_t s2, int bit)
+{
+    int32_t n = (uint64_t)(s2 % bit);
+
+    if (n == 0) {
+        return s1;
+    } else {
+        uint64_t r_bit = (s1 >> (n  - 1)) & 1;
+        return (s1 >> n) + r_bit;
+    }
+}
+
+static void do_vssrlrni(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                        uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_s(vsrlr((uint16_t)Vj->H[n], imm, bit), bit/2 - 1);
+        dest->B[n + 128/bit] = sat_s(vsrlr((uint16_t)Vd->H[n], imm, bit),
+	                             bit/2 -1);
+        break;
+    case 32:
+        dest->H[n] = sat_s(vsrlr((uint32_t)Vj->W[n], imm, bit), bit/2 - 1);
+        dest->H[n + 128/bit] = sat_s(vsrlr((uint32_t)Vd->W[n], imm, bit),
+                                     bit/2 - 1);
+        break;
+    case 64:
+        dest->W[n] = sat_s(vsrlr((uint64_t)Vj->D[n], imm, bit), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_s(vsrlr((uint64_t)Vd->D[n], imm, bit),
+                                     bit/2 - 1);
+        break;
+    case 128:
+        dest->D[n] = sat_s_128u(vsrlrn((__uint128_t)Vj->Q[n], imm), bit/2 - 1);
+        dest->D[n + 128/bit] = sat_s_128u(vsrlrn((__uint128_t)Vd->Q[n], imm),
+                                          bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssrarni(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                        uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_s(vsrar(Vj->H[n], imm, bit), bit/2 - 1);
+        dest->B[n + 128/bit] = sat_s(vsrar(Vd->H[n], imm, bit), bit/2 - 1);
+        break;
+    case 32:
+        dest->H[n] = sat_s(vsrar(Vj->W[n], imm, bit), bit/2 - 1);
+        dest->H[n + 128/bit] = sat_s(vsrar(Vd->W[n], imm, bit), bit/2 - 1);
+        break;
+    case 64:
+        dest->W[n] = sat_s(vsrar(Vj->D[n], imm, bit), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_s(vsrar(Vd->D[n], imm, bit), bit/2 - 1);
+        break;
+    case 128:
+        dest->D[n] = sat_s_128(vsrarn((__int128_t)Vj->Q[n], imm, bit),
+                               bit/2 - 1);
+        dest->D[n + 128/bit] = sat_s_128(vsrarn((__int128_t)Vd->Q[n], imm, bit),
+                                         bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vssrlrni_b_h, 16, helper_vv_ni_c, do_vssrlrni)
+DO_HELPER_VV_I(vssrlrni_h_w, 32, helper_vv_ni_c, do_vssrlrni)
+DO_HELPER_VV_I(vssrlrni_w_d, 64, helper_vv_ni_c, do_vssrlrni)
+DO_HELPER_VV_I(vssrlrni_d_q, 128, helper_vv_ni_c, do_vssrlrni)
+DO_HELPER_VV_I(vssrarni_b_h, 16, helper_vv_ni_c, do_vssrarni)
+DO_HELPER_VV_I(vssrarni_h_w, 32, helper_vv_ni_c, do_vssrarni)
+DO_HELPER_VV_I(vssrarni_w_d, 64, helper_vv_ni_c, do_vssrarni)
+DO_HELPER_VV_I(vssrarni_d_q, 128, helper_vv_ni_c, do_vssrarni)
+
+static void do_vssrlrni_u(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                          uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_u(vsrlr((uint16_t)Vj->H[n], imm, bit), bit/2 - 1);
+        dest->B[n + 128/bit] = sat_u(vsrlr((uint16_t)Vd->H[n], imm, bit),
+                                     bit/2 - 1);
+        break;
+    case 32:
+        dest->H[n] = sat_u(vsrlr((uint32_t)Vj->W[n], imm, bit), bit/2 - 1);
+        dest->H[n + 128/bit] = sat_u(vsrlr((uint32_t)Vd->W[n], imm, bit),
+                                     bit/2 - 1);
+        break;
+    case 64:
+        dest->W[n] = sat_u(vsrlr((uint64_t)Vj->D[n], imm, bit), bit/2 - 1);
+        dest->W[n + 128/bit] = sat_u(vsrlr((uint64_t)Vd->D[n], imm, bit),
+                                     bit/2 - 1);
+        break;
+    case 128:
+        dest->D[n] = sat_u_128(vsrlrn((__uint128_t)Vj->Q[n], imm), bit/2 - 1);
+        dest->D[n + 128/bit] = sat_u_128(vsrlrn((__uint128_t)Vd->Q[n], imm),
+                                         bit/2 - 1);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vssrarni_u(vec_t *dest, vec_t *Vd, vec_t *Vj,
+                          uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 16:
+        dest->B[n] = sat_u(vsrar(Vj->H[n], imm, bit), bit/2 - 1);
+        if (dest->B[n] < 0) {
+            dest->B[n] = 0;
+        }
+        dest->B[n + 128/bit] = sat_u(vsrar(Vd->H[n], imm, bit), bit/2 - 1);
+        if (dest->B[n + 128/bit] < 0) {
+            dest->B[n + 128/bit] = 0;
+        }
+        break;
+    case 32:
+        dest->H[n] = sat_u(vsrar(Vj->W[n],imm, bit), bit/2 - 1);
+        if (dest->H[n] < 0) {
+            dest->H[n] = 0;
+        }
+        dest->H[n + 128/bit] = sat_u(vsrar(Vd->W[n], imm, bit), bit/2 - 1);
+        if (dest->H[n + 128/bit] < 0) {
+            dest->H[n + 128/bit] = 0;
+        }
+        break;
+    case 64:
+        dest->W[n] = sat_u(vsrar(Vj->D[n], imm, bit), bit/2 - 1);
+        if (dest->W[n] < 0) {
+            dest->W[n] = 0;
+        }
+        dest->W[n + 128/bit] = sat_u(vsrar(Vd->D[n], imm, bit), bit/2 - 1);
+        if (dest->W[n + 128/bit] < 0) {
+            dest->W[n + 128/bit] = 0;
+        }
+        break;
+    case 128:
+        dest->D[n] = sat_u_128(vsrarn((__int128_t)Vj->Q[n], imm, bit),
+                               bit/2 - 1);
+        if (dest->D[n] < 0) {
+            dest->D[n] = 0;
+        }
+        dest->D[n + 128/bit] = sat_u_128(vsrarn((__int128_t)Vd->Q[n], imm, bit),
+                                         bit/2 - 1);
+        if (dest->D[n + 128/bit] < 0) {
+            dest->D[n + 128/bit] = 0;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vssrlrni_bu_h, 16, helper_vv_ni_c, do_vssrlrni_u)
+DO_HELPER_VV_I(vssrlrni_hu_w, 32, helper_vv_ni_c, do_vssrlrni_u)
+DO_HELPER_VV_I(vssrlrni_wu_d, 64, helper_vv_ni_c, do_vssrlrni_u)
+DO_HELPER_VV_I(vssrlrni_du_q, 128, helper_vv_ni_c, do_vssrlrni_u)
+DO_HELPER_VV_I(vssrarni_bu_h, 16, helper_vv_ni_c, do_vssrarni_u)
+DO_HELPER_VV_I(vssrarni_hu_w, 32, helper_vv_ni_c, do_vssrarni_u)
+DO_HELPER_VV_I(vssrarni_wu_d, 64, helper_vv_ni_c, do_vssrarni_u)
+DO_HELPER_VV_I(vssrarni_du_q, 128, helper_vv_ni_c, do_vssrarni_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 30/43] target/loongarch: Implement vclo vclz
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (28 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 29/43] target/loongarch: Implement vssrlrn vssrarn Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 31/43] target/loongarch: Implement vpcnt Song Gao
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VCLO.{B/H/W/D};
- VCLZ.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 ++++
 target/loongarch/helper.h                   |  9 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 ++++
 target/loongarch/insns.decode               |  9 ++++
 target/loongarch/lsx_helper.c               | 49 +++++++++++++++++++++
 5 files changed, 85 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c1d256d8b4..865c293f43 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1229,3 +1229,12 @@ INSN_LSX(vssrarni_bu_h,    vv_i)
 INSN_LSX(vssrarni_hu_w,    vv_i)
 INSN_LSX(vssrarni_wu_d,    vv_i)
 INSN_LSX(vssrarni_du_q,    vv_i)
+
+INSN_LSX(vclo_b,           vv)
+INSN_LSX(vclo_h,           vv)
+INSN_LSX(vclo_w,           vv)
+INSN_LSX(vclo_d,           vv)
+INSN_LSX(vclz_b,           vv)
+INSN_LSX(vclz_h,           vv)
+INSN_LSX(vclz_w,           vv)
+INSN_LSX(vclz_d,           vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e45eb211a6..0080890bf6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -584,3 +584,12 @@ DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vclo_b, void, env, i32, i32)
+DEF_HELPER_3(vclo_h, void, env, i32, i32)
+DEF_HELPER_3(vclo_w, void, env, i32, i32)
+DEF_HELPER_3(vclo_d, void, env, i32, i32)
+DEF_HELPER_3(vclz_b, void, env, i32, i32)
+DEF_HELPER_3(vclz_h, void, env, i32, i32)
+DEF_HELPER_3(vclz_w, void, env, i32, i32)
+DEF_HELPER_3(vclz_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5473adc163..105b6fac6e 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -500,3 +500,12 @@ TRANS(vssrarni_bu_h, gen_vv_i, gen_helper_vssrarni_bu_h)
 TRANS(vssrarni_hu_w, gen_vv_i, gen_helper_vssrarni_hu_w)
 TRANS(vssrarni_wu_d, gen_vv_i, gen_helper_vssrarni_wu_d)
 TRANS(vssrarni_du_q, gen_vv_i, gen_helper_vssrarni_du_q)
+
+TRANS(vclo_b, gen_vv, gen_helper_vclo_b)
+TRANS(vclo_h, gen_vv, gen_helper_vclo_h)
+TRANS(vclo_w, gen_vv, gen_helper_vclo_w)
+TRANS(vclo_d, gen_vv, gen_helper_vclo_d)
+TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
+TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
+TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
+TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3b3c2520c3..27cfa306c9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -958,3 +958,12 @@ vssrarni_bu_h    0111 00110110 11000 1 .... ..... .....   @vv_ui4
 vssrarni_hu_w    0111 00110110 11001 ..... ..... .....    @vv_ui5
 vssrarni_wu_d    0111 00110110 1101 ...... ..... .....    @vv_ui6
 vssrarni_du_q    0111 00110110 111 ....... ..... .....    @vv_ui7
+
+vclo_b           0111 00101001 11000 00000 ..... .....    @vv
+vclo_h           0111 00101001 11000 00001 ..... .....    @vv
+vclo_w           0111 00101001 11000 00010 ..... .....    @vv
+vclo_d           0111 00101001 11000 00011 ..... .....    @vv
+vclz_b           0111 00101001 11000 00100 ..... .....    @vv
+vclz_h           0111 00101001 11000 00101 ..... .....    @vv
+vclz_w           0111 00101001 11000 00110 ..... .....    @vv
+vclz_d           0111 00101001 11000 00111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d771ff953c..0abb06781f 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3148,3 +3148,52 @@ DO_HELPER_VV_I(vssrarni_bu_h, 16, helper_vv_ni_c, do_vssrarni_u)
 DO_HELPER_VV_I(vssrarni_hu_w, 32, helper_vv_ni_c, do_vssrarni_u)
 DO_HELPER_VV_I(vssrarni_wu_d, 64, helper_vv_ni_c, do_vssrarni_u)
 DO_HELPER_VV_I(vssrarni_du_q, 128, helper_vv_ni_c, do_vssrarni_u)
+
+static void do_vclo(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = clz32((uint8_t)(~Vj->B[n])) - 24;
+        break;
+    case 16:
+        Vd->H[n] = clz32((uint16_t)(~Vj->H[n])) - 16;
+        break;
+    case 32:
+        Vd->W[n] = clz32((uint32_t)(~Vj->W[n]));
+        break;
+    case 64:
+        Vd->D[n] = clz64((uint64_t)(~Vj->D[n]));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vclz(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = clz32((uint8_t)Vj->B[n]) - 24;
+        break;
+    case 16:
+        Vd->H[n] = clz32((uint16_t)Vj->H[n]) - 16;
+        break;
+    case 32:
+        Vd->W[n] = clz32((uint32_t)Vj->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = clz64((uint64_t)Vj->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV(vclo_b, 8, helper_vv, do_vclo)
+DO_HELPER_VV(vclo_h, 16, helper_vv, do_vclo)
+DO_HELPER_VV(vclo_w, 32, helper_vv, do_vclo)
+DO_HELPER_VV(vclo_d, 64, helper_vv, do_vclo)
+DO_HELPER_VV(vclz_b, 8, helper_vv, do_vclz)
+DO_HELPER_VV(vclz_h, 16, helper_vv, do_vclz)
+DO_HELPER_VV(vclz_w, 32, helper_vv, do_vclz)
+DO_HELPER_VV(vclz_d, 64, helper_vv, do_vclz)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 31/43] target/loongarch: Implement vpcnt
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (29 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 30/43] target/loongarch: Implement vclo vclz Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 32/43] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VPCNT.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 +++
 target/loongarch/helper.h                   |  5 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 +++
 target/loongarch/insns.decode               |  5 +++
 target/loongarch/lsx_helper.c               | 39 +++++++++++++++++++++
 5 files changed, 59 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 865c293f43..e3d4d105fe 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1238,3 +1238,8 @@ INSN_LSX(vclz_b,           vv)
 INSN_LSX(vclz_h,           vv)
 INSN_LSX(vclz_w,           vv)
 INSN_LSX(vclz_d,           vv)
+
+INSN_LSX(vpcnt_b,          vv)
+INSN_LSX(vpcnt_h,          vv)
+INSN_LSX(vpcnt_w,          vv)
+INSN_LSX(vpcnt_d,          vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0080890bf6..6869b05105 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -593,3 +593,8 @@ DEF_HELPER_3(vclz_b, void, env, i32, i32)
 DEF_HELPER_3(vclz_h, void, env, i32, i32)
 DEF_HELPER_3(vclz_w, void, env, i32, i32)
 DEF_HELPER_3(vclz_d, void, env, i32, i32)
+
+DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 105b6fac6e..38493c98b0 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -509,3 +509,8 @@ TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
 TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
 TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
 TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
+
+TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
+TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
+TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
+TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 27cfa306c9..812262ff78 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -967,3 +967,8 @@ vclz_b           0111 00101001 11000 00100 ..... .....    @vv
 vclz_h           0111 00101001 11000 00101 ..... .....    @vv
 vclz_w           0111 00101001 11000 00110 ..... .....    @vv
 vclz_d           0111 00101001 11000 00111 ..... .....    @vv
+
+vpcnt_b          0111 00101001 11000 01000 ..... .....    @vv
+vpcnt_h          0111 00101001 11000 01001 ..... .....    @vv
+vpcnt_w          0111 00101001 11000 01010 ..... .....    @vv
+vpcnt_d          0111 00101001 11000 01011 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 0abb06781f..c9913dec54 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3197,3 +3197,42 @@ DO_HELPER_VV(vclz_b, 8, helper_vv, do_vclz)
 DO_HELPER_VV(vclz_h, 16, helper_vv, do_vclz)
 DO_HELPER_VV(vclz_w, 32, helper_vv, do_vclz)
 DO_HELPER_VV(vclz_d, 64, helper_vv, do_vclz)
+
+static uint64_t vpcnt(int64_t s1, int bit)
+{
+    uint64_t u1 = s1 & MAKE_64BIT_MASK(0, bit);
+
+    u1 = (u1 & 0x5555555555555555ULL) + ((u1 >>  1) & 0x5555555555555555ULL);
+    u1 = (u1 & 0x3333333333333333ULL) + ((u1 >>  2) & 0x3333333333333333ULL);
+    u1 = (u1 & 0x0F0F0F0F0F0F0F0FULL) + ((u1 >>  4) & 0x0F0F0F0F0F0F0F0FULL);
+    u1 = (u1 & 0x00FF00FF00FF00FFULL) + ((u1 >>  8) & 0x00FF00FF00FF00FFULL);
+    u1 = (u1 & 0x0000FFFF0000FFFFULL) + ((u1 >> 16) & 0x0000FFFF0000FFFFULL);
+    u1 = (u1 & 0x00000000FFFFFFFFULL) + ((u1 >> 32));
+
+    return u1;
+}
+
+static void do_vpcnt(vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vpcnt(Vj->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vpcnt(Vj->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vpcnt(Vj->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vpcnt(Vj->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV(vpcnt_b, 8, helper_vv, do_vpcnt)
+DO_HELPER_VV(vpcnt_h, 16, helper_vv, do_vpcnt)
+DO_HELPER_VV(vpcnt_w, 32, helper_vv, do_vpcnt)
+DO_HELPER_VV(vpcnt_d, 64, helper_vv, do_vpcnt)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 32/43] target/loongarch: Implement vbitclr vbitset vbitrev
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (30 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 31/43] target/loongarch: Implement vpcnt Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 33/43] target/loongarch: Implement vfrstp Song Gao
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VBITCLR[I].{B/H/W/D};
- VBITSET[I].{B/H/W/D};
- VBITREV[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  25 +++
 target/loongarch/helper.h                   |  25 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  25 +++
 target/loongarch/insns.decode               |  25 +++
 target/loongarch/lsx_helper.c               | 162 ++++++++++++++++++++
 5 files changed, 262 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e3d4d105fe..7212f86eb0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1243,3 +1243,28 @@ INSN_LSX(vpcnt_b,          vv)
 INSN_LSX(vpcnt_h,          vv)
 INSN_LSX(vpcnt_w,          vv)
 INSN_LSX(vpcnt_d,          vv)
+
+INSN_LSX(vbitclr_b,        vvv)
+INSN_LSX(vbitclr_h,        vvv)
+INSN_LSX(vbitclr_w,        vvv)
+INSN_LSX(vbitclr_d,        vvv)
+INSN_LSX(vbitclri_b,       vv_i)
+INSN_LSX(vbitclri_h,       vv_i)
+INSN_LSX(vbitclri_w,       vv_i)
+INSN_LSX(vbitclri_d,       vv_i)
+INSN_LSX(vbitset_b,        vvv)
+INSN_LSX(vbitset_h,        vvv)
+INSN_LSX(vbitset_w,        vvv)
+INSN_LSX(vbitset_d,        vvv)
+INSN_LSX(vbitseti_b,       vv_i)
+INSN_LSX(vbitseti_h,       vv_i)
+INSN_LSX(vbitseti_w,       vv_i)
+INSN_LSX(vbitseti_d,       vv_i)
+INSN_LSX(vbitrev_b,        vvv)
+INSN_LSX(vbitrev_h,        vvv)
+INSN_LSX(vbitrev_w,        vvv)
+INSN_LSX(vbitrev_d,        vvv)
+INSN_LSX(vbitrevi_b,       vv_i)
+INSN_LSX(vbitrevi_h,       vv_i)
+INSN_LSX(vbitrevi_w,       vv_i)
+INSN_LSX(vbitrevi_d,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6869b05105..d1983d9404 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -598,3 +598,28 @@ DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
+
+DEF_HELPER_4(vbitclr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 38493c98b0..141d7474dc 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -514,3 +514,28 @@ TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
 TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
 TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
 TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
+
+TRANS(vbitclr_b, gen_vvv, gen_helper_vbitclr_b)
+TRANS(vbitclr_h, gen_vvv, gen_helper_vbitclr_h)
+TRANS(vbitclr_w, gen_vvv, gen_helper_vbitclr_w)
+TRANS(vbitclr_d, gen_vvv, gen_helper_vbitclr_d)
+TRANS(vbitclri_b, gen_vv_i, gen_helper_vbitclri_b)
+TRANS(vbitclri_h, gen_vv_i, gen_helper_vbitclri_h)
+TRANS(vbitclri_w, gen_vv_i, gen_helper_vbitclri_w)
+TRANS(vbitclri_d, gen_vv_i, gen_helper_vbitclri_d)
+TRANS(vbitset_b, gen_vvv, gen_helper_vbitset_b)
+TRANS(vbitset_h, gen_vvv, gen_helper_vbitset_h)
+TRANS(vbitset_w, gen_vvv, gen_helper_vbitset_w)
+TRANS(vbitset_d, gen_vvv, gen_helper_vbitset_d)
+TRANS(vbitseti_b, gen_vv_i, gen_helper_vbitseti_b)
+TRANS(vbitseti_h, gen_vv_i, gen_helper_vbitseti_h)
+TRANS(vbitseti_w, gen_vv_i, gen_helper_vbitseti_w)
+TRANS(vbitseti_d, gen_vv_i, gen_helper_vbitseti_d)
+TRANS(vbitrev_b, gen_vvv, gen_helper_vbitrev_b)
+TRANS(vbitrev_h, gen_vvv, gen_helper_vbitrev_h)
+TRANS(vbitrev_w, gen_vvv, gen_helper_vbitrev_w)
+TRANS(vbitrev_d, gen_vvv, gen_helper_vbitrev_d)
+TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
+TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
+TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
+TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 812262ff78..74667ae6e0 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -972,3 +972,28 @@ vpcnt_b          0111 00101001 11000 01000 ..... .....    @vv
 vpcnt_h          0111 00101001 11000 01001 ..... .....    @vv
 vpcnt_w          0111 00101001 11000 01010 ..... .....    @vv
 vpcnt_d          0111 00101001 11000 01011 ..... .....    @vv
+
+vbitclr_b        0111 00010000 11000 ..... ..... .....    @vvv
+vbitclr_h        0111 00010000 11001 ..... ..... .....    @vvv
+vbitclr_w        0111 00010000 11010 ..... ..... .....    @vvv
+vbitclr_d        0111 00010000 11011 ..... ..... .....    @vvv
+vbitclri_b       0111 00110001 00000 01 ... ..... .....   @vv_ui3
+vbitclri_h       0111 00110001 00000 1 .... ..... .....   @vv_ui4
+vbitclri_w       0111 00110001 00001 ..... ..... .....    @vv_ui5
+vbitclri_d       0111 00110001 0001 ...... ..... .....    @vv_ui6
+vbitset_b        0111 00010000 11100 ..... ..... .....    @vvv
+vbitset_h        0111 00010000 11101 ..... ..... .....    @vvv
+vbitset_w        0111 00010000 11110 ..... ..... .....    @vvv
+vbitset_d        0111 00010000 11111 ..... ..... .....    @vvv
+vbitseti_b       0111 00110001 01000 01 ... ..... .....   @vv_ui3
+vbitseti_h       0111 00110001 01000 1 .... ..... .....   @vv_ui4
+vbitseti_w       0111 00110001 01001 ..... ..... .....    @vv_ui5
+vbitseti_d       0111 00110001 0101 ...... ..... .....    @vv_ui6
+vbitrev_b        0111 00010001 00000 ..... ..... .....    @vvv
+vbitrev_h        0111 00010001 00001 ..... ..... .....    @vvv
+vbitrev_w        0111 00010001 00010 ..... ..... .....    @vvv
+vbitrev_d        0111 00010001 00011 ..... ..... .....    @vvv
+vbitrevi_b       0111 00110001 10000 01 ... ..... .....   @vv_ui3
+vbitrevi_h       0111 00110001 10000 1 .... ..... .....   @vv_ui4
+vbitrevi_w       0111 00110001 10001 ..... ..... .....    @vv_ui5
+vbitrevi_d       0111 00110001 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index c9913dec54..f88719908a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3236,3 +3236,165 @@ DO_HELPER_VV(vpcnt_b, 8, helper_vv, do_vpcnt)
 DO_HELPER_VV(vpcnt_h, 16, helper_vv, do_vpcnt)
 DO_HELPER_VV(vpcnt_w, 32, helper_vv, do_vpcnt)
 DO_HELPER_VV(vpcnt_d, 64, helper_vv, do_vpcnt)
+
+static int64_t vbitclr(int64_t s1, int64_t imm, int bit)
+{
+    return (s1 & (~(1LL << (imm % bit)))) & MAKE_64BIT_MASK(0, bit);
+}
+
+static void do_vbitclr(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitclr(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitclr(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitclr(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitclr(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vbitclr_i(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitclr(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitclr(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitclr(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitclr(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vbitclr_b, 8, helper_vvv, do_vbitclr)
+DO_HELPER_VVV(vbitclr_h, 16, helper_vvv, do_vbitclr)
+DO_HELPER_VVV(vbitclr_w, 32, helper_vvv, do_vbitclr)
+DO_HELPER_VVV(vbitclr_d, 64, helper_vvv, do_vbitclr)
+DO_HELPER_VV_I(vbitclri_b, 8, helper_vv_i, do_vbitclr_i)
+DO_HELPER_VV_I(vbitclri_h, 16, helper_vv_i, do_vbitclr_i)
+DO_HELPER_VV_I(vbitclri_w, 32, helper_vv_i, do_vbitclr_i)
+DO_HELPER_VV_I(vbitclri_d, 64, helper_vv_i, do_vbitclr_i)
+
+static int64_t vbitset(int64_t s1, int64_t imm, int bit)
+{
+    return (s1 | (1LL << (imm % bit))) & MAKE_64BIT_MASK(0, bit);
+}
+
+static void do_vbitset(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitset(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitset(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitset(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitset(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vbitset_i(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitset(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitset(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitset(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitset(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vbitset_b, 8, helper_vvv, do_vbitset)
+DO_HELPER_VVV(vbitset_h, 16, helper_vvv, do_vbitset)
+DO_HELPER_VVV(vbitset_w, 32, helper_vvv, do_vbitset)
+DO_HELPER_VVV(vbitset_d, 64, helper_vvv, do_vbitset)
+DO_HELPER_VV_I(vbitseti_b, 8, helper_vv_i, do_vbitset_i)
+DO_HELPER_VV_I(vbitseti_h, 16, helper_vv_i, do_vbitset_i)
+DO_HELPER_VV_I(vbitseti_w, 32, helper_vv_i, do_vbitset_i)
+DO_HELPER_VV_I(vbitseti_d, 64, helper_vv_i, do_vbitset_i)
+
+static int64_t vbitrev(int64_t s1, int64_t imm, int bit)
+{
+    return (s1 ^ (1LL << (imm % bit))) & MAKE_64BIT_MASK(0, bit);
+}
+
+static void do_vbitrev(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitrev(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitrev(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitrev(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitrev(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vbitrev_i(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vbitrev(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vbitrev(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vbitrev(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vbitrev(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vbitrev_b, 8, helper_vvv, do_vbitrev)
+DO_HELPER_VVV(vbitrev_h, 16, helper_vvv, do_vbitrev)
+DO_HELPER_VVV(vbitrev_w, 32, helper_vvv, do_vbitrev)
+DO_HELPER_VVV(vbitrev_d, 64, helper_vvv, do_vbitrev)
+DO_HELPER_VV_I(vbitrevi_b, 8, helper_vv_i, do_vbitrev_i)
+DO_HELPER_VV_I(vbitrevi_h, 16, helper_vv_i, do_vbitrev_i)
+DO_HELPER_VV_I(vbitrevi_w, 32, helper_vv_i, do_vbitrev_i)
+DO_HELPER_VV_I(vbitrevi_d, 64, helper_vv_i, do_vbitrev_i)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 33/43] target/loongarch: Implement vfrstp
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (31 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 32/43] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 34/43] target/loongarch: Implement LSX fpu arith instructions Song Gao
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFRSTP[I].{B/H}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++
 target/loongarch/helper.h                   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 ++
 target/loongarch/insns.decode               |  5 ++
 target/loongarch/lsx_helper.c               | 70 +++++++++++++++++++++
 5 files changed, 90 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 7212f86eb0..ffcaf06136 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1268,3 +1268,8 @@ INSN_LSX(vbitrevi_b,       vv_i)
 INSN_LSX(vbitrevi_h,       vv_i)
 INSN_LSX(vbitrevi_w,       vv_i)
 INSN_LSX(vbitrevi_d,       vv_i)
+
+INSN_LSX(vfrstp_b,         vvv)
+INSN_LSX(vfrstp_h,         vvv)
+INSN_LSX(vfrstpi_b,        vv_i)
+INSN_LSX(vfrstpi_h,        vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d1983d9404..781a544622 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -623,3 +623,8 @@ DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 141d7474dc..ffa281e717 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -539,3 +539,8 @@ TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
 TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
 TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
 TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
+
+TRANS(vfrstp_b, gen_vvv, gen_helper_vfrstp_b)
+TRANS(vfrstp_h, gen_vvv, gen_helper_vfrstp_h)
+TRANS(vfrstpi_b, gen_vv_i, gen_helper_vfrstpi_b)
+TRANS(vfrstpi_h, gen_vv_i, gen_helper_vfrstpi_h)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 74667ae6e0..f537f726a2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -997,3 +997,8 @@ vbitrevi_b       0111 00110001 10000 01 ... ..... .....   @vv_ui3
 vbitrevi_h       0111 00110001 10000 1 .... ..... .....   @vv_ui4
 vbitrevi_w       0111 00110001 10001 ..... ..... .....    @vv_ui5
 vbitrevi_d       0111 00110001 1001 ...... ..... .....    @vv_ui6
+
+vfrstp_b         0111 00010010 10110 ..... ..... .....    @vvv
+vfrstp_h         0111 00010010 10111 ..... ..... .....    @vvv
+vfrstpi_b        0111 00101001 10100 ..... ..... .....    @vv_ui5
+vfrstpi_h        0111 00101001 10101 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index f88719908a..31e9270826 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3398,3 +3398,73 @@ DO_HELPER_VV_I(vbitrevi_b, 8, helper_vv_i, do_vbitrev_i)
 DO_HELPER_VV_I(vbitrevi_h, 16, helper_vv_i, do_vbitrev_i)
 DO_HELPER_VV_I(vbitrevi_w, 32, helper_vv_i, do_vbitrev_i)
 DO_HELPER_VV_I(vbitrevi_d, 64, helper_vv_i, do_vbitrev_i)
+
+void helper_vfrstp_b(CPULoongArchState *env,
+                     uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    int i;
+    int m;
+    for (i = 0; i < 128/8; i++) {
+        if (Vj->B[i] < 0) {
+            break;
+        }
+    }
+    m = Vk->B[0] % 16;
+    Vd->B[m] = (int8_t)i;
+}
+
+void helper_vfrstp_h(CPULoongArchState *env,
+                     uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    int i;
+    int m;
+    for (i = 0; i < 128/16; i++) {
+        if (Vj->H[i] < 0) {
+            break;
+        }
+    }
+    m = Vk->H[0] % 8;
+    Vd->H[m] = (int16_t)i;
+}
+
+void helper_vfrstpi_b(CPULoongArchState *env,
+                      uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    int i;
+    int m;
+    for (i = 0; i < 128/8; i++) {
+        if (Vj->B[i] < 0) {
+            break;
+        }
+    }
+    m = imm % 16;
+    Vd->B[m] = (int8_t)i;
+}
+
+void helper_vfrstpi_h(CPULoongArchState *env,
+                      uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    int i;
+    int m;
+    for (i = 0; i < 128/16; i++) {
+        if (Vj->H[i] < 0){
+            break;
+        }
+    }
+    m = imm % 8;
+    Vd->H[m] = (int16_t)i;
+}
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 34/43] target/loongarch: Implement LSX fpu arith instructions
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (32 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 33/43] target/loongarch: Implement vfrstp Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 35/43] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VF{ADD/SUB/MUL/DIV}.{S/D};
- VF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- VF{MAX/MIN}.{S/D};
- VF{MAXA/MINA}.{S/D};
- VFLOGB.{S/D};
- VFCLASS.{S/D};
- VF{SQRT/RECIP/RSQRT}.{S/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  46 +++++
 target/loongarch/fpu_helper.c               |   2 +-
 target/loongarch/helper.h                   |  41 +++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  55 ++++++
 target/loongarch/insns.decode               |  43 +++++
 target/loongarch/internals.h                |   1 +
 target/loongarch/lsx_helper.c               | 179 ++++++++++++++++++++
 7 files changed, 366 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ffcaf06136..987bf5c597 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -778,6 +778,11 @@ static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
 }
 
+static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1273,3 +1278,44 @@ INSN_LSX(vfrstp_b,         vvv)
 INSN_LSX(vfrstp_h,         vvv)
 INSN_LSX(vfrstpi_b,        vv_i)
 INSN_LSX(vfrstpi_h,        vv_i)
+
+INSN_LSX(vfadd_s,          vvv)
+INSN_LSX(vfadd_d,          vvv)
+INSN_LSX(vfsub_s,          vvv)
+INSN_LSX(vfsub_d,          vvv)
+INSN_LSX(vfmul_s,          vvv)
+INSN_LSX(vfmul_d,          vvv)
+INSN_LSX(vfdiv_s,          vvv)
+INSN_LSX(vfdiv_d,          vvv)
+
+INSN_LSX(vfmadd_s,         vvvv)
+INSN_LSX(vfmadd_d,         vvvv)
+INSN_LSX(vfmsub_s,         vvvv)
+INSN_LSX(vfmsub_d,         vvvv)
+INSN_LSX(vfnmadd_s,        vvvv)
+INSN_LSX(vfnmadd_d,        vvvv)
+INSN_LSX(vfnmsub_s,        vvvv)
+INSN_LSX(vfnmsub_d,        vvvv)
+
+INSN_LSX(vfmax_s,          vvv)
+INSN_LSX(vfmax_d,          vvv)
+INSN_LSX(vfmin_s,          vvv)
+INSN_LSX(vfmin_d,          vvv)
+
+INSN_LSX(vfmaxa_s,         vvv)
+INSN_LSX(vfmaxa_d,         vvv)
+INSN_LSX(vfmina_s,         vvv)
+INSN_LSX(vfmina_d,         vvv)
+
+INSN_LSX(vflogb_s,         vv)
+INSN_LSX(vflogb_d,         vv)
+
+INSN_LSX(vfclass_s,        vv)
+INSN_LSX(vfclass_d,        vv)
+
+INSN_LSX(vfsqrt_s,         vv)
+INSN_LSX(vfsqrt_d,         vv)
+INSN_LSX(vfrecip_s,        vv)
+INSN_LSX(vfrecip_d,        vv)
+INSN_LSX(vfrsqrt_s,        vv)
+INSN_LSX(vfrsqrt_d,        vv)
diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 4b9637210a..0e9f5eb73b 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -77,7 +77,7 @@ static void update_fcsr0_mask(CPULoongArchState *env, uintptr_t pc, int mask)
     }
 }
 
-static void update_fcsr0(CPULoongArchState *env, uintptr_t pc)
+void update_fcsr0(CPULoongArchState *env, uintptr_t pc)
 {
     update_fcsr0_mask(env, pc, 0);
 }
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 781a544622..31fc36917d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -628,3 +628,44 @@ DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfadd_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vfmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_4(vfmax_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmax_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfmaxa_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmaxa_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmina_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmina_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vflogb_s, void, env, i32, i32)
+DEF_HELPER_3(vflogb_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfclass_s, void, env, i32, i32)
+DEF_HELPER_3(vfclass_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(vfsqrt_d, void, env, i32, i32)
+DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
+DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
+DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ffa281e717..c8b271ddc8 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -50,6 +50,20 @@ static bool gen_vv(DisasContext *ctx, arg_vv *a,
     return true;
 }
 
+static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
+                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
+                                  TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+    TCGv_i32 va = tcg_constant_i32(a->va);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, vk, va);
+    return true;
+}
+
 TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
 TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
 TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
@@ -544,3 +558,44 @@ TRANS(vfrstp_b, gen_vvv, gen_helper_vfrstp_b)
 TRANS(vfrstp_h, gen_vvv, gen_helper_vfrstp_h)
 TRANS(vfrstpi_b, gen_vv_i, gen_helper_vfrstpi_b)
 TRANS(vfrstpi_h, gen_vv_i, gen_helper_vfrstpi_h)
+
+TRANS(vfadd_s, gen_vvv, gen_helper_vfadd_s)
+TRANS(vfadd_d, gen_vvv, gen_helper_vfadd_d)
+TRANS(vfsub_s, gen_vvv, gen_helper_vfsub_s)
+TRANS(vfsub_d, gen_vvv, gen_helper_vfsub_d)
+TRANS(vfmul_s, gen_vvv, gen_helper_vfmul_s)
+TRANS(vfmul_d, gen_vvv, gen_helper_vfmul_d)
+TRANS(vfdiv_s, gen_vvv, gen_helper_vfdiv_s)
+TRANS(vfdiv_d, gen_vvv, gen_helper_vfdiv_d)
+
+TRANS(vfmadd_s, gen_vvvv, gen_helper_vfmadd_s)
+TRANS(vfmadd_d, gen_vvvv, gen_helper_vfmadd_d)
+TRANS(vfmsub_s, gen_vvvv, gen_helper_vfmsub_s)
+TRANS(vfmsub_d, gen_vvvv, gen_helper_vfmsub_d)
+TRANS(vfnmadd_s, gen_vvvv, gen_helper_vfnmadd_s)
+TRANS(vfnmadd_d, gen_vvvv, gen_helper_vfnmadd_d)
+TRANS(vfnmsub_s, gen_vvvv, gen_helper_vfnmsub_s)
+TRANS(vfnmsub_d, gen_vvvv, gen_helper_vfnmsub_d)
+
+TRANS(vfmax_s, gen_vvv, gen_helper_vfmax_s)
+TRANS(vfmax_d, gen_vvv, gen_helper_vfmax_d)
+TRANS(vfmin_s, gen_vvv, gen_helper_vfmin_s)
+TRANS(vfmin_d, gen_vvv, gen_helper_vfmin_d)
+
+TRANS(vfmaxa_s, gen_vvv, gen_helper_vfmaxa_s)
+TRANS(vfmaxa_d, gen_vvv, gen_helper_vfmaxa_d)
+TRANS(vfmina_s, gen_vvv, gen_helper_vfmina_s)
+TRANS(vfmina_d, gen_vvv, gen_helper_vfmina_d)
+
+TRANS(vflogb_s, gen_vv, gen_helper_vflogb_s)
+TRANS(vflogb_d, gen_vv, gen_helper_vflogb_d)
+
+TRANS(vfclass_s, gen_vv, gen_helper_vfclass_s)
+TRANS(vfclass_d, gen_vv, gen_helper_vfclass_d)
+
+TRANS(vfsqrt_s, gen_vv, gen_helper_vfsqrt_s)
+TRANS(vfsqrt_d, gen_vv, gen_helper_vfsqrt_d)
+TRANS(vfrecip_s, gen_vv, gen_helper_vfrecip_s)
+TRANS(vfrecip_d, gen_vv, gen_helper_vfrecip_d)
+TRANS(vfrsqrt_s, gen_vv, gen_helper_vfrsqrt_s)
+TRANS(vfrsqrt_d, gen_vv, gen_helper_vfrsqrt_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f537f726a2..722aa5d85b 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -492,6 +492,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vv           vd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
+&vvvv         vd vj vk va
 
 #
 # LSX Formats
@@ -505,6 +506,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui7             .... ........ ... imm:7 vj:5 vd:5    &vv_i
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
+@vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1002,3 +1004,44 @@ vfrstp_b         0111 00010010 10110 ..... ..... .....    @vvv
 vfrstp_h         0111 00010010 10111 ..... ..... .....    @vvv
 vfrstpi_b        0111 00101001 10100 ..... ..... .....    @vv_ui5
 vfrstpi_h        0111 00101001 10101 ..... ..... .....    @vv_ui5
+
+vfadd_s          0111 00010011 00001 ..... ..... .....    @vvv
+vfadd_d          0111 00010011 00010 ..... ..... .....    @vvv
+vfsub_s          0111 00010011 00101 ..... ..... .....    @vvv
+vfsub_d          0111 00010011 00110 ..... ..... .....    @vvv
+vfmul_s          0111 00010011 10001 ..... ..... .....    @vvv
+vfmul_d          0111 00010011 10010 ..... ..... .....    @vvv
+vfdiv_s          0111 00010011 10101 ..... ..... .....    @vvv
+vfdiv_d          0111 00010011 10110 ..... ..... .....    @vvv
+
+vfmadd_s         0000 10010001 ..... ..... ..... .....    @vvvv
+vfmadd_d         0000 10010010 ..... ..... ..... .....    @vvvv
+vfmsub_s         0000 10010101 ..... ..... ..... .....    @vvvv
+vfmsub_d         0000 10010110 ..... ..... ..... .....    @vvvv
+vfnmadd_s        0000 10011001 ..... ..... ..... .....    @vvvv
+vfnmadd_d        0000 10011010 ..... ..... ..... .....    @vvvv
+vfnmsub_s        0000 10011101 ..... ..... ..... .....    @vvvv
+vfnmsub_d        0000 10011110 ..... ..... ..... .....    @vvvv
+
+vfmax_s          0111 00010011 11001 ..... ..... .....    @vvv
+vfmax_d          0111 00010011 11010 ..... ..... .....    @vvv
+vfmin_s          0111 00010011 11101 ..... ..... .....    @vvv
+vfmin_d          0111 00010011 11110 ..... ..... .....    @vvv
+
+vfmaxa_s         0111 00010100 00001 ..... ..... .....    @vvv
+vfmaxa_d         0111 00010100 00010 ..... ..... .....    @vvv
+vfmina_s         0111 00010100 00101 ..... ..... .....    @vvv
+vfmina_d         0111 00010100 00110 ..... ..... .....    @vvv
+
+vflogb_s         0111 00101001 11001 10001 ..... .....    @vv
+vflogb_d         0111 00101001 11001 10010 ..... .....    @vv
+
+vfclass_s        0111 00101001 11001 10101 ..... .....    @vv
+vfclass_d        0111 00101001 11001 10110 ..... .....    @vv
+
+vfsqrt_s         0111 00101001 11001 11001 ..... .....    @vv
+vfsqrt_d         0111 00101001 11001 11010 ..... .....    @vv
+vfrecip_s        0111 00101001 11001 11101 ..... .....    @vv
+vfrecip_d        0111 00101001 11001 11110 ..... .....    @vv
+vfrsqrt_s        0111 00101001 11010 00001 ..... .....    @vv
+vfrsqrt_d        0111 00101001 11010 00010 ..... .....    @vv
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index f01635aed6..0bb0f072c0 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -32,6 +32,7 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
 const char *loongarch_exception_name(int32_t exception);
 
 void restore_fp_status(CPULoongArchState *env);
+void update_fcsr0(CPULoongArchState *env, uintptr_t pc);
 
 #ifndef CONFIG_USER_ONLY
 extern const VMStateDescription vmstate_loongarch_cpu;
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 31e9270826..a5f2752dce 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -9,6 +9,8 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+#include "internals.h"
 
 #define DO_HELPER_VVV(NAME, BIT, FUNC, ...)                   \
     void helper_##NAME(CPULoongArchState *env,                \
@@ -24,6 +26,11 @@
     void helper_##NAME(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
     { FUNC(env, vd, vj, BIT, __VA_ARGS__); }
 
+#define DO_HELPER_VVVV(NAME, BIT, FUNC, ...)                               \
+    void helper_##NAME(CPULoongArchState *env,                             \
+                       uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va) \
+    { FUNC(env, vd, vj, vk, va, BIT, __VA_ARGS__); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -3468,3 +3475,175 @@ void helper_vfrstpi_h(CPULoongArchState *env,
     m = imm % 8;
     Vd->H[m] = (int16_t)i;
 }
+
+static void helper_vvv_f(CPULoongArchState *env,
+                uint32_t vd, uint32_t vj, uint32_t vk, int bit,
+                void (*func)(float_status*, vec_t*, vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(&env->fp_status, &dest, Vj, Vk, bit, i);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+    update_fcsr0(env, GETPC());
+}
+
+#define LSX_DO_FARITH(name)                                           \
+static void do_vf## name (float_status *status,                       \
+                     vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n) \
+{                                                                     \
+    switch (bit) {                                                    \
+    case 32:                                                          \
+        Vd->W[n] = float32_## name (Vj->W[n], Vk->W[n], status);      \
+        break;                                                        \
+    case 64:                                                          \
+        Vd->D[n] = float64_## name (Vj->D[n], Vk->D[n], status);      \
+        break;                                                        \
+    default:                                                          \
+        g_assert_not_reached();                                       \
+    }                                                                 \
+}
+
+LSX_DO_FARITH(add)
+LSX_DO_FARITH(sub)
+LSX_DO_FARITH(mul)
+LSX_DO_FARITH(div)
+LSX_DO_FARITH(maxnum)
+LSX_DO_FARITH(minnum)
+LSX_DO_FARITH(maxnummag)
+LSX_DO_FARITH(minnummag)
+
+DO_HELPER_VVV(vfadd_s, 32, helper_vvv_f, do_vfadd)
+DO_HELPER_VVV(vfadd_d, 64, helper_vvv_f, do_vfadd)
+DO_HELPER_VVV(vfsub_s, 32, helper_vvv_f, do_vfsub)
+DO_HELPER_VVV(vfsub_d, 64, helper_vvv_f, do_vfsub)
+DO_HELPER_VVV(vfmul_s, 32, helper_vvv_f, do_vfmul)
+DO_HELPER_VVV(vfmul_d, 64, helper_vvv_f, do_vfmul)
+DO_HELPER_VVV(vfdiv_s, 32, helper_vvv_f, do_vfdiv)
+DO_HELPER_VVV(vfdiv_d, 64, helper_vvv_f, do_vfdiv)
+
+static void helper_vvvv_f(CPULoongArchState *env,
+                uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va, int bit,
+                void (*func)(float_status*, vec_t*, vec_t*, vec_t*,
+                             vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+    vec_t *Va = &(env->fpr[va].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(&env->fp_status, &dest, Vj, Vk, Va, bit, i);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+    update_fcsr0(env, GETPC());
+}
+
+#define LSX_DO_FMULADD(name, flags)                         \
+static void do_vf## name (float_status *status,             \
+                          vec_t *Vd, vec_t *Vj, vec_t *Vk,  \
+                          vec_t *Va, int bit, int n)        \
+{                                                           \
+    switch (bit) {                                          \
+    case 32:                                                \
+        Vd->W[n] = float32_muladd(Vj->W[n], Vk->W[n],       \
+                                  Va->W[n], flags, status); \
+        break;                                              \
+    case 64:                                                \
+        Vd->D[n] = float64_muladd(Vj->D[n], Vk->D[n],       \
+                                  Va->D[n], flags,status);  \
+        break;                                              \
+    default:                                                \
+        g_assert_not_reached();                             \
+    }                                                       \
+}
+
+LSX_DO_FMULADD(madd, 0)
+LSX_DO_FMULADD(msub, float_muladd_negate_c)
+LSX_DO_FMULADD(nmadd, float_muladd_negate_product | float_muladd_negate_c)
+LSX_DO_FMULADD(nmsub, float_muladd_negate_product)
+
+DO_HELPER_VVVV(vfmadd_s, 32, helper_vvvv_f, do_vfmadd)
+DO_HELPER_VVVV(vfmadd_d, 64, helper_vvvv_f, do_vfmadd)
+DO_HELPER_VVVV(vfmsub_s, 32, helper_vvvv_f, do_vfmsub)
+DO_HELPER_VVVV(vfmsub_d, 64, helper_vvvv_f, do_vfmsub)
+DO_HELPER_VVVV(vfnmadd_s, 32, helper_vvvv_f, do_vfnmadd)
+DO_HELPER_VVVV(vfnmadd_d, 64, helper_vvvv_f, do_vfnmadd)
+DO_HELPER_VVVV(vfnmsub_s, 32, helper_vvvv_f, do_vfnmsub)
+DO_HELPER_VVVV(vfnmsub_d, 64, helper_vvvv_f, do_vfnmsub)
+
+DO_HELPER_VVV(vfmax_s, 32, helper_vvv_f, do_vfmaxnum)
+DO_HELPER_VVV(vfmax_d, 64, helper_vvv_f, do_vfmaxnum)
+DO_HELPER_VVV(vfmin_s, 32, helper_vvv_f, do_vfminnum)
+DO_HELPER_VVV(vfmin_d, 64, helper_vvv_f, do_vfminnum)
+
+DO_HELPER_VVV(vfmaxa_s, 32, helper_vvv_f, do_vfmaxnummag)
+DO_HELPER_VVV(vfmaxa_d, 64, helper_vvv_f, do_vfmaxnummag)
+DO_HELPER_VVV(vfmina_s, 32, helper_vvv_f, do_vfminnummag)
+DO_HELPER_VVV(vfmina_d, 64, helper_vvv_f, do_vfminnummag)
+
+static void helper_vv_f(CPULoongArchState *env,
+                uint32_t vd, uint32_t vj, int bit,
+                void (*func)(CPULoongArchState*, vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(env, &dest, Vj, bit, i);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+}
+
+#define LSX_DO_VV(name)                                     \
+static void do_v## name (CPULoongArchState *env, vec_t *Vd, \
+                          vec_t *Vj, int bit, int n)        \
+{                                                           \
+    switch (bit) {                                          \
+    case 32:                                                \
+        Vd->W[n] = helper_## name ## _s(env, Vj->W[n]);     \
+        break;                                              \
+    case 64:                                                \
+        Vd->D[n] = helper_## name ## _d(env, Vj->D[n]);     \
+        break;                                              \
+    default:                                                \
+        g_assert_not_reached();                             \
+    }                                                       \
+}                                                           \
+
+LSX_DO_VV(flogb)
+LSX_DO_VV(fclass)
+LSX_DO_VV(fsqrt)
+LSX_DO_VV(frecip)
+LSX_DO_VV(frsqrt)
+
+DO_HELPER_VV(vflogb_s, 32, helper_vv_f, do_vflogb)
+DO_HELPER_VV(vflogb_d, 64, helper_vv_f, do_vflogb)
+
+DO_HELPER_VV(vfclass_s, 32, helper_vv_f, do_vfclass)
+DO_HELPER_VV(vfclass_d, 64, helper_vv_f, do_vfclass)
+
+DO_HELPER_VV(vfsqrt_s, 32, helper_vv_f, do_vfsqrt)
+DO_HELPER_VV(vfsqrt_d, 64, helper_vv_f, do_vfsqrt)
+DO_HELPER_VV(vfrecip_s, 32, helper_vv_f, do_vfrecip)
+DO_HELPER_VV(vfrecip_d, 64, helper_vv_f, do_vfrecip)
+DO_HELPER_VV(vfrsqrt_s, 32, helper_vv_f, do_vfrsqrt)
+DO_HELPER_VV(vfrsqrt_d, 64, helper_vv_f, do_vfrsqrt)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 35/43] target/loongarch: Implement LSX fpu fcvt instructions
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (33 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 34/43] target/loongarch: Implement LSX fpu arith instructions Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt Song Gao
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFCVT{L/H}.{S.H/D.S};
- VFCVT.{H.S/S.D};
- VFRINT[{RNE/RZ/RP/RM}].{S/D};
- VFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- VFTINT[RZ].{WU.S/LU.D};
- VFTINT[{RNE/RZ/RP/RM}].W.D;
- VFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- VFFINT.{S.W/D.L}[U];
- VFFINT.S.L, VFFINT{L/H}.D.W.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 fpu/softfloat.c                             |  55 ++++
 include/fpu/softfloat.h                     |  27 ++
 target/loongarch/disas.c                    |  56 ++++
 target/loongarch/helper.h                   |  56 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  56 ++++
 target/loongarch/insns.decode               |  56 ++++
 target/loongarch/lsx_helper.c               | 312 ++++++++++++++++++++
 7 files changed, 618 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c7454c3eb1..c7d0ebd803 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2988,6 +2988,25 @@ float64 float64_round_to_int(float64 a, float_status *s)
     return float64_round_pack_canonical(&p, s);
 }
 
+#define FRINT_RM(rm, rmode, bits)                             \
+float ## bits float ## bits ## _round_to_int_ ## rm(          \
+                         float ## bits a, float_status *s)    \
+{                                                             \
+    FloatParts64 pa;   \
+    float ## bits ## _unpack_canonical(&pa, a, s); \
+    parts_round_to_int(&pa, rmode, 0, s, &float64_params);    \
+    return float ## bits ## _round_pack_canonical(&pa, s);    \
+}
+FRINT_RM(rne , float_round_nearest_even, 32)
+FRINT_RM(rz  , float_round_down        , 32)
+FRINT_RM(rp  , float_round_up          , 32)
+FRINT_RM(rm  , float_round_to_zero     , 32)
+FRINT_RM(rne , float_round_nearest_even, 64)
+FRINT_RM(rz  , float_round_down        , 64)
+FRINT_RM(rp  , float_round_up          , 64)
+FRINT_RM(rm  , float_round_to_zero     , 64)
+#undef FRINT_RM
+
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
     FloatParts64 p;
@@ -3349,6 +3368,42 @@ int32_t float64_to_int32_round_to_zero(float64 a, float_status *s)
     return float64_to_int32_scalbn(a, float_round_to_zero, 0, s);
 }
 
+#define FTINT_RM(rm, rmode, sbits, dbits)                                 \
+int ## dbits ## _t float ## sbits ## _to_int ## dbits ## _ ## rm(         \
+                         float ## sbits a, float_status *s)               \
+{                                                                         \
+    return float ## sbits ## _to_int ## dbits ## _scalbn(a, rmode, 0, s); \
+}
+FTINT_RM(rne , float_round_nearest_even, 32, 32)
+FTINT_RM(rz  , float_round_down        , 32, 32)
+FTINT_RM(rp  , float_round_up          , 32, 32)
+FTINT_RM(rm  , float_round_to_zero     , 32, 32)
+FTINT_RM(rne , float_round_nearest_even, 64, 64)
+FTINT_RM(rz  , float_round_down        , 64, 64)
+FTINT_RM(rp  , float_round_up          , 64, 64)
+FTINT_RM(rm  , float_round_to_zero     , 64, 64)
+
+FTINT_RM(rne , float_round_nearest_even, 32, 64)
+FTINT_RM(rz  , float_round_down        , 32, 64)
+FTINT_RM(rp  , float_round_up          , 32, 64)
+FTINT_RM(rm  , float_round_to_zero     , 32, 64)
+#undef FTINT_RM
+
+int32_t float64_to_int32_round_up(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_up, 0, s);
+}
+
+int32_t float64_to_int32_round_down(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_down, 0, s);
+}
+
+int32_t float64_to_int32_round_nearest_even(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_nearest_even, 0, s);
+}
+
 int64_t float64_to_int64_round_to_zero(float64 a, float_status *s)
 {
     return float64_to_int64_scalbn(a, float_round_to_zero, 0, s);
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3dcf20e3a2..ebdbaa4ac8 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -559,6 +559,16 @@ int16_t float32_to_int16_round_to_zero(float32, float_status *status);
 int32_t float32_to_int32_round_to_zero(float32, float_status *status);
 int64_t float32_to_int64_round_to_zero(float32, float_status *status);
 
+int64_t float32_to_int64_rm(float32, float_status *status);
+int64_t float32_to_int64_rp(float32, float_status *status);
+int64_t float32_to_int64_rz(float32, float_status *status);
+int64_t float32_to_int64_rne(float32, float_status *status);
+
+int32_t float32_to_int32_rm(float32, float_status *status);
+int32_t float32_to_int32_rp(float32, float_status *status);
+int32_t float32_to_int32_rz(float32, float_status *status);
+int32_t float32_to_int32_rne(float32, float_status *status);
+
 uint16_t float32_to_uint16_scalbn(float32, FloatRoundMode, int, float_status *);
 uint32_t float32_to_uint32_scalbn(float32, FloatRoundMode, int, float_status *);
 uint64_t float32_to_uint64_scalbn(float32, FloatRoundMode, int, float_status *);
@@ -579,6 +589,10 @@ float128 float32_to_float128(float32, float_status *status);
 | Software IEC/IEEE single-precision operations.
 *----------------------------------------------------------------------------*/
 float32 float32_round_to_int(float32, float_status *status);
+float32 float32_round_to_int_rm(float32, float_status *status);
+float32 float32_round_to_int_rp(float32, float_status *status);
+float32 float32_round_to_int_rz(float32, float_status *status);
+float32 float32_round_to_int_rne(float32, float_status *status);
 float32 float32_add(float32, float32, float_status *status);
 float32 float32_sub(float32, float32, float_status *status);
 float32 float32_mul(float32, float32, float_status *status);
@@ -751,6 +765,15 @@ int16_t float64_to_int16_round_to_zero(float64, float_status *status);
 int32_t float64_to_int32_round_to_zero(float64, float_status *status);
 int64_t float64_to_int64_round_to_zero(float64, float_status *status);
 
+int64_t float64_to_int64_rm(float64, float_status *status);
+int64_t float64_to_int64_rp(float64, float_status *status);
+int64_t float64_to_int64_rz(float64, float_status *status);
+int64_t float64_to_int64_rne(float64, float_status *status);
+
+int32_t float64_to_int32_round_up(float64, float_status *status);
+int32_t float64_to_int32_round_down(float64, float_status *status);
+int32_t float64_to_int32_round_nearest_even(float64, float_status *status);
+
 uint16_t float64_to_uint16_scalbn(float64, FloatRoundMode, int, float_status *);
 uint32_t float64_to_uint32_scalbn(float64, FloatRoundMode, int, float_status *);
 uint64_t float64_to_uint64_scalbn(float64, FloatRoundMode, int, float_status *);
@@ -771,6 +794,10 @@ float128 float64_to_float128(float64, float_status *status);
 | Software IEC/IEEE double-precision operations.
 *----------------------------------------------------------------------------*/
 float64 float64_round_to_int(float64, float_status *status);
+float64 float64_round_to_int_rm(float64, float_status *status);
+float64 float64_round_to_int_rp(float64, float_status *status);
+float64 float64_round_to_int_rz(float64, float_status *status);
+float64 float64_round_to_int_rne(float64, float_status *status);
 float64 float64_add(float64, float64, float_status *status);
 float64 float64_sub(float64, float64, float_status *status);
 float64 float64_mul(float64, float64, float_status *status);
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 987bf5c597..489980a0fa 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1319,3 +1319,59 @@ INSN_LSX(vfrecip_s,        vv)
 INSN_LSX(vfrecip_d,        vv)
 INSN_LSX(vfrsqrt_s,        vv)
 INSN_LSX(vfrsqrt_d,        vv)
+
+INSN_LSX(vfcvtl_s_h,       vv)
+INSN_LSX(vfcvth_s_h,       vv)
+INSN_LSX(vfcvtl_d_s,       vv)
+INSN_LSX(vfcvth_d_s,       vv)
+INSN_LSX(vfcvt_h_s,        vvv)
+INSN_LSX(vfcvt_s_d,        vvv)
+
+INSN_LSX(vfrint_s,         vv)
+INSN_LSX(vfrint_d,         vv)
+INSN_LSX(vfrintrm_s,       vv)
+INSN_LSX(vfrintrm_d,       vv)
+INSN_LSX(vfrintrp_s,       vv)
+INSN_LSX(vfrintrp_d,       vv)
+INSN_LSX(vfrintrz_s,       vv)
+INSN_LSX(vfrintrz_d,       vv)
+INSN_LSX(vfrintrne_s,      vv)
+INSN_LSX(vfrintrne_d,      vv)
+
+INSN_LSX(vftint_w_s,       vv)
+INSN_LSX(vftint_l_d,       vv)
+INSN_LSX(vftintrm_w_s,     vv)
+INSN_LSX(vftintrm_l_d,     vv)
+INSN_LSX(vftintrp_w_s,     vv)
+INSN_LSX(vftintrp_l_d,     vv)
+INSN_LSX(vftintrz_w_s,     vv)
+INSN_LSX(vftintrz_l_d,     vv)
+INSN_LSX(vftintrne_w_s,    vv)
+INSN_LSX(vftintrne_l_d,    vv)
+INSN_LSX(vftint_wu_s,      vv)
+INSN_LSX(vftint_lu_d,      vv)
+INSN_LSX(vftintrz_wu_s,    vv)
+INSN_LSX(vftintrz_lu_d,    vv)
+INSN_LSX(vftint_w_d,       vvv)
+INSN_LSX(vftintrm_w_d,     vvv)
+INSN_LSX(vftintrp_w_d,     vvv)
+INSN_LSX(vftintrz_w_d,     vvv)
+INSN_LSX(vftintrne_w_d,    vvv)
+INSN_LSX(vftintl_l_s,      vv)
+INSN_LSX(vftinth_l_s,      vv)
+INSN_LSX(vftintrml_l_s,    vv)
+INSN_LSX(vftintrmh_l_s,    vv)
+INSN_LSX(vftintrpl_l_s,    vv)
+INSN_LSX(vftintrph_l_s,    vv)
+INSN_LSX(vftintrzl_l_s,    vv)
+INSN_LSX(vftintrzh_l_s,    vv)
+INSN_LSX(vftintrnel_l_s,   vv)
+INSN_LSX(vftintrneh_l_s,   vv)
+
+INSN_LSX(vffint_s_w,       vv)
+INSN_LSX(vffint_s_wu,      vv)
+INSN_LSX(vffint_d_l,       vv)
+INSN_LSX(vffint_d_lu,      vv)
+INSN_LSX(vffintl_d_w,      vv)
+INSN_LSX(vffinth_d_w,      vv)
+INSN_LSX(vffint_s_l,       vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 31fc36917d..59d94fd055 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -669,3 +669,59 @@ DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
 DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
 DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
 DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfcvtl_s_h, void, env, i32, i32)
+DEF_HELPER_3(vfcvth_s_h, void, env, i32, i32)
+DEF_HELPER_3(vfcvtl_d_s, void, env, i32, i32)
+DEF_HELPER_3(vfcvth_d_s, void, env, i32, i32)
+DEF_HELPER_4(vfcvt_h_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfcvt_s_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vfrintrne_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrne_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrz_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrz_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrp_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrp_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrm_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrm_d, void, env, i32, i32)
+DEF_HELPER_3(vfrint_s, void, env, i32, i32)
+DEF_HELPER_3(vfrint_d, void, env, i32, i32)
+
+DEF_HELPER_3(vftintrne_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrne_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrp_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrp_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrm_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrm_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftint_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftint_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_wu_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_lu_d, void, env, i32, i32)
+DEF_HELPER_3(vftint_wu_s, void, env, i32, i32)
+DEF_HELPER_3(vftint_lu_d, void, env, i32, i32)
+DEF_HELPER_4(vftintrne_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrz_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrp_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrm_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftint_w_d, void, env, i32, i32, i32)
+DEF_HELPER_3(vftintrnel_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrneh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrzl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrzh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrpl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrph_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrml_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrmh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftinth_l_s, void, env, i32, i32)
+
+DEF_HELPER_3(vffint_s_w, void, env, i32, i32)
+DEF_HELPER_3(vffint_d_l, void, env, i32, i32)
+DEF_HELPER_3(vffint_s_wu, void, env, i32, i32)
+DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
+DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
+DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
+DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c8b271ddc8..cb318a726b 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -599,3 +599,59 @@ TRANS(vfrecip_s, gen_vv, gen_helper_vfrecip_s)
 TRANS(vfrecip_d, gen_vv, gen_helper_vfrecip_d)
 TRANS(vfrsqrt_s, gen_vv, gen_helper_vfrsqrt_s)
 TRANS(vfrsqrt_d, gen_vv, gen_helper_vfrsqrt_d)
+
+TRANS(vfcvtl_s_h, gen_vv, gen_helper_vfcvtl_s_h)
+TRANS(vfcvth_s_h, gen_vv, gen_helper_vfcvth_s_h)
+TRANS(vfcvtl_d_s, gen_vv, gen_helper_vfcvtl_d_s)
+TRANS(vfcvth_d_s, gen_vv, gen_helper_vfcvth_d_s)
+TRANS(vfcvt_h_s, gen_vvv, gen_helper_vfcvt_h_s)
+TRANS(vfcvt_s_d, gen_vvv, gen_helper_vfcvt_s_d)
+
+TRANS(vfrintrne_s, gen_vv, gen_helper_vfrintrne_s)
+TRANS(vfrintrne_d, gen_vv, gen_helper_vfrintrne_d)
+TRANS(vfrintrz_s, gen_vv, gen_helper_vfrintrz_s)
+TRANS(vfrintrz_d, gen_vv, gen_helper_vfrintrz_d)
+TRANS(vfrintrp_s, gen_vv, gen_helper_vfrintrp_s)
+TRANS(vfrintrp_d, gen_vv, gen_helper_vfrintrp_d)
+TRANS(vfrintrm_s, gen_vv, gen_helper_vfrintrm_s)
+TRANS(vfrintrm_d, gen_vv, gen_helper_vfrintrm_d)
+TRANS(vfrint_s, gen_vv, gen_helper_vfrint_s)
+TRANS(vfrint_d, gen_vv, gen_helper_vfrint_d)
+
+TRANS(vftintrne_w_s, gen_vv, gen_helper_vftintrne_w_s)
+TRANS(vftintrne_l_d, gen_vv, gen_helper_vftintrne_l_d)
+TRANS(vftintrz_w_s, gen_vv, gen_helper_vftintrz_w_s)
+TRANS(vftintrz_l_d, gen_vv, gen_helper_vftintrz_l_d)
+TRANS(vftintrp_w_s, gen_vv, gen_helper_vftintrp_w_s)
+TRANS(vftintrp_l_d, gen_vv, gen_helper_vftintrp_l_d)
+TRANS(vftintrm_w_s, gen_vv, gen_helper_vftintrm_w_s)
+TRANS(vftintrm_l_d, gen_vv, gen_helper_vftintrm_l_d)
+TRANS(vftint_w_s, gen_vv, gen_helper_vftint_w_s)
+TRANS(vftint_l_d, gen_vv, gen_helper_vftint_l_d)
+TRANS(vftintrz_wu_s, gen_vv, gen_helper_vftintrz_wu_s)
+TRANS(vftintrz_lu_d, gen_vv, gen_helper_vftintrz_lu_d)
+TRANS(vftint_wu_s, gen_vv, gen_helper_vftint_wu_s)
+TRANS(vftint_lu_d, gen_vv, gen_helper_vftint_lu_d)
+TRANS(vftintrne_w_d, gen_vvv, gen_helper_vftintrne_w_d)
+TRANS(vftintrz_w_d, gen_vvv, gen_helper_vftintrz_w_d)
+TRANS(vftintrp_w_d, gen_vvv, gen_helper_vftintrp_w_d)
+TRANS(vftintrm_w_d, gen_vvv, gen_helper_vftintrm_w_d)
+TRANS(vftint_w_d, gen_vvv, gen_helper_vftint_w_d)
+TRANS(vftintrnel_l_s, gen_vv, gen_helper_vftintrnel_l_s)
+TRANS(vftintrneh_l_s, gen_vv, gen_helper_vftintrneh_l_s)
+TRANS(vftintrzl_l_s, gen_vv, gen_helper_vftintrzl_l_s)
+TRANS(vftintrzh_l_s, gen_vv, gen_helper_vftintrzh_l_s)
+TRANS(vftintrpl_l_s, gen_vv, gen_helper_vftintrpl_l_s)
+TRANS(vftintrph_l_s, gen_vv, gen_helper_vftintrph_l_s)
+TRANS(vftintrml_l_s, gen_vv, gen_helper_vftintrml_l_s)
+TRANS(vftintrmh_l_s, gen_vv, gen_helper_vftintrmh_l_s)
+TRANS(vftintl_l_s, gen_vv, gen_helper_vftintl_l_s)
+TRANS(vftinth_l_s, gen_vv, gen_helper_vftinth_l_s)
+
+TRANS(vffint_s_w, gen_vv, gen_helper_vffint_s_w)
+TRANS(vffint_d_l, gen_vv, gen_helper_vffint_d_l)
+TRANS(vffint_s_wu, gen_vv, gen_helper_vffint_s_wu)
+TRANS(vffint_d_lu, gen_vv, gen_helper_vffint_d_lu)
+TRANS(vffintl_d_w, gen_vv, gen_helper_vffintl_d_w)
+TRANS(vffinth_d_w, gen_vv, gen_helper_vffinth_d_w)
+TRANS(vffint_s_l, gen_vvv, gen_helper_vffint_s_l)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 722aa5d85b..26f82d5712 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1045,3 +1045,59 @@ vfrecip_s        0111 00101001 11001 11101 ..... .....    @vv
 vfrecip_d        0111 00101001 11001 11110 ..... .....    @vv
 vfrsqrt_s        0111 00101001 11010 00001 ..... .....    @vv
 vfrsqrt_d        0111 00101001 11010 00010 ..... .....    @vv
+
+vfcvtl_s_h       0111 00101001 11011 11010 ..... .....    @vv
+vfcvth_s_h       0111 00101001 11011 11011 ..... .....    @vv
+vfcvtl_d_s       0111 00101001 11011 11100 ..... .....    @vv
+vfcvth_d_s       0111 00101001 11011 11101 ..... .....    @vv
+vfcvt_h_s        0111 00010100 01100 ..... ..... .....    @vvv
+vfcvt_s_d        0111 00010100 01101 ..... ..... .....    @vvv
+
+vfrint_s         0111 00101001 11010 01101 ..... .....    @vv
+vfrint_d         0111 00101001 11010 01110 ..... .....    @vv
+vfrintrm_s       0111 00101001 11010 10001 ..... .....    @vv
+vfrintrm_d       0111 00101001 11010 10010 ..... .....    @vv
+vfrintrp_s       0111 00101001 11010 10101 ..... .....    @vv
+vfrintrp_d       0111 00101001 11010 10110 ..... .....    @vv
+vfrintrz_s       0111 00101001 11010 11001 ..... .....    @vv
+vfrintrz_d       0111 00101001 11010 11010 ..... .....    @vv
+vfrintrne_s      0111 00101001 11010 11101 ..... .....    @vv
+vfrintrne_d      0111 00101001 11010 11110 ..... .....    @vv
+
+vftint_w_s       0111 00101001 11100 01100 ..... .....    @vv
+vftint_l_d       0111 00101001 11100 01101 ..... .....    @vv
+vftintrm_w_s     0111 00101001 11100 01110 ..... .....    @vv
+vftintrm_l_d     0111 00101001 11100 01111 ..... .....    @vv
+vftintrp_w_s     0111 00101001 11100 10000 ..... .....    @vv
+vftintrp_l_d     0111 00101001 11100 10001 ..... .....    @vv
+vftintrz_w_s     0111 00101001 11100 10010 ..... .....    @vv
+vftintrz_l_d     0111 00101001 11100 10011 ..... .....    @vv
+vftintrne_w_s    0111 00101001 11100 10100 ..... .....    @vv
+vftintrne_l_d    0111 00101001 11100 10101 ..... .....    @vv
+vftint_wu_s      0111 00101001 11100 10110 ..... .....    @vv
+vftint_lu_d      0111 00101001 11100 10111 ..... .....    @vv
+vftintrz_wu_s    0111 00101001 11100 11100 ..... .....    @vv
+vftintrz_lu_d    0111 00101001 11100 11101 ..... .....    @vv
+vftint_w_d       0111 00010100 10011 ..... ..... .....    @vvv
+vftintrm_w_d     0111 00010100 10100 ..... ..... .....    @vvv
+vftintrp_w_d     0111 00010100 10101 ..... ..... .....    @vvv
+vftintrz_w_d     0111 00010100 10110 ..... ..... .....    @vvv
+vftintrne_w_d    0111 00010100 10111 ..... ..... .....    @vvv
+vftintl_l_s      0111 00101001 11101 00000 ..... .....    @vv
+vftinth_l_s      0111 00101001 11101 00001 ..... .....    @vv
+vftintrml_l_s    0111 00101001 11101 00010 ..... .....    @vv
+vftintrmh_l_s    0111 00101001 11101 00011 ..... .....    @vv
+vftintrpl_l_s    0111 00101001 11101 00100 ..... .....    @vv
+vftintrph_l_s    0111 00101001 11101 00101 ..... .....    @vv
+vftintrzl_l_s    0111 00101001 11101 00110 ..... .....    @vv
+vftintrzh_l_s    0111 00101001 11101 00111 ..... .....    @vv
+vftintrnel_l_s   0111 00101001 11101 01000 ..... .....    @vv
+vftintrneh_l_s   0111 00101001 11101 01001 ..... .....    @vv
+
+vffint_s_w       0111 00101001 11100 00000 ..... .....    @vv
+vffint_s_wu      0111 00101001 11100 00001 ..... .....    @vv
+vffint_d_l       0111 00101001 11100 00010 ..... .....    @vv
+vffint_d_lu      0111 00101001 11100 00011 ..... .....    @vv
+vffintl_d_w      0111 00101001 11100 00100 ..... .....    @vv
+vffinth_d_w      0111 00101001 11100 00101 ..... .....    @vv
+vffint_s_l       0111 00010100 10000 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index a5f2752dce..29c0592d0c 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3634,6 +3634,7 @@ LSX_DO_VV(fclass)
 LSX_DO_VV(fsqrt)
 LSX_DO_VV(frecip)
 LSX_DO_VV(frsqrt)
+LSX_DO_VV(frint)
 
 DO_HELPER_VV(vflogb_s, 32, helper_vv_f, do_vflogb)
 DO_HELPER_VV(vflogb_d, 64, helper_vv_f, do_vflogb)
@@ -3647,3 +3648,314 @@ DO_HELPER_VV(vfrecip_s, 32, helper_vv_f, do_vfrecip)
 DO_HELPER_VV(vfrecip_d, 64, helper_vv_f, do_vfrecip)
 DO_HELPER_VV(vfrsqrt_s, 32, helper_vv_f, do_vfrsqrt)
 DO_HELPER_VV(vfrsqrt_d, 64, helper_vv_f, do_vfrsqrt)
+
+static void do_vfcvtl(CPULoongArchState *env, vec_t *Vd,
+                      vec_t *Vj, int bit, int n)
+{
+    uint32_t s;
+    uint64_t d;
+
+    switch (bit) {
+    case 32:
+        s = float16_to_float32((uint16_t)Vj->H[n], true, &env->fp_status);
+        Vd->W[n] = Vj->H[n] < 0 ? (s | (1 << 31)) : s;
+        break;
+    case 64:
+        d = float32_to_float64((uint32_t)Vj->W[n], &env->fp_status);
+        Vd->D[n] = Vj->W[n] < 0 ? (d | (1ULL << 63)) : d;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vfcvth(CPULoongArchState *env, vec_t *Vd,
+                      vec_t *Vj, int bit, int n)
+{
+    uint32_t s;
+    uint64_t d;
+
+    switch (bit) {
+    case 32:
+        s = float16_to_float32((uint16_t)Vj->H[n + 4], true, &env->fp_status);
+        Vd->W[n] = Vj->H[n + 4] < 0 ? (s | (1 << 31)) : s;
+        break;
+    case 64:
+        d = float32_to_float64((uint32_t)Vj->W[n + 2], &env->fp_status);
+        Vd->D[n] = Vj->W[n + 2] < 0 ? (d | (1ULL << 63)) : d;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vfcvt(float_status *status, vec_t *Vd,
+                      vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    uint16_t H_h, H_l;
+    uint32_t S_h, S_l;
+
+    switch (bit) {
+    case 32:
+        H_h = float32_to_float16((uint32_t)Vj->W[n], true, status);
+        H_l = float32_to_float16((uint32_t)Vk->W[n], true, status);
+        Vd->H[n + 4] = Vj->W[n] < 0 ? (H_h | (1 << 15)) : H_h;
+        Vd->H[n] = Vk->W[n] < 0 ? (H_l | (1 << 15)) : H_l;
+        break;
+    case 64:
+        S_h = float64_to_float32((uint64_t)Vj->D[n], status);
+        S_l = float64_to_float32((uint64_t)Vk->D[n], status);
+        Vd->W[n + 2] = Vj->D[n] < 0 ? (S_h | (1 << 31)) : S_h;
+        Vd->W[n] = Vk->D[n] < 0 ? (S_l | (1 << 31)) : S_l;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV(vfcvtl_s_h, 32, helper_vv_f, do_vfcvtl)
+DO_HELPER_VV(vfcvth_s_h, 32, helper_vv_f, do_vfcvth)
+DO_HELPER_VV(vfcvtl_d_s, 64, helper_vv_f, do_vfcvtl)
+DO_HELPER_VV(vfcvth_d_s, 64, helper_vv_f, do_vfcvth)
+DO_HELPER_VVV(vfcvt_h_s, 32, helper_vvv_f, do_vfcvt)
+DO_HELPER_VVV(vfcvt_s_d, 64, helper_vvv_f, do_vfcvt)
+
+#define LSX_FRINT_RM(rm)                                                   \
+static void do_vfrint## rm (CPULoongArchState *env, vec_t *Vd,             \
+                          vec_t *Vj, int bit, int n)                       \
+{                                                                          \
+    switch (bit) {                                                         \
+    case 32:                                                               \
+        Vd->W[n] = float32_round_to_int_## rm (Vj->W[n], &env->fp_status); \
+        break;                                                             \
+    case 64:                                                               \
+        Vd->D[n] = float64_round_to_int_## rm (Vj->D[n], &env->fp_status); \
+        break;                                                             \
+    default:                                                               \
+        g_assert_not_reached();                                            \
+    }                                                                      \
+    update_fcsr0(env, GETPC());                                            \
+}
+
+LSX_FRINT_RM(rne)
+LSX_FRINT_RM(rz)
+LSX_FRINT_RM(rp)
+LSX_FRINT_RM(rm)
+
+DO_HELPER_VV(vfrintrne_s, 32, helper_vv_f, do_vfrintrne)
+DO_HELPER_VV(vfrintrne_d, 64, helper_vv_f, do_vfrintrne)
+DO_HELPER_VV(vfrintrz_s, 32, helper_vv_f, do_vfrintrz)
+DO_HELPER_VV(vfrintrz_d, 64, helper_vv_f, do_vfrintrz)
+DO_HELPER_VV(vfrintrp_s, 32, helper_vv_f, do_vfrintrp)
+DO_HELPER_VV(vfrintrp_d, 64, helper_vv_f, do_vfrintrp)
+DO_HELPER_VV(vfrintrm_s, 32, helper_vv_f, do_vfrintrm)
+DO_HELPER_VV(vfrintrm_d, 64, helper_vv_f, do_vfrintrm)
+DO_HELPER_VV(vfrint_s, 32, helper_vv_f, do_vfrint)
+DO_HELPER_VV(vfrint_d, 64, helper_vv_f, do_vfrint)
+
+#define LSX_FTINT_RM(name)                                  \
+static void do_v## name (CPULoongArchState *env, vec_t *Vd, \
+                          vec_t *Vj, int bit, int n)        \
+{                                                           \
+    switch (bit) {                                          \
+    case 32:                                                \
+        Vd->W[n] = helper_## name ## _w_s(env, Vj->W[n]);   \
+        break;                                              \
+    case 64:                                                \
+        Vd->D[n] = helper_## name ## _l_d(env, Vj->D[n]);   \
+        break;                                              \
+    default:                                                \
+        g_assert_not_reached();                             \
+    }                                                       \
+}                                                           \
+
+LSX_FTINT_RM(ftintrne)
+LSX_FTINT_RM(ftintrp)
+LSX_FTINT_RM(ftintrz)
+LSX_FTINT_RM(ftintrm)
+LSX_FTINT_RM(ftint)
+
+DO_HELPER_VV(vftintrne_w_s, 32, helper_vv_f, do_vftintrne)
+DO_HELPER_VV(vftintrne_l_d, 64, helper_vv_f, do_vftintrne)
+DO_HELPER_VV(vftintrp_w_s, 32, helper_vv_f, do_vftintrp)
+DO_HELPER_VV(vftintrp_l_d, 64, helper_vv_f, do_vftintrp)
+DO_HELPER_VV(vftintrz_w_s, 32, helper_vv_f, do_vftintrz)
+DO_HELPER_VV(vftintrz_l_d, 64, helper_vv_f, do_vftintrz)
+DO_HELPER_VV(vftintrm_w_s, 32, helper_vv_f, do_vftintrm)
+DO_HELPER_VV(vftintrm_l_d, 64, helper_vv_f, do_vftintrm)
+DO_HELPER_VV(vftint_w_s, 32, helper_vv_f, do_vftint)
+DO_HELPER_VV(vftint_l_d, 64, helper_vv_f, do_vftint)
+
+static void do_vftintrz_u(CPULoongArchState *env, vec_t *Vd,
+                          vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 32:
+        Vd->W[n] = float32_to_uint32_round_to_zero(Vj->W[n], &env->fp_status);
+        break;
+    case 64:
+        Vd->D[n] = float64_to_uint64_round_to_zero(Vj->D[n], &env->fp_status);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vftint_u(CPULoongArchState *env, vec_t *Vd,
+                        vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 32:
+        Vd->W[n] = float32_to_uint32(Vj->W[n], &env->fp_status);
+        break;
+    case 64:
+        Vd->D[n] = float64_to_uint64(Vj->D[n], &env->fp_status);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+DO_HELPER_VV(vftintrz_wu_s, 32, helper_vv_f, do_vftintrz_u)
+DO_HELPER_VV(vftintrz_lu_d, 64, helper_vv_f, do_vftintrz_u)
+DO_HELPER_VV(vftint_wu_s, 32, helper_vv_f, do_vftint_u)
+DO_HELPER_VV(vftint_lu_d, 64, helper_vv_f, do_vftint_u)
+
+#define LSX_FTINT_W_D(name)                                      \
+void helper_v## name ##_w_d(CPULoongArchState *env, uint32_t vd, \
+                            uint32_t vj, uint32_t vk)            \
+{                                                                \
+    int i;                                                       \
+    vec_t *Vd = &(env->fpr[vd].vec);                             \
+    vec_t *Vj = &(env->fpr[vj].vec);                             \
+    vec_t *Vk = &(env->fpr[vk].vec);                             \
+                                                                 \
+    vec_t dest;                                                  \
+    dest.D[0] = 0;                                               \
+    dest.D[1] = 0;                                               \
+    for (i = 0; i < 2; i++) {                                    \
+        dest.W[i + 2] = helper_## name ## _w_d(env, Vj->D[i]);   \
+        dest.W[i]  = helper_## name ## _w_d(env, Vk->D[i]);      \
+    }                                                            \
+    Vd->D[0] = dest.D[0];                                        \
+    Vd->D[1] = dest.D[1];                                        \
+}
+
+LSX_FTINT_W_D(ftintrne)
+LSX_FTINT_W_D(ftintrz)
+LSX_FTINT_W_D(ftintrp)
+LSX_FTINT_W_D(ftintrm)
+LSX_FTINT_W_D(ftint)
+
+#define LSX_FTINTL_L_S(name)                                       \
+static void do_v## name ##l_l_s(CPULoongArchState *env, vec_t *Vd, \
+                                vec_t *Vj, int bit, int n)         \
+{                                                                  \
+     Vd->D[n]  = helper_## name ## _l_s(env, Vj->W[n]);            \
+}                                                                  \
+
+LSX_FTINTL_L_S(ftintrne)
+LSX_FTINTL_L_S(ftintrz)
+LSX_FTINTL_L_S(ftintrp)
+LSX_FTINTL_L_S(ftintrm)
+LSX_FTINTL_L_S(ftint)
+
+#define LSX_FTINTH_L_S(name)                                       \
+static void do_v## name ##h_l_s(CPULoongArchState *env, vec_t *Vd, \
+                                vec_t *Vj, int bit, int n)         \
+{                                                                  \
+     Vd->D[n]  = helper_## name ## _l_s(env, Vj->W[n + 2]);        \
+}                                                                  \
+
+LSX_FTINTH_L_S(ftintrne)
+LSX_FTINTH_L_S(ftintrz)
+LSX_FTINTH_L_S(ftintrp)
+LSX_FTINTH_L_S(ftintrm)
+LSX_FTINTH_L_S(ftint)
+
+DO_HELPER_VV(vftintrnel_l_s, 64, helper_vv_f, do_vftintrnel_l_s)
+DO_HELPER_VV(vftintrneh_l_s, 64, helper_vv_f, do_vftintrneh_l_s)
+DO_HELPER_VV(vftintrzl_l_s, 64, helper_vv_f, do_vftintrzl_l_s)
+DO_HELPER_VV(vftintrzh_l_s, 64, helper_vv_f, do_vftintrzh_l_s)
+DO_HELPER_VV(vftintrpl_l_s, 64, helper_vv_f, do_vftintrpl_l_s)
+DO_HELPER_VV(vftintrph_l_s, 64, helper_vv_f, do_vftintrph_l_s)
+DO_HELPER_VV(vftintrml_l_s, 64, helper_vv_f, do_vftintrml_l_s)
+DO_HELPER_VV(vftintrmh_l_s, 64, helper_vv_f, do_vftintrmh_l_s)
+DO_HELPER_VV(vftintl_l_s, 64, helper_vv_f, do_vftintl_l_s)
+DO_HELPER_VV(vftinth_l_s, 64, helper_vv_f, do_vftinth_l_s)
+
+static void do_vffint_s(CPULoongArchState *env,
+                        vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 32:
+        Vd->W[n] = int32_to_float32(Vj->W[n], &env->fp_status);
+        break;
+    case 64:
+        Vd->D[n] = int64_to_float64(Vj->D[n], &env->fp_status);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vffint_u(CPULoongArchState *env,
+                        vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    switch (bit) {
+    case 32:
+        Vd->W[n] = uint32_to_float32(Vj->W[n], &env->fp_status);
+        break;
+    case 64:
+        Vd->D[n] = uint64_to_float64(Vj->D[n], &env->fp_status);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vffintl_d_w(CPULoongArchState *env,
+                           vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    Vd->D[n] = int32_to_float64(Vj->W[n], &env->fp_status);
+    update_fcsr0(env, GETPC());
+}
+
+static void do_vffinth_d_w(CPULoongArchState *env,
+                           vec_t *Vd, vec_t *Vj, int bit, int n)
+{
+    Vd->D[n] = int32_to_float64(Vj->W[n + 2], &env->fp_status);
+    update_fcsr0(env, GETPC());
+}
+
+void helper_vffint_s_l(CPULoongArchState *env,
+                       uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    vec_t dest;
+    dest.D[0] = 0;
+    dest.D[1] = 0;
+    for (i = 0; i < 2; i++) {
+        dest.W[i + 2] = int64_to_float32(Vj->D[i], &env->fp_status);
+        dest.W[i]  = int64_to_float32(Vk->D[i], &env->fp_status);
+    }
+    Vd->D[0] = dest.D[0];
+    Vd->D[1] = dest.D[1];
+}
+
+DO_HELPER_VV(vffint_s_w, 32, helper_vv_f, do_vffint_s)
+DO_HELPER_VV(vffint_d_l, 64, helper_vv_f, do_vffint_s)
+DO_HELPER_VV(vffint_s_wu, 32, helper_vv_f, do_vffint_u)
+DO_HELPER_VV(vffint_d_lu, 64, helper_vv_f, do_vffint_u)
+DO_HELPER_VV(vffintl_d_w, 64, helper_vv_f, do_vffintl_d_w)
+DO_HELPER_VV(vffinth_d_w, 64, helper_vv_f, do_vffinth_d_w)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (34 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 35/43] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 18:50   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 37/43] target/loongarch: Implement vfcmp Song Gao
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSEQ[I].{B/H/W/D};
- VSLE[I].{B/H/W/D}[U];
- VSLT[I].{B/H/W/D/}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  43 +++
 target/loongarch/helper.h                   |  43 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  43 +++
 target/loongarch/insns.decode               |  43 +++
 target/loongarch/lsx_helper.c               | 278 ++++++++++++++++++++
 5 files changed, 450 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 489980a0fa..c854742f6d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1375,3 +1375,46 @@ INSN_LSX(vffint_d_lu,      vv)
 INSN_LSX(vffintl_d_w,      vv)
 INSN_LSX(vffinth_d_w,      vv)
 INSN_LSX(vffint_s_l,       vvv)
+
+INSN_LSX(vseq_b,           vvv)
+INSN_LSX(vseq_h,           vvv)
+INSN_LSX(vseq_w,           vvv)
+INSN_LSX(vseq_d,           vvv)
+INSN_LSX(vseqi_b,          vv_i)
+INSN_LSX(vseqi_h,          vv_i)
+INSN_LSX(vseqi_w,          vv_i)
+INSN_LSX(vseqi_d,          vv_i)
+
+INSN_LSX(vsle_b,           vvv)
+INSN_LSX(vsle_h,           vvv)
+INSN_LSX(vsle_w,           vvv)
+INSN_LSX(vsle_d,           vvv)
+INSN_LSX(vslei_b,          vv_i)
+INSN_LSX(vslei_h,          vv_i)
+INSN_LSX(vslei_w,          vv_i)
+INSN_LSX(vslei_d,          vv_i)
+INSN_LSX(vsle_bu,          vvv)
+INSN_LSX(vsle_hu,          vvv)
+INSN_LSX(vsle_wu,          vvv)
+INSN_LSX(vsle_du,          vvv)
+INSN_LSX(vslei_bu,         vv_i)
+INSN_LSX(vslei_hu,         vv_i)
+INSN_LSX(vslei_wu,         vv_i)
+INSN_LSX(vslei_du,         vv_i)
+
+INSN_LSX(vslt_b,           vvv)
+INSN_LSX(vslt_h,           vvv)
+INSN_LSX(vslt_w,           vvv)
+INSN_LSX(vslt_d,           vvv)
+INSN_LSX(vslti_b,          vv_i)
+INSN_LSX(vslti_h,          vv_i)
+INSN_LSX(vslti_w,          vv_i)
+INSN_LSX(vslti_d,          vv_i)
+INSN_LSX(vslt_bu,          vvv)
+INSN_LSX(vslt_hu,          vvv)
+INSN_LSX(vslt_wu,          vvv)
+INSN_LSX(vslt_du,          vvv)
+INSN_LSX(vslti_bu,         vv_i)
+INSN_LSX(vslti_hu,         vv_i)
+INSN_LSX(vslti_wu,         vv_i)
+INSN_LSX(vslti_du,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 59d94fd055..b8f22a2601 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -725,3 +725,46 @@ DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
 DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
 DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
 DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vseq_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vseq_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vseq_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vseq_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vseqi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vseqi_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vseqi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vseqi_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsle_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsle_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslei_du, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vslt_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslt_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vslti_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index cb318a726b..90b3e88229 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -655,3 +655,46 @@ TRANS(vffint_d_lu, gen_vv, gen_helper_vffint_d_lu)
 TRANS(vffintl_d_w, gen_vv, gen_helper_vffintl_d_w)
 TRANS(vffinth_d_w, gen_vv, gen_helper_vffinth_d_w)
 TRANS(vffint_s_l, gen_vvv, gen_helper_vffint_s_l)
+
+TRANS(vseq_b, gen_vvv, gen_helper_vseq_b)
+TRANS(vseq_h, gen_vvv, gen_helper_vseq_h)
+TRANS(vseq_w, gen_vvv, gen_helper_vseq_w)
+TRANS(vseq_d, gen_vvv, gen_helper_vseq_d)
+TRANS(vseqi_b, gen_vv_i, gen_helper_vseqi_b)
+TRANS(vseqi_h, gen_vv_i, gen_helper_vseqi_h)
+TRANS(vseqi_w, gen_vv_i, gen_helper_vseqi_w)
+TRANS(vseqi_d, gen_vv_i, gen_helper_vseqi_d)
+
+TRANS(vsle_b, gen_vvv, gen_helper_vsle_b)
+TRANS(vsle_h, gen_vvv, gen_helper_vsle_h)
+TRANS(vsle_w, gen_vvv, gen_helper_vsle_w)
+TRANS(vsle_d, gen_vvv, gen_helper_vsle_d)
+TRANS(vslei_b, gen_vv_i, gen_helper_vslei_b)
+TRANS(vslei_h, gen_vv_i, gen_helper_vslei_h)
+TRANS(vslei_w, gen_vv_i, gen_helper_vslei_w)
+TRANS(vslei_d, gen_vv_i, gen_helper_vslei_d)
+TRANS(vsle_bu, gen_vvv, gen_helper_vsle_bu)
+TRANS(vsle_hu, gen_vvv, gen_helper_vsle_hu)
+TRANS(vsle_wu, gen_vvv, gen_helper_vsle_wu)
+TRANS(vsle_du, gen_vvv, gen_helper_vsle_du)
+TRANS(vslei_bu, gen_vv_i, gen_helper_vslei_bu)
+TRANS(vslei_hu, gen_vv_i, gen_helper_vslei_hu)
+TRANS(vslei_wu, gen_vv_i, gen_helper_vslei_wu)
+TRANS(vslei_du, gen_vv_i, gen_helper_vslei_du)
+
+TRANS(vslt_b, gen_vvv, gen_helper_vslt_b)
+TRANS(vslt_h, gen_vvv, gen_helper_vslt_h)
+TRANS(vslt_w, gen_vvv, gen_helper_vslt_w)
+TRANS(vslt_d, gen_vvv, gen_helper_vslt_d)
+TRANS(vslti_b, gen_vv_i, gen_helper_vslti_b)
+TRANS(vslti_h, gen_vv_i, gen_helper_vslti_h)
+TRANS(vslti_w, gen_vv_i, gen_helper_vslti_w)
+TRANS(vslti_d, gen_vv_i, gen_helper_vslti_d)
+TRANS(vslt_bu, gen_vvv, gen_helper_vslt_bu)
+TRANS(vslt_hu, gen_vvv, gen_helper_vslt_hu)
+TRANS(vslt_wu, gen_vvv, gen_helper_vslt_wu)
+TRANS(vslt_du, gen_vvv, gen_helper_vslt_du)
+TRANS(vslti_bu, gen_vv_i, gen_helper_vslti_bu)
+TRANS(vslti_hu, gen_vv_i, gen_helper_vslti_hu)
+TRANS(vslti_wu, gen_vv_i, gen_helper_vslti_wu)
+TRANS(vslti_du, gen_vv_i, gen_helper_vslti_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 26f82d5712..965ee486e1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1101,3 +1101,46 @@ vffint_d_lu      0111 00101001 11100 00011 ..... .....    @vv
 vffintl_d_w      0111 00101001 11100 00100 ..... .....    @vv
 vffinth_d_w      0111 00101001 11100 00101 ..... .....    @vv
 vffint_s_l       0111 00010100 10000 ..... ..... .....    @vvv
+
+vseq_b           0111 00000000 00000 ..... ..... .....    @vvv
+vseq_h           0111 00000000 00001 ..... ..... .....    @vvv
+vseq_w           0111 00000000 00010 ..... ..... .....    @vvv
+vseq_d           0111 00000000 00011 ..... ..... .....    @vvv
+vseqi_b          0111 00101000 00000 ..... ..... .....    @vv_i5
+vseqi_h          0111 00101000 00001 ..... ..... .....    @vv_i5
+vseqi_w          0111 00101000 00010 ..... ..... .....    @vv_i5
+vseqi_d          0111 00101000 00011 ..... ..... .....    @vv_i5
+
+vsle_b           0111 00000000 00100 ..... ..... .....    @vvv
+vsle_h           0111 00000000 00101 ..... ..... .....    @vvv
+vsle_w           0111 00000000 00110 ..... ..... .....    @vvv
+vsle_d           0111 00000000 00111 ..... ..... .....    @vvv
+vslei_b          0111 00101000 00100 ..... ..... .....    @vv_i5
+vslei_h          0111 00101000 00101 ..... ..... .....    @vv_i5
+vslei_w          0111 00101000 00110 ..... ..... .....    @vv_i5
+vslei_d          0111 00101000 00111 ..... ..... .....    @vv_i5
+vsle_bu          0111 00000000 01000 ..... ..... .....    @vvv
+vsle_hu          0111 00000000 01001 ..... ..... .....    @vvv
+vsle_wu          0111 00000000 01010 ..... ..... .....    @vvv
+vsle_du          0111 00000000 01011 ..... ..... .....    @vvv
+vslei_bu         0111 00101000 01000 ..... ..... .....    @vv_ui5
+vslei_hu         0111 00101000 01001 ..... ..... .....    @vv_ui5
+vslei_wu         0111 00101000 01010 ..... ..... .....    @vv_ui5
+vslei_du         0111 00101000 01011 ..... ..... .....    @vv_ui5
+
+vslt_b           0111 00000000 01100 ..... ..... .....    @vvv
+vslt_h           0111 00000000 01101 ..... ..... .....    @vvv
+vslt_w           0111 00000000 01110 ..... ..... .....    @vvv
+vslt_d           0111 00000000 01111 ..... ..... .....    @vvv
+vslti_b          0111 00101000 01100 ..... ..... .....    @vv_i5
+vslti_h          0111 00101000 01101 ..... ..... .....    @vv_i5
+vslti_w          0111 00101000 01110 ..... ..... .....    @vv_i5
+vslti_d          0111 00101000 01111 ..... ..... .....    @vv_i5
+vslt_bu          0111 00000000 10000 ..... ..... .....    @vvv
+vslt_hu          0111 00000000 10001 ..... ..... .....    @vvv
+vslt_wu          0111 00000000 10010 ..... ..... .....    @vvv
+vslt_du          0111 00000000 10011 ..... ..... .....    @vvv
+vslti_bu         0111 00101000 10000 ..... ..... .....    @vv_ui5
+vslti_hu         0111 00101000 10001 ..... ..... .....    @vv_ui5
+vslti_wu         0111 00101000 10010 ..... ..... .....    @vv_ui5
+vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 29c0592d0c..977a095e79 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3959,3 +3959,281 @@ DO_HELPER_VV(vffint_s_wu, 32, helper_vv_f, do_vffint_u)
 DO_HELPER_VV(vffint_d_lu, 64, helper_vv_f, do_vffint_u)
 DO_HELPER_VV(vffintl_d_w, 64, helper_vv_f, do_vffintl_d_w)
 DO_HELPER_VV(vffinth_d_w, 64, helper_vv_f, do_vffinth_d_w)
+
+static int64_t vseq(int64_t s1, int64_t s2)
+{
+    return s1 == s2 ? -1: 0;
+}
+
+static void do_vseq(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vseq(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vseq(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vseq(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vseq(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vseqi(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vseq(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = vseq(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = vseq(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = vseq(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vseq_b, 8, helper_vvv, do_vseq)
+DO_HELPER_VVV(vseq_h, 16, helper_vvv, do_vseq)
+DO_HELPER_VVV(vseq_w, 32, helper_vvv, do_vseq)
+DO_HELPER_VVV(vseq_d, 64, helper_vvv, do_vseq)
+DO_HELPER_VVV(vseqi_b, 8, helper_vv_i, do_vseqi)
+DO_HELPER_VVV(vseqi_h, 16, helper_vv_i, do_vseqi)
+DO_HELPER_VVV(vseqi_w, 32, helper_vv_i, do_vseqi)
+DO_HELPER_VVV(vseqi_d, 64, helper_vv_i, do_vseqi)
+
+static int64_t vsle_s(int64_t s1, int64_t s2)
+{
+    return s1 <= s2 ? -1 : 0;
+}
+
+static void do_vsle_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsle_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vsle_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vsle_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vsle_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vslei_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsle_s(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = vsle_s(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = vsle_s(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = vsle_s(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsle_b, 8, helper_vvv, do_vsle_s)
+DO_HELPER_VVV(vsle_h, 16, helper_vvv, do_vsle_s)
+DO_HELPER_VVV(vsle_w, 32, helper_vvv, do_vsle_s)
+DO_HELPER_VVV(vsle_d, 64, helper_vvv, do_vsle_s)
+DO_HELPER_VVV(vslei_b, 8, helper_vv_i, do_vslei_s)
+DO_HELPER_VVV(vslei_h, 16, helper_vv_i, do_vslei_s)
+DO_HELPER_VVV(vslei_w, 32, helper_vv_i, do_vslei_s)
+DO_HELPER_VVV(vslei_d, 64, helper_vv_i, do_vslei_s)
+
+static int64_t vsle_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return u1 <= u2 ? -1 : 0;
+}
+
+static void do_vsle_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int  bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsle_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vsle_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vsle_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vsle_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vslei_u(vec_t *Vd, vec_t *Vj, uint32_t imm, int  bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vsle_u(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vsle_u(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vsle_u(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vsle_u(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vsle_bu, 8, helper_vvv, do_vsle_u)
+DO_HELPER_VVV(vsle_hu, 16, helper_vvv, do_vsle_u)
+DO_HELPER_VVV(vsle_wu, 32, helper_vvv, do_vsle_u)
+DO_HELPER_VVV(vsle_du, 64, helper_vvv, do_vsle_u)
+DO_HELPER_VVV(vslei_bu, 8, helper_vv_i, do_vslei_u)
+DO_HELPER_VVV(vslei_hu, 16, helper_vv_i, do_vslei_u)
+DO_HELPER_VVV(vslei_wu, 32, helper_vv_i, do_vslei_u)
+DO_HELPER_VVV(vslei_du, 64, helper_vv_i, do_vslei_u)
+
+static int64_t vslt_s(int64_t s1, int64_t s2)
+{
+    return s1 < s2 ? -1 : 0;
+}
+
+static void do_vslt_s(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vslt_s(Vj->B[n], Vk->B[n]);
+        break;
+    case 16:
+        Vd->H[n] = vslt_s(Vj->H[n], Vk->H[n]);
+        break;
+    case 32:
+        Vd->W[n] = vslt_s(Vj->W[n], Vk->W[n]);
+        break;
+    case 64:
+        Vd->D[n] = vslt_s(Vj->D[n], Vk->D[n]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vslti_s(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vslt_s(Vj->B[n], imm);
+        break;
+    case 16:
+        Vd->H[n] = vslt_s(Vj->H[n], imm);
+        break;
+    case 32:
+        Vd->W[n] = vslt_s(Vj->W[n], imm);
+        break;
+    case 64:
+        Vd->D[n] = vslt_s(Vj->D[n], imm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vslt_b, 8, helper_vvv, do_vslt_s)
+DO_HELPER_VVV(vslt_h, 16, helper_vvv, do_vslt_s)
+DO_HELPER_VVV(vslt_w, 32, helper_vvv, do_vslt_s)
+DO_HELPER_VVV(vslt_d, 64, helper_vvv, do_vslt_s)
+DO_HELPER_VVV(vslti_b, 8, helper_vv_i, do_vslti_s)
+DO_HELPER_VVV(vslti_h, 16, helper_vv_i, do_vslti_s)
+DO_HELPER_VVV(vslti_w, 32, helper_vv_i, do_vslti_s)
+DO_HELPER_VVV(vslti_d, 64, helper_vv_i, do_vslti_s)
+
+static int64_t vslt_u(int64_t s1, int64_t s2, int bit)
+{
+    uint64_t umax = MAKE_64BIT_MASK(0, bit);
+    uint64_t u1 = s1 & umax;
+    uint64_t u2 = s2 & umax;
+
+    return u1 < u2 ? -1 : 0;
+}
+
+static void do_vslt_u(vec_t *Vd, vec_t *Vj, vec_t *Vk, int  bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vslt_u(Vj->B[n], Vk->B[n], bit);
+        break;
+    case 16:
+        Vd->H[n] = vslt_u(Vj->H[n], Vk->H[n], bit);
+        break;
+    case 32:
+        Vd->W[n] = vslt_u(Vj->W[n], Vk->W[n], bit);
+        break;
+    case 64:
+        Vd->D[n] = vslt_u(Vj->D[n], Vk->D[n], bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vslti_u(vec_t *Vd, vec_t *Vj, uint32_t imm, int  bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = vslt_u(Vj->B[n], imm, bit);
+        break;
+    case 16:
+        Vd->H[n] = vslt_u(Vj->H[n], imm, bit);
+        break;
+    case 32:
+        Vd->W[n] = vslt_u(Vj->W[n], imm, bit);
+        break;
+    case 64:
+        Vd->D[n] = vslt_u(Vj->D[n], imm, bit);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vslt_bu, 8, helper_vvv, do_vslt_u)
+DO_HELPER_VVV(vslt_hu, 16, helper_vvv, do_vslt_u)
+DO_HELPER_VVV(vslt_wu, 32, helper_vvv, do_vslt_u)
+DO_HELPER_VVV(vslt_du, 64, helper_vvv, do_vslt_u)
+DO_HELPER_VVV(vslti_bu, 8, helper_vv_i, do_vslti_u)
+DO_HELPER_VVV(vslti_hu, 16, helper_vv_i, do_vslti_u)
+DO_HELPER_VVV(vslti_wu, 32, helper_vv_i, do_vslti_u)
+DO_HELPER_VVV(vslti_du, 64, helper_vv_i, do_vslti_u)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 37/43] target/loongarch: Implement vfcmp
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (35 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset Song Gao
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFCMP.cond.{S/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 94 +++++++++++++++++++++
 target/loongarch/helper.h                   |  9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 30 +++++++
 target/loongarch/insns.decode               |  5 ++
 target/loongarch/lsx_helper.c               | 38 +++++++++
 5 files changed, 176 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c854742f6d..0ea5418e5e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1418,3 +1418,97 @@ INSN_LSX(vslti_bu,         vv_i)
 INSN_LSX(vslti_hu,         vv_i)
 INSN_LSX(vslti_wu,         vv_i)
 INSN_LSX(vslti_du,         vv_i)
+
+#define output_vfcmp(C, PREFIX, SUFFIX)                                     \
+{                                                                           \
+    (C)->info->fprintf_func((C)->info->stream, "%08x   %s%s\t%d, f%d, f%d", \
+                            (C)->insn, PREFIX, SUFFIX, a->vd,               \
+                            a->vj, a->vk);                                  \
+}
+
+static bool output_vvv_fcond(DisasContext *ctx, arg_vvv_fcond * a,
+                             const char *suffix)
+{
+    bool ret = true;
+    switch (a->fcond) {
+    case 0x0:
+        output_vfcmp(ctx, "vfcmp_caf_", suffix);
+        break;
+    case 0x1:
+        output_vfcmp(ctx, "vfcmp_saf_", suffix);
+        break;
+    case 0x2:
+        output_vfcmp(ctx, "vfcmp_clt_", suffix);
+        break;
+    case 0x3:
+        output_vfcmp(ctx, "vfcmp_slt_", suffix);
+        break;
+    case 0x4:
+        output_vfcmp(ctx, "vfcmp_ceq_", suffix);
+        break;
+    case 0x5:
+        output_vfcmp(ctx, "vfcmp_seq_", suffix);
+        break;
+    case 0x6:
+        output_vfcmp(ctx, "vfcmp_cle_", suffix);
+        break;
+    case 0x7:
+        output_vfcmp(ctx, "vfcmp_sle_", suffix);
+        break;
+    case 0x8:
+        output_vfcmp(ctx, "vfcmp_cun_", suffix);
+        break;
+    case 0x9:
+        output_vfcmp(ctx, "vfcmp_sun_", suffix);
+        break;
+    case 0xA:
+        output_vfcmp(ctx, "vfcmp_cult_", suffix);
+        break;
+    case 0xB:
+        output_vfcmp(ctx, "vfcmp_sult_", suffix);
+        break;
+    case 0xC:
+        output_vfcmp(ctx, "vfcmp_cueq_", suffix);
+        break;
+    case 0xD:
+        output_vfcmp(ctx, "vfcmp_sueq_", suffix);
+        break;
+    case 0xE:
+        output_vfcmp(ctx, "vfcmp_cule_", suffix);
+        break;
+    case 0xF:
+        output_vfcmp(ctx, "vfcmp_sule_", suffix);
+        break;
+    case 0x10:
+        output_vfcmp(ctx, "vfcmp_cne_", suffix);
+        break;
+    case 0x11:
+        output_vfcmp(ctx, "vfcmp_sne_", suffix);
+        break;
+    case 0x14:
+        output_vfcmp(ctx, "vfcmp_cor_", suffix);
+        break;
+    case 0x15:
+        output_vfcmp(ctx, "vfcmp_sor_", suffix);
+        break;
+    case 0x18:
+        output_vfcmp(ctx, "vfcmp_cune_", suffix);
+        break;
+    case 0x19:
+        output_vfcmp(ctx, "vfcmp_sune_", suffix);
+        break;
+    default:
+        ret = false;
+    }
+    return ret;
+}
+
+#define LSX_FCMP_INSN(suffix)                            \
+static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
+                                     arg_vvv_fcond * a)  \
+{                                                        \
+    return output_vvv_fcond(ctx, a, #suffix);            \
+}
+
+LSX_FCMP_INSN(s)
+LSX_FCMP_INSN(d)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b8f22a2601..9d8ade9dc8 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -768,3 +768,12 @@ DEF_HELPER_4(vslti_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vslti_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vslti_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vslti_du, void, env, i32, i32, i32)
+
+/* vfcmp.cXXX.s */
+DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
+/* vfcmp.sXXX.s */
+DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
+/* vfcmp.cXXX.d */
+DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
+/* vfcmp.sXXX.d */
+DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 90b3e88229..522d660113 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -698,3 +698,33 @@ TRANS(vslti_bu, gen_vv_i, gen_helper_vslti_bu)
 TRANS(vslti_hu, gen_vv_i, gen_helper_vslti_hu)
 TRANS(vslti_wu, gen_vv_i, gen_helper_vslti_wu)
 TRANS(vslti_du, gen_vv_i, gen_helper_vslti_du)
+
+static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
+{
+    uint32_t flags;
+    void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    fn = (a->fcond & 1 ? gen_helper_vfcmp_s_s : gen_helper_vfcmp_c_s);
+    flags = get_fcmp_flags(a->fcond >> 1);
+    fn(cpu_env, vd, vj, vk,  tcg_constant_i32(flags));
+
+    return true;
+}
+
+static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
+{
+    uint32_t flags;
+    void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    fn = (a->fcond & 1 ? gen_helper_vfcmp_s_d : gen_helper_vfcmp_c_d);
+    flags = get_fcmp_flags(a->fcond >> 1);
+    fn(cpu_env, vd, vj, vk, tcg_constant_i32(flags));
+
+    return true;
+}
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 965ee486e1..5b4114c39b 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -493,6 +493,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vvv          vd vj vk
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
+&vvv_fcond    vd vj vk fcond
 
 #
 # LSX Formats
@@ -507,6 +508,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 @vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
+@vvv_fcond      .... ........ fcond:5  vk:5 vj:5 vd:5    &vvv_fcond
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1144,3 +1146,6 @@ vslti_bu         0111 00101000 10000 ..... ..... .....    @vv_ui5
 vslti_hu         0111 00101000 10001 ..... ..... .....    @vv_ui5
 vslti_wu         0111 00101000 10010 ..... ..... .....    @vv_ui5
 vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
+
+vfcmp_cond_s     0000 11000101 ..... ..... ..... .....    @vvv_fcond
+vfcmp_cond_d     0000 11000110 ..... ..... ..... .....    @vvv_fcond
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 977a095e79..1e5a1d989a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4237,3 +4237,41 @@ DO_HELPER_VVV(vslti_bu, 8, helper_vv_i, do_vslti_u)
 DO_HELPER_VVV(vslti_hu, 16, helper_vv_i, do_vslti_u)
 DO_HELPER_VVV(vslti_wu, 32, helper_vv_i, do_vslti_u)
 DO_HELPER_VVV(vslti_du, 64, helper_vv_i, do_vslti_u)
+
+#define LSX_FCMP_S(name)                                            \
+void helper_v## name ##_s(CPULoongArchState *env, uint32_t vd,      \
+                          uint32_t vj, uint32_t vk, uint32_t flags) \
+{                                                                   \
+    int ret;                                                        \
+    int i;                                                          \
+    vec_t *Vd = &(env->fpr[vd].vec);                                \
+    vec_t *Vj = &(env->fpr[vj].vec);                                \
+    vec_t *Vk = &(env->fpr[vk].vec);                                \
+                                                                    \
+    for (i = 0; i < 4; i++) {                                       \
+        ret = helper_## name ## _s(env, Vj->W[i], Vk->W[i], flags); \
+        Vd->W[i] = (ret == 1) ? -1 : 0;                             \
+    }                                                               \
+}
+
+LSX_FCMP_S(fcmp_c)
+LSX_FCMP_S(fcmp_s)
+
+#define LSX_FCMP_D(name)                                            \
+void helper_v## name ##_d(CPULoongArchState *env, uint32_t vd,      \
+                          uint32_t vj, uint32_t vk, uint32_t flags) \
+{                                                                   \
+    int ret;                                                        \
+    int i;                                                          \
+    vec_t *Vd = &(env->fpr[vd].vec);                                \
+    vec_t *Vj = &(env->fpr[vj].vec);                                \
+    vec_t *Vk = &(env->fpr[vk].vec);                                \
+                                                                    \
+    for (i = 0; i < 2; i++) {                                       \
+        ret = helper_## name ## _d(env, Vj->D[i], Vk->D[i], flags); \
+        Vd->D[i] = (ret == 1) ? -1 : 0;                             \
+    }                                                               \
+}
+
+LSX_FCMP_D(fcmp_c)
+LSX_FCMP_D(fcmp_s)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (36 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 37/43] target/loongarch: Implement vfcmp Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 19:15   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
                   ` (5 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VBITSEL.V;
- VBITSELI.B;
- VSET{EQZ/NEZ}.V;
- VSETANYEQZ.{B/H/W/D};
- VSETALLNEZ.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  20 ++++
 target/loongarch/helper.h                   |  14 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  25 +++++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 116 ++++++++++++++++++++
 5 files changed, 192 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0ea5418e5e..88e6ed1eef 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -763,6 +763,12 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
     return true;                                            \
 }
 
+static void output_cv(DisasContext *ctx, arg_cv *a,
+                        const char *mnemonic)
+{
+    output(ctx, mnemonic, "fcc%d, v%d", a->cd, a->vj);
+}
+
 static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
 {
     output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
@@ -1512,3 +1518,17 @@ static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
 
 LSX_FCMP_INSN(s)
 LSX_FCMP_INSN(d)
+
+INSN_LSX(vbitsel_v,        vvvv)
+INSN_LSX(vbitseli_b,       vv_i)
+
+INSN_LSX(vseteqz_v,        cv)
+INSN_LSX(vsetnez_v,        cv)
+INSN_LSX(vsetanyeqz_b,     cv)
+INSN_LSX(vsetanyeqz_h,     cv)
+INSN_LSX(vsetanyeqz_w,     cv)
+INSN_LSX(vsetanyeqz_d,     cv)
+INSN_LSX(vsetallnez_b,     cv)
+INSN_LSX(vsetallnez_h,     cv)
+INSN_LSX(vsetallnez_w,     cv)
+INSN_LSX(vsetallnez_d,     cv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 9d8ade9dc8..1bef2a901f 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -777,3 +777,17 @@ DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
 /* vfcmp.sXXX.d */
 DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_5(vbitsel_v, void, env, i32, i32, i32, i32)
+DEF_HELPER_4(vbitseli_b, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vseteqz_v, void, env, i32, i32)
+DEF_HELPER_3(vsetnez_v, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_b, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_h, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_w, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_d, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 522d660113..7bf7f33724 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -50,6 +50,17 @@ static bool gen_vv(DisasContext *ctx, arg_vv *a,
     return true;
 }
 
+static bool gen_cv(DisasContext *ctx, arg_cv *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 cd = tcg_constant_i32(a->cd);
+
+    CHECK_SXE;
+    func(cpu_env, cd, vj);
+    return true;
+}
+
 static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
                      void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
                                   TCGv_i32, TCGv_i32))
@@ -728,3 +739,17 @@ static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
 
     return true;
 }
+
+TRANS(vbitsel_v, gen_vvvv, gen_helper_vbitsel_v)
+TRANS(vbitseli_b, gen_vv_i, gen_helper_vbitseli_b)
+
+TRANS(vseteqz_v, gen_cv, gen_helper_vseteqz_v)
+TRANS(vsetnez_v, gen_cv, gen_helper_vsetnez_v)
+TRANS(vsetanyeqz_b, gen_cv, gen_helper_vsetanyeqz_b)
+TRANS(vsetanyeqz_h, gen_cv, gen_helper_vsetanyeqz_h)
+TRANS(vsetanyeqz_w, gen_cv, gen_helper_vsetanyeqz_w)
+TRANS(vsetanyeqz_d, gen_cv, gen_helper_vsetanyeqz_d)
+TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
+TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
+TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
+TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 5b4114c39b..fb1cc29aff 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -490,6 +490,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 
 &vv           vd vj
+&cv           cd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
@@ -499,6 +500,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 # LSX Formats
 #
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
+@cv            .... ........ ..... ..... vj:5 .. cd:3    &cv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
@@ -1149,3 +1151,18 @@ vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
 
 vfcmp_cond_s     0000 11000101 ..... ..... ..... .....    @vvv_fcond
 vfcmp_cond_d     0000 11000110 ..... ..... ..... .....    @vvv_fcond
+
+vbitsel_v        0000 11010001 ..... ..... ..... .....    @vvvv
+
+vbitseli_b       0111 00111100 01 ........ ..... .....    @vv_ui8
+
+vseteqz_v        0111 00101001 11001 00110 ..... 00 ...   @cv
+vsetnez_v        0111 00101001 11001 00111 ..... 00 ...   @cv
+vsetanyeqz_b     0111 00101001 11001 01000 ..... 00 ...   @cv
+vsetanyeqz_h     0111 00101001 11001 01001 ..... 00 ...   @cv
+vsetanyeqz_w     0111 00101001 11001 01010 ..... 00 ...   @cv
+vsetanyeqz_d     0111 00101001 11001 01011 ..... 00 ...   @cv
+vsetallnez_b     0111 00101001 11001 01100 ..... 00 ...   @cv
+vsetallnez_h     0111 00101001 11001 01101 ..... 00 ...   @cv
+vsetallnez_w     0111 00101001 11001 01110 ..... 00 ...   @cv
+vsetallnez_d     0111 00101001 11001 01111 ..... 00 ...   @cv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 1e5a1d989a..f4cdfae87a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -31,6 +31,10 @@
                        uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va) \
     { FUNC(env, vd, vj, vk, va, BIT, __VA_ARGS__); }
 
+#define DO_HELPER_CV(NAME, BIT, FUNC, ...)                               \
+    void helper_##NAME(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
+    { FUNC(env, cd, vj, BIT, __VA_ARGS__); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -4275,3 +4279,115 @@ void helper_v## name ##_d(CPULoongArchState *env, uint32_t vd,      \
 
 LSX_FCMP_D(fcmp_c)
 LSX_FCMP_D(fcmp_s)
+
+void helper_vbitsel_v(CPULoongArchState *env,
+                      uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+    vec_t *Va = &(env->fpr[va].vec);
+
+    Vd->D[0] = (Vk->D[0] & Va->D[0]) | (Vj->D[0] & ~(Va->D[0]));
+    Vd->D[1] = (Vk->D[1] & Va->D[1]) | (Vj->D[1] & ~(Va->D[1]));
+}
+
+static void do_vbitseli_b(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    Vd->B[n] = (~Vd->B[n] & Vj->B[n] ) | (Vd->B[n] & imm);
+}
+
+DO_HELPER_VV_I(vbitseli_b, 8, helper_vv_i, do_vbitseli_b)
+
+void helper_vseteqz_v(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+    vec_t *Vj = &(env->fpr[vj].vec);
+    env->cf[cd & 0x7] = (Vj->Q[0] == 0);
+}
+
+void helper_vsetnez_v(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+    vec_t *Vj = &(env->fpr[vj].vec);
+    env->cf[cd & 0x7] = (Vj->Q[0] != 0);
+}
+
+static void helper_setanyeqz(CPULoongArchState *env,
+                             uint32_t cd, uint32_t vj, int bit,
+                             bool (*func)(vec_t*, int, int))
+{
+    int i;
+    bool ret = false;
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        ret |= func(Vj, bit, i);
+    }
+    env->cf[cd & 0x7] = ret;
+}
+
+static void helper_setallnez(CPULoongArchState *env,
+                             uint32_t cd, uint32_t vj, int bit,
+                             bool (*func)(vec_t*, int, int))
+{
+    int i;
+    bool ret = true;
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        ret &= func(Vj, bit, i);
+    }
+    env->cf[cd & 0x7] = ret;
+}
+
+static bool do_setanyeqz(vec_t *Vj, int bit, int n)
+{
+    bool ret = false;
+    switch (bit) {
+    case 8:
+        ret = (Vj->B[n] == 0);
+        break;
+    case 16:
+        ret = (Vj->H[n] == 0);
+        break;
+    case 32:
+        ret = (Vj->W[n] == 0);
+        break;
+    case 64:
+        ret = (Vj->D[n] == 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return ret;
+}
+
+static bool do_setallnez(vec_t *Vj, int bit, int n)
+{
+    bool ret = false;
+    switch (bit) {
+    case 8:
+        ret = (Vj->B[n] != 0);
+        break;
+    case 16:
+        ret = (Vj->H[n] != 0);
+        break;
+    case 32:
+        ret = (Vj->W[n] != 0);
+        break;
+    case 64:
+        ret = (Vj->D[n] != 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return ret;
+}
+
+DO_HELPER_CV(vsetanyeqz_b, 8, helper_setanyeqz, do_setanyeqz)
+DO_HELPER_CV(vsetanyeqz_h, 16, helper_setanyeqz, do_setanyeqz)
+DO_HELPER_CV(vsetanyeqz_w, 32, helper_setanyeqz, do_setanyeqz)
+DO_HELPER_CV(vsetanyeqz_d, 64, helper_setanyeqz, do_setanyeqz)
+DO_HELPER_CV(vsetallnez_b, 8, helper_setallnez, do_setallnez)
+DO_HELPER_CV(vsetallnez_h, 16, helper_setallnez, do_setallnez)
+DO_HELPER_CV(vsetallnez_w, 32, helper_setallnez, do_setallnez)
+DO_HELPER_CV(vsetallnez_d, 64, helper_setallnez, do_setallnez)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (37 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 20:34   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick Song Gao
                   ` (4 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VINSGR2VR.{B/H/W/D};
- VPICKVE2GR.{B/H/W/D}[U];
- VREPLGR2VR.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  33 +++++
 target/loongarch/helper.h                   |  18 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  53 +++++++
 target/loongarch/insns.decode               |  30 ++++
 target/loongarch/lsx_helper.c               | 154 ++++++++++++++++++++
 5 files changed, 288 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 88e6ed1eef..2f7c726158 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -789,6 +789,21 @@ static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
 }
 
+static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
+}
+
+static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj,  a->imm);
+}
+
+static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1532,3 +1547,21 @@ INSN_LSX(vsetallnez_b,     cv)
 INSN_LSX(vsetallnez_h,     cv)
 INSN_LSX(vsetallnez_w,     cv)
 INSN_LSX(vsetallnez_d,     cv)
+
+INSN_LSX(vinsgr2vr_b,      vr_i)
+INSN_LSX(vinsgr2vr_h,      vr_i)
+INSN_LSX(vinsgr2vr_w,      vr_i)
+INSN_LSX(vinsgr2vr_d,      vr_i)
+INSN_LSX(vpickve2gr_b,     rv_i)
+INSN_LSX(vpickve2gr_h,     rv_i)
+INSN_LSX(vpickve2gr_w,     rv_i)
+INSN_LSX(vpickve2gr_d,     rv_i)
+INSN_LSX(vpickve2gr_bu,    rv_i)
+INSN_LSX(vpickve2gr_hu,    rv_i)
+INSN_LSX(vpickve2gr_wu,    rv_i)
+INSN_LSX(vpickve2gr_du,    rv_i)
+
+INSN_LSX(vreplgr2vr_b,     vr)
+INSN_LSX(vreplgr2vr_h,     vr)
+INSN_LSX(vreplgr2vr_w,     vr)
+INSN_LSX(vreplgr2vr_d,     vr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 1bef2a901f..00570221c7 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -791,3 +791,21 @@ DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
+
+DEF_HELPER_4(vinsgr2vr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vinsgr2vr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vinsgr2vr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vinsgr2vr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickve2gr_du, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vreplgr2vr_b, void, env, i32, i32)
+DEF_HELPER_3(vreplgr2vr_h, void, env, i32, i32)
+DEF_HELPER_3(vreplgr2vr_w, void, env, i32, i32)
+DEF_HELPER_3(vreplgr2vr_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7bf7f33724..c753e61b4c 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -75,6 +75,41 @@ static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
     return true;
 }
 
+static bool gen_vr_i(DisasContext *ctx, arg_vr_i *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 rj = tcg_constant_i32(a->rj);
+    TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+    CHECK_SXE;
+    func(cpu_env, vd, rj, imm);
+    return true;
+}
+
+static bool gen_rv_i(DisasContext *ctx, arg_rv_i *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 rd = tcg_constant_i32(a->rd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+    CHECK_SXE;
+    func(cpu_env, rd, vj, imm);
+    return true;
+}
+
+static bool gen_vr(DisasContext *ctx, arg_vr *a,
+                   void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 rj = tcg_constant_i32(a->rj);
+
+    CHECK_SXE;
+    func(cpu_env, vd, rj);
+    return true;
+}
+
 TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
 TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
 TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
@@ -753,3 +788,21 @@ TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
 TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
 TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
 TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
+
+TRANS(vinsgr2vr_b, gen_vr_i, gen_helper_vinsgr2vr_b)
+TRANS(vinsgr2vr_h, gen_vr_i, gen_helper_vinsgr2vr_h)
+TRANS(vinsgr2vr_w, gen_vr_i, gen_helper_vinsgr2vr_w)
+TRANS(vinsgr2vr_d, gen_vr_i, gen_helper_vinsgr2vr_d)
+TRANS(vpickve2gr_b, gen_rv_i, gen_helper_vpickve2gr_b)
+TRANS(vpickve2gr_h, gen_rv_i, gen_helper_vpickve2gr_h)
+TRANS(vpickve2gr_w, gen_rv_i, gen_helper_vpickve2gr_w)
+TRANS(vpickve2gr_d, gen_rv_i, gen_helper_vpickve2gr_d)
+TRANS(vpickve2gr_bu, gen_rv_i, gen_helper_vpickve2gr_bu)
+TRANS(vpickve2gr_hu, gen_rv_i, gen_helper_vpickve2gr_hu)
+TRANS(vpickve2gr_wu, gen_rv_i, gen_helper_vpickve2gr_wu)
+TRANS(vpickve2gr_du, gen_rv_i, gen_helper_vpickve2gr_du)
+
+TRANS(vreplgr2vr_b, gen_vr, gen_helper_vreplgr2vr_b)
+TRANS(vreplgr2vr_h, gen_vr, gen_helper_vreplgr2vr_h)
+TRANS(vreplgr2vr_w, gen_vr, gen_helper_vreplgr2vr_w)
+TRANS(vreplgr2vr_d, gen_vr, gen_helper_vreplgr2vr_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index fb1cc29aff..45eff88830 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -495,6 +495,9 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
 &vvv_fcond    vd vj vk fcond
+&vr_i         vd rj imm
+&rv_i         rd vj imm
+&vr           vd rj
 
 #
 # LSX Formats
@@ -511,6 +514,15 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 @vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
 @vvv_fcond      .... ........ fcond:5  vk:5 vj:5 vd:5    &vvv_fcond
+@vr_ui4         .... ........ ..... . imm:4 rj:5 vd:5    &vr_i
+@vr_ui3        .... ........ ..... .. imm:3 rj:5 vd:5    &vr_i
+@vr_ui2       .... ........ ..... ... imm:2 rj:5 vd:5    &vr_i
+@vr_ui1      .... ........ ..... .... imm:1 rj:5 vd:5    &vr_i
+@rv_ui4         .... ........ ..... . imm:4 vj:5 rd:5    &rv_i
+@rv_ui3        .... ........ ..... .. imm:3 vj:5 rd:5    &rv_i
+@rv_ui2       .... ........ ..... ... imm:2 vj:5 rd:5    &rv_i
+@rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
+@vr               .... ........ ..... ..... rj:5 vd:5    &vr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1166,3 +1178,21 @@ vsetallnez_b     0111 00101001 11001 01100 ..... 00 ...   @cv
 vsetallnez_h     0111 00101001 11001 01101 ..... 00 ...   @cv
 vsetallnez_w     0111 00101001 11001 01110 ..... 00 ...   @cv
 vsetallnez_d     0111 00101001 11001 01111 ..... 00 ...   @cv
+
+vinsgr2vr_b      0111 00101110 10111 0 .... ..... .....   @vr_ui4
+vinsgr2vr_h      0111 00101110 10111 10 ... ..... .....   @vr_ui3
+vinsgr2vr_w      0111 00101110 10111 110 .. ..... .....   @vr_ui2
+vinsgr2vr_d      0111 00101110 10111 1110 . ..... .....   @vr_ui1
+vpickve2gr_b     0111 00101110 11111 0 .... ..... .....   @rv_ui4
+vpickve2gr_h     0111 00101110 11111 10 ... ..... .....   @rv_ui3
+vpickve2gr_w     0111 00101110 11111 110 .. ..... .....   @rv_ui2
+vpickve2gr_d     0111 00101110 11111 1110 . ..... .....   @rv_ui1
+vpickve2gr_bu    0111 00101111 00111 0 .... ..... .....   @rv_ui4
+vpickve2gr_hu    0111 00101111 00111 10 ... ..... .....   @rv_ui3
+vpickve2gr_wu    0111 00101111 00111 110 .. ..... .....   @rv_ui2
+vpickve2gr_du    0111 00101111 00111 1110 . ..... .....   @rv_ui1
+
+vreplgr2vr_b     0111 00101001 11110 00000 ..... .....    @vr
+vreplgr2vr_h     0111 00101001 11110 00001 ..... .....    @vr
+vreplgr2vr_w     0111 00101001 11110 00010 ..... .....    @vr
+vreplgr2vr_d     0111 00101001 11110 00011 ..... .....    @vr
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index f4cdfae87a..15dbf4fc32 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -35,6 +35,21 @@
     void helper_##NAME(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
     { FUNC(env, cd, vj, BIT, __VA_ARGS__); }
 
+#define DO_HELPER_VR_I(NAME, BIT, FUNC, ...)                   \
+    void helper_##NAME(CPULoongArchState *env,                 \
+                       uint32_t vd, uint32_t rj, uint32_t imm) \
+    { FUNC(env, vd, rj, imm, BIT, __VA_ARGS__ ); }
+
+#define DO_HELPER_RV_I(NAME, BIT, FUNC, ...)                   \
+    void helper_##NAME(CPULoongArchState *env,                 \
+                       uint32_t rd, uint32_t vj, uint32_t imm) \
+    { FUNC(env, rd, vj, imm, BIT, __VA_ARGS__ ); }
+
+#define DO_HELPER_VR(NAME, BIT, FUNC, ...)       \
+    void helper_##NAME(CPULoongArchState *env,   \
+                       uint32_t vd, uint32_t rj) \
+    { FUNC(env, vd, rj, BIT, __VA_ARGS__ ); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -4391,3 +4406,142 @@ DO_HELPER_CV(vsetallnez_b, 8, helper_setallnez, do_setallnez)
 DO_HELPER_CV(vsetallnez_h, 16, helper_setallnez, do_setallnez)
 DO_HELPER_CV(vsetallnez_w, 32, helper_setallnez, do_setallnez)
 DO_HELPER_CV(vsetallnez_d, 64, helper_setallnez, do_setallnez)
+
+static void helper_vr_i(CPULoongArchState *env,
+                        uint32_t vd, uint32_t rj, uint32_t imm, int bit,
+                        void (*func)(vec_t*, uint64_t, uint32_t, int))
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    uint64_t Rj = env->gpr[rj];
+
+    imm %= (LSX_LEN/bit);
+
+    func(Vd, Rj, imm, bit);
+}
+
+static void do_insgr2vr(vec_t *Vd, uint64_t value, uint32_t imm, int bit)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[imm] = (int8_t)value;
+        break;
+    case 16:
+        Vd->H[imm] = (int16_t)value;
+        break;
+    case 32:
+        Vd->W[imm] = (int32_t)value;
+        break;
+    case 64:
+        Vd->D[imm] = (int64_t)value;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VR_I(vinsgr2vr_b, 8, helper_vr_i, do_insgr2vr)
+DO_HELPER_VR_I(vinsgr2vr_h, 16, helper_vr_i, do_insgr2vr)
+DO_HELPER_VR_I(vinsgr2vr_w, 32, helper_vr_i, do_insgr2vr)
+DO_HELPER_VR_I(vinsgr2vr_d, 64, helper_vr_i, do_insgr2vr)
+
+static void helper_rv_i(CPULoongArchState *env,
+                        uint32_t rd, uint32_t vj, uint32_t imm, int bit,
+                        void (*func)(CPULoongArchState*, uint32_t, vec_t*,
+                                     uint32_t, int))
+{
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    imm %=(LSX_LEN/bit);
+
+    func(env, rd, Vj, imm, bit);
+}
+
+static void do_pickve2gr_s(CPULoongArchState *env,
+                           uint32_t rd, vec_t *Vj, uint32_t imm, int bit)
+{
+    switch (bit) {
+    case 8:
+        env->gpr[rd] = Vj->B[imm];
+        break;
+    case 16:
+        env->gpr[rd] = Vj->H[imm];
+        break;
+    case 32:
+        env->gpr[rd] = Vj->W[imm];
+        break;
+    case 64:
+        env->gpr[rd] = Vj->D[imm];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_pickve2gr_u(CPULoongArchState *env,
+                           uint32_t rd, vec_t *Vj, uint32_t imm, int bit)
+{
+    switch (bit) {
+    case 8:
+        env->gpr[rd] = (uint8_t)Vj->B[imm];
+        break;
+    case 16:
+        env->gpr[rd] = (uint16_t)Vj->H[imm];
+        break;
+    case 32:
+        env->gpr[rd] = (uint32_t)Vj->W[imm];
+        break;
+    case 64:
+        env->gpr[rd] = (uint64_t)Vj->D[imm];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_RV_I(vpickve2gr_b, 8, helper_rv_i, do_pickve2gr_s)
+DO_HELPER_RV_I(vpickve2gr_h, 16, helper_rv_i, do_pickve2gr_s)
+DO_HELPER_RV_I(vpickve2gr_w, 32, helper_rv_i, do_pickve2gr_s)
+DO_HELPER_RV_I(vpickve2gr_d, 64, helper_rv_i, do_pickve2gr_s)
+DO_HELPER_RV_I(vpickve2gr_bu, 8, helper_rv_i, do_pickve2gr_u)
+DO_HELPER_RV_I(vpickve2gr_hu, 16, helper_rv_i, do_pickve2gr_u)
+DO_HELPER_RV_I(vpickve2gr_wu, 32, helper_rv_i, do_pickve2gr_u)
+DO_HELPER_RV_I(vpickve2gr_du, 64, helper_rv_i, do_pickve2gr_u)
+
+static void helper_vr(CPULoongArchState *env,
+                      uint32_t vd, uint32_t rj, int bit,
+                      void (*func)(CPULoongArchState*,
+                                   vec_t*, uint32_t,  int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(env, Vd, rj, bit, i);
+    }
+}
+
+static void do_replgr2vr(CPULoongArchState *env,
+                         vec_t *Vd, uint32_t rj, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = (int8_t)env->gpr[rj];
+        break;
+    case 16:
+        Vd->H[n] = (int16_t)env->gpr[rj];
+        break;
+    case 32:
+        Vd->W[n] = (int32_t)env->gpr[rj];
+        break;
+    case 64:
+        Vd->D[n] = (int64_t)env->gpr[rj];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VR(vreplgr2vr_b, 8, helper_vr, do_replgr2vr)
+DO_HELPER_VR(vreplgr2vr_h, 16, helper_vr, do_replgr2vr)
+DO_HELPER_VR(vreplgr2vr_w, 32, helper_vr, do_replgr2vr)
+DO_HELPER_VR(vreplgr2vr_d, 64, helper_vr, do_replgr2vr)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (38 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 21:12   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 41/43] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
                   ` (3 subsequent siblings)
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VREPLVE[I].{B/H/W/D};
- VBSLL.V, VBSRL.V;
- VPACK{EV/OD}.{B/H/W/D};
- VPICK{EV/OD}.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  35 +++
 target/loongarch/helper.h                   |  30 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  42 ++++
 target/loongarch/insns.decode               |  34 +++
 target/loongarch/lsx_helper.c               | 226 ++++++++++++++++++++
 5 files changed, 367 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2f7c726158..fd87f7fbe1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -804,6 +804,11 @@ static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
 }
 
+static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1565,3 +1570,33 @@ INSN_LSX(vreplgr2vr_b,     vr)
 INSN_LSX(vreplgr2vr_h,     vr)
 INSN_LSX(vreplgr2vr_w,     vr)
 INSN_LSX(vreplgr2vr_d,     vr)
+
+INSN_LSX(vreplve_b,        vvr)
+INSN_LSX(vreplve_h,        vvr)
+INSN_LSX(vreplve_w,        vvr)
+INSN_LSX(vreplve_d,        vvr)
+INSN_LSX(vreplvei_b,       vv_i)
+INSN_LSX(vreplvei_h,       vv_i)
+INSN_LSX(vreplvei_w,       vv_i)
+INSN_LSX(vreplvei_d,       vv_i)
+
+INSN_LSX(vbsll_v,          vv_i)
+INSN_LSX(vbsrl_v,          vv_i)
+
+INSN_LSX(vpackev_b,        vvv)
+INSN_LSX(vpackev_h,        vvv)
+INSN_LSX(vpackev_w,        vvv)
+INSN_LSX(vpackev_d,        vvv)
+INSN_LSX(vpackod_b,        vvv)
+INSN_LSX(vpackod_h,        vvv)
+INSN_LSX(vpackod_w,        vvv)
+INSN_LSX(vpackod_d,        vvv)
+
+INSN_LSX(vpickev_b,        vvv)
+INSN_LSX(vpickev_h,        vvv)
+INSN_LSX(vpickev_w,        vvv)
+INSN_LSX(vpickev_d,        vvv)
+INSN_LSX(vpickod_b,        vvv)
+INSN_LSX(vpickod_h,        vvv)
+INSN_LSX(vpickod_w,        vvv)
+INSN_LSX(vpickod_d,        vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 00570221c7..dfe3eb925f 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -809,3 +809,33 @@ DEF_HELPER_3(vreplgr2vr_b, void, env, i32, i32)
 DEF_HELPER_3(vreplgr2vr_h, void, env, i32, i32)
 DEF_HELPER_3(vreplgr2vr_w, void, env, i32, i32)
 DEF_HELPER_3(vreplgr2vr_d, void, env, i32, i32)
+
+DEF_HELPER_4(vreplve_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplve_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplve_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplve_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplvei_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplvei_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplvei_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vreplvei_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vbsll_v, void, env, i32, i32, i32)
+DEF_HELPER_4(vbsrl_v, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpackev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpickev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c753e61b4c..0c74811bc4 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -110,6 +110,18 @@ static bool gen_vr(DisasContext *ctx, arg_vr *a,
     return true;
 }
 
+static bool gen_vvr(DisasContext *ctx, arg_vvr *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 rk = tcg_constant_i32(a->rk);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, rk);
+    return true;
+}
+
 TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
 TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
 TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
@@ -806,3 +818,33 @@ TRANS(vreplgr2vr_b, gen_vr, gen_helper_vreplgr2vr_b)
 TRANS(vreplgr2vr_h, gen_vr, gen_helper_vreplgr2vr_h)
 TRANS(vreplgr2vr_w, gen_vr, gen_helper_vreplgr2vr_w)
 TRANS(vreplgr2vr_d, gen_vr, gen_helper_vreplgr2vr_d)
+
+TRANS(vreplve_b, gen_vvr, gen_helper_vreplve_b)
+TRANS(vreplve_h, gen_vvr, gen_helper_vreplve_h)
+TRANS(vreplve_w, gen_vvr, gen_helper_vreplve_w)
+TRANS(vreplve_d, gen_vvr, gen_helper_vreplve_d)
+TRANS(vreplvei_b, gen_vv_i, gen_helper_vreplvei_b)
+TRANS(vreplvei_h, gen_vv_i, gen_helper_vreplvei_h)
+TRANS(vreplvei_w, gen_vv_i, gen_helper_vreplvei_w)
+TRANS(vreplvei_d, gen_vv_i, gen_helper_vreplvei_d)
+
+TRANS(vbsll_v, gen_vv_i, gen_helper_vbsll_v)
+TRANS(vbsrl_v, gen_vv_i, gen_helper_vbsrl_v)
+
+TRANS(vpackev_b, gen_vvv, gen_helper_vpackev_b)
+TRANS(vpackev_h, gen_vvv, gen_helper_vpackev_h)
+TRANS(vpackev_w, gen_vvv, gen_helper_vpackev_w)
+TRANS(vpackev_d, gen_vvv, gen_helper_vpackev_d)
+TRANS(vpackod_b, gen_vvv, gen_helper_vpackod_b)
+TRANS(vpackod_h, gen_vvv, gen_helper_vpackod_h)
+TRANS(vpackod_w, gen_vvv, gen_helper_vpackod_w)
+TRANS(vpackod_d, gen_vvv, gen_helper_vpackod_d)
+
+TRANS(vpickev_b, gen_vvv, gen_helper_vpickev_b)
+TRANS(vpickev_h, gen_vvv, gen_helper_vpickev_h)
+TRANS(vpickev_w, gen_vvv, gen_helper_vpickev_w)
+TRANS(vpickev_d, gen_vvv, gen_helper_vpickev_d)
+TRANS(vpickod_b, gen_vvv, gen_helper_vpickod_b)
+TRANS(vpickod_h, gen_vvv, gen_helper_vpickod_h)
+TRANS(vpickod_w, gen_vvv, gen_helper_vpickod_w)
+TRANS(vpickod_d, gen_vvv, gen_helper_vpickod_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 45eff88830..7db72bd358 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -498,6 +498,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vr_i         vd rj imm
 &rv_i         rd vj imm
 &vr           vd rj
+&vvr          vd vj rk
 
 #
 # LSX Formats
@@ -505,6 +506,8 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @cv            .... ........ ..... ..... vj:5 .. cd:3    &cv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui1      .... ........ ..... .... imm:1 vj:5 vd:5    &vv_i
+@vv_ui2       .... ........ ..... ... imm:2 vj:5 vd:5    &vv_i
 @vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
@@ -523,6 +526,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @rv_ui2       .... ........ ..... ... imm:2 vj:5 rd:5    &rv_i
 @rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
 @vr               .... ........ ..... ..... rj:5 vd:5    &vr
+@vvr               .... ........ ..... rk:5 vj:5 vd:5    &vvr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1196,3 +1200,33 @@ vreplgr2vr_b     0111 00101001 11110 00000 ..... .....    @vr
 vreplgr2vr_h     0111 00101001 11110 00001 ..... .....    @vr
 vreplgr2vr_w     0111 00101001 11110 00010 ..... .....    @vr
 vreplgr2vr_d     0111 00101001 11110 00011 ..... .....    @vr
+
+vreplve_b        0111 00010010 00100 ..... ..... .....    @vvr
+vreplve_h        0111 00010010 00101 ..... ..... .....    @vvr
+vreplve_w        0111 00010010 00110 ..... ..... .....    @vvr
+vreplve_d        0111 00010010 00111 ..... ..... .....    @vvr
+vreplvei_b       0111 00101111 01111 0 .... ..... .....   @vv_ui4
+vreplvei_h       0111 00101111 01111 10 ... ..... .....   @vv_ui3
+vreplvei_w       0111 00101111 01111 110 .. ..... .....   @vv_ui2
+vreplvei_d       0111 00101111 01111 1110 . ..... .....   @vv_ui1
+
+vbsll_v          0111 00101000 11100 ..... ..... .....    @vv_ui5
+vbsrl_v          0111 00101000 11101 ..... ..... .....    @vv_ui5
+
+vpackev_b        0111 00010001 01100 ..... ..... .....    @vvv
+vpackev_h        0111 00010001 01101 ..... ..... .....    @vvv
+vpackev_w        0111 00010001 01110 ..... ..... .....    @vvv
+vpackev_d        0111 00010001 01111 ..... ..... .....    @vvv
+vpackod_b        0111 00010001 10000 ..... ..... .....    @vvv
+vpackod_h        0111 00010001 10001 ..... ..... .....    @vvv
+vpackod_w        0111 00010001 10010 ..... ..... .....    @vvv
+vpackod_d        0111 00010001 10011 ..... ..... .....    @vvv
+
+vpickev_b        0111 00010001 11100 ..... ..... .....    @vvv
+vpickev_h        0111 00010001 11101 ..... ..... .....    @vvv
+vpickev_w        0111 00010001 11110 ..... ..... .....    @vvv
+vpickev_d        0111 00010001 11111 ..... ..... .....    @vvv
+vpickod_b        0111 00010010 00000 ..... ..... .....    @vvv
+vpickod_h        0111 00010010 00001 ..... ..... .....    @vvv
+vpickod_w        0111 00010010 00010 ..... ..... .....    @vvv
+vpickod_d        0111 00010010 00011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 15dbf4fc32..b0017a7ab8 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -50,6 +50,11 @@
                        uint32_t vd, uint32_t rj) \
     { FUNC(env, vd, rj, BIT, __VA_ARGS__ ); }
 
+#define DO_HELPER_VV_R(NAME, BIT, FUNC, ...)                  \
+    void helper_##NAME(CPULoongArchState *env,                \
+                       uint32_t vd, uint32_t vj, uint32_t rk) \
+    { FUNC(env, vd, vj, rk, BIT, __VA_ARGS__); }
+
 static void helper_vvv(CPULoongArchState *env,
                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
@@ -4545,3 +4550,224 @@ DO_HELPER_VR(vreplgr2vr_b, 8, helper_vr, do_replgr2vr)
 DO_HELPER_VR(vreplgr2vr_h, 16, helper_vr, do_replgr2vr)
 DO_HELPER_VR(vreplgr2vr_w, 32, helper_vr, do_replgr2vr)
 DO_HELPER_VR(vreplgr2vr_d, 64, helper_vr, do_replgr2vr)
+
+static void helper_vreplve(CPULoongArchState *env,
+                           uint32_t vd, uint32_t vj, uint32_t rk, int bit,
+                           void (*func)(vec_t*, vec_t*, uint64_t, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, env->gpr[rk], bit, i);
+    }
+}
+
+static void helper_vreplvei(CPULoongArchState *env,
+                            uint32_t vd, uint32_t vj, uint32_t imm, int bit,
+                            void (*func)(vec_t*, vec_t*, uint64_t, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    for (i = 0; i < LSX_LEN/bit; i++) {
+        func(Vd, Vj, imm, bit, i);
+    }
+}
+
+static void do_vreplve(vec_t *Vd, vec_t *Vj, uint64_t value, int bit, int n)
+{
+    uint32_t index = value % (LSX_LEN/bit);
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[index];
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[index];
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[index];
+        break;
+    case 64:
+        Vd->D[n] = Vj->D[index];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_R(vreplve_b, 8, helper_vreplve, do_vreplve)
+DO_HELPER_VV_R(vreplve_h, 16, helper_vreplve, do_vreplve)
+DO_HELPER_VV_R(vreplve_w, 32, helper_vreplve, do_vreplve)
+DO_HELPER_VV_R(vreplve_d, 64, helper_vreplve, do_vreplve)
+DO_HELPER_VV_I(vreplvei_b, 8, helper_vreplvei, do_vreplve)
+DO_HELPER_VV_I(vreplvei_h, 16, helper_vreplvei, do_vreplve)
+DO_HELPER_VV_I(vreplvei_w, 32, helper_vreplvei, do_vreplve)
+DO_HELPER_VV_I(vreplvei_d, 64, helper_vreplvei, do_vreplve)
+
+void helper_vbsll_v(CPULoongArchState *env,
+                    uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    uint32_t idx, i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t tmp;
+
+    tmp.D[0] = Vd->D[0];
+    tmp.D[1] = Vd->D[1];
+    idx = (imm & 0xf);
+    for(i = 0; i < 16; i++) {
+        tmp.B[i]  = (i < idx) ? 0 : Vj->B[i - idx];
+    }
+    Vd->D[0] = tmp.D[0];
+    Vd->D[1] = tmp.D[1];
+}
+
+void helper_vbsrl_v(CPULoongArchState *env,
+                    uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    uint32_t idx, i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    idx = (imm & 0xf);
+    for(i = 0; i < 16; i++) {
+        Vd->B[i]  = (i + idx > 15) ? 0 : Vj->B[i + idx];
+    }
+}
+
+static void helper_vvv_c(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t vk, int bit,
+                        void (*func)(vec_t*, vec_t*, vec_t*, int, int))
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+
+    vec_t temp;
+    temp.D[0] = 0;
+    temp.D[1] = 0;
+
+    for (i = 0; i < LSX_LEN/bit/2; i++) {
+        func(&temp, Vj, Vk, bit, i);
+    }
+    Vd->D[0] = temp.D[0];
+    Vd->D[1] = temp.D[1];
+}
+
+static void do_vpackev(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[2 * n + 1] = Vj->B[2 * n];
+        Vd->B[2 * n] = Vk->B[2 * n];
+        break;
+    case 16:
+        Vd->H[2 * n + 1] = Vj->H[2 * n];
+        Vd->H[2 * n] = Vk->H[2 * n];
+        break;
+    case 32:
+        Vd->W[2 * n + 1] = Vj->W[2 * n];
+        Vd->W[2 * n] = Vk->W[2 * n];
+        break;
+    case 64:
+        Vd->D[2 * n + 1] = Vj->D[2 * n];
+        Vd->D[2 * n] = Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vpackod(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[2 * n + 1] = Vj->B[2 * n + 1];
+        Vd->B[2 * n] = Vk->B[2 * n + 1];
+        break;
+    case 16:
+        Vd->H[2 * n + 1] = Vj->H[2 * n + 1];
+        Vd->H[2 * n] = Vk->H[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[2 * n + 1] = Vj->W[2 * n + 1];
+        Vd->W[2 * n] = Vk->W[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[2 * n + 1] = Vj->D[2 * n + 1];
+        Vd->D[2 * n] = Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vpackev_b, 8, helper_vvv_c, do_vpackev)
+DO_HELPER_VVV(vpackev_h, 16, helper_vvv_c, do_vpackev)
+DO_HELPER_VVV(vpackev_w, 32, helper_vvv_c, do_vpackev)
+DO_HELPER_VVV(vpackev_d, 64, helper_vvv_c, do_vpackev)
+DO_HELPER_VVV(vpackod_b, 8, helper_vvv_c, do_vpackod)
+DO_HELPER_VVV(vpackod_h, 16, helper_vvv_c, do_vpackod)
+DO_HELPER_VVV(vpackod_w, 32, helper_vvv_c, do_vpackod)
+DO_HELPER_VVV(vpackod_d, 64, helper_vvv_c, do_vpackod)
+
+static void do_vpickev(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n + 8] = Vj->B[2 * n];
+        Vd->B[n] = Vk->B[2 * n];
+        break;
+    case 16:
+        Vd->H[n + 4] = Vj->H[2 * n];
+        Vd->H[n] = Vk->H[2 * n];
+        break;
+    case 32:
+        Vd->W[n + 2] = Vj->W[2 * n];
+        Vd->W[n] = Vk->W[2 * n];
+        break;
+    case 64:
+        Vd->D[n + 1] = Vj->D[2 *n];
+        Vd->D[n] = Vk->D[2 * n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vpickod(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n + 8] = Vj->B[2 * n + 1];
+        Vd->B[n] = Vk->B[2 * n + 1];
+        break;
+    case 16:
+        Vd->H[n + 4] = Vj->H[2 * n + 1];
+        Vd->H[n] = Vk->H[2 * n + 1];
+        break;
+    case 32:
+        Vd->W[n + 2] = Vj->W[2 * n + 1];
+        Vd->W[n] = Vk->W[2 * n + 1];
+        break;
+    case 64:
+        Vd->D[n + 1] = Vj->D[2 *n + 1];
+        Vd->D[n] = Vk->D[2 * n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vpickev_b, 8, helper_vvv_c, do_vpickev)
+DO_HELPER_VVV(vpickev_h, 16, helper_vvv_c, do_vpickev)
+DO_HELPER_VVV(vpickev_w, 32, helper_vvv_c, do_vpickev)
+DO_HELPER_VVV(vpickev_d, 64, helper_vvv_c, do_vpickev)
+DO_HELPER_VVV(vpickod_b, 8, helper_vvv_c, do_vpickod)
+DO_HELPER_VVV(vpickod_h, 16, helper_vvv_c, do_vpickod)
+DO_HELPER_VVV(vpickod_w, 32, helper_vvv_c, do_vpickod)
+DO_HELPER_VVV(vpickod_d, 64, helper_vvv_c, do_vpickod)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 41/43] target/loongarch: Implement vilvl vilvh vextrins vshuf
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (39 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24  8:16 ` [RFC PATCH 42/43] target/loongarch: Implement vld vst Song Gao
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VILV{L/H}.{B/H/W/D};
- VSHUF.{B/H/W/D};
- VSHUF4I.{B/H/W/D};
- VPERMI.W;
- VEXTRINS.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  25 +++
 target/loongarch/helper.h                   |  25 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  25 +++
 target/loongarch/insns.decode               |  25 +++
 target/loongarch/lsx_helper.c               | 202 ++++++++++++++++++++
 5 files changed, 302 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index fd87f7fbe1..ee92029007 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1600,3 +1600,28 @@ INSN_LSX(vpickod_b,        vvv)
 INSN_LSX(vpickod_h,        vvv)
 INSN_LSX(vpickod_w,        vvv)
 INSN_LSX(vpickod_d,        vvv)
+
+INSN_LSX(vilvl_b,          vvv)
+INSN_LSX(vilvl_h,          vvv)
+INSN_LSX(vilvl_w,          vvv)
+INSN_LSX(vilvl_d,          vvv)
+INSN_LSX(vilvh_b,          vvv)
+INSN_LSX(vilvh_h,          vvv)
+INSN_LSX(vilvh_w,          vvv)
+INSN_LSX(vilvh_d,          vvv)
+
+INSN_LSX(vshuf_b,          vvvv)
+INSN_LSX(vshuf_h,          vvv)
+INSN_LSX(vshuf_w,          vvv)
+INSN_LSX(vshuf_d,          vvv)
+INSN_LSX(vshuf4i_b,        vv_i)
+INSN_LSX(vshuf4i_h,        vv_i)
+INSN_LSX(vshuf4i_w,        vv_i)
+INSN_LSX(vshuf4i_d,        vv_i)
+
+INSN_LSX(vpermi_w,         vv_i)
+
+INSN_LSX(vextrins_d,       vv_i)
+INSN_LSX(vextrins_w,       vv_i)
+INSN_LSX(vextrins_h,       vv_i)
+INSN_LSX(vextrins_b,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index dfe3eb925f..b0fb82c60e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -839,3 +839,28 @@ DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vilvl_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vshuf_b, void, env, i32, i32, i32, i32)
+DEF_HELPER_4(vshuf_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpermi_w, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0c74811bc4..b289354dc3 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -848,3 +848,28 @@ TRANS(vpickod_b, gen_vvv, gen_helper_vpickod_b)
 TRANS(vpickod_h, gen_vvv, gen_helper_vpickod_h)
 TRANS(vpickod_w, gen_vvv, gen_helper_vpickod_w)
 TRANS(vpickod_d, gen_vvv, gen_helper_vpickod_d)
+
+TRANS(vilvl_b, gen_vvv, gen_helper_vilvl_b)
+TRANS(vilvl_h, gen_vvv, gen_helper_vilvl_h)
+TRANS(vilvl_w, gen_vvv, gen_helper_vilvl_w)
+TRANS(vilvl_d, gen_vvv, gen_helper_vilvl_d)
+TRANS(vilvh_b, gen_vvv, gen_helper_vilvh_b)
+TRANS(vilvh_h, gen_vvv, gen_helper_vilvh_h)
+TRANS(vilvh_w, gen_vvv, gen_helper_vilvh_w)
+TRANS(vilvh_d, gen_vvv, gen_helper_vilvh_d)
+
+TRANS(vshuf_b, gen_vvvv, gen_helper_vshuf_b)
+TRANS(vshuf_h, gen_vvv, gen_helper_vshuf_h)
+TRANS(vshuf_w, gen_vvv, gen_helper_vshuf_w)
+TRANS(vshuf_d, gen_vvv, gen_helper_vshuf_d)
+TRANS(vshuf4i_b, gen_vv_i, gen_helper_vshuf4i_b)
+TRANS(vshuf4i_h, gen_vv_i, gen_helper_vshuf4i_h)
+TRANS(vshuf4i_w, gen_vv_i, gen_helper_vshuf4i_w)
+TRANS(vshuf4i_d, gen_vv_i, gen_helper_vshuf4i_d)
+
+TRANS(vpermi_w, gen_vv_i, gen_helper_vpermi_w)
+
+TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
+TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
+TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
+TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7db72bd358..67bce30d00 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1230,3 +1230,28 @@ vpickod_b        0111 00010010 00000 ..... ..... .....    @vvv
 vpickod_h        0111 00010010 00001 ..... ..... .....    @vvv
 vpickod_w        0111 00010010 00010 ..... ..... .....    @vvv
 vpickod_d        0111 00010010 00011 ..... ..... .....    @vvv
+
+vilvl_b          0111 00010001 10100 ..... ..... .....    @vvv
+vilvl_h          0111 00010001 10101 ..... ..... .....    @vvv
+vilvl_w          0111 00010001 10110 ..... ..... .....    @vvv
+vilvl_d          0111 00010001 10111 ..... ..... .....    @vvv
+vilvh_b          0111 00010001 11000 ..... ..... .....    @vvv
+vilvh_h          0111 00010001 11001 ..... ..... .....    @vvv
+vilvh_w          0111 00010001 11010 ..... ..... .....    @vvv
+vilvh_d          0111 00010001 11011 ..... ..... .....    @vvv
+
+vshuf_b          0000 11010101 ..... ..... ..... .....    @vvvv
+vshuf_h          0111 00010111 10101 ..... ..... .....    @vvv
+vshuf_w          0111 00010111 10110 ..... ..... .....    @vvv
+vshuf_d          0111 00010111 10111 ..... ..... .....    @vvv
+vshuf4i_b        0111 00111001 00 ........ ..... .....    @vv_ui8
+vshuf4i_h        0111 00111001 01 ........ ..... .....    @vv_ui8
+vshuf4i_w        0111 00111001 10 ........ ..... .....    @vv_ui8
+vshuf4i_d        0111 00111001 11 ........ ..... .....    @vv_ui8
+
+vpermi_w         0111 00111110 01 ........ ..... .....    @vv_ui8
+
+vextrins_d       0111 00111000 00 ........ ..... .....    @vv_ui8
+vextrins_w       0111 00111000 01 ........ ..... .....    @vv_ui8
+vextrins_h       0111 00111000 10 ........ ..... .....    @vv_ui8
+vextrins_b       0111 00111000 11 ........ ..... .....    @vv_ui8
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b0017a7ab8..3d478f96ce 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4771,3 +4771,205 @@ DO_HELPER_VVV(vpickod_b, 8, helper_vvv_c, do_vpickod)
 DO_HELPER_VVV(vpickod_h, 16, helper_vvv_c, do_vpickod)
 DO_HELPER_VVV(vpickod_w, 32, helper_vvv_c, do_vpickod)
 DO_HELPER_VVV(vpickod_d, 64, helper_vvv_c, do_vpickod)
+
+static void do_vilvl(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[2 * n + 1] = Vj->B[n];
+        Vd->B[2 * n] = Vk->B[n];
+        break;
+    case 16:
+        Vd->H[2 * n + 1] = Vj->H[n];
+        Vd->H[2 * n] = Vk->H[n];
+        break;
+    case 32:
+        Vd->W[2 * n + 1] = Vj->W[n];
+        Vd->W[2 * n] = Vk->W[n];
+        break;
+    case 64:
+        Vd->D[2 * n + 1] = Vj->D[n];
+        Vd->D[2 * n] = Vk->D[n];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void do_vilvh(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[2 * n + 1] = Vj->B[n + 8];
+        Vd->B[2 * n] = Vk->B[n + 8];
+        break;
+    case 16:
+        Vd->H[2 * n + 1] = Vj->H[n + 4];
+        Vd->H[2 * n] = Vk->H[n + 4];
+        break;
+    case 32:
+        Vd->W[2 * n + 1] = Vj->W[n + 2];
+        Vd->W[2 * n] = Vk->W[n + 2];
+        break;
+    case 64:
+        Vd->D[2 * n + 1] = Vj->D[n + 1];
+        Vd->D[2 * n] = Vk->D[n + 1];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vilvl_b, 8, helper_vvv_c, do_vilvl)
+DO_HELPER_VVV(vilvl_h, 16, helper_vvv_c, do_vilvl)
+DO_HELPER_VVV(vilvl_w, 32, helper_vvv_c, do_vilvl)
+DO_HELPER_VVV(vilvl_d, 64, helper_vvv_c, do_vilvl)
+DO_HELPER_VVV(vilvh_b, 8, helper_vvv_c, do_vilvh)
+DO_HELPER_VVV(vilvh_h, 16, helper_vvv_c, do_vilvh)
+DO_HELPER_VVV(vilvh_w, 32, helper_vvv_c, do_vilvh)
+DO_HELPER_VVV(vilvh_d, 64, helper_vvv_c, do_vilvh)
+
+void helper_vshuf_b(CPULoongArchState *env,
+                    uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+    vec_t *Vk = &(env->fpr[vk].vec);
+    vec_t *Va = &(env->fpr[va].vec);
+
+    uint32_t i;
+    uint32_t max = LSX_LEN/8;
+    vec_t temp;
+    temp.D[0] = Vd->D[0];
+    temp.D[1] = Vd->D[1];
+    for (i = 0; i < max ; i++) {
+        uint32_t k = (Va->B[i] & 0x3f) % (2 *max);
+        temp.B[i] = (Va->B[i] & 0xc0) ? 0 : k < max ? Vk->B[k] : Vj->B[k - max];
+    }
+    Vd->D[0] = temp.D[0];
+    Vd->D[1] = temp.D[1];
+}
+
+static void do_vshuf(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
+{
+    uint32_t i;
+    uint32_t max = LSX_LEN/bit;
+    switch (bit) {
+    case 16:
+        i = (Vd->H[n] & 0x1f) % (2 *max);
+        Vd->H[n] = (Vd->H[n] & 0xc000) ? 0 : i < max ? Vk->H[i] : Vj->H[i - max];
+        break;
+    case 32:
+        i = (Vd->W[n] & 0xf) % (2 *max);
+        Vd->W[n] = (Vd->W[n] & 0xc0000000) ? 0 : i < max ? Vk->W[i] : Vj->W[i - max];
+        break;
+    case 64:
+        i = (Vd->D[n] & 0x7) % (2* max);
+        Vd->D[n] = (Vd->D[n] & 0xc000000000000000) ? 0 :i < max ? Vk->D[i] : Vj->D[i - max];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VVV(vshuf_h, 16, helper_vvv, do_vshuf)
+DO_HELPER_VVV(vshuf_w, 32, helper_vvv, do_vshuf)
+DO_HELPER_VVV(vshuf_d, 64, helper_vvv, do_vshuf)
+
+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
+static void do_vshuf4i(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit, int n)
+{
+    switch (bit) {
+    case 8:
+        Vd->B[n] = Vj->B[SHF_POS(n, imm)];
+        break;
+    case 16:
+        Vd->H[n] = Vj->H[SHF_POS(n, imm)];
+        break;
+    case 32:
+        Vd->W[n] = Vj->W[SHF_POS(n, imm)];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+void helper_vshuf4i_d(CPULoongArchState *env, uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    vec_t temp;
+    temp.D[0] = 0;
+    temp.D[1] = 0;
+    for (i = 0; i < 2; i++) {
+        temp.D[i] = (((imm & 0x03) << (2 *i)) == 0x00) ? Vd->D[0] :
+                    (((imm & 0x03) << (2 *i)) == 0x01) ? Vd->D[1] :
+                    (((imm & 0x03) << (2 *i)) == 0x02) ? Vj->D[0] : Vj->D[1];
+    }
+    Vd->D[0] = temp.D[0];
+    Vd->D[1] = temp.D[1];
+}
+
+DO_HELPER_VV_I(vshuf4i_b, 8, helper_vv_i_c, do_vshuf4i)
+DO_HELPER_VV_I(vshuf4i_h, 16, helper_vv_i_c, do_vshuf4i)
+DO_HELPER_VV_I(vshuf4i_w, 32, helper_vv_i_c, do_vshuf4i)
+
+void helper_vpermi_w(CPULoongArchState *env, uint32_t vd,
+                         uint32_t vj, uint32_t imm)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    Vd->W[0] = Vj->W[imm & 0x3];
+    Vd->W[1] = Vj->W[(imm >> 2) & 0x3];
+    Vd->W[2] = Vj->W[(imm >> 4) & 0x3];
+    Vd->W[3] = Vj->W[(imm >> 6) & 0x3];
+}
+
+static void helper_vextrins(CPULoongArchState *env,
+                            uint32_t vd, uint32_t vj, uint32_t imm, int bit,
+                            void (*func)(vec_t*, vec_t*, uint32_t, int))
+
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    vec_t *Vj = &(env->fpr[vj].vec);
+
+    func(Vd, Vj, imm, bit);
+}
+
+static void do_vextrins(vec_t *Vd, vec_t *Vj, uint32_t imm, int bit)
+{
+    int ins, extr;
+    switch (bit) {
+    case 8:
+       ins = (imm >> 4) & 0xf;
+       extr = imm & 0xf;
+       Vd->B[ins] = Vj->B[extr];
+       break;
+    case 16:
+       ins = (imm >> 4) & 0x7;
+       extr = imm & 0x7;
+       Vd->H[ins] = Vj->H[extr];
+       break;
+    case 32:
+       ins = (imm >> 4) & 0x3;
+       extr = imm & 0x3;
+       Vd->W[ins] = Vj->W[extr];
+       break;
+    case 64:
+       ins = (imm >> 4) & 0x1;
+       extr = imm & 0x1;
+       Vd->D[ins] = Vj->D[extr];
+       break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+DO_HELPER_VV_I(vextrins_b, 8, helper_vextrins, do_vextrins)
+DO_HELPER_VV_I(vextrins_h, 16, helper_vextrins, do_vextrins)
+DO_HELPER_VV_I(vextrins_w, 32, helper_vextrins, do_vextrins)
+DO_HELPER_VV_I(vextrins_d, 64, helper_vextrins, do_vextrins)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 42/43] target/loongarch: Implement vld vst
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (40 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 41/43] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 21:15   ` Richard Henderson
  2022-12-24  8:16 ` [RFC PATCH 43/43] target/loongarch: Implement vldi Song Gao
  2022-12-24 15:39 ` [RFC PATCH 00/43] Add LoongArch LSX instructions Richard Henderson
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VLD[X], VST[X];
- VLDREPL.{B/H/W/D};
- VSTELM.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  34 +++
 target/loongarch/helper.h                   |  12 +
 target/loongarch/insn_trans/trans_lsx.c.inc |  75 ++++++
 target/loongarch/insns.decode               |  36 +++
 target/loongarch/lsx_helper.c               | 266 ++++++++++++++++++++
 target/loongarch/translate.c                |  10 +
 6 files changed, 433 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ee92029007..e8dc0644bb 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -21,11 +21,21 @@ static inline int plus_1(DisasContext *ctx, int x)
     return x + 1;
 }
 
+static inline int shl_1(DisasContext *ctx, int x)
+{
+    return x << 1;
+}
+
 static inline int shl_2(DisasContext *ctx, int x)
 {
     return x << 2;
 }
 
+static inline int shl_3(DisasContext *ctx, int x)
+{
+    return x << 3;
+}
+
 #define CSR_NAME(REG) \
     [LOONGARCH_CSR_##REG] = (#REG)
 
@@ -794,6 +804,11 @@ static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
 }
 
+static void output_vr_ii(DisasContext *ctx, arg_vr_ii *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, 0x%x, 0x%x", a->vd, a->rj, a->imm, a->imm2);
+}
+
 static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
 {
     output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj,  a->imm);
@@ -809,6 +824,11 @@ static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
 }
 
+static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1625,3 +1645,17 @@ INSN_LSX(vextrins_d,       vv_i)
 INSN_LSX(vextrins_w,       vv_i)
 INSN_LSX(vextrins_h,       vv_i)
 INSN_LSX(vextrins_b,       vv_i)
+
+INSN_LSX(vld,              vr_i)
+INSN_LSX(vst,              vr_i)
+INSN_LSX(vldx,             vrr)
+INSN_LSX(vstx,             vrr)
+
+INSN_LSX(vldrepl_d,        vr_i)
+INSN_LSX(vldrepl_w,        vr_i)
+INSN_LSX(vldrepl_h,        vr_i)
+INSN_LSX(vldrepl_b,        vr_i)
+INSN_LSX(vstelm_d,         vr_ii)
+INSN_LSX(vstelm_w,         vr_ii)
+INSN_LSX(vstelm_h,         vr_ii)
+INSN_LSX(vstelm_b,         vr_ii)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b0fb82c60e..a92bcfffe8 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -864,3 +864,15 @@ DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vld_b, void, env, i32, tl)
+DEF_HELPER_3(vst_b, void, env, i32, tl)
+
+DEF_HELPER_3(vldrepl_d, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_w, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_h, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_b, void, env, i32, tl)
+DEF_HELPER_4(vstelm_d, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_w, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_h, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_b, void, env, i32, tl, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index b289354dc3..308cba12f2 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -873,3 +873,78 @@ TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
 TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
 TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
 TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
+
+static bool gen_memory(DisasContext *ctx, arg_vr_i *a,
+                       void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv temp = NULL;
+
+    CHECK_SXE;
+
+    if (a->imm) {
+        temp = tcg_temp_new();
+        tcg_gen_addi_tl(temp, addr, a->imm);
+        addr = temp;
+    }
+
+    func(cpu_env, vd, addr);
+    if (temp) {
+        tcg_temp_free(temp);
+    }
+    return true;
+}
+
+static bool gen_memory_x(DisasContext *ctx, arg_vrr *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+
+    CHECK_SXE;
+
+    TCGv addr = tcg_temp_new();
+    tcg_gen_add_tl(addr, src1, src2);
+    func(cpu_env, vd, addr);
+    tcg_temp_free(addr);
+    return true;
+}
+
+TRANS(vld, gen_memory, gen_helper_vld_b)
+TRANS(vst, gen_memory, gen_helper_vst_b)
+TRANS(vldx, gen_memory_x, gen_helper_vld_b)
+TRANS(vstx, gen_memory_x, gen_helper_vst_b)
+
+static bool gen_memory_elm(DisasContext *ctx, arg_vr_ii *a,
+                           void (*func)(TCGv_ptr, TCGv_i32, TCGv, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv_i32 tidx = tcg_constant_i32(a->imm2);
+    TCGv temp = NULL;
+
+    CHECK_SXE;
+
+    if (a->imm) {
+        temp = tcg_temp_new();
+        tcg_gen_addi_tl(temp, addr, a->imm);
+        addr = temp;
+    }
+
+    func(cpu_env, vd, addr, tidx);
+    if (temp) {
+        tcg_temp_free(temp);
+    }
+    return true;
+}
+
+TRANS(vldrepl_b, gen_memory, gen_helper_vldrepl_b)
+TRANS(vldrepl_h, gen_memory, gen_helper_vldrepl_h)
+TRANS(vldrepl_w, gen_memory, gen_helper_vldrepl_w)
+TRANS(vldrepl_d, gen_memory, gen_helper_vldrepl_d)
+TRANS(vstelm_b, gen_memory_elm, gen_helper_vstelm_b)
+TRANS(vstelm_h, gen_memory_elm, gen_helper_vstelm_h)
+TRANS(vstelm_w, gen_memory_elm, gen_helper_vstelm_w)
+TRANS(vstelm_d, gen_memory_elm, gen_helper_vstelm_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 67bce30d00..f786a9a9ee 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -485,6 +485,17 @@ ertn             0000 01100100 10000 01110 00000 00000    @empty
 idle             0000 01100100 10001 ...............      @i15
 dbcl             0000 00000010 10101 ...............      @i15
 
+#
+# LSX Fields
+#
+
+%i9s3     10:s9       !function=shl_3
+%i10s2    10:s10      !function=shl_2
+%i11s1    10:s11      !function=shl_1
+%i8s3     10:s8       !function=shl_3
+%i8s2     10:s8       !function=shl_2
+%i8s1     10:s8       !function=shl_1
+
 #
 # LSX Argument sets
 #
@@ -499,6 +510,8 @@ dbcl             0000 00000010 10101 ...............      @i15
 &rv_i         rd vj imm
 &vr           vd rj
 &vvr          vd vj rk
+&vrr          vd rj rk
+&vr_ii        vd rj imm imm2
 
 #
 # LSX Formats
@@ -527,6 +540,15 @@ dbcl             0000 00000010 10101 ...............      @i15
 @rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
 @vr               .... ........ ..... ..... rj:5 vd:5    &vr
 @vvr               .... ........ ..... rk:5 vj:5 vd:5    &vvr
+@vr_i9            .... ........ . ......... rj:5 vd:5    &vr_i imm=%i9s3
+@vr_i10            .... ........ .......... rj:5 vd:5    &vr_i imm=%i10s2
+@vr_i11            .... ....... ........... rj:5 vd:5    &vr_i imm=%i11s1
+@vr_i12                 .... ...... imm:s12 rj:5 vd:5    &vr_i
+@vr_i8i1    .... ........ . imm2:1 ........ rj:5 vd:5    &vr_ii imm=%i8s3
+@vr_i8i2      .... ........ imm2:2 ........ rj:5 vd:5    &vr_ii imm=%i8s2
+@vr_i8i3       .... ....... imm2:3 ........ rj:5 vd:5    &vr_ii imm=%i8s1
+@vr_i8i4          .... ...... imm2:4 imm:s8 rj:5 vd:5    &vr_ii
+@vrr               .... ........ ..... rk:5 rj:5 vd:5    &vrr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1255,3 +1277,17 @@ vextrins_d       0111 00111000 00 ........ ..... .....    @vv_ui8
 vextrins_w       0111 00111000 01 ........ ..... .....    @vv_ui8
 vextrins_h       0111 00111000 10 ........ ..... .....    @vv_ui8
 vextrins_b       0111 00111000 11 ........ ..... .....    @vv_ui8
+
+vld              0010 110000 ............ ..... .....     @vr_i12
+vst              0010 110001 ............ ..... .....     @vr_i12
+vldx             0011 10000100 00000 ..... ..... .....    @vrr
+vstx             0011 10000100 01000 ..... ..... .....    @vrr
+
+vldrepl_d        0011 00000001 0 ......... ..... .....    @vr_i9
+vldrepl_w        0011 00000010 .......... ..... .....     @vr_i10
+vldrepl_h        0011 0000010 ........... ..... .....     @vr_i11
+vldrepl_b        0011 000010 ............ ..... .....     @vr_i12
+vstelm_d         0011 00010001 0 . ........ ..... .....   @vr_i8i1
+vstelm_w         0011 00010010 .. ........ ..... .....    @vr_i8i2
+vstelm_h         0011 0001010 ... ........ ..... .....    @vr_i8i3
+vstelm_b         0011 000110 .... ........ ..... .....    @vr_i8i4
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 3d478f96ce..9058230975 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4973,3 +4973,269 @@ DO_HELPER_VV_I(vextrins_b, 8, helper_vextrins, do_vextrins)
 DO_HELPER_VV_I(vextrins_h, 16, helper_vextrins, do_vextrins)
 DO_HELPER_VV_I(vextrins_w, 32, helper_vextrins, do_vextrins)
 DO_HELPER_VV_I(vextrins_d, 64, helper_vextrins, do_vextrins)
+
+void helper_vld_b(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, cpu_mmu_index(env, false));
+
+    for (i = 0; i < LSX_LEN/8; i++) {
+        Vd->B[i]  = helper_ret_ldub_mmu(env, addr + i, oi, GETPC());
+    }
+#else
+    for (i = 0; i < LSX_LEN/8; i++) {
+        Vd->B[i]  = cpu_ldub_data(env, addr + i);
+    }
+#endif
+}
+
+#define LSX_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + LSX_LEN/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_writable_pages(CPULoongArchState *env,
+                                         target_ulong addr,
+                                         int mmu_idx,
+                                         uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(LSX_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void helper_vst_b(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    int i;
+    vec_t *Vd = &(env->fpr[vd].vec);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, mmu_idx);
+    for (i = 0; i < LSX_LEN/8; i++) {
+        helper_ret_stb_mmu(env, addr + i, Vd->B[i],  oi, GETPC());
+    }
+#else
+    for (i = 0; i < LSX_LEN/8; i++) {
+        cpu_stb_data(env, addr + i, Vd->B[i]);
+    }
+#endif
+}
+
+void helper_vldrepl_b(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    uint8_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_ret_ldub_mmu(env, add, oi, GETPC());
+#else
+    data = cpu_ldub_data(env, addr);
+#endif
+    int i;
+    for (i = 0; i < 16; i++) {
+        Vd->B[i] = data;
+    }
+}
+
+void helper_vldrepl_h(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    uint16_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_16 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_lduw_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_lduw_data(env, addr);
+#endif
+    int i;
+    for (i = 0; i < 8; i++) {
+        Vd->H[i] = data;
+    }
+}
+
+void helper_vldrepl_w(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    uint32_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_32 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_ldul_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_ldl_data(env, addr);
+#endif
+    Vd->W[0] = data;
+    Vd->W[1] = data;
+    Vd->W[2] = data;
+    Vd->W[3] = data;
+}
+
+void helper_vldrepl_d(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    uint64_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_64 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_ldq_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_ldq_data(env, addr);
+#endif
+    Vd->D[0] = data;
+    Vd->D[1] = data;
+}
+
+#define B_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 8/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_b_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(B_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void helper_vstelm_b(CPULoongArchState *env,
+                     uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_b_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_ret_stb_mmu(env, addr, Vd->B[sel], oi, GETPC());
+#else
+    cpu_stb_data(env, addr, Vd->B[sel]);
+#endif
+}
+
+#define H_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 16/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_h_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(H_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void helper_vstelm_h(CPULoongArchState *env,
+                     uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_h_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_16 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stw_mmu(env, addr, Vd->H[sel], oi, GETPC());
+#else
+    cpu_stw_data(env, addr, Vd->H[sel]);
+#endif
+}
+
+#define W_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 32/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_w_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(W_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second pdge */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void helper_vstelm_w(CPULoongArchState *env,
+                     uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_w_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_32 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stl_mmu(env, addr, Vd->W[sel], oi, GETPC());
+#else
+    cpu_stl_data(env, addr, Vd->W[sel]);
+#endif
+}
+
+#define D_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 32/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_d_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(D_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void helper_vstelm_d(CPULoongArchState *env,
+                     uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    vec_t *Vd = &(env->fpr[vd].vec);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_d_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_64 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stq_mmu(env, addr, Vd->D[sel], oi, GETPC());
+#else
+    cpu_stq_data(env, addr, Vd->D[sel]);
+#endif
+}
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index fa43473738..3bb63bfb3e 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -35,11 +35,21 @@ static inline int plus_1(DisasContext *ctx, int x)
     return x + 1;
 }
 
+static inline int shl_1(DisasContext *ctx, int x)
+{
+    return x << 1;
+}
+
 static inline int shl_2(DisasContext *ctx, int x)
 {
     return x << 2;
 }
 
+static inline int shl_3(DisasContext *ctx, int x)
+{
+    return x << 3;
+}
+
 /*
  * LoongArch the upper 32 bits are undefined ("can be any value").
  * QEMU chooses to nanbox, because it is most likely to show guest bugs early.
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH 43/43] target/loongarch: Implement vldi
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (41 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 42/43] target/loongarch: Implement vld vst Song Gao
@ 2022-12-24  8:16 ` Song Gao
  2022-12-24 21:18   ` Richard Henderson
  2022-12-24 15:39 ` [RFC PATCH 00/43] Add LoongArch LSX instructions Richard Henderson
  43 siblings, 1 reply; 100+ messages in thread
From: Song Gao @ 2022-12-24  8:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VLDI.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |   7 +
 target/loongarch/helper.h                   |   2 +
 target/loongarch/insn_trans/trans_lsx.c.inc |  10 ++
 target/loongarch/insns.decode               |   4 +
 target/loongarch/lsx_helper.c               | 134 ++++++++++++++++++++
 5 files changed, 157 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e8dc0644bb..0c5cc313e0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -829,6 +829,11 @@ static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
 }
 
+static void output_v_i(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, 0x%x", a->vd, a->imm);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1114,6 +1119,8 @@ INSN_LSX(vmskltz_d,        vv)
 INSN_LSX(vmskgez_b,        vv)
 INSN_LSX(vmsknz_b,         vv)
 
+INSN_LSX(vldi,             v_i)
+
 INSN_LSX(vand_v,           vvv)
 INSN_LSX(vor_v,            vvv)
 INSN_LSX(vxor_v,           vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a92bcfffe8..cc28ecadd9 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -418,6 +418,8 @@ DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
 DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
 
+DEF_HELPER_3(vldi, void, env, i32, i32)
+
 DEF_HELPER_4(vand_v, void, env, i32, i32, i32)
 DEF_HELPER_4(vor_v, void, env, i32, i32, i32)
 DEF_HELPER_4(vxor_v, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 308cba12f2..97969d7138 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -406,6 +406,16 @@ TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b,  gen_vv, gen_helper_vmsknz_b)
 
+static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
+{
+    TCGv_i32 twd = tcg_constant_i32(a->vd);
+    TCGv_i32 tui = tcg_constant_i32(a->imm);
+
+    CHECK_SXE;
+    gen_helper_vldi(cpu_env, twd, tui);
+    return true;
+}
+
 TRANS(vand_v, gen_vvv, gen_helper_vand_v)
 TRANS(vor_v, gen_vvv, gen_helper_vor_v)
 TRANS(vxor_v, gen_vvv, gen_helper_vxor_v)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f786a9a9ee..b1608fd86e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -512,6 +512,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vvr          vd vj rk
 &vrr          vd rj rk
 &vr_ii        vd rj imm imm2
+&v_i          vd imm
 
 #
 # LSX Formats
@@ -549,6 +550,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vr_i8i3       .... ....... imm2:3 ........ rj:5 vd:5    &vr_ii imm=%i8s1
 @vr_i8i4          .... ...... imm2:4 imm:s8 rj:5 vd:5    &vr_ii
 @vrr               .... ........ ..... rk:5 rj:5 vd:5    &vrr
+@v_i13                   .... ........ .. imm:13 vd:5    &v_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -836,6 +838,8 @@ vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
 vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
 vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
 
+vldi             0111 00111110 00 ............. .....     @v_i13
+
 vand_v           0111 00010010 01100 ..... ..... .....    @vvv
 vor_v            0111 00010010 01101 ..... ..... .....    @vvv
 vxor_v           0111 00010010 01110 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9058230975..fcaee16394 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -5239,3 +5239,137 @@ void helper_vstelm_d(CPULoongArchState *env,
     cpu_stq_data(env, addr, Vd->D[sel]);
 #endif
 }
+
+#define EXPAND_BYTE(bit)  ((uint64_t)(bit ? 0xff : 0))
+
+void helper_vldi(CPULoongArchState *env, uint32_t vd, uint32_t ui)
+{
+    int sel = (ui >> 12) & 0x1;
+    uint32_t i;
+
+    vec_t *Vd = &(env->fpr[vd].vec);
+    if (sel) {
+        /* VSETI.D */
+        int mode = (ui >> 8) & 0xf;
+        uint64_t imm = (ui & 0xff);
+        for (i = 0; i < 2; i++) {
+            switch (mode) {
+            case 0:
+                Vd->D[i] = (imm << 32) | imm;
+                break;
+            case 1:
+                Vd->D[i] = (imm << 24) | (imm << 8);
+                break;
+            case 2:
+                Vd->D[i] = (imm << 48) | (imm << 16);
+                break;
+            case 3:
+                Vd->D[i] = (imm << 56) | (imm << 24);
+                break;
+            case 4:
+                Vd->D[i] = (imm << 48) | (imm << 32) |
+                           (imm << 16) | imm;
+                break;
+            case 5:
+                Vd->D[i] = (imm << 56) | (imm << 40) |
+                           (imm << 24) | (imm << 8);
+                break;
+            case 6:
+                Vd->D[i] = (imm << 40) | ((uint64_t)0xff << 32) |
+                           (imm << 8) | 0xff;
+                break;
+            case 7:
+                Vd->D[i] = (imm << 48) | ((uint64_t)0xffff << 32) |
+                           (imm << 16) | 0xffff;
+                break;
+            case 8:
+                Vd->D[i] = (imm << 56) | (imm << 48) | (imm << 40) |
+                           (imm << 32) | (imm << 24) | (imm << 16) |
+                           (imm << 8) | imm;
+                break;
+            case 9: {
+                uint64_t b0,b1,b2,b3,b4,b5,b6,b7;
+                b0 = imm & 0x1;
+                b1 = (imm & 0x2) >> 1;
+                b2 = (imm & 0x4) >> 2;
+                b3 = (imm & 0x8) >> 3;
+                b4 = (imm & 0x10) >> 4;
+                b5 = (imm & 0x20) >> 5;
+                b6 = (imm & 0x40) >> 6;
+                b7 = (imm & 0x80) >> 7;
+                Vd->D[i] = (EXPAND_BYTE(b7) << 56) |
+                           (EXPAND_BYTE(b6) << 48) |
+                           (EXPAND_BYTE(b5) << 40) |
+                           (EXPAND_BYTE(b4) << 32) |
+                           (EXPAND_BYTE(b3) << 24) |
+                           (EXPAND_BYTE(b2) << 16) |
+                           (EXPAND_BYTE(b1) <<  8) |
+                           EXPAND_BYTE(b0);
+                break;
+            }
+            case 10: {
+                uint64_t b6, b7;
+                uint64_t t0, t1;
+                b6 = (imm & 0x40) >> 6;
+                b7 = (imm & 0x80) >> 7;
+                t0 = (imm & 0x3f);
+                t1 = (b7 << 6) | ((1-b6) << 5) | (uint64_t)(b6 ? 0x1f : 0);
+                Vd->D[i] = (t1 << 57) | (t0 << 51) |
+                           (t1 << 25) | (t0 << 19);
+                break;
+            }
+            case 11: {
+                uint64_t b6,b7;
+                uint64_t t0, t1;
+                b6 = (imm & 0x40) >> 6;
+                b7 = (imm & 0x80) >> 7;
+                t0 = (imm & 0x3f);
+                t1 = (b7 << 6) | ((1-b6) << 5) | (b6 ? 0x1f : 0);
+                Vd->D[i] = (t1 << 25) | (t0 << 19);
+                break;
+            }
+            case 12: {
+                uint64_t b6,b7;
+                uint64_t t0, t1;
+                b6 = (imm & 0x40) >> 6;
+                b7 = (imm & 0x80) >> 7;
+                t0 = (imm & 0x3f);
+                t1 = (b7 << 6) | ((1-b6) << 5) | (b6 ? 0x1f : 0);
+                Vd->D[i] = (t1 << 54) | (t0 << 48);
+                break;
+            }
+            default:
+                assert(0);
+            }
+        }
+    } else {
+        /* LDI.df */
+        uint32_t df = (ui >> 10) & 0x3;
+        int32_t s10 = ((int32_t)(ui << 22)) >> 22;
+
+        switch (df) {
+        case 0:
+            for (i = 0; i < LSX_LEN/8; i++) {
+                Vd->B[i] = (int8_t)s10;
+            }
+            break;
+        case 1:
+            for (i = 0; i < LSX_LEN/16; i++) {
+                Vd->H[i] = (int16_t)s10;
+            }
+            break;
+        case 2:
+            for (i = 0; i < LSX_LEN/32; i++) {
+                Vd->W[i] = (int32_t)s10;
+            }
+            break;
+        case 3:
+            for (i = 0; i < LSX_LEN/64; i++) {
+                Vd->D[i] = (int64_t)s10;
+            }
+           break;
+        default:
+            assert(0);
+        }
+    }
+}
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 00/43] Add LoongArch LSX instructions
  2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
                   ` (42 preceding siblings ...)
  2022-12-24  8:16 ` [RFC PATCH 43/43] target/loongarch: Implement vldi Song Gao
@ 2022-12-24 15:39 ` Richard Henderson
  2022-12-28  0:55   ` gaosong
  43 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 15:39 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> Hi, Merry Christmas!
> 
> This series adds LoongArch LSX instructions, Since the LoongArch
> Vol2 is not open, So we use 'RFC' title.

That is unfortunate, as it makes reviewing this difficult.
Is there a timeline for this being published?

In the meantime, I can at least point out some general issues.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
@ 2022-12-24 17:07   ` Richard Henderson
  2022-12-24 17:24   ` Richard Henderson
  2022-12-24 17:32   ` Richard Henderson
  2 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:07 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +#define LSX_LEN   (128)
> +typedef union vec_t vec_t;
> +union vec_t {
> +    int8_t   B[LSX_LEN / 8];
> +    int16_t  H[LSX_LEN / 16];
> +    int32_t  W[LSX_LEN / 32];
> +    int64_t  D[LSX_LEN / 64];
> +    __int128 Q[LSX_LEN / 128];
> +};
> +
> +typedef union fpr_t fpr_t;
> +union fpr_t {
> +    uint64_t d;
> +    vec_t vec;
> +};

You need to think about host endianness with this overlap and indexing.

There are two different models which can be emulated:

(1) target/{arm,s390x}/ has each uint64_t in host-endian order, but the words are indexed 
little-endian.  See, for instance, target/s390x/tcg/vec.h.

(2) target/{ppc,i386}/ has the entire vector in host-endian order.  See, for instance, 
ZMM_* in target/i386/cpu.h.

If you do nothing, I assume this will fail on a big-endian host.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub
  2022-12-24  8:15 ` [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub Song Gao
@ 2022-12-24 17:16   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:16 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
> +                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
> +{
> +    TCGv_i32 vd = tcg_constant_i32(a->vd);
> +    TCGv_i32 vj = tcg_constant_i32(a->vj);
> +    TCGv_i32 vk = tcg_constant_i32(a->vk);
> +
> +    CHECK_SXE;
> +    func(cpu_env, vd, vj, vk);
> +    return true;
> +}
> +
> +TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
> +TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
> +TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
> +TRANS(vadd_d, gen_vvv, gen_helper_vadd_d)
> +TRANS(vadd_q, gen_vvv, gen_helper_vadd_q)
> +TRANS(vsub_b, gen_vvv, gen_helper_vsub_b)
> +TRANS(vsub_h, gen_vvv, gen_helper_vsub_h)
> +TRANS(vsub_w, gen_vvv, gen_helper_vsub_w)
> +TRANS(vsub_d, gen_vvv, gen_helper_vsub_d)
> +TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)

The 8 to 64-bit operations can be implemented with tcg_gen_gvec_{add,sub}.
The 128-bit operations can be implemented with tcg_gen_{add,sub}2_i64.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
  2022-12-24 17:07   ` Richard Henderson
@ 2022-12-24 17:24   ` Richard Henderson
  2022-12-28  2:34     ` gaosong
  2022-12-24 17:32   ` Richard Henderson
  2 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:24 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +union fpr_t {
> +    uint64_t d;
> +    vec_t vec;
> +};
> +
>   struct LoongArchTLB {
>       uint64_t tlb_misc;
>       /* Fields corresponding to CSR_TLBELO0/1 */
> @@ -251,7 +267,7 @@ typedef struct CPUArchState {
>       uint64_t gpr[32];
>       uint64_t pc;
>   
> -    uint64_t fpr[32];
> +    fpr_t fpr[32];

I didn't spot it right away, because you didn't add ".d" to the tcg register allocation, 
but if you use tcg/tcg-op-gvec.h (and you really should), then you will also have to remove

>     for (i = 0; i < 32; i++) {
>         int off = offsetof(CPULoongArchState, fpr[i]);
>         cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
>     }

because one cannot modify global_mem variables with gvec.

I strongly suggest that you introduce wrappers to load/store fpr values from their env 
slots.  I would name them similarly to gpr_{src,dst}, gen_set_gpr.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi
  2022-12-24  8:15 ` [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi Song Gao
@ 2022-12-24 17:27   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:27 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
> +                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
> +{
> +    TCGv_i32 vd = tcg_constant_i32(a->vd);
> +    TCGv_i32 vj = tcg_constant_i32(a->vj);
> +    TCGv_i32 imm = tcg_constant_i32(a->imm);
> +
> +    CHECK_SXE;
> +    func(cpu_env, vd, vj, imm);
> +    return true;
> +}
> +
>   TRANS(vadd_b, gen_vvv, gen_helper_vadd_b)
>   TRANS(vadd_h, gen_vvv, gen_helper_vadd_h)
>   TRANS(vadd_w, gen_vvv, gen_helper_vadd_w)
> @@ -37,3 +49,12 @@ TRANS(vsub_h, gen_vvv, gen_helper_vsub_h)
>   TRANS(vsub_w, gen_vvv, gen_helper_vsub_w)
>   TRANS(vsub_d, gen_vvv, gen_helper_vsub_d)
>   TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
> +
> +TRANS(vaddi_bu, gen_vv_i, gen_helper_vaddi_bu)
> +TRANS(vaddi_hu, gen_vv_i, gen_helper_vaddi_hu)
> +TRANS(vaddi_wu, gen_vv_i, gen_helper_vaddi_wu)
> +TRANS(vaddi_du, gen_vv_i, gen_helper_vaddi_du)
> +TRANS(vsubi_bu, gen_vv_i, gen_helper_vsubi_bu)
> +TRANS(vsubi_hu, gen_vv_i, gen_helper_vsubi_hu)
> +TRANS(vsubi_wu, gen_vv_i, gen_helper_vsubi_wu)
> +TRANS(vsubi_du, gen_vv_i, gen_helper_vsubi_du)

These can be implemented with tcg_gen_gvec_addi.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 07/43] target/loongarch: Implement vneg
  2022-12-24  8:15 ` [RFC PATCH 07/43] target/loongarch: Implement vneg Song Gao
@ 2022-12-24 17:29   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:29 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +TRANS(vneg_b, gen_vv, gen_helper_vneg_b)
> +TRANS(vneg_h, gen_vv, gen_helper_vneg_h)
> +TRANS(vneg_w, gen_vv, gen_helper_vneg_w)
> +TRANS(vneg_d, gen_vv, gen_helper_vneg_d)

These can be implemented with tcg_gen_gvec_neg.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub
  2022-12-24  8:15 ` [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub Song Gao
@ 2022-12-24 17:31   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +TRANS(vsadd_b, gen_vvv, gen_helper_vsadd_b)
> +TRANS(vsadd_h, gen_vvv, gen_helper_vsadd_h)
> +TRANS(vsadd_w, gen_vvv, gen_helper_vsadd_w)
> +TRANS(vsadd_d, gen_vvv, gen_helper_vsadd_d)
> +TRANS(vsadd_bu, gen_vvv, gen_helper_vsadd_bu)
> +TRANS(vsadd_hu, gen_vvv, gen_helper_vsadd_hu)
> +TRANS(vsadd_wu, gen_vvv, gen_helper_vsadd_wu)
> +TRANS(vsadd_du, gen_vvv, gen_helper_vsadd_du)
> +TRANS(vssub_b, gen_vvv, gen_helper_vssub_b)
> +TRANS(vssub_h, gen_vvv, gen_helper_vssub_h)
> +TRANS(vssub_w, gen_vvv, gen_helper_vssub_w)
> +TRANS(vssub_d, gen_vvv, gen_helper_vssub_d)
> +TRANS(vssub_bu, gen_vvv, gen_helper_vssub_bu)
> +TRANS(vssub_hu, gen_vvv, gen_helper_vssub_hu)
> +TRANS(vssub_wu, gen_vvv, gen_helper_vssub_wu)
> +TRANS(vssub_du, gen_vvv, gen_helper_vssub_du)

These can be implemented with tcg_gen_gvec_{ssadd,sssub,usadd,ussub}.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
  2022-12-24 17:07   ` Richard Henderson
  2022-12-24 17:24   ` Richard Henderson
@ 2022-12-24 17:32   ` Richard Henderson
  2023-02-13  8:24     ` gaosong
  2 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:32 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +union vec_t {
> +    int8_t   B[LSX_LEN / 8];
> +    int16_t  H[LSX_LEN / 16];
> +    int32_t  W[LSX_LEN / 32];
> +    int64_t  D[LSX_LEN / 64];
> +    __int128 Q[LSX_LEN / 128];

Oh, you can't use __int128 directly.
It won't compile on 32-bit hosts.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw
  2022-12-24  8:15 ` [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw Song Gao
@ 2022-12-24 17:41   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:41 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:15, Song Gao wrote:
> +#define S_EVEN(a, bit) \
> +        ((((int64_t)(a)) << (64 - bit / 2)) >> (64 - bit / 2))
> +
> +#define U_EVEN(a, bit) \
> +        ((((uint64_t)(a)) << (64 - bit / 2)) >> (64 - bit / 2))
> +
> +#define S_ODD(a, bit) \
> +        ((((int64_t)(a)) << (64 - bit)) >> (64 - bit/ 2))
> +
> +#define U_ODD(a, bit) \
> +        ((((uint64_t)(a)) << (64 - bit)) >> (64 - bit / 2))
> +
> +#define S_EVEN_Q(a, bit) \
> +        ((((__int128)(a)) << (128 - bit / 2)) >> (128 - bit / 2))
> +
> +#define U_EVEN_Q(a, bit) \
> +        ((((unsigned __int128)(a)) << (128 - bit / 2)) >> (128 - bit / 2))
> +
> +#define S_ODD_Q(a, bit) \
> +        ((((__int128)(a)) << (128 - bit)) >> (128 - bit/ 2))
> +
> +#define U_ODD_Q(a, bit) \
> +        ((((unsigned __int128)(a)) << (128 - bit)) >> (128 - bit / 2))

I suspect all of these are wrong.  I believe bit is in [0-127], which means both (64 - 
bit) and (128 - bit) generate out-of range shifts.

Also, you can't use __int128 directly.

I'm somewhat surprised that you're shifting at all, rather than indexing the correct 
element from of the vec_t arrays.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2022-12-24  8:16 ` [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw Song Gao
@ 2022-12-24 17:48   ` Richard Henderson
  2023-02-20  7:47     ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:48 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vaddwev_h_b, gen_vvv, gen_helper_vaddwev_h_b)
> +TRANS(vaddwev_w_h, gen_vvv, gen_helper_vaddwev_w_h)
> +TRANS(vaddwev_d_w, gen_vvv, gen_helper_vaddwev_d_w)
> +TRANS(vaddwev_q_d, gen_vvv, gen_helper_vaddwev_q_d)
> +TRANS(vaddwod_h_b, gen_vvv, gen_helper_vaddwod_h_b)
> +TRANS(vaddwod_w_h, gen_vvv, gen_helper_vaddwod_w_h)
> +TRANS(vaddwod_d_w, gen_vvv, gen_helper_vaddwod_d_w)
> +TRANS(vaddwod_q_d, gen_vvv, gen_helper_vaddwod_q_d)
> +TRANS(vsubwev_h_b, gen_vvv, gen_helper_vsubwev_h_b)
> +TRANS(vsubwev_w_h, gen_vvv, gen_helper_vsubwev_w_h)
> +TRANS(vsubwev_d_w, gen_vvv, gen_helper_vsubwev_d_w)
> +TRANS(vsubwev_q_d, gen_vvv, gen_helper_vsubwev_q_d)
> +TRANS(vsubwod_h_b, gen_vvv, gen_helper_vsubwod_h_b)
> +TRANS(vsubwod_w_h, gen_vvv, gen_helper_vsubwod_w_h)
> +TRANS(vsubwod_d_w, gen_vvv, gen_helper_vsubwod_d_w)
> +TRANS(vsubwod_q_d, gen_vvv, gen_helper_vsubwod_q_d)

These can be implemented with a combination of vector shift + vector add.

> +TRANS(vaddwev_h_bu, gen_vvv, gen_helper_vaddwev_h_bu)
> +TRANS(vaddwev_w_hu, gen_vvv, gen_helper_vaddwev_w_hu)
> +TRANS(vaddwev_d_wu, gen_vvv, gen_helper_vaddwev_d_wu)
> +TRANS(vaddwev_q_du, gen_vvv, gen_helper_vaddwev_q_du)
> +TRANS(vaddwod_h_bu, gen_vvv, gen_helper_vaddwod_h_bu)
> +TRANS(vaddwod_w_hu, gen_vvv, gen_helper_vaddwod_w_hu)
> +TRANS(vaddwod_d_wu, gen_vvv, gen_helper_vaddwod_d_wu)
> +TRANS(vaddwod_q_du, gen_vvv, gen_helper_vaddwod_q_du)
> +TRANS(vsubwev_h_bu, gen_vvv, gen_helper_vsubwev_h_bu)
> +TRANS(vsubwev_w_hu, gen_vvv, gen_helper_vsubwev_w_hu)
> +TRANS(vsubwev_d_wu, gen_vvv, gen_helper_vsubwev_d_wu)
> +TRANS(vsubwev_q_du, gen_vvv, gen_helper_vsubwev_q_du)
> +TRANS(vsubwod_h_bu, gen_vvv, gen_helper_vsubwod_h_bu)
> +TRANS(vsubwod_w_hu, gen_vvv, gen_helper_vsubwod_w_hu)
> +TRANS(vsubwod_d_wu, gen_vvv, gen_helper_vsubwod_d_wu)
> +TRANS(vsubwod_q_du, gen_vvv, gen_helper_vsubwod_q_du)

These can be implemented with a combination of vector and + vector add.

> +TRANS(vaddwev_h_bu_b, gen_vvv, gen_helper_vaddwev_h_bu_b)
> +TRANS(vaddwev_w_hu_h, gen_vvv, gen_helper_vaddwev_w_hu_h)
> +TRANS(vaddwev_d_wu_w, gen_vvv, gen_helper_vaddwev_d_wu_w)
> +TRANS(vaddwev_q_du_d, gen_vvv, gen_helper_vaddwev_q_du_d)
> +TRANS(vaddwod_h_bu_b, gen_vvv, gen_helper_vaddwod_h_bu_b)
> +TRANS(vaddwod_w_hu_h, gen_vvv, gen_helper_vaddwod_w_hu_h)
> +TRANS(vaddwod_d_wu_w, gen_vvv, gen_helper_vaddwod_d_wu_w)
> +TRANS(vaddwod_q_du_d, gen_vvv, gen_helper_vaddwod_q_du_d)

Likewise.

For an example of how to bundle vector operations, see e.g. gen_gvec_rax1 and subroutines 
in target/arm/translate-a64.c.  There are many others, but ask if you need more help.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr
  2022-12-24  8:16 ` [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr Song Gao
@ 2022-12-24 17:52   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:52 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vavg_b, gen_vvv, gen_helper_vavg_b)
> +TRANS(vavg_h, gen_vvv, gen_helper_vavg_h)
> +TRANS(vavg_w, gen_vvv, gen_helper_vavg_w)
> +TRANS(vavg_d, gen_vvv, gen_helper_vavg_d)
> +TRANS(vavg_bu, gen_vvv, gen_helper_vavg_bu)
> +TRANS(vavg_hu, gen_vvv, gen_helper_vavg_hu)
> +TRANS(vavg_wu, gen_vvv, gen_helper_vavg_wu)
> +TRANS(vavg_du, gen_vvv, gen_helper_vavg_du)
> +TRANS(vavgr_b, gen_vvv, gen_helper_vavgr_b)
> +TRANS(vavgr_h, gen_vvv, gen_helper_vavgr_h)
> +TRANS(vavgr_w, gen_vvv, gen_helper_vavgr_w)
> +TRANS(vavgr_d, gen_vvv, gen_helper_vavgr_d)
> +TRANS(vavgr_bu, gen_vvv, gen_helper_vavgr_bu)
> +TRANS(vavgr_hu, gen_vvv, gen_helper_vavgr_hu)
> +TRANS(vavgr_wu, gen_vvv, gen_helper_vavgr_wu)
> +TRANS(vavgr_du, gen_vvv, gen_helper_vavgr_du)

These can be implemented with gvec.  See e.g. do_vx_vavg in 
target/ppc/translate/vmx-impl.c.inc, which implements the rounding version.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 12/43] target/loongarch: Implement vabsd
  2022-12-24  8:16 ` [RFC PATCH 12/43] target/loongarch: Implement vabsd Song Gao
@ 2022-12-24 17:55   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:55 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +INSN_LSX(vabsd_b,          vvv)
> +INSN_LSX(vabsd_h,          vvv)
> +INSN_LSX(vabsd_w,          vvv)
> +INSN_LSX(vabsd_d,          vvv)
> +INSN_LSX(vabsd_bu,         vvv)
> +INSN_LSX(vabsd_hu,         vvv)
> +INSN_LSX(vabsd_wu,         vvv)
> +INSN_LSX(vabsd_du,         vvv)

These can be implemented with max, min, and sub.
See e.g. do_vabsdu in target/ppc/translate/vmx-impl.c.inc.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 13/43] target/loongarch: Implement vadda
  2022-12-24  8:16 ` [RFC PATCH 13/43] target/loongarch: Implement vadda Song Gao
@ 2022-12-24 17:56   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 17:56 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vadda_b, gen_vvv, gen_helper_vadda_b)
> +TRANS(vadda_h, gen_vvv, gen_helper_vadda_h)
> +TRANS(vadda_w, gen_vvv, gen_helper_vadda_w)
> +TRANS(vadda_d, gen_vvv, gen_helper_vadda_d)

These can be implemented with abs + add.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin
  2022-12-24  8:16 ` [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin Song Gao
@ 2022-12-24 18:01   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:01 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +INSN_LSX(vmax_b,           vvv)
> +INSN_LSX(vmax_h,           vvv)
> +INSN_LSX(vmax_w,           vvv)
> +INSN_LSX(vmax_d,           vvv)
> +INSN_LSX(vmin_b,           vvv)
> +INSN_LSX(vmin_h,           vvv)
> +INSN_LSX(vmin_w,           vvv)
> +INSN_LSX(vmin_d,           vvv)
> +INSN_LSX(vmax_bu,          vvv)
> +INSN_LSX(vmax_hu,          vvv)
> +INSN_LSX(vmax_wu,          vvv)
> +INSN_LSX(vmax_du,          vvv)
> +INSN_LSX(vmin_bu,          vvv)
> +INSN_LSX(vmin_hu,          vvv)
> +INSN_LSX(vmin_wu,          vvv)
> +INSN_LSX(vmin_du,          vvv)

These can be implemented with tcg_gen_gvec_{smin,umin,smax,umax}.

> +INSN_LSX(vmaxi_b,          vv_i)
> +INSN_LSX(vmaxi_h,          vv_i)
> +INSN_LSX(vmaxi_w,          vv_i)
> +INSN_LSX(vmaxi_d,          vv_i)
> +INSN_LSX(vmini_b,          vv_i)
> +INSN_LSX(vmini_h,          vv_i)
> +INSN_LSX(vmini_w,          vv_i)
> +INSN_LSX(vmini_d,          vv_i)
> +INSN_LSX(vmaxi_bu,         vv_i)
> +INSN_LSX(vmaxi_hu,         vv_i)
> +INSN_LSX(vmaxi_wu,         vv_i)
> +INSN_LSX(vmaxi_du,         vv_i)
> +INSN_LSX(vmini_bu,         vv_i)
> +INSN_LSX(vmini_hu,         vv_i)
> +INSN_LSX(vmini_wu,         vv_i)
> +INSN_LSX(vmini_du,         vv_i)

These have no direct immediate variant.  Use a combination pattern with dup + minmax.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2022-12-24  8:16 ` [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
@ 2022-12-24 18:07   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:07 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +DEF_HELPER_4(vmul_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmul_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmul_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmul_d, void, env, i32, i32, i32)

These are tcg_gen_gvec_mul.

> +DEF_HELPER_4(vmuh_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmuh_du, void, env, i32, i32, i32)

These, sadly, have no generic equivalent.  We should probably create one, since several 
targets have it.  E.g. do_vx_mulh in target/ppc/translate/vmx-impl.c.inc.

> +DEF_HELPER_4(vmulwev_h_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_w_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_d_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_q_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_h_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_w_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_d_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_q_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_h_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_w_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_d_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_q_du, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_h_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_w_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_d_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_q_du, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_h_bu_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_w_hu_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_d_wu_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwev_q_du_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_h_bu_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_w_hu_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_d_wu_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vmulwod_q_du_d, void, env, i32, i32, i32)

Similar to widening add: shifts, and, mul.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  2022-12-24  8:16 ` [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
@ 2022-12-24 18:09   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:09 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vmadd_b, gen_vvv, gen_helper_vmadd_b)
> +TRANS(vmadd_h, gen_vvv, gen_helper_vmadd_h)
> +TRANS(vmadd_w, gen_vvv, gen_helper_vmadd_w)
> +TRANS(vmadd_d, gen_vvv, gen_helper_vmadd_d)
> +TRANS(vmsub_b, gen_vvv, gen_helper_vmsub_b)
> +TRANS(vmsub_h, gen_vvv, gen_helper_vmsub_h)
> +TRANS(vmsub_w, gen_vvv, gen_helper_vmsub_w)
> +TRANS(vmsub_d, gen_vvv, gen_helper_vmsub_d)

Implement with mul, add, sub.

> +TRANS(vmaddwev_h_b, gen_vvv, gen_helper_vmaddwev_h_b)
> +TRANS(vmaddwev_w_h, gen_vvv, gen_helper_vmaddwev_w_h)
> +TRANS(vmaddwev_d_w, gen_vvv, gen_helper_vmaddwev_d_w)
> +TRANS(vmaddwev_q_d, gen_vvv, gen_helper_vmaddwev_q_d)
> +TRANS(vmaddwod_h_b, gen_vvv, gen_helper_vmaddwod_h_b)
> +TRANS(vmaddwod_w_h, gen_vvv, gen_helper_vmaddwod_w_h)
> +TRANS(vmaddwod_d_w, gen_vvv, gen_helper_vmaddwod_d_w)
> +TRANS(vmaddwod_q_d, gen_vvv, gen_helper_vmaddwod_q_d)
> +TRANS(vmaddwev_h_bu, gen_vvv, gen_helper_vmaddwev_h_bu)
> +TRANS(vmaddwev_w_hu, gen_vvv, gen_helper_vmaddwev_w_hu)
> +TRANS(vmaddwev_d_wu, gen_vvv, gen_helper_vmaddwev_d_wu)
> +TRANS(vmaddwev_q_du, gen_vvv, gen_helper_vmaddwev_q_du)
> +TRANS(vmaddwod_h_bu, gen_vvv, gen_helper_vmaddwod_h_bu)
> +TRANS(vmaddwod_w_hu, gen_vvv, gen_helper_vmaddwod_w_hu)
> +TRANS(vmaddwod_d_wu, gen_vvv, gen_helper_vmaddwod_d_wu)
> +TRANS(vmaddwod_q_du, gen_vvv, gen_helper_vmaddwod_q_du)
> +TRANS(vmaddwev_h_bu_b, gen_vvv, gen_helper_vmaddwev_h_bu_b)
> +TRANS(vmaddwev_w_hu_h, gen_vvv, gen_helper_vmaddwev_w_hu_h)
> +TRANS(vmaddwev_d_wu_w, gen_vvv, gen_helper_vmaddwev_d_wu_w)
> +TRANS(vmaddwev_q_du_d, gen_vvv, gen_helper_vmaddwev_q_du_d)
> +TRANS(vmaddwod_h_bu_b, gen_vvv, gen_helper_vmaddwod_h_bu_b)
> +TRANS(vmaddwod_w_hu_h, gen_vvv, gen_helper_vmaddwod_w_hu_h)
> +TRANS(vmaddwod_d_wu_w, gen_vvv, gen_helper_vmaddwod_d_wu_w)
> +TRANS(vmaddwod_q_du_d, gen_vvv, gen_helper_vmaddwod_q_du_d)

Similar to widening add, mul.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 18/43] target/loongarch: Implement vsat
  2022-12-24  8:16 ` [RFC PATCH 18/43] target/loongarch: Implement vsat Song Gao
@ 2022-12-24 18:13   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:13 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vsat_b, gen_vv_i, gen_helper_vsat_b)
> +TRANS(vsat_h, gen_vv_i, gen_helper_vsat_h)
> +TRANS(vsat_w, gen_vv_i, gen_helper_vsat_w)
> +TRANS(vsat_d, gen_vv_i, gen_helper_vsat_d)
> +TRANS(vsat_bu, gen_vv_i, gen_helper_vsat_bu)
> +TRANS(vsat_hu, gen_vv_i, gen_helper_vsat_hu)
> +TRANS(vsat_wu, gen_vv_i, gen_helper_vsat_wu)
> +TRANS(vsat_du, gen_vv_i, gen_helper_vsat_du)

Implement these with dup + min + max.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 20/43] target/loongarch: Implement vsigncov
  2022-12-24  8:16 ` [RFC PATCH 20/43] target/loongarch: Implement vsigncov Song Gao
@ 2022-12-24 18:18   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:18 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +static void do_vsigncov(vec_t *Vd, vec_t *Vj, vec_t *Vk, int bit, int n)
> +{
> +    switch (bit) {
> +    case 8:
> +        Vd->B[n] = (Vj->B[n] == 0x0) ? 0 :
> +                   (Vj->B[n] < 0) ? -Vk->B[n] : Vk->B[n];
> +        break;
> +    case 16:
> +        Vd->H[n] = (Vj->H[n] == 0x0) ? 0 :
> +                   (Vj->H[n] < 0) ? -Vk->H[n] : Vk->H[n];
> +        break;
> +    case 32:
> +        Vd->W[n] = (Vj->W[n] == 0x0) ? 0 :
> +                   (Vj->W[n] < 0) ? -Vk->W[n] : Vk->W[n];
> +        break;
> +    case 64:
> +        Vd->D[n] = (Vj->D[n] == 0x0) ? 0 :
> +                   (Vj->D[n] < 0) ? -Vk->D[n] : Vk->W[n];

Typo in this last line.

Can be implemented with neg + cmpsel * 2.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz
  2022-12-24  8:16 ` [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
@ 2022-12-24 18:31   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +static void do_vmskltz(vec_t *Vd, vec_t *Vj, int bit, int n)
> +{
> +    switch (bit) {
> +    case 8:
> +        Vd->H[0] |= ((0x80 & Vj->B[n]) == 0) << n;
> +        break;
> +    case 16:
> +        Vd->H[0] |= ((0x8000 & Vj->H[n]) == 0) << n;
> +        break;
> +    case 32:
> +        Vd->H[0] |= ((0x80000000 & Vj->W[n]) == 0) << n;
> +        break;
> +    case 64:
> +        Vd->H[0] |= ((0x8000000000000000 & Vj->D[n]) == 0) << n;
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +static void do_vmskgez(vec_t *Vd, vec_t *Vj, int bit, int n)
> +{
> +    Vd->H[0] |= !((0x80 & Vj->B[n]) == 0) << n;
> +}
> +
> +static void do_vmsknz(vec_t *Vd, vec_t *Vj, int bit, int n)
> +{
> +    Vd->H[0] |=  (Vj->B[n] == 0) << n;
> +}
The bit collection and compaction can be done with a set of integer shifts.
E.g. helper_cmpbe0 in target/alpha/int_helper.c.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions
  2022-12-24  8:16 ` [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions Song Gao
@ 2022-12-24 18:34   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:34 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vand_v, gen_vvv, gen_helper_vand_v)
> +TRANS(vor_v, gen_vvv, gen_helper_vor_v)
> +TRANS(vxor_v, gen_vvv, gen_helper_vxor_v)
> +TRANS(vnor_v, gen_vvv, gen_helper_vnor_v)
> +TRANS(vandn_v, gen_vvv, gen_helper_vandn_v)
> +TRANS(vorn_v, gen_vvv, gen_helper_vorn_v)

These can be implemented with tcg_gen_gvec_{and,or,xor,andc,orc,nor}.

> +TRANS(vandi_b, gen_vv_i, gen_helper_vandi_b)
> +TRANS(vori_b, gen_vv_i, gen_helper_vori_b)
> +TRANS(vxori_b, gen_vv_i, gen_helper_vxori_b)

These are tcg_gen_gvec_{andi,ori,xori}.

> +TRANS(vnori_b, gen_vv_i, gen_helper_vnori_b)

This would need dup + nor.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr
  2022-12-24  8:16 ` [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
@ 2022-12-24 18:36   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:36 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +DEF_HELPER_4(vsll_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsll_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsll_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsll_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslli_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslli_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslli_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslli_d, void, env, i32, i32, i32)
> +
> +DEF_HELPER_4(vsrl_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrl_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrl_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrl_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrli_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrli_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrli_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrli_d, void, env, i32, i32, i32)
> +
> +DEF_HELPER_4(vsra_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsra_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsra_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsra_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrai_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrai_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrai_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsrai_d, void, env, i32, i32, i32)
> +
> +DEF_HELPER_4(vrotr_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotr_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotr_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotr_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotri_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotri_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotri_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vrotri_d, void, env, i32, i32, i32)

These are tcg_gen_gvec_{shl,shr,sar,rotr}{v,i}.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt
  2022-12-24  8:16 ` [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt Song Gao
@ 2022-12-24 18:50   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 18:50 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +DEF_HELPER_4(vseq_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseq_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseq_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseq_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseqi_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseqi_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseqi_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vseqi_d, void, env, i32, i32, i32)
> +
> +DEF_HELPER_4(vsle_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vsle_du, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslei_du, void, env, i32, i32, i32)
> +
> +DEF_HELPER_4(vslt_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslt_du, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vslti_du, void, env, i32, i32, i32)

These are tcg_gen_gvec_cmp.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset
  2022-12-24  8:16 ` [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset Song Gao
@ 2022-12-24 19:15   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 19:15 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +DEF_HELPER_5(vbitsel_v, void, env, i32, i32, i32, i32)

This is tcg_gen_gvec_bitsel.  The immediate version would require dupi.

r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  2022-12-24  8:16 ` [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
@ 2022-12-24 20:34   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 20:34 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +DEF_HELPER_4(vinsgr2vr_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vinsgr2vr_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vinsgr2vr_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vinsgr2vr_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_bu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_hu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_wu, void, env, i32, i32, i32)
> +DEF_HELPER_4(vpickve2gr_du, void, env, i32, i32, i32)

These can be implemented with tcg_gen_{ld,st}_i64, and offsetof.

> +DEF_HELPER_3(vreplgr2vr_b, void, env, i32, i32)
> +DEF_HELPER_3(vreplgr2vr_h, void, env, i32, i32)
> +DEF_HELPER_3(vreplgr2vr_w, void, env, i32, i32)
> +DEF_HELPER_3(vreplgr2vr_d, void, env, i32, i32)

These are tcg_gen_gvec_dup_i64.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2022-12-24  8:16 ` [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick Song Gao
@ 2022-12-24 21:12   ` Richard Henderson
  2023-03-21 11:31     ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 21:12 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +TRANS(vreplve_b, gen_vvr, gen_helper_vreplve_b)
> +TRANS(vreplve_h, gen_vvr, gen_helper_vreplve_h)
> +TRANS(vreplve_w, gen_vvr, gen_helper_vreplve_w)
> +TRANS(vreplve_d, gen_vvr, gen_helper_vreplve_d)
> +TRANS(vreplvei_b, gen_vv_i, gen_helper_vreplvei_b)
> +TRANS(vreplvei_h, gen_vv_i, gen_helper_vreplvei_h)
> +TRANS(vreplvei_w, gen_vv_i, gen_helper_vreplvei_w)
> +TRANS(vreplvei_d, gen_vv_i, gen_helper_vreplvei_d)
tcg_gen_gvec_dupm.

In the case of imm, this will be cpu_env + offsetof.
In the case of reg, compute cpu_env + register offset + offsetof.

> +TRANS(vbsll_v, gen_vv_i, gen_helper_vbsll_v)
> +TRANS(vbsrl_v, gen_vv_i, gen_helper_vbsrl_v)

These can use tcg_gen_extract2_i64, with imm * 8 bit shift.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 42/43] target/loongarch: Implement vld vst
  2022-12-24  8:16 ` [RFC PATCH 42/43] target/loongarch: Implement vld vst Song Gao
@ 2022-12-24 21:15   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 21:15 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> This patch includes:
> - VLD[X], VST[X];
> - VLDREPL.{B/H/W/D};
> - VSTELM.{B/H/W/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  34 +++
>   target/loongarch/helper.h                   |  12 +
>   target/loongarch/insn_trans/trans_lsx.c.inc |  75 ++++++
>   target/loongarch/insns.decode               |  36 +++
>   target/loongarch/lsx_helper.c               | 266 ++++++++++++++++++++
>   target/loongarch/translate.c                |  10 +
>   6 files changed, 433 insertions(+)

This whole thing will be much simplified with TCGv_i128 load/store.
That patch set is still in flight, but should land soon...


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 43/43] target/loongarch: Implement vldi
  2022-12-24  8:16 ` [RFC PATCH 43/43] target/loongarch: Implement vldi Song Gao
@ 2022-12-24 21:18   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2022-12-24 21:18 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 12/24/22 00:16, Song Gao wrote:
> +static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
> +{
> +    TCGv_i32 twd = tcg_constant_i32(a->vd);
> +    TCGv_i32 tui = tcg_constant_i32(a->imm);
> +
> +    CHECK_SXE;
> +    gen_helper_vldi(cpu_env, twd, tui);
> +    return true;
> +}
> +

The constant should be expanded during translate, and use tcg_gen_gvec_dupi.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 00/43] Add LoongArch LSX instructions
  2022-12-24 15:39 ` [RFC PATCH 00/43] Add LoongArch LSX instructions Richard Henderson
@ 2022-12-28  0:55   ` gaosong
  0 siblings, 0 replies; 100+ messages in thread
From: gaosong @ 2022-12-28  0:55 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2022/12/24 下午11:39, Richard Henderson 写道:
> On 12/24/22 00:15, Song Gao wrote:
>> Hi, Merry Christmas!
>>
>> This series adds LoongArch LSX instructions, Since the LoongArch
>> Vol2 is not open, So we use 'RFC' title.
>
> That is unfortunate, as it makes reviewing this difficult.
> Is there a timeline for this being published?
>
Perhaps at the end of the first quarter of 2023.

> In the meantime, I can at least point out some general issues.
>
Thank you very much.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24 17:24   ` Richard Henderson
@ 2022-12-28  2:34     ` gaosong
  2022-12-28 17:30       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2022-12-28  2:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2022/12/25 上午1:24, Richard Henderson 写道:
> On 12/24/22 00:15, Song Gao wrote:
>> +union fpr_t {
>> +    uint64_t d;
>> +    vec_t vec;
>> +};
>> +
>>   struct LoongArchTLB {
>>       uint64_t tlb_misc;
>>       /* Fields corresponding to CSR_TLBELO0/1 */
>> @@ -251,7 +267,7 @@ typedef struct CPUArchState {
>>       uint64_t gpr[32];
>>       uint64_t pc;
>>   -    uint64_t fpr[32];
>> +    fpr_t fpr[32];
>
> I didn't spot it right away, because you didn't add ".d" to the tcg 
> register allocation, 
Oh,    my mistake.
> but if you use tcg/tcg-op-gvec.h (and you really should), then you 
> will also have to remove
>
>>     for (i = 0; i < 32; i++) {
>>         int off = offsetof(CPULoongArchState, fpr[i]);
>>         cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
>>     }
>
> because one cannot modify global_mem variables with gvec.
>
The manual says "The lower 64 bits of each vector register overlap with 
the floating point register of the same number.  In other words
When the basic floating-point instruction is executed to update the 
floating-point register, the low 64 bits of the corresponding LSX register
are also updated to the same value."

So If we don't use the fpr_t.  we should:
1 Update LSX low 64 bits after floating point instruction translation;
2 Update floating-point registers after LSX instruction translation.

Should we do this  or have I misunderstood?
> I strongly suggest that you introduce wrappers to load/store fpr 
> values from their env slots.  I would name them similarly to 
> gpr_{src,dst}, gen_set_gpr.
>
Got it .

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-28  2:34     ` gaosong
@ 2022-12-28 17:30       ` Richard Henderson
  2022-12-29  1:51         ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-28 17:30 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 12/27/22 18:34, gaosong wrote:
> The manual says "The lower 64 bits of each vector register overlap with the floating point 
> register of the same number.  In other words
> When the basic floating-point instruction is executed to update the floating-point 
> register, the low 64 bits of the corresponding LSX register
> are also updated to the same value."
> 
> So If we don't use the fpr_t.  we should:
> 1 Update LSX low 64 bits after floating point instruction translation;
> 2 Update floating-point registers after LSX instruction translation.
> 
> Should we do this  or have I misunderstood?

You should use fpr_t, you should not use cpu_fpr[].
This is the same as aarch64, for instance.

A related question though: does the manual mention whether the fpu instructions only 
modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?


>> I strongly suggest that you introduce wrappers to load/store fpr values from their env 
>> slots.  I would name them similarly to gpr_{src,dst}, gen_set_gpr.
>>
> Got it.


r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-28 17:30       ` Richard Henderson
@ 2022-12-29  1:51         ` gaosong
  2022-12-29  3:13           ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2022-12-29  1:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2022/12/29 上午1:30, Richard Henderson 写道:
> On 12/27/22 18:34, gaosong wrote:
>> The manual says "The lower 64 bits of each vector register overlap 
>> with the floating point register of the same number.  In other words
>> When the basic floating-point instruction is executed to update the 
>> floating-point register, the low 64 bits of the corresponding LSX 
>> register
>> are also updated to the same value."
>>
>> So If we don't use the fpr_t.  we should:
>> 1 Update LSX low 64 bits after floating point instruction translation;
>> 2 Update floating-point registers after LSX instruction translation.
>>
>> Should we do this  or have I misunderstood?
>
> You should use fpr_t, you should not use cpu_fpr[].
> This is the same as aarch64, for instance.
>
> A related question though: does the manual mention whether the fpu 
> instructions only modify the lower 64 bits, or do the high 64-bits 
> become zeroed, nanboxed, or unspecified?
>
>
Only modify the lower 64bits,   the high 64-bits is unpecified.

Thanks.
Song Gao
>>> I strongly suggest that you introduce wrappers to load/store fpr 
>>> values from their env slots.  I would name them similarly to 
>>> gpr_{src,dst}, gen_set_gpr.
>>>
>> Got it.
>
>
> r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-29  1:51         ` gaosong
@ 2022-12-29  3:13           ` Richard Henderson
  2022-12-29  3:54             ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2022-12-29  3:13 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 12/28/22 17:51, gaosong wrote:
>> A related question though: does the manual mention whether the fpu instructions only 
>> modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?
>>
>>
> Only modify the lower 64bits,   the high 64-bits is unpecified.

These two options are mutually exclusive.  If upper 64 bits unmodified, then they *are* 
specified to be the previous contents.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-29  3:13           ` Richard Henderson
@ 2022-12-29  3:54             ` gaosong
  0 siblings, 0 replies; 100+ messages in thread
From: gaosong @ 2022-12-29  3:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2022/12/29 上午11:13, Richard Henderson 写道:
> On 12/28/22 17:51, gaosong wrote:
>>> A related question though: does the manual mention whether the fpu 
>>> instructions only modify the lower 64 bits, or do the high 64-bits 
>>> become zeroed, nanboxed, or unspecified?
>>>
>>>
>> Only modify the lower 64bits,   the high 64-bits is unpecified.
>
> These two options are mutually exclusive.  If upper 64 bits 
> unmodified, then they *are* specified to be the previous contents.
>
My description is not correct.
'The fpu instruction will modify the low 64 bits, but the high 64 bits 
are unspecified and their values are "unpredictable" '.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2022-12-24 17:32   ` Richard Henderson
@ 2023-02-13  8:24     ` gaosong
  2023-02-13 19:18       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-13  8:24 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi,  Richard

在 2022/12/25 上午1:32, Richard Henderson 写道:
> On 12/24/22 00:15, Song Gao wrote:
>> +union vec_t {
>> +    int8_t   B[LSX_LEN / 8];
>> +    int16_t  H[LSX_LEN / 16];
>> +    int32_t  W[LSX_LEN / 32];
>> +    int64_t  D[LSX_LEN / 64];
>> +    __int128 Q[LSX_LEN / 128];
>
> Oh, you can't use __int128 directly.
> It won't compile on 32-bit hosts.
>
>
Can we  use Int128  after include "qem/int128.h" ?
So,   some  vxx_q  instructions  can  use   int128_ xx(a, b).

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
  2023-02-13  8:24     ` gaosong
@ 2023-02-13 19:18       ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2023-02-13 19:18 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/12/23 22:24, gaosong wrote:
> Hi,  Richard
> 
> 在 2022/12/25 上午1:32, Richard Henderson 写道:
>> On 12/24/22 00:15, Song Gao wrote:
>>> +union vec_t {
>>> +    int8_t   B[LSX_LEN / 8];
>>> +    int16_t  H[LSX_LEN / 16];
>>> +    int32_t  W[LSX_LEN / 32];
>>> +    int64_t  D[LSX_LEN / 64];
>>> +    __int128 Q[LSX_LEN / 128];
>>
>> Oh, you can't use __int128 directly.
>> It won't compile on 32-bit hosts.
>>
>>
> Can we  use Int128  after include "qem/int128.h" ?
> So,   some  vxx_q  instructions  can  use   int128_ xx(a, b).

Yes, certainly.

r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2022-12-24 17:48   ` Richard Henderson
@ 2023-02-20  7:47     ` gaosong
  2023-02-20 17:21       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-20  7:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

在 2022/12/25 上午1:48, Richard Henderson 写道:
> On 12/24/22 00:16, Song Gao wrote:
>> +TRANS(vaddwev_h_b, gen_vvv, gen_helper_vaddwev_h_b)
>> +TRANS(vaddwev_w_h, gen_vvv, gen_helper_vaddwev_w_h)
>> +TRANS(vaddwev_d_w, gen_vvv, gen_helper_vaddwev_d_w)
>> +TRANS(vaddwev_q_d, gen_vvv, gen_helper_vaddwev_q_d)
>> +TRANS(vaddwod_h_b, gen_vvv, gen_helper_vaddwod_h_b)
>> +TRANS(vaddwod_w_h, gen_vvv, gen_helper_vaddwod_w_h)
>> +TRANS(vaddwod_d_w, gen_vvv, gen_helper_vaddwod_d_w)
>> +TRANS(vaddwod_q_d, gen_vvv, gen_helper_vaddwod_q_d)
>> +TRANS(vsubwev_h_b, gen_vvv, gen_helper_vsubwev_h_b)
>> +TRANS(vsubwev_w_h, gen_vvv, gen_helper_vsubwev_w_h)
>> +TRANS(vsubwev_d_w, gen_vvv, gen_helper_vsubwev_d_w)
>> +TRANS(vsubwev_q_d, gen_vvv, gen_helper_vsubwev_q_d)
>> +TRANS(vsubwod_h_b, gen_vvv, gen_helper_vsubwod_h_b)
>> +TRANS(vsubwod_w_h, gen_vvv, gen_helper_vsubwod_w_h)
>> +TRANS(vsubwod_d_w, gen_vvv, gen_helper_vsubwod_d_w)
>> +TRANS(vsubwod_q_d, gen_vvv, gen_helper_vsubwod_q_d)
>
> These can be implemented with a combination of vector shift + vector add.
>
>> +TRANS(vaddwev_h_bu, gen_vvv, gen_helper_vaddwev_h_bu)
>> +TRANS(vaddwev_w_hu, gen_vvv, gen_helper_vaddwev_w_hu)
>> +TRANS(vaddwev_d_wu, gen_vvv, gen_helper_vaddwev_d_wu)
>> +TRANS(vaddwev_q_du, gen_vvv, gen_helper_vaddwev_q_du)
>> +TRANS(vaddwod_h_bu, gen_vvv, gen_helper_vaddwod_h_bu)
>> +TRANS(vaddwod_w_hu, gen_vvv, gen_helper_vaddwod_w_hu)
>> +TRANS(vaddwod_d_wu, gen_vvv, gen_helper_vaddwod_d_wu)
>> +TRANS(vaddwod_q_du, gen_vvv, gen_helper_vaddwod_q_du)
>> +TRANS(vsubwev_h_bu, gen_vvv, gen_helper_vsubwev_h_bu)
>> +TRANS(vsubwev_w_hu, gen_vvv, gen_helper_vsubwev_w_hu)
>> +TRANS(vsubwev_d_wu, gen_vvv, gen_helper_vsubwev_d_wu)
>> +TRANS(vsubwev_q_du, gen_vvv, gen_helper_vsubwev_q_du)
>> +TRANS(vsubwod_h_bu, gen_vvv, gen_helper_vsubwod_h_bu)
>> +TRANS(vsubwod_w_hu, gen_vvv, gen_helper_vsubwod_w_hu)
>> +TRANS(vsubwod_d_wu, gen_vvv, gen_helper_vsubwod_d_wu)
>> +TRANS(vsubwod_q_du, gen_vvv, gen_helper_vsubwod_q_du)
>
> These can be implemented with a combination of vector and + vector add.
>
>> +TRANS(vaddwev_h_bu_b, gen_vvv, gen_helper_vaddwev_h_bu_b)
>> +TRANS(vaddwev_w_hu_h, gen_vvv, gen_helper_vaddwev_w_hu_h)
>> +TRANS(vaddwev_d_wu_w, gen_vvv, gen_helper_vaddwev_d_wu_w)
>> +TRANS(vaddwev_q_du_d, gen_vvv, gen_helper_vaddwev_q_du_d)
>> +TRANS(vaddwod_h_bu_b, gen_vvv, gen_helper_vaddwod_h_bu_b)
>> +TRANS(vaddwod_w_hu_h, gen_vvv, gen_helper_vaddwod_w_hu_h)
>> +TRANS(vaddwod_d_wu_w, gen_vvv, gen_helper_vaddwod_d_wu_w)
>> +TRANS(vaddwod_q_du_d, gen_vvv, gen_helper_vaddwod_q_du_d)
>
> Likewise.
>
> For an example of how to bundle vector operations, see e.g. 
> gen_gvec_rax1 and subroutines in target/arm/translate-a64.c. There are 
> many others, but ask if you need more help.
>
I have some questions:
1 Should we need implement  GVecGen*  for simple gvec instructiosn?
     such as add, sub , or , xor..
2 Should we need implement all fni8/fni4, fniv,  fno?

Thanks
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-20  7:47     ` gaosong
@ 2023-02-20 17:21       ` Richard Henderson
  2023-02-23  8:23         ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-02-20 17:21 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/19/23 21:47, gaosong wrote:
> I have some questions:
> 1 Should we need implement  GVecGen*  for simple gvec instructiosn?
>      such as add, sub , or , xor..

No, these are done generically.

> 2 Should we need implement all fni8/fni4, fniv,  fno?

You need not implement them all.  Generally you will only implement fni4 for 32-bit 
arithmetic operations, and only fni8 for logical operations; there is rarely a cause for 
both with the same operation.

You can rely on the generic cutoff of 4 integer inline operations -- easy for your maximum 
vector length of 128-bits -- to avoid implementing fno.

But in extreme, you can implement only fno.  You can choose this over directly calling a 
helper function, minimizing differences in the translator code paths and letting generic 
code build all of the pointers.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-20 17:21       ` Richard Henderson
@ 2023-02-23  8:23         ` gaosong
  2023-02-23 15:22           ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-23  8:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

在 2023/2/21 上午1:21, Richard Henderson 写道:
> On 2/19/23 21:47, gaosong wrote:
>> I have some questions:
>> 1 Should we need implement  GVecGen*  for simple gvec instructiosn?
>>      such as add, sub , or , xor..
>
> No, these are done generically.
>
>> 2 Should we need implement all fni8/fni4, fniv,  fno?
>
> You need not implement them all.  Generally you will only implement 
> fni4 for 32-bit arithmetic operations, and only fni8 for logical 
> operations; there is rarely a cause for both with the same operation.
>
> You can rely on the generic cutoff of 4 integer inline operations -- 
> easy for your maximum vector length of 128-bits -- to avoid 
> implementing fno.
>
> But in extreme, you can implement only fno.  You can choose this over 
> directly calling a helper function, minimizing differences in the 
> translator code paths and letting generic code build all of the pointers.
>
Sorry for the late reply,  and Thanks for you answers.

But I still need more help.

How gvec singed or unsigned extensions of vector elements?
I found no gvec function that implements signed and unsigned extensions 
of vector elements.
However, the result of some instructions requires the elements to be 
signed or unsigned extensions.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-23  8:23         ` gaosong
@ 2023-02-23 15:22           ` Richard Henderson
  2023-02-24  7:24             ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-02-23 15:22 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/22/23 22:23, gaosong wrote:
> Hi, Richard
> 
> 在 2023/2/21 上午1:21, Richard Henderson 写道:
>> On 2/19/23 21:47, gaosong wrote:
>>> I have some questions:
>>> 1 Should we need implement  GVecGen*  for simple gvec instructiosn?
>>>      such as add, sub , or , xor..
>>
>> No, these are done generically.
>>
>>> 2 Should we need implement all fni8/fni4, fniv,  fno?
>>
>> You need not implement them all.  Generally you will only implement fni4 for 32-bit 
>> arithmetic operations, and only fni8 for logical operations; there is rarely a cause for 
>> both with the same operation.
>>
>> You can rely on the generic cutoff of 4 integer inline operations -- easy for your 
>> maximum vector length of 128-bits -- to avoid implementing fno.
>>
>> But in extreme, you can implement only fno.  You can choose this over directly calling a 
>> helper function, minimizing differences in the translator code paths and letting generic 
>> code build all of the pointers.
>>
> Sorry for the late reply,  and Thanks for you answers.
> 
> But I still need more help.
> 
> How gvec singed or unsigned extensions of vector elements?

There are no generic sign-extending; that turns out to be widely variable across the 
different hosts and guest architectures.

If your architecture widens the even elements, you can implement extensions as a pair of 
shifts in the wider element size.  E.g. sign-extend is shl + sar.

> I found no gvec function that implements signed and unsigned extensions of vector elements.
> However, the result of some instructions requires the elements to be signed or unsigned 
> extensions.

You may need to implement these operations with fni[48] or out of line in a helper.
It's hard to give advice without a specific example.


r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-23 15:22           ` Richard Henderson
@ 2023-02-24  7:24             ` gaosong
  2023-02-24 19:24               ` Richard Henderson
  2023-02-24 23:01               ` Richard Henderson
  0 siblings, 2 replies; 100+ messages in thread
From: gaosong @ 2023-02-24  7:24 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/2/23 下午11:22, Richard Henderson 写道:
> On 2/22/23 22:23, gaosong wrote:
>> Hi, Richard
>>
>> 在 2023/2/21 上午1:21, Richard Henderson 写道:
>>> On 2/19/23 21:47, gaosong wrote:
>>>> I have some questions:
>>>> 1 Should we need implement  GVecGen*  for simple gvec instructiosn?
>>>>      such as add, sub , or , xor..
>>>
>>> No, these are done generically.
>>>
>>>> 2 Should we need implement all fni8/fni4, fniv,  fno?
>>>
>>> You need not implement them all.  Generally you will only implement 
>>> fni4 for 32-bit arithmetic operations, and only fni8 for logical 
>>> operations; there is rarely a cause for both with the same operation.
>>>
>>> You can rely on the generic cutoff of 4 integer inline operations -- 
>>> easy for your maximum vector length of 128-bits -- to avoid 
>>> implementing fno.
>>>
>>> But in extreme, you can implement only fno.  You can choose this 
>>> over directly calling a helper function, minimizing differences in 
>>> the translator code paths and letting generic code build all of the 
>>> pointers.
>>>
>> Sorry for the late reply,  and Thanks for you answers.
>>
>> But I still need more help.
>>
>> How gvec singed or unsigned extensions of vector elements?
>
> There are no generic sign-extending; that turns out to be widely 
> variable across the different hosts and guest architectures.
>
> If your architecture widens the even elements, you can implement 
> extensions as a pair of shifts in the wider element size.  E.g. 
> sign-extend is shl + sar.
>
>> I found no gvec function that implements signed and unsigned 
>> extensions of vector elements.
>> However, the result of some instructions requires the elements to be 
>> signed or unsigned extensions.
>
> You may need to implement these operations with fni[48] or out of line 
> in a helper.
> It's hard to give advice without a specific example. 
I was wrong, the instruction is to sign-extend the odd or even elements 
of the vector before the operation, not to sign-extend the result.
E.g
vaddwev_h_b  vd, vj, vk
vd->H[i] = SignExtend(vj->B[2i])  + SignExtend(vk->B[2i]);
vaddwev_w_h  vd, vj, vk
vd->W[i] = SignExtend(vj->H[2i])  + SignExtend(vk->H[2i]);
vaddwev_d_w  vd, vj, vk
vd->Q[i] = SignExtend(vj->W[2i])  + SignExtend(vk->W[2i]);
vaddwev_q_d  vd, vj, vk
vd->Q[i] = SignExtend(vj->D[2i])  + SignExtend(vk->D[2i]);


Use  shl + sar  to sign-extend  vj/vk  even element.

static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                      void (*func)(unsigned, uint32_t, uint32_t,
                                   uint32_t, uint32_t, uint32_t))
{
     uint32_t vd_ofs, vj_ofs, vk_ofs;

     CHECK_SXE;

     vd_ofs = vreg_full_offset(a->vd);
     vj_ofs = vreg_full_offset(a->vj);
     vk_ofs = vreg_full_offset(a->vk);

     func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
     return true;
}
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
     TCGv_vec t1 = tcg_temp_new_vec_matching(a);
     TCGv_vec t2 = tcg_temp_new_vec_matching(b);

     int halfbits  =  4 << vece;

     /* Sign-extend even elements from a */
     tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
     tcg_gen_and_vec(vece, a, a, t1);
     tcg_gen_shli_vec(vece, a, a, halfbits);
     tcg_gen_sari_vec(vece, a, a, halfbits);

     /* Sign-extend even elements from b */
     tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
     tcg_gen_and_vec(vece, b, b, t2);
     tcg_gen_shli_vec(vece, b, b, halfbits);
     tcg_gen_sari_vec(vece,  b, b, halfbits);

     tcg_gen_add_vec(vece, t, a, b);

     tcg_temp_free_vec(t1);
     tcg_temp_free_vec(t2);
}

static void gvec_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
                            uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
{
     static const TCGOpcode vecop_list[] = {
         INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_add_vec, 
INDEX_op_sari_vec, 0
         };
     static const GVecGen3 op[4] = {
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_h_b,
             .opt_opc = vecop_list,
             .vece = MO_16
         },
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_w_h,
             .opt_opc = vecop_list,
             .vece = MO_32
         },
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_d_w,
             .opt_opc = vecop_list,
             .vece = MO_64
         },
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_q_d,
             .opt_opc = vecop_list,
             .vece = MO_128
         },
     };

     tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}

TRANS(vaddwev_h_b, gvec_vvv, MO_8,  gvec_vaddwev_s)
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
TRANS(vaddwev_d_w, gvec_vvv, MO_32, gvec_vaddwev_s)
TRANS(vaddwev_q_d, gvec_vvv, MO_64, gvec_vaddwev_s)

and I also implement  gen_helper_vaddwev_x_x.     Is this example correct?

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-24  7:24             ` gaosong
@ 2023-02-24 19:24               ` Richard Henderson
  2023-02-27  9:14                 ` gaosong
  2023-02-27 12:55                 ` gaosong
  2023-02-24 23:01               ` Richard Henderson
  1 sibling, 2 replies; 100+ messages in thread
From: Richard Henderson @ 2023-02-24 19:24 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/23/23 21:24, gaosong wrote:
> I was wrong, the instruction is to sign-extend the odd or even elements of the vector 
> before the operation, not to sign-extend the result.
> E.g
> vaddwev_h_b  vd, vj, vk
> vd->H[i] = SignExtend(vj->B[2i])  + SignExtend(vk->B[2i]);
> vaddwev_w_h  vd, vj, vk
> vd->W[i] = SignExtend(vj->H[2i])  + SignExtend(vk->H[2i]);
> vaddwev_d_w  vd, vj, vk
> vd->Q[i] = SignExtend(vj->W[2i])  + SignExtend(vk->W[2i]);
> vaddwev_q_d  vd, vj, vk
> vd->Q[i] = SignExtend(vj->D[2i])  + SignExtend(vk->D[2i]);

Ok, good example.

> static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
> {
>      TCGv_vec t1 = tcg_temp_new_vec_matching(a);
>      TCGv_vec t2 = tcg_temp_new_vec_matching(b);
> 
>      int halfbits  =  4 << vece;
> 
>      /* Sign-extend even elements from a */
>      tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
>      tcg_gen_and_vec(vece, a, a, t1);

No need to mask off these bits...

>      tcg_gen_shli_vec(vece, a, a, halfbits);

... because they shift out here anyway.

>      tcg_gen_sari_vec(vece, a, a, halfbits);
> 
>      /* Sign-extend even elements from b */
>      tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
>      tcg_gen_and_vec(vece, b, b, t2);
>      tcg_gen_shli_vec(vece, b, b, halfbits);
>      tcg_gen_sari_vec(vece,  b, b, halfbits);
> 
>      tcg_gen_add_vec(vece, t, a, b);
> 
>      tcg_temp_free_vec(t1);
>      tcg_temp_free_vec(t2);
> }

Otherwise this looks good.

>          {
>              .fniv = gen_vaddwev_s,
>              .fno = gen_helper_vaddwev_q_d,
>              .opt_opc = vecop_list,
>              .vece = MO_128
>          },

There are no 128-bit vector operations; you'll need to do this one differently.

Presumably just load the two 64-bit elements, sign-extend into 128-bits, add with 
tcg_gen_add2_i64, and store the two 64-bit elements as output.  But that won't fit into 
the tcg_gen_gvec_3 interface.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-24  7:24             ` gaosong
  2023-02-24 19:24               ` Richard Henderson
@ 2023-02-24 23:01               ` Richard Henderson
  2023-02-28  7:44                 ` gaosong
  1 sibling, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-02-24 23:01 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/23/23 21:24, gaosong wrote:
>          {
>              .fniv = gen_vaddwev_s,
>              .fno = gen_helper_vaddwev_w_h,
>              .opt_opc = vecop_list,
>              .vece = MO_32
>          },
>          {
>              .fniv = gen_vaddwev_s,
>              .fno = gen_helper_vaddwev_d_w,
>              .opt_opc = vecop_list,
>              .vece = MO_64
>          },

Oh, these two can also include .fni4 and .fni8 integer versions, respectively, for hosts 
without the proper vector support.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-24 19:24               ` Richard Henderson
@ 2023-02-27  9:14                 ` gaosong
  2023-02-27  9:20                   ` Richard Henderson
  2023-02-27 12:55                 ` gaosong
  1 sibling, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-27  9:14 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/2/25 上午3:24, Richard Henderson 写道:
> On 2/23/23 21:24, gaosong wrote:
>> I was wrong, the instruction is to sign-extend the odd or even 
>> elements of the vector before the operation, not to sign-extend the 
>> result.
>> E.g
>> vaddwev_h_b  vd, vj, vk
>> vd->H[i] = SignExtend(vj->B[2i])  + SignExtend(vk->B[2i]);
>> vaddwev_w_h  vd, vj, vk
>> vd->W[i] = SignExtend(vj->H[2i])  + SignExtend(vk->H[2i]);
>> vaddwev_d_w  vd, vj, vk
>> vd->Q[i] = SignExtend(vj->W[2i])  + SignExtend(vk->W[2i]);
>> vaddwev_q_d  vd, vj, vk
>> vd->Q[i] = SignExtend(vj->D[2i])  + SignExtend(vk->D[2i]);
>
> Ok, good example.
>
Sorry ,  My description is not comprehensive.

vaddwedv_w_h  vd, vj, vk

...

for i in range(4):
     vd->W[i] = SignExtend(vj->H[2i], 32)  + SignExtend(vk->H[2i]. 32);

...

>> static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, 
>> TCGv_vec b)
>> {
>>      TCGv_vec t1 = tcg_temp_new_vec_matching(a);
>>      TCGv_vec t2 = tcg_temp_new_vec_matching(b);
>>
>>      int halfbits  =  4 << vece;
>>
>>      /* Sign-extend even elements from a */
>>      tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
>>      tcg_gen_and_vec(vece, a, a, t1);
>
> No need to mask off these bits...
>
I am not sure.  but the result is not correct.   It's  weird.


like this:
the vece is MO_32.
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
     TCGv_vec t1 = tcg_temp_new_vec_matching(a);
     TCGv_vec t2 = tcg_temp_new_vec_matching(b);
     int halfbits = 4 << vece;
     tcg_gen_shli_vec(vece, t1, a, halfbits);
     tcg_gen_shri_vec(vece, t1, t1, halfbits);

     tcg_gen_shli_vec(vece, t2, b,  halfbits);
     tcg_gen_shri_vec(vece, t2, t2, halfbits);

     tcg_gen_add_vec(vece, t, t1, t2);

     tcg_temp_free_vec(t1);
     tcg_temp_free_vec(t2);
}
...
        op[MO_16];
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_w_h,
             .opt_opc = vecop_list,
             .vece = MO_32
         },
...
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)

input :       0x ffff     fffe ffff     fffe   ffff    fffe ffff fffe  + 0
output :    0x 0000 fffe 0000 fffe  0000 fffe 0000 fffe
the crroect is  0xffffffffefffffffefffffffefffffffe.

Thanks.
Song Gao
>>      tcg_gen_shli_vec(vece, a, a, halfbits);
>
> ... because they shift out here anyway.
>
>>      tcg_gen_sari_vec(vece, a, a, halfbits);
>>
>>      /* Sign-extend even elements from b */
>>      tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
>>      tcg_gen_and_vec(vece, b, b, t2);
>>      tcg_gen_shli_vec(vece, b, b, halfbits);
>>      tcg_gen_sari_vec(vece,  b, b, halfbits);
>>
>>      tcg_gen_add_vec(vece, t, a, b);
>>
>>      tcg_temp_free_vec(t1);
>>      tcg_temp_free_vec(t2);
>> }
>
> Otherwise this looks good.



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-27  9:14                 ` gaosong
@ 2023-02-27  9:20                   ` Richard Henderson
  2023-02-27 12:54                     ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-02-27  9:20 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/26/23 23:14, gaosong wrote:
> like this:
> the vece is MO_32.
> static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
> {
>      TCGv_vec t1 = tcg_temp_new_vec_matching(a);
>      TCGv_vec t2 = tcg_temp_new_vec_matching(b);
>      int halfbits = 4 << vece;
>      tcg_gen_shli_vec(vece, t1, a, halfbits);
>      tcg_gen_shri_vec(vece, t1, t1, halfbits);
> 
>      tcg_gen_shli_vec(vece, t2, b,  halfbits);
>      tcg_gen_shri_vec(vece, t2, t2, halfbits);
> 
>      tcg_gen_add_vec(vece, t, t1, t2);
> 
>      tcg_temp_free_vec(t1);
>      tcg_temp_free_vec(t2);
> }
> ...
>         op[MO_16];
>          {
>              .fniv = gen_vaddwev_s,
>              .fno = gen_helper_vaddwev_w_h,
>              .opt_opc = vecop_list,
>              .vece = MO_32
>          },
> ...
> TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
> 
> input :     0x ffff fffe ffff fffe  ffff fffe ffff fffe  + 0
> output :    0x 0000 fffe 0000 fffe  0000 fffe 0000 fffe
> correct is  0xffffffffefffffffefffffffe       ffff fffe.

sari above, not shri, for sign-extension.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-27  9:20                   ` Richard Henderson
@ 2023-02-27 12:54                     ` gaosong
  2023-02-27 18:32                       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-27 12:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/2/27 下午5:20, Richard Henderson 写道:
> On 2/26/23 23:14, gaosong wrote:
>> like this:
>> the vece is MO_32.
>> static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, 
>> TCGv_vec b)
>> {
>>      TCGv_vec t1 = tcg_temp_new_vec_matching(a);
>>      TCGv_vec t2 = tcg_temp_new_vec_matching(b);
>>      int halfbits = 4 << vece;
>>      tcg_gen_shli_vec(vece, t1, a, halfbits);
>>      tcg_gen_shri_vec(vece, t1, t1, halfbits);
>>
>>      tcg_gen_shli_vec(vece, t2, b,  halfbits);
>>      tcg_gen_shri_vec(vece, t2, t2, halfbits);
>>
>>      tcg_gen_add_vec(vece, t, t1, t2);
>>
>>      tcg_temp_free_vec(t1);
>>      tcg_temp_free_vec(t2);
>> }
>> ...
>>         op[MO_16];
>>          {
>>              .fniv = gen_vaddwev_s,
>>              .fno = gen_helper_vaddwev_w_h,
>>              .opt_opc = vecop_list,
>>              .vece = MO_32
>>          },
>> ...
>> TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
>>
>> input :     0x ffff fffe ffff fffe  ffff fffe ffff fffe  + 0
>> output :    0x 0000 fffe 0000 fffe  0000 fffe 0000 fffe
>> correct is  0xffffffffefffffffefffffffe       ffff fffe.
>
> sari above, not shri, for sign-extension.
>
>
Got it.

and how to  sign-extend  the odd  element  of vector?


Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-24 19:24               ` Richard Henderson
  2023-02-27  9:14                 ` gaosong
@ 2023-02-27 12:55                 ` gaosong
  2023-02-27 18:40                   ` Richard Henderson
  1 sibling, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-27 12:55 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/2/25 上午3:24, Richard Henderson 写道:
>>          {
>>              .fniv = gen_vaddwev_s,
>>              .fno = gen_helper_vaddwev_q_d,
>>              .opt_opc = vecop_list,
>>              .vece = MO_128
>>          },
>
> There are no 128-bit vector operations; you'll need to do this one 
> differently.
>
> Presumably just load the two 64-bit elements, sign-extend into 
> 128-bits, add with tcg_gen_add2_i64, and store the two 64-bit elements 
> as output.  But that won't fit into the tcg_gen_gvec_3 interface.
>
'sign-extend into 128-bits,'   Could you give a example?

I see a example at target/ppc/translate/vmx-impl.c.inc
     static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned 
vece)
     {
             ...
             {
             .fno = gen_helper_VPRTYBQ,
             .vece = MO_128
             },
             tcg_gen_gvec_2(avr_full_offset(a->vrt), 
avr_full_offset(a->vrb),
                                16, 16, &op[vece - MO_32]);
         return true;
     }
TRANS(VPRTYBQ, do_vx_vprtyb, MO_128)
...

do_vx_vprtyb  fit the fno into the tcg_gen_gvec_2.
I am not sure this  example is right.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-27 12:54                     ` gaosong
@ 2023-02-27 18:32                       ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2023-02-27 18:32 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/27/23 02:54, gaosong wrote:
> 
> 在 2023/2/27 下午5:20, Richard Henderson 写道:
>> On 2/26/23 23:14, gaosong wrote:
>>> like this:
>>> the vece is MO_32.
>>> static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
>>> {
>>>      TCGv_vec t1 = tcg_temp_new_vec_matching(a);
>>>      TCGv_vec t2 = tcg_temp_new_vec_matching(b);
>>>      int halfbits = 4 << vece;
>>>      tcg_gen_shli_vec(vece, t1, a, halfbits);
>>>      tcg_gen_shri_vec(vece, t1, t1, halfbits);
>>>
>>>      tcg_gen_shli_vec(vece, t2, b,  halfbits);
>>>      tcg_gen_shri_vec(vece, t2, t2, halfbits);
>>>
>>>      tcg_gen_add_vec(vece, t, t1, t2);
>>>
>>>      tcg_temp_free_vec(t1);
>>>      tcg_temp_free_vec(t2);
>>> }
>>> ...
>>>         op[MO_16];
>>>          {
>>>              .fniv = gen_vaddwev_s,
>>>              .fno = gen_helper_vaddwev_w_h,
>>>              .opt_opc = vecop_list,
>>>              .vece = MO_32
>>>          },
>>> ...
>>> TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
>>>
>>> input :     0x ffff fffe ffff fffe  ffff fffe ffff fffe  + 0
>>> output :    0x 0000 fffe 0000 fffe  0000 fffe 0000 fffe
>>> correct is  0xffffffffefffffffefffffffe       ffff fffe.
>>
>> sari above, not shri, for sign-extension.
>>
>>
> Got it.
> 
> and how to  sign-extend  the odd  element  of vector?

For the odd elements, you omit the shli, because the odd element is already at the most 
significant end of the wider element.


r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-27 12:55                 ` gaosong
@ 2023-02-27 18:40                   ` Richard Henderson
  2023-02-28  3:30                     ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-02-27 18:40 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/27/23 02:55, gaosong wrote:
> 
> 在 2023/2/25 上午3:24, Richard Henderson 写道:
>>>          {
>>>              .fniv = gen_vaddwev_s,
>>>              .fno = gen_helper_vaddwev_q_d,
>>>              .opt_opc = vecop_list,
>>>              .vece = MO_128
>>>          },
>>
>> There are no 128-bit vector operations; you'll need to do this one differently.
>>
>> Presumably just load the two 64-bit elements, sign-extend into 128-bits, add with 
>> tcg_gen_add2_i64, and store the two 64-bit elements as output.  But that won't fit into 
>> the tcg_gen_gvec_3 interface.
>>
> 'sign-extend into 128-bits,'   Could you give a example?

Well, for vadd, as the example we have been using:

     tcg_gen_ld_i64(lo1, cpu_env, offsetof(vector_reg[A].lo));
     tcg_gen_ld_i64(lo2, cpu_env, offsetof(vector_reg[B].lo));
     tcg_gen_sari_i64(hi1, lo1, 63);
     tcg_gen_sari_i64(hi2, lo2, 63);
     tcg_gen_add2_i64(lo1, hi1, lo1, hi1, lo2, hi2);
     tcg_gen_st_i64(lo1, cpu_env, offsetof(vector_reg[R].lo));
     tcg_gen_st_i64(hi1, cpu_env, offsetof(vector_reg[R].hi));

The middle two sari operations replicate the sign bit across the entire high word, so the 
pair of variables constitute a sign-extended 128-bit value.

> I see a example at target/ppc/translate/vmx-impl.c.inc
>      static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
>      {
>              ...
>              {
>              .fno = gen_helper_VPRTYBQ,
>              .vece = MO_128
>              },
>              tcg_gen_gvec_2(avr_full_offset(a->vrt), avr_full_offset(a->vrb),
>                                 16, 16, &op[vece - MO_32]);
>          return true;
>      }
> TRANS(VPRTYBQ, do_vx_vprtyb, MO_128)
> ...
> 
> do_vx_vprtyb  fit the fno into the tcg_gen_gvec_2.
> I am not sure this  example is right.

Ah, well.  When .fno is the only callback, the implementation is entirely out-of-line, and 
the .vece member is not used.  I see that is confusing.


r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-27 18:40                   ` Richard Henderson
@ 2023-02-28  3:30                     ` gaosong
  2023-02-28 16:48                       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-28  3:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/2/28 上午2:40, Richard Henderson 写道:
> On 2/27/23 02:55, gaosong wrote:
>>
>> 在 2023/2/25 上午3:24, Richard Henderson 写道:
>>>>          {
>>>>              .fniv = gen_vaddwev_s,
>>>>              .fno = gen_helper_vaddwev_q_d,
>>>>              .opt_opc = vecop_list,
>>>>              .vece = MO_128
>>>>          },
>>>
>>> There are no 128-bit vector operations; you'll need to do this one 
>>> differently.
>>>
>>> Presumably just load the two 64-bit elements, sign-extend into 
>>> 128-bits, add with tcg_gen_add2_i64, and store the two 64-bit 
>>> elements as output.  But that won't fit into the tcg_gen_gvec_3 
>>> interface.
>>>
>> 'sign-extend into 128-bits,'   Could you give a example?
>
> Well, for vadd, as the example we have been using:
>
>     tcg_gen_ld_i64(lo1, cpu_env, offsetof(vector_reg[A].lo));
>     tcg_gen_ld_i64(lo2, cpu_env, offsetof(vector_reg[B].lo));
>     tcg_gen_sari_i64(hi1, lo1, 63);
>     tcg_gen_sari_i64(hi2, lo2, 63);
>     tcg_gen_add2_i64(lo1, hi1, lo1, hi1, lo2, hi2);
>     tcg_gen_st_i64(lo1, cpu_env, offsetof(vector_reg[R].lo));
>     tcg_gen_st_i64(hi1, cpu_env, offsetof(vector_reg[R].hi));
>
> The middle two sari operations replicate the sign bit across the 
> entire high word, so the pair of variables constitute a sign-extended 
> 128-bit value.
>
Thank you .

This is a way  to translate:

static trans_vaddwev_q_d( DisasContext *ctx, arg_vvv *a)
{
     ...
     tcg_gen_ld_i64(lo1, cpu_env, offsetof(vector_reg[A].lo));
     tcg_gen_ld_i64(lo2, cpu_env, offsetof(vector_reg[B].lo));
     tcg_gen_sari_i64(hi1, lo1, 63);
     tcg_gen_sari_i64(hi2, lo2, 63);
     tcg_gen_add2_i64(lo1, hi1, lo1, hi1, lo2, hi2);
     tcg_gen_st_i64(lo1, cpu_env, offsetof(vector_reg[R].lo));
     tcg_gen_st_i64(hi1, cpu_env, offsetof(vector_reg[R].hi));
     ...
}
>> I see a example at target/ppc/translate/vmx-impl.c.inc
>>      static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, 
>> unsigned vece)
>>      {
>>              ...
>>              {
>>              .fno = gen_helper_VPRTYBQ,
>>              .vece = MO_128
>>              },
>>              tcg_gen_gvec_2(avr_full_offset(a->vrt), 
>> avr_full_offset(a->vrb),
>>                                 16, 16, &op[vece - MO_32]);
>>          return true;
>>      }
>> TRANS(VPRTYBQ, do_vx_vprtyb, MO_128)
>> ...
>>
>> do_vx_vprtyb  fit the fno into the tcg_gen_gvec_2.
>> I am not sure this  example is right.
>
> Ah, well.  When .fno is the only callback, the implementation is 
> entirely out-of-line, and the .vece member is not used.  I see that is 
> confusing.
>
and This is another way to translate:
     ...
          {
              .fno = gen_helper_vaddwev_q_d,
              .vece = MO_128
          },
     ...
     void HELPER(vaddwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
     {
         VReg *Vd = (VReg *)vd;
         VReg *Vj = (VReg *)vj;
         VReg *Vk = (VReg *)vk;

         Vd->Q(0) = int128_add((Int128)Vj->D(0), (Int128)Vk->D(0));
     }

These ways are can be chosen?

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-24 23:01               ` Richard Henderson
@ 2023-02-28  7:44                 ` gaosong
  2023-02-28 16:50                   ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-02-28  7:44 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

在 2023/2/25 上午7:01, Richard Henderson 写道:
> On 2/23/23 21:24, gaosong wrote:
>>          {
>>              .fniv = gen_vaddwev_s,
>>              .fno = gen_helper_vaddwev_w_h,
>>              .opt_opc = vecop_list,
>>              .vece = MO_32
>>          },
>>          {
>>              .fniv = gen_vaddwev_s,
>>              .fno = gen_helper_vaddwev_d_w,
>>              .opt_opc = vecop_list,
>>              .vece = MO_64
>>          },
>
> Oh, these two can also include .fni4 and .fni8 integer versions, 
> respectively, for hosts without the proper vector support.
>
OK,

but fno is not for hosts without the proper vector support?

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-28  3:30                     ` gaosong
@ 2023-02-28 16:48                       ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2023-02-28 16:48 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/27/23 17:30, gaosong wrote:
>          Vd->Q(0) = int128_add((Int128)Vj->D(0), (Int128)Vk->D(0));

You cannot cast like this.
You must use int128_make{64,s64}.

> These ways are can be chosen? 

Yes, out-of-line is a valid choice.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
  2023-02-28  7:44                 ` gaosong
@ 2023-02-28 16:50                   ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2023-02-28 16:50 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 2/27/23 21:44, gaosong wrote:
> Hi, Richard
> 
> 在 2023/2/25 上午7:01, Richard Henderson 写道:
>> On 2/23/23 21:24, gaosong wrote:
>>>          {
>>>              .fniv = gen_vaddwev_s,
>>>              .fno = gen_helper_vaddwev_w_h,
>>>              .opt_opc = vecop_list,
>>>              .vece = MO_32
>>>          },
>>>          {
>>>              .fniv = gen_vaddwev_s,
>>>              .fno = gen_helper_vaddwev_d_w,
>>>              .opt_opc = vecop_list,
>>>              .vece = MO_64
>>>          },
>>
>> Oh, these two can also include .fni4 and .fni8 integer versions, respectively, for hosts 
>> without the proper vector support.
>>
> OK,
> 
> but fno is not for hosts without the proper vector support?

For this definition, if the host has vector support, then the expansion will be done with 
.fniv, otherwise with .fno.


r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2022-12-24 21:12   ` Richard Henderson
@ 2023-03-21 11:31     ` gaosong
  2023-03-21 15:55       ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-03-21 11:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

在 2022/12/25 上午5:12, Richard Henderson 写道:
> On 12/24/22 00:16, Song Gao wrote:
>> +TRANS(vreplve_b, gen_vvr, gen_helper_vreplve_b)
>> +TRANS(vreplve_h, gen_vvr, gen_helper_vreplve_h)
>> +TRANS(vreplve_w, gen_vvr, gen_helper_vreplve_w)
>> +TRANS(vreplve_d, gen_vvr, gen_helper_vreplve_d)
>> +TRANS(vreplvei_b, gen_vv_i, gen_helper_vreplvei_b)
>> +TRANS(vreplvei_h, gen_vv_i, gen_helper_vreplvei_h)
>> +TRANS(vreplvei_w, gen_vv_i, gen_helper_vreplvei_w)
>> +TRANS(vreplvei_d, gen_vv_i, gen_helper_vreplvei_d)
> tcg_gen_gvec_dupm.
>
> In the case of imm, this will be cpu_env + offsetof.
e.g  vreplvei_b  vd, vj, imm
vd->B(i) = Vj->B(imm);
tcg_gen_gvec_dup_mem(MO_8,  vreg_full_offset(a->vd), 
offsetof(CPULoongArchState,  fpr[a->vj].vreg.B(a->imm)),
                                                  16, 16);
this case no problem.
> In the case of reg, compute cpu_env + register offset + offsetof.
>
but for this case.
e.g
vreplve_b  vd vj, rk
index  = gpr[rk] % (128/8);
Vd->B(i) = Vj->B(index);
tcg_gen_gvec_dup_mem(MO_8, vreg_full_offset(a->vd), 
offsetof(CPULoongArchState, fpr[a->vj].vreg.B(index))), 16, 16 );

How can we get the index with cpu_env? or  need env->gpr[rk]?
The index type is not TCGv.
I have no idea.

Thanks.
Song Gao
>> +TRANS(vbsll_v, gen_vv_i, gen_helper_vbsll_v)
>> +TRANS(vbsrl_v, gen_vv_i, gen_helper_vbsrl_v)
>
> These can use tcg_gen_extract2_i64, with imm * 8 bit shift.
>
>
> r~



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2023-03-21 11:31     ` gaosong
@ 2023-03-21 15:55       ` Richard Henderson
  2023-03-22  8:32         ` gaosong
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Henderson @ 2023-03-21 15:55 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 3/21/23 04:31, gaosong wrote:
> but for this case.
> e.g
> vreplve_b  vd vj, rk
> index  = gpr[rk] % (128/8);
> Vd->B(i) = Vj->B(index);
> tcg_gen_gvec_dup_mem(MO_8, vreg_full_offset(a->vd), offsetof(CPULoongArchState, 
> fpr[a->vj].vreg.B(index))), 16, 16 );
> 
> How can we get the index with cpu_env? or  need env->gpr[rk]?
> The index type is not TCGv.

For this case you would load the value Vj->B(index) into a TCGv_i32,

	tcg_gen_andi_i64(t0, gpr_src(rk), 15);

	// Handle endian adjustment on t0, e.g. xor 15 for big-endian?

	tcg_gen_trunc_i64_ptr(t1, t0);
	tcg_gen_add_ptr(t1, t1, cpu_env);
	tcg_gen_ld8u_i32(t2, t1, vreg_full_offset(vj));

	// At this point t2 contains Vj->B(index)

	tcg_gen_gvec_dup_i32(MO_8, vreg_full_offset(vd), 16, 16, t2);



r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2023-03-21 15:55       ` Richard Henderson
@ 2023-03-22  8:32         ` gaosong
  2023-03-22 12:35           ` Richard Henderson
  0 siblings, 1 reply; 100+ messages in thread
From: gaosong @ 2023-03-22  8:32 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


在 2023/3/21 下午11:55, Richard Henderson 写道:
> On 3/21/23 04:31, gaosong wrote:
>> but for this case.
>> e.g
>> vreplve_b  vd vj, rk
>> index  = gpr[rk] % (128/8);
>> Vd->B(i) = Vj->B(index);
>> tcg_gen_gvec_dup_mem(MO_8, vreg_full_offset(a->vd), 
>> offsetof(CPULoongArchState, fpr[a->vj].vreg.B(index))), 16, 16 );
>>
>> How can we get the index with cpu_env? or  need env->gpr[rk]?
>> The index type is not TCGv.
>
> For this case you would load the value Vj->B(index) into a TCGv_i32,
>
>     tcg_gen_andi_i64(t0, gpr_src(rk), 15);
>
>     // Handle endian adjustment on t0, e.g. xor 15 for big-endian?
>
>     tcg_gen_trunc_i64_ptr(t1, t0);
>     tcg_gen_add_ptr(t1, t1, cpu_env);
>     tcg_gen_ld8u_i32(t2, t1, vreg_full_offset(vj));
>
>     // At this point t2 contains Vj->B(index)
>
>     tcg_gen_gvec_dup_i32(MO_8, vreg_full_offset(vd), 16, 16, t2);
>
>
It's weird. this is no problem  for vreplve_b,   but for vreplve_h/w/d 
is not correct.

e.g vreplve h
index = gpr[rk] % 8
Vd->H(i) = Vj->H(index);

like this:
{
     tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), 7);
     if (HOST_BIG_ENDIAN) {
         tcg_gen_xori_i64(t0, t0, 7);
     }

     tcg_gen_trunc_i64_ptr(t1, t0);
     tcg_gen_add_ptr(t1, t1, cpu_env);
     tcg_gen_ld16u_i32(t2, t1, vreg_full_offset(a->vj));
     tcg_gen_gvec_dup_i32(MO_16, vreg_full_offset(a->vd), 16, 16, t2);
}

vreplve.h    vr25,  vr31, r30
   r30    : 000000007aab5617
   v31    : {efd0efc1efd0efc1, efd0efc1efd0efc1}
result:  {efd0efd0efd0efd0, efd0efd0efd0efd0},
and we get result is :  {c1efc1efc1efc1ef, c1efc1efc1efc1ef}.
my host is little-endian.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick
  2023-03-22  8:32         ` gaosong
@ 2023-03-22 12:35           ` Richard Henderson
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Henderson @ 2023-03-22 12:35 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 3/22/23 01:32, gaosong wrote:
> 
> 在 2023/3/21 下午11:55, Richard Henderson 写道:
>> On 3/21/23 04:31, gaosong wrote:
>>> but for this case.
>>> e.g
>>> vreplve_b  vd vj, rk
>>> index  = gpr[rk] % (128/8);
>>> Vd->B(i) = Vj->B(index);
>>> tcg_gen_gvec_dup_mem(MO_8, vreg_full_offset(a->vd), offsetof(CPULoongArchState, 
>>> fpr[a->vj].vreg.B(index))), 16, 16 );
>>>
>>> How can we get the index with cpu_env? or  need env->gpr[rk]?
>>> The index type is not TCGv.
>>
>> For this case you would load the value Vj->B(index) into a TCGv_i32,
>>
>>     tcg_gen_andi_i64(t0, gpr_src(rk), 15);
>>
>>     // Handle endian adjustment on t0, e.g. xor 15 for big-endian?
>>
>>     tcg_gen_trunc_i64_ptr(t1, t0);
>>     tcg_gen_add_ptr(t1, t1, cpu_env);
>>     tcg_gen_ld8u_i32(t2, t1, vreg_full_offset(vj));
>>
>>     // At this point t2 contains Vj->B(index)
>>
>>     tcg_gen_gvec_dup_i32(MO_8, vreg_full_offset(vd), 16, 16, t2);
>>
>>
> It's weird. this is no problem  for vreplve_b,   but for vreplve_h/w/d is not correct.
> 
> e.g vreplve h
> index = gpr[rk] % 8
> Vd->H(i) = Vj->H(index);
> 
> like this:
> {
>      tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), 7);
>      if (HOST_BIG_ENDIAN) {
>          tcg_gen_xori_i64(t0, t0, 7);
>      }
> 
>      tcg_gen_trunc_i64_ptr(t1, t0);
>      tcg_gen_add_ptr(t1, t1, cpu_env);
>      tcg_gen_ld16u_i32(t2, t1, vreg_full_offset(a->vj));
>      tcg_gen_gvec_dup_i32(MO_16, vreg_full_offset(a->vd), 16, 16, t2);
> }
> 
> vreplve.h    vr25,  vr31, r30
>    r30    : 000000007aab5617
>    v31    : {efd0efc1efd0efc1, efd0efc1efd0efc1}
> result:  {efd0efd0efd0efd0, efd0efd0efd0efd0},
> and we get result is :  {c1efc1efc1efc1ef, c1efc1efc1efc1ef}.
> my host is little-endian.

You forgot to shift the index left by one bit, to turn H index into byte offset.


r~


^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2023-03-22 12:36 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-24  8:15 [RFC PATCH 00/43] Add LoongArch LSX instructions Song Gao
2022-12-24  8:15 ` [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t Song Gao
2022-12-24 17:07   ` Richard Henderson
2022-12-24 17:24   ` Richard Henderson
2022-12-28  2:34     ` gaosong
2022-12-28 17:30       ` Richard Henderson
2022-12-29  1:51         ` gaosong
2022-12-29  3:13           ` Richard Henderson
2022-12-29  3:54             ` gaosong
2022-12-24 17:32   ` Richard Henderson
2023-02-13  8:24     ` gaosong
2023-02-13 19:18       ` Richard Henderson
2022-12-24  8:15 ` [RFC PATCH 02/43] target/loongarch: CPUCFG support LSX Song Gao
2022-12-24  8:15 ` [RFC PATCH 03/43] target/loongarch: meson.build support build LSX Song Gao
2022-12-24  8:15 ` [RFC PATCH 04/43] target/loongarch: Add CHECK_SXE maccro for check LSX enable Song Gao
2022-12-24  8:15 ` [RFC PATCH 05/43] target/loongarch: Implement vadd/vsub Song Gao
2022-12-24 17:16   ` Richard Henderson
2022-12-24  8:15 ` [RFC PATCH 06/43] target/loongarch: Implement vaddi/vsubi Song Gao
2022-12-24 17:27   ` Richard Henderson
2022-12-24  8:15 ` [RFC PATCH 07/43] target/loongarch: Implement vneg Song Gao
2022-12-24 17:29   ` Richard Henderson
2022-12-24  8:15 ` [RFC PATCH 08/43] target/loongarch: Implement vsadd/vssub Song Gao
2022-12-24 17:31   ` Richard Henderson
2022-12-24  8:15 ` [RFC PATCH 09/43] target/loongarch: Implement vhaddw/vhsubw Song Gao
2022-12-24 17:41   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw Song Gao
2022-12-24 17:48   ` Richard Henderson
2023-02-20  7:47     ` gaosong
2023-02-20 17:21       ` Richard Henderson
2023-02-23  8:23         ` gaosong
2023-02-23 15:22           ` Richard Henderson
2023-02-24  7:24             ` gaosong
2023-02-24 19:24               ` Richard Henderson
2023-02-27  9:14                 ` gaosong
2023-02-27  9:20                   ` Richard Henderson
2023-02-27 12:54                     ` gaosong
2023-02-27 18:32                       ` Richard Henderson
2023-02-27 12:55                 ` gaosong
2023-02-27 18:40                   ` Richard Henderson
2023-02-28  3:30                     ` gaosong
2023-02-28 16:48                       ` Richard Henderson
2023-02-24 23:01               ` Richard Henderson
2023-02-28  7:44                 ` gaosong
2023-02-28 16:50                   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 11/43] target/loongarch: Implement vavg/vavgr Song Gao
2022-12-24 17:52   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 12/43] target/loongarch: Implement vabsd Song Gao
2022-12-24 17:55   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 13/43] target/loongarch: Implement vadda Song Gao
2022-12-24 17:56   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 14/43] target/loongarch: Implement vmax/vmin Song Gao
2022-12-24 18:01   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 15/43] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
2022-12-24 18:07   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 16/43] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
2022-12-24 18:09   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 17/43] target/loongarch: Implement vdiv/vmod Song Gao
2022-12-24  8:16 ` [RFC PATCH 18/43] target/loongarch: Implement vsat Song Gao
2022-12-24 18:13   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 19/43] target/loongarch: Implement vexth Song Gao
2022-12-24  8:16 ` [RFC PATCH 20/43] target/loongarch: Implement vsigncov Song Gao
2022-12-24 18:18   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 21/43] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
2022-12-24 18:31   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 22/43] target/loongarch: Implement LSX logic instructions Song Gao
2022-12-24 18:34   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 23/43] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
2022-12-24 18:36   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 24/43] target/loongarch: Implement vsllwil vextl Song Gao
2022-12-24  8:16 ` [RFC PATCH 25/43] target/loongarch: Implement vsrlr vsrar Song Gao
2022-12-24  8:16 ` [RFC PATCH 26/43] target/loongarch: Implement vsrln vsran Song Gao
2022-12-24  8:16 ` [RFC PATCH 27/43] target/loongarch: Implement vsrlrn vsrarn Song Gao
2022-12-24  8:16 ` [RFC PATCH 28/43] target/loongarch: Implement vssrln vssran Song Gao
2022-12-24  8:16 ` [RFC PATCH 29/43] target/loongarch: Implement vssrlrn vssrarn Song Gao
2022-12-24  8:16 ` [RFC PATCH 30/43] target/loongarch: Implement vclo vclz Song Gao
2022-12-24  8:16 ` [RFC PATCH 31/43] target/loongarch: Implement vpcnt Song Gao
2022-12-24  8:16 ` [RFC PATCH 32/43] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
2022-12-24  8:16 ` [RFC PATCH 33/43] target/loongarch: Implement vfrstp Song Gao
2022-12-24  8:16 ` [RFC PATCH 34/43] target/loongarch: Implement LSX fpu arith instructions Song Gao
2022-12-24  8:16 ` [RFC PATCH 35/43] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
2022-12-24  8:16 ` [RFC PATCH 36/43] target/loongarch: Implement vseq vsle vslt Song Gao
2022-12-24 18:50   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 37/43] target/loongarch: Implement vfcmp Song Gao
2022-12-24  8:16 ` [RFC PATCH 38/43] target/loongarch: Implement vbitsel vset Song Gao
2022-12-24 19:15   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 39/43] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
2022-12-24 20:34   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 40/43] target/loongarch: Implement vreplve vpack vpick Song Gao
2022-12-24 21:12   ` Richard Henderson
2023-03-21 11:31     ` gaosong
2023-03-21 15:55       ` Richard Henderson
2023-03-22  8:32         ` gaosong
2023-03-22 12:35           ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 41/43] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
2022-12-24  8:16 ` [RFC PATCH 42/43] target/loongarch: Implement vld vst Song Gao
2022-12-24 21:15   ` Richard Henderson
2022-12-24  8:16 ` [RFC PATCH 43/43] target/loongarch: Implement vldi Song Gao
2022-12-24 21:18   ` Richard Henderson
2022-12-24 15:39 ` [RFC PATCH 00/43] Add LoongArch LSX instructions Richard Henderson
2022-12-28  0:55   ` gaosong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.