* [RFC PATCH v2 00/44] Add LoongArch LSX instructions
@ 2023-03-28  3:05 Song Gao
  2023-03-28  3:05 ` [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg Song Gao
                   ` (43 more replies)
  0 siblings, 44 replies; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Hi,

This series adds the LoongArch LSX instructions. Since LoongArch
Vol2 is not yet public, the series is marked 'RFC'.

About testing:
For V2 we used RISU to test the LoongArch LSX instructions.
No problems have been found so far.

QEMU:
    https://github.com/loongson/qemu/tree/tcg-old-abi-support-lsx
RISU:
    https://github.com/loongson/risu/tree/loongarch-suport-lsx

V2:
  - Use gvec;
  - Fix instruction bugs;
  - Add set_fpr()/get_fpr() to replace cpu_fpr.

Thanks.
Song Gao

Song Gao (44):
  target/loongarch: Add LSX data type VReg
  target/loongarch: CPUCFG support LSX
  target/loongarch: meson.build support build LSX
  target/loongarch: Add CHECK_SXE macro for check LSX enable
  target/loongarch: Implement vadd/vsub
  target/loongarch: Implement vaddi/vsubi
  target/loongarch: Implement vneg
  target/loongarch: Implement vsadd/vssub
  target/loongarch: Implement vhaddw/vhsubw
  target/loongarch: Implement vaddw/vsubw
  target/loongarch: Implement vavg/vavgr
  target/loongarch: Implement vabsd
  target/loongarch: Implement vadda
  target/loongarch: Implement vmax/vmin
  target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  target/loongarch: Implement vdiv/vmod
  target/loongarch: Implement vsat
  target/loongarch: Implement vexth
  target/loongarch: Implement vsigncov
  target/loongarch: Implement vmskltz/vmskgez/vmsknz
  target/loongarch: Implement LSX logic instructions
  target/loongarch: Implement vsll vsrl vsra vrotr
  target/loongarch: Implement vsllwil vextl
  target/loongarch: Implement vsrlr vsrar
  target/loongarch: Implement vsrln vsran
  target/loongarch: Implement vsrlrn vsrarn
  target/loongarch: Implement vssrln vssran
  target/loongarch: Implement vssrlrn vssrarn
  target/loongarch: Implement vclo vclz
  target/loongarch: Implement vpcnt
  target/loongarch: Implement vbitclr vbitset vbitrev
  target/loongarch: Implement vfrstp
  target/loongarch: Implement LSX fpu arith instructions
  target/loongarch: Implement LSX fpu fcvt instructions
  target/loongarch: Implement vseq vsle vslt
  target/loongarch: Implement vfcmp
  target/loongarch: Implement vbitsel vset
  target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  target/loongarch: Implement vreplve vpack vpick
  target/loongarch: Implement vilvl vilvh vextrins vshuf
  target/loongarch: Implement vld vst
  target/loongarch: Implement vldi
  target/loongarch: Use {set/get}_gpr to replace cpu_fpr

 fpu/softfloat.c                               |   55 +
 include/fpu/softfloat.h                       |   27 +
 linux-user/loongarch64/signal.c               |    4 +-
 target/loongarch/cpu.c                        |    5 +-
 target/loongarch/cpu.h                        |   37 +-
 target/loongarch/disas.c                      |  911 ++++
 target/loongarch/fpu_helper.c                 |    2 +-
 target/loongarch/gdbstub.c                    |    4 +-
 target/loongarch/helper.h                     |  593 +++
 .../loongarch/insn_trans/trans_farith.c.inc   |   72 +-
 target/loongarch/insn_trans/trans_fcmp.c.inc  |   12 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  |   37 +-
 target/loongarch/insn_trans/trans_fmov.c.inc  |   31 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   | 3724 +++++++++++++++++
 target/loongarch/insns.decode                 |  811 ++++
 target/loongarch/internals.h                  |    1 +
 target/loongarch/lsx_helper.c                 | 3553 ++++++++++++++++
 target/loongarch/machine.c                    |   34 +-
 target/loongarch/meson.build                  |    1 +
 target/loongarch/translate.c                  |   38 +-
 20 files changed, 9901 insertions(+), 51 deletions(-)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

-- 
2.31.1




* [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:56   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX Song Gao
                   ` (42 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 linux-user/loongarch64/signal.c |  4 ++--
 target/loongarch/cpu.c          |  2 +-
 target/loongarch/cpu.h          | 31 +++++++++++++++++++++++++++++-
 target/loongarch/gdbstub.c      |  4 ++--
 target/loongarch/machine.c      | 34 ++++++++++++++++++++++++++++++++-
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 7c7afb652e..bb8efb1172 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
 
     fpu_ctx = (struct target_fpu_context *)(info + 1);
     for (i = 0; i < 32; ++i) {
-        __put_user(env->fpr[i], &fpu_ctx->regs[i]);
+        __put_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
     }
     __put_user(read_fcc(env), &fpu_ctx->fcc);
     __put_user(env->fcsr0, &fpu_ctx->fcsr);
@@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
         uint64_t fcc;
 
         for (i = 0; i < 32; ++i) {
-            __get_user(env->fpr[i], &fpu_ctx->regs[i]);
+            __get_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
         }
         __get_user(fcc, &fpu_ctx->fcc);
         write_fcc(env, fcc);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 97e6579f6a..18b41221a6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -656,7 +656,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     /* fpr */
     if (flags & CPU_DUMP_FPU) {
         for (i = 0; i < 32; i++) {
-            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
+            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].vreg.D(0));
             if ((i & 3) == 3) {
                 qemu_fprintf(f, "\n");
             }
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e11c875188..6e5fa6a01d 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -8,6 +8,7 @@
 #ifndef LOONGARCH_CPU_H
 #define LOONGARCH_CPU_H
 
+#include "qemu/int128.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 #include "hw/registerfields.h"
@@ -241,6 +242,34 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
 
+#define LSX_LEN   (128)
+typedef union VReg {
+    int8_t   B[LSX_LEN / 8];
+    int16_t  H[LSX_LEN / 16];
+    int32_t  W[LSX_LEN / 32];
+    int64_t  D[LSX_LEN / 64];
+    Int128   Q[LSX_LEN / 128];
+} VReg;
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+    VReg  vreg;
+};
+
+#if  HOST_BIG_ENDIAN
+#define B(x)  B[15 - (x)]
+#define H(x)  H[7 - (x)]
+#define W(x)  W[3 - (x)]
+#define D(x)  D[1 - (x)]
+#define Q(x)  Q[x]
+#else
+#define B(x)  B[x]
+#define H(x)  H[x]
+#define W(x)  W[x]
+#define D(x)  D[x]
+#define Q(x)  Q[x]
+#endif
+
 struct LoongArchTLB {
     uint64_t tlb_misc;
     /* Fields corresponding to CSR_TLBELO0/1 */
@@ -253,7 +282,7 @@ typedef struct CPUArchState {
     uint64_t gpr[32];
     uint64_t pc;
 
-    uint64_t fpr[32];
+    fpr_t fpr[32];
     float_status fp_status;
     bool cf[8];
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index fa3e034d15..0752fff924 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -69,7 +69,7 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
                                  GByteArray *mem_buf, int n)
 {
     if (0 <= n && n < 32) {
-        return gdb_get_reg64(mem_buf, env->fpr[n]);
+        return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
     } else if (n == 32) {
         uint64_t val = read_fcc(env);
         return gdb_get_reg64(mem_buf, val);
@@ -85,7 +85,7 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
     int length = 0;
 
     if (0 <= n && n < 32) {
-        env->fpr[n] = ldq_p(mem_buf);
+        env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
         length = 8;
     } else if (n == 32) {
         uint64_t val = ldq_p(mem_buf);
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index b1e523ea72..54e67e63bc 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -33,7 +33,39 @@ const VMStateDescription vmstate_loongarch_cpu = {
 
         VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
         VMSTATE_UINTTL(env.pc, LoongArchCPU),
-        VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
+        VMSTATE_INT64(env.fpr[0].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[1].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[2].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[3].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[4].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[5].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[6].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[7].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[8].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[9].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[10].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[11].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[12].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[13].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[14].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[15].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[16].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[17].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[18].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[19].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[20].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[21].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[22].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[23].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[24].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[25].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[26].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[27].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[28].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[29].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[30].vreg.D(0), LoongArchCPU),
+        VMSTATE_INT64(env.fpr[31].vreg.D(0), LoongArchCPU),
+
         VMSTATE_UINT32(env.fcsr0, LoongArchCPU),
         VMSTATE_BOOL_ARRAY(env.cf, LoongArchCPU, 8),
 
-- 
2.31.1
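
For readers following the B()/H()/W()/D() accessors in cpu.h, here is a
minimal stand-alone sketch of the same lane-indexing idea. It is not part
of the patch. VRegSketch, host_is_big_endian(), vreg_get_b() and
vreg_set_d() are hypothetical names, the Int128 member is left out, and
the host byte order is probed at run time instead of with HOST_BIG_ENDIAN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LSX_LEN 128

/* Stand-alone copy of the union layout, without the Int128 member. */
typedef union VRegSketch {
    int8_t  B[LSX_LEN / 8];
    int16_t H[LSX_LEN / 16];
    int32_t W[LSX_LEN / 32];
    int64_t D[LSX_LEN / 64];
} VRegSketch;

/* Hypothetical helper: probe the host byte order at run time. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Read guest byte lane x, mirroring the B(x) macro. */
static int8_t vreg_get_b(const VRegSketch *v, int x)
{
    return host_is_big_endian() ? v->B[15 - x] : v->B[x];
}

/* Write guest doubleword lane x, mirroring the D(x) macro. */
static void vreg_set_d(VRegSketch *v, int x, int64_t val)
{
    if (host_is_big_endian()) {
        v->D[1 - x] = val;
    } else {
        v->D[x] = val;
    }
}
```

The point of the reversed indices on big-endian hosts is that guest lane 0
always names the least significant bytes of the 128-bit register, so
D(0) keeps holding the scalar FP value on either kind of host.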




* [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
  2023-03-28  3:05 ` [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:33   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX Song Gao
                   ` (41 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 18b41221a6..2263bd4fdd 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -386,6 +386,7 @@ static void loongarch_la464_initfn(Object *obj)
     data = FIELD_DP32(data, CPUCFG2, FP_SP, 1);
     data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
     data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
+    data = FIELD_DP32(data, CPUCFG2, LSX, 1);
     data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
     data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
     data = FIELD_DP32(data, CPUCFG2, LAM, 1);
-- 
2.31.1
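
As a rough illustration of what a FIELD_DP32()-style deposit does to the
CPUCFG2 word, here is a stand-alone sketch. It is not QEMU's
implementation. deposit_field32() and the SKETCH_LSX_* constants are
hypothetical, and the bit position used for LSX is an assumption made
only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position, for illustration only (not from the patch). */
#define SKETCH_LSX_SHIFT   6
#define SKETCH_LSX_LENGTH  1

/*
 * Deposit the low `length` bits of `val` into `word` at bit `shift`,
 * leaving every other bit of `word` untouched. Requires shift < 32.
 */
static uint32_t deposit_field32(uint32_t word, unsigned shift,
                                unsigned length, uint32_t val)
{
    uint32_t ones = length >= 32 ? UINT32_MAX : ((1u << length) - 1);
    uint32_t mask = ones << shift;

    return (word & ~mask) | ((val << shift) & mask);
}
```

Each deposit returns a new word, which is why the initfn chains the calls
as a series of separate `data = ...;` statements.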




* [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
  2023-03-28  3:05 ` [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg Song Gao
  2023-03-28  3:05 ` [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:35   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE macro for check LSX enable Song Gao
                   ` (40 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insn_trans/trans_lsx.c.inc | 5 +++++
 target/loongarch/lsx_helper.c               | 6 ++++++
 target/loongarch/meson.build                | 1 +
 target/loongarch/translate.c                | 1 +
 4 files changed, 13 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
new file mode 100644
index 0000000000..1cf3ab34a9
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LSX translate functions
+ * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
new file mode 100644
index 0000000000..9332163aff
--- /dev/null
+++ b/target/loongarch/lsx_helper.c
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch LSX helper functions.
+ *
+ * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 9293a8ab78..1117a51c52 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -11,6 +11,7 @@ loongarch_tcg_ss.add(files(
   'op_helper.c',
   'translate.c',
   'gdbstub.c',
+  'lsx_helper.c',
 ))
 loongarch_tcg_ss.add(zlib)
 
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index f443b5822f..104d4f2fbd 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -171,6 +171,7 @@ static void gen_set_gpr(int reg_num, TCGv t, DisasExtend dst_ext)
 #include "insn_trans/trans_fmemory.c.inc"
 #include "insn_trans/trans_branch.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
+#include "insn_trans/trans_lsx.c.inc"
 
 static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 {
-- 
2.31.1




* [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE macro for check LSX enable
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (2 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:42   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub Song Gao
                   ` (39 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c                      |  2 ++
 target/loongarch/cpu.h                      |  2 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 11 +++++++++++
 3 files changed, 15 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 2263bd4fdd..a3ce1ccf00 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -52,6 +52,7 @@ static const char * const excp_names[] = {
     [EXCCODE_FPE] = "Floating Point Exception",
     [EXCCODE_DBP] = "Debug breakpoint",
     [EXCCODE_BCE] = "Bound Check Exception",
+    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
 };
 
 const char *loongarch_exception_name(int32_t exception)
@@ -187,6 +188,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
     case EXCCODE_FPD:
     case EXCCODE_FPE:
     case EXCCODE_BCE:
+    case EXCCODE_ASXD:
         env->CSR_BADV = env->pc;
         QEMU_FALLTHROUGH;
     case EXCCODE_ADEM:
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 6e5fa6a01d..2e5326f474 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -429,6 +429,7 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
 #define HW_FLAGS_PLV_MASK   R_CSR_CRMD_PLV_MASK  /* 0x03 */
 #define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
+#define HW_FLAGS_EUEN_SXE   0x08
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
                                         target_ulong *pc,
@@ -439,6 +440,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
     *cs_base = 0;
     *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
     *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
+    *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 1cf3ab34a9..5dedb044d7 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3,3 +3,14 @@
  * LSX translate functions
  * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
  */
+
+#ifndef CONFIG_USER_ONLY
+#define CHECK_SXE do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
+        generate_exception(ctx, EXCCODE_SXD); \
+        return true; \
+    } \
+} while (0)
+#else
+#define CHECK_SXE
+#endif
-- 
2.31.1
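
To make the flag handling easier to follow, here is a stand-alone sketch
of how a single-bit enable can be folded into the TB flags and tested
again at translation time. It is not part of the patch. The SKETCH_*
names and both functions are hypothetical; only the 0x04 and 0x08 values
are taken from cpu.h above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAGS_EUEN_FPE 0x04u
#define SKETCH_FLAGS_EUEN_SXE 0x08u

/*
 * Build the flags word. crmd_bits stands in for the PLV (0x03) and
 * PG (0x10) bits, which do not overlap the two enable bits. Multiplying
 * a 0/1 value by the mask sets the mask bit only when the enable is 1.
 */
static uint32_t sketch_tb_flags(uint32_t crmd_bits, bool fpe, bool sxe)
{
    uint32_t flags = crmd_bits;

    flags |= (uint32_t)fpe * SKETCH_FLAGS_EUEN_FPE;
    flags |= (uint32_t)sxe * SKETCH_FLAGS_EUEN_SXE;
    return flags;
}

/* The test CHECK_SXE performs: is the SXE bit present in the flags? */
static bool sketch_lsx_allowed(uint32_t flags)
{
    return (flags & SKETCH_FLAGS_EUEN_SXE) != 0;
}
```

Because the enable bit is part of the flags, a translation block built
while LSX was disabled is not reused after the guest enables it.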




* [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (3 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE macro for check LSX enable Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:50   ` Richard Henderson
  2023-03-28 19:59   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi Song Gao
                   ` (38 subsequent siblings)
  43 siblings, 2 replies; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADD.{B/H/W/D/Q};
- VSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 23 ++++++++++++
 target/loongarch/helper.h                   |  4 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 40 +++++++++++++++++++++
 target/loongarch/insns.decode               | 22 ++++++++++++
 target/loongarch/lsx_helper.c               | 25 +++++++++++++
 target/loongarch/translate.c                |  7 ++++
 6 files changed, 121 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2e93e77e0d..a5948d7847 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -784,3 +784,26 @@ PCADD_INSN(pcaddi)
 PCADD_INSN(pcalau12i)
 PCADD_INSN(pcaddu12i)
 PCADD_INSN(pcaddu18i)
+
+#define INSN_LSX(insn, type)                                \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{                                                           \
+    output_##type(ctx, a, #insn);                           \
+    return true;                                            \
+}
+
+static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
+}
+
+INSN_LSX(vadd_b,           vvv)
+INSN_LSX(vadd_h,           vvv)
+INSN_LSX(vadd_w,           vvv)
+INSN_LSX(vadd_d,           vvv)
+INSN_LSX(vadd_q,           vvv)
+INSN_LSX(vsub_b,           vvv)
+INSN_LSX(vsub_h,           vvv)
+INSN_LSX(vsub_w,           vvv)
+INSN_LSX(vsub_d,           vvv)
+INSN_LSX(vsub_q,           vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 9c01823a26..13390c07d6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -130,3 +130,7 @@ DEF_HELPER_4(ldpte, void, env, tl, tl, i32)
 DEF_HELPER_1(ertn, void, env)
 DEF_HELPER_1(idle, void, env)
 #endif
+
+/* LoongArch LSX  */
+DEF_HELPER_4(vadd_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5dedb044d7..2fe0e4ace5 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -14,3 +14,43 @@
 #else
 #define CHECK_SXE
 #endif
+
+static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
+                    void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    CHECK_SXE;
+
+    func(cpu_env, vd, vj, vk);
+    return true;
+}
+
+static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
+                     void (*func)(unsigned, uint32_t, uint32_t,
+                                  uint32_t, uint32_t, uint32_t))
+{
+    uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+    vk_ofs = vreg_full_offset(a->vk);
+
+    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
+    return true;
+}
+
+TRANS(vadd_b, gvec_vvv, MO_8, tcg_gen_gvec_add)
+TRANS(vadd_h, gvec_vvv, MO_16, tcg_gen_gvec_add)
+TRANS(vadd_w, gvec_vvv, MO_32, tcg_gen_gvec_add)
+TRANS(vadd_d, gvec_vvv, MO_64, tcg_gen_gvec_add)
+TRANS(vadd_q, gen_vvv, gen_helper_vadd_q)
+TRANS(vsub_b, gvec_vvv, MO_8, tcg_gen_gvec_sub)
+TRANS(vsub_h, gvec_vvv, MO_16, tcg_gen_gvec_sub)
+TRANS(vsub_w, gvec_vvv, MO_32, tcg_gen_gvec_sub)
+TRANS(vsub_d, gvec_vvv, MO_64, tcg_gen_gvec_sub)
+TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index de7b8f0f3c..d18db68d51 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -485,3 +485,25 @@ ldpte            0000 01100100 01 ........ ..... 00000    @j_i
 ertn             0000 01100100 10000 01110 00000 00000    @empty
 idle             0000 01100100 10001 ...............      @i15
 dbcl             0000 00000010 10101 ...............      @i15
+
+#
+# LSX Argument sets
+#
+
+&vvv          vd vj vk
+
+#
+# LSX Formats
+#
+@vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+
+vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
+vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
+vadd_w           0111 00000000 10110 ..... ..... .....    @vvv
+vadd_d           0111 00000000 10111 ..... ..... .....    @vvv
+vadd_q           0111 00010010 11010 ..... ..... .....    @vvv
+vsub_b           0111 00000000 11000 ..... ..... .....    @vvv
+vsub_h           0111 00000000 11001 ..... ..... .....    @vvv
+vsub_w           0111 00000000 11010 ..... ..... .....    @vvv
+vsub_d           0111 00000000 11011 ..... ..... .....    @vvv
+vsub_q           0111 00010010 11011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9332163aff..edd6e99b23 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4,3 +4,28 @@
  *
  * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
  */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+
+void helper_vadd_q(CPULoongArchState *env,
+                   uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_add(Vj->Q(0), Vk->Q(0));
+}
+
+void helper_vsub_q(CPULoongArchState *env,
+                   uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_sub(Vj->Q(0), Vk->Q(0));
+}
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 104d4f2fbd..f50d14cc65 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -8,6 +8,8 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+
 #include "exec/translator.h"
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
@@ -29,6 +31,11 @@ TCGv_i64 cpu_fpr[32];
 #define DISAS_EXIT        DISAS_TARGET_1
 #define DISAS_EXIT_UPDATE DISAS_TARGET_2
 
+static inline int vreg_full_offset(int regno)
+{
+    return offsetof(CPULoongArchState, fpr[regno].vreg);
+}
+
 static inline int plus_1(DisasContext *ctx, int x)
 {
     return x + 1;
-- 
2.31.1
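
As a reference for what the instructions in this patch are expected to
compute, here is a plain C model of VADD.B and VADD.Q. It is a sketch
under the assumption that the element adds wrap modulo the lane width and
that no carry crosses a lane boundary. ref_vadd_b(), ref_vadd_q() and
U128Sketch are hypothetical names and do not appear in the series.

```c
#include <assert.h>
#include <stdint.h>

/* VADD.B model: sixteen independent 8-bit lanes, each wrapping mod 256. */
static void ref_vadd_b(uint8_t d[16], const uint8_t j[16],
                       const uint8_t k[16])
{
    for (int i = 0; i < 16; i++) {
        d[i] = (uint8_t)(j[i] + k[i]);
    }
}

/* A 128-bit value as two 64-bit halves, standing in for Int128. */
typedef struct U128Sketch {
    uint64_t lo;
    uint64_t hi;
} U128Sketch;

/* VADD.Q model: one 128-bit add, with the carry from lo going into hi. */
static U128Sketch ref_vadd_q(U128Sketch j, U128Sketch k)
{
    U128Sketch d;

    d.lo = j.lo + k.lo;
    d.hi = j.hi + k.hi + (d.lo < j.lo);
    return d;
}
```

A model like this is handy for cross-checking the helper and the gvec
expansion against each other on random inputs.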


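The @vvv format in insns.decode places vk, vj and vd in the low 15 bits
of the instruction word. A hand-written equivalent of that field
extraction might look like the sketch below. It is not the code that
decodetree generates. ArgVvvSketch, extract_vvv() and is_vadd_b() are
hypothetical names, and the opcode constant is just the 17 fixed bits of
the vadd_b pattern above written in hex.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the &vvv argument set: three 5-bit register numbers. */
typedef struct ArgVvvSketch {
    int vd;
    int vj;
    int vk;
} ArgVvvSketch;

/* Extract vd (bits 4..0), vj (bits 9..5) and vk (bits 14..10). */
static ArgVvvSketch extract_vvv(uint32_t insn)
{
    ArgVvvSketch a;

    a.vd = insn & 0x1f;
    a.vj = (insn >> 5) & 0x1f;
    a.vk = (insn >> 10) & 0x1f;
    return a;
}

/* Match the fixed bits "0111 00000000 10100" in bits 31..15. */
static int is_vadd_b(uint32_t insn)
{
    return (insn >> 15) == 0x0e014u;
}
```

With this layout, the word 0x700a0c41 matches the vadd_b pattern with
vd = 1, vj = 2 and vk = 3.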


* [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (4 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 19:58   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 07/44] target/loongarch: Implement vneg Song Gao
                   ` (37 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDI.{B/H/W/D}U;
- VSUBI.{B/H/W/D}U.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 14 ++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 37 +++++++++++++++++++++
 target/loongarch/insns.decode               | 11 ++++++
 3 files changed, 62 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index a5948d7847..c1960610c2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -797,6 +797,11 @@ static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
 }
 
+static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -807,3 +812,12 @@ INSN_LSX(vsub_h,           vvv)
 INSN_LSX(vsub_w,           vvv)
 INSN_LSX(vsub_d,           vvv)
 INSN_LSX(vsub_q,           vvv)
+
+INSN_LSX(vaddi_bu,         vv_i)
+INSN_LSX(vaddi_hu,         vv_i)
+INSN_LSX(vaddi_wu,         vv_i)
+INSN_LSX(vaddi_du,         vv_i)
+INSN_LSX(vsubi_bu,         vv_i)
+INSN_LSX(vsubi_hu,         vv_i)
+INSN_LSX(vsubi_wu,         vv_i)
+INSN_LSX(vsubi_du,         vv_i)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2fe0e4ace5..99a5c2474d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -44,6 +44,34 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
     return true;
 }
 
+static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
+                      void (*func)(unsigned, uint32_t, uint32_t,
+                                   int64_t, uint32_t, uint32_t))
+{
+    uint32_t vd_ofs, vj_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+
+    func(mop, vd_ofs, vj_ofs, a->imm, 16, 16);
+    return true;
+}
+
+static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
+{
+    uint32_t vd_ofs, vj_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+
+    tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -(a->imm), 16, 16);
+    return true;
+}
+
 TRANS(vadd_b, gvec_vvv, MO_8, tcg_gen_gvec_add)
 TRANS(vadd_h, gvec_vvv, MO_16, tcg_gen_gvec_add)
 TRANS(vadd_w, gvec_vvv, MO_32, tcg_gen_gvec_add)
@@ -54,3 +82,12 @@ TRANS(vsub_h, gvec_vvv, MO_16, tcg_gen_gvec_sub)
 TRANS(vsub_w, gvec_vvv, MO_32, tcg_gen_gvec_sub)
 TRANS(vsub_d, gvec_vvv, MO_64, tcg_gen_gvec_sub)
 TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
+
+TRANS(vaddi_bu, gvec_vv_i, MO_8, tcg_gen_gvec_addi)
+TRANS(vaddi_hu, gvec_vv_i, MO_16, tcg_gen_gvec_addi)
+TRANS(vaddi_wu, gvec_vv_i, MO_32, tcg_gen_gvec_addi)
+TRANS(vaddi_du, gvec_vv_i, MO_64, tcg_gen_gvec_addi)
+TRANS(vsubi_bu, gvec_subi, MO_8)
+TRANS(vsubi_hu, gvec_subi, MO_16)
+TRANS(vsubi_wu, gvec_subi, MO_32)
+TRANS(vsubi_du, gvec_subi, MO_64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d18db68d51..2a98c14518 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -491,11 +491,13 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 
 &vvv          vd vj vk
+&vv_i         vd vj imm
 
 #
 # LSX Formats
 #
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -507,3 +509,12 @@ vsub_h           0111 00000000 11001 ..... ..... .....    @vvv
 vsub_w           0111 00000000 11010 ..... ..... .....    @vvv
 vsub_d           0111 00000000 11011 ..... ..... .....    @vvv
 vsub_q           0111 00010010 11011 ..... ..... .....    @vvv
+
+vaddi_bu         0111 00101000 10100 ..... ..... .....    @vv_ui5
+vaddi_hu         0111 00101000 10101 ..... ..... .....    @vv_ui5
+vaddi_wu         0111 00101000 10110 ..... ..... .....    @vv_ui5
+vaddi_du         0111 00101000 10111 ..... ..... .....    @vv_ui5
+vsubi_bu         0111 00101000 11000 ..... ..... .....    @vv_ui5
+vsubi_hu         0111 00101000 11001 ..... ..... .....    @vv_ui5
+vsubi_wu         0111 00101000 11010 ..... ..... .....    @vv_ui5
+vsubi_du         0111 00101000 11011 ..... ..... .....    @vv_ui5
-- 
2.31.1
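
gvec_subi() above expresses the subtract-immediate forms as an add of the
negated immediate. The sketch below shows why that is equivalent for one
8-bit lane, assuming wrap-around arithmetic. It is not part of the patch,
and lane_addi_b() and lane_subi_b() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-bit lane of an add-immediate: add, then truncate to the lane. */
static uint8_t lane_addi_b(uint8_t x, int64_t imm)
{
    return (uint8_t)(x + (uint64_t)imm);
}

/* One 8-bit lane of VSUBI.BU, written as an add of the negated ui5. */
static uint8_t lane_subi_b(uint8_t x, uint8_t ui5)
{
    return lane_addi_b(x, -(int64_t)ui5);
}
```

Since x - imm and x + (-imm) agree modulo 2^8 (and likewise for the
wider lanes), the same add-immediate expander can serve both instruction
families.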




* [RFC PATCH v2 07/44] target/loongarch: Implement vneg
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (5 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:02   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub Song Gao
                   ` (36 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VNEG.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 10 ++++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 20 ++++++++++++++++++++
 target/loongarch/insns.decode               |  7 +++++++
 3 files changed, 37 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c1960610c2..5eabb8c47a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -802,6 +802,11 @@ static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
 }
 
+static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -821,3 +826,8 @@ INSN_LSX(vsubi_bu,         vv_i)
 INSN_LSX(vsubi_hu,         vv_i)
 INSN_LSX(vsubi_wu,         vv_i)
 INSN_LSX(vsubi_du,         vv_i)
+
+INSN_LSX(vneg_b,           vv)
+INSN_LSX(vneg_h,           vv)
+INSN_LSX(vneg_w,           vv)
+INSN_LSX(vneg_d,           vv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 99a5c2474d..dc66e44a75 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -44,6 +44,21 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
     return true;
 }
 
+static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
+                    void (*func)(unsigned, uint32_t, uint32_t,
+                                 uint32_t, uint32_t))
+{
+    uint32_t dofs, jofs;
+
+    CHECK_SXE;
+
+    dofs = vreg_full_offset(a->vd);
+    jofs = vreg_full_offset(a->vj);
+
+    func(mop, dofs, jofs, 16, 16);
+    return true;
+}
+
 static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
                       void (*func)(unsigned, uint32_t, uint32_t,
                                    int64_t, uint32_t, uint32_t))
@@ -91,3 +106,8 @@ TRANS(vsubi_bu, gvec_subi, MO_8)
 TRANS(vsubi_hu, gvec_subi, MO_16)
 TRANS(vsubi_wu, gvec_subi, MO_32)
 TRANS(vsubi_du, gvec_subi, MO_64)
+
+TRANS(vneg_b, gvec_vv, MO_8, tcg_gen_gvec_neg)
+TRANS(vneg_h, gvec_vv, MO_16, tcg_gen_gvec_neg)
+TRANS(vneg_w, gvec_vv, MO_32, tcg_gen_gvec_neg)
+TRANS(vneg_d, gvec_vv, MO_64, tcg_gen_gvec_neg)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 2a98c14518..d90798be11 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -490,12 +490,14 @@ dbcl             0000 00000010 10101 ...............      @i15
 # LSX Argument sets
 #
 
+&vv           vd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
 
 #
 # LSX Formats
 #
+@vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 
@@ -518,3 +520,8 @@ vsubi_bu         0111 00101000 11000 ..... ..... .....    @vv_ui5
 vsubi_hu         0111 00101000 11001 ..... ..... .....    @vv_ui5
 vsubi_wu         0111 00101000 11010 ..... ..... .....    @vv_ui5
 vsubi_du         0111 00101000 11011 ..... ..... .....    @vv_ui5
+
+vneg_b           0111 00101001 11000 01100 ..... .....    @vv
+vneg_h           0111 00101001 11000 01101 ..... .....    @vv
+vneg_w           0111 00101001 11000 01110 ..... .....    @vv
+vneg_d           0111 00101001 11000 01111 ..... .....    @vv
-- 
2.31.1
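
Reader's note on the vneg patch above: each VNEG variant is routed through gvec_vv() to tcg_gen_gvec_neg with a 16-byte operation size, so the expected result is a plain two's-complement negation of every lane. The sketch below is a scalar reference model for the byte variant only, written as an assumption about the intended semantics rather than taken from the patch. The type vreg_b and the function ref_vneg_b() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical view of one 128-bit LSX register as 16 signed bytes. */
typedef struct {
    int8_t b[16];
} vreg_b;

/* Reference model for VNEG.B: negate each lane, wrapping modulo 2^8. */
static inline void ref_vneg_b(vreg_b *vd, const vreg_b *vj)
{
    for (int i = 0; i < 16; i++) {
        /*
         * Negate in unsigned arithmetic so that INT8_MIN wraps back to
         * INT8_MIN instead of overflowing. The final conversion to
         * int8_t assumes the usual two's-complement representation.
         */
        uint8_t u = (uint8_t)vj->b[i];
        vd->b[i] = (int8_t)(uint8_t)(0u - u);
    }
}
```

A model like this can be handy for comparing against RISU output on edge values such as 0 and INT8_MIN, both of which negate to themselves.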




* [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (6 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 07/44] target/loongarch: Implement vneg Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:03   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw Song Gao
                   ` (35 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSADD.{B/H/W/D}[U];
- VSSUB.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 17 +++++++++++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 +++++++++++++++++
 target/loongarch/insns.decode               | 17 +++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5eabb8c47a..b7f9320ba0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -831,3 +831,20 @@ INSN_LSX(vneg_b,           vv)
 INSN_LSX(vneg_h,           vv)
 INSN_LSX(vneg_w,           vv)
 INSN_LSX(vneg_d,           vv)
+
+INSN_LSX(vsadd_b,          vvv)
+INSN_LSX(vsadd_h,          vvv)
+INSN_LSX(vsadd_w,          vvv)
+INSN_LSX(vsadd_d,          vvv)
+INSN_LSX(vsadd_bu,         vvv)
+INSN_LSX(vsadd_hu,         vvv)
+INSN_LSX(vsadd_wu,         vvv)
+INSN_LSX(vsadd_du,         vvv)
+INSN_LSX(vssub_b,          vvv)
+INSN_LSX(vssub_h,          vvv)
+INSN_LSX(vssub_w,          vvv)
+INSN_LSX(vssub_d,          vvv)
+INSN_LSX(vssub_bu,         vvv)
+INSN_LSX(vssub_hu,         vvv)
+INSN_LSX(vssub_wu,         vvv)
+INSN_LSX(vssub_du,         vvv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index dc66e44a75..0bf4759a0f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -111,3 +111,20 @@ TRANS(vneg_b, gvec_vv, MO_8, tcg_gen_gvec_neg)
 TRANS(vneg_h, gvec_vv, MO_16, tcg_gen_gvec_neg)
 TRANS(vneg_w, gvec_vv, MO_32, tcg_gen_gvec_neg)
 TRANS(vneg_d, gvec_vv, MO_64, tcg_gen_gvec_neg)
+
+TRANS(vsadd_b, gvec_vvv, MO_8, tcg_gen_gvec_ssadd)
+TRANS(vsadd_h, gvec_vvv, MO_16, tcg_gen_gvec_ssadd)
+TRANS(vsadd_w, gvec_vvv, MO_32, tcg_gen_gvec_ssadd)
+TRANS(vsadd_d, gvec_vvv, MO_64, tcg_gen_gvec_ssadd)
+TRANS(vsadd_bu, gvec_vvv, MO_8, tcg_gen_gvec_usadd)
+TRANS(vsadd_hu, gvec_vvv, MO_16, tcg_gen_gvec_usadd)
+TRANS(vsadd_wu, gvec_vvv, MO_32, tcg_gen_gvec_usadd)
+TRANS(vsadd_du, gvec_vvv, MO_64, tcg_gen_gvec_usadd)
+TRANS(vssub_b, gvec_vvv, MO_8, tcg_gen_gvec_sssub)
+TRANS(vssub_h, gvec_vvv, MO_16, tcg_gen_gvec_sssub)
+TRANS(vssub_w, gvec_vvv, MO_32, tcg_gen_gvec_sssub)
+TRANS(vssub_d, gvec_vvv, MO_64, tcg_gen_gvec_sssub)
+TRANS(vssub_bu, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
+TRANS(vssub_hu, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
+TRANS(vssub_wu, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
+TRANS(vssub_du, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d90798be11..3a29f0a9ab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -525,3 +525,20 @@ vneg_b           0111 00101001 11000 01100 ..... .....    @vv
 vneg_h           0111 00101001 11000 01101 ..... .....    @vv
 vneg_w           0111 00101001 11000 01110 ..... .....    @vv
 vneg_d           0111 00101001 11000 01111 ..... .....    @vv
+
+vsadd_b          0111 00000100 01100 ..... ..... .....    @vvv
+vsadd_h          0111 00000100 01101 ..... ..... .....    @vvv
+vsadd_w          0111 00000100 01110 ..... ..... .....    @vvv
+vsadd_d          0111 00000100 01111 ..... ..... .....    @vvv
+vsadd_bu         0111 00000100 10100 ..... ..... .....    @vvv
+vsadd_hu         0111 00000100 10101 ..... ..... .....    @vvv
+vsadd_wu         0111 00000100 10110 ..... ..... .....    @vvv
+vsadd_du         0111 00000100 10111 ..... ..... .....    @vvv
+vssub_b          0111 00000100 10000 ..... ..... .....    @vvv
+vssub_h          0111 00000100 10001 ..... ..... .....    @vvv
+vssub_w          0111 00000100 10010 ..... ..... .....    @vvv
+vssub_d          0111 00000100 10011 ..... ..... .....    @vvv
+vssub_bu         0111 00000100 11000 ..... ..... .....    @vvv
+vssub_hu         0111 00000100 11001 ..... ..... .....    @vvv
+vssub_wu         0111 00000100 11010 ..... ..... .....    @vvv
+vssub_du         0111 00000100 11011 ..... ..... .....    @vvv
-- 
2.31.1
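
Reader's note on the vsadd/vssub patch above: the signed variants map to tcg_gen_gvec_ssadd/sssub and the unsigned ones to tcg_gen_gvec_usadd/ussub, so each lane is clamped to the range of its element type instead of wrapping. The following per-lane sketch shows that clamping for 8-bit elements. It is an illustration under that assumption, and the helper names sat_add_s8(), sat_sub_s8(), sat_add_u8() and sat_sub_u8() are hypothetical.

```c
#include <stdint.h>

/* Signed saturating add: clamp the exact sum to [INT8_MIN, INT8_MAX]. */
static inline int8_t sat_add_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;      /* exact, cannot overflow in int */

    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Signed saturating subtract: same clamp applied to the exact difference. */
static inline int8_t sat_sub_s8(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;

    if (diff > INT8_MAX) {
        return INT8_MAX;
    }
    if (diff < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)diff;
}

/* Unsigned saturating add: results above UINT8_MAX clamp to UINT8_MAX. */
static inline uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;

    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Unsigned saturating subtract: results below zero clamp to zero. */
static inline uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```

The wider element sizes would follow the same pattern with a wider intermediate type.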




* [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (7 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:17   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw Song Gao
                   ` (34 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VHADDW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU};
- VHSUBW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 17 ++++
 target/loongarch/helper.h                   | 17 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 ++++
 target/loongarch/insns.decode               | 17 ++++
 target/loongarch/lsx_helper.c               | 89 +++++++++++++++++++++
 5 files changed, 157 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b7f9320ba0..adfd693938 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -848,3 +848,20 @@ INSN_LSX(vssub_bu,         vvv)
 INSN_LSX(vssub_hu,         vvv)
 INSN_LSX(vssub_wu,         vvv)
 INSN_LSX(vssub_du,         vvv)
+
+INSN_LSX(vhaddw_h_b,       vvv)
+INSN_LSX(vhaddw_w_h,       vvv)
+INSN_LSX(vhaddw_d_w,       vvv)
+INSN_LSX(vhaddw_q_d,       vvv)
+INSN_LSX(vhaddw_hu_bu,     vvv)
+INSN_LSX(vhaddw_wu_hu,     vvv)
+INSN_LSX(vhaddw_du_wu,     vvv)
+INSN_LSX(vhaddw_qu_du,     vvv)
+INSN_LSX(vhsubw_h_b,       vvv)
+INSN_LSX(vhsubw_w_h,       vvv)
+INSN_LSX(vhsubw_d_w,       vvv)
+INSN_LSX(vhsubw_q_d,       vvv)
+INSN_LSX(vhsubw_hu_bu,     vvv)
+INSN_LSX(vhsubw_wu_hu,     vvv)
+INSN_LSX(vhsubw_du_wu,     vvv)
+INSN_LSX(vhsubw_qu_du,     vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 13390c07d6..040f12c92c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -134,3 +134,20 @@ DEF_HELPER_1(idle, void, env)
 /* LoongArch LSX  */
 DEF_HELPER_4(vadd_q, void, env, i32, i32, i32)
 DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vhaddw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0bf4759a0f..d8b8c2a5ea 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -128,3 +128,20 @@ TRANS(vssub_bu, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
 TRANS(vssub_hu, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
 TRANS(vssub_wu, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
 TRANS(vssub_du, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
+
+TRANS(vhaddw_h_b, gen_vvv, gen_helper_vhaddw_h_b)
+TRANS(vhaddw_w_h, gen_vvv, gen_helper_vhaddw_w_h)
+TRANS(vhaddw_d_w, gen_vvv, gen_helper_vhaddw_d_w)
+TRANS(vhaddw_q_d, gen_vvv, gen_helper_vhaddw_q_d)
+TRANS(vhaddw_hu_bu, gen_vvv, gen_helper_vhaddw_hu_bu)
+TRANS(vhaddw_wu_hu, gen_vvv, gen_helper_vhaddw_wu_hu)
+TRANS(vhaddw_du_wu, gen_vvv, gen_helper_vhaddw_du_wu)
+TRANS(vhaddw_qu_du, gen_vvv, gen_helper_vhaddw_qu_du)
+TRANS(vhsubw_h_b, gen_vvv, gen_helper_vhsubw_h_b)
+TRANS(vhsubw_w_h, gen_vvv, gen_helper_vhsubw_w_h)
+TRANS(vhsubw_d_w, gen_vvv, gen_helper_vhsubw_d_w)
+TRANS(vhsubw_q_d, gen_vvv, gen_helper_vhsubw_q_d)
+TRANS(vhsubw_hu_bu, gen_vvv, gen_helper_vhsubw_hu_bu)
+TRANS(vhsubw_wu_hu, gen_vvv, gen_helper_vhsubw_wu_hu)
+TRANS(vhsubw_du_wu, gen_vvv, gen_helper_vhsubw_du_wu)
+TRANS(vhsubw_qu_du, gen_vvv, gen_helper_vhsubw_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3a29f0a9ab..10a20858e5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -542,3 +542,20 @@ vssub_bu         0111 00000100 11000 ..... ..... .....    @vvv
 vssub_hu         0111 00000100 11001 ..... ..... .....    @vvv
 vssub_wu         0111 00000100 11010 ..... ..... .....    @vvv
 vssub_du         0111 00000100 11011 ..... ..... .....    @vvv
+
+vhaddw_h_b       0111 00000101 01000 ..... ..... .....    @vvv
+vhaddw_w_h       0111 00000101 01001 ..... ..... .....    @vvv
+vhaddw_d_w       0111 00000101 01010 ..... ..... .....    @vvv
+vhaddw_q_d       0111 00000101 01011 ..... ..... .....    @vvv
+vhaddw_hu_bu     0111 00000101 10000 ..... ..... .....    @vvv
+vhaddw_wu_hu     0111 00000101 10001 ..... ..... .....    @vvv
+vhaddw_du_wu     0111 00000101 10010 ..... ..... .....    @vvv
+vhaddw_qu_du     0111 00000101 10011 ..... ..... .....    @vvv
+vhsubw_h_b       0111 00000101 01100 ..... ..... .....    @vvv
+vhsubw_w_h       0111 00000101 01101 ..... ..... .....    @vvv
+vhsubw_d_w       0111 00000101 01110 ..... ..... .....    @vvv
+vhsubw_q_d       0111 00000101 01111 ..... ..... .....    @vvv
+vhsubw_hu_bu     0111 00000101 10100 ..... ..... .....    @vvv
+vhsubw_wu_hu     0111 00000101 10101 ..... ..... .....    @vvv
+vhsubw_du_wu     0111 00000101 10110 ..... ..... .....    @vvv
+vhsubw_qu_du     0111 00000101 10111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index edd6e99b23..0eb37dda7a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -29,3 +29,92 @@ void helper_vsub_q(CPULoongArchState *env,
 
     Vd->Q(0) = int128_sub(Vj->Q(0), Vk->Q(0));
 }
+
+#define DO_ADD(a, b)  (a + b)
+#define DO_SUB(a, b)  (a - b)
+
+#define DO_ODD_EVEN_S(NAME, BIT, T, E1, E2, DO_OP)                 \
+void HELPER(NAME)(CPULoongArchState *env,                          \
+                  uint32_t vd, uint32_t vj, uint32_t vk)           \
+{                                                                  \
+    int i;                                                         \
+    VReg *Vd = &(env->fpr[vd].vreg);                               \
+    VReg *Vj = &(env->fpr[vj].vreg);                               \
+    VReg *Vk = &(env->fpr[vk].vreg);                               \
+                                                                   \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                            \
+        Vd->E1(i) = DO_OP((T)Vj->E2(2 * i + 1), (T)Vk->E2(2 * i)); \
+    }                                                              \
+}
+
+DO_ODD_EVEN_S(vhaddw_h_b, 16, int16_t, H, B, DO_ADD)
+DO_ODD_EVEN_S(vhaddw_w_h, 32, int32_t, W, H, DO_ADD)
+DO_ODD_EVEN_S(vhaddw_d_w, 64, int64_t, D, W, DO_ADD)
+
+void HELPER(vhaddw_q_d)(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_add(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
+}
+
+DO_ODD_EVEN_S(vhsubw_h_b, 16, int16_t, H, B, DO_SUB)
+DO_ODD_EVEN_S(vhsubw_w_h, 32, int32_t, W, H, DO_SUB)
+DO_ODD_EVEN_S(vhsubw_d_w, 64, int64_t, D, W, DO_SUB)
+
+void HELPER(vhsubw_q_d)(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_sub(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
+}
+
+#define DO_ODD_EVEN_U(NAME, BIT, TD, TS,  E1, E2, DO_OP)                     \
+void HELPER(NAME)(CPULoongArchState *env,                                    \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
+{                                                                            \
+    int i;                                                                   \
+    VReg *Vd = &(env->fpr[vd].vreg);                                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                                         \
+    VReg *Vk = &(env->fpr[vk].vreg);                                         \
+                                                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
+        Vd->E1(i) = DO_OP((TD)(TS)Vj->E2(2 * i + 1), (TD)(TS)Vk->E2(2 * i)); \
+    }                                                                        \
+}
+
+DO_ODD_EVEN_U(vhaddw_hu_bu, 16, uint16_t, uint8_t, H, B, DO_ADD)
+DO_ODD_EVEN_U(vhaddw_wu_hu, 32, uint32_t, uint16_t, W, H, DO_ADD)
+DO_ODD_EVEN_U(vhaddw_du_wu, 64, uint64_t, uint32_t, D, W, DO_ADD)
+
+void HELPER(vhaddw_qu_du)(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
+                          int128_make64((uint64_t)Vk->D(0)));
+}
+
+DO_ODD_EVEN_U(vhsubw_hu_bu, 16, uint16_t, uint8_t, H, B, DO_SUB)
+DO_ODD_EVEN_U(vhsubw_wu_hu, 32, uint32_t, uint16_t, W, H, DO_SUB)
+DO_ODD_EVEN_U(vhsubw_du_wu, 64, uint64_t, uint32_t, D, W, DO_SUB)
+
+void HELPER(vhsubw_qu_du)(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
+                          int128_make64((uint64_t)Vk->D(0)));
+}
-- 
2.31.1
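
Reader's note on the vhaddw/vhsubw helpers above: DO_ODD_EVEN_S and DO_ODD_EVEN_U both combine the odd-numbered narrow element of Vj with the even-numbered narrow element of Vk and store the widened result in lane i of Vd. The stand-alone sketch below restates that loop for the H.B and HU.BU additions using plain arrays instead of VReg, so it compiles outside QEMU. The function names ref_vhaddw_h_b() and ref_vhaddw_hu_bu() are hypothetical.

```c
#include <stdint.h>

/*
 * Model of VHADDW.H.B: vd[i] = sext(vj[2i + 1]) + sext(vk[2i]).
 * Inputs are 16 signed bytes, the result is 8 signed halfwords.
 */
static inline void ref_vhaddw_h_b(int16_t vd[8], const int8_t vj[16],
                                  const int8_t vk[16])
{
    for (int i = 0; i < 8; i++) {
        /* Widen first, then add, so the sum cannot overflow 16 bits. */
        vd[i] = (int16_t)((int16_t)vj[2 * i + 1] + (int16_t)vk[2 * i]);
    }
}

/*
 * Model of VHADDW.HU.BU: same element selection, but the bytes are
 * zero-extended before the addition.
 */
static inline void ref_vhaddw_hu_bu(uint16_t vd[8], const uint8_t vj[16],
                                    const uint8_t vk[16])
{
    for (int i = 0; i < 8; i++) {
        vd[i] = (uint16_t)((uint16_t)vj[2 * i + 1] + (uint16_t)vk[2 * i]);
    }
}
```

With vj[1] = -1 and vk[0] = -128 the signed model gives -129 in lane 0, while the byte pattern 0xff in both inputs gives 510 in the unsigned model, which shows why the two variants need separate extension steps.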




* [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (8 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:28   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr Song Gao
                   ` (33 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VSUBW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  43 +
 target/loongarch/helper.h                   |  45 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 832 ++++++++++++++++++++
 target/loongarch/insns.decode               |  43 +
 target/loongarch/lsx_helper.c               | 210 +++++
 5 files changed, 1173 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index adfd693938..8ee14916f3 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -865,3 +865,46 @@ INSN_LSX(vhsubw_hu_bu,     vvv)
 INSN_LSX(vhsubw_wu_hu,     vvv)
 INSN_LSX(vhsubw_du_wu,     vvv)
 INSN_LSX(vhsubw_qu_du,     vvv)
+
+INSN_LSX(vaddwev_h_b,      vvv)
+INSN_LSX(vaddwev_w_h,      vvv)
+INSN_LSX(vaddwev_d_w,      vvv)
+INSN_LSX(vaddwev_q_d,      vvv)
+INSN_LSX(vaddwod_h_b,      vvv)
+INSN_LSX(vaddwod_w_h,      vvv)
+INSN_LSX(vaddwod_d_w,      vvv)
+INSN_LSX(vaddwod_q_d,      vvv)
+INSN_LSX(vsubwev_h_b,      vvv)
+INSN_LSX(vsubwev_w_h,      vvv)
+INSN_LSX(vsubwev_d_w,      vvv)
+INSN_LSX(vsubwev_q_d,      vvv)
+INSN_LSX(vsubwod_h_b,      vvv)
+INSN_LSX(vsubwod_w_h,      vvv)
+INSN_LSX(vsubwod_d_w,      vvv)
+INSN_LSX(vsubwod_q_d,      vvv)
+
+INSN_LSX(vaddwev_h_bu,     vvv)
+INSN_LSX(vaddwev_w_hu,     vvv)
+INSN_LSX(vaddwev_d_wu,     vvv)
+INSN_LSX(vaddwev_q_du,     vvv)
+INSN_LSX(vaddwod_h_bu,     vvv)
+INSN_LSX(vaddwod_w_hu,     vvv)
+INSN_LSX(vaddwod_d_wu,     vvv)
+INSN_LSX(vaddwod_q_du,     vvv)
+INSN_LSX(vsubwev_h_bu,     vvv)
+INSN_LSX(vsubwev_w_hu,     vvv)
+INSN_LSX(vsubwev_d_wu,     vvv)
+INSN_LSX(vsubwev_q_du,     vvv)
+INSN_LSX(vsubwod_h_bu,     vvv)
+INSN_LSX(vsubwod_w_hu,     vvv)
+INSN_LSX(vsubwod_d_wu,     vvv)
+INSN_LSX(vsubwod_q_du,     vvv)
+
+INSN_LSX(vaddwev_h_bu_b,   vvv)
+INSN_LSX(vaddwev_w_hu_h,   vvv)
+INSN_LSX(vaddwev_d_wu_w,   vvv)
+INSN_LSX(vaddwev_q_du_d,   vvv)
+INSN_LSX(vaddwod_h_bu_b,   vvv)
+INSN_LSX(vaddwod_w_hu_h,   vvv)
+INSN_LSX(vaddwod_d_wu_w,   vvv)
+INSN_LSX(vaddwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 040f12c92c..566d9b6293 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -151,3 +151,48 @@ DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsubwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsubwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vaddwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index d8b8c2a5ea..213a775490 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -145,3 +145,835 @@ TRANS(vhsubw_hu_bu, gen_vvv, gen_helper_vhsubw_hu_bu)
 TRANS(vhsubw_wu_hu, gen_vvv, gen_helper_vhsubw_wu_hu)
 TRANS(vhsubw_du_wu, gen_vvv, gen_helper_vhsubw_du_wu)
 TRANS(vhsubw_qu_du, gen_vvv, gen_helper_vhsubw_qu_du)
+
+static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Sign-extend the even elements from a */
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t1, t1, halfbits);
+
+    /* Sign-extend the even elements from b */
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void gen_vaddwev_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shli_i32(t1, a, 16);
+    tcg_gen_sari_i32(t1, t1, 16);
+    tcg_gen_shli_i32(t2, b, 16);
+    tcg_gen_sari_i32(t2, t2, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwev_d_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shli_i64(t1, a, 32);
+    tcg_gen_sari_i64(t1, t1, 32);
+    tcg_gen_shli_i64(t2, b, 32);
+    tcg_gen_sari_i64(t2, t2, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void do_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_add_vec, 0
+        };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwev_s,
+            .fno = gen_helper_vaddwev_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwev_w_h,
+            .fniv = gen_vaddwev_s,
+            .fno = gen_helper_vaddwev_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwev_d_w,
+            .fniv = gen_vaddwev_s,
+            .fno = gen_helper_vaddwev_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwev_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwev_h_b, gvec_vvv, MO_8, do_vaddwev_s)
+TRANS(vaddwev_w_h, gvec_vvv, MO_16, do_vaddwev_s)
+TRANS(vaddwev_d_w, gvec_vvv, MO_32, do_vaddwev_s)
+TRANS(vaddwev_q_d, gvec_vvv, MO_64, do_vaddwev_s)
+
+static void gen_vaddwod_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_sari_i32(t1, a, 16);
+    tcg_gen_sari_i32(t2, b, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwod_d_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_sari_i64(t1, a, 32);
+    tcg_gen_sari_i64(t2, b, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void gen_vaddwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Sign-extend the odd elements for vector */
+    tcg_gen_sari_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void do_vaddwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_add_vec, 0
+        };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwod_s,
+            .fno = gen_helper_vaddwod_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwod_w_h,
+            .fniv = gen_vaddwod_s,
+            .fno = gen_helper_vaddwod_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwod_d_w,
+            .fniv = gen_vaddwod_s,
+            .fno = gen_helper_vaddwod_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwod_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwod_h_b, gvec_vvv, MO_8, do_vaddwod_s)
+TRANS(vaddwod_w_h, gvec_vvv, MO_16, do_vaddwod_s)
+TRANS(vaddwod_d_w, gvec_vvv, MO_32, do_vaddwod_s)
+TRANS(vaddwod_q_d, gvec_vvv, MO_64, do_vaddwod_s)
+
+static void gen_vsubwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Sign-extend the even elements from a */
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t1, t1, halfbits);
+
+    /* Sign-extend the even elements from b */
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+
+    tcg_gen_sub_vec(vece, t, t1, t2);
+}
+
+static void gen_vsubwev_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shli_i32(t1, a, 16);
+    tcg_gen_sari_i32(t1, t1, 16);
+    tcg_gen_shli_i32(t2, b, 16);
+    tcg_gen_sari_i32(t2, t2, 16);
+    tcg_gen_sub_i32(t, t1, t2);
+}
+
+static void gen_vsubwev_d_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shli_i64(t1, a, 32);
+    tcg_gen_sari_i64(t1, t1, 32);
+    tcg_gen_shli_i64(t2, b, 32);
+    tcg_gen_sari_i64(t2, t2, 32);
+    tcg_gen_sub_i64(t, t1, t2);
+}
+
+static void do_vsubwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+        };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vsubwev_s,
+            .fno = gen_helper_vsubwev_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vsubwev_w_h,
+            .fniv = gen_vsubwev_s,
+            .fno = gen_helper_vsubwev_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vsubwev_d_w,
+            .fniv = gen_vsubwev_s,
+            .fno = gen_helper_vsubwev_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vsubwev_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsubwev_h_b, gvec_vvv, MO_8, do_vsubwev_s)
+TRANS(vsubwev_w_h, gvec_vvv, MO_16, do_vsubwev_s)
+TRANS(vsubwev_d_w, gvec_vvv, MO_32, do_vsubwev_s)
+TRANS(vsubwev_q_d, gvec_vvv, MO_64, do_vsubwev_s)
+
+static void gen_vsubwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Sign-extend the odd elements of a and b */
+    tcg_gen_sari_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+
+    tcg_gen_sub_vec(vece, t, t1, t2);
+}
+
+static void gen_vsubwod_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_sari_i32(t1, a, 16);
+    tcg_gen_sari_i32(t2, b, 16);
+    tcg_gen_sub_i32(t, t1, t2);
+}
+
+static void gen_vsubwod_d_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_sari_i64(t1, a, 32);
+    tcg_gen_sari_i64(t2, b, 32);
+    tcg_gen_sub_i64(t, t1, t2);
+}
+
+static void do_vsubwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vsubwod_s,
+            .fno = gen_helper_vsubwod_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vsubwod_w_h,
+            .fniv = gen_vsubwod_s,
+            .fno = gen_helper_vsubwod_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vsubwod_d_w,
+            .fniv = gen_vsubwod_s,
+            .fno = gen_helper_vsubwod_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vsubwod_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsubwod_h_b, gvec_vvv, MO_8, do_vsubwod_s)
+TRANS(vsubwod_w_h, gvec_vvv, MO_16, do_vsubwod_s)
+TRANS(vsubwod_d_w, gvec_vvv, MO_32, do_vsubwod_s)
+TRANS(vsubwod_q_d, gvec_vvv, MO_64, do_vsubwod_s)
+
+static void gen_vaddwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the even elements from a */
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+
+    /* Zero-extend the even elements from b */
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_shri_vec(vece, t2, t2, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void gen_vaddwev_w_hu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shli_i32(t1, a, 16);
+    tcg_gen_shri_i32(t1, t1, 16);
+    tcg_gen_shli_i32(t2, b, 16);
+    tcg_gen_shri_i32(t2, t2, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwev_d_wu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shli_i64(t1, a, 32);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_shli_i64(t2, b, 32);
+    tcg_gen_shri_i64(t2, t2, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void do_vaddwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwev_u,
+            .fno = gen_helper_vaddwev_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwev_w_hu,
+            .fniv = gen_vaddwev_u,
+            .fno = gen_helper_vaddwev_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwev_d_wu,
+            .fniv = gen_vaddwev_u,
+            .fno = gen_helper_vaddwev_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwev_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwev_h_bu, gvec_vvv, MO_8, do_vaddwev_u)
+TRANS(vaddwev_w_hu, gvec_vvv, MO_16, do_vaddwev_u)
+TRANS(vaddwev_d_wu, gvec_vvv, MO_32, do_vaddwev_u)
+TRANS(vaddwev_q_du, gvec_vvv, MO_64, do_vaddwev_u)
+
+static void gen_vaddwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the odd elements of a and b */
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t2, b, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void gen_vaddwod_w_hu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shri_i32(t1, a, 16);
+    tcg_gen_shri_i32(t2, b, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwod_d_wu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shri_i64(t1, a, 32);
+    tcg_gen_shri_i64(t2, b, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void do_vaddwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwod_u,
+            .fno = gen_helper_vaddwod_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwod_w_hu,
+            .fniv = gen_vaddwod_u,
+            .fno = gen_helper_vaddwod_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwod_d_wu,
+            .fniv = gen_vaddwod_u,
+            .fno = gen_helper_vaddwod_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwod_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwod_h_bu, gvec_vvv, MO_8, do_vaddwod_u)
+TRANS(vaddwod_w_hu, gvec_vvv, MO_16, do_vaddwod_u)
+TRANS(vaddwod_d_wu, gvec_vvv, MO_32, do_vaddwod_u)
+TRANS(vaddwod_q_du, gvec_vvv, MO_64, do_vaddwod_u)
+
+static void gen_vsubwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the even elements from a */
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+
+    /* Zero-extend the even elements from b */
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_shri_vec(vece, t2, t2, halfbits);
+
+    tcg_gen_sub_vec(vece, t, t1, t2);
+}
+
+static void gen_vsubwev_w_hu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shli_i32(t1, a, 16);
+    tcg_gen_shri_i32(t1, t1, 16);
+    tcg_gen_shli_i32(t2, b, 16);
+    tcg_gen_shri_i32(t2, t2, 16);
+    tcg_gen_sub_i32(t, t1, t2);
+}
+
+static void gen_vsubwev_d_wu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shli_i64(t1, a, 32);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_shli_i64(t2, b, 32);
+    tcg_gen_shri_i64(t2, t2, 32);
+    tcg_gen_sub_i64(t, t1, t2);
+}
+
+static void do_vsubwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vsubwev_u,
+            .fno = gen_helper_vsubwev_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vsubwev_w_hu,
+            .fniv = gen_vsubwev_u,
+            .fno = gen_helper_vsubwev_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vsubwev_d_wu,
+            .fniv = gen_vsubwev_u,
+            .fno = gen_helper_vsubwev_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vsubwev_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsubwev_h_bu, gvec_vvv, MO_8, do_vsubwev_u)
+TRANS(vsubwev_w_hu, gvec_vvv, MO_16, do_vsubwev_u)
+TRANS(vsubwev_d_wu, gvec_vvv, MO_32, do_vsubwev_u)
+TRANS(vsubwev_q_du, gvec_vvv, MO_64, do_vsubwev_u)
+
+static void gen_vsubwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the odd elements of a and b */
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t2, b, halfbits);
+
+    tcg_gen_sub_vec(vece, t, t1, t2);
+}
+
+static void gen_vsubwod_w_hu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shri_i32(t1, a, 16);
+    tcg_gen_shri_i32(t2, b, 16);
+    tcg_gen_sub_i32(t, t1, t2);
+}
+
+static void gen_vsubwod_d_wu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shri_i64(t1, a, 32);
+    tcg_gen_shri_i64(t2, b, 32);
+    tcg_gen_sub_i64(t, t1, t2);
+}
+
+static void do_vsubwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vsubwod_u,
+            .fno = gen_helper_vsubwod_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vsubwod_w_hu,
+            .fniv = gen_vsubwod_u,
+            .fno = gen_helper_vsubwod_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vsubwod_d_wu,
+            .fniv = gen_vsubwod_u,
+            .fno = gen_helper_vsubwod_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vsubwod_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsubwod_h_bu, gvec_vvv, MO_8, do_vsubwod_u)
+TRANS(vsubwod_w_hu, gvec_vvv, MO_16, do_vsubwod_u)
+TRANS(vsubwod_d_wu, gvec_vvv, MO_32, do_vsubwod_u)
+TRANS(vsubwod_q_du, gvec_vvv, MO_64, do_vsubwod_u)
+
+static void gen_vaddwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the even elements from a */
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+
+    /* Sign-extend the even elements from b */
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void gen_vaddwev_w_hu_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shli_i32(t1, a, 16);
+    tcg_gen_shri_i32(t1, t1, 16);
+    tcg_gen_shli_i32(t2, b, 16);
+    tcg_gen_sari_i32(t2, t2, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwev_d_wu_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shli_i64(t1, a, 32);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_shli_i64(t2, b, 32);
+    tcg_gen_sari_i64(t2, t2, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void do_vaddwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec,
+        INDEX_op_sari_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwev_u_s,
+            .fno = gen_helper_vaddwev_h_bu_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwev_w_hu_h,
+            .fniv = gen_vaddwev_u_s,
+            .fno = gen_helper_vaddwev_w_hu_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwev_d_wu_w,
+            .fniv = gen_vaddwev_u_s,
+            .fno = gen_helper_vaddwev_d_wu_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwev_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwev_h_bu_b, gvec_vvv, MO_8, do_vaddwev_u_s)
+TRANS(vaddwev_w_hu_h, gvec_vvv, MO_16, do_vaddwev_u_s)
+TRANS(vaddwev_d_wu_w, gvec_vvv, MO_32, do_vaddwev_u_s)
+TRANS(vaddwev_q_du_d, gvec_vvv, MO_64, do_vaddwev_u_s)
+
+static void gen_vaddwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    /* Zero-extend the odd elements from a */
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    /* Sign-extend the odd elements from b */
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void gen_vaddwod_w_hu_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t1, t2;
+
+    t1 = tcg_temp_new_i32();
+    t2 = tcg_temp_new_i32();
+    tcg_gen_shri_i32(t1, a, 16);
+    tcg_gen_sari_i32(t2, b, 16);
+    tcg_gen_add_i32(t, t1, t2);
+}
+
+static void gen_vaddwod_d_wu_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1, t2;
+
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    tcg_gen_shri_i64(t1, a, 32);
+    tcg_gen_sari_i64(t2, b, 32);
+    tcg_gen_add_i64(t, t1, t2);
+}
+
+static void do_vaddwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_sari_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vaddwod_u_s,
+            .fno = gen_helper_vaddwod_h_bu_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fni4 = gen_vaddwod_w_hu_h,
+            .fniv = gen_vaddwod_u_s,
+            .fno = gen_helper_vaddwod_w_hu_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vaddwod_d_wu_w,
+            .fniv = gen_vaddwod_u_s,
+            .fno = gen_helper_vaddwod_d_wu_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vaddwod_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vaddwod_h_bu_b, gvec_vvv, MO_8, do_vaddwod_u_s)
+TRANS(vaddwod_w_hu_h, gvec_vvv, MO_16, do_vaddwod_u_s)
+TRANS(vaddwod_d_wu_w, gvec_vvv, MO_32, do_vaddwod_u_s)
+TRANS(vaddwod_q_du_d, gvec_vvv, MO_64, do_vaddwod_u_s)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 10a20858e5..ee16155b31 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -559,3 +559,46 @@ vhsubw_hu_bu     0111 00000101 10100 ..... ..... .....    @vvv
 vhsubw_wu_hu     0111 00000101 10101 ..... ..... .....    @vvv
 vhsubw_du_wu     0111 00000101 10110 ..... ..... .....    @vvv
 vhsubw_qu_du     0111 00000101 10111 ..... ..... .....    @vvv
+
+vaddwev_h_b      0111 00000001 11100 ..... ..... .....    @vvv
+vaddwev_w_h      0111 00000001 11101 ..... ..... .....    @vvv
+vaddwev_d_w      0111 00000001 11110 ..... ..... .....    @vvv
+vaddwev_q_d      0111 00000001 11111 ..... ..... .....    @vvv
+vaddwod_h_b      0111 00000010 00100 ..... ..... .....    @vvv
+vaddwod_w_h      0111 00000010 00101 ..... ..... .....    @vvv
+vaddwod_d_w      0111 00000010 00110 ..... ..... .....    @vvv
+vaddwod_q_d      0111 00000010 00111 ..... ..... .....    @vvv
+vsubwev_h_b      0111 00000010 00000 ..... ..... .....    @vvv
+vsubwev_w_h      0111 00000010 00001 ..... ..... .....    @vvv
+vsubwev_d_w      0111 00000010 00010 ..... ..... .....    @vvv
+vsubwev_q_d      0111 00000010 00011 ..... ..... .....    @vvv
+vsubwod_h_b      0111 00000010 01000 ..... ..... .....    @vvv
+vsubwod_w_h      0111 00000010 01001 ..... ..... .....    @vvv
+vsubwod_d_w      0111 00000010 01010 ..... ..... .....    @vvv
+vsubwod_q_d      0111 00000010 01011 ..... ..... .....    @vvv
+
+vaddwev_h_bu     0111 00000010 11100 ..... ..... .....    @vvv
+vaddwev_w_hu     0111 00000010 11101 ..... ..... .....    @vvv
+vaddwev_d_wu     0111 00000010 11110 ..... ..... .....    @vvv
+vaddwev_q_du     0111 00000010 11111 ..... ..... .....    @vvv
+vaddwod_h_bu     0111 00000011 00100 ..... ..... .....    @vvv
+vaddwod_w_hu     0111 00000011 00101 ..... ..... .....    @vvv
+vaddwod_d_wu     0111 00000011 00110 ..... ..... .....    @vvv
+vaddwod_q_du     0111 00000011 00111 ..... ..... .....    @vvv
+vsubwev_h_bu     0111 00000011 00000 ..... ..... .....    @vvv
+vsubwev_w_hu     0111 00000011 00001 ..... ..... .....    @vvv
+vsubwev_d_wu     0111 00000011 00010 ..... ..... .....    @vvv
+vsubwev_q_du     0111 00000011 00011 ..... ..... .....    @vvv
+vsubwod_h_bu     0111 00000011 01000 ..... ..... .....    @vvv
+vsubwod_w_hu     0111 00000011 01001 ..... ..... .....    @vvv
+vsubwod_d_wu     0111 00000011 01010 ..... ..... .....    @vvv
+vsubwod_q_du     0111 00000011 01011 ..... ..... .....    @vvv
+
+vaddwev_h_bu_b   0111 00000011 11100 ..... ..... .....    @vvv
+vaddwev_w_hu_h   0111 00000011 11101 ..... ..... .....    @vvv
+vaddwev_d_wu_w   0111 00000011 11110 ..... ..... .....    @vvv
+vaddwev_q_du_d   0111 00000011 11111 ..... ..... .....    @vvv
+vaddwod_h_bu_b   0111 00000100 00000 ..... ..... .....    @vvv
+vaddwod_w_hu_h   0111 00000100 00001 ..... ..... .....    @vvv
+vaddwod_d_wu_w   0111 00000100 00010 ..... ..... .....    @vvv
+vaddwod_q_du_d   0111 00000100 00011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 0eb37dda7a..96b052c95a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -118,3 +118,213 @@ void HELPER(vhsubw_qu_du)(CPULoongArchState *env,
     Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
                           int128_make64((uint64_t)Vk->D(0)));
 }
+
+#define DO_EVEN_S(NAME, BIT, T, E1, E2, DO_OP)                 \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)    \
+{                                                              \
+    int i;                                                     \
+    VReg *Vd = (VReg *)vd;                                     \
+    VReg *Vj = (VReg *)vj;                                     \
+    VReg *Vk = (VReg *)vk;                                     \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                        \
+        Vd->E1(i) = DO_OP((T)Vj->E2(2 * i), (T)Vk->E2(2 * i)); \
+    }                                                          \
+}
+
+#define DO_ODD_S(NAME, BIT, T, E1, E2, DO_OP)                          \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)            \
+{                                                                      \
+    int i;                                                             \
+    VReg *Vd = (VReg *)vd;                                             \
+    VReg *Vj = (VReg *)vj;                                             \
+    VReg *Vk = (VReg *)vk;                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                \
+        Vd->E1(i) = DO_OP((T)Vj->E2(2 * i + 1), (T)Vk->E2(2 * i + 1)); \
+    }                                                                  \
+}
+
+void HELPER(vaddwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_makes64(Vj->D(0)), int128_makes64(Vk->D(0)));
+}
+
+DO_EVEN_S(vaddwev_h_b, 16, int16_t, H, B, DO_ADD)
+DO_EVEN_S(vaddwev_w_h, 32, int32_t, W, H, DO_ADD)
+DO_EVEN_S(vaddwev_d_w, 64, int64_t, D, W, DO_ADD)
+
+void HELPER(vaddwod_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(1)));
+}
+
+DO_ODD_S(vaddwod_h_b, 16, int16_t, H, B, DO_ADD)
+DO_ODD_S(vaddwod_w_h, 32, int32_t, W, H, DO_ADD)
+DO_ODD_S(vaddwod_d_w, 64, int64_t, D, W, DO_ADD)
+
+void HELPER(vsubwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_sub(int128_makes64(Vj->D(0)), int128_makes64(Vk->D(0)));
+}
+
+DO_EVEN_S(vsubwev_h_b, 16, int16_t, H, B, DO_SUB)
+DO_EVEN_S(vsubwev_w_h, 32, int32_t, W, H, DO_SUB)
+DO_EVEN_S(vsubwev_d_w, 64, int64_t, D, W, DO_SUB)
+
+void HELPER(vsubwod_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_sub(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(1)));
+}
+
+DO_ODD_S(vsubwod_h_b, 16, int16_t, H, B, DO_SUB)
+DO_ODD_S(vsubwod_w_h, 32, int32_t, W, H, DO_SUB)
+DO_ODD_S(vsubwod_d_w, 64, int64_t, D, W, DO_SUB)
+
+#define DO_EVEN_U(NAME, BIT, TD, TS, E1, E2, DO_OP)         \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E1(i) = DO_OP((TD)(TS)Vj->E2(2 * i),            \
+                          (TD)(TS)Vk->E2(2 * i));           \
+    }                                                       \
+}
+
+#define DO_ODD_U(NAME, BIT, TD, TS, E1, E2, DO_OP)          \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E1(i) = DO_OP((TD)(TS)Vj->E2(2 * i + 1),        \
+                          (TD)(TS)Vk->E2(2 * i + 1));       \
+    }                                                       \
+}
+
+void HELPER(vaddwev_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(0)),
+                          int128_make64((uint64_t)Vk->D(0)));
+}
+
+DO_EVEN_U(vaddwev_h_bu, 16, uint16_t, uint8_t, H, B, DO_ADD)
+DO_EVEN_U(vaddwev_w_hu, 32, uint32_t, uint16_t, W, H, DO_ADD)
+DO_EVEN_U(vaddwev_d_wu, 64, uint64_t, uint32_t, D, W, DO_ADD)
+
+void HELPER(vaddwod_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
+                          int128_make64((uint64_t)Vk->D(1)));
+}
+
+DO_ODD_U(vaddwod_h_bu, 16, uint16_t, uint8_t, H, B, DO_ADD)
+DO_ODD_U(vaddwod_w_hu, 32, uint32_t, uint16_t, W, H, DO_ADD)
+DO_ODD_U(vaddwod_d_wu, 64, uint64_t, uint32_t, D, W, DO_ADD)
+
+void HELPER(vsubwev_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(0)),
+                          int128_make64((uint64_t)Vk->D(0)));
+}
+
+DO_EVEN_U(vsubwev_h_bu, 16, uint16_t, uint8_t, H, B, DO_SUB)
+DO_EVEN_U(vsubwev_w_hu, 32, uint32_t, uint16_t, W, H, DO_SUB)
+DO_EVEN_U(vsubwev_d_wu, 64, uint64_t, uint32_t, D, W, DO_SUB)
+
+void HELPER(vsubwod_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
+                          int128_make64((uint64_t)Vk->D(1)));
+}
+
+DO_ODD_U(vsubwod_h_bu, 16, uint16_t, uint8_t, H, B, DO_SUB)
+DO_ODD_U(vsubwod_w_hu, 32, uint32_t, uint16_t, W, H, DO_SUB)
+DO_ODD_U(vsubwod_d_wu, 64, uint64_t, uint32_t, D, W, DO_SUB)
+
+#define DO_EVEN_U_S(NAME, BIT, T1, TD1, T2, E1, E2, DO_OP)            \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)           \
+{                                                                     \
+    int i;                                                            \
+    VReg *Vd = (VReg *)vd;                                            \
+    VReg *Vj = (VReg *)vj;                                            \
+    VReg *Vk = (VReg *)vk;                                            \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                               \
+        Vd->E1(i) = DO_OP((TD1)(T1)Vj->E2(2 * i), (T2)Vk->E2(2 * i)); \
+    }                                                                 \
+}
+
+#define DO_ODD_U_S(NAME, BIT, T1, TD1, T2, E1, E2, DO_OP)                     \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)                   \
+{                                                                             \
+    int i;                                                                    \
+    VReg *Vd = (VReg *)vd;                                                    \
+    VReg *Vj = (VReg *)vj;                                                    \
+    VReg *Vk = (VReg *)vk;                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                       \
+        Vd->E1(i) = DO_OP((TD1)(T1)Vj->E2(2 * i + 1), (T2)Vk->E2(2 * i + 1)); \
+    }                                                                         \
+}
+
+void HELPER(vaddwev_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(0)),
+                          int128_makes64(Vk->D(0)));
+}
+
+DO_EVEN_U_S(vaddwev_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_ADD)
+DO_EVEN_U_S(vaddwev_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_ADD)
+DO_EVEN_U_S(vaddwev_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_ADD)
+
+void HELPER(vaddwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
+                          int128_makes64(Vk->D(1)));
+}
+
+DO_ODD_U_S(vaddwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_ADD)
+DO_ODD_U_S(vaddwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_ADD)
+DO_ODD_U_S(vaddwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_ADD)
-- 
2.31.1
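
Note on the shift idiom used by the fni4/fni8/fniv expanders in this patch:
each wide lane holds two narrow elements, and the even (low) or odd (high)
one is widened in place with shifts before the add or sub. The following is
a minimal scalar sketch of one 32-bit lane, not code from the patch. All
function names are hypothetical, and it assumes that right-shifting a
negative signed value is an arithmetic shift, as on the compilers QEMU
supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of one 32-bit lane holding two 16-bit elements. */

/* Even (low) element, sign-extended: shift left, then arithmetic shift right. */
static int32_t even_s(uint32_t lane)
{
    return (int32_t)(lane << 16) >> 16;
}

/* Odd (high) element, sign-extended: a single arithmetic shift right. */
static int32_t odd_s(uint32_t lane)
{
    return (int32_t)lane >> 16;
}

/* Odd (high) element, zero-extended: a single logical shift right. */
static uint32_t odd_u(uint32_t lane)
{
    return lane >> 16;
}

/* Models the per-lane effect of vsubwev.w.h (signed even elements). */
static uint32_t lane_vsubwev_w_h(uint32_t a, uint32_t b)
{
    return (uint32_t)even_s(a) - (uint32_t)even_s(b);
}

/* Models the per-lane effect of vaddwod.w.hu.h (unsigned a, signed b). */
static uint32_t lane_vaddwod_w_hu_h(uint32_t a, uint32_t b)
{
    return odd_u(a) + (uint32_t)odd_s(b);
}

/* Small self-check with hand-computed values. */
static void widen_self_check(void)
{
    /* a: odd = 0x1234, even = 0xffff (-1); b: odd = 0x8000, even = 1 */
    assert(lane_vsubwev_w_h(0x1234ffffu, 0x80000001u) == 0xfffffffeu);
    assert(lane_vaddwod_w_hu_h(0x1234ffffu, 0x80000001u) == 0xffff9234u);
}
```

The odd variants need only one shift because the element of interest
already sits in the high half of the lane, which is why their vecop_list
omits INDEX_op_shli_vec.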




* [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (9 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:31   ` Richard Henderson
  2023-03-28  3:05 ` [RFC PATCH v2 12/44] target/loongarch: Implement vabsd Song Gao
                   ` (32 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VAVG.{B/H/W/D}[U];
- VAVGR.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  17 ++
 target/loongarch/helper.h                   |  18 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 197 ++++++++++++++++++++
 target/loongarch/insns.decode               |  17 ++
 target/loongarch/lsx_helper.c               |  45 +++++
 5 files changed, 294 insertions(+)
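
Note: the averaging in this patch is computed without widening the
elements. do_vavg() adds the two halved operands and then a carry bit
derived from the low bits of the inputs. A minimal scalar sketch of that
identity for 8-bit elements follows. The function names are hypothetical.
The AND form mirrors gen_vavg_s/gen_vavg_u as shown in the diff; the OR
form for the rounding variant is an assumption based on the identity
(a + b + 1) >> 1 == (a >> 1) + (b >> 1) + ((a | b) & 1). The sketch also
assumes arithmetic right shift for negative signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating signed average: floor((a + b) / 2) without overflow. */
static int8_t avg_s8(int8_t a, int8_t b)
{
    return (int8_t)((a >> 1) + (b >> 1) + (a & b & 1));
}

/* Rounding signed average: floor((a + b + 1) / 2) without overflow. */
static int8_t avgr_s8(int8_t a, int8_t b)
{
    return (int8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* Truncating unsigned average. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + (a & b & 1));
}

/* Rounding unsigned average. */
static uint8_t avgr_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* Exhaustively compare against a reference computed in a wider type. */
static void avg_self_check(void)
{
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            assert(avg_s8((int8_t)a, (int8_t)b) == ((a + b) >> 1));
            assert(avgr_s8((int8_t)a, (int8_t)b) == ((a + b + 1) >> 1));
        }
    }
    for (int a = 0; a <= 255; a++) {
        for (int b = 0; b <= 255; b++) {
            assert(avg_u8((uint8_t)a, (uint8_t)b) == ((a + b) >> 1));
            assert(avgr_u8((uint8_t)a, (uint8_t)b) == ((a + b + 1) >> 1));
        }
    }
}
```

Halving before adding keeps every intermediate within the element width,
so the same sequence works for the 64-bit elements, where no wider integer
type is available to the vector expander.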

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8ee14916f3..e7592e7a34 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -908,3 +908,20 @@ INSN_LSX(vaddwod_h_bu_b,   vvv)
 INSN_LSX(vaddwod_w_hu_h,   vvv)
 INSN_LSX(vaddwod_d_wu_w,   vvv)
 INSN_LSX(vaddwod_q_du_d,   vvv)
+
+INSN_LSX(vavg_b,           vvv)
+INSN_LSX(vavg_h,           vvv)
+INSN_LSX(vavg_w,           vvv)
+INSN_LSX(vavg_d,           vvv)
+INSN_LSX(vavg_bu,          vvv)
+INSN_LSX(vavg_hu,          vvv)
+INSN_LSX(vavg_wu,          vvv)
+INSN_LSX(vavg_du,          vvv)
+INSN_LSX(vavgr_b,          vvv)
+INSN_LSX(vavgr_h,          vvv)
+INSN_LSX(vavgr_w,          vvv)
+INSN_LSX(vavgr_d,          vvv)
+INSN_LSX(vavgr_bu,         vvv)
+INSN_LSX(vavgr_hu,         vvv)
+INSN_LSX(vavgr_wu,         vvv)
+INSN_LSX(vavgr_du,         vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 566d9b6293..021fe3cd60 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -196,3 +196,21 @@ DEF_HELPER_FLAGS_4(vaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vavg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vavgr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 213a775490..512fe947f6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -977,3 +977,200 @@ TRANS(vaddwod_h_bu_b, gvec_vvv, MO_8, do_vaddwod_u_s)
 TRANS(vaddwod_w_hu_h, gvec_vvv, MO_16, do_vaddwod_u_s)
 TRANS(vaddwod_d_wu_w, gvec_vvv, MO_32, do_vaddwod_u_s)
 TRANS(vaddwod_q_du_d, gvec_vvv, MO_64, do_vaddwod_u_s)
+
+static void do_vavg(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
+                    void (*gen_shr_vec)(unsigned, TCGv_vec,
+                                        TCGv_vec, int64_t),
+                    void (*gen_round_vec)(unsigned, TCGv_vec,
+                                          TCGv_vec, TCGv_vec))
+{
+    TCGv_vec tmp = tcg_temp_new_vec_matching(t);
+    gen_round_vec(vece, tmp, a, b);
+    tcg_gen_and_vec(vece, tmp, tmp, tcg_constant_vec_matching(t, vece, 1));
+    gen_shr_vec(vece, a, a, 1);
+    gen_shr_vec(vece, b, b, 1);
+    tcg_gen_add_vec(vece, t, a, b);
+    tcg_gen_add_vec(vece, t, t, tmp);
+}
+
+static void gen_vavg_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    do_vavg(vece, t, a, b, tcg_gen_sari_vec, tcg_gen_and_vec);
+}
+
+static void gen_vavg_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    do_vavg(vece, t, a, b, tcg_gen_shri_vec, tcg_gen_and_vec);
+}
+
+static void gen_vavgr_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    do_vavg(vece, t, a, b, tcg_gen_sari_vec, tcg_gen_or_vec);
+}
+
+static void gen_vavgr_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    do_vavg(vece, t, a, b, tcg_gen_shri_vec, tcg_gen_or_vec);
+}
+
+static void do_vavg_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vavg_s,
+            .fno = gen_helper_vavg_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vavg_s,
+            .fno = gen_helper_vavg_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vavg_s,
+            .fno = gen_helper_vavg_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vavg_s,
+            .fno = gen_helper_vavg_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void do_vavg_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vavg_u,
+            .fno = gen_helper_vavg_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vavg_u,
+            .fno = gen_helper_vavg_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vavg_u,
+            .fno = gen_helper_vavg_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vavg_u,
+            .fno = gen_helper_vavg_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vavg_b, gvec_vvv, MO_8, do_vavg_s)
+TRANS(vavg_h, gvec_vvv, MO_16, do_vavg_s)
+TRANS(vavg_w, gvec_vvv, MO_32, do_vavg_s)
+TRANS(vavg_d, gvec_vvv, MO_64, do_vavg_s)
+TRANS(vavg_bu, gvec_vvv, MO_8, do_vavg_u)
+TRANS(vavg_hu, gvec_vvv, MO_16, do_vavg_u)
+TRANS(vavg_wu, gvec_vvv, MO_32, do_vavg_u)
+TRANS(vavg_du, gvec_vvv, MO_64, do_vavg_u)
+
+static void do_vavgr_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vavgr_s,
+            .fno = gen_helper_vavgr_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vavgr_s,
+            .fno = gen_helper_vavgr_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vavgr_s,
+            .fno = gen_helper_vavgr_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vavgr_s,
+            .fno = gen_helper_vavgr_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void do_vavgr_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vavgr_u,
+            .fno = gen_helper_vavgr_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vavgr_u,
+            .fno = gen_helper_vavgr_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vavgr_u,
+            .fno = gen_helper_vavgr_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vavgr_u,
+            .fno = gen_helper_vavgr_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vavgr_b, gvec_vvv, MO_8, do_vavgr_s)
+TRANS(vavgr_h, gvec_vvv, MO_16, do_vavgr_s)
+TRANS(vavgr_w, gvec_vvv, MO_32, do_vavgr_s)
+TRANS(vavgr_d, gvec_vvv, MO_64, do_vavgr_s)
+TRANS(vavgr_bu, gvec_vvv, MO_8, do_vavgr_u)
+TRANS(vavgr_hu, gvec_vvv, MO_16, do_vavgr_u)
+TRANS(vavgr_wu, gvec_vvv, MO_32, do_vavgr_u)
+TRANS(vavgr_du, gvec_vvv, MO_64, do_vavgr_u)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ee16155b31..4a44380259 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -602,3 +602,20 @@ vaddwod_h_bu_b   0111 00000100 00000 ..... ..... .....    @vvv
 vaddwod_w_hu_h   0111 00000100 00001 ..... ..... .....    @vvv
 vaddwod_d_wu_w   0111 00000100 00010 ..... ..... .....    @vvv
 vaddwod_q_du_d   0111 00000100 00011 ..... ..... .....    @vvv
+
+vavg_b           0111 00000110 01000 ..... ..... .....    @vvv
+vavg_h           0111 00000110 01001 ..... ..... .....    @vvv
+vavg_w           0111 00000110 01010 ..... ..... .....    @vvv
+vavg_d           0111 00000110 01011 ..... ..... .....    @vvv
+vavg_bu          0111 00000110 01100 ..... ..... .....    @vvv
+vavg_hu          0111 00000110 01101 ..... ..... .....    @vvv
+vavg_wu          0111 00000110 01110 ..... ..... .....    @vvv
+vavg_du          0111 00000110 01111 ..... ..... .....    @vvv
+vavgr_b          0111 00000110 10000 ..... ..... .....    @vvv
+vavgr_h          0111 00000110 10001 ..... ..... .....    @vvv
+vavgr_w          0111 00000110 10010 ..... ..... .....    @vvv
+vavgr_d          0111 00000110 10011 ..... ..... .....    @vvv
+vavgr_bu         0111 00000110 10100 ..... ..... .....    @vvv
+vavgr_hu         0111 00000110 10101 ..... ..... .....    @vvv
+vavgr_wu         0111 00000110 10110 ..... ..... .....    @vvv
+vavgr_du         0111 00000110 10111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 96b052c95a..b539eea6ad 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -328,3 +328,48 @@ void HELPER(vaddwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
 DO_ODD_U_S(vaddwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_ADD)
 DO_ODD_U_S(vaddwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_ADD)
 DO_ODD_U_S(vaddwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_ADD)
+
+#define DO_VAVG(a, b)  (((a) >> 1) + ((b) >> 1) + ((a) & (b) & 1))
+#define DO_VAVGR(a, b) (((a) >> 1) + ((b) >> 1) + (((a) | (b)) & 1))
+
+#define DO_VAVG_S(NAME, BIT, E, DO_OP)                      \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i));               \
+    }                                                       \
+}
+
+DO_VAVG_S(vavg_b, 8, B, DO_VAVG)
+DO_VAVG_S(vavg_h, 16, H, DO_VAVG)
+DO_VAVG_S(vavg_w, 32, W, DO_VAVG)
+DO_VAVG_S(vavg_d, 64, D, DO_VAVG)
+DO_VAVG_S(vavgr_b, 8, B, DO_VAVGR)
+DO_VAVG_S(vavgr_h, 16, H, DO_VAVGR)
+DO_VAVG_S(vavgr_w, 32, W, DO_VAVGR)
+DO_VAVG_S(vavgr_d, 64, D, DO_VAVGR)
+
+#define DO_VAVG_U(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)Vk->E(i));         \
+    }                                                       \
+}
+
+DO_VAVG_U(vavg_bu, 8, uint8_t, B, DO_VAVG)
+DO_VAVG_U(vavg_hu, 16, uint16_t, H, DO_VAVG)
+DO_VAVG_U(vavg_wu, 32, uint32_t, W, DO_VAVG)
+DO_VAVG_U(vavg_du, 64, uint64_t, D, DO_VAVG)
+DO_VAVG_U(vavgr_bu, 8, uint8_t, B, DO_VAVGR)
+DO_VAVG_U(vavgr_hu, 16, uint16_t, H, DO_VAVGR)
+DO_VAVG_U(vavgr_wu, 32, uint32_t, W, DO_VAVGR)
+DO_VAVG_U(vavgr_du, 64, uint64_t, D, DO_VAVGR)
-- 
2.31.1




* [RFC PATCH v2 12/44] target/loongarch: Implement vabsd
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (10 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr Song Gao
@ 2023-03-28  3:05 ` Song Gao
  2023-03-28 20:32   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 13/44] target/loongarch: Implement vadda Song Gao
                   ` (31 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VABSD.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 ++
 target/loongarch/helper.h                   |  9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 95 +++++++++++++++++++++
 target/loongarch/insns.decode               |  9 ++
 target/loongarch/lsx_helper.c               | 36 ++++++++
 5 files changed, 158 insertions(+)
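
Note on the vector path: gen_vabsd_s and gen_vabsd_u compute max(a, b) - min(a, b), which equals |a - b| without a compare-and-select. The sketch below is not part of the patch. It is a scalar model for 8-bit lanes, with hypothetical names (absd_s8, absd_u8, check_absd), checked against abs() on widened values. For signed lanes the true difference can exceed the signed range (127 - (-128) = 255), so the sketch returns the lane's unsigned bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Signed lanes: smax, smin, then a subtraction that wraps modulo 2^8. */
static uint8_t absd_s8(int8_t a, int8_t b)
{
    int8_t hi = a > b ? a : b;
    int8_t lo = a > b ? b : a;
    return (uint8_t)(hi - lo);
}

/* Unsigned lanes: umax, umin, then the same subtraction. */
static uint8_t absd_u8(uint8_t a, uint8_t b)
{
    uint8_t hi = a > b ? a : b;
    uint8_t lo = a > b ? b : a;
    return (uint8_t)(hi - lo);
}

/* Exhaustive check against |a - b| computed in int. */
static void check_absd(void)
{
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            assert(absd_s8((int8_t)a, (int8_t)b) == (uint8_t)abs(a - b));
        }
    }
    for (int a = 0; a <= 255; a++) {
        for (int b = 0; b <= 255; b++) {
            assert(absd_u8((uint8_t)a, (uint8_t)b) == (uint8_t)abs(a - b));
        }
    }
}
```

The max/min form needs only the three opcodes named in vecop_list (max, min, sub), which is presumably why it was chosen over a compare plus select.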

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e7592e7a34..e98ea37793 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -925,3 +925,12 @@ INSN_LSX(vavgr_bu,         vvv)
 INSN_LSX(vavgr_hu,         vvv)
 INSN_LSX(vavgr_wu,         vvv)
 INSN_LSX(vavgr_du,         vvv)
+
+INSN_LSX(vabsd_b,          vvv)
+INSN_LSX(vabsd_h,          vvv)
+INSN_LSX(vabsd_w,          vvv)
+INSN_LSX(vabsd_d,          vvv)
+INSN_LSX(vabsd_bu,         vvv)
+INSN_LSX(vabsd_hu,         vvv)
+INSN_LSX(vabsd_wu,         vvv)
+INSN_LSX(vabsd_du,         vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 021fe3cd60..a2f1999997 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -214,3 +214,12 @@ DEF_HELPER_FLAGS_4(vavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vabsd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 512fe947f6..3a75347db1 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1174,3 +1174,98 @@ TRANS(vavgr_bu, gvec_vvv, MO_8, do_vavgr_u)
 TRANS(vavgr_hu, gvec_vvv, MO_16, do_vavgr_u)
 TRANS(vavgr_wu, gvec_vvv, MO_32, do_vavgr_u)
 TRANS(vavgr_du, gvec_vvv, MO_64, do_vavgr_u)
+
+static void gen_vabsd_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_smax_vec(vece, t, a, b);
+    tcg_gen_smin_vec(vece, a, a, b);
+    tcg_gen_sub_vec(vece, t, t, a);
+}
+
+static void do_vabsd_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_smax_vec, INDEX_op_smin_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vabsd_s,
+            .fno = gen_helper_vabsd_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vabsd_s,
+            .fno = gen_helper_vabsd_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vabsd_s,
+            .fno = gen_helper_vabsd_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vabsd_s,
+            .fno = gen_helper_vabsd_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void gen_vabsd_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_umax_vec(vece, t, a, b);
+    tcg_gen_umin_vec(vece, a, a, b);
+    tcg_gen_sub_vec(vece, t, t, a);
+}
+
+static void do_vabsd_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_umax_vec, INDEX_op_umin_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vabsd_b, gvec_vvv, MO_8, do_vabsd_s)
+TRANS(vabsd_h, gvec_vvv, MO_16, do_vabsd_s)
+TRANS(vabsd_w, gvec_vvv, MO_32, do_vabsd_s)
+TRANS(vabsd_d, gvec_vvv, MO_64, do_vabsd_s)
+TRANS(vabsd_bu, gvec_vvv, MO_8, do_vabsd_u)
+TRANS(vabsd_hu, gvec_vvv, MO_16, do_vabsd_u)
+TRANS(vabsd_wu, gvec_vvv, MO_32, do_vabsd_u)
+TRANS(vabsd_du, gvec_vvv, MO_64, do_vabsd_u)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4a44380259..825ddedf4d 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -619,3 +619,12 @@ vavgr_bu         0111 00000110 10100 ..... ..... .....    @vvv
 vavgr_hu         0111 00000110 10101 ..... ..... .....    @vvv
 vavgr_wu         0111 00000110 10110 ..... ..... .....    @vvv
 vavgr_du         0111 00000110 10111 ..... ..... .....    @vvv
+
+vabsd_b          0111 00000110 00000 ..... ..... .....    @vvv
+vabsd_h          0111 00000110 00001 ..... ..... .....    @vvv
+vabsd_w          0111 00000110 00010 ..... ..... .....    @vvv
+vabsd_d          0111 00000110 00011 ..... ..... .....    @vvv
+vabsd_bu         0111 00000110 00100 ..... ..... .....    @vvv
+vabsd_hu         0111 00000110 00101 ..... ..... .....    @vvv
+vabsd_wu         0111 00000110 00110 ..... ..... .....    @vvv
+vabsd_du         0111 00000110 00111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b539eea6ad..18d566feaa 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -373,3 +373,39 @@ DO_VAVG_U(vavgr_bu, 8, uint8_t, B, DO_VAVGR)
 DO_VAVG_U(vavgr_hu, 16, uint16_t, H, DO_VAVGR)
 DO_VAVG_U(vavgr_wu, 32, uint32_t, W, DO_VAVGR)
 DO_VAVG_U(vavgr_du, 64, uint64_t, D, DO_VAVGR)
+
+#define DO_VABSD(a, b)  (((a) > (b)) ? ((a) - (b)) : ((b) - (a)))
+
+#define DO_VABSD_S(NAME, BIT, E, DO_OP)                     \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i));               \
+    }                                                       \
+}
+
+DO_VABSD_S(vabsd_b, 8, B, DO_VABSD)
+DO_VABSD_S(vabsd_h, 16, H, DO_VABSD)
+DO_VABSD_S(vabsd_w, 32, W, DO_VABSD)
+DO_VABSD_S(vabsd_d, 64, D, DO_VABSD)
+
+#define DO_VABSD_U(NAME, BIT, T, E, DO_OP)                  \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)Vk->E(i));         \
+    }                                                       \
+}
+
+DO_VABSD_U(vabsd_bu, 8, uint8_t, B, DO_VABSD)
+DO_VABSD_U(vabsd_hu, 16, uint16_t, H, DO_VABSD)
+DO_VABSD_U(vabsd_wu, 32, uint32_t, W, DO_VABSD)
+DO_VABSD_U(vabsd_du, 64, uint64_t, D, DO_VABSD)
-- 
2.31.1




* [RFC PATCH v2 13/44] target/loongarch: Implement vadda
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (11 preceding siblings ...)
  2023-03-28  3:05 ` [RFC PATCH v2 12/44] target/loongarch: Implement vabsd Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-03-28 20:33   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin Song Gao
                   ` (30 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VADDA.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++
 target/loongarch/helper.h                   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 53 +++++++++++++++++++++
 target/loongarch/insns.decode               |  5 ++
 target/loongarch/lsx_helper.c               | 19 ++++++++
 5 files changed, 87 insertions(+)
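
Note on edge cases: VADDA adds the absolute values of the two lanes, and the interesting input is the most negative value, whose absolute value does not fit in the signed lane. The sketch below is not part of the patch. It is a scalar model with hypothetical names (adda_s8, adda_s64, check_adda) that keeps only the low bits of the sum, as a store into the lane would. The 64-bit variant does its arithmetic in uint64_t because portable ISO C leaves -INT64_MIN undefined; builds that enable wrapping signed arithmetic (for example with -fwrapv) do not need that detour.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit lane: operands promote to int, so -(-128) is simply 128. */
static uint8_t adda_s8(int8_t a, int8_t b)
{
    int wa = a < 0 ? -a : a;
    int wb = b < 0 ? -b : b;
    return (uint8_t)(wa + wb);   /* keep the low 8 bits of the sum */
}

/* 64-bit lane: negate in unsigned arithmetic, which wraps by definition. */
static uint64_t adda_s64(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t wa = a < 0 ? 0 - ua : ua;
    uint64_t wb = b < 0 ? 0 - ub : ub;
    return wa + wb;              /* wraps modulo 2^64 */
}

/* Spot checks, including the most negative inputs. */
static void check_adda(void)
{
    assert(adda_s8(-3, 4) == 7);
    assert(adda_s8(-128, 0) == 128);
    assert(adda_s8(-128, -128) == 0);   /* 128 + 128 = 256 wraps to 0 */
    assert(adda_s64(-1, -2) == 3);
    assert(adda_s64(INT64_MIN, 0) == UINT64_C(0x8000000000000000));
}
```

Whether the wrapped results match real hardware is not something this sketch can confirm; it only models the C arithmetic.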

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e98ea37793..1f61e67d1f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -934,3 +934,8 @@ INSN_LSX(vabsd_bu,         vvv)
 INSN_LSX(vabsd_hu,         vvv)
 INSN_LSX(vabsd_wu,         vvv)
 INSN_LSX(vabsd_du,         vvv)
+
+INSN_LSX(vadda_b,          vvv)
+INSN_LSX(vadda_h,          vvv)
+INSN_LSX(vadda_w,          vvv)
+INSN_LSX(vadda_d,          vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a2f1999997..37685ded2c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -223,3 +223,8 @@ DEF_HELPER_FLAGS_4(vabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 3a75347db1..a3fcb47c4f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1269,3 +1269,56 @@ TRANS(vabsd_bu, gvec_vvv, MO_8, do_vabsd_u)
 TRANS(vabsd_hu, gvec_vvv, MO_16, do_vabsd_u)
 TRANS(vabsd_wu, gvec_vvv, MO_32, do_vabsd_u)
 TRANS(vabsd_du, gvec_vvv, MO_64, do_vabsd_u)
+
+static void gen_vadda(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+
+    tcg_gen_abs_vec(vece, t1, a);
+    tcg_gen_abs_vec(vece, t2, b);
+    tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void do_vadda(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                     uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_abs_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vadda,
+            .fno = gen_helper_vadda_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vadda,
+            .fno = gen_helper_vadda_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vadda,
+            .fno = gen_helper_vadda_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vadda,
+            .fno = gen_helper_vadda_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vadda_b, gvec_vvv, MO_8, do_vadda)
+TRANS(vadda_h, gvec_vvv, MO_16, do_vadda)
+TRANS(vadda_w, gvec_vvv, MO_32, do_vadda)
+TRANS(vadda_d, gvec_vvv, MO_64, do_vadda)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 825ddedf4d..6cb22f9297 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -628,3 +628,8 @@ vabsd_bu         0111 00000110 00100 ..... ..... .....    @vvv
 vabsd_hu         0111 00000110 00101 ..... ..... .....    @vvv
 vabsd_wu         0111 00000110 00110 ..... ..... .....    @vvv
 vabsd_du         0111 00000110 00111 ..... ..... .....    @vvv
+
+vadda_b          0111 00000101 11000 ..... ..... .....    @vvv
+vadda_h          0111 00000101 11001 ..... ..... .....    @vvv
+vadda_w          0111 00000101 11010 ..... ..... .....    @vvv
+vadda_d          0111 00000101 11011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 18d566feaa..c28eb62cff 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -409,3 +409,22 @@ DO_VABSD_U(vabsd_bu, 8, uint8_t, B, DO_VABSD)
 DO_VABSD_U(vabsd_hu, 16, uint16_t, H, DO_VABSD)
 DO_VABSD_U(vabsd_wu, 32, uint32_t, W, DO_VABSD)
 DO_VABSD_U(vabsd_du, 64, uint64_t, D, DO_VABSD)
+
+#define DO_VABS(a)  (((a) < 0) ? (-(a)) : (a))
+
+#define DO_VADDA(NAME, BIT, E, DO_OP)                       \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP(Vj->E(i)) + DO_OP(Vk->E(i));       \
+    }                                                       \
+}
+
+DO_VADDA(vadda_b, 8, B, DO_VABS)
+DO_VADDA(vadda_h, 16, H, DO_VABS)
+DO_VADDA(vadda_w, 32, W, DO_VABS)
+DO_VADDA(vadda_d, 64, D, DO_VABS)
-- 
2.31.1




* [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (12 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 13/44] target/loongarch: Implement vadda Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-03-28 20:39   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
                   ` (29 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMAX[I].{B/H/W/D}[U];
- VMIN[I].{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  33 +++
 target/loongarch/helper.h                   |  18 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 211 ++++++++++++++++++++
 target/loongarch/insns.decode               |  35 ++++
 target/loongarch/lsx_helper.c               |  43 ++++
 5 files changed, 340 insertions(+)
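
Note on the immediate forms: do_vminmax duplicates the immediate into every lane of a temporary vector and then applies the ordinary vector min or max. The sketch below is not part of the patch. It is a scalar model with hypothetical names (SKETCH_LANES, sketch_vmini_b, sketch_vmaxi_bu, check_minmax_imm), and it assumes a 128-bit register holding sixteen 8-bit lanes. It makes no claim about which immediate ranges the encoding allows.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LANES 16   /* assumption: 128-bit register, 8-bit lanes */

/* Signed minimum of each lane against one immediate. */
static void sketch_vmini_b(int8_t *vd, const int8_t *vj, int8_t imm)
{
    for (int i = 0; i < SKETCH_LANES; i++) {
        vd[i] = vj[i] < imm ? vj[i] : imm;
    }
}

/* Unsigned maximum of each lane against one immediate. */
static void sketch_vmaxi_bu(uint8_t *vd, const uint8_t *vj, uint8_t imm)
{
    for (int i = 0; i < SKETCH_LANES; i++) {
        vd[i] = vj[i] > imm ? vj[i] : imm;
    }
}

/* Spot checks on a ramp of inputs. */
static void check_minmax_imm(void)
{
    int8_t  sj[SKETCH_LANES], sd[SKETCH_LANES];
    uint8_t uj[SKETCH_LANES], ud[SKETCH_LANES];

    for (int i = 0; i < SKETCH_LANES; i++) {
        sj[i] = (int8_t)(i - 8);     /* -8 .. 7 */
        uj[i] = (uint8_t)(i * 16);   /* 0, 16, .. 240 */
    }
    sketch_vmini_b(sd, sj, -3);
    sketch_vmaxi_bu(ud, uj, 20);

    assert(sd[0] == -8);    /* already below the immediate */
    assert(sd[15] == -3);   /* clamped down to the immediate */
    assert(ud[0] == 20);    /* raised to the immediate */
    assert(ud[15] == 240);  /* already above the immediate */
}
```

The signed and unsigned variants differ only in the comparison, which mirrors how the gen_vmini_* and gen_vmaxi_* wrappers pass a different min/max generator to the shared do_vminmax.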

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1f61e67d1f..6b0e518bfa 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -939,3 +939,36 @@ INSN_LSX(vadda_b,          vvv)
 INSN_LSX(vadda_h,          vvv)
 INSN_LSX(vadda_w,          vvv)
 INSN_LSX(vadda_d,          vvv)
+
+INSN_LSX(vmax_b,           vvv)
+INSN_LSX(vmax_h,           vvv)
+INSN_LSX(vmax_w,           vvv)
+INSN_LSX(vmax_d,           vvv)
+INSN_LSX(vmin_b,           vvv)
+INSN_LSX(vmin_h,           vvv)
+INSN_LSX(vmin_w,           vvv)
+INSN_LSX(vmin_d,           vvv)
+INSN_LSX(vmax_bu,          vvv)
+INSN_LSX(vmax_hu,          vvv)
+INSN_LSX(vmax_wu,          vvv)
+INSN_LSX(vmax_du,          vvv)
+INSN_LSX(vmin_bu,          vvv)
+INSN_LSX(vmin_hu,          vvv)
+INSN_LSX(vmin_wu,          vvv)
+INSN_LSX(vmin_du,          vvv)
+INSN_LSX(vmaxi_b,          vv_i)
+INSN_LSX(vmaxi_h,          vv_i)
+INSN_LSX(vmaxi_w,          vv_i)
+INSN_LSX(vmaxi_d,          vv_i)
+INSN_LSX(vmini_b,          vv_i)
+INSN_LSX(vmini_h,          vv_i)
+INSN_LSX(vmini_w,          vv_i)
+INSN_LSX(vmini_d,          vv_i)
+INSN_LSX(vmaxi_bu,         vv_i)
+INSN_LSX(vmaxi_hu,         vv_i)
+INSN_LSX(vmaxi_wu,         vv_i)
+INSN_LSX(vmaxi_du,         vv_i)
+INSN_LSX(vmini_bu,         vv_i)
+INSN_LSX(vmini_hu,         vv_i)
+INSN_LSX(vmini_wu,         vv_i)
+INSN_LSX(vmini_du,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 37685ded2c..f0fc7760bd 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -228,3 +228,21 @@ DEF_HELPER_FLAGS_4(vadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vmaxi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a3fcb47c4f..4e2f1ff097 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1322,3 +1322,214 @@ TRANS(vadda_b, gvec_vvv, MO_8, do_vadda)
 TRANS(vadda_h, gvec_vvv, MO_16, do_vadda)
 TRANS(vadda_w, gvec_vvv, MO_32, do_vadda)
 TRANS(vadda_d, gvec_vvv, MO_64, do_vadda)
+
+TRANS(vmax_b, gvec_vvv, MO_8, tcg_gen_gvec_smax)
+TRANS(vmax_h, gvec_vvv, MO_16, tcg_gen_gvec_smax)
+TRANS(vmax_w, gvec_vvv, MO_32, tcg_gen_gvec_smax)
+TRANS(vmax_d, gvec_vvv, MO_64, tcg_gen_gvec_smax)
+TRANS(vmax_bu, gvec_vvv, MO_8, tcg_gen_gvec_umax)
+TRANS(vmax_hu, gvec_vvv, MO_16, tcg_gen_gvec_umax)
+TRANS(vmax_wu, gvec_vvv, MO_32, tcg_gen_gvec_umax)
+TRANS(vmax_du, gvec_vvv, MO_64, tcg_gen_gvec_umax)
+
+TRANS(vmin_b, gvec_vvv, MO_8, tcg_gen_gvec_smin)
+TRANS(vmin_h, gvec_vvv, MO_16, tcg_gen_gvec_smin)
+TRANS(vmin_w, gvec_vvv, MO_32, tcg_gen_gvec_smin)
+TRANS(vmin_d, gvec_vvv, MO_64, tcg_gen_gvec_smin)
+TRANS(vmin_bu, gvec_vvv, MO_8, tcg_gen_gvec_umin)
+TRANS(vmin_hu, gvec_vvv, MO_16, tcg_gen_gvec_umin)
+TRANS(vmin_wu, gvec_vvv, MO_32, tcg_gen_gvec_umin)
+TRANS(vmin_du, gvec_vvv, MO_64, tcg_gen_gvec_umin)
+
+static void do_vminmax(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm,
+                       void (*gen_vminmax_vec)(unsigned,
+                                               TCGv_vec, TCGv_vec, TCGv_vec))
+{
+    TCGv_vec t1;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_dupi_vec(vece, t1, imm);
+    gen_vminmax_vec(vece, t, a, t1);
+}
+
+static void gen_vmini_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_vminmax(vece, t, a, imm, tcg_gen_smin_vec);
+}
+
+static void gen_vmini_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_vminmax(vece, t, a, imm, tcg_gen_umin_vec);
+}
+
+static void gen_vmaxi_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_vminmax(vece, t, a, imm, tcg_gen_smax_vec);
+}
+
+static void gen_vmaxi_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_vminmax(vece, t, a, imm, tcg_gen_umax_vec);
+}
+
+static void do_vmini_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_smin_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vmini_s,
+            .fnoi = gen_helper_vmini_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmini_s,
+            .fnoi = gen_helper_vmini_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmini_s,
+            .fnoi = gen_helper_vmini_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmini_s,
+            .fnoi = gen_helper_vmini_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+static void do_vmini_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_umin_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vmini_u,
+            .fnoi = gen_helper_vmini_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmini_u,
+            .fnoi = gen_helper_vmini_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmini_u,
+            .fnoi = gen_helper_vmini_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmini_u,
+            .fnoi = gen_helper_vmini_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(vmini_b, gvec_vv_i, MO_8, do_vmini_s)
+TRANS(vmini_h, gvec_vv_i, MO_16, do_vmini_s)
+TRANS(vmini_w, gvec_vv_i, MO_32, do_vmini_s)
+TRANS(vmini_d, gvec_vv_i, MO_64, do_vmini_s)
+TRANS(vmini_bu, gvec_vv_i, MO_8, do_vmini_u)
+TRANS(vmini_hu, gvec_vv_i, MO_16, do_vmini_u)
+TRANS(vmini_wu, gvec_vv_i, MO_32, do_vmini_u)
+TRANS(vmini_du, gvec_vv_i, MO_64, do_vmini_u)
+
+static void do_vmaxi_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_smax_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vmaxi_s,
+            .fnoi = gen_helper_vmaxi_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmaxi_s,
+            .fnoi = gen_helper_vmaxi_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaxi_s,
+            .fnoi = gen_helper_vmaxi_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaxi_s,
+            .fnoi = gen_helper_vmaxi_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+static void do_vmaxi_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_umax_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vmaxi_u,
+            .fnoi = gen_helper_vmaxi_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmaxi_u,
+            .fnoi = gen_helper_vmaxi_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaxi_u,
+            .fnoi = gen_helper_vmaxi_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaxi_u,
+            .fnoi = gen_helper_vmaxi_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(vmaxi_b, gvec_vv_i, MO_8, do_vmaxi_s)
+TRANS(vmaxi_h, gvec_vv_i, MO_16, do_vmaxi_s)
+TRANS(vmaxi_w, gvec_vv_i, MO_32, do_vmaxi_s)
+TRANS(vmaxi_d, gvec_vv_i, MO_64, do_vmaxi_s)
+TRANS(vmaxi_bu, gvec_vv_i, MO_8, do_vmaxi_u)
+TRANS(vmaxi_hu, gvec_vv_i, MO_16, do_vmaxi_u)
+TRANS(vmaxi_wu, gvec_vv_i, MO_32, do_vmaxi_u)
+TRANS(vmaxi_du, gvec_vv_i, MO_64, do_vmaxi_u)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6cb22f9297..dd1bc031e8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -500,6 +500,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
+@vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -633,3 +634,37 @@ vadda_b          0111 00000101 11000 ..... ..... .....    @vvv
 vadda_h          0111 00000101 11001 ..... ..... .....    @vvv
 vadda_w          0111 00000101 11010 ..... ..... .....    @vvv
 vadda_d          0111 00000101 11011 ..... ..... .....    @vvv
+
+vmax_b           0111 00000111 00000 ..... ..... .....    @vvv
+vmax_h           0111 00000111 00001 ..... ..... .....    @vvv
+vmax_w           0111 00000111 00010 ..... ..... .....    @vvv
+vmax_d           0111 00000111 00011 ..... ..... .....    @vvv
+vmaxi_b          0111 00101001 00000 ..... ..... .....    @vv_i5
+vmaxi_h          0111 00101001 00001 ..... ..... .....    @vv_i5
+vmaxi_w          0111 00101001 00010 ..... ..... .....    @vv_i5
+vmaxi_d          0111 00101001 00011 ..... ..... .....    @vv_i5
+vmax_bu          0111 00000111 01000 ..... ..... .....    @vvv
+vmax_hu          0111 00000111 01001 ..... ..... .....    @vvv
+vmax_wu          0111 00000111 01010 ..... ..... .....    @vvv
+vmax_du          0111 00000111 01011 ..... ..... .....    @vvv
+vmaxi_bu         0111 00101001 01000 ..... ..... .....    @vv_ui5
+vmaxi_hu         0111 00101001 01001 ..... ..... .....    @vv_ui5
+vmaxi_wu         0111 00101001 01010 ..... ..... .....    @vv_ui5
+vmaxi_du         0111 00101001 01011 ..... ..... .....    @vv_ui5
+
+vmin_b           0111 00000111 00100 ..... ..... .....    @vvv
+vmin_h           0111 00000111 00101 ..... ..... .....    @vvv
+vmin_w           0111 00000111 00110 ..... ..... .....    @vvv
+vmin_d           0111 00000111 00111 ..... ..... .....    @vvv
+vmini_b          0111 00101001 00100 ..... ..... .....    @vv_i5
+vmini_h          0111 00101001 00101 ..... ..... .....    @vv_i5
+vmini_w          0111 00101001 00110 ..... ..... .....    @vv_i5
+vmini_d          0111 00101001 00111 ..... ..... .....    @vv_i5
+vmin_bu          0111 00000111 01100 ..... ..... .....    @vvv
+vmin_hu          0111 00000111 01101 ..... ..... .....    @vvv
+vmin_wu          0111 00000111 01110 ..... ..... .....    @vvv
+vmin_du          0111 00000111 01111 ..... ..... .....    @vvv
+vmini_bu         0111 00101001 01100 ..... ..... .....    @vv_ui5
+vmini_hu         0111 00101001 01101 ..... ..... .....    @vv_ui5
+vmini_wu         0111 00101001 01110 ..... ..... .....    @vv_ui5
+vmini_du         0111 00101001 01111 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index c28eb62cff..e2ee419e0a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -428,3 +428,46 @@ DO_VADDA(vadda_b, 8, B, DO_VABS)
 DO_VADDA(vadda_h, 16, H, DO_VABS)
 DO_VADDA(vadda_w, 32, W, DO_VABS)
 DO_VADDA(vadda_d, 64, D, DO_VABS)
+
+#define DO_MIN(a, b) ((a) < (b) ? (a) : (b))
+#define DO_MAX(a, b) ((a) > (b) ? (a) : (b))
+
+#define DO_VMINMAXI_S(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = DO_OP(Vj->E(i), (T)imm);                     \
+    }                                                           \
+}
+
+DO_VMINMAXI_S(vmini_b, 8, int8_t, B, DO_MIN)
+DO_VMINMAXI_S(vmini_h, 16, int16_t, H, DO_MIN)
+DO_VMINMAXI_S(vmini_w, 32, int32_t, W, DO_MIN)
+DO_VMINMAXI_S(vmini_d, 64, int64_t, D, DO_MIN)
+DO_VMINMAXI_S(vmaxi_b, 8, int8_t, B, DO_MAX)
+DO_VMINMAXI_S(vmaxi_h, 16, int16_t, H, DO_MAX)
+DO_VMINMAXI_S(vmaxi_w, 32, int32_t, W, DO_MAX)
+DO_VMINMAXI_S(vmaxi_d, 64, int64_t, D, DO_MAX)
+
+#define DO_VMINMAXI_U(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)imm);                  \
+    }                                                           \
+}
+
+DO_VMINMAXI_U(vmini_bu, 8, uint8_t, B, DO_MIN)
+DO_VMINMAXI_U(vmini_hu, 16, uint16_t, H, DO_MIN)
+DO_VMINMAXI_U(vmini_wu, 32, uint32_t, W, DO_MIN)
+DO_VMINMAXI_U(vmini_du, 64, uint64_t, D, DO_MIN)
+DO_VMINMAXI_U(vmaxi_bu, 8, uint8_t, B, DO_MAX)
+DO_VMINMAXI_U(vmaxi_hu, 16, uint16_t, H, DO_MAX)
+DO_VMINMAXI_U(vmaxi_wu, 32, uint32_t, W, DO_MAX)
+DO_VMINMAXI_U(vmaxi_du, 64, uint64_t, D, DO_MAX)
-- 
2.31.1




* [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (13 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-03-28 20:46   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
                   ` (28 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMUL.{B/H/W/D};
- VMUH.{B/H/W/D}[U];
- VMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
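For readers without access to the ISA manual, the semantics of the
instructions listed above can be sketched as a scalar reference model.
This is only an illustration inferred from the helpers and gvec
expansions in this patch, not code from the series: the ref_* names are
hypothetical, only the halfword forms are shown, and the arrays stand in
for the lanes of a 128-bit register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference model (not part of the patch).
 * A 128-bit LSX register is viewed as 8 halfword or 4 word lanes.
 */

/* VMUH.H: keep the high 16 bits of each signed 16x16 -> 32 product. */
static void ref_vmuh_h(int16_t d[8], const int16_t j[8], const int16_t k[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t p = (int32_t)j[i] * (int32_t)k[i];
        /* Assumes arithmetic right shift of negative values (gcc/clang). */
        d[i] = (int16_t)(p >> 16);
    }
}

/*
 * VMULWEV.W.H (odd == 0) / VMULWOD.W.H (odd == 1):
 * multiply the even or odd signed halfwords, widening to a word result.
 */
static void ref_vmulw_w_h(int32_t d[4], const int16_t j[8],
                          const int16_t k[8], int odd)
{
    assert(odd == 0 || odd == 1);
    for (int i = 0; i < 4; i++) {
        d[i] = (int32_t)j[2 * i + odd] * (int32_t)k[2 * i + odd];
    }
}

/*
 * VMULWEV.W.HU.H / VMULWOD.W.HU.H: as above, but the Vj element is
 * zero-extended and the Vk element is sign-extended, matching the
 * shri/sari pairing in gen_vmulwev_u_s() and gen_vmulwod_u_s().
 */
static void ref_vmulw_w_hu_h(int32_t d[4], const uint16_t j[8],
                             const int16_t k[8], int odd)
{
    assert(odd == 0 || odd == 1);
    for (int i = 0; i < 4; i++) {
        d[i] = (int32_t)j[2 * i + odd] * (int32_t)k[2 * i + odd];
    }
}
```

The gvec expansions in the patch compute the same values in place: the
shift pairs on each double-width lane extract and extend the even or odd
half, and a single mul_vec on the wide lane then gives the widened
product.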

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  38 ++
 target/loongarch/helper.h                   |  36 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 378 ++++++++++++++++++++
 target/loongarch/insns.decode               |  38 ++
 target/loongarch/lsx_helper.c               | 140 ++++++++
 5 files changed, 630 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6b0e518bfa..48e6ef5309 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -972,3 +972,41 @@ INSN_LSX(vmini_bu,         vv_i)
 INSN_LSX(vmini_hu,         vv_i)
 INSN_LSX(vmini_wu,         vv_i)
 INSN_LSX(vmini_du,         vv_i)
+
+INSN_LSX(vmul_b,           vvv)
+INSN_LSX(vmul_h,           vvv)
+INSN_LSX(vmul_w,           vvv)
+INSN_LSX(vmul_d,           vvv)
+INSN_LSX(vmuh_b,           vvv)
+INSN_LSX(vmuh_h,           vvv)
+INSN_LSX(vmuh_w,           vvv)
+INSN_LSX(vmuh_d,           vvv)
+INSN_LSX(vmuh_bu,          vvv)
+INSN_LSX(vmuh_hu,          vvv)
+INSN_LSX(vmuh_wu,          vvv)
+INSN_LSX(vmuh_du,          vvv)
+
+INSN_LSX(vmulwev_h_b,      vvv)
+INSN_LSX(vmulwev_w_h,      vvv)
+INSN_LSX(vmulwev_d_w,      vvv)
+INSN_LSX(vmulwev_q_d,      vvv)
+INSN_LSX(vmulwod_h_b,      vvv)
+INSN_LSX(vmulwod_w_h,      vvv)
+INSN_LSX(vmulwod_d_w,      vvv)
+INSN_LSX(vmulwod_q_d,      vvv)
+INSN_LSX(vmulwev_h_bu,     vvv)
+INSN_LSX(vmulwev_w_hu,     vvv)
+INSN_LSX(vmulwev_d_wu,     vvv)
+INSN_LSX(vmulwev_q_du,     vvv)
+INSN_LSX(vmulwod_h_bu,     vvv)
+INSN_LSX(vmulwod_w_hu,     vvv)
+INSN_LSX(vmulwod_d_wu,     vvv)
+INSN_LSX(vmulwod_q_du,     vvv)
+INSN_LSX(vmulwev_h_bu_b,   vvv)
+INSN_LSX(vmulwev_w_hu_h,   vvv)
+INSN_LSX(vmulwev_d_wu_w,   vvv)
+INSN_LSX(vmulwev_q_du_d,   vvv)
+INSN_LSX(vmulwod_h_bu_b,   vvv)
+INSN_LSX(vmulwod_w_hu_h,   vvv)
+INSN_LSX(vmulwod_d_wu_w,   vvv)
+INSN_LSX(vmulwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index f0fc7760bd..437b47fa78 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -246,3 +246,39 @@ DEF_HELPER_FLAGS_4(vmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vmuh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 4e2f1ff097..583b608cd2 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1533,3 +1533,381 @@ TRANS(vmaxi_bu, gvec_vv_i, MO_8, do_vmaxi_u)
 TRANS(vmaxi_hu, gvec_vv_i, MO_16, do_vmaxi_u)
 TRANS(vmaxi_wu, gvec_vv_i, MO_32, do_vmaxi_u)
 TRANS(vmaxi_du, gvec_vv_i, MO_64, do_vmaxi_u)
+
+TRANS(vmul_b, gvec_vvv, MO_8, tcg_gen_gvec_mul)
+TRANS(vmul_h, gvec_vvv, MO_16, tcg_gen_gvec_mul)
+TRANS(vmul_w, gvec_vvv, MO_32, tcg_gen_gvec_mul)
+TRANS(vmul_d, gvec_vvv, MO_64, tcg_gen_gvec_mul)
+
+static void do_vmuh_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 op[4] = {
+        {
+            .fno = gen_helper_vmuh_b,
+            .vece = MO_8
+        },
+        {
+            .fno = gen_helper_vmuh_h,
+            .vece = MO_16
+        },
+        {
+            .fno = gen_helper_vmuh_w,
+            .vece = MO_32
+        },
+        {
+            .fno = gen_helper_vmuh_d,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmuh_b, gvec_vvv, MO_8, do_vmuh_s)
+TRANS(vmuh_h, gvec_vvv, MO_16, do_vmuh_s)
+TRANS(vmuh_w, gvec_vvv, MO_32, do_vmuh_s)
+TRANS(vmuh_d, gvec_vvv, MO_64, do_vmuh_s)
+
+static void do_vmuh_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 op[4] = {
+        {
+            .fno = gen_helper_vmuh_bu,
+            .vece = MO_8
+        },
+        {
+            .fno = gen_helper_vmuh_hu,
+            .vece = MO_16
+        },
+        {
+            .fno = gen_helper_vmuh_wu,
+            .vece = MO_32
+        },
+        {
+            .fno = gen_helper_vmuh_du,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmuh_bu, gvec_vvv, MO_8,  do_vmuh_u)
+TRANS(vmuh_hu, gvec_vvv, MO_16, do_vmuh_u)
+TRANS(vmuh_wu, gvec_vvv, MO_32, do_vmuh_u)
+TRANS(vmuh_du, gvec_vvv, MO_64, do_vmuh_u)
+
+static void gen_vmulwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwev_s,
+            .fno = gen_helper_vmulwev_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwev_s,
+            .fno = gen_helper_vmulwev_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwev_s,
+            .fno = gen_helper_vmulwev_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwev_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwev_h_b, gvec_vvv, MO_8, do_vmulwev_s)
+TRANS(vmulwev_w_h, gvec_vvv, MO_16, do_vmulwev_s)
+TRANS(vmulwev_d_w, gvec_vvv, MO_32, do_vmulwev_s)
+TRANS(vmulwev_q_d, gvec_vvv, MO_64, do_vmulwev_s)
+
+static void gen_vmulwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_sari_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwod_s,
+            .fno = gen_helper_vmulwod_h_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwod_s,
+            .fno = gen_helper_vmulwod_w_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwod_s,
+            .fno = gen_helper_vmulwod_d_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwod_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwod_h_b, gvec_vvv, MO_8, do_vmulwod_s)
+TRANS(vmulwod_w_h, gvec_vvv, MO_16, do_vmulwod_s)
+TRANS(vmulwod_d_w, gvec_vvv, MO_32, do_vmulwod_s)
+TRANS(vmulwod_q_d, gvec_vvv, MO_64, do_vmulwod_s)
+
+static void gen_vmulwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_shri_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwev_u,
+            .fno = gen_helper_vmulwev_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwev_u,
+            .fno = gen_helper_vmulwev_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwev_u,
+            .fno = gen_helper_vmulwev_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwev_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwev_h_bu, gvec_vvv, MO_8, do_vmulwev_u)
+TRANS(vmulwev_w_hu, gvec_vvv, MO_16, do_vmulwev_u)
+TRANS(vmulwev_d_wu, gvec_vvv, MO_32, do_vmulwev_u)
+TRANS(vmulwev_q_du, gvec_vvv, MO_64, do_vmulwev_u)
+
+static void gen_vmulwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwod_u,
+            .fno = gen_helper_vmulwod_h_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwod_u,
+            .fno = gen_helper_vmulwod_w_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwod_u,
+            .fno = gen_helper_vmulwod_d_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwod_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwod_h_bu, gvec_vvv, MO_8, do_vmulwod_u)
+TRANS(vmulwod_w_hu, gvec_vvv, MO_16, do_vmulwod_u)
+TRANS(vmulwod_d_wu, gvec_vvv, MO_32, do_vmulwod_u)
+TRANS(vmulwod_q_du, gvec_vvv, MO_64, do_vmulwod_u)
+
+static void gen_vmulwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec,
+        INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwev_u_s,
+            .fno = gen_helper_vmulwev_h_bu_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwev_u_s,
+            .fno = gen_helper_vmulwev_w_hu_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwev_u_s,
+            .fno = gen_helper_vmulwev_d_wu_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwev_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwev_h_bu_b, gvec_vvv, MO_8, do_vmulwev_u_s)
+TRANS(vmulwev_w_hu_h, gvec_vvv, MO_16, do_vmulwev_u_s)
+TRANS(vmulwev_d_wu_w, gvec_vvv, MO_32, do_vmulwev_u_s)
+TRANS(vmulwev_q_du_d, gvec_vvv, MO_64, do_vmulwev_u_s)
+
+static void gen_vmulwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t, t1, t2);
+}
+
+static void do_vmulwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmulwod_u_s,
+            .fno = gen_helper_vmulwod_h_bu_b,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmulwod_u_s,
+            .fno = gen_helper_vmulwod_w_hu_h,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmulwod_u_s,
+            .fno = gen_helper_vmulwod_d_wu_w,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmulwod_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmulwod_h_bu_b, gvec_vvv, MO_8, do_vmulwod_u_s)
+TRANS(vmulwod_w_hu_h, gvec_vvv, MO_16, do_vmulwod_u_s)
+TRANS(vmulwod_d_wu_w, gvec_vvv, MO_32, do_vmulwod_u_s)
+TRANS(vmulwod_q_du_d, gvec_vvv, MO_64, do_vmulwod_u_s)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index dd1bc031e8..64e8042c9c 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -668,3 +668,41 @@ vmini_bu         0111 00101001 01100 ..... ..... .....    @vv_ui5
 vmini_hu         0111 00101001 01101 ..... ..... .....    @vv_ui5
 vmini_wu         0111 00101001 01110 ..... ..... .....    @vv_ui5
 vmini_du         0111 00101001 01111 ..... ..... .....    @vv_ui5
+
+vmul_b           0111 00001000 01000 ..... ..... .....    @vvv
+vmul_h           0111 00001000 01001 ..... ..... .....    @vvv
+vmul_w           0111 00001000 01010 ..... ..... .....    @vvv
+vmul_d           0111 00001000 01011 ..... ..... .....    @vvv
+vmuh_b           0111 00001000 01100 ..... ..... .....    @vvv
+vmuh_h           0111 00001000 01101 ..... ..... .....    @vvv
+vmuh_w           0111 00001000 01110 ..... ..... .....    @vvv
+vmuh_d           0111 00001000 01111 ..... ..... .....    @vvv
+vmuh_bu          0111 00001000 10000 ..... ..... .....    @vvv
+vmuh_hu          0111 00001000 10001 ..... ..... .....    @vvv
+vmuh_wu          0111 00001000 10010 ..... ..... .....    @vvv
+vmuh_du          0111 00001000 10011 ..... ..... .....    @vvv
+
+vmulwev_h_b      0111 00001001 00000 ..... ..... .....    @vvv
+vmulwev_w_h      0111 00001001 00001 ..... ..... .....    @vvv
+vmulwev_d_w      0111 00001001 00010 ..... ..... .....    @vvv
+vmulwev_q_d      0111 00001001 00011 ..... ..... .....    @vvv
+vmulwod_h_b      0111 00001001 00100 ..... ..... .....    @vvv
+vmulwod_w_h      0111 00001001 00101 ..... ..... .....    @vvv
+vmulwod_d_w      0111 00001001 00110 ..... ..... .....    @vvv
+vmulwod_q_d      0111 00001001 00111 ..... ..... .....    @vvv
+vmulwev_h_bu     0111 00001001 10000 ..... ..... .....    @vvv
+vmulwev_w_hu     0111 00001001 10001 ..... ..... .....    @vvv
+vmulwev_d_wu     0111 00001001 10010 ..... ..... .....    @vvv
+vmulwev_q_du     0111 00001001 10011 ..... ..... .....    @vvv
+vmulwod_h_bu     0111 00001001 10100 ..... ..... .....    @vvv
+vmulwod_w_hu     0111 00001001 10101 ..... ..... .....    @vvv
+vmulwod_d_wu     0111 00001001 10110 ..... ..... .....    @vvv
+vmulwod_q_du     0111 00001001 10111 ..... ..... .....    @vvv
+vmulwev_h_bu_b   0111 00001010 00000 ..... ..... .....    @vvv
+vmulwev_w_hu_h   0111 00001010 00001 ..... ..... .....    @vvv
+vmulwev_d_wu_w   0111 00001010 00010 ..... ..... .....    @vvv
+vmulwev_q_du_d   0111 00001010 00011 ..... ..... .....    @vvv
+vmulwod_h_bu_b   0111 00001010 00100 ..... ..... .....    @vvv
+vmulwod_w_hu_h   0111 00001010 00101 ..... ..... .....    @vvv
+vmulwod_d_wu_w   0111 00001010 00110 ..... ..... .....    @vvv
+vmulwod_q_du_d   0111 00001010 00111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e2ee419e0a..cfc74f08d8 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -471,3 +471,143 @@ DO_VMINMAXI_U(vmaxi_bu, 8, uint8_t, B, DO_MAX)
 DO_VMINMAXI_U(vmaxi_hu, 16, uint16_t, H, DO_MAX)
 DO_VMINMAXI_U(vmaxi_wu, 32, uint32_t, W, DO_MAX)
 DO_VMINMAXI_U(vmaxi_du, 64, uint64_t, D, DO_MAX)
+
+#define DO_VMUH_S(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = ((T)Vj->E(i)) * ((T)Vk->E(i)) >> BIT;    \
+    }                                                       \
+}
+
+void HELPER(vmuh_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    uint64_t l, h1, h2;
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    muls64(&l, &h1, Vj->D(0), Vk->D(0));
+    muls64(&l, &h2, Vj->D(1), Vk->D(1));
+
+    Vd->D(0) = h1;
+    Vd->D(1) = h2;
+}
+
+DO_VMUH_S(vmuh_b, 8, int16_t, B, DO_MUH)
+DO_VMUH_S(vmuh_h, 16, int32_t, H, DO_MUH)
+DO_VMUH_S(vmuh_w, 32, int64_t, W, DO_MUH)
+
+#define DO_VMUH_U(NAME, BIT, T, T2, E, DO_OP)                   \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)     \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+    VReg *Vk = (VReg *)vk;                                      \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = ((T)((T2)Vj->E(i)) * ((T2)Vk->E(i))) >> BIT; \
+    }                                                           \
+}
+
+void HELPER(vmuh_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    uint64_t l, h1, h2;
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    mulu64(&l, &h1, Vj->D(0), Vk->D(0));
+    mulu64(&l, &h2, Vj->D(1), Vk->D(1));
+
+    Vd->D(0) = h1;
+    Vd->D(1) = h2;
+}
+
+DO_VMUH_U(vmuh_bu, 8, uint16_t, uint8_t, B, DO_MUH)
+DO_VMUH_U(vmuh_hu, 16, uint32_t, uint16_t, H, DO_MUH)
+DO_VMUH_U(vmuh_wu, 32, uint64_t, uint32_t, W, DO_MUH)
+
+#define DO_MUL(a, b) ((a) * (b))
+
+void HELPER(vmulwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_makes64(Vj->D(0)) * int128_makes64(Vk->D(0));
+}
+
+DO_EVEN_S(vmulwev_h_b, 16, int16_t, H, B, DO_MUL)
+DO_EVEN_S(vmulwev_w_h, 32, int32_t, W, H, DO_MUL)
+DO_EVEN_S(vmulwev_d_w, 64, int64_t, D, W, DO_MUL)
+
+void HELPER(vmulwod_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_makes64(Vj->D(1)) * int128_makes64(Vk->D(1));
+}
+
+DO_ODD_S(vmulwod_h_b, 16, int16_t, H, B, DO_MUL)
+DO_ODD_S(vmulwod_w_h, 32, int32_t, W, H, DO_MUL)
+DO_ODD_S(vmulwod_d_w, 64, int64_t, D, W, DO_MUL)
+
+void HELPER(vmulwev_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_make64(Vj->D(0)) * int128_make64(Vk->D(0));
+}
+
+DO_EVEN_U(vmulwev_h_bu, 16, uint16_t, uint8_t, H, B, DO_MUL)
+DO_EVEN_U(vmulwev_w_hu, 32, uint32_t, uint16_t, W, H, DO_MUL)
+DO_EVEN_U(vmulwev_d_wu, 64, uint64_t, uint32_t, D, W, DO_MUL)
+
+void HELPER(vmulwod_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_make64(Vj->D(1)) * int128_make64(Vk->D(1));
+}
+
+DO_ODD_U(vmulwod_h_bu, 16, uint16_t, uint8_t, H, B, DO_MUL)
+DO_ODD_U(vmulwod_w_hu, 32, uint32_t, uint16_t, W, H, DO_MUL)
+DO_ODD_U(vmulwod_d_wu, 64, uint64_t, uint32_t, D, W, DO_MUL)
+
+void HELPER(vmulwev_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_make64(Vj->D(0)) * int128_makes64(Vk->D(0));
+}
+
+DO_EVEN_U_S(vmulwev_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
+DO_EVEN_U_S(vmulwev_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
+DO_EVEN_U_S(vmulwev_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
+
+void HELPER(vmulwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+{
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+    VReg *Vk = (VReg *)vk;
+
+    Vd->Q(0) = int128_make64(Vj->D(1)) * int128_makes64(Vk->D(1));
+}
+
+DO_ODD_U_S(vmulwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
+DO_ODD_U_S(vmulwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
+DO_ODD_U_S(vmulwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
-- 
2.31.1




* [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (14 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-03-28 20:50   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod Song Gao
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMADD.{B/H/W/D};
- VMSUB.{B/H/W/D};
- VMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
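Note for readers: the widening multiply-add forms listed above can be
read as "take the even (EV) or odd (OD) narrow elements of Vj and Vk,
widen them, multiply, and accumulate into the wide elements of Vd".
Below is a minimal scalar sketch of two of them, under stated
assumptions: ExampleVReg and the example_* functions are hypothetical
names used only for illustration and are not part of this patch, and
the lane order assumes plain array indexing rather than the B()/H()
accessors used in lsx_helper.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit LSX register (not QEMU's VReg). */
typedef union {
    int8_t  B[16];   /* sixteen 8-bit lanes */
    int16_t H[8];    /* eight 16-bit lanes  */
} ExampleVReg;

/*
 * Sketch of VMADDWEV.H.B: both sources are treated as signed.
 * Vd.H[i] += sext(Vj.B[2 * i]) * sext(Vk.B[2 * i])
 */
static void example_vmaddwev_h_b(ExampleVReg *vd, const ExampleVReg *vj,
                                 const ExampleVReg *vk)
{
    for (int i = 0; i < 8; i++) {
        int16_t a = vj->B[2 * i];          /* sign-extend even lane */
        int16_t b = vk->B[2 * i];
        /* Result is truncated to 16 bits, as in the helper macros. */
        vd->H[i] = (int16_t)(vd->H[i] + a * b);
    }
}

/*
 * Sketch of VMADDWOD.H.BU.B: Vj is unsigned, Vk is signed.
 * Vd.H[i] += zext(Vj.B[2 * i + 1]) * sext(Vk.B[2 * i + 1])
 */
static void example_vmaddwod_h_bu_b(ExampleVReg *vd, const ExampleVReg *vj,
                                    const ExampleVReg *vk)
{
    for (int i = 0; i < 8; i++) {
        uint16_t a = (uint8_t)vj->B[2 * i + 1];  /* zero-extend odd lane */
        int16_t b = vk->B[2 * i + 1];            /* sign-extend odd lane */
        vd->H[i] = (int16_t)(vd->H[i] + a * b);
    }
}
```

The sketch relies on two's-complement narrowing when the sum is stored
back into a 16-bit lane. The .Q.D variants follow the same pattern with
a single 64x64->128 product per register.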
 target/loongarch/disas.c                    |  34 ++
 target/loongarch/helper.h                   |  36 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 454 ++++++++++++++++++++
 target/loongarch/insns.decode               |  34 ++
 target/loongarch/lsx_helper.c               | 114 +++++
 5 files changed, 672 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 48e6ef5309..980e6e6375 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1010,3 +1010,37 @@ INSN_LSX(vmulwod_h_bu_b,   vvv)
 INSN_LSX(vmulwod_w_hu_h,   vvv)
 INSN_LSX(vmulwod_d_wu_w,   vvv)
 INSN_LSX(vmulwod_q_du_d,   vvv)
+
+INSN_LSX(vmadd_b,          vvv)
+INSN_LSX(vmadd_h,          vvv)
+INSN_LSX(vmadd_w,          vvv)
+INSN_LSX(vmadd_d,          vvv)
+INSN_LSX(vmsub_b,          vvv)
+INSN_LSX(vmsub_h,          vvv)
+INSN_LSX(vmsub_w,          vvv)
+INSN_LSX(vmsub_d,          vvv)
+
+INSN_LSX(vmaddwev_h_b,     vvv)
+INSN_LSX(vmaddwev_w_h,     vvv)
+INSN_LSX(vmaddwev_d_w,     vvv)
+INSN_LSX(vmaddwev_q_d,     vvv)
+INSN_LSX(vmaddwod_h_b,     vvv)
+INSN_LSX(vmaddwod_w_h,     vvv)
+INSN_LSX(vmaddwod_d_w,     vvv)
+INSN_LSX(vmaddwod_q_d,     vvv)
+INSN_LSX(vmaddwev_h_bu,    vvv)
+INSN_LSX(vmaddwev_w_hu,    vvv)
+INSN_LSX(vmaddwev_d_wu,    vvv)
+INSN_LSX(vmaddwev_q_du,    vvv)
+INSN_LSX(vmaddwod_h_bu,    vvv)
+INSN_LSX(vmaddwod_w_hu,    vvv)
+INSN_LSX(vmaddwod_d_wu,    vvv)
+INSN_LSX(vmaddwod_q_du,    vvv)
+INSN_LSX(vmaddwev_h_bu_b,  vvv)
+INSN_LSX(vmaddwev_w_hu_h,  vvv)
+INSN_LSX(vmaddwev_d_wu_w,  vvv)
+INSN_LSX(vmaddwev_q_du_d,  vvv)
+INSN_LSX(vmaddwod_h_bu_b,  vvv)
+INSN_LSX(vmaddwod_w_hu_h,  vvv)
+INSN_LSX(vmaddwod_d_wu_w,  vvv)
+INSN_LSX(vmaddwod_q_du_d,  vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 437b47fa78..6bb273fefe 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -282,3 +282,39 @@ DEF_HELPER_FLAGS_4(vmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 583b608cd2..29c7aca8f9 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1911,3 +1911,457 @@ TRANS(vmulwod_h_bu_b, gvec_vvv, MO_8, do_vmulwod_u_s)
 TRANS(vmulwod_w_hu_h, gvec_vvv, MO_16, do_vmulwod_u_s)
 TRANS(vmulwod_d_wu_w, gvec_vvv, MO_32, do_vmulwod_u_s)
 TRANS(vmulwod_q_du_d, gvec_vvv, MO_64, do_vmulwod_u_s)
+
+static void gen_vmadd(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_mul_vec(vece, t1, a, b);
+    tcg_gen_add_vec(vece, t, t, t1);
+}
+
+static void do_vmadd(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                     uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmadd,
+            .fno = gen_helper_vmadd_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmadd,
+            .fno = gen_helper_vmadd_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmadd,
+            .fno = gen_helper_vmadd_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmadd,
+            .fno = gen_helper_vmadd_d,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmadd_b, gvec_vvv, MO_8, do_vmadd)
+TRANS(vmadd_h, gvec_vvv, MO_16, do_vmadd)
+TRANS(vmadd_w, gvec_vvv, MO_32, do_vmadd)
+TRANS(vmadd_d, gvec_vvv, MO_64, do_vmadd)
+
+static void gen_vmsub(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_mul_vec(vece, t1, a, b);
+    tcg_gen_sub_vec(vece, t, t, t1);
+}
+
+static void do_vmsub(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                     uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_mul_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmsub,
+            .fno = gen_helper_vmsub_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vmsub,
+            .fno = gen_helper_vmsub_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmsub,
+            .fno = gen_helper_vmsub_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmsub,
+            .fno = gen_helper_vmsub_d,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmsub_b, gvec_vvv, MO_8, do_vmsub)
+TRANS(vmsub_h, gvec_vvv, MO_16, do_vmsub)
+TRANS(vmsub_w, gvec_vvv, MO_32, do_vmsub)
+TRANS(vmsub_d, gvec_vvv, MO_64, do_vmsub)
+
+static void gen_vmaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                          uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_sari_vec,
+        INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwev_s,
+            .fno = gen_helper_vmaddwev_h_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwev_s,
+            .fno = gen_helper_vmaddwev_w_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwev_s,
+            .fno = gen_helper_vmaddwev_d_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwev_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwev_h_b, gvec_vvv, MO_8, do_vmaddwev_s)
+TRANS(vmaddwev_w_h, gvec_vvv, MO_16, do_vmaddwev_s)
+TRANS(vmaddwev_d_w, gvec_vvv, MO_32, do_vmaddwev_s)
+TRANS(vmaddwev_q_d, gvec_vvv, MO_64, do_vmaddwev_s)
+
+static void gen_vmaddwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_sari_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                          uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_sari_vec, INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwod_s,
+            .fno = gen_helper_vmaddwod_h_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwod_s,
+            .fno = gen_helper_vmaddwod_w_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwod_s,
+            .fno = gen_helper_vmaddwod_d_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwod_q_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwod_h_b, gvec_vvv, MO_8, do_vmaddwod_s)
+TRANS(vmaddwod_w_h, gvec_vvv, MO_16, do_vmaddwod_s)
+TRANS(vmaddwod_d_w, gvec_vvv, MO_32, do_vmaddwod_s)
+TRANS(vmaddwod_q_d, gvec_vvv, MO_64, do_vmaddwod_s)
+
+static void gen_vmaddwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_shri_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                          uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec,
+        INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwev_u,
+            .fno = gen_helper_vmaddwev_h_bu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwev_u,
+            .fno = gen_helper_vmaddwev_w_hu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwev_u,
+            .fno = gen_helper_vmaddwev_d_wu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwev_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwev_h_bu, gvec_vvv, MO_8, do_vmaddwev_u)
+TRANS(vmaddwev_w_hu, gvec_vvv, MO_16, do_vmaddwev_u)
+TRANS(vmaddwev_d_wu, gvec_vvv, MO_32, do_vmaddwev_u)
+TRANS(vmaddwev_q_du, gvec_vvv, MO_64, do_vmaddwev_u)
+
+static void gen_vmaddwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                          uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwod_u,
+            .fno = gen_helper_vmaddwod_h_bu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwod_u,
+            .fno = gen_helper_vmaddwod_w_hu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwod_u,
+            .fno = gen_helper_vmaddwod_d_wu,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwod_q_du,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwod_h_bu, gvec_vvv, MO_8, do_vmaddwod_u)
+TRANS(vmaddwod_w_hu, gvec_vvv, MO_16, do_vmaddwod_u)
+TRANS(vmaddwod_d_wu, gvec_vvv, MO_32, do_vmaddwod_u)
+TRANS(vmaddwod_q_du, gvec_vvv, MO_64, do_vmaddwod_u)
+
+static void gen_vmaddwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_shli_vec(vece, t1, a, halfbits);
+    tcg_gen_shri_vec(vece, t1, t1, halfbits);
+    tcg_gen_shli_vec(vece, t2, b, halfbits);
+    tcg_gen_sari_vec(vece, t2, t2, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                            uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_shri_vec,
+        INDEX_op_sari_vec, INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwev_u_s,
+            .fno = gen_helper_vmaddwev_h_bu_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwev_u_s,
+            .fno = gen_helper_vmaddwev_w_hu_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwev_u_s,
+            .fno = gen_helper_vmaddwev_d_wu_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwev_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwev_h_bu_b, gvec_vvv, MO_8, do_vmaddwev_u_s)
+TRANS(vmaddwev_w_hu_h, gvec_vvv, MO_16, do_vmaddwev_u_s)
+TRANS(vmaddwev_d_wu_w, gvec_vvv, MO_32, do_vmaddwev_u_s)
+TRANS(vmaddwev_q_du_d, gvec_vvv, MO_64, do_vmaddwev_u_s)
+
+static void gen_vmaddwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2, t3;
+    int halfbits = 4 << vece;
+
+    t1 = tcg_temp_new_vec_matching(a);
+    t2 = tcg_temp_new_vec_matching(b);
+    t3 = tcg_temp_new_vec_matching(t);
+    tcg_gen_shri_vec(vece, t1, a, halfbits);
+    tcg_gen_sari_vec(vece, t2, b, halfbits);
+    tcg_gen_mul_vec(vece, t3, t1, t2);
+    tcg_gen_add_vec(vece, t, t, t3);
+}
+
+static void do_vmaddwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                            uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, INDEX_op_sari_vec,
+        INDEX_op_mul_vec, INDEX_op_add_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vmaddwod_u_s,
+            .fno = gen_helper_vmaddwod_h_bu_b,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vmaddwod_u_s,
+            .fno = gen_helper_vmaddwod_w_hu_h,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vmaddwod_u_s,
+            .fno = gen_helper_vmaddwod_d_wu_w,
+            .load_dest = true,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_vmaddwod_q_du_d,
+            .vece = MO_128
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vmaddwod_h_bu_b, gvec_vvv, MO_8, do_vmaddwod_u_s)
+TRANS(vmaddwod_w_hu_h, gvec_vvv, MO_16, do_vmaddwod_u_s)
+TRANS(vmaddwod_d_wu_w, gvec_vvv, MO_32, do_vmaddwod_u_s)
+TRANS(vmaddwod_q_du_d, gvec_vvv, MO_64, do_vmaddwod_u_s)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 64e8042c9c..df23d4ee1e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -706,3 +706,37 @@ vmulwod_h_bu_b   0111 00001010 00100 ..... ..... .....    @vvv
 vmulwod_w_hu_h   0111 00001010 00101 ..... ..... .....    @vvv
 vmulwod_d_wu_w   0111 00001010 00110 ..... ..... .....    @vvv
 vmulwod_q_du_d   0111 00001010 00111 ..... ..... .....    @vvv
+
+vmadd_b          0111 00001010 10000 ..... ..... .....    @vvv
+vmadd_h          0111 00001010 10001 ..... ..... .....    @vvv
+vmadd_w          0111 00001010 10010 ..... ..... .....    @vvv
+vmadd_d          0111 00001010 10011 ..... ..... .....    @vvv
+vmsub_b          0111 00001010 10100 ..... ..... .....    @vvv
+vmsub_h          0111 00001010 10101 ..... ..... .....    @vvv
+vmsub_w          0111 00001010 10110 ..... ..... .....    @vvv
+vmsub_d          0111 00001010 10111 ..... ..... .....    @vvv
+
+vmaddwev_h_b     0111 00001010 11000 ..... ..... .....    @vvv
+vmaddwev_w_h     0111 00001010 11001 ..... ..... .....    @vvv
+vmaddwev_d_w     0111 00001010 11010 ..... ..... .....    @vvv
+vmaddwev_q_d     0111 00001010 11011 ..... ..... .....    @vvv
+vmaddwod_h_b     0111 00001010 11100 ..... ..... .....    @vvv
+vmaddwod_w_h     0111 00001010 11101 ..... ..... .....    @vvv
+vmaddwod_d_w     0111 00001010 11110 ..... ..... .....    @vvv
+vmaddwod_q_d     0111 00001010 11111 ..... ..... .....    @vvv
+vmaddwev_h_bu    0111 00001011 01000 ..... ..... .....    @vvv
+vmaddwev_w_hu    0111 00001011 01001 ..... ..... .....    @vvv
+vmaddwev_d_wu    0111 00001011 01010 ..... ..... .....    @vvv
+vmaddwev_q_du    0111 00001011 01011 ..... ..... .....    @vvv
+vmaddwod_h_bu    0111 00001011 01100 ..... ..... .....    @vvv
+vmaddwod_w_hu    0111 00001011 01101 ..... ..... .....    @vvv
+vmaddwod_d_wu    0111 00001011 01110 ..... ..... .....    @vvv
+vmaddwod_q_du    0111 00001011 01111 ..... ..... .....    @vvv
+vmaddwev_h_bu_b  0111 00001011 11000 ..... ..... .....    @vvv
+vmaddwev_w_hu_h  0111 00001011 11001 ..... ..... .....    @vvv
+vmaddwev_d_wu_w  0111 00001011 11010 ..... ..... .....    @vvv
+vmaddwev_q_du_d  0111 00001011 11011 ..... ..... .....    @vvv
+vmaddwod_h_bu_b  0111 00001011 11100 ..... ..... .....    @vvv
+vmaddwod_w_hu_h  0111 00001011 11101 ..... ..... .....    @vvv
+vmaddwod_d_wu_w  0111 00001011 11110 ..... ..... .....    @vvv
+vmaddwod_q_du_d  0111 00001011 11111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index cfc74f08d8..9ae56e9fcb 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -611,3 +611,117 @@ void HELPER(vmulwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
 DO_ODD_U_S(vmulwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
 DO_ODD_U_S(vmulwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
 DO_ODD_U_S(vmulwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
+
+#define DO_MADD(a, b, c)  ((a) + (b) * (c))
+#define DO_MSUB(a, b, c)  ((a) - (b) * (c))
+
+#define VMADDSUB(NAME, BIT, E, DO_OP)                       \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP(Vd->E(i), Vj->E(i), Vk->E(i));     \
+    }                                                       \
+}
+
+VMADDSUB(vmadd_b, 8, B, DO_MADD)
+VMADDSUB(vmadd_h, 16, H, DO_MADD)
+VMADDSUB(vmadd_w, 32, W, DO_MADD)
+VMADDSUB(vmadd_d, 64, D, DO_MADD)
+VMADDSUB(vmsub_b, 8, B, DO_MSUB)
+VMADDSUB(vmsub_h, 16, H, DO_MSUB)
+VMADDSUB(vmsub_w, 32, W, DO_MSUB)
+VMADDSUB(vmsub_d, 64, D, DO_MSUB)
+
+#define VMADD_Q(NAME, FN1, FN2, index)                            \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)       \
+{                                                                 \
+    VReg *Vd = (VReg *)vd;                                        \
+    VReg *Vj = (VReg *)vj;                                        \
+    VReg *Vk = (VReg *)vk;                                        \
+                                                                  \
+    Vd->Q(0) = int128_add(Vd->Q(0),                               \
+                          FN1(Vj->D(index)) * FN2(Vk->D(index))); \
+}
+
+#define VMADDWEV(NAME, BIT, T1, T2, E1, E2, DO_OP)                        \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v)               \
+{                                                                         \
+    int i;                                                                \
+    VReg *Vd = (VReg *)vd;                                                \
+    VReg *Vj = (VReg *)vj;                                                \
+    VReg *Vk = (VReg *)vk;                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                   \
+        Vd->E1(i) += DO_OP((T1)(T2)Vj->E2(2 * i), (T1)(T2)Vk->E2(2 * i)); \
+    }                                                                     \
+}
+
+VMADDWEV(vmaddwev_h_b, 16, int16_t, int8_t, H, B, DO_MUL)
+VMADDWEV(vmaddwev_w_h, 32, int32_t, int16_t, W, H, DO_MUL)
+VMADDWEV(vmaddwev_d_w, 64, int64_t, int32_t, D, W, DO_MUL)
+VMADD_Q(vmaddwev_q_d, int128_makes64, int128_makes64, 0)
+VMADDWEV(vmaddwev_h_bu, 16, uint16_t, uint8_t, H, B, DO_MUL)
+VMADDWEV(vmaddwev_w_hu, 32, uint32_t, uint16_t, W, H, DO_MUL)
+VMADDWEV(vmaddwev_d_wu, 64, uint64_t, uint32_t, D, W, DO_MUL)
+VMADD_Q(vmaddwev_q_du, int128_make64, int128_make64, 0)
+
+#define VMADDWOD(NAME, BIT, T1, T2, E1, E2, DO_OP)          \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E1(i) += DO_OP((T1)(T2)Vj->E2(2 * i + 1),       \
+                           (T1)(T2)Vk->E2(2 * i + 1));      \
+    }                                                       \
+}
+
+VMADDWOD(vmaddwod_h_b, 16, int16_t, int8_t,  H, B, DO_MUL)
+VMADDWOD(vmaddwod_w_h, 32, int32_t, int16_t, W, H, DO_MUL)
+VMADDWOD(vmaddwod_d_w, 64, int64_t, int32_t, D, W, DO_MUL)
+VMADD_Q(vmaddwod_q_d, int128_makes64, int128_makes64, 1)
+VMADDWOD(vmaddwod_h_bu, 16, uint16_t, uint8_t, H, B, DO_MUL)
+VMADDWOD(vmaddwod_w_hu, 32, uint32_t, uint16_t, W, H, DO_MUL)
+VMADDWOD(vmaddwod_d_wu, 64, uint64_t, uint32_t, D, W, DO_MUL)
+VMADD_Q(vmaddwod_q_du, int128_make64, int128_make64, 1)
+
+#define VMADDWEV_U_S(NAME, BIT, T1, T2, TS, E1, E2, DO_OP)  \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E1(i) += DO_OP((T1)(T2)Vj->E2(2 * i),           \
+                           (TS)Vk->E2(2 * i));              \
+    }                                                       \
+}
+
+VMADDWEV_U_S(vmaddwev_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
+VMADDWEV_U_S(vmaddwev_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
+VMADDWEV_U_S(vmaddwev_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
+VMADD_Q(vmaddwev_q_du_d, int128_make64, int128_makes64, 0)
+
+#define VMADDWOD_U_S(NAME, BIT, T1, T2, TS, E1, E2, DO_OP)  \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E1(i) += DO_OP((T1)(T2)Vj->E2(2 * i + 1),       \
+                           (TS)Vk->E2(2 * i + 1));          \
+    }                                                       \
+}
+
+VMADDWOD_U_S(vmaddwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
+VMADDWOD_U_S(vmaddwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
+VMADDWOD_U_S(vmaddwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
+VMADD_Q(vmaddwod_q_du_d, int128_make64, int128_makes64, 1)
-- 
2.31.1
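
For readers following the helpers above, here is a minimal stand-alone sketch
of what VMADDWOD_U_S computes in the h_bu_b case: each odd-numbered byte of vj
is zero-extended, the matching byte of vk is sign-extended, and the product is
accumulated into a 16-bit lane. The name maddwod_h_bu_b and the plain-array
register layout are assumptions made for illustration; the patch itself works
on VReg through the H()/B() accessors. The accumulator is kept unsigned here
so that wrap-around is well defined in ISO C.

```c
#include <stdint.h>

/* Hypothetical illustration only, not part of the patch. */
void maddwod_h_bu_b(uint16_t acc[8], const int8_t vj[16], const int8_t vk[16])
{
    for (int i = 0; i < 8; i++) {
        int32_t a = (uint8_t)vj[2 * i + 1];   /* unsigned source: zero-extend */
        int32_t b = vk[2 * i + 1];            /* signed source: sign-extend */

        /* Accumulate modulo 2^16, as a 16-bit lane would. */
        acc[i] = (uint16_t)(acc[i] + (uint32_t)(a * b));
    }
}
```

The even-element variant (VMADDWEV_U_S) would differ only in reading index
2 * i instead of 2 * i + 1.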




* [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (15 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-03-28 20:52   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 18/44] target/loongarch: Implement vsat Song Gao
                   ` (26 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VDIV.{B/H/W/D}[U];
- VMOD.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    | 17 +++++++++
 target/loongarch/helper.h                   | 17 +++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 +++++++++
 target/loongarch/insns.decode               | 17 +++++++++
 target/loongarch/lsx_helper.c               | 38 +++++++++++++++++++++
 5 files changed, 106 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 980e6e6375..6e4f676a42 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1044,3 +1044,20 @@ INSN_LSX(vmaddwod_h_bu_b,  vvv)
 INSN_LSX(vmaddwod_w_hu_h,  vvv)
 INSN_LSX(vmaddwod_d_wu_w,  vvv)
 INSN_LSX(vmaddwod_q_du_d,  vvv)
+
+INSN_LSX(vdiv_b,           vvv)
+INSN_LSX(vdiv_h,           vvv)
+INSN_LSX(vdiv_w,           vvv)
+INSN_LSX(vdiv_d,           vvv)
+INSN_LSX(vdiv_bu,          vvv)
+INSN_LSX(vdiv_hu,          vvv)
+INSN_LSX(vdiv_wu,          vvv)
+INSN_LSX(vdiv_du,          vvv)
+INSN_LSX(vmod_b,           vvv)
+INSN_LSX(vmod_h,           vvv)
+INSN_LSX(vmod_w,           vvv)
+INSN_LSX(vmod_d,           vvv)
+INSN_LSX(vmod_bu,          vvv)
+INSN_LSX(vmod_hu,          vvv)
+INSN_LSX(vmod_wu,          vvv)
+INSN_LSX(vmod_du,          vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6bb273fefe..e46f12cb65 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -318,3 +318,20 @@ DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_4(vdiv_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 29c7aca8f9..46a18da6dd 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2365,3 +2365,20 @@ TRANS(vmaddwod_h_bu_b, gvec_vvv, MO_8, do_vmaddwod_u_s)
 TRANS(vmaddwod_w_hu_h, gvec_vvv, MO_16, do_vmaddwod_u_s)
 TRANS(vmaddwod_d_wu_w, gvec_vvv, MO_32, do_vmaddwod_u_s)
 TRANS(vmaddwod_q_du_d, gvec_vvv, MO_64, do_vmaddwod_u_s)
+
+TRANS(vdiv_b, gen_vvv, gen_helper_vdiv_b)
+TRANS(vdiv_h, gen_vvv, gen_helper_vdiv_h)
+TRANS(vdiv_w, gen_vvv, gen_helper_vdiv_w)
+TRANS(vdiv_d, gen_vvv, gen_helper_vdiv_d)
+TRANS(vdiv_bu, gen_vvv, gen_helper_vdiv_bu)
+TRANS(vdiv_hu, gen_vvv, gen_helper_vdiv_hu)
+TRANS(vdiv_wu, gen_vvv, gen_helper_vdiv_wu)
+TRANS(vdiv_du, gen_vvv, gen_helper_vdiv_du)
+TRANS(vmod_b, gen_vvv, gen_helper_vmod_b)
+TRANS(vmod_h, gen_vvv, gen_helper_vmod_h)
+TRANS(vmod_w, gen_vvv, gen_helper_vmod_w)
+TRANS(vmod_d, gen_vvv, gen_helper_vmod_d)
+TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
+TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
+TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
+TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index df23d4ee1e..67d016edb7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -740,3 +740,20 @@ vmaddwod_h_bu_b  0111 00001011 11100 ..... ..... .....    @vvv
 vmaddwod_w_hu_h  0111 00001011 11101 ..... ..... .....    @vvv
 vmaddwod_d_wu_w  0111 00001011 11110 ..... ..... .....    @vvv
 vmaddwod_q_du_d  0111 00001011 11111 ..... ..... .....    @vvv
+
+vdiv_b           0111 00001110 00000 ..... ..... .....    @vvv
+vdiv_h           0111 00001110 00001 ..... ..... .....    @vvv
+vdiv_w           0111 00001110 00010 ..... ..... .....    @vvv
+vdiv_d           0111 00001110 00011 ..... ..... .....    @vvv
+vdiv_bu          0111 00001110 01000 ..... ..... .....    @vvv
+vdiv_hu          0111 00001110 01001 ..... ..... .....    @vvv
+vdiv_wu          0111 00001110 01010 ..... ..... .....    @vvv
+vdiv_du          0111 00001110 01011 ..... ..... .....    @vvv
+vmod_b           0111 00001110 00100 ..... ..... .....    @vvv
+vmod_h           0111 00001110 00101 ..... ..... .....    @vvv
+vmod_w           0111 00001110 00110 ..... ..... .....    @vvv
+vmod_d           0111 00001110 00111 ..... ..... .....    @vvv
+vmod_bu          0111 00001110 01100 ..... ..... .....    @vvv
+vmod_hu          0111 00001110 01101 ..... ..... .....    @vvv
+vmod_wu          0111 00001110 01110 ..... ..... .....    @vvv
+vmod_du          0111 00001110 01111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9ae56e9fcb..03a837fa74 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -725,3 +725,41 @@ VMADDWOD_U_S(vmaddwod_h_bu_b, 16, uint16_t, uint8_t, int16_t, H, B, DO_MUL)
 VMADDWOD_U_S(vmaddwod_w_hu_h, 32, uint32_t, uint16_t, int32_t, W, H, DO_MUL)
 VMADDWOD_U_S(vmaddwod_d_wu_w, 64, uint64_t, uint32_t, int64_t, D, W, DO_MUL)
 VMADD_Q(vmaddwod_q_du_d, int128_make64, int128_makes64, 1)
+
+#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
+#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
+#define DO_DIV(N, M)  (unlikely(M == 0) ? 0 :\
+        unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
+#define DO_REM(N, M)  (unlikely(M == 0) ? 0 :\
+        unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+
+#define DO_3OP(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)Vk->E(i));      \
+    }                                                    \
+}
+
+DO_3OP(vdiv_b, 8, int8_t, B, DO_DIV)
+DO_3OP(vdiv_h, 16, int16_t, H, DO_DIV)
+DO_3OP(vdiv_w, 32, int32_t, W, DO_DIV)
+DO_3OP(vdiv_d, 64, int64_t, D, DO_DIV)
+DO_3OP(vdiv_bu, 8, uint8_t, B, DO_DIVU)
+DO_3OP(vdiv_hu, 16, uint16_t, H, DO_DIVU)
+DO_3OP(vdiv_wu, 32, uint32_t, W, DO_DIVU)
+DO_3OP(vdiv_du, 64, uint64_t, D, DO_DIVU)
+DO_3OP(vmod_b, 8, int8_t, B, DO_REM)
+DO_3OP(vmod_h, 16, int16_t, H, DO_REM)
+DO_3OP(vmod_w, 32, int32_t, W, DO_REM)
+DO_3OP(vmod_d, 64, int64_t, D, DO_REM)
+DO_3OP(vmod_bu, 8, uint8_t, B, DO_REMU)
+DO_3OP(vmod_hu, 16, uint16_t, H, DO_REMU)
+DO_3OP(vmod_wu, 32, uint32_t, W, DO_REMU)
+DO_3OP(vmod_du, 64, uint64_t, D, DO_REMU)
-- 
2.31.1
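
As a reading aid, the edge cases handled by DO_DIV and DO_REM in this patch
can be sketched for a single 32-bit lane as below. The function names
lane_div_s32 and lane_rem_s32 are hypothetical. The sketch tests for
INT32_MIN explicitly, whereas the macros use (N == -N), which also matches
N == 0; the result is the same in that case. It only mirrors what the macros
return and makes no claim about the hardware specification.

```c
#include <stdint.h>

/* Hypothetical illustration only, not part of the patch. */
int32_t lane_div_s32(int32_t n, int32_t m)
{
    if (m == 0) {
        return 0;                 /* division by zero yields 0 */
    }
    if (n == INT32_MIN && m == -1) {
        return n;                 /* quotient would overflow: keep n */
    }
    return n / m;
}

int32_t lane_rem_s32(int32_t n, int32_t m)
{
    if (m == 0) {
        return 0;                 /* remainder by zero yields 0 */
    }
    if (n == INT32_MIN && m == -1) {
        return 0;                 /* avoid the overflowing n % -1 */
    }
    return n % m;
}
```

The unsigned variants (DO_DIVU, DO_REMU) need only the zero-divisor check.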




* [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (16 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:03   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 19/44] target/loongarch: Implement vexth Song Gao
                   ` (25 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSAT.{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |   9 ++
 target/loongarch/helper.h                   |   9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 105 ++++++++++++++++++++
 target/loongarch/insns.decode               |  12 +++
 target/loongarch/lsx_helper.c               |  73 ++++++++++++++
 5 files changed, 208 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6e4f676a42..b04aefe3ed 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1061,3 +1061,12 @@ INSN_LSX(vmod_bu,          vvv)
 INSN_LSX(vmod_hu,          vvv)
 INSN_LSX(vmod_wu,          vvv)
 INSN_LSX(vmod_du,          vvv)
+
+INSN_LSX(vsat_b,           vv_i)
+INSN_LSX(vsat_h,           vv_i)
+INSN_LSX(vsat_w,           vv_i)
+INSN_LSX(vsat_d,           vv_i)
+INSN_LSX(vsat_bu,          vv_i)
+INSN_LSX(vsat_hu,          vv_i)
+INSN_LSX(vsat_wu,          vv_i)
+INSN_LSX(vsat_du,          vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e46f12cb65..6345b7ef9c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -335,3 +335,12 @@ DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vsat_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 46a18da6dd..7dfb3b33f6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2382,3 +2382,108 @@ TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
 TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
 TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
 TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
+
+static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    TCGv_vec t1;
+    int64_t max = (1ll << imm) - 1;
+    int64_t min = ~max;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_dupi_vec(vece, t, min);
+    tcg_gen_smax_vec(vece, t, a, t);
+    tcg_gen_dupi_vec(vece, t1, max);
+    tcg_gen_smin_vec(vece, t, t, t1);
+}
+
+static void do_vsat_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                      int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_smax_vec, INDEX_op_smin_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vsat_s,
+            .fnoi = gen_helper_vsat_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vsat_s,
+            .fnoi = gen_helper_vsat_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vsat_s,
+            .fnoi = gen_helper_vsat_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vsat_s,
+            .fnoi = gen_helper_vsat_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(vsat_b, gvec_vv_i, MO_8, do_vsat_s)
+TRANS(vsat_h, gvec_vv_i, MO_16, do_vsat_s)
+TRANS(vsat_w, gvec_vv_i, MO_32, do_vsat_s)
+TRANS(vsat_d, gvec_vv_i, MO_64, do_vsat_s)
+
+static void gen_vsat_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    uint64_t max;
+
+    max = (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1;
+
+    tcg_gen_dupi_vec(vece, t, max);
+    tcg_gen_umin_vec(vece, t, a, t);
+}
+
+static void do_vsat_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_umin_vec, 0
+    };
+    static const GVecGen2i op[4] = {
+        {
+            .fniv = gen_vsat_u,
+            .fnoi = gen_helper_vsat_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vsat_u,
+            .fnoi = gen_helper_vsat_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vsat_u,
+            .fnoi = gen_helper_vsat_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vsat_u,
+            .fnoi = gen_helper_vsat_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(vsat_bu, gvec_vv_i, MO_8, do_vsat_u)
+TRANS(vsat_hu, gvec_vv_i, MO_16, do_vsat_u)
+TRANS(vsat_wu, gvec_vv_i, MO_32, do_vsat_u)
+TRANS(vsat_du, gvec_vv_i, MO_64, do_vsat_u)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 67d016edb7..3ed61b3d68 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -499,7 +499,10 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
+@vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
+@vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
@@ -757,3 +760,12 @@ vmod_bu          0111 00001110 01100 ..... ..... .....    @vvv
 vmod_hu          0111 00001110 01101 ..... ..... .....    @vvv
 vmod_wu          0111 00001110 01110 ..... ..... .....    @vvv
 vmod_du          0111 00001110 01111 ..... ..... .....    @vvv
+
+vsat_b           0111 00110010 01000 01 ... ..... .....   @vv_ui3
+vsat_h           0111 00110010 01000 1 .... ..... .....   @vv_ui4
+vsat_w           0111 00110010 01001 ..... ..... .....    @vv_ui5
+vsat_d           0111 00110010 0101 ...... ..... .....    @vv_ui6
+vsat_bu          0111 00110010 10000 01 ... ..... .....   @vv_ui3
+vsat_hu          0111 00110010 10000 1 .... ..... .....   @vv_ui4
+vsat_wu          0111 00110010 10001 ..... ..... .....    @vv_ui5
+vsat_du          0111 00110010 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 03a837fa74..15efc64e4e 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -763,3 +763,76 @@ DO_3OP(vmod_bu, 8, uint8_t, B, DO_REMU)
 DO_3OP(vmod_hu, 16, uint16_t, H, DO_REMU)
 DO_3OP(vmod_wu, 32, uint32_t, W, DO_REMU)
 DO_3OP(vmod_du, 64, uint64_t, D, DO_REMU)
+
+#define do_vsats(E, T)                      \
+static T do_vsats_ ## E(T s1, uint64_t imm) \
+{                                           \
+    T mask, top;                            \
+                                            \
+    mask = (1ll << imm) - 1;                \
+    top = s1 >> imm;                        \
+    if (top > 0) {                          \
+        return mask;                        \
+    } else if (top < -1) {                  \
+        return ~mask;                       \
+    } else {                                \
+        return s1;                          \
+    }                                       \
+}
+
+do_vsats(B, int8_t)
+do_vsats(H, int16_t)
+do_vsats(W, int32_t)
+do_vsats(D, int64_t)
+
+#define VSAT_S(NAME, BIT, E)                                    \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = do_vsats_ ## E(Vj->E(i), imm);               \
+    }                                                           \
+}
+
+VSAT_S(vsat_b, 8, B)
+VSAT_S(vsat_h, 16, H)
+VSAT_S(vsat_w, 32, W)
+VSAT_S(vsat_d, 64, D)
+
+#define do_vsatu(E, T)                                         \
+static T do_vsatu_ ## E(T s1, uint64_t imm)                    \
+{                                                              \
+    uint64_t max;                                              \
+                                                               \
+    max = (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1; \
+    if (s1 > (T)max) {                                         \
+        return (T)max;                                         \
+    } else {                                                   \
+        return s1;                                             \
+    }                                                          \
+}
+
+do_vsatu(B, uint8_t)
+do_vsatu(H, uint16_t)
+do_vsatu(W, uint32_t)
+do_vsatu(D, uint64_t)
+
+#define VSAT_U(NAME, BIT, T, E)                                 \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = do_vsatu_ ## E((T)Vj->E(i), imm);            \
+    }                                                           \
+}
+
+VSAT_U(vsat_bu, 8, uint8_t, B)
+VSAT_U(vsat_hu, 16, uint16_t, H)
+VSAT_U(vsat_wu, 32, uint32_t, W)
+VSAT_U(vsat_du, 64, uint64_t, D)
-- 
2.31.1
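
To make the clamping ranges concrete, here is a stand-alone sketch of the
per-element saturation for 64-bit lanes. The names sat_s64 and sat_u64 are
hypothetical. As in the helpers above, the signed form clamps to
[-(2^imm), 2^imm - 1] and the unsigned form to [0, 2^(imm + 1) - 1]. The
sketch shifts an unsigned 64-bit constant so that the shift is well defined
for every imm in 0..63, whatever the width of long on the host.

```c
#include <stdint.h>

/* Hypothetical illustration only, not part of the patch. */
int64_t sat_s64(int64_t x, unsigned imm)
{
    int64_t max = (int64_t)((UINT64_C(1) << imm) - 1);
    int64_t min = -max - 1;       /* same value as ~max */

    if (x > max) {
        return max;
    }
    if (x < min) {
        return min;
    }
    return x;
}

uint64_t sat_u64(uint64_t x, unsigned imm)
{
    /* imm == 63 would need a shift by 64, so handle it separately. */
    uint64_t max = (imm == 63) ? UINT64_MAX
                               : (UINT64_C(1) << (imm + 1)) - 1;

    return x > max ? max : x;
}
```

With imm = 7, for example, the signed form clamps to the int8_t range and the
unsigned form to the uint8_t range.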




* [RFC PATCH v2 19/44] target/loongarch: Implement vexth
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (17 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 18/44] target/loongarch: Implement vsat Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:07   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov Song Gao
                   ` (24 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VEXTH.{H.B/W.H/D.W/Q.D};
- VEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 ++++++
 target/loongarch/helper.h                   |  9 ++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 20 ++++++++++++
 target/loongarch/insns.decode               |  9 ++++++
 target/loongarch/lsx_helper.c               | 35 +++++++++++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b04aefe3ed..412c1cedcb 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1070,3 +1070,12 @@ INSN_LSX(vsat_bu,          vv_i)
 INSN_LSX(vsat_hu,          vv_i)
 INSN_LSX(vsat_wu,          vv_i)
 INSN_LSX(vsat_du,          vv_i)
+
+INSN_LSX(vexth_h_b,        vv)
+INSN_LSX(vexth_w_h,        vv)
+INSN_LSX(vexth_d_w,        vv)
+INSN_LSX(vexth_q_d,        vv)
+INSN_LSX(vexth_hu_bu,      vv)
+INSN_LSX(vexth_wu_hu,      vv)
+INSN_LSX(vexth_du_wu,      vv)
+INSN_LSX(vexth_qu_du,      vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6345b7ef9c..0876aa3331 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -344,3 +344,12 @@ DEF_HELPER_FLAGS_4(vsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(vexth_h_b, void, env, i32, i32)
+DEF_HELPER_3(vexth_w_h, void, env, i32, i32)
+DEF_HELPER_3(vexth_d_w, void, env, i32, i32)
+DEF_HELPER_3(vexth_q_d, void, env, i32, i32)
+DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
+DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
+DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
+DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7dfb3b33f6..f6058c1360 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -28,6 +28,17 @@ static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
     return true;
 }
 
+static bool gen_vv(DisasContext *ctx, arg_vv *a,
+                   void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj);
+    return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                      void (*func)(unsigned, uint32_t, uint32_t,
                                   uint32_t, uint32_t, uint32_t))
@@ -2487,3 +2498,12 @@ TRANS(vsat_bu, gvec_vv_i, MO_8, do_vsat_u)
 TRANS(vsat_hu, gvec_vv_i, MO_16, do_vsat_u)
 TRANS(vsat_wu, gvec_vv_i, MO_32, do_vsat_u)
 TRANS(vsat_du, gvec_vv_i, MO_64, do_vsat_u)
+
+TRANS(vexth_h_b, gen_vv, gen_helper_vexth_h_b)
+TRANS(vexth_w_h, gen_vv, gen_helper_vexth_w_h)
+TRANS(vexth_d_w, gen_vv, gen_helper_vexth_d_w)
+TRANS(vexth_q_d, gen_vv, gen_helper_vexth_q_d)
+TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
+TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
+TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
+TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3ed61b3d68..39c582d098 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -769,3 +769,12 @@ vsat_bu          0111 00110010 10000 01 ... ..... .....   @vv_ui3
 vsat_hu          0111 00110010 10000 1 .... ..... .....   @vv_ui4
 vsat_wu          0111 00110010 10001 ..... ..... .....    @vv_ui5
 vsat_du          0111 00110010 1001 ...... ..... .....    @vv_ui6
+
+vexth_h_b        0111 00101001 11101 11000 ..... .....    @vv
+vexth_w_h        0111 00101001 11101 11001 ..... .....    @vv
+vexth_d_w        0111 00101001 11101 11010 ..... .....    @vv
+vexth_q_d        0111 00101001 11101 11011 ..... .....    @vv
+vexth_hu_bu      0111 00101001 11101 11100 ..... .....    @vv
+vexth_wu_hu      0111 00101001 11101 11101 ..... .....    @vv
+vexth_du_wu      0111 00101001 11101 11110 ..... .....    @vv
+vexth_qu_du      0111 00101001 11101 11111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 15efc64e4e..9a0b358576 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -836,3 +836,38 @@ VSAT_U(vsat_bu, 8, uint8_t, B)
 VSAT_U(vsat_hu, 16, uint16_t, H)
 VSAT_U(vsat_wu, 32, uint32_t, W)
 VSAT_U(vsat_du, 64, uint64_t, D)
+
+#define VEXTH(NAME, BIT, T1, T2, E1, E2)                            \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E1(i) = (T2)(T1)Vj->E2(i + LSX_LEN/BIT);                \
+    }                                                               \
+}
+
+void HELPER(vexth_q_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    Vd->Q(0) = int128_makes64(Vj->D(1));
+}
+
+void HELPER(vexth_qu_du)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    Vd->Q(0) = int128_make64((uint64_t)Vj->D(1));
+}
+
+VEXTH(vexth_h_b, 16, int16_t, int8_t, H, B)
+VEXTH(vexth_w_h, 32, int32_t, int16_t, W, H)
+VEXTH(vexth_d_w, 64, int64_t, int32_t, D, W)
+VEXTH(vexth_hu_bu, 16, uint16_t, uint8_t, H, B)
+VEXTH(vexth_wu_hu, 32, uint32_t, uint16_t, W, H)
+VEXTH(vexth_du_wu, 64, uint64_t, uint32_t, D, W)
-- 
2.31.1
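
A small stand-alone sketch of the VEXTH idea, for the byte-to-halfword case:
the upper eight bytes of the source are widened into eight 16-bit lanes, with
sign extension for the signed form and zero extension for the unsigned form.
The function names and the plain-array layout are assumptions for
illustration; the patch uses VReg with the H()/B() accessors, and the sketch
assumes separate source and destination buffers.

```c
#include <stdint.h>

/* Hypothetical illustration only, not part of the patch. */
void exth_h_b(int16_t dst[8], const int8_t src[16])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = src[i + 8];               /* sign-extend high-half byte */
    }
}

void exth_hu_bu(uint16_t dst[8], const int8_t src[16])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)src[i + 8];      /* zero-extend high-half byte */
    }
}
```

The low eight source bytes are never read, which is what distinguishes this
from a plain widening of the whole register.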




* [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (18 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 19/44] target/loongarch: Implement vexth Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:11   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
                   ` (23 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSIGNCOV.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++
 target/loongarch/helper.h                   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 54 +++++++++++++++++++++
 target/loongarch/insns.decode               |  5 ++
 target/loongarch/lsx_helper.c               | 19 ++++++++
 5 files changed, 88 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 412c1cedcb..46e808c321 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1079,3 +1079,8 @@ INSN_LSX(vexth_hu_bu,      vv)
 INSN_LSX(vexth_wu_hu,      vv)
 INSN_LSX(vexth_du_wu,      vv)
 INSN_LSX(vexth_qu_du,      vv)
+
+INSN_LSX(vsigncov_b,       vvv)
+INSN_LSX(vsigncov_h,       vvv)
+INSN_LSX(vsigncov_w,       vvv)
+INSN_LSX(vsigncov_d,       vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0876aa3331..a7394b2eb7 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -353,3 +353,8 @@ DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
 DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
 DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
 DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index f6058c1360..865485ea10 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2507,3 +2507,57 @@ TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
 TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
 TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
 TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
+
+static void gen_vsigncov(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t1, t2;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    t2 = tcg_temp_new_vec_matching(t);
+
+    tcg_gen_neg_vec(vece, t1, b);
+    tcg_gen_dupi_vec(vece, t2, 0);
+    tcg_gen_cmpsel_vec(TCG_COND_LT, vece, t, a, t2, t1, b);
+    tcg_gen_cmpsel_vec(TCG_COND_EQ, vece, t, a, t2, t2, t);
+}
+
+static void do_vsigncov(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                        uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_neg_vec, INDEX_op_cmpsel_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vsigncov,
+            .fno = gen_helper_vsigncov_b,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vsigncov,
+            .fno = gen_helper_vsigncov_h,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vsigncov,
+            .fno = gen_helper_vsigncov_w,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vsigncov,
+            .fno = gen_helper_vsigncov_d,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsigncov_b, gvec_vvv, MO_8, do_vsigncov)
+TRANS(vsigncov_h, gvec_vvv, MO_16, do_vsigncov)
+TRANS(vsigncov_w, gvec_vvv, MO_32, do_vsigncov)
+TRANS(vsigncov_d, gvec_vvv, MO_64, do_vsigncov)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 39c582d098..4233dd7404 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -778,3 +778,8 @@ vexth_hu_bu      0111 00101001 11101 11100 ..... .....    @vv
 vexth_wu_hu      0111 00101001 11101 11101 ..... .....    @vv
 vexth_du_wu      0111 00101001 11101 11110 ..... .....    @vv
 vexth_qu_du      0111 00101001 11101 11111 ..... .....    @vv
+
+vsigncov_b       0111 00010010 11100 ..... ..... .....    @vvv
+vsigncov_h       0111 00010010 11101 ..... ..... .....    @vvv
+vsigncov_w       0111 00010010 11110 ..... ..... .....    @vvv
+vsigncov_d       0111 00010010 11111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9a0b358576..b3a9b8cb66 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -871,3 +871,22 @@ VEXTH(vexth_d_w, 64, int64_t, int32_t, D, W)
 VEXTH(vexth_hu_bu, 16, uint16_t, uint8_t, H, B)
 VEXTH(vexth_wu_hu, 32, uint32_t, uint16_t, W, H)
 VEXTH(vexth_du_wu, 64, uint64_t, uint32_t, D, W)
+
+#define DO_SIGNCOV(a, b)  ((a) == 0 ? 0 : (a) < 0 ? -(b) : (b))
+
+#define VSIGNCOV(NAME, BIT, E, DO_OP)                       \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{                                                           \
+    int i;                                                  \
+    VReg *Vd = (VReg *)vd;                                  \
+    VReg *Vj = (VReg *)vj;                                  \
+    VReg *Vk = (VReg *)vk;                                  \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                     \
+        Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i));               \
+    }                                                       \
+}
+
+VSIGNCOV(vsigncov_b, 8, B, DO_SIGNCOV)
+VSIGNCOV(vsigncov_h, 16, H, DO_SIGNCOV)
+VSIGNCOV(vsigncov_w, 32, W, DO_SIGNCOV)
+VSIGNCOV(vsigncov_d, 64, D, DO_SIGNCOV)
-- 
2.31.1
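
A minimal scalar sketch of the per-element result of vsigncov, assuming the semantics of DO_SIGNCOV in the patch above (0 if a == 0, -b if a < 0, otherwise b). The names `ref_signcov_b`, `sel_signcov_b` and `check_signcov_forms` are hypothetical and not part of the series. The second function mirrors the two compare-and-select steps of gen_vsigncov to show that they yield the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 8-bit negate: -(-128) stays -128, as a byte-wise vector negate would. */
int8_t neg8(int8_t b)
{
    return (int8_t)(uint8_t)(0u - (uint8_t)b);
}

/* Scalar reference: 0 if a == 0, -b if a < 0, otherwise b. */
int8_t ref_signcov_b(int8_t a, int8_t b)
{
    if (a == 0) {
        return 0;
    }
    return a < 0 ? neg8(b) : b;
}

/* The same result written as the two selects used by gen_vsigncov. */
int8_t sel_signcov_b(int8_t a, int8_t b)
{
    int8_t t = (a < 0) ? neg8(b) : b;   /* cmpsel LT: a < 0 ? -b : b */
    return (a == 0) ? 0 : t;            /* cmpsel EQ: a == 0 ? 0 : t */
}

/* Exhaustively compare both forms over all 8-bit input pairs. */
void check_signcov_forms(void)
{
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            assert(ref_signcov_b((int8_t)a, (int8_t)b) ==
                   sel_signcov_b((int8_t)a, (int8_t)b));
        }
    }
}
```

The negate is done in unsigned arithmetic so that the sketch does not depend on signed-overflow behaviour for b == -128.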




* [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (19 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:20   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions Song Gao
                   ` (22 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VMSKLTZ.{B/H/W/D};
- VMSKGEZ.B;
- VMSKNZ.B.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
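
A minimal reference sketch of the byte variants, assuming the behaviour of the helpers in this patch: each instruction packs one bit per byte of vj into the low 16 bits of vd and clears the rest. As the helpers are written, VMSKLTZ.B sets bit i when byte i is negative, VMSKGEZ.B is its 16-bit complement, and VMSKNZ.B sets bit i when byte i is nonzero. The `ref_*` names and `check_mask_invariants` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LSX_BYTES 16  /* one 128-bit vector, assumed from LSX_LEN in the series */

/* Bit i is set when byte i is negative (its sign bit is set). */
uint16_t ref_vmskltz_b(const int8_t v[LSX_BYTES])
{
    uint16_t mask = 0;
    for (size_t i = 0; i < LSX_BYTES; i++) {
        if (v[i] < 0) {
            mask |= (uint16_t)(1u << i);
        }
    }
    return mask;
}

/* Bit i is set when byte i is greater than or equal to zero. */
uint16_t ref_vmskgez_b(const int8_t v[LSX_BYTES])
{
    return (uint16_t)~ref_vmskltz_b(v);
}

/* Bit i is set when byte i is nonzero. */
uint16_t ref_vmsknz_b(const int8_t v[LSX_BYTES])
{
    uint16_t mask = 0;
    for (size_t i = 0; i < LSX_BYTES; i++) {
        if (v[i] != 0) {
            mask |= (uint16_t)(1u << i);
        }
    }
    return mask;
}

/* Sanity relations between the three masks for any input vector. */
void check_mask_invariants(const int8_t v[LSX_BYTES])
{
    uint16_t ltz = ref_vmskltz_b(v);
    uint16_t gez = ref_vmskgez_b(v);
    uint16_t nz = ref_vmsknz_b(v);

    assert((uint16_t)(ltz ^ gez) == 0xFFFF);   /* exact complements */
    assert((ltz & (uint16_t)~nz) == 0);        /* negative implies nonzero */
}
```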
 target/loongarch/disas.c                    |   7 ++
 target/loongarch/helper.h                   |   7 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |   7 ++
 target/loongarch/insns.decode               |   7 ++
 target/loongarch/lsx_helper.c               | 130 ++++++++++++++++++++
 5 files changed, 158 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 46e808c321..2725b827ee 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1084,3 +1084,10 @@ INSN_LSX(vsigncov_b,       vvv)
 INSN_LSX(vsigncov_h,       vvv)
 INSN_LSX(vsigncov_w,       vvv)
 INSN_LSX(vsigncov_d,       vvv)
+
+INSN_LSX(vmskltz_b,        vv)
+INSN_LSX(vmskltz_h,        vv)
+INSN_LSX(vmskltz_w,        vv)
+INSN_LSX(vmskltz_d,        vv)
+INSN_LSX(vmskgez_b,        vv)
+INSN_LSX(vmsknz_b,         vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a7394b2eb7..cc2f542278 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -358,3 +358,10 @@ DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_3(vmskltz_b, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_h, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
+DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
+DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 865485ea10..9ca3a23106 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2561,3 +2561,10 @@ TRANS(vsigncov_b, gvec_vvv, MO_8, do_vsigncov)
 TRANS(vsigncov_h, gvec_vvv, MO_16, do_vsigncov)
 TRANS(vsigncov_w, gvec_vvv, MO_32, do_vsigncov)
 TRANS(vsigncov_d, gvec_vvv, MO_64, do_vsigncov)
+
+TRANS(vmskltz_b, gen_vv, gen_helper_vmskltz_b)
+TRANS(vmskltz_h, gen_vv, gen_helper_vmskltz_h)
+TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
+TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
+TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
+TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4233dd7404..47c1ef78a7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -783,3 +783,10 @@ vsigncov_b       0111 00010010 11100 ..... ..... .....    @vvv
 vsigncov_h       0111 00010010 11101 ..... ..... .....    @vvv
 vsigncov_w       0111 00010010 11110 ..... ..... .....    @vvv
 vsigncov_d       0111 00010010 11111 ..... ..... .....    @vvv
+
+vmskltz_b        0111 00101001 11000 10000 ..... .....    @vv
+vmskltz_h        0111 00101001 11000 10001 ..... .....    @vv
+vmskltz_w        0111 00101001 11000 10010 ..... .....    @vv
+vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
+vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
+vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b3a9b8cb66..f8916c06da 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -890,3 +890,133 @@ VSIGNCOV(vsigncov_b, 8, B, DO_SIGNCOV)
 VSIGNCOV(vsigncov_h, 16, H, DO_SIGNCOV)
 VSIGNCOV(vsigncov_w, 32, W, DO_SIGNCOV)
 VSIGNCOV(vsigncov_d, 64, D, DO_SIGNCOV)
+
+static uint64_t do_vmskltz_b(int64_t val)
+{
+    uint64_t m = 0x8080808080808080ULL;
+    uint64_t c = val & m;
+    c |= c << 7;
+    c |= c << 14;
+    c |= c << 28;
+    return c >> 56;
+}
+
+void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) = do_vmskltz_b(Vj->D(0));
+    temp.H(0) |= (do_vmskltz_b(Vj->D(1)) << 8);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskltz_h(int64_t val)
+{
+    uint64_t m = 0x8000800080008000ULL;
+    uint64_t c = val & m;
+    c |= c << 15;
+    c |= c << 30;
+    return c >> 60;
+}
+
+void HELPER(vmskltz_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) = do_vmskltz_h(Vj->D(0));
+    temp.H(0) |= (do_vmskltz_h(Vj->D(1)) << 4);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskltz_w(int64_t val)
+{
+    uint64_t m = 0x8000000080000000ULL;
+    uint64_t c = val & m;
+    c |= c << 31;
+    return c >> 62;
+}
+
+void HELPER(vmskltz_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) = do_vmskltz_w(Vj->D(0));
+    temp.H(0) |= (do_vmskltz_w(Vj->D(1)) << 2);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskltz_d(int64_t val)
+{
+    uint64_t m = 0x8000000000000000ULL;
+    uint64_t c = val & m;
+    c |= c << 63;
+    return c >> 63;
+}
+void HELPER(vmskltz_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) = do_vmskltz_d(Vj->D(0));
+    temp.H(0) |= (do_vmskltz_d(Vj->D(1)) << 1);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
+
+void HELPER(vmskgez_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) =   do_vmskltz_b(Vj->D(0));
+    temp.H(0) |= (do_vmskltz_b(Vj->D(1)) << 8);
+    temp.H(0) = ~temp.H(0);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskez_b(uint64_t a)
+{
+    uint64_t m = 0x7f7f7f7f7f7f7f7fULL;
+    uint64_t c = ~(((a & m) + m) | a | m);
+    c |= c << 7;
+    c |= c << 14;
+    c |= c << 28;
+    return c >> 56;
+}
+
+void HELPER(vmsknz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.H(0) = do_vmskez_b(Vj->D(0));
+    temp.H(0) |= (do_vmskez_b(Vj->D(1)) << 8);
+    temp.H(0) = ~temp.H(0);
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = 0;
+}
-- 
2.31.1
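
The shift-and-or sequence in do_vmskltz_b and the zero-byte test in do_vmskez_b can be checked in isolation. The sketch below copies both expressions from the patch above and compares the gather against a naive per-byte loop. The function names are hypothetical and chosen for the example only.

```c
#include <assert.h>
#include <stdint.h>

/* Gather from do_vmskltz_b: the top bit of each of 8 bytes -> low 8 result bits. */
uint64_t gather_sign_bits(uint64_t val)
{
    uint64_t c = val & 0x8080808080808080ULL;
    c |= c << 7;
    c |= c << 14;
    c |= c << 28;
    return c >> 56;
}

/* Zero-byte test from do_vmskez_b (before its gather): 0x80 in every zero byte. */
uint64_t zero_byte_flags(uint64_t a)
{
    uint64_t m = 0x7f7f7f7f7f7f7f7fULL;
    return ~(((a & m) + m) | a | m);
}

/* Naive reference: test bit 7 of each byte in turn. */
uint64_t naive_sign_bits(uint64_t val)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r |= ((val >> (8 * i + 7)) & 1) << i;
    }
    return r;
}

/* Assert that the gather and the naive loop agree for one input. */
void check_gather(uint64_t val)
{
    assert(gather_sign_bits(val) == naive_sign_bits(val));
}
```

Byte i has its top bit at position 8i+7 and must land at position 56+i, a distance of 49-7i. Every such distance is a sum of a subset of {7, 14, 28}, which is why three shift-and-or steps suffice.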




* [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (20 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:31   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
                   ` (21 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- V{AND/OR/XOR/NOR/ANDN/ORN}.V;
- V{AND/OR/XOR/NOR}I.B.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
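
A minimal sketch of the operand order used for VANDN.V and VORN.V, assuming the behaviour implemented in this patch: vandn computes ~vj & vk, so the andc operation (a & ~b) is called with vk and vj swapped, while vorn computes vj | ~vk and maps onto orc directly. `andc64`, `orc64` and the `ref_*` functions are hypothetical stand-ins that operate on a single 64-bit lane or a single byte.

```c
#include <assert.h>
#include <stdint.h>

/* Generic and-with-complement and or-with-complement: second operand inverted. */
uint64_t andc64(uint64_t a, uint64_t b) { return a & ~b; }
uint64_t orc64(uint64_t a, uint64_t b)  { return a | ~b; }

/* VANDN.V as implemented in the patch: vd = ~vj & vk, hence the swap. */
uint64_t ref_vandn(uint64_t vj, uint64_t vk)
{
    return andc64(vk, vj);
}

/* VORN.V as implemented in the patch: vd = vj | ~vk, no swap needed. */
uint64_t ref_vorn(uint64_t vj, uint64_t vk)
{
    return orc64(vj, vk);
}

/* VNORI.B on one byte: vd = ~(vj | imm), matching the out-of-line helper. */
uint8_t ref_vnori_b(uint8_t vj, uint8_t imm)
{
    return (uint8_t)~(vj | imm);
}

/* Check the swapped-operand identity for one pair of lanes. */
void check_vandn(uint64_t vj, uint64_t vk)
{
    assert(ref_vandn(vj, vk) == (~vj & vk));
}
```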
 target/loongarch/disas.c                    | 12 +++++
 target/loongarch/helper.h                   |  2 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 50 +++++++++++++++++++++
 target/loongarch/insns.decode               | 13 ++++++
 target/loongarch/lsx_helper.c               | 11 +++++
 5 files changed, 88 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2725b827ee..eca0a4bb7b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1091,3 +1091,15 @@ INSN_LSX(vmskltz_w,        vv)
 INSN_LSX(vmskltz_d,        vv)
 INSN_LSX(vmskgez_b,        vv)
 INSN_LSX(vmsknz_b,         vv)
+
+INSN_LSX(vand_v,           vvv)
+INSN_LSX(vor_v,            vvv)
+INSN_LSX(vxor_v,           vvv)
+INSN_LSX(vnor_v,           vvv)
+INSN_LSX(vandn_v,          vvv)
+INSN_LSX(vorn_v,           vvv)
+
+INSN_LSX(vandi_b,          vv_i)
+INSN_LSX(vori_b,           vv_i)
+INSN_LSX(vxori_b,          vv_i)
+INSN_LSX(vnori_b,          vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index cc2f542278..1eeb614427 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -365,3 +365,5 @@ DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
 DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
 DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
+
+DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 9ca3a23106..c20d77bd3a 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2568,3 +2568,53 @@ TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
 TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
+
+TRANS(vand_v, gvec_vvv, MO_64, tcg_gen_gvec_and)
+TRANS(vor_v, gvec_vvv, MO_64, tcg_gen_gvec_or)
+TRANS(vxor_v, gvec_vvv, MO_64, tcg_gen_gvec_xor)
+TRANS(vnor_v, gvec_vvv, MO_64, tcg_gen_gvec_nor)
+
+static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
+{
+    uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+    vk_ofs = vreg_full_offset(a->vk);
+
+    tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, 16, 16);
+    return true;
+}
+TRANS(vorn_v, gvec_vvv, MO_64, tcg_gen_gvec_orc)
+TRANS(vandi_b, gvec_vv_i, MO_8, tcg_gen_gvec_andi)
+TRANS(vori_b, gvec_vv_i, MO_8, tcg_gen_gvec_ori)
+TRANS(vxori_b, gvec_vv_i, MO_8, tcg_gen_gvec_xori)
+
+static void gen_vnori(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    TCGv_vec t1;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_dupi_vec(vece, t1, imm);
+    tcg_gen_nor_vec(vece, t, a, t1);
+}
+
+static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_nor_vec, 0
+    };
+    static const GVecGen2i op = {
+        .fniv = gen_vnori,
+        .fnoi = gen_helper_vnori_b,
+        .opt_opc = vecop_list,
+        .vece = MO_8
+    };
+
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op);
+}
+
+TRANS(vnori_b, gvec_vv_i, MO_8, do_vnori_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 47c1ef78a7..6309683be9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -503,6 +503,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 @vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
+@vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
@@ -790,3 +791,15 @@ vmskltz_w        0111 00101001 11000 10010 ..... .....    @vv
 vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
 vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
 vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
+
+vand_v           0111 00010010 01100 ..... ..... .....    @vvv
+vor_v            0111 00010010 01101 ..... ..... .....    @vvv
+vxor_v           0111 00010010 01110 ..... ..... .....    @vvv
+vnor_v           0111 00010010 01111 ..... ..... .....    @vvv
+vandn_v          0111 00010010 10000 ..... ..... .....    @vvv
+vorn_v           0111 00010010 10001 ..... ..... .....    @vvv
+
+vandi_b          0111 00111101 00 ........ ..... .....    @vv_ui8
+vori_b           0111 00111101 01 ........ ..... .....    @vv_ui8
+vxori_b          0111 00111101 10 ........ ..... .....    @vv_ui8
+vnori_b          0111 00111101 11 ........ ..... .....    @vv_ui8
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index f8916c06da..198ab3088b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1020,3 +1020,14 @@ void HELPER(vmsknz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
     Vd->D(0) = temp.D(0);
     Vd->D(1) = 0;
 }
+
+void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
+{
+    int i;
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+
+    for (i = 0; i < LSX_LEN/8; i++) {
+        Vd->B(i) = ~(Vj->B(i) | (uint8_t)imm);
+    }
+}
-- 
2.31.1




* [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (21 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:38   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl Song Gao
                   ` (20 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSLL[I].{B/H/W/D};
- VSRL[I].{B/H/W/D};
- VSRA[I].{B/H/W/D};
- VROTR[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
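
A minimal scalar sketch of the byte variants, assuming that the shift or rotate count of each element is taken modulo the element width (8 for .B). That assumption matches how the variable-shift gvec expanders used in this patch are understood to mask the count, but it is stated here as an assumption rather than quoted from the manual. All `ref_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VSLL.B on one element: logical left shift, count modulo 8. */
uint8_t ref_vsll_b(uint8_t x, uint8_t n)
{
    return (uint8_t)(x << (n & 7));
}

/* VSRL.B on one element: logical right shift, count modulo 8. */
uint8_t ref_vsrl_b(uint8_t x, uint8_t n)
{
    return (uint8_t)(x >> (n & 7));
}

/* VSRA.B on one element: arithmetic right shift without relying on
 * implementation-defined behaviour of >> on negative values. */
int8_t ref_vsra_b(int8_t x, uint8_t n)
{
    unsigned sh = n & 7;
    uint8_t r = (uint8_t)((uint8_t)x >> sh);
    if (x < 0 && sh != 0) {
        r |= (uint8_t)(0xFFu << (8 - sh));   /* replicate the sign bit */
    }
    return (int8_t)r;
}

/* VROTR.B on one element: rotate right, count modulo 8. */
uint8_t ref_vrotr_b(uint8_t x, uint8_t n)
{
    unsigned sh = n & 7;
    return (uint8_t)((x >> sh) | (x << ((8 - sh) & 7)));
}

/* Rotating by n and then by 8 - n must give back the input. */
void check_rotr_roundtrip(uint8_t x, uint8_t n)
{
    uint8_t back = (uint8_t)(8 - (n & 7));
    assert(ref_vrotr_b(ref_vrotr_b(x, n), back) == x);
}
```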
 target/loongarch/disas.c                    | 36 +++++++++++++++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 36 +++++++++++++++++++++
 target/loongarch/insns.decode               | 36 +++++++++++++++++++++
 3 files changed, 108 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index eca0a4bb7b..f7d0fb4441 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1103,3 +1103,39 @@ INSN_LSX(vandi_b,          vv_i)
 INSN_LSX(vori_b,           vv_i)
 INSN_LSX(vxori_b,          vv_i)
 INSN_LSX(vnori_b,          vv_i)
+
+INSN_LSX(vsll_b,           vvv)
+INSN_LSX(vsll_h,           vvv)
+INSN_LSX(vsll_w,           vvv)
+INSN_LSX(vsll_d,           vvv)
+INSN_LSX(vslli_b,          vv_i)
+INSN_LSX(vslli_h,          vv_i)
+INSN_LSX(vslli_w,          vv_i)
+INSN_LSX(vslli_d,          vv_i)
+
+INSN_LSX(vsrl_b,           vvv)
+INSN_LSX(vsrl_h,           vvv)
+INSN_LSX(vsrl_w,           vvv)
+INSN_LSX(vsrl_d,           vvv)
+INSN_LSX(vsrli_b,          vv_i)
+INSN_LSX(vsrli_h,          vv_i)
+INSN_LSX(vsrli_w,          vv_i)
+INSN_LSX(vsrli_d,          vv_i)
+
+INSN_LSX(vsra_b,           vvv)
+INSN_LSX(vsra_h,           vvv)
+INSN_LSX(vsra_w,           vvv)
+INSN_LSX(vsra_d,           vvv)
+INSN_LSX(vsrai_b,          vv_i)
+INSN_LSX(vsrai_h,          vv_i)
+INSN_LSX(vsrai_w,          vv_i)
+INSN_LSX(vsrai_d,          vv_i)
+
+INSN_LSX(vrotr_b,          vvv)
+INSN_LSX(vrotr_h,          vvv)
+INSN_LSX(vrotr_w,          vvv)
+INSN_LSX(vrotr_d,          vvv)
+INSN_LSX(vrotri_b,         vv_i)
+INSN_LSX(vrotri_h,         vv_i)
+INSN_LSX(vrotri_w,         vv_i)
+INSN_LSX(vrotri_d,         vv_i)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c20d77bd3a..84c8d92ad6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2618,3 +2618,39 @@ static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
 }
 
 TRANS(vnori_b, gvec_vv_i, MO_8, do_vnori_b)
+
+TRANS(vsll_b, gvec_vvv, MO_8, tcg_gen_gvec_shlv)
+TRANS(vsll_h, gvec_vvv, MO_16, tcg_gen_gvec_shlv)
+TRANS(vsll_w, gvec_vvv, MO_32, tcg_gen_gvec_shlv)
+TRANS(vsll_d, gvec_vvv, MO_64, tcg_gen_gvec_shlv)
+TRANS(vslli_b, gvec_vv_i, MO_8, tcg_gen_gvec_shli)
+TRANS(vslli_h, gvec_vv_i, MO_16, tcg_gen_gvec_shli)
+TRANS(vslli_w, gvec_vv_i, MO_32, tcg_gen_gvec_shli)
+TRANS(vslli_d, gvec_vv_i, MO_64, tcg_gen_gvec_shli)
+
+TRANS(vsrl_b, gvec_vvv, MO_8, tcg_gen_gvec_shrv)
+TRANS(vsrl_h, gvec_vvv, MO_16, tcg_gen_gvec_shrv)
+TRANS(vsrl_w, gvec_vvv, MO_32, tcg_gen_gvec_shrv)
+TRANS(vsrl_d, gvec_vvv, MO_64, tcg_gen_gvec_shrv)
+TRANS(vsrli_b, gvec_vv_i, MO_8, tcg_gen_gvec_shri)
+TRANS(vsrli_h, gvec_vv_i, MO_16, tcg_gen_gvec_shri)
+TRANS(vsrli_w, gvec_vv_i, MO_32, tcg_gen_gvec_shri)
+TRANS(vsrli_d, gvec_vv_i, MO_64, tcg_gen_gvec_shri)
+
+TRANS(vsra_b, gvec_vvv, MO_8, tcg_gen_gvec_sarv)
+TRANS(vsra_h, gvec_vvv, MO_16, tcg_gen_gvec_sarv)
+TRANS(vsra_w, gvec_vvv, MO_32, tcg_gen_gvec_sarv)
+TRANS(vsra_d, gvec_vvv, MO_64, tcg_gen_gvec_sarv)
+TRANS(vsrai_b, gvec_vv_i, MO_8, tcg_gen_gvec_sari)
+TRANS(vsrai_h, gvec_vv_i, MO_16, tcg_gen_gvec_sari)
+TRANS(vsrai_w, gvec_vv_i, MO_32, tcg_gen_gvec_sari)
+TRANS(vsrai_d, gvec_vv_i, MO_64, tcg_gen_gvec_sari)
+
+TRANS(vrotr_b, gvec_vvv, MO_8, tcg_gen_gvec_rotrv)
+TRANS(vrotr_h, gvec_vvv, MO_16, tcg_gen_gvec_rotrv)
+TRANS(vrotr_w, gvec_vvv, MO_32, tcg_gen_gvec_rotrv)
+TRANS(vrotr_d, gvec_vvv, MO_64, tcg_gen_gvec_rotrv)
+TRANS(vrotri_b, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
+TRANS(vrotri_h, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
+TRANS(vrotri_w, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
+TRANS(vrotri_d, gvec_vv_i, MO_64, tcg_gen_gvec_rotri)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6309683be9..7c0b0c4ac8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -803,3 +803,39 @@ vandi_b          0111 00111101 00 ........ ..... .....    @vv_ui8
 vori_b           0111 00111101 01 ........ ..... .....    @vv_ui8
 vxori_b          0111 00111101 10 ........ ..... .....    @vv_ui8
 vnori_b          0111 00111101 11 ........ ..... .....    @vv_ui8
+
+vsll_b           0111 00001110 10000 ..... ..... .....    @vvv
+vsll_h           0111 00001110 10001 ..... ..... .....    @vvv
+vsll_w           0111 00001110 10010 ..... ..... .....    @vvv
+vsll_d           0111 00001110 10011 ..... ..... .....    @vvv
+vslli_b          0111 00110010 11000 01 ... ..... .....   @vv_ui3
+vslli_h          0111 00110010 11000 1 .... ..... .....   @vv_ui4
+vslli_w          0111 00110010 11001 ..... ..... .....    @vv_ui5
+vslli_d          0111 00110010 1101 ...... ..... .....    @vv_ui6
+
+vsrl_b           0111 00001110 10100 ..... ..... .....    @vvv
+vsrl_h           0111 00001110 10101 ..... ..... .....    @vvv
+vsrl_w           0111 00001110 10110 ..... ..... .....    @vvv
+vsrl_d           0111 00001110 10111 ..... ..... .....    @vvv
+vsrli_b          0111 00110011 00000 01 ... ..... .....   @vv_ui3
+vsrli_h          0111 00110011 00000 1 .... ..... .....   @vv_ui4
+vsrli_w          0111 00110011 00001 ..... ..... .....    @vv_ui5
+vsrli_d          0111 00110011 0001 ...... ..... .....    @vv_ui6
+
+vsra_b           0111 00001110 11000 ..... ..... .....    @vvv
+vsra_h           0111 00001110 11001 ..... ..... .....    @vvv
+vsra_w           0111 00001110 11010 ..... ..... .....    @vvv
+vsra_d           0111 00001110 11011 ..... ..... .....    @vvv
+vsrai_b          0111 00110011 01000 01 ... ..... .....   @vv_ui3
+vsrai_h          0111 00110011 01000 1 .... ..... .....   @vv_ui4
+vsrai_w          0111 00110011 01001 ..... ..... .....    @vv_ui5
+vsrai_d          0111 00110011 0101 ...... ..... .....    @vv_ui6
+
+vrotr_b          0111 00001110 11100 ..... ..... .....    @vvv
+vrotr_h          0111 00001110 11101 ..... ..... .....    @vvv
+vrotr_w          0111 00001110 11110 ..... ..... .....    @vvv
+vrotr_d          0111 00001110 11111 ..... ..... .....    @vvv
+vrotri_b         0111 00101010 00000 01 ... ..... .....   @vv_ui3
+vrotri_h         0111 00101010 00000 1 .... ..... .....   @vv_ui4
+vrotri_w         0111 00101010 00001 ..... ..... .....    @vv_ui5
+vrotri_d         0111 00101010 0001 ...... ..... .....    @vv_ui6
-- 
2.31.1




* [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (22 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:40   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar Song Gao
                   ` (19 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSLLWIL.{H.B/W.H/D.W};
- VSLLWIL.{HU.BU/WU.HU/DU.WU};
- VEXTL.Q.D, VEXTL.QU.DU.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
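
A minimal sketch of VSLLWIL.H.B and VSLLWIL.HU.BU, assuming the behaviour of the VSLLWIL macro in this patch: the low eight bytes of vj are widened to 16 bits (sign- or zero-extended) and then shifted left by the 3-bit immediate. The array-based signatures and the `ref_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed form: sign-extend each of the low 8 bytes, then shift left by imm. */
void ref_vsllwil_h_b(int16_t dst[8], const int8_t src[16], unsigned imm)
{
    assert(imm < 8);   /* ui3 immediate */
    for (int i = 0; i < 8; i++) {
        /* Shift the unsigned bit pattern to avoid shifting a negative value. */
        uint16_t wide = (uint16_t)(int16_t)src[i];
        dst[i] = (int16_t)(uint16_t)(wide << imm);
    }
}

/* Unsigned form: zero-extend each of the low 8 bytes, then shift left by imm. */
void ref_vsllwil_hu_bu(uint16_t dst[8], const uint8_t src[16], unsigned imm)
{
    assert(imm < 8);   /* ui3 immediate */
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint16_t)((uint16_t)src[i] << imm);
    }
}
```

Only the low half of the source contributes; the upper eight bytes of vj are ignored, which matches the loop bound LSX_LEN/BIT in the macro.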
 target/loongarch/disas.c                    |  9 +++++
 target/loongarch/helper.h                   |  9 +++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 21 +++++++++++
 target/loongarch/insns.decode               |  9 +++++
 target/loongarch/lsx_helper.c               | 40 +++++++++++++++++++++
 5 files changed, 88 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f7d0fb4441..087cac10ad 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1139,3 +1139,12 @@ INSN_LSX(vrotri_b,         vv_i)
 INSN_LSX(vrotri_h,         vv_i)
 INSN_LSX(vrotri_w,         vv_i)
 INSN_LSX(vrotri_d,         vv_i)
+
+INSN_LSX(vsllwil_h_b,      vv_i)
+INSN_LSX(vsllwil_w_h,      vv_i)
+INSN_LSX(vsllwil_d_w,      vv_i)
+INSN_LSX(vextl_q_d,        vv)
+INSN_LSX(vsllwil_hu_bu,    vv_i)
+INSN_LSX(vsllwil_wu_hu,    vv_i)
+INSN_LSX(vsllwil_du_wu,    vv_i)
+INSN_LSX(vextl_qu_du,      vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 1eeb614427..0266b9a4ad 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -367,3 +367,12 @@ DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
 
 DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_4(vsllwil_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_d_w, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_q_d, void, env, i32, i32)
+DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 84c8d92ad6..fb40aaf5ad 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -39,6 +39,18 @@ static bool gen_vv(DisasContext *ctx, arg_vv *a,
     return true;
 }
 
+static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
+                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, imm);
+    return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                      void (*func)(unsigned, uint32_t, uint32_t,
                                   uint32_t, uint32_t, uint32_t))
@@ -2654,3 +2666,12 @@ TRANS(vrotri_b, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
 TRANS(vrotri_h, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
 TRANS(vrotri_w, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
 TRANS(vrotri_d, gvec_vv_i, MO_64, tcg_gen_gvec_rotri)
+
+TRANS(vsllwil_h_b, gen_vv_i, gen_helper_vsllwil_h_b)
+TRANS(vsllwil_w_h, gen_vv_i, gen_helper_vsllwil_w_h)
+TRANS(vsllwil_d_w, gen_vv_i, gen_helper_vsllwil_d_w)
+TRANS(vextl_q_d, gen_vv, gen_helper_vextl_q_d)
+TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
+TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
+TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
+TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7c0b0c4ac8..23dd338026 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -839,3 +839,12 @@ vrotri_b         0111 00101010 00000 01 ... ..... .....   @vv_ui3
 vrotri_h         0111 00101010 00000 1 .... ..... .....   @vv_ui4
 vrotri_w         0111 00101010 00001 ..... ..... .....    @vv_ui5
 vrotri_d         0111 00101010 0001 ...... ..... .....    @vv_ui6
+
+vsllwil_h_b      0111 00110000 10000 01 ... ..... .....   @vv_ui3
+vsllwil_w_h      0111 00110000 10000 1 .... ..... .....   @vv_ui4
+vsllwil_d_w      0111 00110000 10001 ..... ..... .....    @vv_ui5
+vextl_q_d        0111 00110000 10010 00000 ..... .....    @vv
+vsllwil_hu_bu    0111 00110000 11000 01 ... ..... .....   @vv_ui3
+vsllwil_wu_hu    0111 00110000 11000 1 .... ..... .....   @vv_ui4
+vsllwil_du_wu    0111 00110000 11001 ..... ..... .....    @vv_ui5
+vextl_qu_du      0111 00110000 11010 00000 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 198ab3088b..72efdd5a74 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1031,3 +1031,43 @@ void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
         Vd->B(i) = ~(Vj->B(i) | (uint8_t)imm);
     }
 }
+
+#define VSLLWIL(NAME, BIT, T1, T2, E1, E2)                \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i;                                                \
+    VReg temp;                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+    temp.D(0) = 0;                                        \
+    temp.D(1) = 0;                                        \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        temp.E1(i) = (T1)(T2)Vj->E2(i) << (imm % BIT);    \
+    }                                                     \
+    Vd->D(0) = temp.D(0);                                 \
+    Vd->D(1) = temp.D(1);                                 \
+}
+
+void HELPER(vextl_q_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    Vd->Q(0) = int128_makes64(Vj->D(0));
+}
+
+void HELPER(vextl_qu_du)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    Vd->Q(0) = int128_make64(Vj->D(0));
+}
+
+VSLLWIL(vsllwil_h_b, 16, int16_t, int8_t, H, B)
+VSLLWIL(vsllwil_w_h, 32, int32_t, int16_t, W, H)
+VSLLWIL(vsllwil_d_w, 64, int64_t, int32_t, D, W)
+VSLLWIL(vsllwil_hu_bu, 16, uint16_t, uint8_t, H, B)
+VSLLWIL(vsllwil_wu_hu, 32, uint32_t, uint16_t, W, H)
+VSLLWIL(vsllwil_du_wu, 64, uint64_t, uint32_t, D, W)
-- 
2.31.1




* [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (23 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:42   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran Song Gao
                   ` (18 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLR[I].{B/H/W/D};
- VSRAR[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
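
A minimal sketch of the rounding logical shift for one byte element. It rests on two assumptions that are not confirmed by the text quoted here: that the "R" suffix means the last bit shifted out is added back to the result (round half up), and that the shift count is taken modulo the element width. `ref_vsrlr_b` and `check_vsrlr_rounding` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed VSRLR.B on one element: logical right shift, plus the last bit
 * shifted out. A count of zero returns the input unchanged. */
uint8_t ref_vsrlr_b(uint8_t x, unsigned sh)
{
    sh &= 7;   /* assumed: count modulo element width */
    if (sh == 0) {
        return x;
    }
    return (uint8_t)((x >> sh) + ((x >> (sh - 1)) & 1));
}

/* The form above equals (x + 2^(sh-1)) >> sh computed in a wider type. */
void check_vsrlr_rounding(void)
{
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned sh = 1; sh < 8; sh++) {
            unsigned expect = (x + (1u << (sh - 1))) >> sh;
            assert(ref_vsrlr_b((uint8_t)x, sh) == expect);
        }
    }
}
```

Under the same assumptions, the arithmetic variant would differ only in using a sign-preserving shift for both terms.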
 target/loongarch/disas.c                    |  18 ++++
 target/loongarch/helper.h                   |  18 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  18 ++++
 target/loongarch/insns.decode               |  18 ++++
 target/loongarch/lsx_helper.c               | 104 ++++++++++++++++++++
 5 files changed, 176 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 087cac10ad..c62b6720ec 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1148,3 +1148,21 @@ INSN_LSX(vsllwil_hu_bu,    vv_i)
 INSN_LSX(vsllwil_wu_hu,    vv_i)
 INSN_LSX(vsllwil_du_wu,    vv_i)
 INSN_LSX(vextl_qu_du,      vv)
+
+INSN_LSX(vsrlr_b,          vvv)
+INSN_LSX(vsrlr_h,          vvv)
+INSN_LSX(vsrlr_w,          vvv)
+INSN_LSX(vsrlr_d,          vvv)
+INSN_LSX(vsrlri_b,         vv_i)
+INSN_LSX(vsrlri_h,         vv_i)
+INSN_LSX(vsrlri_w,         vv_i)
+INSN_LSX(vsrlri_d,         vv_i)
+
+INSN_LSX(vsrar_b,          vvv)
+INSN_LSX(vsrar_h,          vvv)
+INSN_LSX(vsrar_w,          vvv)
+INSN_LSX(vsrar_d,          vvv)
+INSN_LSX(vsrari_b,         vv_i)
+INSN_LSX(vsrari_h,         vv_i)
+INSN_LSX(vsrari_w,         vv_i)
+INSN_LSX(vsrari_d,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0266b9a4ad..c28353d822 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -376,3 +376,21 @@ DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
+
+DEF_HELPER_4(vsrlr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrar_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index fb40aaf5ad..2ee763fb32 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2675,3 +2675,21 @@ TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
 TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
 TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
 TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
+
+TRANS(vsrlr_b, gen_vvv, gen_helper_vsrlr_b)
+TRANS(vsrlr_h, gen_vvv, gen_helper_vsrlr_h)
+TRANS(vsrlr_w, gen_vvv, gen_helper_vsrlr_w)
+TRANS(vsrlr_d, gen_vvv, gen_helper_vsrlr_d)
+TRANS(vsrlri_b, gen_vv_i, gen_helper_vsrlri_b)
+TRANS(vsrlri_h, gen_vv_i, gen_helper_vsrlri_h)
+TRANS(vsrlri_w, gen_vv_i, gen_helper_vsrlri_w)
+TRANS(vsrlri_d, gen_vv_i, gen_helper_vsrlri_d)
+
+TRANS(vsrar_b, gen_vvv, gen_helper_vsrar_b)
+TRANS(vsrar_h, gen_vvv, gen_helper_vsrar_h)
+TRANS(vsrar_w, gen_vvv, gen_helper_vsrar_w)
+TRANS(vsrar_d, gen_vvv, gen_helper_vsrar_d)
+TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
+TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
+TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
+TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 23dd338026..a217411113 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -848,3 +848,21 @@ vsllwil_hu_bu    0111 00110000 11000 01 ... ..... .....   @vv_ui3
 vsllwil_wu_hu    0111 00110000 11000 1 .... ..... .....   @vv_ui4
 vsllwil_du_wu    0111 00110000 11001 ..... ..... .....    @vv_ui5
 vextl_qu_du      0111 00110000 11010 00000 ..... .....    @vv
+
+vsrlr_b          0111 00001111 00000 ..... ..... .....    @vvv
+vsrlr_h          0111 00001111 00001 ..... ..... .....    @vvv
+vsrlr_w          0111 00001111 00010 ..... ..... .....    @vvv
+vsrlr_d          0111 00001111 00011 ..... ..... .....    @vvv
+vsrlri_b         0111 00101010 01000 01 ... ..... .....   @vv_ui3
+vsrlri_h         0111 00101010 01000 1 .... ..... .....   @vv_ui4
+vsrlri_w         0111 00101010 01001 ..... ..... .....    @vv_ui5
+vsrlri_d         0111 00101010 0101 ...... ..... .....    @vv_ui6
+
+vsrar_b          0111 00001111 00100 ..... ..... .....    @vvv
+vsrar_h          0111 00001111 00101 ..... ..... .....    @vvv
+vsrar_w          0111 00001111 00110 ..... ..... .....    @vvv
+vsrar_d          0111 00001111 00111 ..... ..... .....    @vvv
+vsrari_b         0111 00101010 10000 01 ... ..... .....   @vv_ui3
+vsrari_h         0111 00101010 10000 1 .... ..... .....   @vv_ui4
+vsrari_w         0111 00101010 10001 ..... ..... .....    @vv_ui5
+vsrari_d         0111 00101010 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 72efdd5a74..a33bb11aee 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1071,3 +1071,107 @@ VSLLWIL(vsllwil_d_w, 64, int64_t, int32_t, D, W)
 VSLLWIL(vsllwil_hu_bu, 16, uint16_t, uint8_t, H, B)
 VSLLWIL(vsllwil_wu_hu, 32, uint32_t, uint16_t, W, H)
 VSLLWIL(vsllwil_du_wu, 64, uint64_t, uint32_t, D, W)
+
+#define do_vsrlr(E, T)                                  \
+static T do_vsrlr_ ##E(T s1, int sh)                    \
+{                                                       \
+    if (sh == 0) {                                      \
+        return s1;                                      \
+    } else {                                            \
+        return (s1 >> sh) + ((s1 >> (sh - 1)) & 0x1);   \
+    }                                                   \
+}
+
+do_vsrlr(B, uint8_t)
+do_vsrlr(H, uint16_t)
+do_vsrlr(W, uint32_t)
+do_vsrlr(D, uint64_t)
+
+#define VSRLR(NAME, BIT, T, E)                                  \
+void HELPER(NAME)(CPULoongArchState *env,                       \
+                  uint32_t vd, uint32_t vj, uint32_t vk)        \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = &(env->fpr[vd].vreg);                            \
+    VReg *Vj = &(env->fpr[vj].vreg);                            \
+    VReg *Vk = &(env->fpr[vk].vreg);                            \
+                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
+    }                                                           \
+}
+
+VSRLR(vsrlr_b, 8,  uint8_t, B)
+VSRLR(vsrlr_h, 16, uint16_t, H)
+VSRLR(vsrlr_w, 32, uint32_t, W)
+VSRLR(vsrlr_d, 64, uint64_t, D)
+
+#define VSRLRI(NAME, BIT, E)                              \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i;                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), imm);         \
+    }                                                     \
+}
+
+VSRLRI(vsrlri_b, 8, B)
+VSRLRI(vsrlri_h, 16, H)
+VSRLRI(vsrlri_w, 32, W)
+VSRLRI(vsrlri_d, 64, D)
+
+#define do_vsrar(E, T)                                  \
+static T do_vsrar_ ##E(T s1, int sh)                    \
+{                                                       \
+    if (sh == 0) {                                      \
+        return s1;                                      \
+    } else {                                            \
+        return (s1 >> sh) + ((s1 >> (sh - 1)) & 0x1);   \
+    }                                                   \
+}
+
+do_vsrar(B, int8_t)
+do_vsrar(H, int16_t)
+do_vsrar(W, int32_t)
+do_vsrar(D, int64_t)
+
+#define VSRAR(NAME, BIT, T, E)                                  \
+void HELPER(NAME)(CPULoongArchState *env,                       \
+                  uint32_t vd, uint32_t vj, uint32_t vk)        \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = &(env->fpr[vd].vreg);                            \
+    VReg *Vj = &(env->fpr[vj].vreg);                            \
+    VReg *Vk = &(env->fpr[vk].vreg);                            \
+                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = do_vsrar_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
+    }                                                           \
+}
+
+VSRAR(vsrar_b, 8,  uint8_t, B)
+VSRAR(vsrar_h, 16, uint16_t, H)
+VSRAR(vsrar_w, 32, uint32_t, W)
+VSRAR(vsrar_d, 64, uint64_t, D)
+
+#define VSRARI(NAME, BIT, E)                              \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i;                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        Vd->E(i) = do_vsrar_ ## E(Vj->E(i), imm);         \
+    }                                                     \
+}
+
+VSRARI(vsrari_b, 8, B)
+VSRARI(vsrari_h, 16, H)
+VSRARI(vsrari_w, 32, W)
+VSRARI(vsrari_d, 64, D)
-- 
2.31.1




* [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (24 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:46   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn Song Gao
                   ` (17 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLN.{B.H/H.W/W.D};
- VSRAN.{B.H/H.W/W.D};
- VSRLNI.{B.H/H.W/W.D/D.Q};
- VSRANI.{B.H/H.W/W.D/D.Q}.
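
As an illustration of the narrowing forms, here is a minimal sketch of the B.H case on plain lane arrays. It is not code from this series: the function names and the array-based interface are hypothetical, register aliasing is not modelled, and the signed variant assumes >> on negative values is an arithmetic shift.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a narrowing logical shift (B.H form):
 * each 16-bit source lane is shifted right by its count modulo 16,
 * truncated to 8 bits and packed into the low half of the result;
 * the high 64 bits of the result are cleared. */
static inline void srln_b_h(uint8_t dst[16], const uint16_t vj[8],
                            const uint16_t vk[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(vj[i] >> (vk[i] % 16));
    }
    memset(dst + 8, 0, 8);
}

/* Same, but the source lanes are signed, so the shift is arithmetic
 * (assumed compiler behaviour for negative values). */
static inline void sran_b_h(uint8_t dst[16], const int16_t vj[8],
                            const uint16_t vk[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(vj[i] >> (vk[i] % 16));
    }
    memset(dst + 8, 0, 8);
}
```

The only difference between the two is the signedness of the source lane, which is also how the VSRLN and VSRAN macros in this patch differ (the logical one casts the source to the unsigned type T).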

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  16 +++
 target/loongarch/helper.h                   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode               |  17 +++
 target/loongarch/lsx_helper.c               | 118 ++++++++++++++++++++
 5 files changed, 183 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c62b6720ec..f0fc2ff84b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1166,3 +1166,19 @@ INSN_LSX(vsrari_b,         vv_i)
 INSN_LSX(vsrari_h,         vv_i)
 INSN_LSX(vsrari_w,         vv_i)
 INSN_LSX(vsrari_d,         vv_i)
+
+INSN_LSX(vsrln_b_h,       vvv)
+INSN_LSX(vsrln_h_w,       vvv)
+INSN_LSX(vsrln_w_d,       vvv)
+INSN_LSX(vsran_b_h,       vvv)
+INSN_LSX(vsran_h_w,       vvv)
+INSN_LSX(vsran_w_d,       vvv)
+
+INSN_LSX(vsrlni_b_h,       vv_i)
+INSN_LSX(vsrlni_h_w,       vv_i)
+INSN_LSX(vsrlni_w_d,       vv_i)
+INSN_LSX(vsrlni_d_q,       vv_i)
+INSN_LSX(vsrani_b_h,       vv_i)
+INSN_LSX(vsrani_h_w,       vv_i)
+INSN_LSX(vsrani_w_d,       vv_i)
+INSN_LSX(vsrani_d_q,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c28353d822..e7d0a8d6cf 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -394,3 +394,19 @@ DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2ee763fb32..77f7d6319f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2693,3 +2693,19 @@ TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
 TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
 TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
 TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
+
+TRANS(vsrln_b_h, gen_vvv, gen_helper_vsrln_b_h)
+TRANS(vsrln_h_w, gen_vvv, gen_helper_vsrln_h_w)
+TRANS(vsrln_w_d, gen_vvv, gen_helper_vsrln_w_d)
+TRANS(vsran_b_h, gen_vvv, gen_helper_vsran_b_h)
+TRANS(vsran_h_w, gen_vvv, gen_helper_vsran_h_w)
+TRANS(vsran_w_d, gen_vvv, gen_helper_vsran_w_d)
+
+TRANS(vsrlni_b_h, gen_vv_i, gen_helper_vsrlni_b_h)
+TRANS(vsrlni_h_w, gen_vv_i, gen_helper_vsrlni_h_w)
+TRANS(vsrlni_w_d, gen_vv_i, gen_helper_vsrlni_w_d)
+TRANS(vsrlni_d_q, gen_vv_i, gen_helper_vsrlni_d_q)
+TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
+TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
+TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
+TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a217411113..ee54b632a7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -503,6 +503,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
 @vv_ui6            .... ........ .... imm:6 vj:5 vd:5    &vv_i
+@vv_ui7             .... ........ ... imm:7 vj:5 vd:5    &vv_i
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 
@@ -866,3 +867,19 @@ vsrari_b         0111 00101010 10000 01 ... ..... .....   @vv_ui3
 vsrari_h         0111 00101010 10000 1 .... ..... .....   @vv_ui4
 vsrari_w         0111 00101010 10001 ..... ..... .....    @vv_ui5
 vsrari_d         0111 00101010 1001 ...... ..... .....    @vv_ui6
+
+vsrln_b_h        0111 00001111 01001 ..... ..... .....    @vvv
+vsrln_h_w        0111 00001111 01010 ..... ..... .....    @vvv
+vsrln_w_d        0111 00001111 01011 ..... ..... .....    @vvv
+vsran_b_h        0111 00001111 01101 ..... ..... .....    @vvv
+vsran_h_w        0111 00001111 01110 ..... ..... .....    @vvv
+vsran_w_d        0111 00001111 01111 ..... ..... .....    @vvv
+
+vsrlni_b_h       0111 00110100 00000 1 .... ..... .....   @vv_ui4
+vsrlni_h_w       0111 00110100 00001 ..... ..... .....    @vv_ui5
+vsrlni_w_d       0111 00110100 0001 ...... ..... .....    @vv_ui6
+vsrlni_d_q       0111 00110100 001 ....... ..... .....    @vv_ui7
+vsrani_b_h       0111 00110101 10000 1 .... ..... .....   @vv_ui4
+vsrani_h_w       0111 00110101 10001 ..... ..... .....    @vv_ui5
+vsrani_w_d       0111 00110101 1001 ...... ..... .....    @vv_ui6
+vsrani_d_q       0111 00110101 101 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index a33bb11aee..6ddebddde7 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1175,3 +1175,121 @@ VSRARI(vsrari_b, 8, B)
 VSRARI(vsrari_h, 16, H)
 VSRARI(vsrari_w, 32, W)
 VSRARI(vsrari_d, 64, D)
+
+#define R_SHIFT(a, b) ((a) >> (b))
+
+#define VSRLN(NAME, BIT, T, E1, E2)                             \
+void HELPER(NAME)(CPULoongArchState *env,                       \
+                  uint32_t vd, uint32_t vj, uint32_t vk)        \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = &(env->fpr[vd].vreg);                            \
+    VReg *Vj = &(env->fpr[vj].vreg);                            \
+    VReg *Vk = &(env->fpr[vk].vreg);                            \
+                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E1(i) = R_SHIFT((T)Vj->E2(i),((T)Vk->E2(i)) % BIT); \
+    }                                                           \
+    Vd->D(1) = 0;                                               \
+}
+
+VSRLN(vsrln_b_h, 16, uint16_t, B, H)
+VSRLN(vsrln_h_w, 32, uint32_t, H, W)
+VSRLN(vsrln_w_d, 64, uint64_t, W, D)
+
+#define VSRAN(NAME, BIT, T, E1, E2)                           \
+void HELPER(NAME)(CPULoongArchState *env,                     \
+                  uint32_t vd, uint32_t vj, uint32_t vk)      \
+{                                                             \
+    int i;                                                    \
+    VReg *Vd = &(env->fpr[vd].vreg);                          \
+    VReg *Vj = &(env->fpr[vj].vreg);                          \
+    VReg *Vk = &(env->fpr[vk].vreg);                          \
+                                                              \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                       \
+        Vd->E1(i) = R_SHIFT(Vj->E2(i), ((T)Vk->E2(i)) % BIT); \
+    }                                                         \
+    Vd->D(1) = 0;                                             \
+}
+
+VSRAN(vsran_b_h, 16, uint16_t, B, H)
+VSRAN(vsran_h_w, 32, uint32_t, H, W)
+VSRAN(vsran_w_d, 64, uint64_t, W, D)
+
+#define VSRLNI(NAME, BIT, T, E1, E2)                         \
+void HELPER(NAME)(CPULoongArchState *env,                    \
+                  uint32_t vd, uint32_t vj, uint32_t imm)    \
+{                                                            \
+    int i, max;                                              \
+    VReg temp;                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                         \
+                                                             \
+    temp.D(0) = 0;                                           \
+    temp.D(1) = 0;                                           \
+    max = LSX_LEN/BIT;                                       \
+    for (i = 0; i < max; i++) {                              \
+        temp.E1(i) = R_SHIFT((T)Vj->E2(i), imm);             \
+        temp.E1(i + max) = R_SHIFT((T)Vd->E2(i), imm);       \
+    }                                                        \
+    Vd->D(0) = temp.D(0);                                    \
+    Vd->D(1) = temp.D(1);                                    \
+}
+
+void HELPER(vsrlni_d_q)(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.D(0) = int128_getlo(int128_urshift(Vj->Q(0), imm % 128));
+    temp.D(1) = int128_getlo(int128_urshift(Vd->Q(0), imm % 128));
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+VSRLNI(vsrlni_b_h, 16, uint16_t, B, H)
+VSRLNI(vsrlni_h_w, 32, uint32_t, H, W)
+VSRLNI(vsrlni_w_d, 64, uint64_t, W, D)
+
+#define VSRANI(NAME, BIT, E1, E2)                         \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i, max;                                           \
+    VReg temp;                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    temp.D(0) = 0;                                        \
+    temp.D(1) = 0;                                        \
+    max = LSX_LEN/BIT;                                    \
+    for (i = 0; i < max; i++) {                           \
+        temp.E1(i) = R_SHIFT(Vj->E2(i), imm);             \
+        temp.E1(i + max) = R_SHIFT(Vd->E2(i), imm);       \
+    }                                                     \
+    Vd->D(0) = temp.D(0);                                 \
+    Vd->D(1) = temp.D(1);                                 \
+}
+
+void HELPER(vsrani_d_q)(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.D(0) = 0;
+    temp.D(1) = 0;
+    temp.D(0) = int128_getlo(int128_rshift(Vj->Q(0), imm % 128));
+    temp.D(1) = int128_getlo(int128_rshift(Vd->Q(0), imm % 128));
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+VSRANI(vsrani_b_h, 16, B, H)
+VSRANI(vsrani_h_w, 32, H, W)
+VSRANI(vsrani_w_d, 64, W, D)
-- 
2.31.1




* [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (25 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-01  5:53   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran Song Gao
                   ` (16 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSRLRN.{B.H/H.W/W.D};
- VSRARN.{B.H/H.W/W.D};
- VSRLRNI.{B.H/H.W/W.D/D.Q};
- VSRARNI.{B.H/H.W/W.D/D.Q}.
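
The immediate forms read both Vj and Vd and overwrite Vd, which is why the helpers build the result in a temporary. A minimal sketch of that pattern for the B.H case follows. It is an illustration only: the vec128 union, srlr_u16 and srlrni_b_h are hypothetical names, not the VReg type or helpers from this series.

```c
#include <stdint.h>

/* Hypothetical 128-bit register view, byte and halfword lanes. */
typedef union {
    uint8_t  b[16];
    uint16_t h[8];
} vec128;

/* Rounding logical right shift of one 16-bit lane. */
static inline uint16_t srlr_u16(uint16_t x, unsigned sh)
{
    if (sh == 0) {
        return x;
    }
    return (uint16_t)((x >> sh) + ((x >> (sh - 1)) & 1));
}

/* Sketch of the immediate narrowing form: results from vj fill the
 * low half, results from the old vd fill the high half.  The result
 * is staged in tmp so that vd (which may alias vj) is only
 * overwritten after every source lane has been read. */
static inline void srlrni_b_h(vec128 *vd, const vec128 *vj, unsigned imm)
{
    vec128 tmp;

    for (int i = 0; i < 8; i++) {
        tmp.b[i]     = (uint8_t)srlr_u16(vj->h[i], imm);
        tmp.b[i + 8] = (uint8_t)srlr_u16(vd->h[i], imm);
    }
    *vd = tmp;
}
```

Writing directly into vd inside the loop would clobber halfword lanes of vd before they are read for the high half, so the temporary is not optional in this formulation.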

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  16 +++
 target/loongarch/helper.h                   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode               |  16 +++
 target/loongarch/lsx_helper.c               | 132 ++++++++++++++++++++
 5 files changed, 196 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f0fc2ff84b..185cd36381 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1182,3 +1182,19 @@ INSN_LSX(vsrani_b_h,       vv_i)
 INSN_LSX(vsrani_h_w,       vv_i)
 INSN_LSX(vsrani_w_d,       vv_i)
 INSN_LSX(vsrani_d_q,       vv_i)
+
+INSN_LSX(vsrlrn_b_h,       vvv)
+INSN_LSX(vsrlrn_h_w,       vvv)
+INSN_LSX(vsrlrn_w_d,       vvv)
+INSN_LSX(vsrarn_b_h,       vvv)
+INSN_LSX(vsrarn_h_w,       vvv)
+INSN_LSX(vsrarn_w_d,       vvv)
+
+INSN_LSX(vsrlrni_b_h,      vv_i)
+INSN_LSX(vsrlrni_h_w,      vv_i)
+INSN_LSX(vsrlrni_w_d,      vv_i)
+INSN_LSX(vsrlrni_d_q,      vv_i)
+INSN_LSX(vsrarni_b_h,      vv_i)
+INSN_LSX(vsrarni_h_w,      vv_i)
+INSN_LSX(vsrarni_w_d,      vv_i)
+INSN_LSX(vsrarni_d_q,      vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e7d0a8d6cf..ee0812dca2 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -410,3 +410,19 @@ DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 77f7d6319f..a8f699915d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2709,3 +2709,19 @@ TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
 TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
 TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
 TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
+
+TRANS(vsrlrn_b_h, gen_vvv, gen_helper_vsrlrn_b_h)
+TRANS(vsrlrn_h_w, gen_vvv, gen_helper_vsrlrn_h_w)
+TRANS(vsrlrn_w_d, gen_vvv, gen_helper_vsrlrn_w_d)
+TRANS(vsrarn_b_h, gen_vvv, gen_helper_vsrarn_b_h)
+TRANS(vsrarn_h_w, gen_vvv, gen_helper_vsrarn_h_w)
+TRANS(vsrarn_w_d, gen_vvv, gen_helper_vsrarn_w_d)
+
+TRANS(vsrlrni_b_h, gen_vv_i, gen_helper_vsrlrni_b_h)
+TRANS(vsrlrni_h_w, gen_vv_i, gen_helper_vsrlrni_h_w)
+TRANS(vsrlrni_w_d, gen_vv_i, gen_helper_vsrlrni_w_d)
+TRANS(vsrlrni_d_q, gen_vv_i, gen_helper_vsrlrni_d_q)
+TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
+TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
+TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
+TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ee54b632a7..29bf4a8a6a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -883,3 +883,19 @@ vsrani_b_h       0111 00110101 10000 1 .... ..... .....   @vv_ui4
 vsrani_h_w       0111 00110101 10001 ..... ..... .....    @vv_ui5
 vsrani_w_d       0111 00110101 1001 ...... ..... .....    @vv_ui6
 vsrani_d_q       0111 00110101 101 ....... ..... .....    @vv_ui7
+
+vsrlrn_b_h       0111 00001111 10001 ..... ..... .....    @vvv
+vsrlrn_h_w       0111 00001111 10010 ..... ..... .....    @vvv
+vsrlrn_w_d       0111 00001111 10011 ..... ..... .....    @vvv
+vsrarn_b_h       0111 00001111 10101 ..... ..... .....    @vvv
+vsrarn_h_w       0111 00001111 10110 ..... ..... .....    @vvv
+vsrarn_w_d       0111 00001111 10111 ..... ..... .....    @vvv
+
+vsrlrni_b_h      0111 00110100 01000 1 .... ..... .....   @vv_ui4
+vsrlrni_h_w      0111 00110100 01001 ..... ..... .....    @vv_ui5
+vsrlrni_w_d      0111 00110100 0101 ...... ..... .....    @vv_ui6
+vsrlrni_d_q      0111 00110100 011 ....... ..... .....    @vv_ui7
+vsrarni_b_h      0111 00110101 11000 1 .... ..... .....   @vv_ui4
+vsrarni_h_w      0111 00110101 11001 ..... ..... .....    @vv_ui5
+vsrarni_w_d      0111 00110101 1101 ...... ..... .....    @vv_ui6
+vsrarni_d_q      0111 00110101 111 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 6ddebddde7..c0e704c7e5 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1293,3 +1293,135 @@ void HELPER(vsrani_d_q)(CPULoongArchState *env,
 VSRANI(vsrani_b_h, 16, B, H)
 VSRANI(vsrani_h_w, 32, H, W)
 VSRANI(vsrani_w_d, 64, W, D)
+
+#define VSRLRN(NAME, BIT, T, E1, E2)                                \
+void HELPER(NAME)(CPULoongArchState *env,                           \
+                  uint32_t vd, uint32_t vj, uint32_t vk)            \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+    VReg *Vk = &(env->fpr[vk].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E1(i) = do_vsrlr_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
+    }                                                               \
+    Vd->D(1) = 0;                                                   \
+}
+
+VSRLRN(vsrlrn_b_h, 16, uint16_t, B, H)
+VSRLRN(vsrlrn_h_w, 32, uint32_t, H, W)
+VSRLRN(vsrlrn_w_d, 64, uint64_t, W, D)
+
+#define VSRARN(NAME, BIT, T, E1, E2)                                \
+void HELPER(NAME)(CPULoongArchState *env,                           \
+                  uint32_t vd, uint32_t vj, uint32_t vk)            \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+    VReg *Vk = &(env->fpr[vk].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E1(i) = do_vsrar_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
+    }                                                               \
+    Vd->D(1) = 0;                                                   \
+}
+
+VSRARN(vsrarn_b_h, 16, uint8_t,  B, H)
+VSRARN(vsrarn_h_w, 32, uint16_t, H, W)
+VSRARN(vsrarn_w_d, 64, uint32_t, W, D)
+
+#define VSRLRNI(NAME, BIT, E1, E2)                          \
+void HELPER(NAME)(CPULoongArchState *env,                   \
+                  uint32_t vd, uint32_t vj, uint32_t imm)   \
+{                                                           \
+    int i, max;                                             \
+    VReg temp;                                              \
+    VReg *Vd = &(env->fpr[vd].vreg);                        \
+    VReg *Vj = &(env->fpr[vj].vreg);                        \
+                                                            \
+    temp.D(0) = 0;                                          \
+    temp.D(1) = 0;                                          \
+    max = LSX_LEN/BIT;                                      \
+    for (i = 0; i < max; i++) {                             \
+        temp.E1(i) = do_vsrlr_ ## E2(Vj->E2(i), imm);       \
+        temp.E1(i + max) = do_vsrlr_ ## E2(Vd->E2(i), imm); \
+    }                                                       \
+    Vd->D(0) = temp.D(0);                                   \
+    Vd->D(1) = temp.D(1);                                   \
+}
+
+void HELPER(vsrlrni_d_q)(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    Int128 r1, r2;
+
+    if (imm == 0) {
+        temp.D(0) = int128_getlo(Vj->Q(0));
+        temp.D(1) = int128_getlo(Vd->Q(0));
+    } else {
+        r1 = int128_and(int128_urshift(Vj->Q(0), (imm - 1)), int128_one());
+        r2 = int128_and(int128_urshift(Vd->Q(0), (imm - 1)), int128_one());
+
+       temp.D(0) = int128_getlo(int128_add(int128_urshift(Vj->Q(0), imm), r1));
+       temp.D(1) = int128_getlo(int128_add(int128_urshift(Vd->Q(0), imm), r2));
+    }
+
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+VSRLRNI(vsrlrni_b_h, 16, B, H)
+VSRLRNI(vsrlrni_h_w, 32, H, W)
+VSRLRNI(vsrlrni_w_d, 64, W, D)
+
+#define VSRARNI(NAME, BIT, E1, E2)                          \
+void HELPER(NAME)(CPULoongArchState *env,                   \
+                  uint32_t vd, uint32_t vj, uint32_t imm)   \
+{                                                           \
+    int i, max;                                             \
+    VReg temp;                                              \
+    VReg *Vd = &(env->fpr[vd].vreg);                        \
+    VReg *Vj = &(env->fpr[vj].vreg);                        \
+                                                            \
+    temp.D(0) = 0;                                          \
+    temp.D(1) = 0;                                          \
+    max = LSX_LEN/BIT;                                      \
+    for (i = 0; i < max; i++) {                             \
+        temp.E1(i) = do_vsrar_ ## E2(Vj->E2(i), imm);       \
+        temp.E1(i + max) = do_vsrar_ ## E2(Vd->E2(i), imm); \
+    }                                                       \
+    Vd->D(0) = temp.D(0);                                   \
+    Vd->D(1) = temp.D(1);                                   \
+}
+
+void HELPER(vsrarni_d_q)(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    Int128 r1, r2;
+
+    if (imm == 0) {
+        temp.D(0) = int128_getlo(Vj->Q(0));
+        temp.D(1) = int128_getlo(Vd->Q(0));
+    } else {
+        r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
+        r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
+
+       temp.D(0) = int128_getlo(int128_add(int128_rshift(Vj->Q(0), imm), r1));
+       temp.D(1) = int128_getlo(int128_add(int128_rshift(Vd->Q(0), imm), r2));
+    }
+
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+VSRARNI(vsrarni_b_h, 16, B, H)
+VSRARNI(vsrarni_h_w, 32, H, W)
+VSRARNI(vsrarni_w_d, 64, W, D)
-- 
2.31.1



* [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (26 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  3:26   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn Song Gao
                   ` (15 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSSRLN.{B.H/H.W/W.D};
- VSSRAN.{B.H/H.W/W.D};
- VSSRLN.{BU.H/HU.W/WU.D};
- VSSRAN.{BU.H/HU.W/WU.D};
- VSSRLNI.{B.H/H.W/W.D/D.Q};
- VSSRANI.{B.H/H.W/W.D/D.Q};
- VSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRANI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
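Note: the four register-form families listed above differ only in the
shift kind and the saturation range. As a rough illustration, one
16-bit to 8-bit lane can be modelled in plain C as below. This is a
sketch under assumptions, not the code from this patch: the model_*
names are hypothetical, the shift amount is taken modulo the source
width as the helpers in the diff do, and a signed right shift is
assumed to be arithmetic.

```c
#include <stdint.h>

/* Hypothetical scalar models of one 16-bit -> 8-bit lane. */

/* VSSRLN.B.H: logical shift, saturate to the signed maximum 0x7f. */
static int8_t model_ssrln_b_h(uint16_t e, unsigned sa)
{
    unsigned r = (unsigned)e >> (sa % 16);
    return r > INT8_MAX ? INT8_MAX : (int8_t)r;
}

/* VSSRAN.B.H: arithmetic shift, saturate to [-128, 127]. */
static int8_t model_ssran_b_h(int16_t e, unsigned sa)
{
    /* Assumption: >> on a negative int is an arithmetic shift. */
    int r = e >> (sa % 16);
    if (r > INT8_MAX) {
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)r;
}

/* VSSRLN.BU.H: logical shift, saturate to the unsigned maximum 0xff. */
static uint8_t model_ssrln_bu_h(uint16_t e, unsigned sa)
{
    unsigned r = (unsigned)e >> (sa % 16);
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}

/* VSSRAN.BU.H: negative inputs clamp to 0, the rest saturate to 0xff. */
static uint8_t model_ssran_bu_h(int16_t e, unsigned sa)
{
    int r;

    if (e < 0) {
        return 0;
    }
    r = e >> (sa % 16);
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}
```

The immediate forms (VSSRLNI and friends) apply the same per-lane
operation with the immediate as the shift amount, narrowing Vj into
the low half of Vd and the old Vd into the high half.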
 target/loongarch/disas.c                    |  30 ++
 target/loongarch/helper.h                   |  30 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 ++
 target/loongarch/insns.decode               |  30 ++
 target/loongarch/lsx_helper.c               | 383 ++++++++++++++++++++
 5 files changed, 503 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 185cd36381..426d30dc01 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1198,3 +1198,33 @@ INSN_LSX(vsrarni_b_h,      vv_i)
 INSN_LSX(vsrarni_h_w,      vv_i)
 INSN_LSX(vsrarni_w_d,      vv_i)
 INSN_LSX(vsrarni_d_q,      vv_i)
+
+INSN_LSX(vssrln_b_h,       vvv)
+INSN_LSX(vssrln_h_w,       vvv)
+INSN_LSX(vssrln_w_d,       vvv)
+INSN_LSX(vssran_b_h,       vvv)
+INSN_LSX(vssran_h_w,       vvv)
+INSN_LSX(vssran_w_d,       vvv)
+INSN_LSX(vssrln_bu_h,      vvv)
+INSN_LSX(vssrln_hu_w,      vvv)
+INSN_LSX(vssrln_wu_d,      vvv)
+INSN_LSX(vssran_bu_h,      vvv)
+INSN_LSX(vssran_hu_w,      vvv)
+INSN_LSX(vssran_wu_d,      vvv)
+
+INSN_LSX(vssrlni_b_h,      vv_i)
+INSN_LSX(vssrlni_h_w,      vv_i)
+INSN_LSX(vssrlni_w_d,      vv_i)
+INSN_LSX(vssrlni_d_q,      vv_i)
+INSN_LSX(vssrani_b_h,      vv_i)
+INSN_LSX(vssrani_h_w,      vv_i)
+INSN_LSX(vssrani_w_d,      vv_i)
+INSN_LSX(vssrani_d_q,      vv_i)
+INSN_LSX(vssrlni_bu_h,     vv_i)
+INSN_LSX(vssrlni_hu_w,     vv_i)
+INSN_LSX(vssrlni_wu_d,     vv_i)
+INSN_LSX(vssrlni_du_q,     vv_i)
+INSN_LSX(vssrani_bu_h,     vv_i)
+INSN_LSX(vssrani_hu_w,     vv_i)
+INSN_LSX(vssrani_wu_d,     vv_i)
+INSN_LSX(vssrani_du_q,     vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ee0812dca2..7562f01ad6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -426,3 +426,33 @@ DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a8f699915d..58f27d7f65 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2725,3 +2725,33 @@ TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
 TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
 TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
 TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
+
+TRANS(vssrln_b_h, gen_vvv, gen_helper_vssrln_b_h)
+TRANS(vssrln_h_w, gen_vvv, gen_helper_vssrln_h_w)
+TRANS(vssrln_w_d, gen_vvv, gen_helper_vssrln_w_d)
+TRANS(vssran_b_h, gen_vvv, gen_helper_vssran_b_h)
+TRANS(vssran_h_w, gen_vvv, gen_helper_vssran_h_w)
+TRANS(vssran_w_d, gen_vvv, gen_helper_vssran_w_d)
+TRANS(vssrln_bu_h, gen_vvv, gen_helper_vssrln_bu_h)
+TRANS(vssrln_hu_w, gen_vvv, gen_helper_vssrln_hu_w)
+TRANS(vssrln_wu_d, gen_vvv, gen_helper_vssrln_wu_d)
+TRANS(vssran_bu_h, gen_vvv, gen_helper_vssran_bu_h)
+TRANS(vssran_hu_w, gen_vvv, gen_helper_vssran_hu_w)
+TRANS(vssran_wu_d, gen_vvv, gen_helper_vssran_wu_d)
+
+TRANS(vssrlni_b_h, gen_vv_i, gen_helper_vssrlni_b_h)
+TRANS(vssrlni_h_w, gen_vv_i, gen_helper_vssrlni_h_w)
+TRANS(vssrlni_w_d, gen_vv_i, gen_helper_vssrlni_w_d)
+TRANS(vssrlni_d_q, gen_vv_i, gen_helper_vssrlni_d_q)
+TRANS(vssrani_b_h, gen_vv_i, gen_helper_vssrani_b_h)
+TRANS(vssrani_h_w, gen_vv_i, gen_helper_vssrani_h_w)
+TRANS(vssrani_w_d, gen_vv_i, gen_helper_vssrani_w_d)
+TRANS(vssrani_d_q, gen_vv_i, gen_helper_vssrani_d_q)
+TRANS(vssrlni_bu_h, gen_vv_i, gen_helper_vssrlni_bu_h)
+TRANS(vssrlni_hu_w, gen_vv_i, gen_helper_vssrlni_hu_w)
+TRANS(vssrlni_wu_d, gen_vv_i, gen_helper_vssrlni_wu_d)
+TRANS(vssrlni_du_q, gen_vv_i, gen_helper_vssrlni_du_q)
+TRANS(vssrani_bu_h, gen_vv_i, gen_helper_vssrani_bu_h)
+TRANS(vssrani_hu_w, gen_vv_i, gen_helper_vssrani_hu_w)
+TRANS(vssrani_wu_d, gen_vv_i, gen_helper_vssrani_wu_d)
+TRANS(vssrani_du_q, gen_vv_i, gen_helper_vssrani_du_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 29bf4a8a6a..772c5cddfe 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -899,3 +899,33 @@ vsrarni_b_h      0111 00110101 11000 1 .... ..... .....   @vv_ui4
 vsrarni_h_w      0111 00110101 11001 ..... ..... .....    @vv_ui5
 vsrarni_w_d      0111 00110101 1101 ...... ..... .....    @vv_ui6
 vsrarni_d_q      0111 00110101 111 ....... ..... .....    @vv_ui7
+
+vssrln_b_h       0111 00001111 11001 ..... ..... .....    @vvv
+vssrln_h_w       0111 00001111 11010 ..... ..... .....    @vvv
+vssrln_w_d       0111 00001111 11011 ..... ..... .....    @vvv
+vssran_b_h       0111 00001111 11101 ..... ..... .....    @vvv
+vssran_h_w       0111 00001111 11110 ..... ..... .....    @vvv
+vssran_w_d       0111 00001111 11111 ..... ..... .....    @vvv
+vssrln_bu_h      0111 00010000 01001 ..... ..... .....    @vvv
+vssrln_hu_w      0111 00010000 01010 ..... ..... .....    @vvv
+vssrln_wu_d      0111 00010000 01011 ..... ..... .....    @vvv
+vssran_bu_h      0111 00010000 01101 ..... ..... .....    @vvv
+vssran_hu_w      0111 00010000 01110 ..... ..... .....    @vvv
+vssran_wu_d      0111 00010000 01111 ..... ..... .....    @vvv
+
+vssrlni_b_h      0111 00110100 10000 1 .... ..... .....   @vv_ui4
+vssrlni_h_w      0111 00110100 10001 ..... ..... .....    @vv_ui5
+vssrlni_w_d      0111 00110100 1001 ...... ..... .....    @vv_ui6
+vssrlni_d_q      0111 00110100 101 ....... ..... .....    @vv_ui7
+vssrani_b_h      0111 00110110 00000 1 .... ..... .....   @vv_ui4
+vssrani_h_w      0111 00110110 00001 ..... ..... .....    @vv_ui5
+vssrani_w_d      0111 00110110 0001 ...... ..... .....    @vv_ui6
+vssrani_d_q      0111 00110110 001 ....... ..... .....    @vv_ui7
+vssrlni_bu_h     0111 00110100 11000 1 .... ..... .....   @vv_ui4
+vssrlni_hu_w     0111 00110100 11001 ..... ..... .....    @vv_ui5
+vssrlni_wu_d     0111 00110100 1101 ...... ..... .....    @vv_ui6
+vssrlni_du_q     0111 00110100 111 ....... ..... .....    @vv_ui7
+vssrani_bu_h     0111 00110110 01000 1 .... ..... .....   @vv_ui4
+vssrani_hu_w     0111 00110110 01001 ..... ..... .....    @vv_ui5
+vssrani_wu_d     0111 00110110 0101 ...... ..... .....    @vv_ui6
+vssrani_du_q     0111 00110110 011 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index c0e704c7e5..680b345695 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1425,3 +1425,386 @@ void HELPER(vsrarni_d_q)(CPULoongArchState *env,
 VSRARNI(vsrarni_b_h, 16, B, H)
 VSRARNI(vsrarni_h_w, 32, H, W)
 VSRARNI(vsrarni_w_d, 64, W, D)
+
+#define SSRLNS(NAME, T1, T2, T3)                    \
+static T1 do_ssrlns_ ## NAME(T2 e2, int sa, int sh) \
+{                                                   \
+        T1 shft_res;                                \
+        if (sa == 0) {                              \
+            shft_res = e2;                          \
+        } else {                                    \
+            shft_res = (((T1)e2) >> sa);            \
+        }                                           \
+        T3 mask;                                    \
+        mask = (1u << sh) -1;                       \
+        if (shft_res > mask) {                      \
+            return mask;                            \
+        } else {                                    \
+            return  shft_res;                       \
+        }                                           \
+}
+
+SSRLNS(B, uint16_t, int16_t, uint8_t)
+SSRLNS(H, uint32_t, int32_t, uint16_t)
+SSRLNS(W, uint64_t, int64_t, uint32_t)
+
+#define VSSRLN(NAME, BIT, T, E1, E2)                                          \
+void HELPER(NAME)(CPULoongArchState *env,                                     \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                      \
+{                                                                             \
+    int i;                                                                    \
+    VReg *Vd = &(env->fpr[vd].vreg);                                          \
+    VReg *Vj = &(env->fpr[vj].vreg);                                          \
+    VReg *Vk = &(env->fpr[vk].vreg);                                          \
+                                                                              \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                       \
+        Vd->E1(i) = do_ssrlns_ ## E1(Vj->E2(i), (T)Vk->E2(i)% BIT, BIT/2 -1); \
+    }                                                                         \
+    Vd->D(1) = 0;                                                             \
+}
+
+VSSRLN(vssrln_b_h, 16, uint16_t, B, H)
+VSSRLN(vssrln_h_w, 32, uint32_t, H, W)
+VSSRLN(vssrln_w_d, 64, uint64_t, W, D)
+
+#define SSRANS(E, T1, T2)                        \
+static T1 do_ssrans_ ## E(T1 e2, int sa, int sh) \
+{                                                \
+        T1 shft_res;                             \
+        if (sa == 0) {                           \
+            shft_res = e2;                       \
+        } else {                                 \
+            shft_res = e2 >> sa;                 \
+        }                                        \
+        T2 mask;                                 \
+        mask = (1ll << sh) - 1;                  \
+        if (shft_res > mask) {                   \
+            return  mask;                        \
+        } else if (shft_res < -(mask +1)) {      \
+            return  ~mask;                       \
+        } else {                                 \
+            return shft_res;                     \
+        }                                        \
+}
+
+SSRANS(B, int16_t, int8_t)
+SSRANS(H, int32_t, int16_t)
+SSRANS(W, int64_t, int32_t)
+
+#define VSSRAN(NAME, BIT, T, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                    \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
+{                                                                            \
+    int i;                                                                   \
+    VReg *Vd = &(env->fpr[vd].vreg);                                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                                         \
+    VReg *Vk = &(env->fpr[vk].vreg);                                         \
+                                                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
+        Vd->E1(i) = do_ssrans_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
+    }                                                                        \
+    Vd->D(1) = 0;                                                            \
+}
+
+VSSRAN(vssran_b_h, 16, uint16_t, B, H)
+VSSRAN(vssran_h_w, 32, uint32_t, H, W)
+VSSRAN(vssran_w_d, 64, uint64_t, W, D)
+
+#define SSRLNU(E, T1, T2, T3)                    \
+static T1 do_ssrlnu_ ## E(T3 e2, int sa, int sh) \
+{                                                \
+        T1 shft_res;                             \
+        if (sa == 0) {                           \
+            shft_res = e2;                       \
+        } else {                                 \
+            shft_res = (((T1)e2) >> sa);         \
+        }                                        \
+        T2 mask;                                 \
+        mask = (1ull << sh) - 1;                 \
+        if (shft_res > mask) {                   \
+            return mask;                         \
+        } else {                                 \
+            return shft_res;                     \
+        }                                        \
+}
+
+SSRLNU(B, uint16_t, uint8_t,  int16_t)
+SSRLNU(H, uint32_t, uint16_t, int32_t)
+SSRLNU(W, uint64_t, uint32_t, int64_t)
+
+#define VSSRLNU(NAME, BIT, T, E1, E2)                                     \
+void HELPER(NAME)(CPULoongArchState *env,                                 \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                  \
+{                                                                         \
+    int i;                                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                                      \
+    VReg *Vk = &(env->fpr[vk].vreg);                                      \
+                                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                   \
+        Vd->E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
+    }                                                                     \
+    Vd->D(1) = 0;                                                         \
+}
+
+VSSRLNU(vssrln_bu_h, 16, uint16_t, B, H)
+VSSRLNU(vssrln_hu_w, 32, uint32_t, H, W)
+VSSRLNU(vssrln_wu_d, 64, uint64_t, W, D)
+
+#define SSRANU(E, T1, T2, T3)                    \
+static T1 do_ssranu_ ## E(T3 e2, int sa, int sh) \
+{                                                \
+        T1 shft_res;                             \
+        if (sa == 0) {                           \
+            shft_res = e2;                       \
+        } else {                                 \
+            shft_res = e2 >> sa;                 \
+        }                                        \
+        if (e2 < 0) {                            \
+            shft_res = 0;                        \
+        }                                        \
+        T2 mask;                                 \
+        mask = (1ull << sh) - 1;                 \
+        if (shft_res > mask) {                   \
+            return mask;                         \
+        } else {                                 \
+            return shft_res;                     \
+        }                                        \
+}
+
+SSRANU(B, uint16_t, uint8_t,  int16_t)
+SSRANU(H, uint32_t, uint16_t, int32_t)
+SSRANU(W, uint64_t, uint32_t, int64_t)
+
+#define VSSRANU(NAME, BIT, T, E1, E2)                                     \
+void HELPER(NAME)(CPULoongArchState *env,                                 \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                  \
+{                                                                         \
+    int i;                                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                                      \
+    VReg *Vk = &(env->fpr[vk].vreg);                                      \
+                                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                   \
+        Vd->E1(i) = do_ssranu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
+    }                                                                     \
+    Vd->D(1) = 0;                                                         \
+}
+
+VSSRANU(vssran_bu_h, 16, uint16_t, B, H)
+VSSRANU(vssran_hu_w, 32, uint32_t, H, W)
+VSSRANU(vssran_wu_d, 64, uint64_t, W, D)
+
+#define VSSRLNI(NAME, BIT, E1, E2)                                            \
+void HELPER(NAME)(CPULoongArchState *env,                                     \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                     \
+{                                                                             \
+    int i;                                                                    \
+    VReg temp;                                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                                          \
+    VReg *Vj = &(env->fpr[vj].vreg);                                          \
+                                                                              \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                       \
+        temp.E1(i) = do_ssrlns_ ## E1(Vj->E2(i), imm, BIT/2 -1);              \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrlns_ ## E1(Vd->E2(i), imm, BIT/2 -1);\
+    }                                                                         \
+    Vd->D(0) = temp.D(0);                                                     \
+    Vd->D(1) = temp.D(1);                                                     \
+}
+
+void HELPER(vssrlni_d_q)(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        shft_res1 = int128_urshift(Vj->Q(0), imm);
+        shft_res2 = int128_urshift(Vd->Q(0), imm);
+    }
+    mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+
+    if (int128_ult(mask, shft_res1)) {
+        Vd->D(0) = int128_getlo(mask);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_ult(mask, shft_res2)) {
+        Vd->D(1) = int128_getlo(mask);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRLNI(vssrlni_b_h, 16, B, H)
+VSSRLNI(vssrlni_h_w, 32, H, W)
+VSSRLNI(vssrlni_w_d, 64, W, D)
+
+#define VSSRANI(NAME, BIT, E1, E2)                                             \
+void HELPER(NAME)(CPULoongArchState *env,                                      \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                      \
+{                                                                              \
+    int i;                                                                     \
+    VReg temp;                                                                 \
+    VReg *Vd = &(env->fpr[vd].vreg);                                           \
+    VReg *Vj = &(env->fpr[vj].vreg);                                           \
+                                                                               \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                        \
+        temp.E1(i) = do_ssrans_ ## E1(Vj->E2(i), imm, BIT/2 -1);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrans_ ## E1(Vd->E2(i), imm, BIT/2 -1); \
+    }                                                                          \
+    Vd->D(0) = temp.D(0);                                                      \
+    Vd->D(1) = temp.D(1);                                                      \
+}
+
+void HELPER(vssrani_d_q)(CPULoongArchState *env,
+                         uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask, min;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        shft_res1 = int128_rshift(Vj->Q(0), imm);
+        shft_res2 = int128_rshift(Vd->Q(0), imm);
+    }
+    mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+    min  = int128_lshift(int128_one(), 63);
+
+    if (int128_gt(shft_res1,  mask)) {
+        Vd->D(0) = int128_getlo(mask);
+    } else if (int128_lt(shft_res1, int128_neg(min))) {
+        Vd->D(0) = int128_getlo(min);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_gt(shft_res2, mask)) {
+        Vd->D(1) = int128_getlo(mask);
+    } else if (int128_lt(shft_res2, int128_neg(min))) {
+        Vd->D(1) = int128_getlo(min);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRANI(vssrani_b_h, 16, B, H)
+VSSRANI(vssrani_h_w, 32, H, W)
+VSSRANI(vssrani_w_d, 64, W, D)
+
+#define VSSRLNUI(NAME, BIT, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                   \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                   \
+{                                                                           \
+    int i;                                                                  \
+    VReg temp;                                                              \
+    VReg *Vd = &(env->fpr[vd].vreg);                                        \
+    VReg *Vj = &(env->fpr[vj].vreg);                                        \
+                                                                            \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                     \
+        temp.E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), imm, BIT/2);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrlnu_ ## E1(Vd->E2(i), imm, BIT/2); \
+    }                                                                       \
+    Vd->D(0) = temp.D(0);                                                   \
+    Vd->D(1) = temp.D(1);                                                   \
+}
+
+void HELPER(vssrlni_du_q)(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        shft_res1 = int128_urshift(Vj->Q(0), imm);
+        shft_res2 = int128_urshift(Vd->Q(0), imm);
+    }
+    mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+
+    if (int128_ult(mask, shft_res1)) {
+        Vd->D(0) = int128_getlo(mask);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_ult(mask, shft_res2)) {
+        Vd->D(1) = int128_getlo(mask);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRLNUI(vssrlni_bu_h, 16, B, H)
+VSSRLNUI(vssrlni_hu_w, 32, H, W)
+VSSRLNUI(vssrlni_wu_d, 64, W, D)
+
+#define VSSRANUI(NAME, BIT, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                   \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                   \
+{                                                                           \
+    int i;                                                                  \
+    VReg temp;                                                              \
+    VReg *Vd = &(env->fpr[vd].vreg);                                        \
+    VReg *Vj = &(env->fpr[vj].vreg);                                        \
+                                                                            \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                     \
+        temp.E1(i) = do_ssranu_ ## E1(Vj->E2(i), imm, BIT/2);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssranu_ ## E1(Vd->E2(i), imm, BIT/2); \
+    }                                                                       \
+    Vd->D(0) = temp.D(0);                                                   \
+    Vd->D(1) = temp.D(1);                                                   \
+}
+
+void HELPER(vssrani_du_q)(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        shft_res1 = int128_rshift(Vj->Q(0), imm);
+        shft_res2 = int128_rshift(Vd->Q(0), imm);
+    }
+
+    if (int128_lt(Vj->Q(0), int128_zero())) {
+        shft_res1 = int128_zero();
+    }
+
+    if (int128_lt(Vd->Q(0), int128_zero())) {
+        shft_res2 = int128_zero();
+    }
+
+    mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+
+    if (int128_ult(mask, shft_res1)) {
+        Vd->D(0) = int128_getlo(mask);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_ult(mask, shft_res2)) {
+        Vd->D(1) = int128_getlo(mask);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRANUI(vssrani_bu_h, 16, B, H)
+VSSRANUI(vssrani_hu_w, 32, H, W)
+VSSRANUI(vssrani_wu_d, 64, W, D)
-- 
2.31.1



* [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (27 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  3:31   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz Song Gao
                   ` (14 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSSRLRN.{B.H/H.W/W.D};
- VSSRARN.{B.H/H.W/W.D};
- VSSRLRN.{BU.H/HU.W/WU.D};
- VSSRARN.{BU.H/HU.W/WU.D};
- VSSRLRNI.{B.H/H.W/W.D/D.Q};
- VSSRARNI.{B.H/H.W/W.D/D.Q};
- VSSRLRNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRARNI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
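Note: judging by the instruction names, the "R" variants listed above
presumably combine the rounding shift used by vsrlrn/vsrarn with a
saturating narrow. A minimal scalar sketch of one unsigned-shift lane
follows. The model_* names are hypothetical, and the order of
operations (round first, then saturate) is an assumption rather than
something taken from the helper code.

```c
#include <stdint.h>

/* Hypothetical model: logical right shift with round-to-nearest,
 * implemented by adding back the last bit shifted out. */
static uint16_t model_srlr_h(uint16_t e, unsigned sa)
{
    sa %= 16;
    if (sa == 0) {
        return e;
    }
    return (uint16_t)((e >> sa) + ((e >> (sa - 1)) & 1));
}

/* Hypothetical model of VSSRLRN.B.H on one lane: rounding shift,
 * then saturate to the signed 8-bit maximum 0x7f. */
static int8_t model_ssrlrn_b_h(uint16_t e, unsigned sa)
{
    uint16_t r = model_srlr_h(e, sa);
    return r > INT8_MAX ? INT8_MAX : (int8_t)r;
}
```

With a shift of 4, 0x0018 rounds up to 2 while 0x0017 rounds down to
1, and 0x07f8 rounds to 0x80, which then saturates to 0x7f.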
 target/loongarch/disas.c                    |  30 ++
 target/loongarch/helper.h                   |  30 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 ++
 target/loongarch/insns.decode               |  30 ++
 target/loongarch/lsx_helper.c               | 362 ++++++++++++++++++++
 5 files changed, 482 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 426d30dc01..405e8885cd 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1228,3 +1228,33 @@ INSN_LSX(vssrani_bu_h,     vv_i)
 INSN_LSX(vssrani_hu_w,     vv_i)
 INSN_LSX(vssrani_wu_d,     vv_i)
 INSN_LSX(vssrani_du_q,     vv_i)
+
+INSN_LSX(vssrlrn_b_h,      vvv)
+INSN_LSX(vssrlrn_h_w,      vvv)
+INSN_LSX(vssrlrn_w_d,      vvv)
+INSN_LSX(vssrarn_b_h,      vvv)
+INSN_LSX(vssrarn_h_w,      vvv)
+INSN_LSX(vssrarn_w_d,      vvv)
+INSN_LSX(vssrlrn_bu_h,     vvv)
+INSN_LSX(vssrlrn_hu_w,     vvv)
+INSN_LSX(vssrlrn_wu_d,     vvv)
+INSN_LSX(vssrarn_bu_h,     vvv)
+INSN_LSX(vssrarn_hu_w,     vvv)
+INSN_LSX(vssrarn_wu_d,     vvv)
+
+INSN_LSX(vssrlrni_b_h,     vv_i)
+INSN_LSX(vssrlrni_h_w,     vv_i)
+INSN_LSX(vssrlrni_w_d,     vv_i)
+INSN_LSX(vssrlrni_d_q,     vv_i)
+INSN_LSX(vssrlrni_bu_h,    vv_i)
+INSN_LSX(vssrlrni_hu_w,    vv_i)
+INSN_LSX(vssrlrni_wu_d,    vv_i)
+INSN_LSX(vssrlrni_du_q,    vv_i)
+INSN_LSX(vssrarni_b_h,     vv_i)
+INSN_LSX(vssrarni_h_w,     vv_i)
+INSN_LSX(vssrarni_w_d,     vv_i)
+INSN_LSX(vssrarni_d_q,     vv_i)
+INSN_LSX(vssrarni_bu_h,    vv_i)
+INSN_LSX(vssrarni_hu_w,    vv_i)
+INSN_LSX(vssrarni_wu_d,    vv_i)
+INSN_LSX(vssrarni_du_q,    vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 7562f01ad6..d602de390b 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -456,3 +456,33 @@ DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 58f27d7f65..c732c43580 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2755,3 +2755,33 @@ TRANS(vssrani_bu_h, gen_vv_i, gen_helper_vssrani_bu_h)
 TRANS(vssrani_hu_w, gen_vv_i, gen_helper_vssrani_hu_w)
 TRANS(vssrani_wu_d, gen_vv_i, gen_helper_vssrani_wu_d)
 TRANS(vssrani_du_q, gen_vv_i, gen_helper_vssrani_du_q)
+
+TRANS(vssrlrn_b_h, gen_vvv, gen_helper_vssrlrn_b_h)
+TRANS(vssrlrn_h_w, gen_vvv, gen_helper_vssrlrn_h_w)
+TRANS(vssrlrn_w_d, gen_vvv, gen_helper_vssrlrn_w_d)
+TRANS(vssrarn_b_h, gen_vvv, gen_helper_vssrarn_b_h)
+TRANS(vssrarn_h_w, gen_vvv, gen_helper_vssrarn_h_w)
+TRANS(vssrarn_w_d, gen_vvv, gen_helper_vssrarn_w_d)
+TRANS(vssrlrn_bu_h, gen_vvv, gen_helper_vssrlrn_bu_h)
+TRANS(vssrlrn_hu_w, gen_vvv, gen_helper_vssrlrn_hu_w)
+TRANS(vssrlrn_wu_d, gen_vvv, gen_helper_vssrlrn_wu_d)
+TRANS(vssrarn_bu_h, gen_vvv, gen_helper_vssrarn_bu_h)
+TRANS(vssrarn_hu_w, gen_vvv, gen_helper_vssrarn_hu_w)
+TRANS(vssrarn_wu_d, gen_vvv, gen_helper_vssrarn_wu_d)
+
+TRANS(vssrlrni_b_h, gen_vv_i, gen_helper_vssrlrni_b_h)
+TRANS(vssrlrni_h_w, gen_vv_i, gen_helper_vssrlrni_h_w)
+TRANS(vssrlrni_w_d, gen_vv_i, gen_helper_vssrlrni_w_d)
+TRANS(vssrlrni_d_q, gen_vv_i, gen_helper_vssrlrni_d_q)
+TRANS(vssrarni_b_h, gen_vv_i, gen_helper_vssrarni_b_h)
+TRANS(vssrarni_h_w, gen_vv_i, gen_helper_vssrarni_h_w)
+TRANS(vssrarni_w_d, gen_vv_i, gen_helper_vssrarni_w_d)
+TRANS(vssrarni_d_q, gen_vv_i, gen_helper_vssrarni_d_q)
+TRANS(vssrlrni_bu_h, gen_vv_i, gen_helper_vssrlrni_bu_h)
+TRANS(vssrlrni_hu_w, gen_vv_i, gen_helper_vssrlrni_hu_w)
+TRANS(vssrlrni_wu_d, gen_vv_i, gen_helper_vssrlrni_wu_d)
+TRANS(vssrlrni_du_q, gen_vv_i, gen_helper_vssrlrni_du_q)
+TRANS(vssrarni_bu_h, gen_vv_i, gen_helper_vssrarni_bu_h)
+TRANS(vssrarni_hu_w, gen_vv_i, gen_helper_vssrarni_hu_w)
+TRANS(vssrarni_wu_d, gen_vv_i, gen_helper_vssrarni_wu_d)
+TRANS(vssrarni_du_q, gen_vv_i, gen_helper_vssrarni_du_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 772c5cddfe..bb4b2a8632 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -929,3 +929,33 @@ vssrani_bu_h     0111 00110110 01000 1 .... ..... .....   @vv_ui4
 vssrani_hu_w     0111 00110110 01001 ..... ..... .....    @vv_ui5
 vssrani_wu_d     0111 00110110 0101 ...... ..... .....    @vv_ui6
 vssrani_du_q     0111 00110110 011 ....... ..... .....    @vv_ui7
+
+vssrlrn_b_h      0111 00010000 00001 ..... ..... .....    @vvv
+vssrlrn_h_w      0111 00010000 00010 ..... ..... .....    @vvv
+vssrlrn_w_d      0111 00010000 00011 ..... ..... .....    @vvv
+vssrarn_b_h      0111 00010000 00101 ..... ..... .....    @vvv
+vssrarn_h_w      0111 00010000 00110 ..... ..... .....    @vvv
+vssrarn_w_d      0111 00010000 00111 ..... ..... .....    @vvv
+vssrlrn_bu_h     0111 00010000 10001 ..... ..... .....    @vvv
+vssrlrn_hu_w     0111 00010000 10010 ..... ..... .....    @vvv
+vssrlrn_wu_d     0111 00010000 10011 ..... ..... .....    @vvv
+vssrarn_bu_h     0111 00010000 10101 ..... ..... .....    @vvv
+vssrarn_hu_w     0111 00010000 10110 ..... ..... .....    @vvv
+vssrarn_wu_d     0111 00010000 10111 ..... ..... .....    @vvv
+
+vssrlrni_b_h     0111 00110101 00000 1 .... ..... .....   @vv_ui4
+vssrlrni_h_w     0111 00110101 00001 ..... ..... .....    @vv_ui5
+vssrlrni_w_d     0111 00110101 0001 ...... ..... .....    @vv_ui6
+vssrlrni_d_q     0111 00110101 001 ....... ..... .....    @vv_ui7
+vssrarni_b_h     0111 00110110 10000 1 .... ..... .....   @vv_ui4
+vssrarni_h_w     0111 00110110 10001 ..... ..... .....    @vv_ui5
+vssrarni_w_d     0111 00110110 1001 ...... ..... .....    @vv_ui6
+vssrarni_d_q     0111 00110110 101 ....... ..... .....    @vv_ui7
+vssrlrni_bu_h    0111 00110101 01000 1 .... ..... .....   @vv_ui4
+vssrlrni_hu_w    0111 00110101 01001 ..... ..... .....    @vv_ui5
+vssrlrni_wu_d    0111 00110101 0101 ...... ..... .....    @vv_ui6
+vssrlrni_du_q    0111 00110101 011 ....... ..... .....    @vv_ui7
+vssrarni_bu_h    0111 00110110 11000 1 .... ..... .....   @vv_ui4
+vssrarni_hu_w    0111 00110110 11001 ..... ..... .....    @vv_ui5
+vssrarni_wu_d    0111 00110110 1101 ...... ..... .....    @vv_ui6
+vssrarni_du_q    0111 00110110 111 ....... ..... .....    @vv_ui7
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 680b345695..4b933f8a69 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1808,3 +1808,365 @@ void HELPER(vssrani_du_q)(CPULoongArchState *env,
 VSSRANUI(vssrani_bu_h, 16, B, H)
 VSSRANUI(vssrani_hu_w, 32, H, W)
 VSSRANUI(vssrani_wu_d, 64, W, D)
+
+#define SSRLRNS(E1, E2, T1, T2, T3)                \
+static T1 do_ssrlrns_ ## E1(T2 e2, int sa, int sh) \
+{                                                  \
+    T1 shft_res;                                   \
+                                                   \
+    shft_res = do_vsrlr_ ## E2(e2, sa);            \
+    T1 mask;                                       \
+    mask = (1ull << sh) - 1;                       \
+    if (shft_res > mask) {                         \
+        return mask;                               \
+    } else {                                       \
+        return shft_res;                           \
+    }                                              \
+}
+
+SSRLRNS(B, H, uint16_t, int16_t, uint8_t)
+SSRLRNS(H, W, uint32_t, int32_t, uint16_t)
+SSRLRNS(W, D, uint64_t, int64_t, uint32_t)
+
+#define VSSRLRN(NAME, BIT, T, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                     \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                      \
+{                                                                             \
+    int i;                                                                    \
+    VReg *Vd = &(env->fpr[vd].vreg);                                          \
+    VReg *Vj = &(env->fpr[vj].vreg);                                          \
+    VReg *Vk = &(env->fpr[vk].vreg);                                          \
+                                                                              \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                       \
+        Vd->E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
+    }                                                                         \
+    Vd->D(1) = 0;                                                             \
+}
+
+VSSRLRN(vssrlrn_b_h, 16, uint16_t, B, H)
+VSSRLRN(vssrlrn_h_w, 32, uint32_t, H, W)
+VSSRLRN(vssrlrn_w_d, 64, uint64_t, W, D)
+
+#define SSRARNS(E1, E2, T1, T2)                    \
+static T1 do_ssrarns_ ## E1(T1 e2, int sa, int sh) \
+{                                                  \
+    T1 shft_res;                                   \
+                                                   \
+    shft_res = do_vsrar_ ## E2(e2, sa);            \
+    T2 mask;                                       \
+    mask = (1ll << sh) - 1;                        \
+    if (shft_res > mask) {                         \
+        return mask;                               \
+    } else if (shft_res < -(mask + 1)) {           \
+        return ~mask;                              \
+    } else {                                       \
+        return shft_res;                           \
+    }                                              \
+}
+
+SSRARNS(B, H, int16_t, int8_t)
+SSRARNS(H, W, int32_t, int16_t)
+SSRARNS(W, D, int64_t, int32_t)
+
+#define VSSRARN(NAME, BIT, T, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                     \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                      \
+{                                                                             \
+    int i;                                                                    \
+    VReg *Vd = &(env->fpr[vd].vreg);                                          \
+    VReg *Vj = &(env->fpr[vj].vreg);                                          \
+    VReg *Vk = &(env->fpr[vk].vreg);                                          \
+                                                                              \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                       \
+        Vd->E1(i) = do_ssrarns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
+    }                                                                         \
+    Vd->D(1) = 0;                                                             \
+}
+
+VSSRARN(vssrarn_b_h, 16, uint16_t, B, H)
+VSSRARN(vssrarn_h_w, 32, uint32_t, H, W)
+VSSRARN(vssrarn_w_d, 64, uint64_t, W, D)
+
+#define SSRLRNU(E1, E2, T1, T2, T3)                \
+static T1 do_ssrlrnu_ ## E1(T3 e2, int sa, int sh) \
+{                                                  \
+    T1 shft_res;                                   \
+                                                   \
+    shft_res = do_vsrlr_ ## E2(e2, sa);            \
+                                                   \
+    T2 mask;                                       \
+    mask = (1ull << sh) - 1;                       \
+    if (shft_res > mask) {                         \
+        return mask;                               \
+    } else {                                       \
+        return shft_res;                           \
+    }                                              \
+}
+
+SSRLRNU(B, H, uint16_t, uint8_t, int16_t)
+SSRLRNU(H, W, uint32_t, uint16_t, int32_t)
+SSRLRNU(W, D, uint64_t, uint32_t, int64_t)
+
+#define VSSRLRNU(NAME, BIT, T, E1, E2)                                     \
+void HELPER(NAME)(CPULoongArchState *env,                                  \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                   \
+{                                                                          \
+    int i;                                                                 \
+    VReg *Vd = &(env->fpr[vd].vreg);                                       \
+    VReg *Vj = &(env->fpr[vj].vreg);                                       \
+    VReg *Vk = &(env->fpr[vk].vreg);                                       \
+                                                                           \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                    \
+        Vd->E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
+    }                                                                      \
+    Vd->D(1) = 0;                                                          \
+}
+
+VSSRLRNU(vssrlrn_bu_h, 16, uint16_t, B, H)
+VSSRLRNU(vssrlrn_hu_w, 32, uint32_t, H, W)
+VSSRLRNU(vssrlrn_wu_d, 64, uint64_t, W, D)
+
+#define SSRARNU(E1, E2, T1, T2, T3)                \
+static T1 do_ssrarnu_ ## E1(T3 e2, int sa, int sh) \
+{                                                  \
+    T1 shft_res;                                   \
+                                                   \
+    if (e2 < 0) {                                  \
+        shft_res = 0;                              \
+    } else {                                       \
+        shft_res = do_vsrar_ ## E2(e2, sa);        \
+    }                                              \
+    T2 mask;                                       \
+    mask = (1ull << sh) - 1;                       \
+    if (shft_res > mask) {                         \
+        return mask;                               \
+    } else {                                       \
+        return shft_res;                           \
+    }                                              \
+}
+
+SSRARNU(B, H, uint16_t, uint8_t, int16_t)
+SSRARNU(H, W, uint32_t, uint16_t, int32_t)
+SSRARNU(W, D, uint64_t, uint32_t, int64_t)
+
+#define VSSRARNU(NAME, BIT, T, E1, E2)                                     \
+void HELPER(NAME)(CPULoongArchState *env,                                  \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                   \
+{                                                                          \
+    int i;                                                                 \
+    VReg *Vd = &(env->fpr[vd].vreg);                                       \
+    VReg *Vj = &(env->fpr[vj].vreg);                                       \
+    VReg *Vk = &(env->fpr[vk].vreg);                                       \
+                                                                           \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                    \
+        Vd->E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
+    }                                                                      \
+    Vd->D(1) = 0;                                                          \
+}
+
+VSSRARNU(vssrarn_bu_h, 16, uint16_t, B, H)
+VSSRARNU(vssrarn_hu_w, 32, uint32_t, H, W)
+VSSRARNU(vssrarn_wu_d, 64, uint64_t, W, D)
+
+#define VSSRLRNI(NAME, BIT, E1, E2)                                            \
+void HELPER(NAME)(CPULoongArchState *env,                                      \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                      \
+{                                                                              \
+    int i;                                                                     \
+    VReg temp;                                                                 \
+    VReg *Vd = &(env->fpr[vd].vreg);                                           \
+    VReg *Vj = &(env->fpr[vj].vreg);                                           \
+                                                                               \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                        \
+        temp.E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), imm, BIT/2 -1);              \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrlrns_ ## E1(Vd->E2(i), imm, BIT/2 -1);\
+    }                                                                          \
+    Vd->D(0) = temp.D(0);                                                      \
+    Vd->D(1) = temp.D(1);                                                      \
+}
+
+#define VSSRLRNI_Q(NAME, sh)                                               \
+void HELPER(NAME)(CPULoongArchState *env,                                  \
+                          uint32_t vd, uint32_t vj, uint32_t imm)          \
+{                                                                          \
+    Int128 shft_res1, shft_res2, mask, r1, r2;                             \
+    VReg *Vd = &(env->fpr[vd].vreg);                                       \
+    VReg *Vj = &(env->fpr[vj].vreg);                                       \
+                                                                           \
+    if (imm == 0) {                                                        \
+        shft_res1 = Vj->Q(0);                                              \
+        shft_res2 = Vd->Q(0);                                              \
+    } else {                                                               \
+        r1 = int128_and(int128_urshift(Vj->Q(0), (imm -1)), int128_one()); \
+        r2 = int128_and(int128_urshift(Vd->Q(0), (imm -1)), int128_one()); \
+                                                                           \
+        shft_res1 = (int128_add(int128_urshift(Vj->Q(0), imm), r1));       \
+        shft_res2 = (int128_add(int128_urshift(Vd->Q(0), imm), r2));       \
+    }                                                                      \
+                                                                           \
+    mask = int128_sub(int128_lshift(int128_one(), sh), int128_one());      \
+                                                                           \
+    if (int128_ult(mask, shft_res1)) {                                     \
+        Vd->D(0) = int128_getlo(mask);                                     \
+    } else {                                                               \
+        Vd->D(0) = int128_getlo(shft_res1);                                \
+    }                                                                      \
+                                                                           \
+    if (int128_ult(mask, shft_res2)) {                                     \
+        Vd->D(1) = int128_getlo(mask);                                     \
+    } else {                                                               \
+        Vd->D(1) = int128_getlo(shft_res2);                                \
+    }                                                                      \
+}
+
+VSSRLRNI(vssrlrni_b_h, 16, B, H)
+VSSRLRNI(vssrlrni_h_w, 32, H, W)
+VSSRLRNI(vssrlrni_w_d, 64, W, D)
+VSSRLRNI_Q(vssrlrni_d_q, 63)
+
+#define VSSRARNI(NAME, BIT, E1, E2)                                             \
+void HELPER(NAME)(CPULoongArchState *env,                                       \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                       \
+{                                                                               \
+    int i;                                                                      \
+    VReg temp;                                                                  \
+    VReg *Vd = &(env->fpr[vd].vreg);                                            \
+    VReg *Vj = &(env->fpr[vj].vreg);                                            \
+                                                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                         \
+        temp.E1(i) = do_ssrarns_ ## E1(Vj->E2(i), imm, BIT/2 -1);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrarns_ ## E1(Vd->E2(i), imm, BIT/2 -1); \
+    }                                                                           \
+    Vd->D(0) = temp.D(0);                                                       \
+    Vd->D(1) = temp.D(1);                                                       \
+}
+
+void HELPER(vssrarni_d_q)(CPULoongArchState *env,
+                          uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
+        r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
+
+        shft_res1 = int128_add(int128_rshift(Vj->Q(0), imm), r1);
+        shft_res2 = int128_add(int128_rshift(Vd->Q(0), imm), r2);
+    }
+
+    mask1 = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+    mask2  = int128_lshift(int128_one(), 63);
+
+    if (int128_gt(shft_res1,  mask1)) {
+        Vd->D(0) = int128_getlo(mask1);
+    } else if (int128_lt(shft_res1, int128_neg(mask2))) {
+        Vd->D(0) = int128_getlo(mask2);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_gt(shft_res2, mask1)) {
+        Vd->D(1) = int128_getlo(mask1);
+    } else if (int128_lt(shft_res2, int128_neg(mask2))) {
+        Vd->D(1) = int128_getlo(mask2);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRARNI(vssrarni_b_h, 16, B, H)
+VSSRARNI(vssrarni_h_w, 32, H, W)
+VSSRARNI(vssrarni_w_d, 64, W, D)
+
+#define VSSRLRNUI(NAME, BIT, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                    \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                    \
+{                                                                            \
+    int i;                                                                   \
+    VReg temp;                                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                                         \
+                                                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
+        temp.E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), imm, BIT/2);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrlrnu_ ## E1(Vd->E2(i), imm, BIT/2); \
+    }                                                                        \
+    Vd->D(0) = temp.D(0);                                                    \
+    Vd->D(1) = temp.D(1);                                                    \
+}
+
+VSSRLRNUI(vssrlrni_bu_h, 16, B, H)
+VSSRLRNUI(vssrlrni_hu_w, 32, H, W)
+VSSRLRNUI(vssrlrni_wu_d, 64, W, D)
+VSSRLRNI_Q(vssrlrni_du_q, 64)
+
+#define VSSRARNUI(NAME, BIT, E1, E2)                                         \
+void HELPER(NAME)(CPULoongArchState *env,                                    \
+                  uint32_t vd, uint32_t vj, uint32_t imm)                    \
+{                                                                            \
+    int i;                                                                   \
+    VReg temp;                                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                                         \
+                                                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
+        temp.E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), imm, BIT/2);               \
+        temp.E1(i + LSX_LEN/BIT) = do_ssrarnu_ ## E1(Vd->E2(i), imm, BIT/2); \
+    }                                                                        \
+    Vd->D(0) = temp.D(0);                                                    \
+    Vd->D(1) = temp.D(1);                                                    \
+}
+
+void HELPER(vssrarni_du_q)(CPULoongArchState *env,
+                           uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    if (imm == 0) {
+        shft_res1 = Vj->Q(0);
+        shft_res2 = Vd->Q(0);
+    } else {
+        r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
+        r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
+
+        shft_res1 = int128_add(int128_rshift(Vj->Q(0), imm), r1);
+        shft_res2 = int128_add(int128_rshift(Vd->Q(0), imm), r2);
+    }
+
+    if (int128_lt(Vj->Q(0), int128_zero())) {
+        shft_res1 = int128_zero();
+    }
+    if (int128_lt(Vd->Q(0), int128_zero())) {
+        shft_res2 = int128_zero();
+    }
+
+    mask1 = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+    mask2  = int128_lshift(int128_one(), 64);
+
+    if (int128_gt(shft_res1,  mask1)) {
+        Vd->D(0) = int128_getlo(mask1);
+    } else if (int128_lt(shft_res1, int128_neg(mask2))) {
+        Vd->D(0) = int128_getlo(mask2);
+    } else {
+        Vd->D(0) = int128_getlo(shft_res1);
+    }
+
+    if (int128_gt(shft_res2, mask1)) {
+        Vd->D(1) = int128_getlo(mask1);
+    } else if (int128_lt(shft_res2, int128_neg(mask2))) {
+        Vd->D(1) = int128_getlo(mask2);
+    } else {
+        Vd->D(1) = int128_getlo(shft_res2);
+    }
+}
+
+VSSRARNUI(vssrarni_bu_h, 16, B, H)
+VSSRARNUI(vssrarni_hu_w, 32, H, W)
+VSSRARNUI(vssrarni_wu_d, 64, W, D)
-- 
2.31.1
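
For readers following the helpers in the patch above, here is a minimal
standalone sketch of the per-element operation for the 16-bit to 8-bit case:
shift right with rounding, then saturate into the narrower type. All function
names below are hypothetical and are not QEMU's. The rounding step assumes the
same rule the 128-bit helpers in the patch spell out (add back the last bit
shifted out); do_vsrlr_*/do_vsrar_* themselves are defined elsewhere in the
series and are not reproduced here.

```c
#include <stdint.h>

/* Logical right shift with rounding: add back the last bit shifted out. */
static uint16_t srlr16(uint16_t x, unsigned sa)
{
    sa %= 16;
    if (sa == 0) {
        return x;
    }
    return (uint16_t)((x >> sa) + ((x >> (sa - 1)) & 1));
}

/*
 * Arithmetic right shift with rounding.  Right-shifting a negative value is
 * implementation-defined in C; this assumes the usual sign-propagating shift.
 */
static int16_t srar16(int16_t x, unsigned sa)
{
    sa %= 16;
    if (sa == 0) {
        return x;
    }
    return (int16_t)((x >> sa) + ((x >> (sa - 1)) & 1));
}

/* Like vssrlrn.bu.h: logical shift, saturate to [0, 255]. */
static uint8_t ssrlrn_bu_h(uint16_t x, unsigned sa)
{
    uint16_t r = srlr16(x, sa);
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}

/* Like vssrlrn.b.h: logical shift, saturate to [0, 127]. */
static int8_t ssrlrn_b_h(uint16_t x, unsigned sa)
{
    uint16_t r = srlr16(x, sa);
    return r > INT8_MAX ? INT8_MAX : (int8_t)r;
}

/* Like vssrarn.b.h: arithmetic shift, saturate to [-128, 127]. */
static int8_t ssrarn_b_h(int16_t x, unsigned sa)
{
    int16_t r = srar16(x, sa);
    if (r > INT8_MAX) {
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)r;
}

/* Like vssrarn.bu.h: negative inputs give 0, the rest saturate to 255. */
static uint8_t ssrarn_bu_h(int16_t x, unsigned sa)
{
    int16_t r;

    if (x < 0) {
        return 0;
    }
    r = srar16(x, sa);
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}
```

With these definitions, ssrlrn_bu_h(0x00ff, 4) rounds 15.9375 up to 16, while
ssrlrn_bu_h(0xffff, 4) saturates to 255. The immediate forms in the patch apply
the same per-element step to both Vj and Vd and pack the two narrowed halves
into the destination register.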




* [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (28 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  3:34   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt Song Gao
                   ` (13 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VCLO.{B/H/W/D};
- VCLZ.{B/H/W/D}.
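
As a rough illustration of what these instructions compute per element, the
sketch below counts leading zeros and leading ones of 8-bit and 16-bit values
by widening to 32 bits and subtracting the extra leading bits, which is the
same trick the DO_CLO_*/DO_CLZ_* macros in this patch use. The names are
hypothetical; clz32_portable() is a stand-in written for this example (it
returns 32 for a zero input) and is not QEMU's clz32().

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 when x is 0. */
static int clz32_portable(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Narrow elements: widen to 32 bits, then drop the 24 or 16 padding zeros. */
static int clz8(uint8_t x)
{
    return clz32_portable(x) - 24;
}

static int clz16(uint16_t x)
{
    return clz32_portable(x) - 16;
}

/*
 * Leading ones are the leading zeros of the complement.  The cast back to
 * the element type matters: without it the integer-promoted ~x would have
 * its upper bits set and the count would always be 0.
 */
static int clo8(uint8_t x)
{
    return clz32_portable((uint8_t)~x) - 24;
}

static int clo16(uint16_t x)
{
    return clz32_portable((uint16_t)~x) - 16;
}
```

For example, clo8(0xf0) is 4 and clz8(0) is 8, i.e. an all-zero element
reports its full width.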

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  9 ++++++
 target/loongarch/helper.h                   |  9 ++++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 ++++++
 target/loongarch/insns.decode               |  9 ++++++
 target/loongarch/lsx_helper.c               | 31 +++++++++++++++++++++
 5 files changed, 67 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 405e8885cd..0c82a1d9d1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1258,3 +1258,12 @@ INSN_LSX(vssrarni_bu_h,    vv_i)
 INSN_LSX(vssrarni_hu_w,    vv_i)
 INSN_LSX(vssrarni_wu_d,    vv_i)
 INSN_LSX(vssrarni_du_q,    vv_i)
+
+INSN_LSX(vclo_b,           vv)
+INSN_LSX(vclo_h,           vv)
+INSN_LSX(vclo_w,           vv)
+INSN_LSX(vclo_d,           vv)
+INSN_LSX(vclz_b,           vv)
+INSN_LSX(vclz_h,           vv)
+INSN_LSX(vclz_w,           vv)
+INSN_LSX(vclz_d,           vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d602de390b..a7facc6bc1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -486,3 +486,12 @@ DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vclo_b, void, env, i32, i32)
+DEF_HELPER_3(vclo_h, void, env, i32, i32)
+DEF_HELPER_3(vclo_w, void, env, i32, i32)
+DEF_HELPER_3(vclo_d, void, env, i32, i32)
+DEF_HELPER_3(vclz_b, void, env, i32, i32)
+DEF_HELPER_3(vclz_h, void, env, i32, i32)
+DEF_HELPER_3(vclz_w, void, env, i32, i32)
+DEF_HELPER_3(vclz_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c732c43580..5d81c02103 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2785,3 +2785,12 @@ TRANS(vssrarni_bu_h, gen_vv_i, gen_helper_vssrarni_bu_h)
 TRANS(vssrarni_hu_w, gen_vv_i, gen_helper_vssrarni_hu_w)
 TRANS(vssrarni_wu_d, gen_vv_i, gen_helper_vssrarni_wu_d)
 TRANS(vssrarni_du_q, gen_vv_i, gen_helper_vssrarni_du_q)
+
+TRANS(vclo_b, gen_vv, gen_helper_vclo_b)
+TRANS(vclo_h, gen_vv, gen_helper_vclo_h)
+TRANS(vclo_w, gen_vv, gen_helper_vclo_w)
+TRANS(vclo_d, gen_vv, gen_helper_vclo_d)
+TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
+TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
+TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
+TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bb4b2a8632..7591ec1bab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -959,3 +959,12 @@ vssrarni_bu_h    0111 00110110 11000 1 .... ..... .....   @vv_ui4
 vssrarni_hu_w    0111 00110110 11001 ..... ..... .....    @vv_ui5
 vssrarni_wu_d    0111 00110110 1101 ...... ..... .....    @vv_ui6
 vssrarni_du_q    0111 00110110 111 ....... ..... .....    @vv_ui7
+
+vclo_b           0111 00101001 11000 00000 ..... .....    @vv
+vclo_h           0111 00101001 11000 00001 ..... .....    @vv
+vclo_w           0111 00101001 11000 00010 ..... .....    @vv
+vclo_d           0111 00101001 11000 00011 ..... .....    @vv
+vclz_b           0111 00101001 11000 00100 ..... .....    @vv
+vclz_h           0111 00101001 11000 00101 ..... .....    @vv
+vclz_w           0111 00101001 11000 00110 ..... .....    @vv
+vclz_d           0111 00101001 11000 00111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 4b933f8a69..8ec479dc2d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2170,3 +2170,34 @@ void HELPER(vssrarni_du_q)(CPULoongArchState *env,
 VSSRARNUI(vssrarni_bu_h, 16, B, H)
 VSSRARNUI(vssrarni_hu_w, 32, H, W)
 VSSRARNUI(vssrarni_wu_d, 64, W, D)
+
+#define DO_2OP(NAME, BIT, E, T, DO_OP)                              \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++)                               \
+    {                                                               \
+        Vd->E(i) = DO_OP((T)Vj->E(i));                              \
+    }                                                               \
+}
+
+#define DO_CLO_B(N)  (clz32((uint8_t)~N) - 24)
+#define DO_CLO_H(N)  (clz32((uint16_t)~N) - 16)
+#define DO_CLO_W(N)  (clz32((uint32_t)~N))
+#define DO_CLO_D(N)  (clz64((uint64_t)~N))
+#define DO_CLZ_B(N)  (clz32(N) - 24)
+#define DO_CLZ_H(N)  (clz32(N) - 16)
+#define DO_CLZ_W(N)  (clz32(N))
+#define DO_CLZ_D(N)  (clz64(N))
+
+DO_2OP(vclo_b, 8, B, uint8_t, DO_CLO_B)
+DO_2OP(vclo_h, 16, H, uint16_t, DO_CLO_H)
+DO_2OP(vclo_w, 32, W, uint32_t, DO_CLO_W)
+DO_2OP(vclo_d, 64, D, uint64_t, DO_CLO_D)
+DO_2OP(vclz_b, 8, B, uint8_t, DO_CLZ_B)
+DO_2OP(vclz_h, 16, H, uint16_t, DO_CLZ_H)
+DO_2OP(vclz_w, 32, W, uint32_t, DO_CLZ_W)
+DO_2OP(vclz_d, 64, D, uint64_t, DO_CLZ_D)
-- 
2.31.1




* [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (29 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  3:35   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
                   ` (12 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VPCNT.{B/H/W/D}.
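
For background, a population count can be computed without a loop by summing
adjacent bit fields in parallel (the "SWAR" approach). The sketch below is a
generic, self-contained version of that technique with hypothetical names; it
is meant only to illustrate the idea and does not claim to match the helper in
this patch line for line.

```c
#include <stdint.h>

/* Full 64-bit population count: fold 1-bit sums into wider and wider sums. */
static uint64_t popcount64_swar(uint64_t x)
{
    x = (x & 0x5555555555555555ULL) + ((x >>  1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >>  2) & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >>  4) & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >>  8) & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + (x >> 32);
    return x;
}

/*
 * Stopping after the 4-bit step leaves one independent count per byte,
 * which is the per-element result a byte-wise popcount needs.
 */
static uint64_t popcount_per_byte(uint64_t x)
{
    x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
    return x;
}
```

The masks keep bits from one lane from leaking into the next, so the same
sequence can be cut short at the step whose field width equals the element
size.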

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/disas.c                    |  5 ++++
 target/loongarch/helper.h                   |  5 ++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 ++++
 target/loongarch/insns.decode               |  5 ++++
 target/loongarch/lsx_helper.c               | 30 +++++++++++++++++++++
 5 files changed, 50 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0c82a1d9d1..0ca51de9d8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1267,3 +1267,8 @@ INSN_LSX(vclz_b,           vv)
 INSN_LSX(vclz_h,           vv)
 INSN_LSX(vclz_w,           vv)
 INSN_LSX(vclz_d,           vv)
+
+INSN_LSX(vpcnt_b,          vv)
+INSN_LSX(vpcnt_h,          vv)
+INSN_LSX(vpcnt_w,          vv)
+INSN_LSX(vpcnt_d,          vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a7facc6bc1..38e310512b 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -495,3 +495,8 @@ DEF_HELPER_3(vclz_b, void, env, i32, i32)
 DEF_HELPER_3(vclz_h, void, env, i32, i32)
 DEF_HELPER_3(vclz_w, void, env, i32, i32)
 DEF_HELPER_3(vclz_d, void, env, i32, i32)
+
+DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5d81c02103..59923eb1fa 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2794,3 +2794,8 @@ TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
 TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
 TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
 TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
+
+TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
+TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
+TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
+TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7591ec1bab..f865e83da5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -968,3 +968,8 @@ vclz_b           0111 00101001 11000 00100 ..... .....    @vv
 vclz_h           0111 00101001 11000 00101 ..... .....    @vv
 vclz_w           0111 00101001 11000 00110 ..... .....    @vv
 vclz_d           0111 00101001 11000 00111 ..... .....    @vv
+
+vpcnt_b          0111 00101001 11000 01000 ..... .....    @vv
+vpcnt_h          0111 00101001 11000 01001 ..... .....    @vv
+vpcnt_w          0111 00101001 11000 01010 ..... .....    @vv
+vpcnt_d          0111 00101001 11000 01011 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 8ec479dc2d..94dded7e49 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2201,3 +2201,33 @@ DO_2OP(vclz_b, 8, B, uint8_t, DO_CLZ_B)
 DO_2OP(vclz_h, 16, H, uint16_t, DO_CLZ_H)
 DO_2OP(vclz_w, 32, W, uint32_t, DO_CLZ_W)
 DO_2OP(vclz_d, 64, D, uint64_t, DO_CLZ_D)
+
+static uint64_t do_vpcnt(uint64_t u1)
+{
+    u1 = (u1 & 0x5555555555555555ULL) + ((u1 >>  1) & 0x5555555555555555ULL);
+    u1 = (u1 & 0x3333333333333333ULL) + ((u1 >>  2) & 0x3333333333333333ULL);
+    u1 = (u1 & 0x0F0F0F0F0F0F0F0FULL) + ((u1 >>  4) & 0x0F0F0F0F0F0F0F0FULL);
+    u1 = (u1 & 0x00FF00FF00FF00FFULL) + ((u1 >>  8) & 0x00FF00FF00FF00FFULL);
+    u1 = (u1 & 0x0000FFFF0000FFFFULL) + ((u1 >> 16) & 0x0000FFFF0000FFFFULL);
+    u1 = (u1 & 0x00000000FFFFFFFFULL) + ((u1 >> 32));
+
+    return u1;
+}
+
+#define VPCNT(NAME, BIT, E, T)                                      \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++)                               \
+    {                                                               \
+        Vd->E(i) = do_vpcnt((T)Vj->E(i));                           \
+    }                                                               \
+}
+
+VPCNT(vpcnt_b, 8, B, uint8_t)
+VPCNT(vpcnt_h, 16, H, uint16_t)
+VPCNT(vpcnt_w, 32, W, uint32_t)
+VPCNT(vpcnt_d, 64, D, uint64_t)
-- 
2.31.1




* [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (30 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  5:14   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp Song Gao
                   ` (11 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VBITCLR[I].{B/H/W/D};
- VBITSET[I].{B/H/W/D};
- VBITREV[I].{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
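Note for readers: for the register forms the bit index is taken modulo the
element width, and the shifted constant has to be at least as wide as the
widest element (64 bits), independent of the host's 'unsigned long'. The
sketch below models a single lane under those assumptions. lane_mask(),
bitclr(), bitset() and bitrev() are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask for an element of 'width' bits (8, 16, 32 or 64). */
static uint64_t lane_mask(unsigned width)
{
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Clear bit (bit % width) of a, keeping the result within the lane. */
static uint64_t bitclr(uint64_t a, unsigned bit, unsigned width)
{
    return (a & ~(UINT64_C(1) << (bit % width))) & lane_mask(width);
}

/* Set bit (bit % width) of a. */
static uint64_t bitset(uint64_t a, unsigned bit, unsigned width)
{
    return (a | (UINT64_C(1) << (bit % width))) & lane_mask(width);
}

/* Toggle bit (bit % width) of a. */
static uint64_t bitrev(uint64_t a, unsigned bit, unsigned width)
{
    return (a ^ (UINT64_C(1) << (bit % width))) & lane_mask(width);
}
```

UINT64_C(1) keeps the shift well defined for bit 63 on hosts where
'unsigned long' is only 32 bits wide.
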
 target/loongarch/disas.c                    | 25 +++++++++
 target/loongarch/helper.h                   | 25 +++++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 25 +++++++++
 target/loongarch/insns.decode               | 25 +++++++++
 target/loongarch/lsx_helper.c               | 57 +++++++++++++++++++++
 5 files changed, 157 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0ca51de9d8..48c7ea47a4 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1272,3 +1272,28 @@ INSN_LSX(vpcnt_b,          vv)
 INSN_LSX(vpcnt_h,          vv)
 INSN_LSX(vpcnt_w,          vv)
 INSN_LSX(vpcnt_d,          vv)
+
+INSN_LSX(vbitclr_b,        vvv)
+INSN_LSX(vbitclr_h,        vvv)
+INSN_LSX(vbitclr_w,        vvv)
+INSN_LSX(vbitclr_d,        vvv)
+INSN_LSX(vbitclri_b,       vv_i)
+INSN_LSX(vbitclri_h,       vv_i)
+INSN_LSX(vbitclri_w,       vv_i)
+INSN_LSX(vbitclri_d,       vv_i)
+INSN_LSX(vbitset_b,        vvv)
+INSN_LSX(vbitset_h,        vvv)
+INSN_LSX(vbitset_w,        vvv)
+INSN_LSX(vbitset_d,        vvv)
+INSN_LSX(vbitseti_b,       vv_i)
+INSN_LSX(vbitseti_h,       vv_i)
+INSN_LSX(vbitseti_w,       vv_i)
+INSN_LSX(vbitseti_d,       vv_i)
+INSN_LSX(vbitrev_b,        vvv)
+INSN_LSX(vbitrev_h,        vvv)
+INSN_LSX(vbitrev_w,        vvv)
+INSN_LSX(vbitrev_d,        vvv)
+INSN_LSX(vbitrevi_b,       vv_i)
+INSN_LSX(vbitrevi_h,       vv_i)
+INSN_LSX(vbitrevi_w,       vv_i)
+INSN_LSX(vbitrevi_d,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 38e310512b..4622f788ee 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -500,3 +500,28 @@ DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
+
+DEF_HELPER_4(vbitclr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 59923eb1fa..6d3a804767 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2799,3 +2799,28 @@ TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
 TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
 TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
 TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
+
+TRANS(vbitclr_b, gen_vvv, gen_helper_vbitclr_b)
+TRANS(vbitclr_h, gen_vvv, gen_helper_vbitclr_h)
+TRANS(vbitclr_w, gen_vvv, gen_helper_vbitclr_w)
+TRANS(vbitclr_d, gen_vvv, gen_helper_vbitclr_d)
+TRANS(vbitclri_b, gen_vv_i, gen_helper_vbitclri_b)
+TRANS(vbitclri_h, gen_vv_i, gen_helper_vbitclri_h)
+TRANS(vbitclri_w, gen_vv_i, gen_helper_vbitclri_w)
+TRANS(vbitclri_d, gen_vv_i, gen_helper_vbitclri_d)
+TRANS(vbitset_b, gen_vvv, gen_helper_vbitset_b)
+TRANS(vbitset_h, gen_vvv, gen_helper_vbitset_h)
+TRANS(vbitset_w, gen_vvv, gen_helper_vbitset_w)
+TRANS(vbitset_d, gen_vvv, gen_helper_vbitset_d)
+TRANS(vbitseti_b, gen_vv_i, gen_helper_vbitseti_b)
+TRANS(vbitseti_h, gen_vv_i, gen_helper_vbitseti_h)
+TRANS(vbitseti_w, gen_vv_i, gen_helper_vbitseti_w)
+TRANS(vbitseti_d, gen_vv_i, gen_helper_vbitseti_d)
+TRANS(vbitrev_b, gen_vvv, gen_helper_vbitrev_b)
+TRANS(vbitrev_h, gen_vvv, gen_helper_vbitrev_h)
+TRANS(vbitrev_w, gen_vvv, gen_helper_vbitrev_w)
+TRANS(vbitrev_d, gen_vvv, gen_helper_vbitrev_d)
+TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
+TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
+TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
+TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f865e83da5..801c97714e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -973,3 +973,28 @@ vpcnt_b          0111 00101001 11000 01000 ..... .....    @vv
 vpcnt_h          0111 00101001 11000 01001 ..... .....    @vv
 vpcnt_w          0111 00101001 11000 01010 ..... .....    @vv
 vpcnt_d          0111 00101001 11000 01011 ..... .....    @vv
+
+vbitclr_b        0111 00010000 11000 ..... ..... .....    @vvv
+vbitclr_h        0111 00010000 11001 ..... ..... .....    @vvv
+vbitclr_w        0111 00010000 11010 ..... ..... .....    @vvv
+vbitclr_d        0111 00010000 11011 ..... ..... .....    @vvv
+vbitclri_b       0111 00110001 00000 01 ... ..... .....   @vv_ui3
+vbitclri_h       0111 00110001 00000 1 .... ..... .....   @vv_ui4
+vbitclri_w       0111 00110001 00001 ..... ..... .....    @vv_ui5
+vbitclri_d       0111 00110001 0001 ...... ..... .....    @vv_ui6
+vbitset_b        0111 00010000 11100 ..... ..... .....    @vvv
+vbitset_h        0111 00010000 11101 ..... ..... .....    @vvv
+vbitset_w        0111 00010000 11110 ..... ..... .....    @vvv
+vbitset_d        0111 00010000 11111 ..... ..... .....    @vvv
+vbitseti_b       0111 00110001 01000 01 ... ..... .....   @vv_ui3
+vbitseti_h       0111 00110001 01000 1 .... ..... .....   @vv_ui4
+vbitseti_w       0111 00110001 01001 ..... ..... .....    @vv_ui5
+vbitseti_d       0111 00110001 0101 ...... ..... .....    @vv_ui6
+vbitrev_b        0111 00010001 00000 ..... ..... .....    @vvv
+vbitrev_h        0111 00010001 00001 ..... ..... .....    @vvv
+vbitrev_w        0111 00010001 00010 ..... ..... .....    @vvv
+vbitrev_d        0111 00010001 00011 ..... ..... .....    @vvv
+vbitrevi_b       0111 00110001 10000 01 ... ..... .....   @vv_ui3
+vbitrevi_h       0111 00110001 10000 1 .... ..... .....   @vv_ui4
+vbitrevi_w       0111 00110001 10001 ..... ..... .....    @vv_ui5
+vbitrevi_d       0111 00110001 1001 ...... ..... .....    @vv_ui6
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 94dded7e49..e23c75bd56 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2231,3 +2231,60 @@ VPCNT(vpcnt_b, 8, B, uint8_t)
 VPCNT(vpcnt_h, 16, H, uint16_t)
 VPCNT(vpcnt_w, 32, W, uint32_t)
 VPCNT(vpcnt_d, 64, D, uint64_t)
+
+#define DO_BITCLR(a, bit) ((a) & ~(1ULL << (bit)))
+#define DO_BITSET(a, bit) ((a) | (1ULL << (bit)))
+#define DO_BITREV(a, bit) ((a) ^ (1ULL << (bit)))
+
+#define DO_BIT(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)Vk->E(i)%BIT);  \
+    }                                                    \
+}
+
+DO_BIT(vbitclr_b, 8, uint8_t, B, DO_BITCLR)
+DO_BIT(vbitclr_h, 16, uint16_t, H, DO_BITCLR)
+DO_BIT(vbitclr_w, 32, uint32_t, W, DO_BITCLR)
+DO_BIT(vbitclr_d, 64, uint64_t, D, DO_BITCLR)
+DO_BIT(vbitset_b, 8, uint8_t, B, DO_BITSET)
+DO_BIT(vbitset_h, 16, uint16_t, H, DO_BITSET)
+DO_BIT(vbitset_w, 32, uint32_t, W, DO_BITSET)
+DO_BIT(vbitset_d, 64, uint64_t, D, DO_BITSET)
+DO_BIT(vbitrev_b, 8, uint8_t, B, DO_BITREV)
+DO_BIT(vbitrev_h, 16, uint16_t, H, DO_BITREV)
+DO_BIT(vbitrev_w, 32, uint32_t, W, DO_BITREV)
+DO_BIT(vbitrev_d, 64, uint64_t, D, DO_BITREV)
+
+#define DO_BITI(NAME, BIT, T, E, DO_OP)                   \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i;                                                \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        Vd->E(i) = DO_OP((T)Vj->E(i), imm);               \
+    }                                                     \
+}
+
+DO_BITI(vbitclri_b, 8, uint8_t, B, DO_BITCLR)
+DO_BITI(vbitclri_h, 16, uint16_t, H, DO_BITCLR)
+DO_BITI(vbitclri_w, 32, uint32_t, W, DO_BITCLR)
+DO_BITI(vbitclri_d, 64, uint64_t, D, DO_BITCLR)
+DO_BITI(vbitseti_b, 8, uint8_t, B, DO_BITSET)
+DO_BITI(vbitseti_h, 16, uint16_t, H, DO_BITSET)
+DO_BITI(vbitseti_w, 32, uint32_t, W, DO_BITSET)
+DO_BITI(vbitseti_d, 64, uint64_t, D, DO_BITSET)
+DO_BITI(vbitrevi_b, 8, uint8_t, B, DO_BITREV)
+DO_BITI(vbitrevi_h, 16, uint16_t, H, DO_BITREV)
+DO_BITI(vbitrevi_w, 32, uint32_t, W, DO_BITREV)
+DO_BITI(vbitrevi_d, 64, uint64_t, D, DO_BITREV)
-- 
2.31.1




* [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (31 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  5:17   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions Song Gao
                   ` (10 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFRSTP[I].{B/H}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
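Note for readers: as implemented in this patch, the helper scans Vj for the
first negative element and stores that index into the element of Vd selected
by Vk (or by the immediate). If no element is negative, the stored index is
the element count. The sketch below models the byte variant on plain arrays.
NLANES, first_negative() and frstp_b() are hypothetical names, and the model
follows the helper code here rather than any ISA manual text.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 16   /* 128-bit vector of byte elements */

/* Index of the first negative element, or NLANES if there is none. */
static int first_negative(const int8_t v[NLANES])
{
    int i;

    for (i = 0; i < NLANES; i++) {
        if (v[i] < 0) {
            break;
        }
    }
    return i;
}

/*
 * Model of the byte form: the low four bits of vk[0] select which
 * element of vd receives the index; other elements of vd are untouched.
 */
static void frstp_b(int8_t vd[NLANES], const int8_t vj[NLANES],
                    const int8_t vk[NLANES])
{
    int idx = first_negative(vj);
    int m = vk[0] & 0xf;

    vd[m] = (int8_t)idx;
}
```

Both inputs are read before vd is written, so the model also behaves
sensibly when vd aliases vj or vk.
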
 target/loongarch/disas.c                    |  5 +++
 target/loongarch/helper.h                   |  5 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 +++
 target/loongarch/insns.decode               |  5 +++
 target/loongarch/lsx_helper.c               | 41 +++++++++++++++++++++
 5 files changed, 61 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 48c7ea47a4..be2bb9cc42 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1297,3 +1297,8 @@ INSN_LSX(vbitrevi_b,       vv_i)
 INSN_LSX(vbitrevi_h,       vv_i)
 INSN_LSX(vbitrevi_w,       vv_i)
 INSN_LSX(vbitrevi_d,       vv_i)
+
+INSN_LSX(vfrstp_b,         vvv)
+INSN_LSX(vfrstp_h,         vvv)
+INSN_LSX(vfrstpi_b,        vv_i)
+INSN_LSX(vfrstpi_h,        vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4622f788ee..d8b783ebc7 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -525,3 +525,8 @@ DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 6d3a804767..9ba9113ca3 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2824,3 +2824,8 @@ TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
 TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
 TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
 TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
+
+TRANS(vfrstp_b, gen_vvv, gen_helper_vfrstp_b)
+TRANS(vfrstp_h, gen_vvv, gen_helper_vfrstp_h)
+TRANS(vfrstpi_b, gen_vv_i, gen_helper_vfrstpi_b)
+TRANS(vfrstpi_h, gen_vv_i, gen_helper_vfrstpi_h)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 801c97714e..4cb286ffe5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -998,3 +998,8 @@ vbitrevi_b       0111 00110001 10000 01 ... ..... .....   @vv_ui3
 vbitrevi_h       0111 00110001 10000 1 .... ..... .....   @vv_ui4
 vbitrevi_w       0111 00110001 10001 ..... ..... .....    @vv_ui5
 vbitrevi_d       0111 00110001 1001 ...... ..... .....    @vv_ui6
+
+vfrstp_b         0111 00010010 10110 ..... ..... .....    @vvv
+vfrstp_h         0111 00010010 10111 ..... ..... .....    @vvv
+vfrstpi_b        0111 00101001 10100 ..... ..... .....    @vv_ui5
+vfrstpi_h        0111 00101001 10101 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e23c75bd56..d6143a0016 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2288,3 +2288,44 @@ DO_BITI(vbitrevi_b, 8, uint8_t, B, DO_BITREV)
 DO_BITI(vbitrevi_h, 16, uint16_t, H, DO_BITREV)
 DO_BITI(vbitrevi_w, 32, uint32_t, W, DO_BITREV)
 DO_BITI(vbitrevi_d, 64, uint64_t, D, DO_BITREV)
+
+#define VFRSTP(NAME, BIT, MASK, E)                       \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i, m;                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        if (Vj->E(i) < 0) {                              \
+            break;                                       \
+        }                                                \
+    }                                                    \
+    m = Vk->E(0) & MASK;                                 \
+    Vd->E(m) = i;                                        \
+}
+
+VFRSTP(vfrstp_b, 8, 0xf, B)
+VFRSTP(vfrstp_h, 16, 0x7, H)
+
+#define VFRSTPI(NAME, BIT, E)                             \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i, m;                                             \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        if (Vj->E(i) < 0) {                               \
+            break;                                        \
+        }                                                 \
+    }                                                     \
+    m = imm % (LSX_LEN/BIT);                              \
+    Vd->E(m) = i;                                         \
+}
+
+VFRSTPI(vfrstpi_b, 8,  B)
+VFRSTPI(vfrstpi_h, 16, H)
-- 
2.31.1




* [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (32 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  5:19   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
                   ` (9 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VF{ADD/SUB/MUL/DIV}.{S/D};
- VF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- VF{MAX/MIN}.{S/D};
- VF{MAXA/MINA}.{S/D};
- VFLOGB.{S/D};
- VFCLASS.{S/D};
- VF{SQRT/RECIP/RSQRT}.{S/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
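Note for readers: the new UPDATE_FP_CAUSE macro ORs bits into FCSR0.CAUSE
(bits 24..28, per the FIELD definition in cpu.h) instead of overwriting the
field as SET_FP_CAUSE does, presumably so that a per-element loop can
accumulate causes across lanes. The sketch below shows the difference on a
plain uint32_t. set_cause(), update_cause() and get_cause() are hypothetical
stand-ins for the QEMU macros.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_SHIFT 24
#define CAUSE_MASK  (UINT32_C(0x1f) << CAUSE_SHIFT)

/* Replace the CAUSE field with v (overwrite semantics). */
static uint32_t set_cause(uint32_t reg, uint32_t v)
{
    return (reg & ~CAUSE_MASK) | ((v << CAUSE_SHIFT) & CAUSE_MASK);
}

/* OR v into the CAUSE field (accumulate semantics). */
static uint32_t update_cause(uint32_t reg, uint32_t v)
{
    return reg | ((v << CAUSE_SHIFT) & CAUSE_MASK);
}

/* Extract the CAUSE field. */
static uint32_t get_cause(uint32_t reg)
{
    return (reg & CAUSE_MASK) >> CAUSE_SHIFT;
}
```

With accumulate semantics, a cause raised by an earlier element is still
visible after a later element raises a different one.
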
 target/loongarch/cpu.h                      |   4 +
 target/loongarch/disas.c                    |  46 +++++
 target/loongarch/fpu_helper.c               |   2 +-
 target/loongarch/helper.h                   |  41 +++++
 target/loongarch/insn_trans/trans_lsx.c.inc |  55 ++++++
 target/loongarch/insns.decode               |  43 +++++
 target/loongarch/internals.h                |   1 +
 target/loongarch/lsx_helper.c               | 187 ++++++++++++++++++++
 8 files changed, 378 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 2e5326f474..abbe79f783 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -55,6 +55,10 @@ FIELD(FCSR0, CAUSE, 24, 5)
     do { \
         (REG) = FIELD_DP32(REG, FCSR0, CAUSE, V); \
     } while (0)
+#define UPDATE_FP_CAUSE(REG, V) \
+    do { \
+        (REG) |= FIELD_DP32(0, FCSR0, CAUSE, V); \
+    } while (0)
 
 #define GET_FP_ENABLES(REG)    FIELD_EX32(REG, FCSR0, ENABLES)
 #define SET_FP_ENABLES(REG, V) \
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index be2bb9cc42..b57b284e49 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -807,6 +807,11 @@ static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
 }
 
+static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1302,3 +1307,44 @@ INSN_LSX(vfrstp_b,         vvv)
 INSN_LSX(vfrstp_h,         vvv)
 INSN_LSX(vfrstpi_b,        vv_i)
 INSN_LSX(vfrstpi_h,        vv_i)
+
+INSN_LSX(vfadd_s,          vvv)
+INSN_LSX(vfadd_d,          vvv)
+INSN_LSX(vfsub_s,          vvv)
+INSN_LSX(vfsub_d,          vvv)
+INSN_LSX(vfmul_s,          vvv)
+INSN_LSX(vfmul_d,          vvv)
+INSN_LSX(vfdiv_s,          vvv)
+INSN_LSX(vfdiv_d,          vvv)
+
+INSN_LSX(vfmadd_s,         vvvv)
+INSN_LSX(vfmadd_d,         vvvv)
+INSN_LSX(vfmsub_s,         vvvv)
+INSN_LSX(vfmsub_d,         vvvv)
+INSN_LSX(vfnmadd_s,        vvvv)
+INSN_LSX(vfnmadd_d,        vvvv)
+INSN_LSX(vfnmsub_s,        vvvv)
+INSN_LSX(vfnmsub_d,        vvvv)
+
+INSN_LSX(vfmax_s,          vvv)
+INSN_LSX(vfmax_d,          vvv)
+INSN_LSX(vfmin_s,          vvv)
+INSN_LSX(vfmin_d,          vvv)
+
+INSN_LSX(vfmaxa_s,         vvv)
+INSN_LSX(vfmaxa_d,         vvv)
+INSN_LSX(vfmina_s,         vvv)
+INSN_LSX(vfmina_d,         vvv)
+
+INSN_LSX(vflogb_s,         vv)
+INSN_LSX(vflogb_d,         vv)
+
+INSN_LSX(vfclass_s,        vv)
+INSN_LSX(vfclass_d,        vv)
+
+INSN_LSX(vfsqrt_s,         vv)
+INSN_LSX(vfsqrt_d,         vv)
+INSN_LSX(vfrecip_s,        vv)
+INSN_LSX(vfrecip_d,        vv)
+INSN_LSX(vfrsqrt_s,        vv)
+INSN_LSX(vfrsqrt_d,        vv)
diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 4b9637210a..f6753c5875 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -33,7 +33,7 @@ void restore_fp_status(CPULoongArchState *env)
     set_flush_to_zero(0, &env->fp_status);
 }
 
-static int ieee_ex_to_loongarch(int xcpt)
+int ieee_ex_to_loongarch(int xcpt)
 {
     int ret = 0;
     if (xcpt & float_flag_invalid) {
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d8b783ebc7..2c59fb09c0 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -530,3 +530,44 @@ DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfadd_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vfmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_4(vfmax_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmax_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfmaxa_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmaxa_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmina_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmina_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vflogb_s, void, env, i32, i32)
+DEF_HELPER_3(vflogb_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfclass_s, void, env, i32, i32)
+DEF_HELPER_3(vfclass_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(vfsqrt_d, void, env, i32, i32)
+DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
+DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
+DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 9ba9113ca3..34a272ce00 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -15,6 +15,20 @@
 #define CHECK_SXE
 #endif
 
+static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
+                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
+                                  TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+    TCGv_i32 va = tcg_constant_i32(a->va);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj, vk, va);
+    return true;
+}
+
 static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
                     void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
 {
@@ -2829,3 +2843,44 @@ TRANS(vfrstp_b, gen_vvv, gen_helper_vfrstp_b)
 TRANS(vfrstp_h, gen_vvv, gen_helper_vfrstp_h)
 TRANS(vfrstpi_b, gen_vv_i, gen_helper_vfrstpi_b)
 TRANS(vfrstpi_h, gen_vv_i, gen_helper_vfrstpi_h)
+
+TRANS(vfadd_s, gen_vvv, gen_helper_vfadd_s)
+TRANS(vfadd_d, gen_vvv, gen_helper_vfadd_d)
+TRANS(vfsub_s, gen_vvv, gen_helper_vfsub_s)
+TRANS(vfsub_d, gen_vvv, gen_helper_vfsub_d)
+TRANS(vfmul_s, gen_vvv, gen_helper_vfmul_s)
+TRANS(vfmul_d, gen_vvv, gen_helper_vfmul_d)
+TRANS(vfdiv_s, gen_vvv, gen_helper_vfdiv_s)
+TRANS(vfdiv_d, gen_vvv, gen_helper_vfdiv_d)
+
+TRANS(vfmadd_s, gen_vvvv, gen_helper_vfmadd_s)
+TRANS(vfmadd_d, gen_vvvv, gen_helper_vfmadd_d)
+TRANS(vfmsub_s, gen_vvvv, gen_helper_vfmsub_s)
+TRANS(vfmsub_d, gen_vvvv, gen_helper_vfmsub_d)
+TRANS(vfnmadd_s, gen_vvvv, gen_helper_vfnmadd_s)
+TRANS(vfnmadd_d, gen_vvvv, gen_helper_vfnmadd_d)
+TRANS(vfnmsub_s, gen_vvvv, gen_helper_vfnmsub_s)
+TRANS(vfnmsub_d, gen_vvvv, gen_helper_vfnmsub_d)
+
+TRANS(vfmax_s, gen_vvv, gen_helper_vfmax_s)
+TRANS(vfmax_d, gen_vvv, gen_helper_vfmax_d)
+TRANS(vfmin_s, gen_vvv, gen_helper_vfmin_s)
+TRANS(vfmin_d, gen_vvv, gen_helper_vfmin_d)
+
+TRANS(vfmaxa_s, gen_vvv, gen_helper_vfmaxa_s)
+TRANS(vfmaxa_d, gen_vvv, gen_helper_vfmaxa_d)
+TRANS(vfmina_s, gen_vvv, gen_helper_vfmina_s)
+TRANS(vfmina_d, gen_vvv, gen_helper_vfmina_d)
+
+TRANS(vflogb_s, gen_vv, gen_helper_vflogb_s)
+TRANS(vflogb_d, gen_vv, gen_helper_vflogb_d)
+
+TRANS(vfclass_s, gen_vv, gen_helper_vfclass_s)
+TRANS(vfclass_d, gen_vv, gen_helper_vfclass_d)
+
+TRANS(vfsqrt_s, gen_vv, gen_helper_vfsqrt_s)
+TRANS(vfsqrt_d, gen_vv, gen_helper_vfsqrt_d)
+TRANS(vfrecip_s, gen_vv, gen_helper_vfrecip_s)
+TRANS(vfrecip_d, gen_vv, gen_helper_vfrecip_d)
+TRANS(vfrsqrt_s, gen_vv, gen_helper_vfrsqrt_s)
+TRANS(vfrsqrt_d, gen_vv, gen_helper_vfrsqrt_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4cb286ffe5..bcc531dd25 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -493,6 +493,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vv           vd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
+&vvvv         vd vj vk va
 
 #
 # LSX Formats
@@ -506,6 +507,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui7             .... ........ ... imm:7 vj:5 vd:5    &vv_i
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
+@vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1003,3 +1005,44 @@ vfrstp_b         0111 00010010 10110 ..... ..... .....    @vvv
 vfrstp_h         0111 00010010 10111 ..... ..... .....    @vvv
 vfrstpi_b        0111 00101001 10100 ..... ..... .....    @vv_ui5
 vfrstpi_h        0111 00101001 10101 ..... ..... .....    @vv_ui5
+
+vfadd_s          0111 00010011 00001 ..... ..... .....    @vvv
+vfadd_d          0111 00010011 00010 ..... ..... .....    @vvv
+vfsub_s          0111 00010011 00101 ..... ..... .....    @vvv
+vfsub_d          0111 00010011 00110 ..... ..... .....    @vvv
+vfmul_s          0111 00010011 10001 ..... ..... .....    @vvv
+vfmul_d          0111 00010011 10010 ..... ..... .....    @vvv
+vfdiv_s          0111 00010011 10101 ..... ..... .....    @vvv
+vfdiv_d          0111 00010011 10110 ..... ..... .....    @vvv
+
+vfmadd_s         0000 10010001 ..... ..... ..... .....    @vvvv
+vfmadd_d         0000 10010010 ..... ..... ..... .....    @vvvv
+vfmsub_s         0000 10010101 ..... ..... ..... .....    @vvvv
+vfmsub_d         0000 10010110 ..... ..... ..... .....    @vvvv
+vfnmadd_s        0000 10011001 ..... ..... ..... .....    @vvvv
+vfnmadd_d        0000 10011010 ..... ..... ..... .....    @vvvv
+vfnmsub_s        0000 10011101 ..... ..... ..... .....    @vvvv
+vfnmsub_d        0000 10011110 ..... ..... ..... .....    @vvvv
+
+vfmax_s          0111 00010011 11001 ..... ..... .....    @vvv
+vfmax_d          0111 00010011 11010 ..... ..... .....    @vvv
+vfmin_s          0111 00010011 11101 ..... ..... .....    @vvv
+vfmin_d          0111 00010011 11110 ..... ..... .....    @vvv
+
+vfmaxa_s         0111 00010100 00001 ..... ..... .....    @vvv
+vfmaxa_d         0111 00010100 00010 ..... ..... .....    @vvv
+vfmina_s         0111 00010100 00101 ..... ..... .....    @vvv
+vfmina_d         0111 00010100 00110 ..... ..... .....    @vvv
+
+vflogb_s         0111 00101001 11001 10001 ..... .....    @vv
+vflogb_d         0111 00101001 11001 10010 ..... .....    @vv
+
+vfclass_s        0111 00101001 11001 10101 ..... .....    @vv
+vfclass_d        0111 00101001 11001 10110 ..... .....    @vv
+
+vfsqrt_s         0111 00101001 11001 11001 ..... .....    @vv
+vfsqrt_d         0111 00101001 11001 11010 ..... .....    @vv
+vfrecip_s        0111 00101001 11001 11101 ..... .....    @vv
+vfrecip_d        0111 00101001 11001 11110 ..... .....    @vv
+vfrsqrt_s        0111 00101001 11010 00001 ..... .....    @vv
+vfrsqrt_d        0111 00101001 11010 00010 ..... .....    @vv
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index f01635aed6..c492863cc5 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -31,6 +31,7 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
 
 const char *loongarch_exception_name(int32_t exception);
 
+int ieee_ex_to_loongarch(int xcpt);
 void restore_fp_status(CPULoongArchState *env);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d6143a0016..b66a896a28 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -9,6 +9,8 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+#include "internals.h"
 
 void helper_vadd_q(CPULoongArchState *env,
                    uint32_t vd, uint32_t vj, uint32_t vk)
@@ -2329,3 +2331,188 @@ void HELPER(NAME)(CPULoongArchState *env,                 \
 
 VFRSTPI(vfrstpi_b, 8,  B)
 VFRSTPI(vfrstpi_h, 16, H)
+
+static void vec_update_fcsr0_mask(CPULoongArchState *env,
+                                  uintptr_t pc, int mask)
+{
+    int flags = get_float_exception_flags(&env->fp_status);
+
+    set_float_exception_flags(0, &env->fp_status);
+
+    flags &= ~mask;
+
+    if (flags) {
+        flags = ieee_ex_to_loongarch(flags);
+        UPDATE_FP_CAUSE(env->fcsr0, flags);
+    }
+
+    if (GET_FP_ENABLES(env->fcsr0) & flags) {
+        do_raise_exception(env, EXCCODE_FPE, pc);
+    } else {
+        UPDATE_FP_FLAGS(env->fcsr0, flags);
+    }
+}
+
+static void vec_update_fcsr0(CPULoongArchState *env, uintptr_t pc)
+{
+    vec_update_fcsr0_mask(env, pc, 0);
+}
+
+static inline void vec_clear_cause(CPULoongArchState *env)
+{
+    SET_FP_CAUSE(env->fcsr0, 0);
+}
+
+#define DO_3OP_F(NAME, BIT, T, E, FN)                             \
+void HELPER(NAME)(CPULoongArchState *env,                         \
+                  uint32_t vd, uint32_t vj, uint32_t vk)          \
+{                                                                 \
+    int i;                                                        \
+    VReg *Vd = &(env->fpr[vd].vreg);                              \
+    VReg *Vj = &(env->fpr[vj].vreg);                              \
+    VReg *Vk = &(env->fpr[vk].vreg);                              \
+                                                                  \
+    vec_clear_cause(env);                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                           \
+        Vd->E(i) = FN((T)Vj->E(i), (T)Vk->E(i), &env->fp_status); \
+        vec_update_fcsr0(env, GETPC());                           \
+    }                                                             \
+}
+
+DO_3OP_F(vfadd_s, 32, uint32_t, W, float32_add)
+DO_3OP_F(vfadd_d, 64, uint64_t, D, float64_add)
+DO_3OP_F(vfsub_s, 32, uint32_t, W, float32_sub)
+DO_3OP_F(vfsub_d, 64, uint64_t, D, float64_sub)
+DO_3OP_F(vfmul_s, 32, uint32_t, W, float32_mul)
+DO_3OP_F(vfmul_d, 64, uint64_t, D, float64_mul)
+DO_3OP_F(vfdiv_s, 32, uint32_t, W, float32_div)
+DO_3OP_F(vfdiv_d, 64, uint64_t, D, float64_div)
+DO_3OP_F(vfmax_s, 32, uint32_t, W, float32_maxnum)
+DO_3OP_F(vfmax_d, 64, uint64_t, D, float64_maxnum)
+DO_3OP_F(vfmin_s, 32, uint32_t, W, float32_minnum)
+DO_3OP_F(vfmin_d, 64, uint64_t, D, float64_minnum)
+DO_3OP_F(vfmaxa_s, 32, uint32_t, W, float32_maxnummag)
+DO_3OP_F(vfmaxa_d, 64, uint64_t, D, float64_maxnummag)
+DO_3OP_F(vfmina_s, 32, uint32_t, W, float32_minnummag)
+DO_3OP_F(vfmina_d, 64, uint64_t, D, float64_minnummag)
+
+#define DO_4OP_F(NAME, BIT, T, E, FN, flags)                          \
+void HELPER(NAME)(CPULoongArchState *env,                             \
+                  uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va) \
+{                                                                     \
+    int i;                                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                                  \
+    VReg *Vj = &(env->fpr[vj].vreg);                                  \
+    VReg *Vk = &(env->fpr[vk].vreg);                                  \
+    VReg *Va = &(env->fpr[va].vreg);                                  \
+                                                                      \
+    vec_clear_cause(env);                                             \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                               \
+        Vd->E(i) = FN((T)Vj->E(i), (T)Vk->E(i), (T)Va->E(i),          \
+                      flags, &env->fp_status);                        \
+        vec_update_fcsr0(env, GETPC());                               \
+    }                                                                 \
+}
+
+DO_4OP_F(vfmadd_s, 32, uint32_t, W, float32_muladd, 0)
+DO_4OP_F(vfmadd_d, 64, uint64_t, D, float64_muladd, 0)
+DO_4OP_F(vfmsub_s, 32, uint32_t, W, float32_muladd, float_muladd_negate_c)
+DO_4OP_F(vfmsub_d, 64, uint64_t, D, float64_muladd, float_muladd_negate_c)
+DO_4OP_F(vfnmadd_s, 32, uint32_t, W, float32_muladd, float_muladd_negate_result)
+DO_4OP_F(vfnmadd_d, 64, uint64_t, D, float64_muladd, float_muladd_negate_result)
+DO_4OP_F(vfnmsub_s, 32, uint32_t, W, float32_muladd,
+         float_muladd_negate_c | float_muladd_negate_result)
+DO_4OP_F(vfnmsub_d, 64, uint64_t, D, float64_muladd,
+         float_muladd_negate_c | float_muladd_negate_result)
+
+#define DO_2OP_F(NAME, BIT, T, E, FN)                               \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    vec_clear_cause(env);                                           \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E(i) = FN(env, (T)Vj->E(i));                            \
+    }                                                               \
+}
+
+#define FLOGB(BIT, T)                                            \
+static T do_flogb_## BIT(CPULoongArchState *env, T fj)           \
+{                                                                \
+    T fp, fd;                                                    \
+    float_status *status = &env->fp_status;                      \
+    FloatRoundMode old_mode = get_float_rounding_mode(status);   \
+                                                                 \
+    set_float_rounding_mode(float_round_down, status);           \
+    fp = float ## BIT ##_log2(fj, status);                       \
+    fd = float ## BIT ##_round_to_int(fp, status);               \
+    set_float_rounding_mode(old_mode, status);                   \
+    vec_update_fcsr0_mask(env, GETPC(), float_flag_inexact);     \
+    return fd;                                                   \
+}
+
+FLOGB(32, uint32_t)
+FLOGB(64, uint64_t)
+
+#define FCLASS(NAME, BIT, T, E, FN)                                 \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E(i) = FN(env, (T)Vj->E(i));                            \
+    }                                                               \
+}
+
+FCLASS(vfclass_s, 32, uint32_t, W, helper_fclass_s)
+FCLASS(vfclass_d, 64, uint64_t, D, helper_fclass_d)
+
+#define FSQRT(BIT, T)                                  \
+static T do_fsqrt_## BIT(CPULoongArchState *env, T fj) \
+{                                                      \
+    T fd;                                              \
+    fd = float ## BIT ##_sqrt(fj, &env->fp_status);    \
+    vec_update_fcsr0(env, GETPC());                    \
+    return fd;                                         \
+}
+
+FSQRT(32, uint32_t)
+FSQRT(64, uint64_t)
+
+#define FRECIP(BIT, T)                                                  \
+static T do_frecip_## BIT(CPULoongArchState *env, T fj)                 \
+{                                                                       \
+    T fd;                                                               \
+    fd = float ## BIT ##_div(float ## BIT ##_one, fj, &env->fp_status); \
+    vec_update_fcsr0(env, GETPC());                                     \
+    return fd;                                                          \
+}
+
+FRECIP(32, uint32_t)
+FRECIP(64, uint64_t)
+
+#define FRSQRT(BIT, T)                                                  \
+static T do_frsqrt_## BIT(CPULoongArchState *env, T fj)                 \
+{                                                                       \
+    T fd, fp;                                                           \
+    fp = float ## BIT ##_sqrt(fj, &env->fp_status);                     \
+    fd = float ## BIT ##_div(float ## BIT ##_one, fp, &env->fp_status); \
+    vec_update_fcsr0(env, GETPC());                                     \
+    return fd;                                                          \
+}
+
+FRSQRT(32, uint32_t)
+FRSQRT(64, uint64_t)
+
+DO_2OP_F(vflogb_s, 32, uint32_t, W, do_flogb_32)
+DO_2OP_F(vflogb_d, 64, uint64_t, D, do_flogb_64)
+DO_2OP_F(vfsqrt_s, 32, uint32_t, W, do_fsqrt_32)
+DO_2OP_F(vfsqrt_d, 64, uint64_t, D, do_fsqrt_64)
+DO_2OP_F(vfrecip_s, 32, uint32_t, W, do_frecip_32)
+DO_2OP_F(vfrecip_d, 64, uint64_t, D, do_frecip_64)
+DO_2OP_F(vfrsqrt_s, 32, uint32_t, W, do_frsqrt_32)
+DO_2OP_F(vfrsqrt_d, 64, uint64_t, D, do_frsqrt_64)
-- 
2.31.1
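
A note on the status handling in the helpers above: each element's
softfloat flags are read, cleared, optionally masked, recorded as cause
bits, and then either trigger an exception (if enabled) or are folded
into the sticky flags. The sketch below models that flow outside QEMU.
ModelFcsr and model_update_fcsr() are hypothetical names, the
ieee_ex_to_loongarch() translation is left out, and the OR-style update
of the cause field is an assumption about UPDATE_FP_CAUSE, whose
definition is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the FCSR0 fields the helper touches. */
typedef struct {
    uint32_t enables;   /* trap-enable bits */
    uint32_t flags;     /* sticky flags, updated only when no trap is taken */
    uint32_t cause;     /* cause bits for the current instruction */
} ModelFcsr;

/*
 * Model of one vec_update_fcsr0_mask() call.  'pending' plays the role of
 * the softfloat exception flags and 'mask' lists the flags to ignore.
 * Returns true where the real helper would call do_raise_exception().
 */
static bool model_update_fcsr(ModelFcsr *csr, uint32_t *pending, uint32_t mask)
{
    uint32_t raised;

    assert(csr != NULL && pending != NULL);

    raised = *pending & ~mask;
    *pending = 0;                   /* flags are consumed per element */

    if (raised) {
        csr->cause |= raised;       /* assumption: cause bits are OR-ed in */
    }
    if (csr->enables & raised) {
        return true;                /* trap: sticky flags stay untouched */
    }
    csr->flags |= raised;
    return false;
}
```

With mask set to the inexact bit, as do_flogb_* does, an inexact result
neither sets a cause bit nor raises an exception in this model.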




* [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (33 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  5:22   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt Song Gao
                   ` (8 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFCVT{L/H}.{S.H/D.S};
- VFCVT.{H.S/S.D};
- VFRINT[{RNE/RZ/RP/RM}].{S/D};
- VFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- VFTINT[RZ].{WU.S/LU.D};
- VFTINT[{RNE/RZ/RP/RM}].W.D;
- VFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- VFFINT.{S.W/D.L}[U];
- VFFINT.S.L, VFFINT{L/H}.D.W.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
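A rough illustration of the L/H split used by the widening conversions
listed above, going by the vfcvtl_d_s/vfcvth_d_s helpers in this patch:
the "L" form widens the low two single-precision elements of Vj and the
"H" form widens the high two. This is a plain C sketch, not QEMU code.
The model_* names and the array layout are hypothetical, and NaN and
exception handling are not modelled.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES_S = 4, LANES_D = 2 };  /* 128 bits: 4 x float or 2 x double */

/* Widen the two floats starting at element 'first' into two doubles. */
static void widen_pair(double dst[LANES_D], const float src[LANES_S],
                       size_t first)
{
    size_t i;

    assert(first + LANES_D <= LANES_S);
    for (i = 0; i < LANES_D; i++) {
        dst[i] = (double)src[first + i];
    }
}

/* Low half: source elements 0 and 1, as in the vfcvtl_d_s helper. */
static void model_vfcvtl_d_s(double dst[LANES_D], const float src[LANES_S])
{
    widen_pair(dst, src, 0);
}

/* High half: source elements 2 and 3, as in the vfcvth_d_s helper. */
static void model_vfcvth_d_s(double dst[LANES_D], const float src[LANES_S])
{
    widen_pair(dst, src, LANES_D);
}
```

The real helpers write into a temporary VReg first because Vd and Vj may
name the same register; the sketch avoids that by using separate arrays.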
 fpu/softfloat.c                             |  55 +++
 include/fpu/softfloat.h                     |  27 ++
 target/loongarch/disas.c                    |  56 +++
 target/loongarch/helper.h                   |  56 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  56 +++
 target/loongarch/insns.decode               |  56 +++
 target/loongarch/lsx_helper.c               | 369 ++++++++++++++++++++
 7 files changed, 675 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c7454c3eb1..79975c6b01 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2988,6 +2988,25 @@ float64 float64_round_to_int(float64 a, float_status *s)
     return float64_round_pack_canonical(&p, s);
 }
 
+#define FRINT_RM(rm, rmode, bits)                                    \
+float ## bits float ## bits ## _round_to_int_ ## rm(                 \
+                         float ## bits a, float_status *s)           \
+{                                                                    \
+    FloatParts64 pa;                                                 \
+    float ## bits ## _unpack_canonical(&pa, a, s);                   \
+    parts_round_to_int(&pa, rmode, 0, s, &float ## bits ## _params); \
+    return float ## bits ## _round_pack_canonical(&pa, s);           \
+}
+FRINT_RM(rne, float_round_nearest_even, 32)
+FRINT_RM(rm,  float_round_down,         32)
+FRINT_RM(rp,  float_round_up,           32)
+FRINT_RM(rz,  float_round_to_zero,      32)
+FRINT_RM(rne, float_round_nearest_even, 64)
+FRINT_RM(rm,  float_round_down,         64)
+FRINT_RM(rp,  float_round_up,           64)
+FRINT_RM(rz,  float_round_to_zero,      64)
+#undef FRINT_RM
+
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
     FloatParts64 p;
@@ -3349,6 +3368,42 @@ int32_t float64_to_int32_round_to_zero(float64 a, float_status *s)
     return float64_to_int32_scalbn(a, float_round_to_zero, 0, s);
 }
 
+#define FTINT_RM(rm, rmode, sbits, dbits)                                 \
+int ## dbits ## _t float ## sbits ## _to_int ## dbits ## _ ## rm(         \
+                         float ## sbits a, float_status *s)               \
+{                                                                         \
+    return float ## sbits ## _to_int ## dbits ## _scalbn(a, rmode, 0, s); \
+}
+FTINT_RM(rne, float_round_nearest_even, 32, 32)
+FTINT_RM(rm,  float_round_down,         32, 32)
+FTINT_RM(rp,  float_round_up,           32, 32)
+FTINT_RM(rz,  float_round_to_zero,      32, 32)
+FTINT_RM(rne, float_round_nearest_even, 64, 64)
+FTINT_RM(rm,  float_round_down,         64, 64)
+FTINT_RM(rp,  float_round_up,           64, 64)
+FTINT_RM(rz,  float_round_to_zero,      64, 64)
+
+FTINT_RM(rne, float_round_nearest_even, 32, 64)
+FTINT_RM(rm,  float_round_down,         32, 64)
+FTINT_RM(rp,  float_round_up,           32, 64)
+FTINT_RM(rz,  float_round_to_zero,      32, 64)
+#undef FTINT_RM
+
+int32_t float64_to_int32_round_up(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_up, 0, s);
+}
+
+int32_t float64_to_int32_round_down(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_down, 0, s);
+}
+
+int32_t float64_to_int32_round_nearest_even(float64 a, float_status *s)
+{
+    return float64_to_int32_scalbn(a, float_round_nearest_even, 0, s);
+}
+
 int64_t float64_to_int64_round_to_zero(float64 a, float_status *s)
 {
     return float64_to_int64_scalbn(a, float_round_to_zero, 0, s);
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3dcf20e3a2..ebdbaa4ac8 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -559,6 +559,16 @@ int16_t float32_to_int16_round_to_zero(float32, float_status *status);
 int32_t float32_to_int32_round_to_zero(float32, float_status *status);
 int64_t float32_to_int64_round_to_zero(float32, float_status *status);
 
+int64_t float32_to_int64_rm(float32, float_status *status);
+int64_t float32_to_int64_rp(float32, float_status *status);
+int64_t float32_to_int64_rz(float32, float_status *status);
+int64_t float32_to_int64_rne(float32, float_status *status);
+
+int32_t float32_to_int32_rm(float32, float_status *status);
+int32_t float32_to_int32_rp(float32, float_status *status);
+int32_t float32_to_int32_rz(float32, float_status *status);
+int32_t float32_to_int32_rne(float32, float_status *status);
+
 uint16_t float32_to_uint16_scalbn(float32, FloatRoundMode, int, float_status *);
 uint32_t float32_to_uint32_scalbn(float32, FloatRoundMode, int, float_status *);
 uint64_t float32_to_uint64_scalbn(float32, FloatRoundMode, int, float_status *);
@@ -579,6 +589,10 @@ float128 float32_to_float128(float32, float_status *status);
 | Software IEC/IEEE single-precision operations.
 *----------------------------------------------------------------------------*/
 float32 float32_round_to_int(float32, float_status *status);
+float32 float32_round_to_int_rm(float32, float_status *status);
+float32 float32_round_to_int_rp(float32, float_status *status);
+float32 float32_round_to_int_rz(float32, float_status *status);
+float32 float32_round_to_int_rne(float32, float_status *status);
 float32 float32_add(float32, float32, float_status *status);
 float32 float32_sub(float32, float32, float_status *status);
 float32 float32_mul(float32, float32, float_status *status);
@@ -751,6 +765,15 @@ int16_t float64_to_int16_round_to_zero(float64, float_status *status);
 int32_t float64_to_int32_round_to_zero(float64, float_status *status);
 int64_t float64_to_int64_round_to_zero(float64, float_status *status);
 
+int64_t float64_to_int64_rm(float64, float_status *status);
+int64_t float64_to_int64_rp(float64, float_status *status);
+int64_t float64_to_int64_rz(float64, float_status *status);
+int64_t float64_to_int64_rne(float64, float_status *status);
+
+int32_t float64_to_int32_round_up(float64, float_status *status);
+int32_t float64_to_int32_round_down(float64, float_status *status);
+int32_t float64_to_int32_round_nearest_even(float64, float_status *status);
+
 uint16_t float64_to_uint16_scalbn(float64, FloatRoundMode, int, float_status *);
 uint32_t float64_to_uint32_scalbn(float64, FloatRoundMode, int, float_status *);
 uint64_t float64_to_uint64_scalbn(float64, FloatRoundMode, int, float_status *);
@@ -771,6 +794,10 @@ float128 float64_to_float128(float64, float_status *status);
 | Software IEC/IEEE double-precision operations.
 *----------------------------------------------------------------------------*/
 float64 float64_round_to_int(float64, float_status *status);
+float64 float64_round_to_int_rm(float64, float_status *status);
+float64 float64_round_to_int_rp(float64, float_status *status);
+float64 float64_round_to_int_rz(float64, float_status *status);
+float64 float64_round_to_int_rne(float64, float_status *status);
 float64 float64_add(float64, float64, float_status *status);
 float64 float64_sub(float64, float64, float_status *status);
 float64 float64_mul(float64, float64, float_status *status);
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b57b284e49..c04271081f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1348,3 +1348,59 @@ INSN_LSX(vfrecip_s,        vv)
 INSN_LSX(vfrecip_d,        vv)
 INSN_LSX(vfrsqrt_s,        vv)
 INSN_LSX(vfrsqrt_d,        vv)
+
+INSN_LSX(vfcvtl_s_h,       vv)
+INSN_LSX(vfcvth_s_h,       vv)
+INSN_LSX(vfcvtl_d_s,       vv)
+INSN_LSX(vfcvth_d_s,       vv)
+INSN_LSX(vfcvt_h_s,        vvv)
+INSN_LSX(vfcvt_s_d,        vvv)
+
+INSN_LSX(vfrint_s,         vv)
+INSN_LSX(vfrint_d,         vv)
+INSN_LSX(vfrintrm_s,       vv)
+INSN_LSX(vfrintrm_d,       vv)
+INSN_LSX(vfrintrp_s,       vv)
+INSN_LSX(vfrintrp_d,       vv)
+INSN_LSX(vfrintrz_s,       vv)
+INSN_LSX(vfrintrz_d,       vv)
+INSN_LSX(vfrintrne_s,      vv)
+INSN_LSX(vfrintrne_d,      vv)
+
+INSN_LSX(vftint_w_s,       vv)
+INSN_LSX(vftint_l_d,       vv)
+INSN_LSX(vftintrm_w_s,     vv)
+INSN_LSX(vftintrm_l_d,     vv)
+INSN_LSX(vftintrp_w_s,     vv)
+INSN_LSX(vftintrp_l_d,     vv)
+INSN_LSX(vftintrz_w_s,     vv)
+INSN_LSX(vftintrz_l_d,     vv)
+INSN_LSX(vftintrne_w_s,    vv)
+INSN_LSX(vftintrne_l_d,    vv)
+INSN_LSX(vftint_wu_s,      vv)
+INSN_LSX(vftint_lu_d,      vv)
+INSN_LSX(vftintrz_wu_s,    vv)
+INSN_LSX(vftintrz_lu_d,    vv)
+INSN_LSX(vftint_w_d,       vvv)
+INSN_LSX(vftintrm_w_d,     vvv)
+INSN_LSX(vftintrp_w_d,     vvv)
+INSN_LSX(vftintrz_w_d,     vvv)
+INSN_LSX(vftintrne_w_d,    vvv)
+INSN_LSX(vftintl_l_s,      vv)
+INSN_LSX(vftinth_l_s,      vv)
+INSN_LSX(vftintrml_l_s,    vv)
+INSN_LSX(vftintrmh_l_s,    vv)
+INSN_LSX(vftintrpl_l_s,    vv)
+INSN_LSX(vftintrph_l_s,    vv)
+INSN_LSX(vftintrzl_l_s,    vv)
+INSN_LSX(vftintrzh_l_s,    vv)
+INSN_LSX(vftintrnel_l_s,   vv)
+INSN_LSX(vftintrneh_l_s,   vv)
+
+INSN_LSX(vffint_s_w,       vv)
+INSN_LSX(vffint_s_wu,      vv)
+INSN_LSX(vffint_d_l,       vv)
+INSN_LSX(vffint_d_lu,      vv)
+INSN_LSX(vffintl_d_w,      vv)
+INSN_LSX(vffinth_d_w,      vv)
+INSN_LSX(vffint_s_l,       vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 2c59fb09c0..b2cc1a6ddb 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -571,3 +571,59 @@ DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
 DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
 DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
 DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
+
+DEF_HELPER_3(vfcvtl_s_h, void, env, i32, i32)
+DEF_HELPER_3(vfcvth_s_h, void, env, i32, i32)
+DEF_HELPER_3(vfcvtl_d_s, void, env, i32, i32)
+DEF_HELPER_3(vfcvth_d_s, void, env, i32, i32)
+DEF_HELPER_4(vfcvt_h_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfcvt_s_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vfrintrne_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrne_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrz_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrz_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrp_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrp_d, void, env, i32, i32)
+DEF_HELPER_3(vfrintrm_s, void, env, i32, i32)
+DEF_HELPER_3(vfrintrm_d, void, env, i32, i32)
+DEF_HELPER_3(vfrint_s, void, env, i32, i32)
+DEF_HELPER_3(vfrint_d, void, env, i32, i32)
+
+DEF_HELPER_3(vftintrne_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrne_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrp_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrp_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrm_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrm_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftint_w_s, void, env, i32, i32)
+DEF_HELPER_3(vftint_l_d, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_wu_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrz_lu_d, void, env, i32, i32)
+DEF_HELPER_3(vftint_wu_s, void, env, i32, i32)
+DEF_HELPER_3(vftint_lu_d, void, env, i32, i32)
+DEF_HELPER_4(vftintrne_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrz_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrp_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftintrm_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vftint_w_d, void, env, i32, i32, i32)
+DEF_HELPER_3(vftintrnel_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrneh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrzl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrzh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrpl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrph_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrml_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintrmh_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftintl_l_s, void, env, i32, i32)
+DEF_HELPER_3(vftinth_l_s, void, env, i32, i32)
+
+DEF_HELPER_3(vffint_s_w, void, env, i32, i32)
+DEF_HELPER_3(vffint_d_l, void, env, i32, i32)
+DEF_HELPER_3(vffint_s_wu, void, env, i32, i32)
+DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
+DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
+DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
+DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 34a272ce00..ee3817dd31 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2884,3 +2884,59 @@ TRANS(vfrecip_s, gen_vv, gen_helper_vfrecip_s)
 TRANS(vfrecip_d, gen_vv, gen_helper_vfrecip_d)
 TRANS(vfrsqrt_s, gen_vv, gen_helper_vfrsqrt_s)
 TRANS(vfrsqrt_d, gen_vv, gen_helper_vfrsqrt_d)
+
+TRANS(vfcvtl_s_h, gen_vv, gen_helper_vfcvtl_s_h)
+TRANS(vfcvth_s_h, gen_vv, gen_helper_vfcvth_s_h)
+TRANS(vfcvtl_d_s, gen_vv, gen_helper_vfcvtl_d_s)
+TRANS(vfcvth_d_s, gen_vv, gen_helper_vfcvth_d_s)
+TRANS(vfcvt_h_s, gen_vvv, gen_helper_vfcvt_h_s)
+TRANS(vfcvt_s_d, gen_vvv, gen_helper_vfcvt_s_d)
+
+TRANS(vfrintrne_s, gen_vv, gen_helper_vfrintrne_s)
+TRANS(vfrintrne_d, gen_vv, gen_helper_vfrintrne_d)
+TRANS(vfrintrz_s, gen_vv, gen_helper_vfrintrz_s)
+TRANS(vfrintrz_d, gen_vv, gen_helper_vfrintrz_d)
+TRANS(vfrintrp_s, gen_vv, gen_helper_vfrintrp_s)
+TRANS(vfrintrp_d, gen_vv, gen_helper_vfrintrp_d)
+TRANS(vfrintrm_s, gen_vv, gen_helper_vfrintrm_s)
+TRANS(vfrintrm_d, gen_vv, gen_helper_vfrintrm_d)
+TRANS(vfrint_s, gen_vv, gen_helper_vfrint_s)
+TRANS(vfrint_d, gen_vv, gen_helper_vfrint_d)
+
+TRANS(vftintrne_w_s, gen_vv, gen_helper_vftintrne_w_s)
+TRANS(vftintrne_l_d, gen_vv, gen_helper_vftintrne_l_d)
+TRANS(vftintrz_w_s, gen_vv, gen_helper_vftintrz_w_s)
+TRANS(vftintrz_l_d, gen_vv, gen_helper_vftintrz_l_d)
+TRANS(vftintrp_w_s, gen_vv, gen_helper_vftintrp_w_s)
+TRANS(vftintrp_l_d, gen_vv, gen_helper_vftintrp_l_d)
+TRANS(vftintrm_w_s, gen_vv, gen_helper_vftintrm_w_s)
+TRANS(vftintrm_l_d, gen_vv, gen_helper_vftintrm_l_d)
+TRANS(vftint_w_s, gen_vv, gen_helper_vftint_w_s)
+TRANS(vftint_l_d, gen_vv, gen_helper_vftint_l_d)
+TRANS(vftintrz_wu_s, gen_vv, gen_helper_vftintrz_wu_s)
+TRANS(vftintrz_lu_d, gen_vv, gen_helper_vftintrz_lu_d)
+TRANS(vftint_wu_s, gen_vv, gen_helper_vftint_wu_s)
+TRANS(vftint_lu_d, gen_vv, gen_helper_vftint_lu_d)
+TRANS(vftintrne_w_d, gen_vvv, gen_helper_vftintrne_w_d)
+TRANS(vftintrz_w_d, gen_vvv, gen_helper_vftintrz_w_d)
+TRANS(vftintrp_w_d, gen_vvv, gen_helper_vftintrp_w_d)
+TRANS(vftintrm_w_d, gen_vvv, gen_helper_vftintrm_w_d)
+TRANS(vftint_w_d, gen_vvv, gen_helper_vftint_w_d)
+TRANS(vftintrnel_l_s, gen_vv, gen_helper_vftintrnel_l_s)
+TRANS(vftintrneh_l_s, gen_vv, gen_helper_vftintrneh_l_s)
+TRANS(vftintrzl_l_s, gen_vv, gen_helper_vftintrzl_l_s)
+TRANS(vftintrzh_l_s, gen_vv, gen_helper_vftintrzh_l_s)
+TRANS(vftintrpl_l_s, gen_vv, gen_helper_vftintrpl_l_s)
+TRANS(vftintrph_l_s, gen_vv, gen_helper_vftintrph_l_s)
+TRANS(vftintrml_l_s, gen_vv, gen_helper_vftintrml_l_s)
+TRANS(vftintrmh_l_s, gen_vv, gen_helper_vftintrmh_l_s)
+TRANS(vftintl_l_s, gen_vv, gen_helper_vftintl_l_s)
+TRANS(vftinth_l_s, gen_vv, gen_helper_vftinth_l_s)
+
+TRANS(vffint_s_w, gen_vv, gen_helper_vffint_s_w)
+TRANS(vffint_d_l, gen_vv, gen_helper_vffint_d_l)
+TRANS(vffint_s_wu, gen_vv, gen_helper_vffint_s_wu)
+TRANS(vffint_d_lu, gen_vv, gen_helper_vffint_d_lu)
+TRANS(vffintl_d_w, gen_vv, gen_helper_vffintl_d_w)
+TRANS(vffinth_d_w, gen_vv, gen_helper_vffinth_d_w)
+TRANS(vffint_s_l, gen_vvv, gen_helper_vffint_s_l)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bcc531dd25..2ef0f73018 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1046,3 +1046,59 @@ vfrecip_s        0111 00101001 11001 11101 ..... .....    @vv
 vfrecip_d        0111 00101001 11001 11110 ..... .....    @vv
 vfrsqrt_s        0111 00101001 11010 00001 ..... .....    @vv
 vfrsqrt_d        0111 00101001 11010 00010 ..... .....    @vv
+
+vfcvtl_s_h       0111 00101001 11011 11010 ..... .....    @vv
+vfcvth_s_h       0111 00101001 11011 11011 ..... .....    @vv
+vfcvtl_d_s       0111 00101001 11011 11100 ..... .....    @vv
+vfcvth_d_s       0111 00101001 11011 11101 ..... .....    @vv
+vfcvt_h_s        0111 00010100 01100 ..... ..... .....    @vvv
+vfcvt_s_d        0111 00010100 01101 ..... ..... .....    @vvv
+
+vfrint_s         0111 00101001 11010 01101 ..... .....    @vv
+vfrint_d         0111 00101001 11010 01110 ..... .....    @vv
+vfrintrm_s       0111 00101001 11010 10001 ..... .....    @vv
+vfrintrm_d       0111 00101001 11010 10010 ..... .....    @vv
+vfrintrp_s       0111 00101001 11010 10101 ..... .....    @vv
+vfrintrp_d       0111 00101001 11010 10110 ..... .....    @vv
+vfrintrz_s       0111 00101001 11010 11001 ..... .....    @vv
+vfrintrz_d       0111 00101001 11010 11010 ..... .....    @vv
+vfrintrne_s      0111 00101001 11010 11101 ..... .....    @vv
+vfrintrne_d      0111 00101001 11010 11110 ..... .....    @vv
+
+vftint_w_s       0111 00101001 11100 01100 ..... .....    @vv
+vftint_l_d       0111 00101001 11100 01101 ..... .....    @vv
+vftintrm_w_s     0111 00101001 11100 01110 ..... .....    @vv
+vftintrm_l_d     0111 00101001 11100 01111 ..... .....    @vv
+vftintrp_w_s     0111 00101001 11100 10000 ..... .....    @vv
+vftintrp_l_d     0111 00101001 11100 10001 ..... .....    @vv
+vftintrz_w_s     0111 00101001 11100 10010 ..... .....    @vv
+vftintrz_l_d     0111 00101001 11100 10011 ..... .....    @vv
+vftintrne_w_s    0111 00101001 11100 10100 ..... .....    @vv
+vftintrne_l_d    0111 00101001 11100 10101 ..... .....    @vv
+vftint_wu_s      0111 00101001 11100 10110 ..... .....    @vv
+vftint_lu_d      0111 00101001 11100 10111 ..... .....    @vv
+vftintrz_wu_s    0111 00101001 11100 11100 ..... .....    @vv
+vftintrz_lu_d    0111 00101001 11100 11101 ..... .....    @vv
+vftint_w_d       0111 00010100 10011 ..... ..... .....    @vvv
+vftintrm_w_d     0111 00010100 10100 ..... ..... .....    @vvv
+vftintrp_w_d     0111 00010100 10101 ..... ..... .....    @vvv
+vftintrz_w_d     0111 00010100 10110 ..... ..... .....    @vvv
+vftintrne_w_d    0111 00010100 10111 ..... ..... .....    @vvv
+vftintl_l_s      0111 00101001 11101 00000 ..... .....    @vv
+vftinth_l_s      0111 00101001 11101 00001 ..... .....    @vv
+vftintrml_l_s    0111 00101001 11101 00010 ..... .....    @vv
+vftintrmh_l_s    0111 00101001 11101 00011 ..... .....    @vv
+vftintrpl_l_s    0111 00101001 11101 00100 ..... .....    @vv
+vftintrph_l_s    0111 00101001 11101 00101 ..... .....    @vv
+vftintrzl_l_s    0111 00101001 11101 00110 ..... .....    @vv
+vftintrzh_l_s    0111 00101001 11101 00111 ..... .....    @vv
+vftintrnel_l_s   0111 00101001 11101 01000 ..... .....    @vv
+vftintrneh_l_s   0111 00101001 11101 01001 ..... .....    @vv
+
+vffint_s_w       0111 00101001 11100 00000 ..... .....    @vv
+vffint_s_wu      0111 00101001 11100 00001 ..... .....    @vv
+vffint_d_l       0111 00101001 11100 00010 ..... .....    @vv
+vffint_d_lu      0111 00101001 11100 00011 ..... .....    @vv
+vffintl_d_w      0111 00101001 11100 00100 ..... .....    @vv
+vffinth_d_w      0111 00101001 11100 00101 ..... .....    @vv
+vffint_s_l       0111 00010100 10000 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b66a896a28..0a03971cbe 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2516,3 +2516,372 @@ DO_2OP_F(vfrecip_s, 32, uint32_t, W, do_frecip_32)
 DO_2OP_F(vfrecip_d, 64, uint64_t, D, do_frecip_64)
 DO_2OP_F(vfrsqrt_s, 32, uint32_t, W, do_frsqrt_32)
 DO_2OP_F(vfrsqrt_d, 64, uint64_t, D, do_frsqrt_64)
+
+static uint32_t float16_cvt_float32(int16_t h, float_status *status)
+{
+    uint32_t t;
+    t = float16_to_float32((uint16_t)h, true, status);
+    return  h < 0 ? (t | (1 << 31)) : t;
+}
+static uint64_t float32_cvt_float64(int32_t s, float_status *status)
+{
+    uint64_t t;
+    t = float32_to_float64((uint32_t)s, status);
+    return s < 0 ? (t | (1ULL << 63)) : t;
+}
+
+static uint16_t float32_cvt_float16(int32_t s, float_status *status)
+{
+    uint16_t t;
+    t = float32_to_float16((uint32_t)s, true, status);
+    return s < 0 ? (t | (1 << 15)) : t;
+}
+static uint32_t float64_cvt_float32(int64_t d, float_status *status)
+{
+    uint32_t t;
+    t = float64_to_float32((uint64_t)d, status);
+    return d < 0 ? (t | (1U << 31)) : t;
+}
+
+void HELPER(vfcvtl_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/32; i++) {
+        temp.W(i) = float16_cvt_float32(Vj->H(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vfcvtl_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/64; i++) {
+        temp.D(i) = float32_cvt_float64(Vj->W(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vfcvth_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/32; i++) {
+        temp.W(i) = float16_cvt_float32(Vj->H(i + 4), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vfcvth_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/64; i++) {
+        temp.D(i) = float32_cvt_float64(Vj->W(i + 2), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vfcvt_h_s)(CPULoongArchState *env,
+                       uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/32; i++) {
+        temp.H(i + 4) = float32_cvt_float16(Vj->W(i), &env->fp_status);
+        temp.H(i)  = float32_cvt_float16(Vk->W(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vfcvt_s_d)(CPULoongArchState *env,
+                       uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < LSX_LEN/64; i++) {
+        temp.W(i + 2) = float64_cvt_float32(Vj->D(i), &env->fp_status);
+        temp.W(i)  = float64_cvt_float32(Vk->D(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+#define FCVT_2OP(NAME, BIT, T, E, FN)                               \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    vec_clear_cause(env);                                           \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        Vd->E(i) = FN((T)Vj->E(i), &env->fp_status);                \
+        vec_update_fcsr0(env, GETPC());                             \
+    }                                                               \
+}
+
+FCVT_2OP(vfrint_s, 32, uint32_t, W, float32_round_to_int)
+FCVT_2OP(vfrint_d, 64, uint64_t, D, float64_round_to_int)
+FCVT_2OP(vfrintrne_s, 32, uint32_t, W, float32_round_to_int_rne)
+FCVT_2OP(vfrintrne_d, 64, uint64_t, D, float64_round_to_int_rne)
+FCVT_2OP(vfrintrz_s, 32, uint32_t, W, float32_round_to_int_rz)
+FCVT_2OP(vfrintrz_d, 64, uint64_t, D, float64_round_to_int_rz)
+FCVT_2OP(vfrintrp_s, 32, uint32_t, W, float32_round_to_int_rp)
+FCVT_2OP(vfrintrp_d, 64, uint64_t, D, float64_round_to_int_rp)
+FCVT_2OP(vfrintrm_s, 32, uint32_t, W, float32_round_to_int_rm)
+FCVT_2OP(vfrintrm_d, 64, uint64_t, D, float64_round_to_int_rm)
+
+#define FTINT(NAME, FMT1, FMT2, T1, T2,  MODE)                          \
+static T2 do_ftint ## NAME(CPULoongArchState *env, T1 fj)               \
+{                                                                       \
+    T2 fd;                                                              \
+    FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
+                                                                        \
+    set_float_rounding_mode(MODE, &env->fp_status);                     \
+    fd = do_## FMT1 ##_to_## FMT2(env, fj);                             \
+    set_float_rounding_mode(old_mode, &env->fp_status);                 \
+    return fd;                                                          \
+}
+
+#define DO_FTINT(FMT1, FMT2, T1, T2)                                         \
+static T2 do_## FMT1 ##_to_## FMT2(CPULoongArchState *env, T1 fj)            \
+{                                                                            \
+    T2 fd;                                                                   \
+                                                                             \
+    fd = FMT1 ##_to_## FMT2(fj, &env->fp_status);                            \
+    if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) { \
+        if (FMT1 ##_is_any_nan(fj)) {                                        \
+            fd = 0;                                                          \
+        }                                                                    \
+    }                                                                        \
+    vec_update_fcsr0(env, GETPC());                                          \
+    return fd;                                                               \
+}
+
+DO_FTINT(float32, int32, uint32_t, uint32_t)
+DO_FTINT(float64, int64, uint64_t, uint64_t)
+DO_FTINT(float32, uint32, uint32_t, uint32_t)
+DO_FTINT(float64, uint64, uint64_t, uint64_t)
+DO_FTINT(float64, int32, uint64_t, uint32_t)
+DO_FTINT(float32, int64, uint32_t, uint64_t)
+
+FTINT(rne_w_s, float32, int32, uint32_t, uint32_t, float_round_nearest_even)
+FTINT(rne_l_d, float64, int64, uint64_t, uint64_t, float_round_nearest_even)
+FTINT(rp_w_s, float32, int32, uint32_t, uint32_t, float_round_up)
+FTINT(rp_l_d, float64, int64, uint64_t, uint64_t, float_round_up)
+FTINT(rz_w_s, float32, int32, uint32_t, uint32_t, float_round_to_zero)
+FTINT(rz_l_d, float64, int64, uint64_t, uint64_t, float_round_to_zero)
+FTINT(rm_w_s, float32, int32, uint32_t, uint32_t, float_round_down)
+FTINT(rm_l_d, float64, int64, uint64_t, uint64_t, float_round_down)
+
+DO_2OP_F(vftintrne_w_s, 32, uint32_t, W, do_ftintrne_w_s)
+DO_2OP_F(vftintrne_l_d, 64, uint64_t, D, do_ftintrne_l_d)
+DO_2OP_F(vftintrp_w_s, 32, uint32_t, W, do_ftintrp_w_s)
+DO_2OP_F(vftintrp_l_d, 64, uint64_t, D, do_ftintrp_l_d)
+DO_2OP_F(vftintrz_w_s, 32, uint32_t, W, do_ftintrz_w_s)
+DO_2OP_F(vftintrz_l_d, 64, uint64_t, D, do_ftintrz_l_d)
+DO_2OP_F(vftintrm_w_s, 32, uint32_t, W, do_ftintrm_w_s)
+DO_2OP_F(vftintrm_l_d, 64, uint64_t, D, do_ftintrm_l_d)
+DO_2OP_F(vftint_w_s, 32, uint32_t, W, do_float32_to_int32)
+DO_2OP_F(vftint_l_d, 64, uint64_t, D, do_float64_to_int64)
+
+FTINT(rz_wu_s, float32, uint32, uint32_t, uint32_t, float_round_to_zero)
+FTINT(rz_lu_d, float64, uint64, uint64_t, uint64_t, float_round_to_zero)
+
+DO_2OP_F(vftintrz_wu_s, 32, uint32_t, W, do_ftintrz_wu_s)
+DO_2OP_F(vftintrz_lu_d, 64, uint64_t, D, do_ftintrz_lu_d)
+DO_2OP_F(vftint_wu_s, 32, uint32_t, W, do_float32_to_uint32)
+DO_2OP_F(vftint_lu_d, 64, uint64_t, D, do_float64_to_uint64)
+
+FTINT(rm_w_d, float64, int32, uint64_t, uint32_t, float_round_down)
+FTINT(rp_w_d, float64, int32, uint64_t, uint32_t, float_round_up)
+FTINT(rz_w_d, float64, int32, uint64_t, uint32_t, float_round_to_zero)
+FTINT(rne_w_d, float64, int32, uint64_t, uint32_t, float_round_nearest_even)
+
+#define FTINT_W_D(NAME, FN)                              \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    vec_clear_cause(env);                                \
+    for (i = 0; i < 2; i++) {                            \
+        temp.W(i + 2) = FN(env, (uint64_t)Vj->D(i));     \
+        temp.W(i) = FN(env, (uint64_t)Vk->D(i));         \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+FTINT_W_D(vftint_w_d, do_float64_to_int32)
+FTINT_W_D(vftintrm_w_d, do_ftintrm_w_d)
+FTINT_W_D(vftintrp_w_d, do_ftintrp_w_d)
+FTINT_W_D(vftintrz_w_d, do_ftintrz_w_d)
+FTINT_W_D(vftintrne_w_d, do_ftintrne_w_d)
+
+FTINT(rml_l_s, float32, int64, uint32_t, uint64_t, float_round_down)
+FTINT(rpl_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
+FTINT(rzl_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
+FTINT(rnel_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
+FTINT(rmh_l_s, float32, int64, uint32_t, uint64_t, float_round_down)
+FTINT(rph_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
+FTINT(rzh_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
+FTINT(rneh_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
+
+#define FTINTL_L_S(NAME, FN)                                        \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg temp;                                                      \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    vec_clear_cause(env);                                           \
+    for (i = 0; i < 2; i++) {                                       \
+        temp.D(i) = FN(env, (uint32_t)Vj->W(i));                    \
+    }                                                               \
+    Vd->D(0) = temp.D(0);                                           \
+    Vd->D(1) = temp.D(1);                                           \
+}
+
+FTINTL_L_S(vftintl_l_s, do_float32_to_int64)
+FTINTL_L_S(vftintrml_l_s, do_ftintrml_l_s)
+FTINTL_L_S(vftintrpl_l_s, do_ftintrpl_l_s)
+FTINTL_L_S(vftintrzl_l_s, do_ftintrzl_l_s)
+FTINTL_L_S(vftintrnel_l_s, do_ftintrnel_l_s)
+
+#define FTINTH_L_S(NAME, FN)                                        \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg temp;                                                      \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    vec_clear_cause(env);                                           \
+    for (i = 0; i < 2; i++) {                                       \
+        temp.D(i) = FN(env, (uint32_t)Vj->W(i + 2));                \
+    }                                                               \
+    Vd->D(0) = temp.D(0);                                           \
+    Vd->D(1) = temp.D(1);                                           \
+}
+
+FTINTH_L_S(vftinth_l_s, do_float32_to_int64)
+FTINTH_L_S(vftintrmh_l_s, do_ftintrmh_l_s)
+FTINTH_L_S(vftintrph_l_s, do_ftintrph_l_s)
+FTINTH_L_S(vftintrzh_l_s, do_ftintrzh_l_s)
+FTINTH_L_S(vftintrneh_l_s, do_ftintrneh_l_s)
+
+#define FFINT(NAME, FMT1, FMT2, T1, T2)                    \
+static T2 do_ffint_ ## NAME(CPULoongArchState *env, T1 fj) \
+{                                                          \
+    T2 fd;                                                 \
+                                                           \
+    fd = FMT1 ##_to_## FMT2(fj, &env->fp_status);          \
+    vec_update_fcsr0(env, GETPC());                        \
+    return fd;                                             \
+}
+
+FFINT(s_w, int32, float32, int32_t, uint32_t)
+FFINT(d_l, int64, float64, int64_t, uint64_t)
+FFINT(s_wu, uint32, float32, uint32_t, uint32_t)
+FFINT(d_lu, uint64, float64, uint64_t, uint64_t)
+
+DO_2OP_F(vffint_s_w, 32, int32_t, W, do_ffint_s_w)
+DO_2OP_F(vffint_d_l, 64, int64_t, D, do_ffint_d_l)
+DO_2OP_F(vffint_s_wu, 32, uint32_t, W, do_ffint_s_wu)
+DO_2OP_F(vffint_d_lu, 64, uint64_t, D, do_ffint_d_lu)
+
+void HELPER(vffintl_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < 2; i++) {
+        temp.D(i) = int32_to_float64(Vj->W(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vffinth_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < 2; i++) {
+        temp.D(i) = int32_to_float64(Vj->W(i + 2), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vffint_s_l)(CPULoongArchState *env,
+                        uint32_t vd, uint32_t vj, uint32_t vk)
+{
+    int i;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+
+    vec_clear_cause(env);
+    for (i = 0; i < 2; i++) {
+        temp.W(i + 2) = int64_to_float32(Vj->D(i), &env->fp_status);
+        temp.W(i) = int64_to_float32(Vk->D(i), &env->fp_status);
+        vec_update_fcsr0(env, GETPC());
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
-- 
2.31.1




* [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (34 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-02  5:27   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp Song Gao
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VSEQ[I].{B/H/W/D};
- VSLE[I].{B/H/W/D}[U];
- VSLT[I].{B/H/W/D}[U].

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
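Note for reviewers (not part of the commit): a minimal standalone sketch
of the per-element semantics these compare instructions are expected to
have, written against plain arrays rather than QEMU's VReg. Each lane of
the result becomes all ones when the comparison holds and zero otherwise,
matching the "-1 : 0" result of the VSEQ/VSLE/VSLT helper macros in this
patch. The function names model_vslti_b and model_vslti_bu are
hypothetical, and the sign-extended versus zero-extended handling of the
5-bit immediate is an assumption drawn from the @vv_i5 and @vv_ui5
formats used in insns.decode.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES_B 16 /* a 128-bit LSX register viewed as 16 byte lanes */

/*
 * Hypothetical model of VSLTI.B: signed "less than" against an immediate
 * that is assumed to arrive already sign-extended.
 */
static void model_vslti_b(int8_t vd[LANES_B], const int8_t vj[LANES_B],
                          int64_t imm)
{
    for (size_t i = 0; i < LANES_B; i++) {
        vd[i] = (vj[i] < (int8_t)imm) ? -1 : 0; /* all ones or all zeros */
    }
}

/*
 * Hypothetical model of VSLTI.BU: unsigned "less than" against an
 * immediate that is assumed to arrive zero-extended.
 */
static void model_vslti_bu(uint8_t vd[LANES_B], const uint8_t vj[LANES_B],
                           uint64_t imm)
{
    for (size_t i = 0; i < LANES_B; i++) {
        vd[i] = (vj[i] < (uint8_t)imm) ? 0xff : 0;
    }
}
```

The same byte pattern 0xfd compares as -3 in the signed model and as 253
in the unsigned one, which is why the signed and unsigned variants need
separate helpers even though the loop bodies look identical.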
 target/loongarch/disas.c                    |  43 +++++
 target/loongarch/helper.h                   |  23 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 191 ++++++++++++++++++++
 target/loongarch/insns.decode               |  43 +++++
 target/loongarch/lsx_helper.c               |  36 ++++
 5 files changed, 336 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c04271081f..e589b23f4c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1404,3 +1404,46 @@ INSN_LSX(vffint_d_lu,      vv)
 INSN_LSX(vffintl_d_w,      vv)
 INSN_LSX(vffinth_d_w,      vv)
 INSN_LSX(vffint_s_l,       vvv)
+
+INSN_LSX(vseq_b,           vvv)
+INSN_LSX(vseq_h,           vvv)
+INSN_LSX(vseq_w,           vvv)
+INSN_LSX(vseq_d,           vvv)
+INSN_LSX(vseqi_b,          vv_i)
+INSN_LSX(vseqi_h,          vv_i)
+INSN_LSX(vseqi_w,          vv_i)
+INSN_LSX(vseqi_d,          vv_i)
+
+INSN_LSX(vsle_b,           vvv)
+INSN_LSX(vsle_h,           vvv)
+INSN_LSX(vsle_w,           vvv)
+INSN_LSX(vsle_d,           vvv)
+INSN_LSX(vslei_b,          vv_i)
+INSN_LSX(vslei_h,          vv_i)
+INSN_LSX(vslei_w,          vv_i)
+INSN_LSX(vslei_d,          vv_i)
+INSN_LSX(vsle_bu,          vvv)
+INSN_LSX(vsle_hu,          vvv)
+INSN_LSX(vsle_wu,          vvv)
+INSN_LSX(vsle_du,          vvv)
+INSN_LSX(vslei_bu,         vv_i)
+INSN_LSX(vslei_hu,         vv_i)
+INSN_LSX(vslei_wu,         vv_i)
+INSN_LSX(vslei_du,         vv_i)
+
+INSN_LSX(vslt_b,           vvv)
+INSN_LSX(vslt_h,           vvv)
+INSN_LSX(vslt_w,           vvv)
+INSN_LSX(vslt_d,           vvv)
+INSN_LSX(vslti_b,          vv_i)
+INSN_LSX(vslti_h,          vv_i)
+INSN_LSX(vslti_w,          vv_i)
+INSN_LSX(vslti_d,          vv_i)
+INSN_LSX(vslt_bu,          vvv)
+INSN_LSX(vslt_hu,          vvv)
+INSN_LSX(vslt_wu,          vvv)
+INSN_LSX(vslt_du,          vvv)
+INSN_LSX(vslti_bu,         vv_i)
+INSN_LSX(vslti_hu,         vv_i)
+INSN_LSX(vslti_wu,         vv_i)
+INSN_LSX(vslti_du,         vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b2cc1a6ddb..25ea9b633d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -627,3 +627,26 @@ DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
 DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
 DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
 DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vseqi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vslei_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vslti_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ee3817dd31..7368731424 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2940,3 +2940,194 @@ TRANS(vffint_d_lu, gen_vv, gen_helper_vffint_d_lu)
 TRANS(vffintl_d_w, gen_vv, gen_helper_vffintl_d_w)
 TRANS(vffinth_d_w, gen_vv, gen_helper_vffinth_d_w)
 TRANS(vffint_s_l, gen_vvv, gen_helper_vffint_s_l)
+
+static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond,
+                   void (*func)(TCGCond, unsigned, uint32_t, uint32_t,
+                                uint32_t, uint32_t, uint32_t))
+{
+    uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+    vk_ofs = vreg_full_offset(a->vk);
+
+    func(cond, mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
+    return true;
+}
+
+static void do_cmpi_vec(TCGCond cond,
+                        unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    TCGv_vec t1;
+
+    t1 = tcg_temp_new_vec_matching(t);
+    tcg_gen_dupi_vec(vece, t1, imm);
+    tcg_gen_cmp_vec(cond, vece, t, a, t1);
+}
+
+static void gen_vseqi_s_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_cmpi_vec(TCG_COND_EQ, vece, t, a, imm);
+}
+
+static void gen_vslei_s_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_cmpi_vec(TCG_COND_LE, vece, t, a, imm);
+}
+
+static void gen_vslti_s_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_cmpi_vec(TCG_COND_LT, vece, t, a, imm);
+}
+
+static void gen_vslei_u_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_cmpi_vec(TCG_COND_LEU, vece, t, a, imm);
+}
+
+static void gen_vslti_u_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    do_cmpi_vec(TCG_COND_LTU, vece, t, a, imm);
+}
+
+#define DO_CMPI_S(NAME)                                                \
+static bool do_## NAME ##_s(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
+{                                                                      \
+    uint32_t vd_ofs, vj_ofs;                                           \
+                                                                       \
+    CHECK_SXE;                                                         \
+                                                                       \
+    static const TCGOpcode vecop_list[] = {                            \
+        INDEX_op_cmp_vec, 0                                            \
+    };                                                                 \
+    static const GVecGen2i op[4] = {                                   \
+        {                                                              \
+            .fniv = gen_## NAME ##_s_vec,                              \
+            .fnoi = gen_helper_## NAME ##_b,                           \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_8                                               \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_s_vec,                              \
+            .fnoi = gen_helper_## NAME ##_h,                           \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_16                                              \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_s_vec,                              \
+            .fnoi = gen_helper_## NAME ##_w,                           \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_32                                              \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_s_vec,                              \
+            .fnoi = gen_helper_## NAME ##_d,                           \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_64                                              \
+        }                                                              \
+    };                                                                 \
+                                                                       \
+    vd_ofs = vreg_full_offset(a->vd);                                  \
+    vj_ofs = vreg_full_offset(a->vj);                                  \
+                                                                       \
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, 16, 16, a->imm, &op[mop]);         \
+                                                                       \
+    return true;                                                       \
+}
+
+DO_CMPI_S(vseqi)
+DO_CMPI_S(vslei)
+DO_CMPI_S(vslti)
+
+#define DO_CMPI_U(NAME)                                                \
+static bool do_## NAME ##_u(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
+{                                                                      \
+    uint32_t vd_ofs, vj_ofs;                                           \
+                                                                       \
+    CHECK_SXE;                                                         \
+                                                                       \
+    static const TCGOpcode vecop_list[] = {                            \
+        INDEX_op_cmp_vec, 0                                            \
+    };                                                                 \
+    static const GVecGen2i op[4] = {                                   \
+        {                                                              \
+            .fniv = gen_## NAME ##_u_vec,                              \
+            .fnoi = gen_helper_## NAME ##_bu,                          \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_8                                               \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_u_vec,                              \
+            .fnoi = gen_helper_## NAME ##_hu,                          \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_16                                              \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_u_vec,                              \
+            .fnoi = gen_helper_## NAME ##_wu,                          \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_32                                              \
+        },                                                             \
+        {                                                              \
+            .fniv = gen_## NAME ##_u_vec,                              \
+            .fnoi = gen_helper_## NAME ##_du,                          \
+            .opt_opc = vecop_list,                                     \
+            .vece = MO_64                                              \
+        }                                                              \
+    };                                                                 \
+                                                                       \
+    vd_ofs = vreg_full_offset(a->vd);                                  \
+    vj_ofs = vreg_full_offset(a->vj);                                  \
+                                                                       \
+    tcg_gen_gvec_2i(vd_ofs, vj_ofs, 16, 16, a->imm, &op[mop]);         \
+                                                                       \
+    return true;                                                       \
+}
+
+DO_CMPI_U(vslei)
+DO_CMPI_U(vslti)
+
+TRANS(vseq_b, do_cmp, MO_8, TCG_COND_EQ, tcg_gen_gvec_cmp)
+TRANS(vseq_h, do_cmp, MO_16, TCG_COND_EQ, tcg_gen_gvec_cmp)
+TRANS(vseq_w, do_cmp, MO_32, TCG_COND_EQ, tcg_gen_gvec_cmp)
+TRANS(vseq_d, do_cmp, MO_64, TCG_COND_EQ, tcg_gen_gvec_cmp)
+TRANS(vseqi_b, do_vseqi_s, MO_8)
+TRANS(vseqi_h, do_vseqi_s, MO_16)
+TRANS(vseqi_w, do_vseqi_s, MO_32)
+TRANS(vseqi_d, do_vseqi_s, MO_64)
+
+TRANS(vsle_b, do_cmp, MO_8, TCG_COND_LE, tcg_gen_gvec_cmp)
+TRANS(vsle_h, do_cmp, MO_16, TCG_COND_LE, tcg_gen_gvec_cmp)
+TRANS(vsle_w, do_cmp, MO_32, TCG_COND_LE, tcg_gen_gvec_cmp)
+TRANS(vsle_d, do_cmp, MO_64, TCG_COND_LE, tcg_gen_gvec_cmp)
+TRANS(vslei_b, do_vslei_s, MO_8)
+TRANS(vslei_h, do_vslei_s, MO_16)
+TRANS(vslei_w, do_vslei_s, MO_32)
+TRANS(vslei_d, do_vslei_s, MO_64)
+TRANS(vsle_bu, do_cmp, MO_8, TCG_COND_LEU, tcg_gen_gvec_cmp)
+TRANS(vsle_hu, do_cmp, MO_16, TCG_COND_LEU, tcg_gen_gvec_cmp)
+TRANS(vsle_wu, do_cmp, MO_32, TCG_COND_LEU, tcg_gen_gvec_cmp)
+TRANS(vsle_du, do_cmp, MO_64, TCG_COND_LEU, tcg_gen_gvec_cmp)
+TRANS(vslei_bu, do_vslei_u, MO_8)
+TRANS(vslei_hu, do_vslei_u, MO_16)
+TRANS(vslei_wu, do_vslei_u, MO_32)
+TRANS(vslei_du, do_vslei_u, MO_64)
+
+TRANS(vslt_b, do_cmp, MO_8, TCG_COND_LT, tcg_gen_gvec_cmp)
+TRANS(vslt_h, do_cmp, MO_16, TCG_COND_LT, tcg_gen_gvec_cmp)
+TRANS(vslt_w, do_cmp, MO_32, TCG_COND_LT, tcg_gen_gvec_cmp)
+TRANS(vslt_d, do_cmp, MO_64, TCG_COND_LT, tcg_gen_gvec_cmp)
+TRANS(vslti_b, do_vslti_s, MO_8)
+TRANS(vslti_h, do_vslti_s, MO_16)
+TRANS(vslti_w, do_vslti_s, MO_32)
+TRANS(vslti_d, do_vslti_s, MO_64)
+TRANS(vslt_bu, do_cmp, MO_8, TCG_COND_LTU, tcg_gen_gvec_cmp)
+TRANS(vslt_hu, do_cmp, MO_16, TCG_COND_LTU, tcg_gen_gvec_cmp)
+TRANS(vslt_wu, do_cmp, MO_32, TCG_COND_LTU, tcg_gen_gvec_cmp)
+TRANS(vslt_du, do_cmp, MO_64, TCG_COND_LTU, tcg_gen_gvec_cmp)
+TRANS(vslti_bu, do_vslti_u, MO_8)
+TRANS(vslti_hu, do_vslti_u, MO_16)
+TRANS(vslti_wu, do_vslti_u, MO_32)
+TRANS(vslti_du, do_vslti_u, MO_64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 2ef0f73018..a090a7d22b 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1102,3 +1102,46 @@ vffint_d_lu      0111 00101001 11100 00011 ..... .....    @vv
 vffintl_d_w      0111 00101001 11100 00100 ..... .....    @vv
 vffinth_d_w      0111 00101001 11100 00101 ..... .....    @vv
 vffint_s_l       0111 00010100 10000 ..... ..... .....    @vvv
+
+vseq_b           0111 00000000 00000 ..... ..... .....    @vvv
+vseq_h           0111 00000000 00001 ..... ..... .....    @vvv
+vseq_w           0111 00000000 00010 ..... ..... .....    @vvv
+vseq_d           0111 00000000 00011 ..... ..... .....    @vvv
+vseqi_b          0111 00101000 00000 ..... ..... .....    @vv_i5
+vseqi_h          0111 00101000 00001 ..... ..... .....    @vv_i5
+vseqi_w          0111 00101000 00010 ..... ..... .....    @vv_i5
+vseqi_d          0111 00101000 00011 ..... ..... .....    @vv_i5
+
+vsle_b           0111 00000000 00100 ..... ..... .....    @vvv
+vsle_h           0111 00000000 00101 ..... ..... .....    @vvv
+vsle_w           0111 00000000 00110 ..... ..... .....    @vvv
+vsle_d           0111 00000000 00111 ..... ..... .....    @vvv
+vslei_b          0111 00101000 00100 ..... ..... .....    @vv_i5
+vslei_h          0111 00101000 00101 ..... ..... .....    @vv_i5
+vslei_w          0111 00101000 00110 ..... ..... .....    @vv_i5
+vslei_d          0111 00101000 00111 ..... ..... .....    @vv_i5
+vsle_bu          0111 00000000 01000 ..... ..... .....    @vvv
+vsle_hu          0111 00000000 01001 ..... ..... .....    @vvv
+vsle_wu          0111 00000000 01010 ..... ..... .....    @vvv
+vsle_du          0111 00000000 01011 ..... ..... .....    @vvv
+vslei_bu         0111 00101000 01000 ..... ..... .....    @vv_ui5
+vslei_hu         0111 00101000 01001 ..... ..... .....    @vv_ui5
+vslei_wu         0111 00101000 01010 ..... ..... .....    @vv_ui5
+vslei_du         0111 00101000 01011 ..... ..... .....    @vv_ui5
+
+vslt_b           0111 00000000 01100 ..... ..... .....    @vvv
+vslt_h           0111 00000000 01101 ..... ..... .....    @vvv
+vslt_w           0111 00000000 01110 ..... ..... .....    @vvv
+vslt_d           0111 00000000 01111 ..... ..... .....    @vvv
+vslti_b          0111 00101000 01100 ..... ..... .....    @vv_i5
+vslti_h          0111 00101000 01101 ..... ..... .....    @vv_i5
+vslti_w          0111 00101000 01110 ..... ..... .....    @vv_i5
+vslti_d          0111 00101000 01111 ..... ..... .....    @vv_i5
+vslt_bu          0111 00000000 10000 ..... ..... .....    @vvv
+vslt_hu          0111 00000000 10001 ..... ..... .....    @vvv
+vslt_wu          0111 00000000 10010 ..... ..... .....    @vvv
+vslt_du          0111 00000000 10011 ..... ..... .....    @vvv
+vslti_bu         0111 00101000 10000 ..... ..... .....    @vv_ui5
+vslti_hu         0111 00101000 10001 ..... ..... .....    @vv_ui5
+vslti_wu         0111 00101000 10010 ..... ..... .....    @vv_ui5
+vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 0a03971cbe..9ed7afdf6d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2885,3 +2885,39 @@ void HELPER(vffint_s_l)(CPULoongArchState *env,
     Vd->D(0) = temp.D(0);
     Vd->D(1) = temp.D(1);
 }
+
+#define VSEQ(a, b) ((a) == (b) ? -1 : 0)
+#define VSLE(a, b) ((a) <= (b) ? -1 : 0)
+#define VSLT(a, b) ((a) < (b) ? -1 : 0)
+
+#define VCMPI(NAME, BIT, T, E, DO_OP)                           \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
+{                                                               \
+    int i;                                                      \
+    VReg *Vd = (VReg *)vd;                                      \
+    VReg *Vj = (VReg *)vj;                                      \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                         \
+        Vd->E(i) = DO_OP((T)Vj->E(i), (T)imm);                  \
+    }                                                           \
+}
+
+VCMPI(vseqi_b, 8, int8_t, B, VSEQ)
+VCMPI(vseqi_h, 16, int16_t, H, VSEQ)
+VCMPI(vseqi_w, 32, int32_t, W, VSEQ)
+VCMPI(vseqi_d, 64, int64_t, D, VSEQ)
+VCMPI(vslei_b, 8, int8_t, B, VSLE)
+VCMPI(vslei_h, 16, int16_t, H, VSLE)
+VCMPI(vslei_w, 32, int32_t, W, VSLE)
+VCMPI(vslei_d, 64, int64_t, D, VSLE)
+VCMPI(vslei_bu, 8, uint8_t, B, VSLE)
+VCMPI(vslei_hu, 16, uint16_t, H, VSLE)
+VCMPI(vslei_wu, 32, uint32_t, W, VSLE)
+VCMPI(vslei_du, 64, uint64_t, D, VSLE)
+VCMPI(vslti_b, 8, int8_t, B, VSLT)
+VCMPI(vslti_h, 16, int16_t, H, VSLT)
+VCMPI(vslti_w, 32, int32_t, W, VSLT)
+VCMPI(vslti_d, 64, int64_t, D, VSLT)
+VCMPI(vslti_bu, 8, uint8_t, B, VSLT)
+VCMPI(vslti_hu, 16, uint16_t, H, VSLT)
+VCMPI(vslti_wu, 32, uint32_t, W, VSLT)
+VCMPI(vslti_du, 64, uint64_t, D, VSLT)
-- 
2.31.1



* [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (35 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  0:47   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset Song Gao
                   ` (6 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VFCMP.cond.{S/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
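Note (illustrative only, not part of the commit): a minimal standalone
sketch of how the 5-bit fcond field appears to be consumed. Bit 0 selects
the signaling helper and the remaining bits become a relation mask, as in
trans_vfcmp_cond_s() below. The SKETCH_* flag values, sketch_fcmp_flags()
and sketch_relation() are assumptions chosen to agree with the mnemonic
table in disas.c; the real get_fcmp_flags() and FCMP_* constants are
defined outside this patch.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits; the real FCMP_* constants are defined outside this patch. */
enum {
    SKETCH_FCMP_LT = 1 << 0,
    SKETCH_FCMP_EQ = 1 << 1,
    SKETCH_FCMP_UN = 1 << 2,
    SKETCH_FCMP_GT = 1 << 3,
};

/* Outcome of one lane comparison, in the same spirit as FloatRelation. */
typedef enum {
    SKETCH_REL_LESS,
    SKETCH_REL_EQUAL,
    SKETCH_REL_GREATER,
    SKETCH_REL_UNORDERED,
} SketchRelation;

/* Bit 0 of fcond selects the signaling variant (vfcmp_s_*). */
static bool sketch_is_signaling(uint32_t fcond)
{
    assert(fcond < 32);
    return (fcond & 1) != 0;
}

/* Hypothetical stand-in for get_fcmp_flags(); the argument is fcond >> 1. */
static uint32_t sketch_fcmp_flags(uint32_t cond)
{
    uint32_t flags = 0;

    if (cond & 0x1) {
        flags |= SKETCH_FCMP_LT;
    }
    if (cond & 0x2) {
        flags |= SKETCH_FCMP_EQ;
    }
    if (cond & 0x4) {
        flags |= SKETCH_FCMP_UN;
    }
    if (cond & 0x8) {
        flags |= SKETCH_FCMP_GT | SKETCH_FCMP_LT;
    }
    return flags;
}

/* Classify two host floats; a NaN on either side gives "unordered". */
static SketchRelation sketch_relation(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return SKETCH_REL_UNORDERED;
    }
    if (a < b) {
        return SKETCH_REL_LESS;
    }
    if (a > b) {
        return SKETCH_REL_GREATER;
    }
    return SKETCH_REL_EQUAL;
}

/* One result lane: all ones if the relation is selected by flags, else zero. */
static uint32_t sketch_vfcmp_lane(SketchRelation rel, uint32_t flags)
{
    static const uint32_t bit[] = {
        [SKETCH_REL_LESS]      = SKETCH_FCMP_LT,
        [SKETCH_REL_EQUAL]     = SKETCH_FCMP_EQ,
        [SKETCH_REL_GREATER]   = SKETCH_FCMP_GT,
        [SKETCH_REL_UNORDERED] = SKETCH_FCMP_UN,
    };

    return (flags & bit[rel]) ? UINT32_MAX : 0;
}
```

Under these assumptions fcond 0x2 (vfcmp_clt) maps to the LT mask only and
fcond 0x10 (vfcmp_cne) maps to LT|GT, which matches the names printed by
output_vvv_fcond(). The sketch ignores exception flags, which the real
helpers update through vec_update_fcsr0().
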
 target/loongarch/disas.c                    | 94 +++++++++++++++++++++
 target/loongarch/helper.h                   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 32 +++++++
 target/loongarch/insns.decode               |  5 ++
 target/loongarch/lsx_helper.c               | 51 +++++++++++
 5 files changed, 187 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e589b23f4c..64db01d2f9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1447,3 +1447,97 @@ INSN_LSX(vslti_bu,         vv_i)
 INSN_LSX(vslti_hu,         vv_i)
 INSN_LSX(vslti_wu,         vv_i)
 INSN_LSX(vslti_du,         vv_i)
+
+#define output_vfcmp(C, PREFIX, SUFFIX)                                      \
+{                                                                            \
+    (C)->info->fprintf_func((C)->info->stream, "%08x   %s%s\tv%d, v%d, v%d", \
+                            (C)->insn, PREFIX, SUFFIX, a->vd,                \
+                            a->vj, a->vk);                                   \
+}
+
+static bool output_vvv_fcond(DisasContext *ctx, arg_vvv_fcond *a,
+                             const char *suffix)
+{
+    bool ret = true;
+    switch (a->fcond) {
+    case 0x0:
+        output_vfcmp(ctx, "vfcmp_caf_", suffix);
+        break;
+    case 0x1:
+        output_vfcmp(ctx, "vfcmp_saf_", suffix);
+        break;
+    case 0x2:
+        output_vfcmp(ctx, "vfcmp_clt_", suffix);
+        break;
+    case 0x3:
+        output_vfcmp(ctx, "vfcmp_slt_", suffix);
+        break;
+    case 0x4:
+        output_vfcmp(ctx, "vfcmp_ceq_", suffix);
+        break;
+    case 0x5:
+        output_vfcmp(ctx, "vfcmp_seq_", suffix);
+        break;
+    case 0x6:
+        output_vfcmp(ctx, "vfcmp_cle_", suffix);
+        break;
+    case 0x7:
+        output_vfcmp(ctx, "vfcmp_sle_", suffix);
+        break;
+    case 0x8:
+        output_vfcmp(ctx, "vfcmp_cun_", suffix);
+        break;
+    case 0x9:
+        output_vfcmp(ctx, "vfcmp_sun_", suffix);
+        break;
+    case 0xA:
+        output_vfcmp(ctx, "vfcmp_cult_", suffix);
+        break;
+    case 0xB:
+        output_vfcmp(ctx, "vfcmp_sult_", suffix);
+        break;
+    case 0xC:
+        output_vfcmp(ctx, "vfcmp_cueq_", suffix);
+        break;
+    case 0xD:
+        output_vfcmp(ctx, "vfcmp_sueq_", suffix);
+        break;
+    case 0xE:
+        output_vfcmp(ctx, "vfcmp_cule_", suffix);
+        break;
+    case 0xF:
+        output_vfcmp(ctx, "vfcmp_sule_", suffix);
+        break;
+    case 0x10:
+        output_vfcmp(ctx, "vfcmp_cne_", suffix);
+        break;
+    case 0x11:
+        output_vfcmp(ctx, "vfcmp_sne_", suffix);
+        break;
+    case 0x14:
+        output_vfcmp(ctx, "vfcmp_cor_", suffix);
+        break;
+    case 0x15:
+        output_vfcmp(ctx, "vfcmp_sor_", suffix);
+        break;
+    case 0x18:
+        output_vfcmp(ctx, "vfcmp_cune_", suffix);
+        break;
+    case 0x19:
+        output_vfcmp(ctx, "vfcmp_sune_", suffix);
+        break;
+    default:
+        ret = false;
+    }
+    return ret;
+}
+
+#define LSX_FCMP_INSN(suffix)                            \
+static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
+                                     arg_vvv_fcond * a)  \
+{                                                        \
+    return output_vvv_fcond(ctx, a, #suffix);            \
+}
+
+LSX_FCMP_INSN(s)
+LSX_FCMP_INSN(d)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 25ea9b633d..ef0b67349d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -650,3 +650,8 @@ DEF_HELPER_FLAGS_4(vslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7368731424..593b8b481d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3131,3 +3131,35 @@ TRANS(vslti_bu, do_vslti_u, MO_8)
 TRANS(vslti_hu, do_vslti_u, MO_16)
 TRANS(vslti_wu, do_vslti_u, MO_32)
 TRANS(vslti_du, do_vslti_u, MO_64)
+
+static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
+{
+    uint32_t flags;
+    void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    CHECK_SXE;
+
+    fn = (a->fcond & 1 ? gen_helper_vfcmp_s_s : gen_helper_vfcmp_c_s);
+    flags = get_fcmp_flags(a->fcond >> 1);
+    fn(cpu_env, vd, vj, vk, tcg_constant_i32(flags));
+
+    return true;
+}
+
+static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
+{
+    uint32_t flags;
+    void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+    fn = (a->fcond & 1 ? gen_helper_vfcmp_s_d : gen_helper_vfcmp_c_d);
+    flags = get_fcmp_flags(a->fcond >> 1);
+    fn(cpu_env, vd, vj, vk, tcg_constant_i32(flags));
+
+    return true;
+}
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a090a7d22b..d018b110cd 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -494,6 +494,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vvv          vd vj vk
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
+&vvv_fcond    vd vj vk fcond
 
 #
 # LSX Formats
@@ -508,6 +509,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_ui8              .... ........ .. imm:8 vj:5 vd:5    &vv_i
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 @vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
+@vvv_fcond      .... ........ fcond:5  vk:5 vj:5 vd:5    &vvv_fcond
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1145,3 +1147,6 @@ vslti_bu         0111 00101000 10000 ..... ..... .....    @vv_ui5
 vslti_hu         0111 00101000 10001 ..... ..... .....    @vv_ui5
 vslti_wu         0111 00101000 10010 ..... ..... .....    @vv_ui5
 vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
+
+vfcmp_cond_s     0000 11000101 ..... ..... ..... .....    @vvv_fcond
+vfcmp_cond_d     0000 11000110 ..... ..... ..... .....    @vvv_fcond
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9ed7afdf6d..51b784e885 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2921,3 +2921,54 @@ VCMPI(vslti_bu, 8, uint8_t, B, VSLT)
 VCMPI(vslti_hu, 16, uint16_t, H, VSLT)
 VCMPI(vslti_wu, 32, uint32_t, W, VSLT)
 VCMPI(vslti_du, 64, uint64_t, D, VSLT)
+
+static uint64_t vfcmp_common(CPULoongArchState *env,
+                             FloatRelation cmp, uint32_t flags)
+{
+    bool ret;
+
+    switch (cmp) {
+    case float_relation_less:
+        ret = (flags & FCMP_LT);
+        break;
+    case float_relation_equal:
+        ret = (flags & FCMP_EQ);
+        break;
+    case float_relation_greater:
+        ret = (flags & FCMP_GT);
+        break;
+    case float_relation_unordered:
+        ret = (flags & FCMP_UN);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    return ret;
+}
+
+#define VFCMP(NAME, BIT, T, E, FN)                                       \
+void HELPER(NAME)(CPULoongArchState *env,                                \
+                  uint32_t vd, uint32_t vj, uint32_t vk, uint32_t flags) \
+{                                                                        \
+    int i;                                                               \
+    VReg t;                                                              \
+    VReg *Vd = &(env->fpr[vd].vreg);                                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                                     \
+                                                                         \
+    vec_clear_cause(env);                                                \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                                  \
+        FloatRelation cmp;                                               \
+        cmp = FN(Vj->E(i), Vk->E(i), &env->fp_status);                   \
+        t.E(i) = (vfcmp_common(env, cmp, flags)) ? -1 : 0;               \
+        vec_update_fcsr0(env, GETPC());                                  \
+    }                                                                    \
+    Vd->D(0) = t.D(0);                                                   \
+    Vd->D(1) = t.D(1);                                                   \
+}
+
+VFCMP(vfcmp_c_s, 32, uint32_t, W, float32_compare_quiet)
+VFCMP(vfcmp_s_s, 32, uint32_t, W, float32_compare)
+VFCMP(vfcmp_c_d, 64, uint64_t, D, float64_compare_quiet)
+VFCMP(vfcmp_s_d, 64, uint64_t, D, float64_compare)
-- 
2.31.1



* [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (36 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  1:03   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
                   ` (5 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VBITSEL.V;
- VBITSELI.B;
- VSET{EQZ/NEZ}.V;
- VSETANYEQZ.{B/H/W/D};
- VSETALLNEZ.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
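Note (illustrative only, not part of the commit): a minimal scalar sketch
of the semantics implemented below, written against plain byte arrays
instead of VReg. The sketch_* names are hypothetical. The select rule
follows the vbitseli_b helper in this patch, where the old destination
value acts as the mask; the predicates follow the SETANYEQZ and SETALLNEZ
macros for the byte case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LSX_BYTES 16

/* Bitwise select: take bits of k where mask is 1, bits of j elsewhere. */
static uint8_t sketch_bitsel(uint8_t j, uint8_t k, uint8_t mask)
{
    return (uint8_t)((k & mask) | (j & ~mask));
}

/* VBITSELI.B model: the old vd is the mask, the immediate is the "k" input. */
static void sketch_vbitseli_b(uint8_t vd[SKETCH_LSX_BYTES],
                              const uint8_t vj[SKETCH_LSX_BYTES],
                              uint8_t imm)
{
    assert(vd != NULL && vj != NULL);
    for (size_t i = 0; i < SKETCH_LSX_BYTES; i++) {
        vd[i] = sketch_bitsel(vj[i], imm, vd[i]);
    }
}

/* VSETANYEQZ.B model: true if at least one byte lane is zero. */
static bool sketch_any_eqz_b(const uint8_t v[SKETCH_LSX_BYTES])
{
    bool ret = false;

    for (size_t i = 0; i < SKETCH_LSX_BYTES; i++) {
        ret |= (v[i] == 0);
    }
    return ret;
}

/* VSETALLNEZ.B model: true only if every byte lane is non-zero. */
static bool sketch_all_nez_b(const uint8_t v[SKETCH_LSX_BYTES])
{
    bool ret = true;

    for (size_t i = 0; i < SKETCH_LSX_BYTES; i++) {
        ret &= (v[i] != 0);
    }
    return ret;
}
```

For any input the two predicates are complements of each other, so the
second loop could be expressed as the negation of the first. They are kept
separate here only to mirror the two macros in the patch.
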
 target/loongarch/disas.c                    | 20 +++++++
 target/loongarch/helper.h                   | 13 +++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 58 +++++++++++++++++++++
 target/loongarch/insns.decode               | 17 ++++++
 target/loongarch/lsx_helper.c               | 57 ++++++++++++++++++++
 5 files changed, 165 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 64db01d2f9..ecf0c7b577 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -792,6 +792,12 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
     return true;                                            \
 }
 
+static void output_cv(DisasContext *ctx, arg_cv *a,
+                      const char *mnemonic)
+{
+    output(ctx, mnemonic, "fcc%d, v%d", a->cd, a->vj);
+}
+
 static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
 {
     output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
@@ -1541,3 +1547,17 @@ static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
 
 LSX_FCMP_INSN(s)
 LSX_FCMP_INSN(d)
+
+INSN_LSX(vbitsel_v,        vvvv)
+INSN_LSX(vbitseli_b,       vv_i)
+
+INSN_LSX(vseteqz_v,        cv)
+INSN_LSX(vsetnez_v,        cv)
+INSN_LSX(vsetanyeqz_b,     cv)
+INSN_LSX(vsetanyeqz_h,     cv)
+INSN_LSX(vsetanyeqz_w,     cv)
+INSN_LSX(vsetanyeqz_d,     cv)
+INSN_LSX(vsetallnez_b,     cv)
+INSN_LSX(vsetallnez_h,     cv)
+INSN_LSX(vsetallnez_w,     cv)
+INSN_LSX(vsetallnez_d,     cv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ef0b67349d..cdc007a072 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -655,3 +655,16 @@ DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(vseteqz_v, void, env, i32, i32)
+DEF_HELPER_3(vsetnez_v, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_b, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_h, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_w, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_d, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 593b8b481d..7fc5c6c1d6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -65,6 +65,17 @@ static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
     return true;
 }
 
+static bool gen_cv(DisasContext *ctx, arg_cv *a,
+                   void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+    TCGv_i32 cd = tcg_constant_i32(a->cd);
+
+    CHECK_SXE;
+    func(cpu_env, cd, vj);
+    return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                      void (*func)(unsigned, uint32_t, uint32_t,
                                   uint32_t, uint32_t, uint32_t))
@@ -3163,3 +3174,50 @@ static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
 
     return true;
 }
+
+static bool trans_vbitsel_v(DisasContext *ctx, arg_vvvv *a)
+{
+    CHECK_SXE;
+
+    tcg_gen_gvec_bitsel(MO_64, vreg_full_offset(a->vd), vreg_full_offset(a->va),
+                        vreg_full_offset(a->vk), vreg_full_offset(a->vj),
+                        16, 16);
+    return true;
+}
+
+static void gen_vbitseli(unsigned vece, TCGv_vec a, TCGv_vec b, int64_t imm)
+{
+    TCGv_vec t;
+
+    t = tcg_temp_new_vec_matching(a);
+    tcg_gen_dupi_vec(vece, t, imm);
+    tcg_gen_bitsel_vec(vece, a, a, t, b);
+}
+
+static bool trans_vbitseli_b(DisasContext *ctx, arg_vv_i *a)
+{
+    static const GVecGen2i op = {
+       .fniv = gen_vbitseli,
+       .fnoi = gen_helper_vbitseli_b,
+       .vece = MO_8,
+       .load_dest = true
+    };
+
+    CHECK_SXE;
+
+    tcg_gen_gvec_2i(vreg_full_offset(a->vd), vreg_full_offset(a->vj),
+                    16, 16, a->imm, &op);
+    return true;
+}
+
+
+TRANS(vseteqz_v, gen_cv, gen_helper_vseteqz_v)
+TRANS(vsetnez_v, gen_cv, gen_helper_vsetnez_v)
+TRANS(vsetanyeqz_b, gen_cv, gen_helper_vsetanyeqz_b)
+TRANS(vsetanyeqz_h, gen_cv, gen_helper_vsetanyeqz_h)
+TRANS(vsetanyeqz_w, gen_cv, gen_helper_vsetanyeqz_w)
+TRANS(vsetanyeqz_d, gen_cv, gen_helper_vsetanyeqz_d)
+TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
+TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
+TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
+TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d018b110cd..d8feeadc41 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -491,6 +491,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 #
 
 &vv           vd vj
+&cv           cd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
@@ -500,6 +501,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 # LSX Formats
 #
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
+@cv            .... ........ ..... ..... vj:5 .. cd:3    &cv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
 @vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
@@ -1150,3 +1152,18 @@ vslti_du         0111 00101000 10011 ..... ..... .....    @vv_ui5
 
 vfcmp_cond_s     0000 11000101 ..... ..... ..... .....    @vvv_fcond
 vfcmp_cond_d     0000 11000110 ..... ..... ..... .....    @vvv_fcond
+
+vbitsel_v        0000 11010001 ..... ..... ..... .....    @vvvv
+
+vbitseli_b       0111 00111100 01 ........ ..... .....    @vv_ui8
+
+vseteqz_v        0111 00101001 11001 00110 ..... 00 ...   @cv
+vsetnez_v        0111 00101001 11001 00111 ..... 00 ...   @cv
+vsetanyeqz_b     0111 00101001 11001 01000 ..... 00 ...   @cv
+vsetanyeqz_h     0111 00101001 11001 01001 ..... 00 ...   @cv
+vsetanyeqz_w     0111 00101001 11001 01010 ..... 00 ...   @cv
+vsetanyeqz_d     0111 00101001 11001 01011 ..... 00 ...   @cv
+vsetallnez_b     0111 00101001 11001 01100 ..... 00 ...   @cv
+vsetallnez_h     0111 00101001 11001 01101 ..... 00 ...   @cv
+vsetallnez_w     0111 00101001 11001 01110 ..... 00 ...   @cv
+vsetallnez_d     0111 00101001 11001 01111 ..... 00 ...   @cv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 51b784e885..996312d9b2 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2972,3 +2972,60 @@ VFCMP(vfcmp_c_s, 32, uint32_t, W, float32_compare_quiet)
 VFCMP(vfcmp_s_s, 32, uint32_t, W, float32_compare)
 VFCMP(vfcmp_c_d, 64, uint64_t, D, float64_compare_quiet)
 VFCMP(vfcmp_s_d, 64, uint64_t, D, float64_compare)
+
+void HELPER(vbitseli_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
+{
+    int i;
+    VReg *Vd = (VReg *)vd;
+    VReg *Vj = (VReg *)vj;
+
+    for (i = 0; i < 16; i++) {
+        Vd->B(i) = (~Vd->B(i) & Vj->B(i)) | (Vd->B(i) & imm);
+    }
+}
+
+void HELPER(vseteqz_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+    VReg *Vj = &(env->fpr[vj].vreg);
+    env->cf[cd & 0x7] = (Vj->Q(0) == 0);
+}
+
+void HELPER(vsetnez_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+    VReg *Vj = &(env->fpr[vj].vreg);
+    env->cf[cd & 0x7] = (Vj->Q(0) != 0);
+}
+
+#define SETANYEQZ(NAME, BIT, E)                                     \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    bool ret = false;                                               \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        ret |= (Vj->E(i) == 0);                                     \
+    }                                                               \
+    env->cf[cd & 0x7] = ret;                                        \
+}
+SETANYEQZ(vsetanyeqz_b, 8, B)
+SETANYEQZ(vsetanyeqz_h, 16, H)
+SETANYEQZ(vsetanyeqz_w, 32, W)
+SETANYEQZ(vsetanyeqz_d, 64, D)
+
+#define SETALLNEZ(NAME, BIT, E)                                     \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    bool ret = true;                                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
+        ret &= (Vj->E(i) != 0);                                     \
+    }                                                               \
+    env->cf[cd & 0x7] = ret;                                        \
+}
+SETALLNEZ(vsetallnez_b, 8, B)
+SETALLNEZ(vsetallnez_h, 16, H)
+SETALLNEZ(vsetallnez_w, 32, W)
+SETALLNEZ(vsetallnez_d, 64, D)
-- 
2.31.1



* [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (37 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  1:09   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick Song Gao
                   ` (4 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VINSGR2VR.{B/H/W/D};
- VPICKVE2GR.{B/H/W/D}[U];
- VREPLGR2VR.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
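Note (illustrative only, not part of the commit): a minimal sketch of the
halfword variants, assuming a simplified register model. SketchVReg and
the sketch_* functions are hypothetical. The real code emits TCG loads and
stores against CPULoongArchState and uses the H() accessor, which also
hides host endianness; plain array indices are used here instead.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 128-bit register viewed as eight 16-bit lanes. */
typedef struct {
    uint16_t h[8];
} SketchVReg;

/* VINSGR2VR.H model: store the low 16 bits of a GPR into one lane. */
static void sketch_vinsgr2vr_h(SketchVReg *vd, uint64_t rj, unsigned idx)
{
    assert(idx < 8);
    vd->h[idx] = (uint16_t)rj;
}

/* VPICKVE2GR.H model: read one lane, sign-extended to 64 bits. */
static uint64_t sketch_vpickve2gr_h(const SketchVReg *vj, unsigned idx)
{
    assert(idx < 8);
    /* Flip the sign bit, then subtract it back to sign-extend portably. */
    int64_t s = (int64_t)(vj->h[idx] ^ 0x8000) - 0x8000;
    return (uint64_t)s;
}

/* VPICKVE2GR.HU model: read one lane, zero-extended to 64 bits. */
static uint64_t sketch_vpickve2gr_hu(const SketchVReg *vj, unsigned idx)
{
    assert(idx < 8);
    return vj->h[idx];
}

/* VREPLGR2VR.H model: copy the low 16 bits of a GPR into every lane. */
static void sketch_vreplgr2vr_h(SketchVReg *vd, uint64_t rj)
{
    for (unsigned i = 0; i < 8; i++) {
        vd->h[i] = (uint16_t)rj;
    }
}
```

The signed and unsigned picks differ only in how the lane is widened,
which corresponds to the ld16s/ld16u choice in the translator functions
below.
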
 target/loongarch/disas.c                    |  33 ++++++
 target/loongarch/insn_trans/trans_lsx.c.inc | 110 ++++++++++++++++++++
 target/loongarch/insns.decode               |  30 ++++++
 3 files changed, 173 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ecf0c7b577..7255a2aa4f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -818,6 +818,21 @@ static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
 }
 
+static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
+}
+
+static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj, a->imm);
+}
+
+static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1561,3 +1576,21 @@ INSN_LSX(vsetallnez_b,     cv)
 INSN_LSX(vsetallnez_h,     cv)
 INSN_LSX(vsetallnez_w,     cv)
 INSN_LSX(vsetallnez_d,     cv)
+
+INSN_LSX(vinsgr2vr_b,      vr_i)
+INSN_LSX(vinsgr2vr_h,      vr_i)
+INSN_LSX(vinsgr2vr_w,      vr_i)
+INSN_LSX(vinsgr2vr_d,      vr_i)
+INSN_LSX(vpickve2gr_b,     rv_i)
+INSN_LSX(vpickve2gr_h,     rv_i)
+INSN_LSX(vpickve2gr_w,     rv_i)
+INSN_LSX(vpickve2gr_d,     rv_i)
+INSN_LSX(vpickve2gr_bu,    rv_i)
+INSN_LSX(vpickve2gr_hu,    rv_i)
+INSN_LSX(vpickve2gr_wu,    rv_i)
+INSN_LSX(vpickve2gr_du,    rv_i)
+
+INSN_LSX(vreplgr2vr_b,     vr)
+INSN_LSX(vreplgr2vr_h,     vr)
+INSN_LSX(vreplgr2vr_w,     vr)
+INSN_LSX(vreplgr2vr_d,     vr)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7fc5c6c1d6..b2489537ef 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3221,3 +3221,113 @@ TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
 TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
 TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
 TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
+
+static bool trans_vinsgr2vr_b(DisasContext *ctx, arg_vr_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_st8_i64(cpu_gpr[a->rj], cpu_env,
+                    offsetof(CPULoongArchState, fpr[a->vd].vreg.B(a->imm)));
+    return true;
+}
+
+static bool trans_vinsgr2vr_h(DisasContext *ctx, arg_vr_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_st16_i64(cpu_gpr[a->rj], cpu_env,
+                     offsetof(CPULoongArchState, fpr[a->vd].vreg.H(a->imm)));
+    return true;
+}
+
+static bool trans_vinsgr2vr_w(DisasContext *ctx, arg_vr_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_st32_i64(cpu_gpr[a->rj], cpu_env,
+                     offsetof(CPULoongArchState, fpr[a->vd].vreg.W(a->imm)));
+    return true;
+}
+
+static bool trans_vinsgr2vr_d(DisasContext *ctx, arg_vr_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_st_i64(cpu_gpr[a->rj], cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_b(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld8s_i64(cpu_gpr[a->rd], cpu_env,
+                     offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_h(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld16s_i64(cpu_gpr[a->rd], cpu_env,
+                      offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_w(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld32s_i64(cpu_gpr[a->rd], cpu_env,
+                      offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_d(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld_i64(cpu_gpr[a->rd], cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_bu(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld8u_i64(cpu_gpr[a->rd], cpu_env,
+                     offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_hu(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld16u_i64(cpu_gpr[a->rd], cpu_env,
+                      offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_wu(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld32u_i64(cpu_gpr[a->rd], cpu_env,
+                      offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
+    return true;
+}
+
+static bool trans_vpickve2gr_du(DisasContext *ctx, arg_rv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_ld_i64(cpu_gpr[a->rd], cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(a->imm)));
+    return true;
+}
+
+static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
+{
+    CHECK_SXE;
+
+    tcg_gen_gvec_dup_i64(mop, vreg_full_offset(a->vd),
+                         16, 16, cpu_gpr[a->rj]);
+    return true;
+}
+
+TRANS(vreplgr2vr_b, gvec_dup, MO_8)
+TRANS(vreplgr2vr_h, gvec_dup, MO_16)
+TRANS(vreplgr2vr_w, gvec_dup, MO_32)
+TRANS(vreplgr2vr_d, gvec_dup, MO_64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d8feeadc41..d1d255ab82 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -496,6 +496,9 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vv_i         vd vj imm
 &vvvv         vd vj vk va
 &vvv_fcond    vd vj vk fcond
+&vr_i         vd rj imm
+&rv_i         rd vj imm
+&vr           vd rj
 
 #
 # LSX Formats
@@ -512,6 +515,15 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv_i5           .... ........ ..... imm:s5 vj:5 vd:5    &vv_i
 @vvvv               .... ........ va:5 vk:5 vj:5 vd:5    &vvvv
 @vvv_fcond      .... ........ fcond:5  vk:5 vj:5 vd:5    &vvv_fcond
+@vr_ui4         .... ........ ..... . imm:4 rj:5 vd:5    &vr_i
+@vr_ui3        .... ........ ..... .. imm:3 rj:5 vd:5    &vr_i
+@vr_ui2       .... ........ ..... ... imm:2 rj:5 vd:5    &vr_i
+@vr_ui1      .... ........ ..... .... imm:1 rj:5 vd:5    &vr_i
+@rv_ui4         .... ........ ..... . imm:4 vj:5 rd:5    &rv_i
+@rv_ui3        .... ........ ..... .. imm:3 vj:5 rd:5    &rv_i
+@rv_ui2       .... ........ ..... ... imm:2 vj:5 rd:5    &rv_i
+@rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
+@vr               .... ........ ..... ..... rj:5 vd:5    &vr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1167,3 +1179,21 @@ vsetallnez_b     0111 00101001 11001 01100 ..... 00 ...   @cv
 vsetallnez_h     0111 00101001 11001 01101 ..... 00 ...   @cv
 vsetallnez_w     0111 00101001 11001 01110 ..... 00 ...   @cv
 vsetallnez_d     0111 00101001 11001 01111 ..... 00 ...   @cv
+
+vinsgr2vr_b      0111 00101110 10111 0 .... ..... .....   @vr_ui4
+vinsgr2vr_h      0111 00101110 10111 10 ... ..... .....   @vr_ui3
+vinsgr2vr_w      0111 00101110 10111 110 .. ..... .....   @vr_ui2
+vinsgr2vr_d      0111 00101110 10111 1110 . ..... .....   @vr_ui1
+vpickve2gr_b     0111 00101110 11111 0 .... ..... .....   @rv_ui4
+vpickve2gr_h     0111 00101110 11111 10 ... ..... .....   @rv_ui3
+vpickve2gr_w     0111 00101110 11111 110 .. ..... .....   @rv_ui2
+vpickve2gr_d     0111 00101110 11111 1110 . ..... .....   @rv_ui1
+vpickve2gr_bu    0111 00101111 00111 0 .... ..... .....   @rv_ui4
+vpickve2gr_hu    0111 00101111 00111 10 ... ..... .....   @rv_ui3
+vpickve2gr_wu    0111 00101111 00111 110 .. ..... .....   @rv_ui2
+vpickve2gr_du    0111 00101111 00111 1110 . ..... .....   @rv_ui1
+
+vreplgr2vr_b     0111 00101001 11110 00000 ..... .....    @vr
+vreplgr2vr_h     0111 00101001 11110 00001 ..... .....    @vr
+vreplgr2vr_w     0111 00101001 11110 00010 ..... .....    @vr
+vreplgr2vr_d     0111 00101001 11110 00011 ..... .....    @vr
-- 
2.31.1



* [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (38 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  1:17   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
                   ` (3 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VREPLVE[I].{B/H/W/D};
- VBSLL.V, VBSRL.V;
- VPACK{EV/OD}.{B/H/W/D};
- VPICK{EV/OD}.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
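Note on VBSLL.V/VBSRL.V: the translation below shifts the 128-bit
register by whole bytes using two 64-bit halves. The following is only a
host-side sketch of that arithmetic in plain C, not the TCG code itself.
The names vec128, extract2(), bsll() and bsrl() are made up for this
illustration; extract2() is assumed to behave like tcg_gen_extract2_i64,
i.e. it returns 64 bits of the value hi:lo starting at bit pos.

```c
#include <stdint.h>

/* Hypothetical 128-bit value: d[0] is the low half, d[1] the high half. */
typedef struct { uint64_t d[2]; } vec128;

/* 64 bits of hi:lo starting at bit pos (0..64); pos 0 and 64 are special-cased
 * because a 64-bit shift by 64 is undefined in C. */
uint64_t extract2(uint64_t lo, uint64_t hi, unsigned pos)
{
    if (pos == 0) {
        return lo;
    }
    if (pos == 64) {
        return hi;
    }
    return (lo >> pos) | (hi << (64 - pos));
}

/* Byte shift left: imm & 0xf bytes, zero filled from the low end. */
vec128 bsll(vec128 v, unsigned imm)
{
    unsigned ofs = (imm & 0xf) * 8;
    vec128 r;

    if (ofs < 64) {
        r.d[1] = extract2(v.d[0], v.d[1], 64 - ofs);
        r.d[0] = v.d[0] << ofs;
    } else {
        r.d[1] = v.d[0] << (ofs - 64);
        r.d[0] = 0;
    }
    return r;
}

/* Byte shift right: imm & 0xf bytes, zero filled from the high end. */
vec128 bsrl(vec128 v, unsigned imm)
{
    unsigned ofs = (imm & 0xf) * 8;
    vec128 r;

    if (ofs < 64) {
        r.d[0] = extract2(v.d[0], v.d[1], ofs);
        r.d[1] = v.d[1] >> ofs;
    } else {
        r.d[0] = v.d[1] >> (ofs - 64);
        r.d[1] = 0;
    }
    return r;
}
```

With ofs == 0 the left shift asks extract2() for position 64, which
simply returns the high half, so a zero shift leaves the value unchanged.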
 target/loongarch/disas.c                    |  35 +++++
 target/loongarch/helper.h                   |  18 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 154 ++++++++++++++++++++
 target/loongarch/insns.decode               |  34 +++++
 target/loongarch/lsx_helper.c               |  92 ++++++++++++
 5 files changed, 333 insertions(+)
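
Note on VPACKEV/VPICKEV: a small model of the byte variants, written
against plain arrays instead of VReg. It is a sketch that mirrors the
loops in the VPACKEV and VPICKEV helper macros below; packev_b(),
pickev_b() and NELEM are names invented here, and element 0 is assumed
to be the least significant byte.

```c
#include <stdint.h>
#include <string.h>

#define NELEM 16 /* byte elements in one 128-bit register */

/* Interleave the even elements: vk supplies even slots, vj odd slots. */
void packev_b(uint8_t vd[NELEM], const uint8_t vj[NELEM],
              const uint8_t vk[NELEM])
{
    uint8_t tmp[NELEM]; /* vd may alias vj or vk, so build the result aside */

    for (int i = 0; i < NELEM / 2; i++) {
        tmp[2 * i + 1] = vj[2 * i];
        tmp[2 * i] = vk[2 * i];
    }
    memcpy(vd, tmp, NELEM);
}

/* Gather the even elements: vk fills the low half, vj the high half. */
void pickev_b(uint8_t vd[NELEM], const uint8_t vj[NELEM],
              const uint8_t vk[NELEM])
{
    uint8_t tmp[NELEM];

    for (int i = 0; i < NELEM / 2; i++) {
        tmp[i + NELEM / 2] = vj[2 * i];
        tmp[i] = vk[2 * i];
    }
    memcpy(vd, tmp, NELEM);
}
```

The odd variants (VPACKOD/VPICKOD) differ only in reading index
2 * i + 1 from both sources.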

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 7255a2aa4f..c6cf782725 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -833,6 +833,11 @@ static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
 }
 
+static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1594,3 +1599,33 @@ INSN_LSX(vreplgr2vr_b,     vr)
 INSN_LSX(vreplgr2vr_h,     vr)
 INSN_LSX(vreplgr2vr_w,     vr)
 INSN_LSX(vreplgr2vr_d,     vr)
+
+INSN_LSX(vreplve_b,        vvr)
+INSN_LSX(vreplve_h,        vvr)
+INSN_LSX(vreplve_w,        vvr)
+INSN_LSX(vreplve_d,        vvr)
+INSN_LSX(vreplvei_b,       vv_i)
+INSN_LSX(vreplvei_h,       vv_i)
+INSN_LSX(vreplvei_w,       vv_i)
+INSN_LSX(vreplvei_d,       vv_i)
+
+INSN_LSX(vbsll_v,          vv_i)
+INSN_LSX(vbsrl_v,          vv_i)
+
+INSN_LSX(vpackev_b,        vvv)
+INSN_LSX(vpackev_h,        vvv)
+INSN_LSX(vpackev_w,        vvv)
+INSN_LSX(vpackev_d,        vvv)
+INSN_LSX(vpackod_b,        vvv)
+INSN_LSX(vpackod_h,        vvv)
+INSN_LSX(vpackod_w,        vvv)
+INSN_LSX(vpackod_d,        vvv)
+
+INSN_LSX(vpickev_b,        vvv)
+INSN_LSX(vpickev_h,        vvv)
+INSN_LSX(vpickev_w,        vvv)
+INSN_LSX(vpickev_d,        vvv)
+INSN_LSX(vpickod_b,        vvv)
+INSN_LSX(vpickod_h,        vvv)
+INSN_LSX(vpickod_w,        vvv)
+INSN_LSX(vpickod_d,        vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index cdc007a072..bf03a16afd 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -668,3 +668,21 @@ DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
+
+DEF_HELPER_4(vpackev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpickev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index b2489537ef..66cb67a19c 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3331,3 +3331,157 @@ TRANS(vreplgr2vr_b, gvec_dup, MO_8)
 TRANS(vreplgr2vr_h, gvec_dup, MO_16)
 TRANS(vreplgr2vr_w, gvec_dup, MO_32)
 TRANS(vreplgr2vr_d, gvec_dup, MO_64)
+
+static bool trans_vreplvei_b(DisasContext *ctx, arg_vv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_gvec_dup_mem(MO_8, vreg_full_offset(a->vd),
+                         offsetof(CPULoongArchState,
+                                  fpr[a->vj].vreg.B((a->imm))),
+                         16, 16);
+    return true;
+}
+
+static bool trans_vreplvei_h(DisasContext *ctx, arg_vv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_gvec_dup_mem(MO_16, vreg_full_offset(a->vd),
+                         offsetof(CPULoongArchState,
+                                  fpr[a->vj].vreg.H((a->imm))),
+                         16, 16);
+    return true;
+}
+static bool trans_vreplvei_w(DisasContext *ctx, arg_vv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_gvec_dup_mem(MO_32, vreg_full_offset(a->vd),
+                         offsetof(CPULoongArchState,
+                                  fpr[a->vj].vreg.W((a->imm))),
+                         16, 16);
+    return true;
+}
+static bool trans_vreplvei_d(DisasContext *ctx, arg_vv_i *a)
+{
+    CHECK_SXE;
+    tcg_gen_gvec_dup_mem(MO_64, vreg_full_offset(a->vd),
+                         offsetof(CPULoongArchState,
+                                  fpr[a->vj].vreg.D((a->imm))),
+                         16, 16);
+    return true;
+}
+
+static bool gen_vreplve(DisasContext *ctx, arg_vvr *a, int vece, int bit,
+                        void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_ptr t1 = tcg_temp_new_ptr();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    CHECK_SXE;
+
+    tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), (LSX_LEN/bit) - 1);
+    tcg_gen_shli_i64(t0, t0, vece);
+    if (HOST_BIG_ENDIAN) {
+        tcg_gen_xori_i64(t0, t0, vece << ((LSX_LEN/bit) - 1));
+    }
+
+    tcg_gen_trunc_i64_ptr(t1, t0);
+    tcg_gen_add_ptr(t1, t1, cpu_env);
+    func(t2, t1, vreg_full_offset(a->vj));
+    tcg_gen_gvec_dup_i64(vece, vreg_full_offset(a->vd), 16, 16, t2);
+
+    return true;
+}
+
+TRANS(vreplve_b, gen_vreplve, MO_8,  8, tcg_gen_ld8u_i64)
+TRANS(vreplve_h, gen_vreplve, MO_16, 16, tcg_gen_ld16u_i64)
+TRANS(vreplve_w, gen_vreplve, MO_32, 32, tcg_gen_ld32u_i64)
+TRANS(vreplve_d, gen_vreplve, MO_64, 64, tcg_gen_ld_i64)
+
+static bool trans_vbsll_v(DisasContext *ctx, arg_vv_i *a)
+{
+    int ofs;
+    TCGv_i64 desthigh, destlow, high, low, t;
+
+    CHECK_SXE;
+
+    desthigh = tcg_temp_new_i64();
+    destlow = tcg_temp_new_i64();
+    high = tcg_temp_new_i64();
+    low = tcg_temp_new_i64();
+    t = tcg_constant_i64(0);
+
+    tcg_gen_ld_i64(high, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(1)));
+    tcg_gen_ld_i64(low, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(0)));
+
+    ofs = ((a->imm) & 0xf) * 8;
+    if (ofs < 64) {
+        tcg_gen_extract2_i64(desthigh, low, high, 64 - ofs);
+        tcg_gen_shli_i64(destlow, low, ofs);
+    } else {
+        tcg_gen_shli_i64(desthigh, low, ofs - 64);
+        tcg_gen_mov_i64(destlow, t);
+    }
+
+    tcg_gen_st_i64(desthigh, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(1)));
+    tcg_gen_st_i64(destlow, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(0)));
+
+    return true;
+}
+
+static bool trans_vbsrl_v(DisasContext *ctx, arg_vv_i *a)
+{
+    TCGv_i64 desthigh, destlow, high, low, t;
+    int ofs;
+
+    CHECK_SXE;
+
+    desthigh = tcg_temp_new_i64();
+    destlow = tcg_temp_new_i64();
+    high = tcg_temp_new_i64();
+    low = tcg_temp_new_i64();
+    t = tcg_constant_i64(0);
+
+    tcg_gen_ld_i64(high, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(1)));
+    tcg_gen_ld_i64(low, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(0)));
+
+    ofs = ((a->imm) & 0xf) * 8;
+    if (ofs < 64) {
+        tcg_gen_extract2_i64(destlow, low, high, ofs);
+        tcg_gen_shri_i64(desthigh, high, ofs);
+    } else {
+        tcg_gen_shri_i64(destlow, high, ofs - 64);
+        tcg_gen_mov_i64(desthigh, t);
+    }
+
+    tcg_gen_st_i64(desthigh, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(1)));
+    tcg_gen_st_i64(destlow, cpu_env,
+                   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(0)));
+
+    return true;
+}
+
+TRANS(vpackev_b, gen_vvv, gen_helper_vpackev_b)
+TRANS(vpackev_h, gen_vvv, gen_helper_vpackev_h)
+TRANS(vpackev_w, gen_vvv, gen_helper_vpackev_w)
+TRANS(vpackev_d, gen_vvv, gen_helper_vpackev_d)
+TRANS(vpackod_b, gen_vvv, gen_helper_vpackod_b)
+TRANS(vpackod_h, gen_vvv, gen_helper_vpackod_h)
+TRANS(vpackod_w, gen_vvv, gen_helper_vpackod_w)
+TRANS(vpackod_d, gen_vvv, gen_helper_vpackod_d)
+
+TRANS(vpickev_b, gen_vvv, gen_helper_vpickev_b)
+TRANS(vpickev_h, gen_vvv, gen_helper_vpickev_h)
+TRANS(vpickev_w, gen_vvv, gen_helper_vpickev_w)
+TRANS(vpickev_d, gen_vvv, gen_helper_vpickev_d)
+TRANS(vpickod_b, gen_vvv, gen_helper_vpickod_b)
+TRANS(vpickod_h, gen_vvv, gen_helper_vpickod_h)
+TRANS(vpickod_w, gen_vvv, gen_helper_vpickod_w)
+TRANS(vpickod_d, gen_vvv, gen_helper_vpickod_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d1d255ab82..ab9e9e422f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -499,6 +499,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vr_i         vd rj imm
 &rv_i         rd vj imm
 &vr           vd rj
+&vvr          vd vj rk
 
 #
 # LSX Formats
@@ -506,6 +507,8 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vv               .... ........ ..... ..... vj:5 vd:5    &vv
 @cv            .... ........ ..... ..... vj:5 .. cd:3    &cv
 @vvv               .... ........ ..... vk:5 vj:5 vd:5    &vvv
+@vv_ui1      .... ........ ..... .... imm:1 vj:5 vd:5    &vv_i
+@vv_ui2       .... ........ ..... ... imm:2 vj:5 vd:5    &vv_i
 @vv_ui3        .... ........ ..... .. imm:3 vj:5 vd:5    &vv_i
 @vv_ui4         .... ........ ..... . imm:4 vj:5 vd:5    &vv_i
 @vv_ui5           .... ........ ..... imm:5 vj:5 vd:5    &vv_i
@@ -524,6 +527,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @rv_ui2       .... ........ ..... ... imm:2 vj:5 rd:5    &rv_i
 @rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
 @vr               .... ........ ..... ..... rj:5 vd:5    &vr
+@vvr               .... ........ ..... rk:5 vj:5 vd:5    &vvr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1197,3 +1201,33 @@ vreplgr2vr_b     0111 00101001 11110 00000 ..... .....    @vr
 vreplgr2vr_h     0111 00101001 11110 00001 ..... .....    @vr
 vreplgr2vr_w     0111 00101001 11110 00010 ..... .....    @vr
 vreplgr2vr_d     0111 00101001 11110 00011 ..... .....    @vr
+
+vreplve_b        0111 00010010 00100 ..... ..... .....    @vvr
+vreplve_h        0111 00010010 00101 ..... ..... .....    @vvr
+vreplve_w        0111 00010010 00110 ..... ..... .....    @vvr
+vreplve_d        0111 00010010 00111 ..... ..... .....    @vvr
+vreplvei_b       0111 00101111 01111 0 .... ..... .....   @vv_ui4
+vreplvei_h       0111 00101111 01111 10 ... ..... .....   @vv_ui3
+vreplvei_w       0111 00101111 01111 110 .. ..... .....   @vv_ui2
+vreplvei_d       0111 00101111 01111 1110 . ..... .....   @vv_ui1
+
+vbsll_v          0111 00101000 11100 ..... ..... .....    @vv_ui5
+vbsrl_v          0111 00101000 11101 ..... ..... .....    @vv_ui5
+
+vpackev_b        0111 00010001 01100 ..... ..... .....    @vvv
+vpackev_h        0111 00010001 01101 ..... ..... .....    @vvv
+vpackev_w        0111 00010001 01110 ..... ..... .....    @vvv
+vpackev_d        0111 00010001 01111 ..... ..... .....    @vvv
+vpackod_b        0111 00010001 10000 ..... ..... .....    @vvv
+vpackod_h        0111 00010001 10001 ..... ..... .....    @vvv
+vpackod_w        0111 00010001 10010 ..... ..... .....    @vvv
+vpackod_d        0111 00010001 10011 ..... ..... .....    @vvv
+
+vpickev_b        0111 00010001 11100 ..... ..... .....    @vvv
+vpickev_h        0111 00010001 11101 ..... ..... .....    @vvv
+vpickev_w        0111 00010001 11110 ..... ..... .....    @vvv
+vpickev_d        0111 00010001 11111 ..... ..... .....    @vvv
+vpickod_b        0111 00010010 00000 ..... ..... .....    @vvv
+vpickod_h        0111 00010010 00001 ..... ..... .....    @vvv
+vpickod_w        0111 00010010 00010 ..... ..... .....    @vvv
+vpickod_d        0111 00010010 00011 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 996312d9b2..b8e0aa9d3b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3029,3 +3029,95 @@ SETALLNEZ(vsetallnez_b, 8, B)
 SETALLNEZ(vsetallnez_h, 16, H)
 SETALLNEZ(vsetallnez_w, 32, W)
 SETALLNEZ(vsetallnez_d, 64, D)
+
+#define VPACKEV(NAME, BIT, E)                            \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(2 * i + 1) = Vj->E(2 * i);                \
+        temp.E(2 * i) = Vk->E(2 * i);                    \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VPACKEV(vpackev_b, 16, B)
+VPACKEV(vpackev_h, 32, H)
+VPACKEV(vpackev_w, 64, W)
+VPACKEV(vpackev_d, 128, D)
+
+#define VPACKOD(NAME, BIT, E)                            \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(2 * i + 1) = Vj->E(2 * i + 1);            \
+        temp.E(2 * i) = Vk->E(2 * i + 1);                \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VPACKOD(vpackod_b, 16, B)
+VPACKOD(vpackod_h, 32, H)
+VPACKOD(vpackod_w, 64, W)
+VPACKOD(vpackod_d, 128, D)
+
+#define VPICKEV(NAME, BIT, E)                            \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i);          \
+        temp.E(i) = Vk->E(2 * i);                        \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VPICKEV(vpickev_b, 16, B)
+VPICKEV(vpickev_h, 32, H)
+VPICKEV(vpickev_w, 64, W)
+VPICKEV(vpickev_d, 128, D)
+
+#define VPICKOD(NAME, BIT, E)                            \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i + 1);      \
+        temp.E(i) = Vk->E(2 * i + 1);                    \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VPICKOD(vpickod_b, 16, B)
+VPICKOD(vpickod_h, 32, H)
+VPICKOD(vpickod_w, 64, W)
+VPICKOD(vpickod_d, 128, D)
-- 
2.31.1




* [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (39 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  1:31   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 42/44] target/loongarch: Implement vld vst Song Gao
                   ` (2 subsequent siblings)
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VILV{L/H}.{B/H/W/D};
- VSHUF.{B/H/W/D};
- VSHUF4I.{B/H/W/D};
- VPERMI.W;
- VEXTRINS.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
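Note on VSHUF4I and VEXTRINS: both take an 8-bit immediate that encodes
element indices. The sketch below models the byte variants on plain
arrays to show how the immediate is decoded. shf_pos() restates the
SHF_POS macro from this patch; shuf4i_b() and extrins_b() are
hypothetical names used only for this illustration, with element 0
assumed to be the least significant byte.

```c
#include <stdint.h>
#include <string.h>

/* Source index for destination element i: stay inside the same group of
 * four and pick the 2-bit field of imm selected by i's position in it. */
unsigned shf_pos(unsigned i, unsigned imm)
{
    return (i & 0xfc) + ((imm >> (2 * (i & 0x03))) & 0x03);
}

/* VSHUF4I.B model: permute every group of four bytes the same way. */
void shuf4i_b(uint8_t vd[16], const uint8_t vj[16], unsigned imm)
{
    uint8_t tmp[16]; /* vd may alias vj */

    for (unsigned i = 0; i < 16; i++) {
        tmp[i] = vj[shf_pos(i, imm)];
    }
    memcpy(vd, tmp, 16);
}

/* VEXTRINS.B model: copy one element of vj into one element of vd.
 * The high nibble of imm selects the destination, the low nibble the source. */
void extrins_b(uint8_t vd[16], const uint8_t vj[16], unsigned imm)
{
    unsigned ins = (imm >> 4) & 0xf;
    unsigned extr = imm & 0xf;

    vd[ins] = vj[extr];
}
```

For example, imm = 0x1b reverses each group of four bytes and
imm = 0xe4 is the identity permutation. For the wider element sizes the
mask in the VEXTRINS model shrinks as in the patch (0x7, 0x3, 0x1).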
 target/loongarch/disas.c                    |  25 +++
 target/loongarch/helper.h                   |  25 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  25 +++
 target/loongarch/insns.decode               |  25 +++
 target/loongarch/lsx_helper.c               | 163 ++++++++++++++++++++
 5 files changed, 263 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c6cf782725..0b62bbb8be 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1629,3 +1629,28 @@ INSN_LSX(vpickod_b,        vvv)
 INSN_LSX(vpickod_h,        vvv)
 INSN_LSX(vpickod_w,        vvv)
 INSN_LSX(vpickod_d,        vvv)
+
+INSN_LSX(vilvl_b,          vvv)
+INSN_LSX(vilvl_h,          vvv)
+INSN_LSX(vilvl_w,          vvv)
+INSN_LSX(vilvl_d,          vvv)
+INSN_LSX(vilvh_b,          vvv)
+INSN_LSX(vilvh_h,          vvv)
+INSN_LSX(vilvh_w,          vvv)
+INSN_LSX(vilvh_d,          vvv)
+
+INSN_LSX(vshuf_b,          vvvv)
+INSN_LSX(vshuf_h,          vvv)
+INSN_LSX(vshuf_w,          vvv)
+INSN_LSX(vshuf_d,          vvv)
+INSN_LSX(vshuf4i_b,        vv_i)
+INSN_LSX(vshuf4i_h,        vv_i)
+INSN_LSX(vshuf4i_w,        vv_i)
+INSN_LSX(vshuf4i_d,        vv_i)
+
+INSN_LSX(vpermi_w,         vv_i)
+
+INSN_LSX(vextrins_d,       vv_i)
+INSN_LSX(vextrins_w,       vv_i)
+INSN_LSX(vextrins_h,       vv_i)
+INSN_LSX(vextrins_b,       vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index bf03a16afd..86c7eeeae1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -686,3 +686,28 @@ DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vilvl_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vshuf_b, void, env, i32, i32, i32, i32)
+DEF_HELPER_4(vshuf_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpermi_w, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 66cb67a19c..0ea7c65445 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3485,3 +3485,28 @@ TRANS(vpickod_b, gen_vvv, gen_helper_vpickod_b)
 TRANS(vpickod_h, gen_vvv, gen_helper_vpickod_h)
 TRANS(vpickod_w, gen_vvv, gen_helper_vpickod_w)
 TRANS(vpickod_d, gen_vvv, gen_helper_vpickod_d)
+
+TRANS(vilvl_b, gen_vvv, gen_helper_vilvl_b)
+TRANS(vilvl_h, gen_vvv, gen_helper_vilvl_h)
+TRANS(vilvl_w, gen_vvv, gen_helper_vilvl_w)
+TRANS(vilvl_d, gen_vvv, gen_helper_vilvl_d)
+TRANS(vilvh_b, gen_vvv, gen_helper_vilvh_b)
+TRANS(vilvh_h, gen_vvv, gen_helper_vilvh_h)
+TRANS(vilvh_w, gen_vvv, gen_helper_vilvh_w)
+TRANS(vilvh_d, gen_vvv, gen_helper_vilvh_d)
+
+TRANS(vshuf_b, gen_vvvv, gen_helper_vshuf_b)
+TRANS(vshuf_h, gen_vvv, gen_helper_vshuf_h)
+TRANS(vshuf_w, gen_vvv, gen_helper_vshuf_w)
+TRANS(vshuf_d, gen_vvv, gen_helper_vshuf_d)
+TRANS(vshuf4i_b, gen_vv_i, gen_helper_vshuf4i_b)
+TRANS(vshuf4i_h, gen_vv_i, gen_helper_vshuf4i_h)
+TRANS(vshuf4i_w, gen_vv_i, gen_helper_vshuf4i_w)
+TRANS(vshuf4i_d, gen_vv_i, gen_helper_vshuf4i_d)
+
+TRANS(vpermi_w, gen_vv_i, gen_helper_vpermi_w)
+
+TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
+TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
+TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
+TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ab9e9e422f..0263bce28e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1231,3 +1231,28 @@ vpickod_b        0111 00010010 00000 ..... ..... .....    @vvv
 vpickod_h        0111 00010010 00001 ..... ..... .....    @vvv
 vpickod_w        0111 00010010 00010 ..... ..... .....    @vvv
 vpickod_d        0111 00010010 00011 ..... ..... .....    @vvv
+
+vilvl_b          0111 00010001 10100 ..... ..... .....    @vvv
+vilvl_h          0111 00010001 10101 ..... ..... .....    @vvv
+vilvl_w          0111 00010001 10110 ..... ..... .....    @vvv
+vilvl_d          0111 00010001 10111 ..... ..... .....    @vvv
+vilvh_b          0111 00010001 11000 ..... ..... .....    @vvv
+vilvh_h          0111 00010001 11001 ..... ..... .....    @vvv
+vilvh_w          0111 00010001 11010 ..... ..... .....    @vvv
+vilvh_d          0111 00010001 11011 ..... ..... .....    @vvv
+
+vshuf_b          0000 11010101 ..... ..... ..... .....    @vvvv
+vshuf_h          0111 00010111 10101 ..... ..... .....    @vvv
+vshuf_w          0111 00010111 10110 ..... ..... .....    @vvv
+vshuf_d          0111 00010111 10111 ..... ..... .....    @vvv
+vshuf4i_b        0111 00111001 00 ........ ..... .....    @vv_ui8
+vshuf4i_h        0111 00111001 01 ........ ..... .....    @vv_ui8
+vshuf4i_w        0111 00111001 10 ........ ..... .....    @vv_ui8
+vshuf4i_d        0111 00111001 11 ........ ..... .....    @vv_ui8
+
+vpermi_w         0111 00111110 01 ........ ..... .....    @vv_ui8
+
+vextrins_d       0111 00111000 00 ........ ..... .....    @vv_ui8
+vextrins_w       0111 00111000 01 ........ ..... .....    @vv_ui8
+vextrins_h       0111 00111000 10 ........ ..... .....    @vv_ui8
+vextrins_b       0111 00111000 11 ........ ..... .....    @vv_ui8
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b8e0aa9d3b..56faa8684d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -3121,3 +3121,166 @@ VPICKOD(vpickod_b, 16, B)
 VPICKOD(vpickod_h, 32, H)
 VPICKOD(vpickod_w, 64, W)
 VPICKOD(vpickod_d, 128, D)
+
+#define VILVL(NAME, BIT, E)                              \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(2 * i + 1) = Vj->E(i);                    \
+        temp.E(2 * i) = Vk->E(i);                        \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VILVL(vilvl_b, 16, B)
+VILVL(vilvl_h, 32, H)
+VILVL(vilvl_w, 64, W)
+VILVL(vilvl_d, 128, D)
+
+#define VILVH(NAME, BIT, E)                              \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i;                                               \
+    VReg temp;                                           \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        temp.E(2 * i + 1) = Vj->E(i + LSX_LEN/BIT);      \
+        temp.E(2 * i) = Vk->E(i + LSX_LEN/BIT);          \
+    }                                                    \
+    Vd->D(0) = temp.D(0);                                \
+    Vd->D(1) = temp.D(1);                                \
+}
+
+VILVH(vilvh_b, 16, B)
+VILVH(vilvh_h, 32, H)
+VILVH(vilvh_w, 64, W)
+VILVH(vilvh_d, 128, D)
+
+void HELPER(vshuf_b)(CPULoongArchState *env,
+                     uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
+{
+    int i, m, k;
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+    VReg *Vk = &(env->fpr[vk].vreg);
+    VReg *Va = &(env->fpr[va].vreg);
+
+    m = LSX_LEN/8;
+    for (i = 0; i < m; i++) {
+        k = (Va->B(i) & 0x3f) % (2 * m);
+        temp.B(i) = (Va->B(i) & 0xc0) ? 0 : k < m ? Vk->B(k) : Vj->B(k - m);
+    }
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+#define VSHUF(NAME, BIT, E)                                                  \
+void HELPER(NAME)(CPULoongArchState *env,                                    \
+                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
+{                                                                            \
+    int i, m, k;                                                             \
+    VReg temp;                                                               \
+    VReg *Vd = &(env->fpr[vd].vreg);                                         \
+    VReg *Vj = &(env->fpr[vj].vreg);                                         \
+    VReg *Vk = &(env->fpr[vk].vreg);                                         \
+                                                                             \
+    m = LSX_LEN/BIT;                                                         \
+    for (i = 0; i < m; i++) {                                                \
+        k = (Vd->E(i) & 0x3f) % (2 * m);                                     \
+        temp.E(i) = (Vd->E(i) & 0xc0) ? 0 : k < m ? Vk->E(k) : Vj->E(k - m); \
+    }                                                                        \
+    Vd->D(0) = temp.D(0);                                                    \
+    Vd->D(1) = temp.D(1);                                                    \
+}
+
+VSHUF(vshuf_h, 16, H)
+VSHUF(vshuf_w, 32, W)
+VSHUF(vshuf_d, 64, D)
+
+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
+#define VSHUF4I(NAME, BIT, E)                             \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i;                                                \
+    VReg temp;                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        temp.E(i) = Vj->E(SHF_POS(i, imm));               \
+    }                                                     \
+    Vd->D(0) = temp.D(0);                                 \
+    Vd->D(1) = temp.D(1);                                 \
+}
+
+VSHUF4I(vshuf4i_b, 8, B)
+VSHUF4I(vshuf4i_h, 16, H)
+VSHUF4I(vshuf4i_w, 32, W)
+
+void HELPER(vshuf4i_d)(CPULoongArchState *env,
+                       uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    VReg temp;
+    temp.D(0) = ((imm & 0x03) == 0x00) ? Vd->D(0) :
+                ((imm & 0x03) == 0x01) ? Vd->D(1) :
+                ((imm & 0x03) == 0x02) ? Vj->D(0) : Vj->D(1);
+
+    temp.D(1) = ((imm & 0x0c) == 0x00) ? Vd->D(0) :
+                ((imm & 0x0c) == 0x04) ? Vd->D(1) :
+                ((imm & 0x0c) == 0x08) ? Vj->D(0) : Vj->D(1);
+
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+void HELPER(vpermi_w)(CPULoongArchState *env,
+                      uint32_t vd, uint32_t vj, uint32_t imm)
+{
+    VReg temp;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    VReg *Vj = &(env->fpr[vj].vreg);
+
+    temp.W(0) = Vj->W(imm & 0x3);
+    temp.W(1) = Vj->W((imm >> 2) & 0x3);
+    temp.W(2) = Vd->W((imm >> 4) & 0x3);
+    temp.W(3) = Vd->W((imm >> 6) & 0x3);
+
+    Vd->D(0) = temp.D(0);
+    Vd->D(1) = temp.D(1);
+}
+
+#define VEXTRINS(NAME, BIT, E, MASK)                      \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int ins, extr;                                        \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    ins = (imm >> 4) & MASK;                              \
+    extr = imm & MASK;                                    \
+    Vd->E(ins) = Vj->E(extr);                             \
+}
+
+VEXTRINS(vextrins_b, 8, B, 0xf)
+VEXTRINS(vextrins_h, 16, H, 0x7)
+VEXTRINS(vextrins_w, 32, W, 0x3)
+VEXTRINS(vextrins_d, 64, D, 0x1)
-- 
2.31.1




* [RFC PATCH v2 42/44] target/loongarch: Implement vld vst
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (40 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  3:35   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 43/44] target/loongarch: Implement vldi Song Gao
  2023-03-28  3:06 ` [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr Song Gao
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VLD[X], VST[X];
- VLDREPL.{B/H/W/D};
- VSTELM.{B/H/W/D}.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
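
Note on the page-span checks used by the store helpers in this patch: each *_PAGESPAN macro asks whether the last byte of the access lands on a different page from the first byte. A minimal standalone sketch of that test is below. The 4 KiB page size and the names `MODEL_PAGE_SIZE` and `access_spans_pages` are assumptions for illustration; the patch itself uses TARGET_PAGE_MASK and TARGET_PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page geometry, for illustration only (4 KiB pages). */
#define MODEL_PAGE_SIZE 4096u

/*
 * Returns true when an access of `size` bytes starting at `addr` touches
 * two pages, i.e. its last byte lies beyond the page of its first byte.
 */
static bool access_spans_pages(uint64_t addr, unsigned size)
{
    uint64_t offset = addr & (MODEL_PAGE_SIZE - 1);

    assert(size > 0 && size <= MODEL_PAGE_SIZE);
    return offset + size - 1 >= MODEL_PAGE_SIZE;
}
```

With this shape a single function parameterised by size could stand in for the per-width macros: 16 for a full vector store, and 1, 2, 4 or 8 for the element stores.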
 target/loongarch/disas.c                    |  34 +++
 target/loongarch/helper.h                   |  12 +
 target/loongarch/insn_trans/trans_lsx.c.inc |  70 +++++
 target/loongarch/insns.decode               |  36 +++
 target/loongarch/lsx_helper.c               | 267 ++++++++++++++++++++
 target/loongarch/translate.c                |  10 +
 6 files changed, 429 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0b62bbb8be..8627908fc9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -21,11 +21,21 @@ static inline int plus_1(DisasContext *ctx, int x)
     return x + 1;
 }
 
+static inline int shl_1(DisasContext *ctx, int x)
+{
+    return x << 1;
+}
+
 static inline int shl_2(DisasContext *ctx, int x)
 {
     return x << 2;
 }
 
+static inline int shl_3(DisasContext *ctx, int x)
+{
+    return x << 3;
+}
+
 #define CSR_NAME(REG) \
     [LOONGARCH_CSR_##REG] = (#REG)
 
@@ -823,6 +833,11 @@ static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
 }
 
+static void output_vr_ii(DisasContext *ctx, arg_vr_ii *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, 0x%x, 0x%x", a->vd, a->rj, a->imm, a->imm2);
+}
+
 static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
 {
     output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj,  a->imm);
@@ -838,6 +853,11 @@ static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
 }
 
+static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1654,3 +1674,17 @@ INSN_LSX(vextrins_d,       vv_i)
 INSN_LSX(vextrins_w,       vv_i)
 INSN_LSX(vextrins_h,       vv_i)
 INSN_LSX(vextrins_b,       vv_i)
+
+INSN_LSX(vld,              vr_i)
+INSN_LSX(vst,              vr_i)
+INSN_LSX(vldx,             vrr)
+INSN_LSX(vstx,             vrr)
+
+INSN_LSX(vldrepl_d,        vr_i)
+INSN_LSX(vldrepl_w,        vr_i)
+INSN_LSX(vldrepl_h,        vr_i)
+INSN_LSX(vldrepl_b,        vr_i)
+INSN_LSX(vstelm_d,         vr_ii)
+INSN_LSX(vstelm_w,         vr_ii)
+INSN_LSX(vstelm_h,         vr_ii)
+INSN_LSX(vstelm_b,         vr_ii)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 86c7eeeae1..5b6674ff0e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -711,3 +711,15 @@ DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vld_b, void, env, i32, tl)
+DEF_HELPER_3(vst_b, void, env, i32, tl)
+
+DEF_HELPER_3(vldrepl_d, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_w, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_h, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_b, void, env, i32, tl)
+DEF_HELPER_4(vstelm_d, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_w, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_h, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_b, void, env, i32, tl, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0ea7c65445..ab896f8a9e 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3510,3 +3510,73 @@ TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
 TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
 TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
 TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
+
+static bool gen_memory(DisasContext *ctx, arg_vr_i *a,
+                       void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv temp;
+
+    CHECK_SXE;
+
+    if (a->imm) {
+        temp = tcg_temp_new();
+        tcg_gen_addi_tl(temp, addr, a->imm);
+        addr = temp;
+    }
+
+    func(cpu_env, vd, addr);
+
+    return true;
+}
+
+static bool gen_memory_x(DisasContext *ctx, arg_vrr *a,
+                         void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+
+    CHECK_SXE;
+
+    TCGv addr = tcg_temp_new();
+    tcg_gen_add_tl(addr, src1, src2);
+    func(cpu_env, vd, addr);
+    return true;
+}
+
+TRANS(vld, gen_memory, gen_helper_vld_b)
+TRANS(vst, gen_memory, gen_helper_vst_b)
+TRANS(vldx, gen_memory_x, gen_helper_vld_b)
+TRANS(vstx, gen_memory_x, gen_helper_vst_b)
+
+static bool gen_memory_elm(DisasContext *ctx, arg_vr_ii *a,
+                           void (*func)(TCGv_ptr, TCGv_i32, TCGv, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv_i32 tidx = tcg_constant_i32(a->imm2);
+    TCGv temp;
+
+    CHECK_SXE;
+
+    if (a->imm) {
+        temp = tcg_temp_new();
+        tcg_gen_addi_tl(temp, addr, a->imm);
+        addr = temp;
+    }
+
+    func(cpu_env, vd, addr, tidx);
+
+    return true;
+}
+
+TRANS(vldrepl_b, gen_memory, gen_helper_vldrepl_b)
+TRANS(vldrepl_h, gen_memory, gen_helper_vldrepl_h)
+TRANS(vldrepl_w, gen_memory, gen_helper_vldrepl_w)
+TRANS(vldrepl_d, gen_memory, gen_helper_vldrepl_d)
+TRANS(vstelm_b, gen_memory_elm, gen_helper_vstelm_b)
+TRANS(vstelm_h, gen_memory_elm, gen_helper_vstelm_h)
+TRANS(vstelm_w, gen_memory_elm, gen_helper_vstelm_w)
+TRANS(vstelm_d, gen_memory_elm, gen_helper_vstelm_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0263bce28e..ea6eedb7a9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -486,6 +486,17 @@ ertn             0000 01100100 10000 01110 00000 00000    @empty
 idle             0000 01100100 10001 ...............      @i15
 dbcl             0000 00000010 10101 ...............      @i15
 
+#
+# LSX Fields
+#
+
+%i9s3     10:s9       !function=shl_3
+%i10s2    10:s10      !function=shl_2
+%i11s1    10:s11      !function=shl_1
+%i8s3     10:s8       !function=shl_3
+%i8s2     10:s8       !function=shl_2
+%i8s1     10:s8       !function=shl_1
+
 #
 # LSX Argument sets
 #
@@ -500,6 +511,8 @@ dbcl             0000 00000010 10101 ...............      @i15
 &rv_i         rd vj imm
 &vr           vd rj
 &vvr          vd vj rk
+&vrr          vd rj rk
+&vr_ii        vd rj imm imm2
 
 #
 # LSX Formats
@@ -528,6 +541,15 @@ dbcl             0000 00000010 10101 ...............      @i15
 @rv_ui1      .... ........ ..... .... imm:1 vj:5 rd:5    &rv_i
 @vr               .... ........ ..... ..... rj:5 vd:5    &vr
 @vvr               .... ........ ..... rk:5 vj:5 vd:5    &vvr
+@vr_i9            .... ........ . ......... rj:5 vd:5    &vr_i imm=%i9s3
+@vr_i10            .... ........ .......... rj:5 vd:5    &vr_i imm=%i10s2
+@vr_i11            .... ....... ........... rj:5 vd:5    &vr_i imm=%i11s1
+@vr_i12                 .... ...... imm:s12 rj:5 vd:5    &vr_i
+@vr_i8i1    .... ........ . imm2:1 ........ rj:5 vd:5    &vr_ii imm=%i8s3
+@vr_i8i2      .... ........ imm2:2 ........ rj:5 vd:5    &vr_ii imm=%i8s2
+@vr_i8i3       .... ....... imm2:3 ........ rj:5 vd:5    &vr_ii imm=%i8s1
+@vr_i8i4          .... ...... imm2:4 imm:s8 rj:5 vd:5    &vr_ii
+@vrr               .... ........ ..... rk:5 rj:5 vd:5    &vrr
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -1256,3 +1278,17 @@ vextrins_d       0111 00111000 00 ........ ..... .....    @vv_ui8
 vextrins_w       0111 00111000 01 ........ ..... .....    @vv_ui8
 vextrins_h       0111 00111000 10 ........ ..... .....    @vv_ui8
 vextrins_b       0111 00111000 11 ........ ..... .....    @vv_ui8
+
+vld              0010 110000 ............ ..... .....     @vr_i12
+vst              0010 110001 ............ ..... .....     @vr_i12
+vldx             0011 10000100 00000 ..... ..... .....    @vrr
+vstx             0011 10000100 01000 ..... ..... .....    @vrr
+
+vldrepl_d        0011 00000001 0 ......... ..... .....    @vr_i9
+vldrepl_w        0011 00000010 .......... ..... .....     @vr_i10
+vldrepl_h        0011 0000010 ........... ..... .....     @vr_i11
+vldrepl_b        0011 000010 ............ ..... .....     @vr_i12
+vstelm_d         0011 00010001 0 . ........ ..... .....   @vr_i8i1
+vstelm_w         0011 00010010 .. ........ ..... .....    @vr_i8i2
+vstelm_h         0011 0001010 ... ........ ..... .....    @vr_i8i3
+vstelm_b         0011 000110 .... ........ ..... .....    @vr_i8i4
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 56faa8684d..326646413d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -11,6 +11,7 @@
 #include "exec/helper-proto.h"
 #include "fpu/softfloat.h"
 #include "internals.h"
+#include "tcg/tcg-ldst.h"
 
 void helper_vadd_q(CPULoongArchState *env,
                    uint32_t vd, uint32_t vj, uint32_t vk)
@@ -3284,3 +3285,269 @@ VEXTRINS(vextrins_b, 8, B, 0xf)
 VEXTRINS(vextrins_h, 16, H, 0x7)
 VEXTRINS(vextrins_w, 32, W, 0x3)
 VEXTRINS(vextrins_d, 64, D, 0x1)
+
+void HELPER(vld_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    int i;
+    VReg *Vd = &(env->fpr[vd].vreg);
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, cpu_mmu_index(env, false));
+
+    for (i = 0; i < LSX_LEN/8; i++) {
+        Vd->B(i) = helper_ret_ldub_mmu(env, addr + i, oi, GETPC());
+    }
+#else
+    for (i = 0; i < LSX_LEN/8; i++) {
+        Vd->B(i) = cpu_ldub_data(env, addr + i);
+    }
+#endif
+}
+
+#define LSX_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + LSX_LEN/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_writable_pages(CPULoongArchState *env,
+                                         target_ulong addr,
+                                         int mmu_idx,
+                                         uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(LSX_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void HELPER(vst_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    int i;
+    VReg *Vd = &(env->fpr[vd].vreg);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, mmu_idx);
+    for (i = 0; i < LSX_LEN/8; i++) {
+        helper_ret_stb_mmu(env, addr + i, Vd->B(i),  oi, GETPC());
+    }
+#else
+    for (i = 0; i < LSX_LEN/8; i++) {
+        cpu_stb_data(env, addr + i, Vd->B(i));
+    }
+#endif
+}
+
+void HELPER(vldrepl_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    uint8_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_ret_ldub_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_ldub_data(env, addr);
+#endif
+    int i;
+    for (i = 0; i < 16; i++) {
+        Vd->B(i) = data;
+    }
+}
+
+void HELPER(vldrepl_h)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    uint16_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_16 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_lduw_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_lduw_data(env, addr);
+#endif
+    int i;
+    for (i = 0; i < 8; i++) {
+        Vd->H(i) = data;
+    }
+}
+
+void HELPER(vldrepl_w)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    uint32_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_32 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_ldul_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_ldl_data(env, addr);
+#endif
+    Vd->W(0) = data;
+    Vd->W(1) = data;
+    Vd->W(2) = data;
+    Vd->W(3) = data;
+}
+
+void HELPER(vldrepl_d)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    uint64_t data;
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_64 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    data = helper_le_ldq_mmu(env, addr, oi, GETPC());
+#else
+    data = cpu_ldq_data(env, addr);
+#endif
+    Vd->D(0) = data;
+    Vd->D(1) = data;
+}
+
+#define B_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 8/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_b_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(B_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void HELPER(vstelm_b)(CPULoongArchState *env,
+                      uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_b_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_ret_stb_mmu(env, addr, Vd->B(sel), oi, GETPC());
+#else
+    cpu_stb_data(env, addr, Vd->B(sel));
+#endif
+}
+
+#define H_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 16/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_h_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(H_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void HELPER(vstelm_h)(CPULoongArchState *env,
+                      uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_h_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_16 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stw_mmu(env, addr, Vd->H(sel), oi, GETPC());
+#else
+    cpu_stw_data(env, addr, Vd->H(sel));
+#endif
+}
+
+#define W_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 32/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_w_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(W_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void HELPER(vstelm_w)(CPULoongArchState *env,
+                      uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_w_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_32 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stl_mmu(env, addr, Vd->W(sel), oi, GETPC());
+#else
+    cpu_stl_data(env, addr, Vd->W(sel));
+#endif
+}
+
+#define D_PAGESPAN(x) \
+        ((((x) & ~TARGET_PAGE_MASK) + 64/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_d_writable_pages(CPULoongArchState *env,
+                                           target_ulong addr,
+                                           int mmu_idx,
+                                           uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+    /* FIXME: Probe the actual accesses (pass and use a size) */
+    if (unlikely(D_PAGESPAN(addr))) {
+        /* first page */
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+        /* second page */
+        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+        probe_write(env, addr, 0, mmu_idx, retaddr);
+    }
+#endif
+}
+
+void HELPER(vstelm_d)(CPULoongArchState *env,
+                      uint32_t vd, target_ulong addr, uint32_t sel)
+{
+    VReg *Vd = &(env->fpr[vd].vreg);
+    int mmu_idx = cpu_mmu_index(env, false);
+
+    ensure_d_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+    MemOpIdx oi = make_memop_idx(MO_TE | MO_64 | MO_UNALN,
+                                 cpu_mmu_index(env, false));
+    helper_le_stq_mmu(env, addr, Vd->D(sel), oi, GETPC());
+#else
+    cpu_stq_data(env, addr, Vd->D(sel));
+#endif
+}
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index f50d14cc65..7f2ad7f542 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -41,11 +41,21 @@ static inline int plus_1(DisasContext *ctx, int x)
     return x + 1;
 }
 
+static inline int shl_1(DisasContext *ctx, int x)
+{
+    return x << 1;
+}
+
 static inline int shl_2(DisasContext *ctx, int x)
 {
     return x << 2;
 }
 
+static inline int shl_3(DisasContext *ctx, int x)
+{
+    return x << 3;
+}
+
 /*
  * LoongArch the upper 32 bits are undefined ("can be any value").
  * QEMU chooses to nanbox, because it is most likely to show guest bugs early.
-- 
2.31.1
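
Note on the scaled immediates: the decodetree fields added above (%i9s3, %i10s2, %i11s1 and the %i8s* variants) take a signed field starting at bit 10 and scale it by the element size through shl_1/shl_2/shl_3. A minimal standalone sketch of that decode for the VLDREPL.D case follows; `sextract_field` and `decode_i9s3` are hypothetical names, and the sketch multiplies instead of shifting so that negative offsets stay well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `len` bits starting at bit `pos` and sign-extend the result. */
static int32_t sextract_field(uint32_t insn, unsigned pos, unsigned len)
{
    assert(len > 0 && len < 32 && pos + len <= 32);

    uint32_t field = (insn >> pos) & ((1u << len) - 1);
    uint32_t sign = 1u << (len - 1);

    /* XOR-and-subtract sign extension avoids shifting negative values. */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Model of the %i9s3 field: signed 9-bit value at bit 10, scaled by 8. */
static int32_t decode_i9s3(uint32_t insn)
{
    return sextract_field(insn, 10, 9) * 8;
}
```

The unscaled si12 form used by VLD, VST and VLDREPL.B corresponds to sextract_field(insn, 10, 12) with no scaling.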




* [RFC PATCH v2 43/44] target/loongarch: Implement vldi
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (41 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 42/44] target/loongarch: Implement vld vst Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  3:39   ` Richard Henderson
  2023-03-28  3:06 ` [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr Song Gao
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

This patch includes:
- VLDI.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
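
Note on the VLDI immediate layout: bit 12 selects between the two forms. With bit 12 clear, bits [11:10] give the element size and bits [9:0] a signed value that is replicated across the register. A minimal standalone sketch of that split is below; `vldi_fields` and `vldi_split` are hypothetical names for illustration only. When bit 12 is set, bits [11:8] are instead a mode and bits [7:0] a payload, so the vece and value fields here only apply to the sel == 0 form.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int sel;        /* imm[12]: 0 = replicate a signed 10-bit value */
    int vece;       /* imm[11:10]: element size selector (0..3) */
    int64_t value;  /* imm[9:0], sign-extended */
} vldi_fields;

static vldi_fields vldi_split(uint32_t imm)
{
    vldi_fields f;

    assert(imm < (1u << 13));
    f.sel = (imm >> 12) & 0x1;
    f.vece = (imm >> 10) & 0x3;
    /* Sign-extend the low 10 bits without shifting a negative value. */
    f.value = (int64_t)((imm & 0x3ff) ^ 0x200) - 0x200;
    return f;
}
```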
 target/loongarch/disas.c                    |   7 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 142 ++++++++++++++++++++
 target/loongarch/insns.decode               |   4 +
 3 files changed, 153 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8627908fc9..5c402d944d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -858,6 +858,11 @@ static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
     output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
 }
 
+static void output_v_i(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
+{
+    output(ctx, mnemonic, "v%d, 0x%x", a->vd, a->imm);
+}
+
 INSN_LSX(vadd_b,           vvv)
 INSN_LSX(vadd_h,           vvv)
 INSN_LSX(vadd_w,           vvv)
@@ -1143,6 +1148,8 @@ INSN_LSX(vmskltz_d,        vv)
 INSN_LSX(vmskgez_b,        vv)
 INSN_LSX(vmsknz_b,         vv)
 
+INSN_LSX(vldi,             v_i)
+
 INSN_LSX(vand_v,           vvv)
 INSN_LSX(vor_v,            vvv)
 INSN_LSX(vxor_v,           vvv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ab896f8a9e..cb5aa9e4a9 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2606,6 +2606,148 @@ TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
 
+#define EXPAND_BYTE(bit)  ((uint64_t)(bit ? 0xff : 0))
+
+static uint64_t vldi_get_value(DisasContext *ctx, uint32_t imm)
+{
+    int mode;
+    uint64_t data, t;
+
+    /*
+     * imm bit [11:8] is mode, mode value is 0-12.
+     * other values are invalid.
+     */
+    mode = (imm >> 8) & 0xf;
+    t =  imm & 0xff;
+    switch (mode) {
+    case 0:
+        /* data: {2{24'0, imm[7:0]}} */
+        data = (t << 32) | t;
+        break;
+    case 1:
+        /* data: {2{16'0, imm[7:0], 8'0}} */
+        data = (t << 40) | (t << 8);
+        break;
+    case 2:
+        /* data: {2{8'0, imm[7:0], 16'0}} */
+        data = (t << 48) | (t << 16);
+        break;
+    case 3:
+        /* data: {2{imm[7:0], 24'0}} */
+        data = (t << 56) | (t << 24);
+        break;
+    case 4:
+        /* data: {4{8'0, imm[7:0]}} */
+        data = (t << 48) | (t << 32) | (t << 16) | t;
+        break;
+    case 5:
+        /* data: {4{imm[7:0], 8'0}} */
+        data = (t << 56) | (t << 40) | (t << 24) | (t << 8);
+        break;
+    case 6:
+        /* data: {2{16'0, imm[7:0], 8'1}} */
+        data = (t << 40) | ((uint64_t)0xff << 32) | (t << 8) | 0xff;
+        break;
+    case 7:
+        /* data: {2{8'0, imm[7:0], 16'1}} */
+        data = (t << 48) | ((uint64_t)0xffff << 32) | (t << 16) | 0xffff;
+        break;
+    case 8:
+        /* data: {8{imm[7:0]}} */
+        data = (t << 56) | (t << 48) | (t << 40) | (t << 32) |
+              (t << 24) | (t << 16) | (t << 8) | t;
+        break;
+    case 9:
+        /* data: {{8{imm[7]}, ..., 8{imm[0]}}} */
+        {
+            uint64_t b0, b1, b2, b3, b4, b5, b6, b7;
+            b0 = t & 0x1;
+            b1 = (t & 0x2) >> 1;
+            b2 = (t & 0x4) >> 2;
+            b3 = (t & 0x8) >> 3;
+            b4 = (t & 0x10) >> 4;
+            b5 = (t & 0x20) >> 5;
+            b6 = (t & 0x40) >> 6;
+            b7 = (t & 0x80) >> 7;
+            data = (EXPAND_BYTE(b7) << 56) |
+                   (EXPAND_BYTE(b6) << 48) |
+                   (EXPAND_BYTE(b5) << 40) |
+                   (EXPAND_BYTE(b4) << 32) |
+                   (EXPAND_BYTE(b3) << 24) |
+                   (EXPAND_BYTE(b2) << 16) |
+                   (EXPAND_BYTE(b1) <<  8) |
+                   EXPAND_BYTE(b0);
+        }
+        break;
+    case 10:
+        /* data: {2{imm[7], ~imm[6], {5{imm[6]}}, imm[5:0], 19'0}} */
+        {
+            uint64_t b6, b7;
+            uint64_t t0, t1;
+            b6 = (imm & 0x40) >> 6;
+            b7 = (imm & 0x80) >> 7;
+            t0 = (imm & 0x3f);
+            t1 = (b7 << 6) | ((1-b6) << 5) | (uint64_t)(b6 ? 0x1f : 0);
+            data  = (t1 << 57) | (t0 << 51) | (t1 << 25) | (t0 << 19);
+        }
+        break;
+    case 11:
+        /* data: {32'0, imm[7], ~{imm[6]}, 5{imm[6]}, imm[5:0], 19'0} */
+        {
+            uint64_t b6,b7;
+            uint64_t t0, t1;
+            b6 = (imm & 0x40) >> 6;
+            b7 = (imm & 0x80) >> 7;
+            t0 = (imm & 0x3f);
+            t1 = (b7 << 6) | ((1-b6) << 5) | (b6 ? 0x1f : 0);
+            data = (t1 << 25) | (t0 << 19);
+        }
+        break;
+    case 12:
+        /* data: {imm[7], ~imm[6], 8{imm[6]}, imm[5:0], 48'0} */
+        {
+            uint64_t b6,b7;
+            uint64_t t0, t1;
+            b6 = (imm & 0x40) >> 6;
+            b7 = (imm & 0x80) >> 7;
+            t0 = (imm & 0x3f);
+            t1 = (b7 << 9) | ((1-b6) << 8) | (b6 ? 0xff : 0);
+            data = (t1 << 54) | (t0 << 48);
+        }
+        break;
+    default:
+        assert(0);
+    }
+    return data;
+}
+
+static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
+{
+    int sel, vece;
+    uint64_t value;
+    CHECK_SXE;
+
+    sel = (a->imm >> 12) & 0x1;
+
+    if (sel) {
+        /* VSETI.D */
+        value = vldi_get_value(ctx, a->imm);
+        vece = MO_64;
+    } else {
+        /*
+         * VLDI.B/H/W/D
+         *  a->imm bit [11:10] is vece.
+         *  a->imm bit [9:0] is value;
+         */
+        value = ((int32_t)(a->imm << 22)) >> 22;
+        vece = (a->imm >> 10) & 0x3;
+    }
+
+    tcg_gen_gvec_dup_i64(vece, vreg_full_offset(a->vd), 16, 16,
+                         tcg_constant_i64(value));
+    return true;
+}
+
 TRANS(vand_v, gvec_vvv, MO_64, tcg_gen_gvec_and)
 TRANS(vor_v, gvec_vvv, MO_64, tcg_gen_gvec_or)
 TRANS(vxor_v, gvec_vvv, MO_64, tcg_gen_gvec_xor)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ea6eedb7a9..c9c3bc2c73 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -513,6 +513,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 &vvr          vd vj rk
 &vrr          vd rj rk
 &vr_ii        vd rj imm imm2
+&v_i          vd imm
 
 #
 # LSX Formats
@@ -550,6 +551,7 @@ dbcl             0000 00000010 10101 ...............      @i15
 @vr_i8i3       .... ....... imm2:3 ........ rj:5 vd:5    &vr_ii imm=%i8s1
 @vr_i8i4          .... ...... imm2:4 imm:s8 rj:5 vd:5    &vr_ii
 @vrr               .... ........ ..... rk:5 rj:5 vd:5    &vrr
+@v_i13                   .... ........ .. imm:13 vd:5    &v_i
 
 vadd_b           0111 00000000 10100 ..... ..... .....    @vvv
 vadd_h           0111 00000000 10101 ..... ..... .....    @vvv
@@ -837,6 +839,8 @@ vmskltz_d        0111 00101001 11000 10011 ..... .....    @vv
 vmskgez_b        0111 00101001 11000 10100 ..... .....    @vv
 vmsknz_b         0111 00101001 11000 11000 ..... .....    @vv
 
+vldi             0111 00111110 00 ............. .....     @v_i13
+
 vand_v           0111 00010010 01100 ..... ..... .....    @vvv
 vor_v            0111 00010010 01101 ..... ..... .....    @vvv
 vxor_v           0111 00010010 01110 ..... ..... .....    @vvv
-- 
2.31.1
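
Note on the mode table in vldi_get_value(): per the comments in the patch, modes 0 to 3 place the 8-bit payload in byte 0, 1, 2 or 3 of each 32-bit word, and mode 9 expands each payload bit into a full byte. A minimal standalone sketch of those two patterns follows, written as loops and shifts computed from the mode number; `vldi_word_mode` and `expand_bits_to_bytes` are hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Modes 0..3: the payload occupies byte `mode` of a 32-bit word and the
 * word is repeated twice in the 64-bit result.
 */
static uint64_t vldi_word_mode(unsigned mode, uint8_t payload)
{
    assert(mode <= 3);

    uint64_t word = (uint64_t)payload << (8 * mode);
    return (word << 32) | word;
}

/* Mode 9: bit i of the payload becomes 0xff or 0x00 in byte i. */
static uint64_t expand_bits_to_bytes(uint8_t bits)
{
    uint64_t data = 0;

    for (int i = 0; i < 8; i++) {
        if (bits & (1u << i)) {
            data |= (uint64_t)0xff << (8 * i);
        }
    }
    return data;
}
```

Deriving the shift from the mode makes the relationship between the four word modes explicit, at the cost of being less literal than the per-case comments.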




* [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr
  2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
                   ` (42 preceding siblings ...)
  2023-03-28  3:06 ` [RFC PATCH v2 43/44] target/loongarch: Implement vldi Song Gao
@ 2023-03-28  3:06 ` Song Gao
  2023-04-04  3:44   ` Richard Henderson
  43 siblings, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-03-28  3:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

Introduce set_fpr() and get_fpr() and remove cpu_fpr.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 .../loongarch/insn_trans/trans_farith.c.inc   | 72 +++++++++++++++----
 target/loongarch/insn_trans/trans_fcmp.c.inc  | 12 ++--
 .../loongarch/insn_trans/trans_fmemory.c.inc  | 37 ++++++----
 target/loongarch/insn_trans/trans_fmov.c.inc  | 31 +++++---
 target/loongarch/translate.c                  | 20 ++++--
 5 files changed, 129 insertions(+), 43 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_farith.c.inc b/target/loongarch/insn_trans/trans_farith.c.inc
index 7081fbb89b..21ea47308b 100644
--- a/target/loongarch/insn_trans/trans_farith.c.inc
+++ b/target/loongarch/insn_trans/trans_farith.c.inc
@@ -17,18 +17,29 @@
 static bool gen_fff(DisasContext *ctx, arg_fff *a,
                     void (*func)(TCGv, TCGv_env, TCGv, TCGv))
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src1 = get_fpr(ctx, a->fj);
+    TCGv src2 = get_fpr(ctx, a->fk);
+
     CHECK_FPE;
 
-    func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj], cpu_fpr[a->fk]);
+    func(dest, cpu_env, src1, src2);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool gen_ff(DisasContext *ctx, arg_ff *a,
                    void (*func)(TCGv, TCGv_env, TCGv))
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj]);
+    func(dest, cpu_env, src);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
@@ -37,61 +48,98 @@ static bool gen_muladd(DisasContext *ctx, arg_ffff *a,
                        int flag)
 {
     TCGv_i32 tflag = tcg_constant_i32(flag);
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src1 = get_fpr(ctx, a->fj);
+    TCGv src2 = get_fpr(ctx, a->fk);
+    TCGv src3 = get_fpr(ctx, a->fa);
 
     CHECK_FPE;
 
-    func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj],
-         cpu_fpr[a->fk], cpu_fpr[a->fa], tflag);
+    func(dest, cpu_env, src1, src2, src3, tflag);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fcopysign_s(DisasContext *ctx, arg_fcopysign_s *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src1 = get_fpr(ctx, a->fk);
+    TCGv src2 = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_deposit_i64(cpu_fpr[a->fd], cpu_fpr[a->fk], cpu_fpr[a->fj], 0, 31);
+    tcg_gen_deposit_i64(dest, src1, src2, 0, 31);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fcopysign_d(DisasContext *ctx, arg_fcopysign_d *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src1 = get_fpr(ctx, a->fk);
+    TCGv src2 = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_deposit_i64(cpu_fpr[a->fd], cpu_fpr[a->fk], cpu_fpr[a->fj], 0, 63);
+    tcg_gen_deposit_i64(dest, src1, src2, 0, 63);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fabs_s(DisasContext *ctx, arg_fabs_s *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_andi_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], MAKE_64BIT_MASK(0, 31));
-    gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+    tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 31));
+    gen_nanbox_s(dest, dest);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fabs_d(DisasContext *ctx, arg_fabs_d *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_andi_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], MAKE_64BIT_MASK(0, 63));
+    tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 63));
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fneg_s(DisasContext *ctx, arg_fneg_s *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_xori_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], 0x80000000);
-    gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+    tcg_gen_xori_i64(dest, src, 0x80000000);
+    gen_nanbox_s(dest, dest);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
 static bool trans_fneg_d(DisasContext *ctx, arg_fneg_d *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
+
     CHECK_FPE;
 
-    tcg_gen_xori_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], 0x8000000000000000LL);
+    tcg_gen_xori_i64(dest, src, 0x8000000000000000LL);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
diff --git a/target/loongarch/insn_trans/trans_fcmp.c.inc b/target/loongarch/insn_trans/trans_fcmp.c.inc
index 3b0da2b9f4..a78868dbc4 100644
--- a/target/loongarch/insn_trans/trans_fcmp.c.inc
+++ b/target/loongarch/insn_trans/trans_fcmp.c.inc
@@ -25,17 +25,19 @@ static uint32_t get_fcmp_flags(int cond)
 
 static bool trans_fcmp_cond_s(DisasContext *ctx, arg_fcmp_cond_s *a)
 {
-    TCGv var;
+    TCGv var, src1, src2;
     uint32_t flags;
     void (*fn)(TCGv, TCGv_env, TCGv, TCGv, TCGv_i32);
 
     CHECK_FPE;
 
     var = tcg_temp_new();
+    src1 = get_fpr(ctx, a->fj);
+    src2 = get_fpr(ctx, a->fk);
     fn = (a->fcond & 1 ? gen_helper_fcmp_s_s : gen_helper_fcmp_c_s);
     flags = get_fcmp_flags(a->fcond >> 1);
 
-    fn(var, cpu_env, cpu_fpr[a->fj], cpu_fpr[a->fk], tcg_constant_i32(flags));
+    fn(var, cpu_env, src1, src2, tcg_constant_i32(flags));
 
     tcg_gen_st8_tl(var, cpu_env, offsetof(CPULoongArchState, cf[a->cd]));
     return true;
@@ -43,17 +45,19 @@ static bool trans_fcmp_cond_s(DisasContext *ctx, arg_fcmp_cond_s *a)
 
 static bool trans_fcmp_cond_d(DisasContext *ctx, arg_fcmp_cond_d *a)
 {
-    TCGv var;
+    TCGv var, src1, src2;
     uint32_t flags;
     void (*fn)(TCGv, TCGv_env, TCGv, TCGv, TCGv_i32);
 
     CHECK_FPE;
 
     var = tcg_temp_new();
+    src1 = get_fpr(ctx, a->fj);
+    src2 = get_fpr(ctx, a->fk);
     fn = (a->fcond & 1 ? gen_helper_fcmp_s_d : gen_helper_fcmp_c_d);
     flags = get_fcmp_flags(a->fcond >> 1);
 
-    fn(var, cpu_env, cpu_fpr[a->fj], cpu_fpr[a->fk], tcg_constant_i32(flags));
+    fn(var, cpu_env, src1, src2, tcg_constant_i32(flags));
 
     tcg_gen_st8_tl(var, cpu_env, offsetof(CPULoongArchState, cf[a->cd]));
     return true;
diff --git a/target/loongarch/insn_trans/trans_fmemory.c.inc b/target/loongarch/insn_trans/trans_fmemory.c.inc
index 0d11843873..91c09fb6d9 100644
--- a/target/loongarch/insn_trans/trans_fmemory.c.inc
+++ b/target/loongarch/insn_trans/trans_fmemory.c.inc
@@ -13,6 +13,7 @@ static void maybe_nanbox_load(TCGv freg, MemOp mop)
 static bool gen_fload_i(DisasContext *ctx, arg_fr_i *a, MemOp mop)
 {
     TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv dest = get_fpr(ctx, a->fd);
 
     CHECK_FPE;
 
@@ -22,8 +23,9 @@ static bool gen_fload_i(DisasContext *ctx, arg_fr_i *a, MemOp mop)
         addr = temp;
     }
 
-    tcg_gen_qemu_ld_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
-    maybe_nanbox_load(cpu_fpr[a->fd], mop);
+    tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
+    maybe_nanbox_load(dest, mop);
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -31,6 +33,7 @@ static bool gen_fload_i(DisasContext *ctx, arg_fr_i *a, MemOp mop)
 static bool gen_fstore_i(DisasContext *ctx, arg_fr_i *a, MemOp mop)
 {
     TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src = get_fpr(ctx, a->fd);
 
     CHECK_FPE;
 
@@ -40,7 +43,8 @@ static bool gen_fstore_i(DisasContext *ctx, arg_fr_i *a, MemOp mop)
         addr = temp;
     }
 
-    tcg_gen_qemu_st_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
+    tcg_gen_qemu_st_tl(src, addr, ctx->mem_idx, mop);
+
     return true;
 }
 
@@ -48,14 +52,16 @@ static bool gen_floadx(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv dest = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
 
     addr = tcg_temp_new();
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_ld_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
-    maybe_nanbox_load(cpu_fpr[a->fd], mop);
+    tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
+    maybe_nanbox_load(dest, mop);
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -64,13 +70,14 @@ static bool gen_fstorex(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv src3 = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
 
     addr = tcg_temp_new();
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_st_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
+    tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
     return true;
 }
@@ -79,6 +86,7 @@ static bool gen_fload_gt(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv dest = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
@@ -86,8 +94,9 @@ static bool gen_fload_gt(DisasContext *ctx, arg_frr *a, MemOp mop)
     addr = tcg_temp_new();
     gen_helper_asrtgt_d(cpu_env, src1, src2);
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_ld_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
-    maybe_nanbox_load(cpu_fpr[a->fd], mop);
+    tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
+    maybe_nanbox_load(dest, mop);
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -96,6 +105,7 @@ static bool gen_fstore_gt(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv src3 = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
@@ -103,7 +113,7 @@ static bool gen_fstore_gt(DisasContext *ctx, arg_frr *a, MemOp mop)
     addr = tcg_temp_new();
     gen_helper_asrtgt_d(cpu_env, src1, src2);
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_st_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
+    tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
     return true;
 }
@@ -112,6 +122,7 @@ static bool gen_fload_le(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv dest = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
@@ -119,8 +130,9 @@ static bool gen_fload_le(DisasContext *ctx, arg_frr *a, MemOp mop)
     addr = tcg_temp_new();
     gen_helper_asrtle_d(cpu_env, src1, src2);
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_ld_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
-    maybe_nanbox_load(cpu_fpr[a->fd], mop);
+    tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
+    maybe_nanbox_load(dest, mop);
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -129,6 +141,7 @@ static bool gen_fstore_le(DisasContext *ctx, arg_frr *a, MemOp mop)
 {
     TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
     TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv src3 = get_fpr(ctx, a->fd);
     TCGv addr;
 
     CHECK_FPE;
@@ -136,7 +149,7 @@ static bool gen_fstore_le(DisasContext *ctx, arg_frr *a, MemOp mop)
     addr = tcg_temp_new();
     gen_helper_asrtle_d(cpu_env, src1, src2);
     tcg_gen_add_tl(addr, src1, src2);
-    tcg_gen_qemu_st_tl(cpu_fpr[a->fd], addr, ctx->mem_idx, mop);
+    tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
     return true;
 }
diff --git a/target/loongarch/insn_trans/trans_fmov.c.inc b/target/loongarch/insn_trans/trans_fmov.c.inc
index 069c941665..5af0dd1b66 100644
--- a/target/loongarch/insn_trans/trans_fmov.c.inc
+++ b/target/loongarch/insn_trans/trans_fmov.c.inc
@@ -10,14 +10,17 @@ static const uint32_t fcsr_mask[4] = {
 static bool trans_fsel(DisasContext *ctx, arg_fsel *a)
 {
     TCGv zero = tcg_constant_tl(0);
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src1 = get_fpr(ctx, a->fj);
+    TCGv src2 = get_fpr(ctx, a->fk);
     TCGv cond;
 
     CHECK_FPE;
 
     cond = tcg_temp_new();
     tcg_gen_ld8u_tl(cond, cpu_env, offsetof(CPULoongArchState, cf[a->ca]));
-    tcg_gen_movcond_tl(TCG_COND_EQ, cpu_fpr[a->fd], cond, zero,
-                       cpu_fpr[a->fj], cpu_fpr[a->fk]);
+    tcg_gen_movcond_tl(TCG_COND_EQ, dest, cond, zero, src1, src2);
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -25,15 +28,16 @@ static bool trans_fsel(DisasContext *ctx, arg_fsel *a)
 static bool gen_f2f(DisasContext *ctx, arg_ff *a,
                     void (*func)(TCGv, TCGv), bool nanbox)
 {
-    TCGv dest = cpu_fpr[a->fd];
-    TCGv src = cpu_fpr[a->fj];
+    TCGv dest = get_fpr(ctx, a->fd);
+    TCGv src = get_fpr(ctx, a->fj);
 
     CHECK_FPE;
 
     func(dest, src);
     if (nanbox) {
-        gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+        gen_nanbox_s(dest, dest);
     }
+    set_fpr(a->fd, dest);
 
     return true;
 }
@@ -42,10 +46,13 @@ static bool gen_r2f(DisasContext *ctx, arg_fr *a,
                     void (*func)(TCGv, TCGv))
 {
     TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv dest = get_fpr(ctx, a->fd);
 
     CHECK_FPE;
 
-    func(cpu_fpr[a->fd], src);
+    func(dest, src);
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
@@ -53,10 +60,11 @@ static bool gen_f2r(DisasContext *ctx, arg_rf *a,
                     void (*func)(TCGv, TCGv))
 {
     TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src = get_fpr(ctx, a->fj);
 
     CHECK_FPE;
 
-    func(dest, cpu_fpr[a->fj]);
+    func(dest, src);
     gen_set_gpr(a->rd, dest, EXT_NONE);
 
     return true;
@@ -124,11 +132,12 @@ static void gen_movfrh2gr_s(TCGv dest, TCGv src)
 static bool trans_movfr2cf(DisasContext *ctx, arg_movfr2cf *a)
 {
     TCGv t0;
+    TCGv src = get_fpr(ctx, a->fj);
 
     CHECK_FPE;
 
     t0 = tcg_temp_new();
-    tcg_gen_andi_tl(t0, cpu_fpr[a->fj], 0x1);
+    tcg_gen_andi_tl(t0, src, 0x1);
     tcg_gen_st8_tl(t0, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7]));
 
     return true;
@@ -136,10 +145,14 @@ static bool trans_movfr2cf(DisasContext *ctx, arg_movfr2cf *a)
 
 static bool trans_movcf2fr(DisasContext *ctx, arg_movcf2fr *a)
 {
+    TCGv dest = get_fpr(ctx, a->fd);
+
     CHECK_FPE;
 
-    tcg_gen_ld8u_tl(cpu_fpr[a->fd], cpu_env,
+    tcg_gen_ld8u_tl(dest, cpu_env,
                     offsetof(CPULoongArchState, cf[a->cj & 0x7]));
+    set_fpr(a->fd, dest);
+
     return true;
 }
 
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 7f2ad7f542..dbf8545c9d 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -23,7 +23,6 @@
 /* Global register indices */
 TCGv cpu_gpr[32], cpu_pc;
 static TCGv cpu_lladdr, cpu_llval;
-TCGv_i64 cpu_fpr[32];
 
 #include "exec/gen-icount.h"
 
@@ -174,6 +173,20 @@ static void gen_set_gpr(int reg_num, TCGv t, DisasExtend dst_ext)
     }
 }
 
+static TCGv get_fpr(DisasContext *ctx, int reg_num)
+{
+    TCGv t = tcg_temp_new();
+    tcg_gen_ld_i64(t, cpu_env,
+                   offsetof(CPULoongArchState, fpr[reg_num].vreg.D(0)));
+    return t;
+}
+
+static void set_fpr(int reg_num, TCGv val)
+{
+    tcg_gen_st_i64(val, cpu_env,
+                   offsetof(CPULoongArchState, fpr[reg_num].vreg.D(0)));
+}
+
 #include "decode-insns.c.inc"
 #include "insn_trans/trans_arith.c.inc"
 #include "insn_trans/trans_shift.c.inc"
@@ -268,11 +281,6 @@ void loongarch_translate_init(void)
                                         regnames[i]);
     }
 
-    for (i = 0; i < 32; i++) {
-        int off = offsetof(CPULoongArchState, fpr[i]);
-        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
-    }
-
     cpu_pc = tcg_global_mem_new(cpu_env, offsetof(CPULoongArchState, pc), "pc");
     cpu_lladdr = tcg_global_mem_new(cpu_env,
                     offsetof(CPULoongArchState, lladdr), "lladdr");
-- 
2.31.1




* Re: [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX
  2023-03-28  3:05 ` [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX Song Gao
@ 2023-03-28 19:33   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:33 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/cpu.c | 1 +
>   1 file changed, 1 insertion(+)

This patch should be sorted last, once the entire extension is present.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>





* Re: [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX
  2023-03-28  3:05 ` [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX Song Gao
@ 2023-03-28 19:35   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:35 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/insn_trans/trans_lsx.c.inc | 5 +++++
>   target/loongarch/lsx_helper.c               | 6 ++++++
>   target/loongarch/meson.build                | 1 +
>   target/loongarch/translate.c                | 1 +
>   4 files changed, 13 insertions(+)
>   create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
>   create mode 100644 target/loongarch/lsx_helper.c

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE maccro for check LSX enable
  2023-03-28  3:05 ` [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE maccro for check LSX enable Song Gao
@ 2023-03-28 19:42   ` Richard Henderson
  2023-03-29  2:28     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:42 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> --- a/target/loongarch/cpu.c
> +++ b/target/loongarch/cpu.c
> @@ -52,6 +52,7 @@ static const char * const excp_names[] = {
>       [EXCCODE_FPE] = "Floating Point Exception",
>       [EXCCODE_DBP] = "Debug breakpoint",
>       [EXCCODE_BCE] = "Bound Check Exception",
> +    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
>   };
>   
>   const char *loongarch_exception_name(int32_t exception)
> @@ -187,6 +188,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
>       case EXCCODE_FPD:
>       case EXCCODE_FPE:
>       case EXCCODE_BCE:
> +    case EXCCODE_ASXD:

Should this be EXCCODE_SXD?

From what little documentation is present in Volume 1, ASXD appears to be for a 256-bit
vector extension?


r~



* Re: [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub
  2023-03-28  3:05 ` [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub Song Gao
@ 2023-03-28 19:50   ` Richard Henderson
  2023-03-28 19:59   ` Richard Henderson
  1 sibling, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:50 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> This patch includes:
> - VADD.{B/H/W/D/Q};
> - VSUB.{B/H/W/D/Q}.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    | 23 ++++++++++++
>   target/loongarch/helper.h                   |  4 +++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 40 +++++++++++++++++++++
>   target/loongarch/insns.decode               | 22 ++++++++++++
>   target/loongarch/lsx_helper.c               | 25 +++++++++++++
>   target/loongarch/translate.c                |  7 ++++
>   6 files changed, 121 insertions(+)

I did mention that you could use tcg_gen_{add,sub}2_i64 to perform the 128-bit arithmetic 
inline.  But that could be improved later.
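
For illustration, the carry and borrow handling that a double-word add or
subtract performs on two 64-bit halves can be modelled in plain host C. This
is only a sketch of the arithmetic, not TCG code; the u128_pair type and the
add128/sub128 names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/* 128-bit add: the low half wrapped around iff the sum is below an input. */
static u128_pair add128(u128_pair a, u128_pair b)
{
    u128_pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* propagate carry */
    return r;
}

/* 128-bit subtract: a borrow is needed iff the low minuend is smaller. */
static u128_pair sub128(u128_pair a, u128_pair b)
{
    u128_pair r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);   /* propagate borrow */
    return r;
}
```

An inline expansion would apply the same two-half scheme to the low and high
doublewords of each vector register instead of calling a helper.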

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg
  2023-03-28  3:05 ` [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg Song Gao
@ 2023-03-28 19:56   ` Richard Henderson
  2023-03-29  2:28     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:56 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   linux-user/loongarch64/signal.c |  4 ++--
>   target/loongarch/cpu.c          |  2 +-
>   target/loongarch/cpu.h          | 31 +++++++++++++++++++++++++++++-
>   target/loongarch/gdbstub.c      |  4 ++--
>   target/loongarch/machine.c      | 34 ++++++++++++++++++++++++++++++++-
>   5 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
> index 7c7afb652e..bb8efb1172 100644
> --- a/linux-user/loongarch64/signal.c
> +++ b/linux-user/loongarch64/signal.c
> @@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
>   
>       fpu_ctx = (struct target_fpu_context *)(info + 1);
>       for (i = 0; i < 32; ++i) {
> -        __put_user(env->fpr[i], &fpu_ctx->regs[i]);
> +        __put_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
>       }
>       __put_user(read_fcc(env), &fpu_ctx->fcc);
>       __put_user(env->fcsr0, &fpu_ctx->fcsr);
> @@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
>           uint64_t fcc;
>   
>           for (i = 0; i < 32; ++i) {
> -            __get_user(env->fpr[i], &fpu_ctx->regs[i]);
> +            __get_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
>           }
>           __get_user(fcc, &fpu_ctx->fcc);
>           write_fcc(env, fcc);
> diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
> index 97e6579f6a..18b41221a6 100644
> --- a/target/loongarch/cpu.c
> +++ b/target/loongarch/cpu.c
> @@ -656,7 +656,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
>       /* fpr */
>       if (flags & CPU_DUMP_FPU) {
>           for (i = 0; i < 32; i++) {
> -            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
> +            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].vreg.D(0));
>               if ((i & 3) == 3) {
>                   qemu_fprintf(f, "\n");
>               }
> diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
> index e11c875188..6e5fa6a01d 100644
> --- a/target/loongarch/cpu.h
> +++ b/target/loongarch/cpu.h
> @@ -8,6 +8,7 @@
>   #ifndef LOONGARCH_CPU_H
>   #define LOONGARCH_CPU_H
>   
> +#include "qemu/int128.h"
>   #include "exec/cpu-defs.h"
>   #include "fpu/softfloat-types.h"
>   #include "hw/registerfields.h"
> @@ -241,6 +242,34 @@ FIELD(TLB_MISC, ASID, 1, 10)
>   FIELD(TLB_MISC, VPPN, 13, 35)
>   FIELD(TLB_MISC, PS, 48, 6)
>   
> +#define LSX_LEN   (128)
> +typedef union VReg {
> +    int8_t   B[LSX_LEN / 8];
> +    int16_t  H[LSX_LEN / 16];
> +    int32_t  W[LSX_LEN / 32];
> +    int64_t  D[LSX_LEN / 64];
> +    Int128   Q[LSX_LEN / 128];
> +}VReg;
> +
> +typedef union fpr_t fpr_t;
> +union fpr_t {
> +    VReg  vreg;
> +};
> +
> +#if  HOST_BIG_ENDIAN
> +#define B(x)  B[15 - (x)]
> +#define H(x)  H[7 - (x)]
> +#define W(x)  W[3 - (x)]
> +#define D(x)  D[1 - (x)]
> +#define Q(x)  Q[x]
> +#else
> +#define B(x)  B[x]
> +#define H(x)  H[x]
> +#define W(x)  W[x]
> +#define D(x)  D[x]
> +#define Q(x)  Q[x]
> +#endif

It would probably be better to move these rather generically named macros outside of cpu.h 
(e.g. internals.h).

> @@ -33,7 +33,39 @@ const VMStateDescription vmstate_loongarch_cpu = {
>   
>           VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
>           VMSTATE_UINTTL(env.pc, LoongArchCPU),
> -        VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
> +        VMSTATE_INT64(env.fpr[0].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[1].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[2].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[3].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[4].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[5].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[6].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[7].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[8].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[9].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[10].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[11].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[12].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[13].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[14].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[15].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[16].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[17].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[18].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[19].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[20].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[21].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[22].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[23].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[24].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[25].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[26].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[27].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[28].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[29].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[30].vreg.D(0), LoongArchCPU),
> +        VMSTATE_INT64(env.fpr[31].vreg.D(0), LoongArchCPU),

Do you care about migration compatibility between qemu versions?
If not, it might be easier to handle the vector registers differently.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi
  2023-03-28  3:05 ` [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi Song Gao
@ 2023-03-28 19:58   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:58 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> +    tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -(a->imm), 16, 16);

No need for parentheses around a->imm.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub
  2023-03-28  3:05 ` [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub Song Gao
  2023-03-28 19:50   ` Richard Henderson
@ 2023-03-28 19:59   ` Richard Henderson
  2023-03-29  9:59     ` gaosong
  1 sibling, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 19:59 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> +    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);

Oh, reading about ASXD and 256-bit vectors makes me wonder if it would be better to plan 
ahead and have a function, or DisasContext member, for the length of the vector.
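
A minimal sketch of that idea, assuming a hypothetical context structure: the
DisasContextSketch type, its vl field and the vec_oprsz() name are invented
here for illustration and are not part of the patch or of QEMU.

```c
#include <stdint.h>

/* Hypothetical stand-in for the translator context; only the new field. */
typedef struct DisasContextSketch {
    uint32_t vl;    /* vector length in bits: 128 for LSX */
} DisasContextSketch;

/* Operation size in bytes, for use where the patch hard-codes 16. */
static uint32_t vec_oprsz(const DisasContextSketch *ctx)
{
    return ctx->vl / 8;
}
```

With such an accessor, a call that currently passes 16, 16 would pass the
computed size twice, and a 256-bit extension would only need a different vl.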


r~



* Re: [RFC PATCH v2 07/44] target/loongarch: Implement vneg
  2023-03-28  3:05 ` [RFC PATCH v2 07/44] target/loongarch: Implement vneg Song Gao
@ 2023-03-28 20:02   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:02 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> This patch includes;
> - VNEG.{B/H/W/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    | 10 ++++++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 20 ++++++++++++++++++++
>   target/loongarch/insns.decode               |  7 +++++++
>   3 files changed, 37 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub
  2023-03-28  3:05 ` [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub Song Gao
@ 2023-03-28 20:03   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:03 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> This patch includes:
> - VSADD.{B/H/W/D}[U];
> - VSSUB.{B/H/W/D}[U].
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    | 17 +++++++++++++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 17 +++++++++++++++++
>   target/loongarch/insns.decode               | 17 +++++++++++++++++
>   3 files changed, 51 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw
  2023-03-28  3:05 ` [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw Song Gao
@ 2023-03-28 20:17   ` Richard Henderson
  2023-03-29  3:24     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:17 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> +#define DO_ODD_EVEN_S(NAME, BIT, T, E1, E2, DO_OP)                 \
> +void HELPER(NAME)(CPULoongArchState *env,                          \
> +                  uint32_t vd, uint32_t vj, uint32_t vk)           \
> +{                                                                  \
> +    int i;                                                         \
> +    VReg *Vd = &(env->fpr[vd].vreg);                               \
> +    VReg *Vj = &(env->fpr[vj].vreg);                               \
> +    VReg *Vk = &(env->fpr[vk].vreg);                               \
> +                                                                   \
> +    for (i = 0; i < LSX_LEN/BIT; i++) {                            \
> +        Vd->E1(i) = DO_OP((T)Vj->E2(2 * i + 1), (T)Vk->E2(2 * i)); \
> +    }                                                              \
> +}
...
> +#define DO_ODD_EVEN_U(NAME, BIT, TD, TS,  E1, E2, DO_OP)                     \
> +void HELPER(NAME)(CPULoongArchState *env,                                    \
> +                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
> +{                                                                            \
> +    int i;                                                                   \
> +    VReg *Vd = &(env->fpr[vd].vreg);                                         \
> +    VReg *Vj = &(env->fpr[vj].vreg);                                         \
> +    VReg *Vk = &(env->fpr[vk].vreg);                                         \
> +                                                                             \
> +    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
> +        Vd->E1(i) = DO_OP((TD)(TS)Vj->E2(2 * i + 1), (TD)(TS)Vk->E2(2 * i)); \
> +    }                                                                        \
> +}

In the first case we have one cast, in the second case we have two.  I wonder if it would 
be clearer to have both signed and unsigned members in the VReg union?  Then these two 
macros could be combined.

I also think we could use (__typeof(Vd->E1(0))) instead of passing the output type
separately.  That would appear to be less error-prone.
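
A rough sketch of how the two macros might collapse into one, under stated
assumptions: the VRegSketch union, its UB/UH/UW member names and the two
helper names below are hypothetical, the env/register-number plumbing is
dropped, and elements are indexed directly (little-endian host layout, no
B()/H() accessor macros). __typeof__ is a GNU extension.

```c
#include <stdint.h>

#define LSX_LEN 128

/* Hypothetical register union with signed and unsigned views. */
typedef union VRegSketch {
    int8_t   B[LSX_LEN / 8];
    int16_t  H[LSX_LEN / 16];
    int32_t  W[LSX_LEN / 32];
    uint8_t  UB[LSX_LEN / 8];
    uint16_t UH[LSX_LEN / 16];
    uint32_t UW[LSX_LEN / 32];
} VRegSketch;

#define DO_ADD(a, b)  ((a) + (b))

/*
 * One macro for both signednesses: the source member (E2) decides how the
 * narrow element is extended, and the destination member (E1) supplies the
 * widened type through __typeof__, so no type arguments are passed.
 */
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP)                         \
static void NAME(VRegSketch *Vd, const VRegSketch *Vj,                \
                 const VRegSketch *Vk)                                \
{                                                                     \
    typedef __typeof__(Vd->E1[0]) TD;                                 \
    for (int i = 0; i < LSX_LEN / BIT; i++) {                         \
        Vd->E1[i] = DO_OP((TD)Vj->E2[2 * i + 1], (TD)Vk->E2[2 * i]);  \
    }                                                                 \
}

DO_ODD_EVEN(vhaddw_h_b,   16, H,  B,  DO_ADD)   /* signed widening add */
DO_ODD_EVEN(vhaddw_hu_bu, 16, UH, UB, DO_ADD)   /* unsigned widening add */
```

The same byte pattern then gives different results depending only on which
members are named: 0xff plus 0x01 is 0 through B/H and 256 through UB/UH.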

All that said, the code as written is correct so,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw
  2023-03-28  3:05 ` [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw Song Gao
@ 2023-03-28 20:28   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:28 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> +static void gen_vaddwev_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
> +{
> +    TCGv_i32 t1, t2;
> +
> +    t1 = tcg_temp_new_i32();
> +    t2 = tcg_temp_new_i32();
> +    tcg_gen_shli_i32(t1, a, 16);
> +    tcg_gen_sari_i32(t1, t1, 16);
> +    tcg_gen_shli_i32(t2, b, 16);
> +    tcg_gen_sari_i32(t2, t2, 16);
> +    tcg_gen_add_i32(t, t1, t2);
> +}
> +
> +static void gen_vaddwev_d_w(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
> +{
> +    TCGv_i64 t1, t2;
> +
> +    t1 = tcg_temp_new_i64();
> +    t2 = tcg_temp_new_i64();
> +    tcg_gen_shli_i64(t1, a, 32);
> +    tcg_gen_sari_i64(t1, t1, 32);
> +    tcg_gen_shli_i64(t2, b, 32);
> +    tcg_gen_sari_i64(t2, t2, 32);
> +    tcg_gen_add_i64(t, t1, t2);
> +}

For integer code like this, use tcg_gen_ext16s_i32/tcg_gen_ext32s_i64.
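
As a host-side illustration of why a single sign-extension is equivalent to
the shift pair, the two forms can be compared in plain C. The function names
are hypothetical, and the sketch assumes arithmetic right shift of negative
values and wrapping narrowing conversions, as GCC and Clang provide.

```c
#include <stdint.h>

/* Shift pair, as in the quoted gen_vaddwev_w_h: shl 16, then sar 16. */
static int32_t sext16_by_shifts(int32_t x)
{
    int32_t up = (int32_t)((uint32_t)x << 16);
    return up >> 16;
}

/* Single extension: host analogue of a 16-bit sign-extend op. */
static int32_t sext16_by_cast(int32_t x)
{
    return (int16_t)x;
}

/* The same comparison for the 32-bit to 64-bit case. */
static int64_t sext32_by_shifts(int64_t x)
{
    int64_t up = (int64_t)((uint64_t)x << 32);
    return up >> 32;
}

static int64_t sext32_by_cast(int64_t x)
{
    return (int32_t)x;
}
```

Both forms discard the upper half and replicate the sign bit of the lower
half; the single-op form simply says so directly.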

> +static void gen_vaddwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
> +{
> +    TCGv_vec t1, t2;
> +
> +    int halfbits = 4 << vece;
> +
> +    t1 = tcg_temp_new_vec_matching(a);
> +    t2 = tcg_temp_new_vec_matching(b);
> +
> +    /* Zero-extend the even elements from a */
> +    tcg_gen_shli_vec(vece, t1, a, halfbits);
> +    tcg_gen_shri_vec(vece, t1, t1, halfbits);
> +
> +    /* Zero-extend the even elements from b */
> +    tcg_gen_shli_vec(vece, t2, b, halfbits);
> +    tcg_gen_shri_vec(vece, t2, t2, halfbits);
> +
> +    tcg_gen_add_vec(vece, t, t1, t2);
> +}

uint64_t mask = MAKE_64BIT_MASK(0, halfbits);
tcg_gen_andi_vec(vece, t1, a, mask);
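
To show that the mask and the shift pair select the same bits, here is a
scalar model of one vector element in plain C. It is a sketch only: the
function names are hypothetical, and vece is assumed to follow the usual
encoding where the element width is 8 << vece bits.

```c
#include <stdint.h>

/* Zero-extend the low half of one element with shl/shr, as in the patch. */
static uint64_t low_half_by_shifts(uint64_t elem, unsigned vece)
{
    unsigned bits = 8u << vece;        /* element width in bits */
    unsigned halfbits = 4u << vece;    /* half of the element width */
    uint64_t full = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;

    elem &= full;                              /* confine to one element */
    return ((elem << halfbits) & full) >> halfbits;
}

/* Same result with one AND against a mask of the low halfbits bits. */
static uint64_t low_half_by_mask(uint64_t elem, unsigned vece)
{
    unsigned halfbits = 4u << vece;
    uint64_t mask = (UINT64_C(1) << halfbits) - 1;

    return elem & mask;
}
```

The masked form needs one operation per input instead of two, which is the
saving the suggestion above is after.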

> +static void gen_vaddwev_w_hu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
> +{
> +    TCGv_i32 t1, t2;
> +
> +    t1 = tcg_temp_new_i32();
> +    t2 = tcg_temp_new_i32();
> +    tcg_gen_shli_i32(t1, a, 16);
> +    tcg_gen_shri_i32(t1, t1, 16);
> +    tcg_gen_shli_i32(t2, b, 16);
> +    tcg_gen_shri_i32(t2, t2, 16);
> +    tcg_gen_add_i32(t, t1, t2);
> +}

tcg_gen_ext16u_i32.


r~



* Re: [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr
  2023-03-28  3:05 ` [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr Song Gao
@ 2023-03-28 20:31   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> This patch includes:
> - VAVG.{B/H/W/D}[U];
> - VAVGR.{B/H/W/D}[U].
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  17 ++
>   target/loongarch/helper.h                   |  18 ++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 197 ++++++++++++++++++++
>   target/loongarch/insns.decode               |  17 ++
>   target/loongarch/lsx_helper.c               |  45 +++++
>   5 files changed, 294 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 12/44] target/loongarch: Implement vabsd
  2023-03-28  3:05 ` [RFC PATCH v2 12/44] target/loongarch: Implement vabsd Song Gao
@ 2023-03-28 20:32   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:32 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:05, Song Gao wrote:
> This patch includes:
> - VABSD.{B/H/W/D}[U].
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  9 ++
>   target/loongarch/helper.h                   |  9 ++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 95 +++++++++++++++++++++
>   target/loongarch/insns.decode               |  9 ++
>   target/loongarch/lsx_helper.c               | 36 ++++++++
>   5 files changed, 158 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 13/44] target/loongarch: Implement vadda
  2023-03-28  3:06 ` [RFC PATCH v2 13/44] target/loongarch: Implement vadda Song Gao
@ 2023-03-28 20:33   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:33 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VADDA.{B/H/W/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  5 ++
>   target/loongarch/helper.h                   |  5 ++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 53 +++++++++++++++++++++
>   target/loongarch/insns.decode               |  5 ++
>   target/loongarch/lsx_helper.c               | 19 ++++++++
>   5 files changed, 87 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin
  2023-03-28  3:06 ` [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin Song Gao
@ 2023-03-28 20:39   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:39 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void do_vminmax(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm,
> +                       void(*gen_vminmax_vec)(unsigned,
> +                                              TCGv_vec, TCGv_vec, TCGv_vec))
> +{
> +    TCGv_vec t1;
> +
> +    t1 = tcg_temp_new_vec_matching(t);
> +    tcg_gen_dupi_vec(vece, t1, imm);

t1 = tcg_constant_vec_matching(t, vece, imm);

> +static void gen_vmini_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    do_vminmax(vece, t, a, imm, tcg_gen_smin_vec);
> +}
> +
> +static void gen_vmini_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    do_vminmax(vece, t, a, imm, tcg_gen_umin_vec);
> +}
> +
> +static void gen_vmaxi_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    do_vminmax(vece, t, a, imm, tcg_gen_smax_vec);
> +}
> +
> +static void gen_vmaxi_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    do_vminmax(vece, t, a, imm, tcg_gen_umax_vec);
> +}

Perhaps easier to expand

     tcg_gen_umax_vec(vece, t, a, tcg_constant_vec_matching(t, vece, imm));

in each instance?


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2023-03-28  3:06 ` [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
@ 2023-03-28 20:46   ` Richard Henderson
  2023-04-06 12:09     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:46 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VMUL.{B/H/W/D};
> - VMUH.{B/H/W/D}[U];
> - VMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
> - VMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  38 ++
>   target/loongarch/helper.h                   |  36 ++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 378 ++++++++++++++++++++
>   target/loongarch/insns.decode               |  38 ++
>   target/loongarch/lsx_helper.c               | 140 ++++++++
>   5 files changed, 630 insertions(+)
> 
> diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
> index 6b0e518bfa..48e6ef5309 100644
> --- a/target/loongarch/disas.c
> +++ b/target/loongarch/disas.c
> @@ -972,3 +972,41 @@ INSN_LSX(vmini_bu,         vv_i)
>   INSN_LSX(vmini_hu,         vv_i)
>   INSN_LSX(vmini_wu,         vv_i)
>   INSN_LSX(vmini_du,         vv_i)
> +
> +INSN_LSX(vmul_b,           vvv)
> +INSN_LSX(vmul_h,           vvv)
> +INSN_LSX(vmul_w,           vvv)
> +INSN_LSX(vmul_d,           vvv)
> +INSN_LSX(vmuh_b,           vvv)
> +INSN_LSX(vmuh_h,           vvv)
> +INSN_LSX(vmuh_w,           vvv)
> +INSN_LSX(vmuh_d,           vvv)
> +INSN_LSX(vmuh_bu,          vvv)
> +INSN_LSX(vmuh_hu,          vvv)
> +INSN_LSX(vmuh_wu,          vvv)
> +INSN_LSX(vmuh_du,          vvv)
> +
> +INSN_LSX(vmulwev_h_b,      vvv)
> +INSN_LSX(vmulwev_w_h,      vvv)
> +INSN_LSX(vmulwev_d_w,      vvv)
> +INSN_LSX(vmulwev_q_d,      vvv)
> +INSN_LSX(vmulwod_h_b,      vvv)
> +INSN_LSX(vmulwod_w_h,      vvv)
> +INSN_LSX(vmulwod_d_w,      vvv)
> +INSN_LSX(vmulwod_q_d,      vvv)
> +INSN_LSX(vmulwev_h_bu,     vvv)
> +INSN_LSX(vmulwev_w_hu,     vvv)
> +INSN_LSX(vmulwev_d_wu,     vvv)
> +INSN_LSX(vmulwev_q_du,     vvv)
> +INSN_LSX(vmulwod_h_bu,     vvv)
> +INSN_LSX(vmulwod_w_hu,     vvv)
> +INSN_LSX(vmulwod_d_wu,     vvv)
> +INSN_LSX(vmulwod_q_du,     vvv)
> +INSN_LSX(vmulwev_h_bu_b,   vvv)
> +INSN_LSX(vmulwev_w_hu_h,   vvv)
> +INSN_LSX(vmulwev_d_wu_w,   vvv)
> +INSN_LSX(vmulwev_q_du_d,   vvv)
> +INSN_LSX(vmulwod_h_bu_b,   vvv)
> +INSN_LSX(vmulwod_w_hu_h,   vvv)
> +INSN_LSX(vmulwod_d_wu_w,   vvv)
> +INSN_LSX(vmulwod_q_du_d,   vvv)
> diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
> index f0fc7760bd..437b47fa78 100644
> --- a/target/loongarch/helper.h
> +++ b/target/loongarch/helper.h
> @@ -246,3 +246,39 @@ DEF_HELPER_FLAGS_4(vmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>   DEF_HELPER_FLAGS_4(vmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>   DEF_HELPER_FLAGS_4(vmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>   DEF_HELPER_FLAGS_4(vmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
> +
> +DEF_HELPER_FLAGS_4(vmuh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmuh_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(vmulwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(vmulwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(vmulwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(vmulwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
> index 4e2f1ff097..583b608cd2 100644
> --- a/target/loongarch/insn_trans/trans_lsx.c.inc
> +++ b/target/loongarch/insn_trans/trans_lsx.c.inc
> @@ -1533,3 +1533,381 @@ TRANS(vmaxi_bu, gvec_vv_i, MO_8, do_vmaxi_u)
>   TRANS(vmaxi_hu, gvec_vv_i, MO_16, do_vmaxi_u)
>   TRANS(vmaxi_wu, gvec_vv_i, MO_32, do_vmaxi_u)
>   TRANS(vmaxi_du, gvec_vv_i, MO_64, do_vmaxi_u)
> +
> +TRANS(vmul_b, gvec_vvv, MO_8, tcg_gen_gvec_mul)
> +TRANS(vmul_h, gvec_vvv, MO_16, tcg_gen_gvec_mul)
> +TRANS(vmul_w, gvec_vvv, MO_32, tcg_gen_gvec_mul)
> +TRANS(vmul_d, gvec_vvv, MO_64, tcg_gen_gvec_mul)
> +
> +static void do_vmuh_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 op[4] = {
> +        {
> +            .fno = gen_helper_vmuh_b,
> +            .vece = MO_8
> +        },
> +        {
> +            .fno = gen_helper_vmuh_h,
> +            .vece = MO_16
> +        },
> +        {
> +            .fno = gen_helper_vmuh_w,
> +            .vece = MO_32
> +        },
> +        {
> +            .fno = gen_helper_vmuh_d,
> +            .vece = MO_64
> +        },
> +    };

Could be worth integer expansion, especially for MO_32/MO_64?
Should be trivial...
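
As a rough illustration of why the MO_32 case is cheap to expand inline: the high half of a 32x32 product falls out of one widening multiply and one shift. The sketch below is a plain C model of the per-lane arithmetic only, not TCG code; the names muh_w and muh_wu are made up for this example.

```c
#include <stdint.h>

/*
 * High 32 bits of the signed 64-bit product (per-lane model of VMUH.W).
 * Assumes >> on a negative int64_t is an arithmetic shift, which is how
 * GCC and Clang implement it.
 */
static int32_t muh_w(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* High 32 bits of the unsigned 64-bit product (per-lane model of VMUH.WU). */
static uint32_t muh_wu(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

An inline expansion would presumably emit the same three steps (extend, multiply, shift) with TCG integer ops rather than calling an out-of-line helper.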

> +static void do_vmulwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
> +        };
> +    static const GVecGen3 op[4] = {
> +        {
> +            .fniv = gen_vmulwev_s,
> +            .fno = gen_helper_vmulwev_h_b,
> +            .opt_opc = vecop_list,
> +            .vece = MO_16
> +        },
> +        {
> +            .fniv = gen_vmulwev_s,
> +            .fno = gen_helper_vmulwev_w_h,
> +            .opt_opc = vecop_list,
> +            .vece = MO_32
> +        },
> +        {
> +            .fniv = gen_vmulwev_s,
> +            .fno = gen_helper_vmulwev_d_w,
> +            .opt_opc = vecop_list,
> +            .vece = MO_64
> +        },
> +        {
> +            .fno = gen_helper_vmulwev_q_d,
> +            .vece = MO_128
> +        },
> +    };

Likewise.  And MO_128 may be had via tcg_gen_muls2_i64.

> +static void do_vmulwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                         uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_mul_vec, 0
> +        };
> +    static const GVecGen3 op[4] = {
> +        {
> +            .fniv = gen_vmulwev_u,
> +            .fno = gen_helper_vmulwev_h_bu,
> +            .opt_opc = vecop_list,
> +            .vece = MO_16
> +        },
> +        {
> +            .fniv = gen_vmulwev_u,
> +            .fno = gen_helper_vmulwev_w_hu,
> +            .opt_opc = vecop_list,
> +            .vece = MO_32
> +        },
> +        {
> +            .fniv = gen_vmulwev_u,
> +            .fno = gen_helper_vmulwev_d_wu,
> +            .opt_opc = vecop_list,
> +            .vece = MO_64
> +        },
> +        {
> +            .fno = gen_helper_vmulwev_q_du,
> +            .vece = MO_128
> +        },
> +    };

tcg_gen_mulu2_i64.
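
For reference, the two ops named above yield the low and high 64-bit halves of a 64x64 product, which is exactly what the Q.D forms need. Here is a minimal, portable C model of that arithmetic, assuming nothing beyond <stdint.h>; mulu2_model and muls2_model are hypothetical names used only in this sketch.

```c
#include <stdint.h>

/* Unsigned 64x64 -> 128 multiply from four 32x32 -> 64 partial products. */
static void mulu2_model(uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Middle column; at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Signed variant: correct the unsigned high half for negative inputs. */
static void muls2_model(uint64_t *lo, uint64_t *hi, int64_t a, int64_t b)
{
    mulu2_model(lo, hi, (uint64_t)a, (uint64_t)b);
    if (a < 0) {
        *hi -= (uint64_t)b;
    }
    if (b < 0) {
        *hi -= (uint64_t)a;
    }
}
```

With the product available as two 64-bit halves, the 128-bit destination lane can be written as D(0) = lo and D(1) = hi.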


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  2023-03-28  3:06 ` [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
@ 2023-03-28 20:50   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:50 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void gen_vmadd(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
> +{
> +    TCGv_vec t1;
> +
> +    t1 = tcg_temp_new_vec_matching(t);
> +    tcg_gen_mul_vec(vece, t1, a, b);
> +    tcg_gen_add_vec(vece, t, t, t1);
> +}
> +
> +static void do_vmadd(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                     uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_mul_vec, INDEX_op_add_vec, 0
> +        };
> +    static const GVecGen3 op[4] = {
> +        {
> +            .fniv = gen_vmadd,
> +            .fno = gen_helper_vmadd_b,
> +            .load_dest = true,
> +            .opt_opc = vecop_list,
> +            .vece = MO_8
> +        },
> +        {
> +            .fniv = gen_vmadd,
> +            .fno = gen_helper_vmadd_h,
> +            .load_dest = true,
> +            .opt_opc = vecop_list,
> +            .vece = MO_16
> +        },
> +        {
> +            .fniv = gen_vmadd,
> +            .fno = gen_helper_vmadd_w,
> +            .load_dest = true,
> +            .opt_opc = vecop_list,
> +            .vece = MO_32
> +        },
> +        {
> +            .fniv = gen_vmadd,
> +            .fno = gen_helper_vmadd_d,
> +            .load_dest = true,
> +            .opt_opc = vecop_list,
> +            .vece = MO_64
> +        },
> +    };
> +
> +    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
> +}

Integer expansion?  Anyway,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod
  2023-03-28  3:06 ` [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod Song Gao
@ 2023-03-28 20:52   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-28 20:52 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VDIV.{B/H/W/D}[U];
> - VMOD.{B/H/W/D}[U].
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    | 17 +++++++++
>   target/loongarch/helper.h                   | 17 +++++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 17 +++++++++
>   target/loongarch/insns.decode               | 17 +++++++++
>   target/loongarch/lsx_helper.c               | 38 +++++++++++++++++++++
>   5 files changed, 106 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE maccro for check LSX enable
  2023-03-28 19:42   ` Richard Henderson
@ 2023-03-29  2:28     ` gaosong
  0 siblings, 0 replies; 114+ messages in thread
From: gaosong @ 2023-03-29  2:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/3/29 at 3:42 AM, Richard Henderson wrote:
> On 3/27/23 20:05, Song Gao wrote:
>> --- a/target/loongarch/cpu.c
>> +++ b/target/loongarch/cpu.c
>> @@ -52,6 +52,7 @@ static const char * const excp_names[] = {
>>       [EXCCODE_FPE] = "Floating Point Exception",
>>       [EXCCODE_DBP] = "Debug breakpoint",
>>       [EXCCODE_BCE] = "Bound Check Exception",
>> +    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
>>   };
>>     const char *loongarch_exception_name(int32_t exception)
>> @@ -187,6 +188,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
>>       case EXCCODE_FPD:
>>       case EXCCODE_FPE:
>>       case EXCCODE_BCE:
>> +    case EXCCODE_ASXD:
>
> SXD?
>
Oh, it should be SXD.
> From what little documentation is present in Volume 1, ASXD appears to 
> be for a 256-bit vector extension?
>
Yes.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg
  2023-03-28 19:56   ` Richard Henderson
@ 2023-03-29  2:28     ` gaosong
  0 siblings, 0 replies; 114+ messages in thread
From: gaosong @ 2023-03-29  2:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel



On 2023/3/29 at 3:56 AM, Richard Henderson wrote:
>> @@ -33,7 +33,39 @@ const VMStateDescription vmstate_loongarch_cpu = {
>>             VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
>>           VMSTATE_UINTTL(env.pc, LoongArchCPU),
>> -        VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
>> +        VMSTATE_INT64(env.fpr[0].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[1].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[2].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[3].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[4].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[5].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[6].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[7].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[8].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[9].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[10].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[11].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[12].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[13].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[14].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[15].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[16].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[17].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[18].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[19].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[20].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[21].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[22].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[23].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[24].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[25].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[26].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[27].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[28].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[29].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[30].vreg.D(0), LoongArchCPU),
>> +        VMSTATE_INT64(env.fpr[31].vreg.D(0), LoongArchCPU),
>
> Do you care about migration compatibility between qemu versions?
> If not, it might be easier to handle the vector registers differently.
Since our features are not yet complete (for example, the 128-bit vector
instructions, the 256-bit vector instructions, and KVM), we don't care about
this for now.

Thanks.
Song Gao


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw
  2023-03-28 20:17   ` Richard Henderson
@ 2023-03-29  3:24     ` gaosong
  2023-03-29 15:25       ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-03-29  3:24 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/3/29 at 4:17 AM, Richard Henderson wrote:
> On 3/27/23 20:05, Song Gao wrote:
>> +#define DO_ODD_EVEN_S(NAME, BIT, T, E1, E2, DO_OP)                 \
>> +void HELPER(NAME)(CPULoongArchState *env,                          \
>> +                  uint32_t vd, uint32_t vj, uint32_t vk)           \
>> +{                                                                  \
>> +    int i;                                                         \
>> +    VReg *Vd = &(env->fpr[vd].vreg);                               \
>> +    VReg *Vj = &(env->fpr[vj].vreg);                               \
>> +    VReg *Vk = &(env->fpr[vk].vreg);                               \
>> +                                                                   \
>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                            \
>> +        Vd->E1(i) = DO_OP((T)Vj->E2(2 * i + 1), (T)Vk->E2(2 * i)); \
>> +    }                                                              \
>> +}
> ...
>> +#define DO_ODD_EVEN_U(NAME, BIT, TD, TS, E1, E2, DO_OP)                      \
>> +void HELPER(NAME)(CPULoongArchState *env,                                    \
>> +                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
>> +{                                                                            \
>> +    int i;                                                                   \
>> +    VReg *Vd = &(env->fpr[vd].vreg);                                         \
>> +    VReg *Vj = &(env->fpr[vj].vreg);                                         \
>> +    VReg *Vk = &(env->fpr[vk].vreg);                                         \
>> +                                                                             \
>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                                      \
>> +        Vd->E1(i) = DO_OP((TD)(TS)Vj->E2(2 * i + 1), (TD)(TS)Vk->E2(2 * i)); \
>> +    }                                                                        \
>> +}
>
> In the first case we have one cast, in the second case we have two.  I 
> wonder if it would be clearer to have both signed and unsigned members 
> in the VReg union? 

I fully agree with this.

> Then these two macros could be combined.
>
> I also think we could make use of (__typeof(Vd->E1(0))) instead of 
> separately passing the output type?  It would appear to be less 
> error-prone.
>
I will try this on v3.

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub
  2023-03-28 19:59   ` Richard Henderson
@ 2023-03-29  9:59     ` gaosong
  2023-03-29 15:22       ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-03-29  9:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/3/29 at 3:59 AM, Richard Henderson wrote:
> On 3/27/23 20:05, Song Gao wrote:
>> +    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
>
> Oh, reading about ASXD and 256-bit vectors makes me wonder if it would 
> be better to plan ahead and have a function, or DisasContext member, 
> for the length of the vector. 

like arm:

/* Return the byte size of the "whole" vector register, VL / 8.  */
static inline int vec_full_reg_size(DisasContext *s)
{
     return s->vl;
}

What I'm confused about is: what is the difference between s->vl and
s->vec_len?

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub
  2023-03-29  9:59     ` gaosong
@ 2023-03-29 15:22       ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-29 15:22 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 3/29/23 02:59, gaosong wrote:
> 
> On 2023/3/29 at 3:59 AM, Richard Henderson wrote:
>> On 3/27/23 20:05, Song Gao wrote:
>>> +    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
>>
>> Oh, reading about ASXD and 256-bit vectors makes me wonder if it would be better to plan 
>> ahead and have a function, or DisasContext member, for the length of the vector. 
> 
> like arm:
> 
> /* Return the byte size of the "whole" vector register, VL / 8.  */
> static inline int vec_full_reg_size(DisasContext *s)
> {
>      return s->vl;
> }
> 
> What I'm confused about is what is the difference between s->vl and s->vec_len ?

The first is for aarch64 SVE.
The second is for armv5 VFP (which was removed from armv8).


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw
  2023-03-29  3:24     ` gaosong
@ 2023-03-29 15:25       ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-03-29 15:25 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 3/28/23 20:24, gaosong wrote:
>> I also think we could make use of (__typeof(Vd->E1(0))) instead of separately passing 
>> the output type?  It would appear to be less error-prone.
>>
> I will try this on v3.

Consider using local typedefs, e.g.

     typedef __typeof(Vd->E1(0)) TD;
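
Putting the two suggestions together (signed and unsigned union members, plus a local typedef), a single macro can serve both variants. This is a standalone sketch, not the patch's code: the UB/UH member names are assumptions, host-endian lane ordering is ignored, and the helpers take VReg pointers directly instead of env and register numbers.

```c
#include <stdint.h>

#define LSX_LEN 128

/* Simplified stand-in for VReg with signed and unsigned views of each lane. */
typedef union VReg {
    int8_t   B[LSX_LEN / 8];
    uint8_t  UB[LSX_LEN / 8];
    int16_t  H[LSX_LEN / 16];
    uint16_t UH[LSX_LEN / 16];
} VReg;

#define B(x)  B[x]
#define UB(x) UB[x]
#define H(x)  H[x]
#define UH(x) UH[x]

#define DO_ADD(a, b) ((a) + (b))

/*
 * The destination lane type comes from Vd->E1(0), so the widening cast
 * sign-extends or zero-extends according to the member that was chosen.
 */
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP)                          \
static void NAME(VReg *Vd, const VReg *Vj, const VReg *Vk)             \
{                                                                      \
    typedef __typeof(Vd->E1(0)) TD;                                    \
    int i;                                                             \
                                                                       \
    for (i = 0; i < LSX_LEN / BIT; i++) {                              \
        Vd->E1(i) = DO_OP((TD)Vj->E2(2 * i + 1), (TD)Vk->E2(2 * i));   \
    }                                                                  \
}

DO_ODD_EVEN(vhaddw_h_b,   16, H,  B,  DO_ADD)
DO_ODD_EVEN(vhaddw_hu_bu, 16, UH, UB, DO_ADD)
```

The signedness now lives in the choice of union member (B versus UB), so the TD/TS macro parameters and the double cast are no longer needed.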


r~



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-03-28  3:06 ` [RFC PATCH v2 18/44] target/loongarch: Implement vsat Song Gao
@ 2023-04-01  5:03   ` Richard Henderson
  2023-04-03 12:55     ` gaosong
  2023-04-19  9:31     ` Song Gao
  0 siblings, 2 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:03 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    TCGv_vec t1;
> +    int64_t max  = (1l << imm) - 1;

This needed 1ull, but better to just use

     max = MAKE_64BIT_MASK(0, imm - 1);

> +    int64_t min =  ~max;

Extra space.

> +    t1 = tcg_temp_new_vec_matching(t);
> +    tcg_gen_dupi_vec(vece, t, min);
> +    tcg_gen_smax_vec(vece, t, a, t);

Use tcg_constant_vec_matching(t, vece, min) instead of dupi.
Three instances.

> +    tcg_gen_dupi_vec(vece, t1, max);
> +    tcg_gen_smin_vec(vece, t, t, t1);
> +}
> +
> +static void do_vsat_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                      int64_t imm, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_smax_vec, INDEX_op_smin_vec, 0
> +        };
> +    static const GVecGen2i op[4] = {
> +        {
> +            .fniv = gen_vsat_s,
> +            .fnoi = gen_helper_vsat_b,
> +            .opt_opc = vecop_list,
> +            .vece = MO_8
> +        },
> +        {
> +            .fniv = gen_vsat_s,
> +            .fnoi = gen_helper_vsat_h,
> +            .opt_opc = vecop_list,
> +            .vece = MO_16
> +        },
> +        {
> +            .fniv = gen_vsat_s,
> +            .fnoi = gen_helper_vsat_w,
> +            .opt_opc = vecop_list,
> +            .vece = MO_32
> +        },
> +        {
> +            .fniv = gen_vsat_s,
> +            .fnoi = gen_helper_vsat_d,
> +            .opt_opc = vecop_list,
> +            .vece = MO_64
> +        },
> +    };
> +
> +    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);

Better to expand imm to max here, rather than both inside gen_vsat_s and the runtime 
do_vsats_*.

Likewise for the unsigned versions.
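
To spell out the two points above: 1l << imm is evaluated in long, which is only 32 bits wide on some hosts (64-bit Windows, for one), so a mask macro is safer; and once the bound is computed a single time, both the inline expansion and the helper reduce to a plain clamp. The sketch below is a per-lane C model; the macro is written the way I understand QEMU's MAKE_64BIT_MASK to be defined, sat_s and sat_u are hypothetical names, and how the mask length maps to the instruction immediate is left to the patch.

```c
#include <stdint.h>

/* Mask of 'length' set bits starting at bit 'shift'; needs 1 <= length <= 64. */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Signed saturation to [~max, max], with max precomputed by the caller. */
static int64_t sat_s(int64_t v, int64_t max)
{
    int64_t min = ~max;    /* equals -(max + 1) in two's complement */

    return v < min ? min : (v > max ? max : v);
}

/* Unsigned saturation to [0, max]. */
static uint64_t sat_u(uint64_t v, uint64_t max)
{
    return v > max ? max : v;
}
```

Passing max rather than imm to tcg_gen_gvec_2i would mean the helper receives the already-expanded bound and does not have to repeat the shift at run time.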


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 19/44] target/loongarch: Implement vexth
  2023-03-28  3:06 ` [RFC PATCH v2 19/44] target/loongarch: Implement vexth Song Gao
@ 2023-04-01  5:07   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:07 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VEXTH.{H.B/W.H/D.W/Q.D};
> - VEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  9 ++++++
>   target/loongarch/helper.h                   |  9 ++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 20 ++++++++++++
>   target/loongarch/insns.decode               |  9 ++++++
>   target/loongarch/lsx_helper.c               | 35 +++++++++++++++++++++
>   5 files changed, 82 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov
  2023-03-28  3:06 ` [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov Song Gao
@ 2023-04-01  5:11   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:11 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void gen_vsigncov(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
> +{
> +    TCGv_vec t1, t2;
> +
> +    t1 = tcg_temp_new_vec_matching(t);
> +    t2 = tcg_temp_new_vec_matching(t);
> +
> +    tcg_gen_neg_vec(vece, t1, b);
> +    tcg_gen_dupi_vec(vece, t2, 0);

tcg_constant_vec_matching.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz
  2023-03-28  3:06 ` [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
@ 2023-04-01  5:20   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:20 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
> +{
> +    VReg temp;
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    VReg *Vj = &(env->fpr[vj].vreg);
> +
> +    temp.D(0) = 0;
> +    temp.D(1) = 0;
> +    temp.H(0) = do_vmskltz_b(Vj->D(0));
> +    temp.H(0) |= (do_vmskltz_b(Vj->D(1)) << 8);
> +    Vd->D(0) = temp.D(0);
> +    Vd->D(1) = 0;
> +}

Better as uint16_t temp, instead of a full VReg.

> +static uint64_t do_vmskltz_d(int64_t val)
> +{
> +    uint64_t m = 0x8000000000000000ULL;
> +    uint64_t c =  val & m;
> +    c |= c << 63;
> +    return c >> 63;
> +}

No mask or shift left required.
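
Both remarks can be seen in a small standalone model. A logical right shift by 63 already isolates the sign bit of a 64-bit lane, and the byte variant only ever produces 16 result bits, so a uint16_t is enough to hold them. The byte collector below (msb_of_bytes) is a hypothetical stand-in, since do_vmskltz_b is not shown in the quoted hunk.

```c
#include <stdint.h>

/* Sign bit of a 64-bit lane: the logical shift alone isolates it. */
static uint64_t do_vmskltz_d(int64_t val)
{
    return (uint64_t)val >> 63;
}

/* Hypothetical collector: bit i of the result is the sign bit of byte i. */
static uint8_t msb_of_bytes(uint64_t val)
{
    uint8_t m = 0;

    for (int i = 0; i < 8; i++) {
        m |= (uint8_t)(((val >> (8 * i + 7)) & 1) << i);
    }
    return m;
}

/* Model of the VMSKLTZ.B result: 16 mask bits, no full-width temporary. */
static uint16_t vmskltz_b_mask(uint64_t lo, uint64_t hi)
{
    return (uint16_t)(msb_of_bytes(lo) | (msb_of_bytes(hi) << 8));
}
```

In the helper, the 16-bit result would then be stored to Vd->D(0) with Vd->D(1) cleared, as the original code already does.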


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions
  2023-03-28  3:06 ` [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions Song Gao
@ 2023-04-01  5:31   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void gen_vnori(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    TCGv_vec t1;
> +
> +    t1 = tcg_temp_new_vec_matching(t);
> +    tcg_gen_dupi_vec(vece, t1, imm);
> +    tcg_gen_nor_vec(vece, t, a, t1);
> +}

tcg_constant_vec_matching.

> +
> +static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
> +                       int64_t imm, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_nor_vec, 0
> +        };
> +    static const GVecGen2i op = {
> +       .fniv = gen_vnori,
> +       .fnoi = gen_helper_vnori_b,
> +       .opt_opc = vecop_list,
> +       .vece = MO_8
> +    };
> +
> +    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op);
> +}

Should implement .fni8.
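
As I understand it, .fni8 is the hook gvec uses to expand the operation on 64-bit integers when host vector ops are not available. For a byte-wise NOR with an immediate, that amounts to replicating the 8-bit immediate across the word and applying one NOR. The following is a plain C model of that computation; dup_b and vnori_b_i64 are names invented for this sketch.

```c
#include <stdint.h>

/* Replicate an 8-bit value into every byte of a 64-bit word. */
static uint64_t dup_b(uint8_t imm)
{
    return 0x0101010101010101ULL * imm;
}

/* Eight VNORI.B lanes at once, computed on a single 64-bit integer. */
static uint64_t vnori_b_i64(uint64_t a, uint8_t imm)
{
    return ~(a | dup_b(imm));
}
```

Because NOR is purely bitwise, no per-lane carry handling is needed, which is what makes the 64-bit integer path straightforward here.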


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr
  2023-03-28  3:06 ` [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
@ 2023-04-01  5:38   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:38 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VSLL[I].{B/H/W/D};
> - VSRL[I].{B/H/W/D};
> - VSRA[I].{B/H/W/D};
> - VROTR[I].{B/H/W/D}.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    | 36 +++++++++++++++++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 36 +++++++++++++++++++++
>   target/loongarch/insns.decode               | 36 +++++++++++++++++++++
>   3 files changed, 108 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl
  2023-03-28  3:06 ` [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl Song Gao
@ 2023-04-01  5:40   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:40 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VSLLWIL.{H.B/W.H/D.W};
> - VSLLWIL.{HU.BU/WU.HU/DU.WU};
> - VEXTL.Q.D, VEXTL.QU.DU.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  9 +++++
>   target/loongarch/helper.h                   |  9 +++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 21 +++++++++++
>   target/loongarch/insns.decode               |  9 +++++
>   target/loongarch/lsx_helper.c               | 40 +++++++++++++++++++++
>   5 files changed, 88 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar
  2023-03-28  3:06 ` [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar Song Gao
@ 2023-04-01  5:42   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:42 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VSRLR[I].{B/H/W/D};
> - VSRAR[I].{B/H/W/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  18 ++++
>   target/loongarch/helper.h                   |  18 ++++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  18 ++++
>   target/loongarch/insns.decode               |  18 ++++
>   target/loongarch/lsx_helper.c               | 104 ++++++++++++++++++++
>   5 files changed, 176 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran
  2023-03-28  3:06 ` [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran Song Gao
@ 2023-04-01  5:46   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:46 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VSRLN.{B.H/H.W/W.D};
> - VSRAN.{B.H/H.W/W.D};
> - VSRLNI.{B.H/H.W/W.D/D.Q};
> - VSRANI.{B.H/H.W/W.D/D.Q}.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  16 +++
>   target/loongarch/helper.h                   |  16 +++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
>   target/loongarch/insns.decode               |  17 +++
>   target/loongarch/lsx_helper.c               | 118 ++++++++++++++++++++
>   5 files changed, 183 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> +    Vd->D(0) = temp.D(0);                                    \
> +    Vd->D(1) = temp.D(1);          

Oh, just noticed, but there are lots of instances of this: better written as *Vd = temp.
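
For illustration only, a minimal standalone sketch of the whole-struct copy. ToyVReg and toy_writeback are hypothetical stand-ins, not QEMU's VReg or any helper from the patch; the point is just that one struct assignment replaces the per-lane stores.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's VReg: 128 bits as two 64-bit lanes. */
typedef struct {
    uint64_t d[2];
} ToyVReg;

/* Copy a computed temporary into the destination with one assignment. */
static void toy_writeback(ToyVReg *vd, const ToyVReg *temp)
{
    *vd = *temp;    /* instead of vd->d[0] = ...; vd->d[1] = ...; */
}
```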


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn
  2023-03-28  3:06 ` [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn Song Gao
@ 2023-04-01  5:53   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-01  5:53 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +        temp.D(1) = int128_getlo(Vd->D(0));

Typo here.

You should also test a build for an i386 host, e.g.

   make docker-test-build@fedora-i386-cross


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran
  2023-03-28  3:06 ` [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran Song Gao
@ 2023-04-02  3:26   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  3:26 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VSSRLN.{B.H/H.W/W.D};
> - VSSRAN.{B.H/H.W/W.D};
> - VSSRLN.{BU.H/HU.W/WU.D};
> - VSSRAN.{BU.H/HU.W/WU.D};
> - VSSRLNI.{B.H/H.W/W.D/D.Q};
> - VSSRANI.{B.H/H.W/W.D/D.Q};
> - VSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
> - VSSRANI.{BU.H/HU.W/WU.D/DU.Q}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  30 ++
>   target/loongarch/helper.h                   |  30 ++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  30 ++
>   target/loongarch/insns.decode               |  30 ++
>   target/loongarch/lsx_helper.c               | 383 ++++++++++++++++++++
>   5 files changed, 503 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn
  2023-03-28  3:06 ` [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn Song Gao
@ 2023-04-02  3:31   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  3:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +#define SSRLRNS(E1, E2, T1, T2, T3)                \
> +static T1 do_ssrlrns_ ## E1(T2 e2, int sa, int sh) \
> +{                                                  \
> +    T1 shft_res;                                   \
> +                                                   \
> +    shft_res = do_vsrlr_ ## E2(e2, sa);            \
> +    T1 mask;                                       \
> +    mask = (1ul << sh) -1;                         \

I've probably missed other instances in review, but "ul" and "l" are *always* incorrect.

For 32-bit hosts, this is not wide enough.
If it were, "u" or no suffix would have been sufficient.

For uses like this, MAKE_64BIT_MASK(0, sh) is what you want.
For other kinds of uses, "ull" or "ll" is correct.
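
A small standalone sketch of the two cases, for illustration. TOY_MAKE_64BIT_MASK is a local copy that is assumed to match the macro in qemu/bitops.h; low_mask and single_bit are hypothetical names. On a host where "unsigned long" is 32 bits, the "ul" forms of these would be truncated for sh or bit >= 32.

```c
#include <stdint.h>

/* Local copy of the mask macro, assumed to match qemu/bitops.h. */
#define TOY_MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Low `sh` bits set; only meaningful for 1 <= sh <= 64. */
static uint64_t low_mask(int sh)
{
    return TOY_MAKE_64BIT_MASK(0, sh);
}

/* A single bit at position `bit` (0..63); "ull" is at least 64 bits wide. */
static uint64_t single_bit(int bit)
{
    return 1ull << bit;
}
```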


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz
  2023-03-28  3:06 ` [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz Song Gao
@ 2023-04-02  3:34   ` Richard Henderson
  2023-04-07  7:40     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  3:34 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +#define DO_CLO_B(N)  (clz32((uint8_t)~N) - 24)
> +#define DO_CLO_H(N)  (clz32((uint16_t)~N) - 16)

I think this is wrong.  You *want* the high bits to be set, so that they are ones, and 
included in the count, which you then subtract off.  You want the "real" count to start 
after the 24th leading 1.


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt
  2023-03-28  3:06 ` [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt Song Gao
@ 2023-04-02  3:35   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  3:35 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static uint64_t do_vpcnt(uint64_t u1)
> +{
> +    u1 = (u1 & 0x5555555555555555ULL) + ((u1 >>  1) & 0x5555555555555555ULL);
> +    u1 = (u1 & 0x3333333333333333ULL) + ((u1 >>  2) & 0x3333333333333333ULL);
> +    u1 = (u1 & 0x0F0F0F0F0F0F0F0FULL) + ((u1 >>  4) & 0x0F0F0F0F0F0F0F0FULL);
> +    u1 = (u1 & 0x00FF00FF00FF00FFULL) + ((u1 >>  8) & 0x00FF00FF00FF00FFULL);
> +    u1 = (u1 & 0x0000FFFF0000FFFFULL) + ((u1 >> 16) & 0x0000FFFF0000FFFFULL);
> +    u1 = (u1 & 0x00000000FFFFFFFFULL) + ((u1 >> 32));
> +
> +    return u1;
> +}
> +
> +#define VPCNT(NAME, BIT, E, T)                                      \
> +void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
> +{                                                                   \
> +    int i;                                                          \
> +    VReg *Vd = &(env->fpr[vd].vreg);                                \
> +    VReg *Vj = &(env->fpr[vj].vreg);                                \
> +                                                                    \
> +    for (i = 0; i < LSX_LEN/BIT; i++)                               \
> +    {                                                               \
> +        Vd->E(i) = do_vpcnt((T)Vj->E(i));                           \
> +    }                                                               \
> +}
> +
> +VPCNT(vpcnt_b, 8, B, uint8_t)
> +VPCNT(vpcnt_h, 16, H, uint16_t)
> +VPCNT(vpcnt_w, 32, W, uint32_t)
> +VPCNT(vpcnt_d, 64, D, uint64_t)

host-utils.h has ctpop{8,16,32,64}.
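
A rough standalone sketch of what the per-element helper reduces to. toy_ctpop8 is a local stand-in for ctpop8() (assumed semantics: number of set bits in an 8-bit value), since the QEMU header is not available outside the tree; toy_vpcnt_b and the plain byte arrays are hypothetical simplifications of the VReg accessors.

```c
#include <stdint.h>

/* Stand-in for ctpop8() from QEMU's host-utils.h (assumed semantics). */
static int toy_ctpop8(uint8_t v)
{
    int n = 0;

    while (v) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Per-byte population count over a 16-byte vector. */
static void toy_vpcnt_b(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        dst[i] = toy_ctpop8(src[i]);
    }
}
```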


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev
  2023-03-28  3:06 ` [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
@ 2023-04-02  5:14   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  5:14 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +#define DO_BITCLR(a, bit) (a & ~(1ul << bit))
> +#define DO_BITSET(a, bit) (a | 1ul << bit)
> +#define DO_BITREV(a, bit) (a ^ (1ul << bit))

Again, "ul" is wrong here; use "ull".

Also, the *i versions should always be inline.
And it should be trivial to expand the non-i versions inline, with shl.
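
For reference, a minimal sketch of the three operations on one 64-bit lane with a 64-bit-wide constant. The toy_* names are hypothetical; this only shows the arithmetic, not the inline TCG expansion.

```c
#include <stdint.h>

/* Clear bit `bit` (0..63) of a. */
static uint64_t toy_bitclr(uint64_t a, unsigned bit)
{
    return a & ~(1ull << bit);
}

/* Set bit `bit` (0..63) of a. */
static uint64_t toy_bitset(uint64_t a, unsigned bit)
{
    return a | (1ull << bit);
}

/* Invert bit `bit` (0..63) of a. */
static uint64_t toy_bitrev(uint64_t a, unsigned bit)
{
    return a ^ (1ull << bit);
}
```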


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp
  2023-03-28  3:06 ` [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp Song Gao
@ 2023-04-02  5:17   ` Richard Henderson
  2023-04-03  2:27     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  5:17 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VFRSTP[I].{B/H}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  5 +++
>   target/loongarch/helper.h                   |  5 +++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  5 +++
>   target/loongarch/insns.decode               |  5 +++
>   target/loongarch/lsx_helper.c               | 41 +++++++++++++++++++++
>   5 files changed, 61 insertions(+)

This one's obscure.  Find first negative element in Vj,
store that value in Vd element indexed by Vk?

Acked-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions
  2023-03-28  3:06 ` [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions Song Gao
@ 2023-04-02  5:19   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  5:19 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VF{ADD/SUB/MUL/DIV}.{S/D};
> - VF{MADD/MSUB/NMADD/NMSUB}.{S/D};
> - VF{MAX/MIN}.{S/D};
> - VF{MAXA/MINA}.{S/D};
> - VFLOGB.{S/D};
> - VFCLASS.{S/D};
> - VF{SQRT/RECIP/RSQRT}.{S/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/cpu.h                      |   4 +
>   target/loongarch/disas.c                    |  46 +++++
>   target/loongarch/fpu_helper.c               |   2 +-
>   target/loongarch/helper.h                   |  41 +++++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  55 ++++++
>   target/loongarch/insns.decode               |  43 +++++
>   target/loongarch/internals.h                |   1 +
>   target/loongarch/lsx_helper.c               | 187 ++++++++++++++++++++
>   8 files changed, 378 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions
  2023-03-28  3:06 ` [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
@ 2023-04-02  5:22   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  5:22 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VFCVT{L/H}.{S.H/D.S};
> - VFCVT.{H.S/S.D};
> - VFRINT[{RNE/RZ/RP/RM}].{S/D};
> - VFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
> - VFTINT[RZ].{WU.S/LU.D};
> - VFTINT[{RNE/RZ/RP/RM}].W.D;
> - VFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
> - VFFINT.{S.W/D.L}[U];
> - VFFINT.S.L, VFFINT{L/H}.D.W.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   fpu/softfloat.c                             |  55 +++
>   include/fpu/softfloat.h                     |  27 ++
>   target/loongarch/disas.c                    |  56 +++
>   target/loongarch/helper.h                   |  56 +++
>   target/loongarch/insn_trans/trans_lsx.c.inc |  56 +++
>   target/loongarch/insns.decode               |  56 +++
>   target/loongarch/lsx_helper.c               | 369 ++++++++++++++++++++
>   7 files changed, 675 insertions(+)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index c7454c3eb1..79975c6b01 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -2988,6 +2988,25 @@ float64 float64_round_to_int(float64 a, float_status *s)
>       return float64_round_pack_canonical(&p, s);
>   }
>   
> +#define FRINT_RM(rm, rmode, bits)                             \
> +float ## bits float ## bits ## _round_to_int_ ## rm(          \
> +                         float ## bits a, float_status *s)    \
> +{                                                             \
> +    FloatParts64 pa;   \
> +    float ## bits ## _unpack_canonical(&pa, a, s); \
> +    parts_round_to_int(&pa, rmode, 0, s, &float64_params);    \
> +    return float ## bits ## _round_pack_canonical(&pa, s);    \
> +}
> +FRINT_RM(rne, float_round_nearest_even, 32)
> +FRINT_RM(rm,  float_round_down,         32)
> +FRINT_RM(rp,  float_round_up,           32)
> +FRINT_RM(rz,  float_round_to_zero,      32)
> +FRINT_RM(rne, float_round_nearest_even, 64)
> +FRINT_RM(rm,  float_round_down,         64)
> +FRINT_RM(rp,  float_round_up,           64)
> +FRINT_RM(rz,  float_round_to_zero,      64)
> +#undef FRINT_RM


No, you should simply swap your float_status rounding mode around the operation.
See the arm/tcg gen_set_rmode function.
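
To make the pattern concrete, a toy model (not the softfloat API): ToyStatus stands in for float_status, and a fixed-point value with 8 fractional bits stands in for a float. One generic rounding routine reads the mode from the status, and the wrapper saves, sets, and restores it around the call.

```c
#include <stdint.h>

/* Toy stand-ins for softfloat's float_status and rounding modes. */
typedef enum { TOY_ROUND_DOWN, TOY_ROUND_UP, TOY_ROUND_TO_ZERO } ToyRound;
typedef struct { ToyRound mode; } ToyStatus;

/* Round a fixed-point value (8 fractional bits) using the mode in *s. */
static int32_t toy_round_to_int(int32_t q8, const ToyStatus *s)
{
    int32_t whole = q8 / 256;   /* C division truncates toward zero */
    int32_t rem = q8 % 256;

    switch (s->mode) {
    case TOY_ROUND_DOWN:
        return whole - (rem < 0);
    case TOY_ROUND_UP:
        return whole + (rem > 0);
    default:
        return whole;
    }
}

/* Swap the requested mode in around the generic operation, then restore. */
static int32_t toy_round_with_mode(int32_t q8, ToyRound mode, ToyStatus *s)
{
    ToyRound old = s->mode;
    int32_t r;

    s->mode = mode;
    r = toy_round_to_int(q8, s);
    s->mode = old;
    return r;
}
```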


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt
  2023-03-28  3:06 ` [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt Song Gao
@ 2023-04-02  5:27   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-02  5:27 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond,
> +                   void (*func)(TCGCond, unsigned, uint32_t, uint32_t,
> +                                uint32_t, uint32_t, uint32_t))
> +{
> +    uint32_t vd_ofs, vj_ofs, vk_ofs;
> +
> +    CHECK_SXE;
> +
> +    vd_ofs = vreg_full_offset(a->vd);
> +    vj_ofs = vreg_full_offset(a->vj);
> +    vk_ofs = vreg_full_offset(a->vk);
> +
> +    func(cond, mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);

You always pass tcg_gen_cmp_vec.

> +static void do_cmpi_vec(TCGCond cond,
> +                        unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
> +{
> +    TCGv_vec t1;
> +
> +    t1 = tcg_temp_new_vec_matching(t);
> +    tcg_gen_dupi_vec(vece, t1, imm);

tcg_constant_vec_matching.


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp
  2023-04-02  5:17   ` Richard Henderson
@ 2023-04-03  2:27     ` gaosong
  0 siblings, 0 replies; 114+ messages in thread
From: gaosong @ 2023-04-03  2:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/2 1:17 PM, Richard Henderson wrote:
> On 3/27/23 20:06, Song Gao wrote:
>> This patch includes:
>> - VFRSTP[I].{B/H}.
>>
>> Signed-off-by: Song Gao<gaosong@loongson.cn>
>> ---
>>   target/loongarch/disas.c                    |  5 +++
>>   target/loongarch/helper.h                   |  5 +++
>>   target/loongarch/insn_trans/trans_lsx.c.inc |  5 +++
>>   target/loongarch/insns.decode               |  5 +++
>>   target/loongarch/lsx_helper.c               | 41 +++++++++++++++++++++
>>   5 files changed, 61 insertions(+)
>
> This one's obscure.  Find first negative element in Vj,
> store that value in Vd element indexed by Vk?
>
Yes, but the value stored is the index of the first negative element, or
max index + 1 if there is none.

E.g. vfrstp.b vd, vj, vk:

     idx = 0;
     for (i = 0; i < 16; i++) {
         if (Vj->B(i) < 0) {
             break;
         }
         idx++;
     }
     m = Vk->B(0) % 16;
     Vd->B(m) = idx;
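
The description above can also be written as a standalone C model, sketched here for illustration. Plain int8_t arrays replace the VReg accessors, toy_vfrstp_b is a hypothetical name, and taking the lane index from the unsigned value of vk[0] is an assumption.

```c
#include <stdint.h>

/* Plain C model of vfrstp.b as described above; 16 signed byte lanes. */
static void toy_vfrstp_b(int8_t vd[16], const int8_t vj[16],
                         const int8_t vk[16])
{
    int idx = 0;
    int m;

    /* Index of the first negative element of vj, or 16 if there is none. */
    while (idx < 16 && vj[idx] >= 0) {
        idx++;
    }

    /* Assumption: the lane is selected by vk[0] read as unsigned, mod 16. */
    m = (uint8_t)vk[0] % 16;
    vd[m] = (int8_t)idx;
}
```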

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-01  5:03   ` Richard Henderson
@ 2023-04-03 12:55     ` gaosong
  2023-04-03 20:13       ` Richard Henderson
  2023-04-19  9:31     ` Song Gao
  1 sibling, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-03 12:55 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1730 bytes --]

Hi, Richard

On 2023/4/1 1:03 PM, Richard Henderson wrote:
> On 3/27/23 20:06, Song Gao wrote:
>> +static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, 
>> int64_t imm)
>> +{
>> +    TCGv_vec t1;
>> +    int64_t max  = (1l << imm) - 1;
>
> This needed 1ull, but better to just use
>
>     max = MAKE_64BIT_MASK(0, imm - 1); 
Should the signed version use ll?
I think MAKE_64BIT_MASK(0, imm - 1) is not suitable for the signed
version.

E.g. when imm is 1:
1ll << imm  -1    1
1ull << imm  -1   1
MAKE_64BIT_MASK   ffffffffffffffff

vsat.w  vr22, vr25, 0x1
input  vr25: {0, 0}
result vr22: {0, 0}
If we use MAKE_64BIT_MASK, the result is {ffffffffffffffff,
ffffffffffffffff}.


This is the RISU test log:

......

imm is d
1ll << imm  -1    1fff
1ull << imm  -1   1fff
MAKE_64BIT_MASK   fff
imm is 8
1ll << imm  -1    ff
1ull << imm  -1   ff
MAKE_64BIT_MASK   7f
imm is 7
1ll << imm  -1    7f
1ull << imm  -1   7f
MAKE_64BIT_MASK   3f
imm is 1d
1ll << imm  -1    1fffffff
1ull << imm  -1   1fffffff
MAKE_64BIT_MASK   fffffff
imm is 29
1ll << imm  -1    1ffffffffff
1ull << imm  -1   1ffffffffff
MAKE_64BIT_MASK   ffffffffff
imm is 6
1ll << imm  -1    3f
1ull << imm  -1   3f
MAKE_64BIT_MASK   1f
imm is 3
1ll << imm  -1    7
1ull << imm  -1   7
MAKE_64BIT_MASK   3
imm is 1
1ll << imm  -1    1
1ull << imm  -1   1
MAKE_64BIT_MASK   ffffffffffffffff
Mismatch reg after 63 checkpoints

......

mismatch detail (master : apprentice):
   f22    : 0000000000000000 vs ffffffffffffffff
   v22    : {0000000000000000, 0000000000000000} vs {ffffffffffffffff, 
ffffffffffffffff}

Thanks.
Song Gao.

[-- Attachment #2: Type: text/html, Size: 2922 bytes --]

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-03 12:55     ` gaosong
@ 2023-04-03 20:13       ` Richard Henderson
  2023-04-04  2:11         ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-03 20:13 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/3/23 05:55, gaosong wrote:
> Hi, Richard
> 
> On 2023/4/1 1:03 PM, Richard Henderson wrote:
>> On 3/27/23 20:06, Song Gao wrote:
>>> +static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
>>> +{
>>> +    TCGv_vec t1;
>>> +    int64_t max  = (1l << imm) - 1;
>>
>> This needed 1ull, but better to just use
>>
>>     max = MAKE_64BIT_MASK(0, imm - 1); 
> For the signed  version use ll?
> I think use MAKE_64BIT_MASK(0, imm -1 )  for the signed version is not suitable.

int64_t max = MAKE_64BIT_MASK(0, imm);
int64_t min = ~max; // or -1 - max




^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp
  2023-03-28  3:06 ` [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp Song Gao
@ 2023-04-04  0:47   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  0:47 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static uint64_t vfcmp_common(CPULoongArchState *env,
> +                             FloatRelation cmp, uint32_t flags)
> +{
> +    bool ret;
> +
> +    switch (cmp) {
> +    case float_relation_less:
> +        ret = (flags & FCMP_LT);
> +        break;
> +    case float_relation_equal:
> +        ret = (flags & FCMP_EQ);
> +        break;
> +    case float_relation_greater:
> +        ret = (flags & FCMP_GT);
> +        break;
> +    case float_relation_unordered:
> +        ret = (flags & FCMP_UN);
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +
> +    return ret;
> +}

Either change the return type to bool, or return {0, -1} here...

> +
> +#define VFCMP(NAME, BIT, T, E, FN)                                       \
> +void HELPER(NAME)(CPULoongArchState *env,                                \
> +                  uint32_t vd, uint32_t vj, uint32_t vk, uint32_t flags) \
> +{                                                                        \
> +    int i;                                                               \
> +    VReg t;                                                              \
> +    VReg *Vd = &(env->fpr[vd].vreg);                                     \
> +    VReg *Vj = &(env->fpr[vj].vreg);                                     \
> +    VReg *Vk = &(env->fpr[vk].vreg);                                     \
> +                                                                         \
> +    vec_clear_cause(env);                                                \
> +    for (i = 0; i < LSX_LEN/BIT ; i++) {                                 \
> +        FloatRelation cmp;                                               \
> +        cmp = FN(Vj->E(i), Vk->E(i), &env->fp_status);                   \
> +        t.E(i) = (vfcmp_common(env, cmp, flags)) ? -1 : 0;               \

... and avoid the extra conditional here.
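
Roughly like this, as a standalone sketch. The TOY_* flag values and relation codes are hypothetical placeholders, not the real FCMP_* or FloatRelation definitions; the function returns the lane mask directly, so the caller can store it without a second conditional.

```c
#include <stdint.h>

/* Hypothetical flag bits and relation codes, for illustration only. */
enum { TOY_FCMP_LT = 1, TOY_FCMP_EQ = 2, TOY_FCMP_UN = 4, TOY_FCMP_GT = 8 };
typedef enum {
    TOY_REL_LESS,
    TOY_REL_EQUAL,
    TOY_REL_GREATER,
    TOY_REL_UNORDERED
} ToyRelation;

/* Return all-ones (-1) if `flags` selects the relation `cmp`, else 0. */
static int64_t toy_vfcmp_common(ToyRelation cmp, uint32_t flags)
{
    uint32_t bit;

    switch (cmp) {
    case TOY_REL_LESS:
        bit = TOY_FCMP_LT;
        break;
    case TOY_REL_EQUAL:
        bit = TOY_FCMP_EQ;
        break;
    case TOY_REL_GREATER:
        bit = TOY_FCMP_GT;
        break;
    case TOY_REL_UNORDERED:
        bit = TOY_FCMP_UN;
        break;
    default:
        return 0;
    }
    return -(int64_t)((flags & bit) != 0);
}
```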

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-03-28  3:06 ` [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset Song Gao
@ 2023-04-04  1:03   ` Richard Henderson
  2023-04-11 11:37     ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  1:03 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static void gen_vbitseli(unsigned vece, TCGv_vec a, TCGv_vec b, int64_t imm)
> +{
> +    TCGv_vec t;
> +
> +    t = tcg_temp_new_vec_matching(a);
> +    tcg_gen_dupi_vec(vece, t, imm);

tcg_constant_vec_matching.

> +void HELPER(vseteqz_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
> +{
> +    VReg *Vj = &(env->fpr[vj].vreg);
> +    env->cf[cd & 0x7] = (Vj->Q(0) == 0);
> +}
> +
> +void HELPER(vsetnez_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
> +{
> +    VReg *Vj = &(env->fpr[vj].vreg);
> +    env->cf[cd & 0x7] = (Vj->Q(0) != 0);
> +}

This is trivial inline.

> +#define SETANYEQZ(NAME, BIT, E)                                     \
> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
> +{                                                                   \
> +    int i;                                                          \
> +    bool ret = false;                                               \
> +    VReg *Vj = &(env->fpr[vj].vreg);                                \
> +                                                                    \
> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
> +        ret |= (Vj->E(i) == 0);                                     \
> +    }                                                               \
> +    env->cf[cd & 0x7] = ret;                                        \
> +}
> +SETANYEQZ(vsetanyeqz_b, 8, B)
> +SETANYEQZ(vsetanyeqz_h, 16, H)
> +SETANYEQZ(vsetanyeqz_w, 32, W)
> +SETANYEQZ(vsetanyeqz_d, 64, D)

These could be inlined, though slightly harder.
C.f. target/arm/sve_helper.c, do_match2 (your n == 0).

Anyway, leaving this as-is for now is also ok.


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  2023-03-28  3:06 ` [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
@ 2023-04-04  1:09   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  1:09 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> This patch includes:
> - VINSGR2VR.{B/H/W/D};
> - VPICKVE2GR.{B/H/W/D}[U];
> - VREPLGR2VR.{B/H/W/D}.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   target/loongarch/disas.c                    |  33 ++++++
>   target/loongarch/insn_trans/trans_lsx.c.inc | 110 ++++++++++++++++++++
>   target/loongarch/insns.decode               |  30 ++++++
>   3 files changed, 173 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick
  2023-03-28  3:06 ` [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick Song Gao
@ 2023-04-04  1:17   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  1:17 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static bool trans_vbsll_v(DisasContext *ctx, arg_vv_i *a)
> +{
> +    int ofs;
> +    TCGv_i64 desthigh, destlow, high, low, t;
> +
> +    CHECK_SXE;
> +
> +    desthigh = tcg_temp_new_i64();
> +    destlow = tcg_temp_new_i64();
> +    high = tcg_temp_new_i64();
> +    low = tcg_temp_new_i64();
> +    t = tcg_constant_i64(0);
> +
> +    tcg_gen_ld_i64(high, cpu_env,
> +                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(1)));
> +    tcg_gen_ld_i64(low, cpu_env,
> +                   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(0)));
> +
> +    ofs = ((a->imm) & 0xf) * 8;
> +    if (ofs < 64) {
> +        tcg_gen_extract2_i64(desthigh, low, high, 64 -ofs);

high is only used here, therefore the load should be delayed.

> +        tcg_gen_shli_i64(destlow, low, ofs);
> +    } else {
> +        tcg_gen_shli_i64(desthigh, low, ofs -64);
> +        tcg_gen_mov_i64(destlow, t);

Delay the allocation of destlow into the < 64 block,
then simply assign destlow = tcg_constant_i64(0) here.

Watch the spacing: "ofs - 64".

Similarly for trans_vbsrl_v.
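
As a plain C model of the data flow (not TCG code; toy_vbsll is a hypothetical name), the high input half is only read on the ofs < 64 path, and the low result is a constant zero otherwise. The explicit ofs == 0 case is needed in C only because a 64-bit shift by 64 is undefined.

```c
#include <stdint.h>

/* Shift the 128-bit value (hi:lo) left by (imm & 0xf) bytes, zero-filling. */
static void toy_vbsll(uint64_t *dlo, uint64_t *dhi,
                      uint64_t lo, uint64_t hi, unsigned imm)
{
    unsigned ofs = (imm & 0xf) * 8;

    if (ofs == 0) {
        *dhi = hi;
        *dlo = lo;
    } else if (ofs < 64) {
        /* Only this path (and the trivial one above) needs `hi`. */
        *dhi = (hi << ofs) | (lo >> (64 - ofs));
        *dlo = lo << ofs;
    } else {
        *dhi = lo << (ofs - 64);
        *dlo = 0;
    }
}
```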

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf
  2023-03-28  3:06 ` [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
@ 2023-04-04  1:31   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  1:31 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +void HELPER(vshuf_b)(CPULoongArchState *env,
> +                     uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
> +{
> +    int i, m, k;
> +    VReg temp;
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    VReg *Vj = &(env->fpr[vj].vreg);
> +    VReg *Vk = &(env->fpr[vk].vreg);
> +    VReg *Va = &(env->fpr[va].vreg);
> +
> +    m = LSX_LEN/8;
> +    for (i = 0; i < m ; i++) {
> +        k = (Va->B(i)& 0x3f) % (2 * m);

Eh?  Double masking?

> +        temp.B(i) = (Va->B(i) & 0xc0) ? 0 : k < m ? Vk->B(k) : Vj->B(k - m);

Triple masking?

I would have expected something like

     k = Va->B(i) % N;
     temp.B(i) = (k < m ? Vj : k < 2 * m ? Vk : 0);

> +#define VSHUF(NAME, BIT, E)                                                  \
> +void HELPER(NAME)(CPULoongArchState *env,                                    \
> +                  uint32_t vd, uint32_t vj, uint32_t vk)                     \
> +{                                                                            \
> +    int i, m, k;                                                             \
> +    VReg temp;                                                               \
> +    VReg *Vd = &(env->fpr[vd].vreg);                                         \
> +    VReg *Vj = &(env->fpr[vj].vreg);                                         \
> +    VReg *Vk = &(env->fpr[vk].vreg);                                         \
> +                                                                             \
> +    m = LSX_LEN/BIT;                                                         \
> +    for (i = 0; i < m; i++) {                                                \
> +        k  = (Vd->E(i) & 0x3f) % (2 * m);                                    \
> +        temp.E(i) = (Vd->E(i) & 0xc0) ? 0 : k < m ? Vk->E(k) : Vj->E(k - m); \
> +    }                                                                        \
> +    Vd->D(0) = temp.D(0);                                                    \
> +    Vd->D(1) = temp.D(1);                                                    \
> +}

Likewise.

> +#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
> +
> +#define VSHUF4I(NAME, BIT, E)                             \
> +void HELPER(NAME)(CPULoongArchState *env,                 \
> +                  uint32_t vd, uint32_t vj, uint32_t imm) \
> +{                                                         \
> +    int i;                                                \
> +    VReg temp;                                            \
> +    VReg *Vd = &(env->fpr[vd].vreg);                      \
> +    VReg *Vj = &(env->fpr[vj].vreg);                      \
> +                                                          \
> +    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
> +         temp.E(i) = Vj->E(SHF_POS(i, imm));              \
> +    }                                                     \
> +    Vd->D[0] = temp.D[0];                                 \
> +    Vd->D[1] = temp.D[1];                                 \
> +}

Merge SHF_POS unless you expect it to be used again?

> +void HELPER(vshuf4i_d)(CPULoongArchState *env,
> +                       uint32_t vd, uint32_t vj, uint32_t imm)
> +{
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    VReg *Vj = &(env->fpr[vj].vreg);
> +
> +    VReg temp;
> +    temp.D(0) = ((imm & 0x03) == 0x00) ? Vd->D(0):
> +                ((imm & 0x03) == 0x01) ? Vd->D(1):
> +                ((imm & 0x03) == 0x02) ? Vj->D(0): Vj->D(1);
> +
> +    temp.D(1) = ((imm & 0x0c) == 0x00) ? Vd->D(0):
> +                ((imm & 0x0c) == 0x04) ? Vd->D(1):
> +                ((imm & 0x0c) == 0x08) ? Vj->D(0): Vj->D(1);
> +
> +    Vd->D[0] = temp.D[0];
> +    Vd->D[1] = temp.D[1];
> +}

Perhaps

     temp.D(0) = (imm & 2 ? Vj : Vd)->D(imm & 1);
     temp.D(1) = (imm & 8 ? Vj : Vd)->D((imm >> 2) & 1);
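
A standalone sketch that puts the two forms side by side so they can be compared over all 16 selector values. ToyVReg and the toy_* functions are hypothetical stand-ins for the VReg type and the helper body.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's VReg: two 64-bit lanes. */
typedef struct {
    uint64_t d[2];
} ToyVReg;

/* Form from the patch: chained comparisons on each 2-bit selector. */
static ToyVReg toy_shuf4i_d_long(const ToyVReg *vd, const ToyVReg *vj,
                                 uint32_t imm)
{
    ToyVReg t;

    t.d[0] = ((imm & 0x03) == 0x00) ? vd->d[0] :
             ((imm & 0x03) == 0x01) ? vd->d[1] :
             ((imm & 0x03) == 0x02) ? vj->d[0] : vj->d[1];
    t.d[1] = ((imm & 0x0c) == 0x00) ? vd->d[0] :
             ((imm & 0x0c) == 0x04) ? vd->d[1] :
             ((imm & 0x0c) == 0x08) ? vj->d[0] : vj->d[1];
    return t;
}

/* Shorter form: one bit picks the register, one bit picks the lane. */
static ToyVReg toy_shuf4i_d_short(const ToyVReg *vd, const ToyVReg *vj,
                                  uint32_t imm)
{
    ToyVReg t;

    t.d[0] = (imm & 2 ? vj : vd)->d[imm & 1];
    t.d[1] = (imm & 8 ? vj : vd)->d[(imm >> 2) & 1];
    return t;
}
```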


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-03 20:13       ` Richard Henderson
@ 2023-04-04  2:11         ` gaosong
  2023-04-04  3:46           ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-04  2:11 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/4 4:13 AM, Richard Henderson wrote:
> On 4/3/23 05:55, gaosong wrote:
>> Hi, Richard
>>
>> On 2023/4/1 1:03 PM, Richard Henderson wrote:
>>> On 3/27/23 20:06, Song Gao wrote:
>>>> +static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, 
>>>> int64_t imm)
>>>> +{
>>>> +    TCGv_vec t1;
>>>> +    int64_t max  = (1l << imm) - 1;
>>>
>>> This needed 1ull, but better to just use
>>>
>>>     max = MAKE_64BIT_MASK(0, imm - 1); 
>> For the signed  version use ll?
>> I think use MAKE_64BIT_MASK(0, imm -1 )  for the signed version is 
>> not suitable.
>
> int64_t max = MAKE_64BIT_MASK(0, imm);
> int64_t min = ~max; // or -1 - max
>
The same problem occurs with imm = 0:
MAKE_64BIT_MASK(0, 0) is always 0xffffffffffffffff. :-)
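
For comparison, a minimal sketch of bounds that stay defined for imm = 0. It assumes imm is in 0..63 and that the macro shifts by (64 - length), so a length of 0 would mean a 64-bit shift by 64, which C leaves undefined. The toy_* names are hypothetical.

```c
#include <stdint.h>

/* Upper signed saturation bound, 2^imm - 1, for imm in 0..63. */
static int64_t toy_vsat_max(unsigned imm)
{
    return (int64_t)((1ull << imm) - 1);
}

/* Lower signed saturation bound, -2^imm, i.e. ~max. */
static int64_t toy_vsat_min(unsigned imm)
{
    return ~toy_vsat_max(imm);
}
```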

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 42/44] target/loongarch: Implement vld vst
  2023-03-28  3:06 ` [RFC PATCH v2 42/44] target/loongarch: Implement vld vst Song Gao
@ 2023-04-04  3:35   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  3:35 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +void HELPER(vld_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
> +{
> +    int i;
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +#if !defined(CONFIG_USER_ONLY)
> +    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, cpu_mmu_index(env, false));
> +
> +    for (i = 0; i < LSX_LEN/8; i++) {
> +        Vd->B(i) = helper_ret_ldub_mmu(env, addr + i, oi, GETPC());
> +    }
> +#else
> +    for (i = 0; i < LSX_LEN/8; i++) {
> +        Vd->B(i) = cpu_ldub_data(env, addr + i);
> +    }
> +#endif
> +}

tcg_gen_qemu_ld_i128.

> +static inline void ensure_writable_pages(CPULoongArchState *env,
> +                                         target_ulong addr,
> +                                         int mmu_idx,
> +                                         uintptr_t retaddr)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    /* FIXME: Probe the actual accesses (pass and use a size) */
> +    if (unlikely(LSX_PAGESPAN(addr))) {
> +        /* first page */
> +        probe_write(env, addr, 0, mmu_idx, retaddr);
> +        /* second page */
> +        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
> +        probe_write(env, addr, 0, mmu_idx, retaddr);
> +    }
> +#endif
> +}

Won't be needed with...

> +void HELPER(vst_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
> +{
> +    int i;
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    int mmu_idx = cpu_mmu_index(env, false);
> +
> +    ensure_writable_pages(env, addr, mmu_idx, GETPC());
> +#if !defined(CONFIG_USER_ONLY)
> +    MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, mmu_idx);
> +    for (i = 0; i < LSX_LEN/8; i++) {
> +        helper_ret_stb_mmu(env, addr + i, Vd->B(i),  oi, GETPC());
> +    }
> +#else
> +    for (i = 0; i < LSX_LEN/8; i++) {
> +        cpu_stb_data(env, addr + i, Vd->B(i));
> +    }
> +#endif
> +}

... tcg_gen_qemu_st_i128.

> +void HELPER(vldrepl_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
> +{
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    uint8_t data;
> +#if !defined(CONFIG_USER_ONLY)
> +    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
> +                                 cpu_mmu_index(env, false));
> +    data = helper_ret_ldub_mmu(env, addr, oi, GETPC());
> +#else
> +    data = cpu_ldub_data(env, addr);
> +#endif
> +    int i;
> +    for (i = 0; i < 16; i++) {
> +        Vd->B(i) = data;
> +    }
> +}

tcg_gen_qemu_ld_i64 + tcg_gen_gvec_dup_i64.

> +#define B_PAGESPAN(x) \
> +        ((((x) & ~TARGET_PAGE_MASK) + 8/8 - 1) >= TARGET_PAGE_SIZE)
> +
> +static inline void ensure_b_writable_pages(CPULoongArchState *env,
> +                                           target_ulong addr,
> +                                           int mmu_idx,
> +                                           uintptr_t retaddr)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    /* FIXME: Probe the actual accesses (pass and use a size) */
> +    if (unlikely(B_PAGESPAN(addr))) {
> +        /* first page */
> +        probe_write(env, addr, 0, mmu_idx, retaddr);
> +        /* second page */
> +        addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
> +        probe_write(env, addr, 0, mmu_idx, retaddr);
> +    }
> +#endif
> +}
> +
> +void HELPER(vstelm_b)(CPULoongArchState *env,
> +                      uint32_t vd, target_ulong addr, uint32_t sel)
> +{
> +    VReg *Vd = &(env->fpr[vd].vreg);
> +    int mmu_idx = cpu_mmu_index(env, false);
> +
> +    ensure_b_writable_pages(env, addr, mmu_idx, GETPC());
> +#if !defined(CONFIG_USER_ONLY)
> +    MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
> +                                 cpu_mmu_index(env, false));
> +    helper_ret_stb_mmu(env, addr, Vd->B(sel), oi, GETPC());
> +#else
> +    cpu_stb_data(env, addr, Vd->B(sel));
> +#endif
> +}

What are you doing here?
This is a plain integer store.


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 43/44] target/loongarch: Implement vldi
  2023-03-28  3:06 ` [RFC PATCH v2 43/44] target/loongarch: Implement vldi Song Gao
@ 2023-04-04  3:39   ` Richard Henderson
  2023-04-18  9:03     ` Song Gao
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  3:39 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> +static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
> +{
> +    int sel, vece;
> +    uint64_t value;
> +    CHECK_SXE;
> +
> +    sel = (a->imm >> 12) & 0x1;
> +
> +    if (sel) {
> +        /* VSETI.D */
> +        value = vldi_get_value(ctx, a->imm);
> +        vece = MO_64;
> +    } else {
> +       /*
> +        * VLDI.B/H/W/D
> +        *  a->imm bit [11:10] is vece.
> +        *  a->imm bit [9:0] is value;
> +        */
> +       value = ((int32_t)(a->imm << 22)) >> 22;
> +       vece = (a->imm >> 10) & 0x3;
> +    }
> +
> +    tcg_gen_gvec_dup_i64(vece, vreg_full_offset(a->vd), 16, 16,
> +                         tcg_constant_i64(value));
> +    return true;
> +}

I think you should finish this decode in insns.decode,
especially since we are using that for disassembly.
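
To make the field layout concrete, here is a small plain-C model of the sel == 0 (VLDI.B/H/W/D) split described in the patch comment. This is only a sketch: VldiImm, sext10 and vldi_split are hypothetical names, it is not the decodetree form, and the sel == 1 path through vldi_get_value is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low 10 bits of v without shifting a signed value:
 * flip the sign bit, then subtract its weight.
 */
static int32_t sext10(uint32_t v)
{
    v &= 0x3ff;
    return (int32_t)(v ^ 0x200) - 0x200;
}

/* Hypothetical container for the VLDI.B/H/W/D fields (sel == 0). */
typedef struct {
    unsigned vece;   /* imm[11:10]: 0 = byte ... 3 = doubleword */
    int32_t  value;  /* imm[9:0], sign-extended */
} VldiImm;

static VldiImm vldi_split(uint32_t imm13)
{
    VldiImm f;

    assert(((imm13 >> 12) & 1) == 0);   /* only the sel == 0 form */
    f.vece  = (imm13 >> 10) & 0x3;
    f.value = sext10(imm13);
    return f;
}
```

Doing the same split in the decoder would let the translator and the disassembler share one definition of the fields.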


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr
  2023-03-28  3:06 ` [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr Song Gao
@ 2023-04-04  3:44   ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  3:44 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 3/27/23 20:06, Song Gao wrote:
> Introduce set_fpr() and get_fpr() and remove cpu_fpr.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   .../loongarch/insn_trans/trans_farith.c.inc   | 72 +++++++++++++++----
>   target/loongarch/insn_trans/trans_fcmp.c.inc  | 12 ++--
>   .../loongarch/insn_trans/trans_fmemory.c.inc  | 37 ++++++----
>   target/loongarch/insn_trans/trans_fmov.c.inc  | 31 +++++---
>   target/loongarch/translate.c                  | 20 ++++--
>   5 files changed, 129 insertions(+), 43 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

As previously mentioned, patch 2 must be last, because without this patch you will 
generate invalid tcg.


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-04  2:11         ` gaosong
@ 2023-04-04  3:46           ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-04  3:46 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/3/23 19:11, gaosong wrote:
> 
> On 2023/4/4 at 4:13 AM, Richard Henderson wrote:
>> On 4/3/23 05:55, gaosong wrote:
>>> Hi, Richard
>>>
>>> On 2023/4/1 at 1:03 PM, Richard Henderson wrote:
>>>> On 3/27/23 20:06, Song Gao wrote:
>>>>> +static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
>>>>> +{
>>>>> +    TCGv_vec t1;
>>>>> +    int64_t max  = (1l << imm) - 1;
>>>>
>>>> This needed 1ull, but better to just use
>>>>
>>>>     max = MAKE_64BIT_MASK(0, imm - 1); 
>>> For the signed  version use ll?
>>> I think use MAKE_64BIT_MASK(0, imm -1 )  for the signed version is not suitable.
>>
>> int64_t max = MAKE_64BIT_MASK(0, imm);
>> int64_t min = ~max // or -1 - max
>>
> The same problem occurs with imm = 0:
> MAKE_64BIT_MASK(0, 0) always evaluates to 0xffffffffffffffff. :-)

Huh.  Well that's a bug.
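
In the meantime, a sketch of computing the signed bounds without the macro, assuming (as in the patch) that the signed range for a given imm is [-(2^imm), 2^imm - 1]. As far as I recall, MAKE_64BIT_MASK(shift, length) shifts ~0ULL right by (64 - length), so length == 0 means a shift by 64, which C leaves undefined. The helper names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound 2^imm - 1 for imm in [0, 63].  The shift is done on an
 * unsigned 64-bit one, so imm == 63 is well defined and imm == 0
 * never needs a shift count of 64.
 */
static int64_t sat_max_s(unsigned imm)
{
    assert(imm < 64);
    return (int64_t)((UINT64_C(1) << imm) - 1);
}

/* Lower bound: ~max, which is the same as -1 - max. */
static int64_t sat_min_s(unsigned imm)
{
    return ~sat_max_s(imm);
}

/* Clamp x into [min, max] for the given imm. */
static int64_t sat_s(int64_t x, unsigned imm)
{
    int64_t max = sat_max_s(imm);
    int64_t min = sat_min_s(imm);

    return x > max ? max : (x < min ? min : x);
}
```

With imm == 0 this gives max = 0 and min = -1, which is the case the macro gets wrong.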


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2023-03-28 20:46   ` Richard Henderson
@ 2023-04-06 12:09     ` gaosong
  2023-04-06 16:52       ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-06 12:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

On 2023/3/29 at 4:46 AM, Richard Henderson wrote:
>> +static void do_vmuh_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
>> +                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
>> +{
>> +    static const GVecGen3 op[4] = {
>> +        {
>> +            .fno = gen_helper_vmuh_b,
>> +            .vece = MO_8
>> +        },
>> +        {
>> +            .fno = gen_helper_vmuh_h,
>> +            .vece = MO_16
>> +        },
>> +        {
>> +            .fno = gen_helper_vmuh_w,
>> +            .vece = MO_32
>> +        },
>> +        {
>> +            .fno = gen_helper_vmuh_d,
>> +            .vece = MO_64
>> +        },
>> +    };
>
> Could be worth integer expansion, especially for MO_32/MO_64?
> Should be trivial...
For integer expansion, how about the following code?

static void gen_vmuh_b(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     int i;
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();

     tcg_gen_mov_i64(t, tcg_constant_i64(0));

     for (i = 0; i < 8; i++) {
         tcg_gen_shri_i64(t1, a, 8 * i);
         tcg_gen_shri_i64(t2, b, 8 * i);
         tcg_gen_ext8s_i64(t1, t1);
         tcg_gen_ext8s_i64(t2, t2);
         tcg_gen_mul_i64(t1, t1, t2);
         tcg_gen_andi_i64(t1, t1, 0xffff);
         tcg_gen_shri_i64(t1, t1, 8);
         tcg_gen_shli_i64(t1, t1, 8 * i);
         tcg_gen_or_i64(t, t, t1);
     }
}

static void gen_vmuh_h(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     int i;
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();

     tcg_gen_mov_i64(t, tcg_constant_i64(0));

     for (i = 0; i < 4; i++) {
         tcg_gen_shri_i64(t1, a, 16 * i);
         tcg_gen_shri_i64(t2, b, 16 * i);
         tcg_gen_ext16s_i64(t1, t1);
         tcg_gen_ext16s_i64(t2, t2);
         tcg_gen_mul_i64(t1, t1, t2);
         tcg_gen_andi_i64(t1, t1, 0xffffffff);
         tcg_gen_shri_i64(t1, t1, 16);
         tcg_gen_shli_i64(t1, t1, 16 * i);
         tcg_gen_or_i64(t, t, t1);
     }
}

static void gen_vmuh_w(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
{
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();
     tcg_gen_ext_i32_i64(t1, a);
     tcg_gen_ext_i32_i64(t2, b);
     tcg_gen_mul_i64(t2, t1, t2);
     tcg_gen_extrh_i64_i32(t, t2);
}

static void gen_vmuh_d(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     TCGv_i64 t1;

     t1 = tcg_temp_new_i64();
     tcg_gen_muls2_i64(t1, t, a, b);
}

static void gen_vmuh_bu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     int i;
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();

     tcg_gen_mov_i64(t, tcg_constant_i64(0));

     for (i = 0; i < 8; i++) {
         tcg_gen_shri_i64(t1, a, 8 * i);
         tcg_gen_shri_i64(t2, b, 8 * i);
         tcg_gen_ext8u_i64(t1, t1);
         tcg_gen_ext8u_i64(t2, t2);
         tcg_gen_mul_i64(t1, t1, t2);
         tcg_gen_shri_i64(t1, t1, 8);
         tcg_gen_shli_i64(t1, t1, 8 * i);
         tcg_gen_or_i64(t, t, t1);
     }
}

static void gen_vmuh_hu(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     int i;
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();

     tcg_gen_mov_i64(t, tcg_constant_i64(0));

     for (i = 0; i < 4; i++) {
         tcg_gen_shri_i64(t1, a, 16 * i);
         tcg_gen_shri_i64(t2, b, 16 * i);
         tcg_gen_ext16u_i64(t1, t1);
         tcg_gen_ext16u_i64(t2, t2);
         tcg_gen_mul_i64(t1, t1, t2);
         tcg_gen_shri_i64(t1, t1, 16);
         tcg_gen_shli_i64(t1, t1, 16 * i);
         tcg_gen_or_i64(t, t, t1);
     }
}

static void gen_vmuh_wu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
{
     TCGv_i64 t1, t2;

     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();
     tcg_gen_extu_i32_i64(t1, a);
     tcg_gen_extu_i32_i64(t2, b);
     tcg_gen_mul_i64(t2, t1, t2);
     tcg_gen_extrh_i64_i32(t, t2);
}

static void gen_vmuh_du(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
{
     TCGv_i64 t1;

     t1 = tcg_temp_new_i64();
     tcg_gen_mulu2_i64(t1, t, a, b);
}

Thanks.
Song Gao

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  2023-04-06 12:09     ` gaosong
@ 2023-04-06 16:52       ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-06 16:52 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/6/23 05:09, gaosong wrote:
> Hi, Richard
> 
> On 2023/3/29 at 4:46 AM, Richard Henderson wrote:
>>> +static void do_vmuh_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
>>> +                      uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
>>> +{
>>> +    static const GVecGen3 op[4] = {
>>> +        {
>>> +            .fno = gen_helper_vmuh_b,
>>> +            .vece = MO_8
>>> +        },
>>> +        {
>>> +            .fno = gen_helper_vmuh_h,
>>> +            .vece = MO_16
>>> +        },
>>> +        {
>>> +            .fno = gen_helper_vmuh_w,
>>> +            .vece = MO_32
>>> +        },
>>> +        {
>>> +            .fno = gen_helper_vmuh_d,
>>> +            .vece = MO_64
>>> +        },
>>> +    };
>>
>> Could be worth integer expansion, especially for MO_32/MO_64?
>> Should be trivial...
> For integer expansion.  How about the following code?

I meant just "w" and "d" -- drop the "b" and "h" inline expansion.

> static void gen_vmuh_w(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
> {
>      TCGv_i64 t1, t2;
> 
>      t1 = tcg_temp_new_i64();
>      t2 = tcg_temp_new_i64();
>      tcg_gen_ext_i32_i64(t1, a);
>      tcg_gen_ext_i32_i64(t2, b);
>      tcg_gen_mul_i64(t2, t1, t2);
>      tcg_gen_extrh_i64_i32(t, t2);
> }

TCGv_i32 discard = tcg_temp_new_i32();
tcg_gen_muls2_i32(discard, t, a, b);

> 
> static void gen_vmuh_d(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
> {
>      TCGv_i64 t1;
> 
>      t1 = tcg_temp_new_i64();
>      tcg_gen_muls2_i64(t1, t, a, b);
> }

Yes.

> static void gen_vmuh_wu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
> {
>      TCGv_i64 t1, t2;
> 
>      t1 = tcg_temp_new_i64();
>      t2 = tcg_temp_new_i64();
>      tcg_gen_extu_i32_i64(t1, a);
>      tcg_gen_extu_i32_i64(t2, b);
>      tcg_gen_mul_i64(t2, t1, t2);
>      tcg_gen_extrh_i64_i32(t, t2);
> }

tcg_gen_mulu2_i32.

> static void gen_vmuh_du(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b)
> {
>      TCGv_i64 t1;
> 
>      t1 = tcg_temp_new_i64();
>      tcg_gen_mulu2_i64(t1, t, a, b);
> }

Yes.
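
As a cross-check for the 32-bit cases, the value each element should end up with is just the high half of the widened product. A plain-C reference model (not TCG code; mulh_s32 and mulh_u32 are names I made up), assuming the compiler implements >> on negative signed values as an arithmetic shift, which is the usual behaviour:

```c
#include <stdint.h>

/* High 32 bits of the signed 32x32 -> 64 product. */
static int32_t mulh_s32(int32_t a, int32_t b)
{
    /* Relies on arithmetic right shift for negative products. */
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* High 32 bits of the unsigned 32x32 -> 64 product. */
static uint32_t mulh_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

This is the half that muls2/mulu2 return in their second output, with the first output (the low half) discarded.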


r~



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz
  2023-04-02  3:34   ` Richard Henderson
@ 2023-04-07  7:40     ` gaosong
  2023-04-07 16:46       ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-07  7:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/2 at 11:34 AM, Richard Henderson wrote:
> On 3/27/23 20:06, Song Gao wrote:
>> +#define DO_CLO_B(N)  (clz32((uint8_t)~N) - 24)
>> +#define DO_CLO_H(N)  (clz32((uint16_t)~N) - 16)
>
> I think this is wrong. 
It is weird, the result is always right. :-\
while (clz32(~N) - 24) or (clz32((uint32_t)~N) - 24) gives the wrong result.
> You *want* the high bits to be set, so that they are ones, and 
> included in the count, which you then subtract off.  You want the 
> "real" count to start after the 24th leading 1.
>
Yes.
Since we use clz32(), how about the following?

#define DO_CLO_B(N)  (clz32( ~N & 0xff) -24)
#define DO_CLO_H(N)  (clz32( ~N & 0xffff) -16)

Thanks.
Song Gao



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz
  2023-04-07  7:40     ` gaosong
@ 2023-04-07 16:46       ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-07 16:46 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/7/23 00:40, gaosong wrote:
> 
> On 2023/4/2 at 11:34 AM, Richard Henderson wrote:
>> On 3/27/23 20:06, Song Gao wrote:
>>> +#define DO_CLO_B(N)  (clz32((uint8_t)~N) - 24)
>>> +#define DO_CLO_H(N)  (clz32((uint16_t)~N) - 16)
>>
>> I think this is wrong. 
> It is weird, the result is always right. :-\
> while (clz32(~N) - 24) or (clz32((uint32_t)~N) - 24) gives the wrong result.
>> You *want* the high bits to be set, so that they are ones, and included in the count, 
>> which you then subtract off.  You want the "real" count to start after the 24th leading 1.
>>
> Yes,
> and  we use clz32(),   how about the following way?
> 
> #define DO_CLO_B(N)  (clz32( ~N & 0xff) -24)
> #define DO_CLO_H(N)  (clz32( ~N & 0xffff) -16)

Ah yes, I see.  My mistake.  Either old or new formulation is fine.
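
For anyone following along, a small standalone model showing why the two byte formulations agree: truncating ~N to 8 bits leaves 24 guaranteed leading zeros in the 32-bit value, and those are subtracted off again. clz32_model is a portable stand-in written for this sketch (returning 32 for zero, which I believe matches QEMU's clz32); the other names are illustrative too.

```c
#include <stdint.h>

/* Portable stand-in for clz32(): count leading zeros, 32 for zero. */
static int clz32_model(uint32_t v)
{
    int n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Leading ones of an 8-bit element: truncate ~n by a cast. */
static int clo8_cast(uint8_t n)
{
    return clz32_model((uint8_t)~n) - 24;
}

/* Leading ones of an 8-bit element: truncate ~n by a mask. */
static int clo8_mask(uint8_t n)
{
    return clz32_model(~n & 0xff) - 24;
}
```

Both variants zero the upper 24 bits of ~N before counting; leaving them set, as in clz32(~N) - 24, is what breaks the count.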

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-04  1:03   ` Richard Henderson
@ 2023-04-11 11:37     ` gaosong
  2023-04-12  6:53       ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-11 11:37 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/4 at 9:03 AM, Richard Henderson wrote:
>> +void HELPER(vseteqz_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
>> +{
>> +    VReg *Vj = &(env->fpr[vj].vreg);
>> +    env->cf[cd & 0x7] = (Vj->Q(0) == 0);
>> +}
>> +
>> +void HELPER(vsetnez_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
>> +{
>> +    VReg *Vj = &(env->fpr[vj].vreg);
>> +    env->cf[cd & 0x7] = (Vj->Q(0) != 0);
>> +}
>
> This is trivial inline.

e.g

static bool trans_vseteqz_v(DisasContext *ctx, arg_cv *a)
{
     TCGv_i64  t1, t2, al, ah, zero;

     al = tcg_temp_new_i64();
     ah = tcg_temp_new_i64();
     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();
     zero = tcg_constant_i64(0);

     get_vreg64(ah, a->vj, 1);
     get_vreg64(al, a->vj, 0);

     CHECK_SXE;
     tcg_gen_setcond_i64(TCG_COND_EQ, t1, al, zero);
     tcg_gen_setcond_i64(TCG_COND_EQ, t2, ah, zero);
     tcg_gen_and_i64(t1, t1, t2);
     tcg_gen_st8_tl(t1, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7]));

     return true;
}

and

static bool trans_vsetnez_v(DisasContext *ctx, arg_cv *a)
{
     TCGv_i64  t1, t2, al, ah, zero;

     al = tcg_temp_new_i64();
     ah = tcg_temp_new_i64();
     t1 = tcg_temp_new_i64();
     t2 = tcg_temp_new_i64();
     zero = tcg_constant_i64(0);

     get_vreg64(ah, a->vj, 1);
     get_vreg64(al, a->vj, 0);

     CHECK_SXE;
     tcg_gen_setcond_i64(TCG_COND_NE, t1, al, zero);
     tcg_gen_setcond_i64(TCG_COND_NE, t2, ah, zero);
     tcg_gen_or_i64(t1, t1, t2);
     tcg_gen_st8_tl(t1, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7]));

     return true;
}

>> +#define SETANYEQZ(NAME, BIT, E)                                     \
>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>> +{                                                                   \
>> +    int i;                                                          \
>> +    bool ret = false;                                               \
>> +    VReg *Vj = &(env->fpr[vj].vreg);                                \
>> +                                                                    \
>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
>> +        ret |= (Vj->E(i) == 0);                                     \
>> + } \
>> +    env->cf[cd & 0x7] = ret;                                        \
>> +}
>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>
> These could be inlined, though slightly harder.
> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>
Do you mean an inline like trans_vseteqz_v or just an inline helper 
function?

Thanks.
Song Gao
> Anyway, leaving this as-is for now is also ok. 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-11 11:37     ` gaosong
@ 2023-04-12  6:53       ` Richard Henderson
  2023-04-13  2:53         ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-12  6:53 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/11/23 13:37, gaosong wrote:
> static bool trans_vseteqz_v(DisasContext *ctx, arg_cv *a)
> {
>      TCGv_i64  t1, t2, al, ah, zero;
> 
>      al = tcg_temp_new_i64();
>      ah = tcg_temp_new_i64();
>      t1 = tcg_temp_new_i64();
>      t2 = tcg_temp_new_i64();
>      zero = tcg_constant_i64(0);
> 
>      get_vreg64(ah, a->vj, 1);
>      get_vreg64(al, a->vj, 0);
> 
>      CHECK_SXE;
>      tcg_gen_setcond_i64(TCG_COND_EQ, t1, al, zero);
>      tcg_gen_setcond_i64(TCG_COND_EQ, t2, ah, zero);
>      tcg_gen_and_i64(t1, t1, t2);

tcg_gen_or_i64(t1, al, ah);
tcg_gen_setcondi_i64(TCG_COND_EQ, t1, t1, 0);

But otherwise correct, yes.

>>> +#define SETANYEQZ(NAME, BIT, E)                                     \
>>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>> +{                                                                   \
>>> +    int i;                                                          \
>>> +    bool ret = false;                                               \
>>> +    VReg *Vj = &(env->fpr[vj].vreg);                                \
>>> +                                                                    \
>>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
>>> +        ret |= (Vj->E(i) == 0);                                     \
>>> + } \
>>> +    env->cf[cd & 0x7] = ret;                                        \
>>> +}
>>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>>
>> These could be inlined, though slightly harder.
>> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>>
> Do you mean an inline like trans_vseteqz_v or just an inline helper function?

I meant inline tcg code generation, instead of a call to a helper.
But even if we keep this in a helper, see do_match2 for avoiding the loop over bytes.
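
To spell out the idea for the byte case with n == 0: the usual "is any byte zero" bit trick tests all eight bytes of a 64-bit word at once. The following is a standalone sketch of that trick, not the arm function itself, and has_zero_byte / any_zero_byte_128 are names chosen here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if any of the eight bytes of x is zero.
 * Subtracting 1 from every byte borrows into bit 7 only for bytes that
 * were zero (or had bit 7 set already); masking with ~x removes the
 * latter, so a surviving sign bit means a zero byte exists.
 */
static bool has_zero_byte(uint64_t x)
{
    const uint64_t ones  = UINT64_C(0x0101010101010101);
    const uint64_t signs = UINT64_C(0x8080808080808080);

    return ((x - ones) & ~x & signs) != 0;
}

/* Same test over both 64-bit halves of a 128-bit vector register. */
static bool any_zero_byte_128(uint64_t lo, uint64_t hi)
{
    return has_zero_byte(lo) || has_zero_byte(hi);
}
```

Note that both constants must be full 64-bit values; if either is truncated to 32 bits, zero bytes in the upper half of a word go undetected.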


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-12  6:53       ` Richard Henderson
@ 2023-04-13  2:53         ` gaosong
  2023-04-13 10:06           ` Richard Henderson
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-13  2:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/12 at 2:53 PM, Richard Henderson wrote:
>
>>>> +#define SETANYEQZ(NAME, BIT, E) \
>>>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>>> +{                                                                   \
>>>> +    int i; \
>>>> +    bool ret = false;                                               \
>>>> +    VReg *Vj = &(env->fpr[vj].vreg); \
>>>> +                                                                    \
>>>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
>>>> +        ret |= (Vj->E(i) == 0);                                     \
>>>> + } \
>>>> +    env->cf[cd & 0x7] = ret;                                        \
>>>> +}
>>>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>>>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>>>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>>>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>>>
>>> These could be inlined, though slightly harder.
>>> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>>>
>> Do you mean an inline like trans_vseteqz_v or just an inline helper 
>> function?
>
> I meant inline tcg code generation, instead of a call to a helper.
> But even if we keep this in a helper, see do_match2 for avoiding the 
> loop over bytes. 
Ok,
e.g
#define SETANYEQZ(NAME, MO)                                             \
void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj)     \
{                                                                       \
    int i;                                                              \
    bool ret = false;                                                   \
    VReg *Vj = &(env->fpr[vj].vreg);                                    \
                                                                        \
    ret = do_match2(0, (uint64_t)Vj->D(0), (uint64_t)Vj->D(1), MO);     \
    env->cf[cd & 0x7] = ret;                                            \
}
SETANYEQZ(vsetanyeqz_b, MO_8)
SETANYEQZ(vsetanyeqz_h, MO_16)
SETANYEQZ(vsetanyeqz_w, MO_32)
SETANYEQZ(vsetanyeqz_d, MO_64)

and
vsetanyeqz.b    $fcc5  $vr11
   v11    : {edc0004d576eef5b, ec03ec0fec03ea47}
------------------
do_match2
bits is 8
m1 is ec03ec0fec03ea47
m0 is edc0004d576eef5b
ones is 1010101
sings is 80808080
cmp1 is 0
cmp0 is edc0004d576eef5b
cmp1 is ec03ec0fec03ea47
cmp0 is 10000
cmp1 is 3000100
ret is 0

But the result is not correct for vsetanyeqz.b. :-)

Thanks.
Song Gao

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-13  2:53         ` gaosong
@ 2023-04-13 10:06           ` Richard Henderson
  2023-04-14  3:22             ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: Richard Henderson @ 2023-04-13 10:06 UTC (permalink / raw)
  To: gaosong, qemu-devel

On 4/13/23 04:53, gaosong wrote:
> 
> On 2023/4/12 at 2:53 PM, Richard Henderson wrote:
>>
>>>>> +#define SETANYEQZ(NAME, BIT, E) \
>>>>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>>>> +{                                                                   \
>>>>> +    int i; \
>>>>> +    bool ret = false;                                               \
>>>>> +    VReg *Vj = &(env->fpr[vj].vreg); \
>>>>> +                                                                    \
>>>>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
>>>>> +        ret |= (Vj->E(i) == 0);                                     \
>>>>> + } \
>>>>> +    env->cf[cd & 0x7] = ret;                                        \
>>>>> +}
>>>>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>>>>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>>>>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>>>>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>>>>
>>>> These could be inlined, though slightly harder.
>>>> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>>>>
>>> Do you mean an inline like trans_vseteqz_v or just an inline helper function?
>>
>> I meant inline tcg code generation, instead of a call to a helper.
>> But even if we keep this in a helper, see do_match2 for avoiding the loop over bytes. 
> Ok,
> e.g
> #define SETANYEQZ(NAME, MO)                                  \
> void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
> {                                                                 \
>      int i;                                                                \
>      bool ret = false; \
>      VReg *Vj = &(env->fpr[vj].vreg); \
> \
>      ret = do_match2(0, (uint64_t)Vj->D(0), (uint64_t)Vj->D(1), MO);              \
>      env->cf[cd & 0x7] = ret;      \
> }
> SETANYEQZ(vsetanyeqz_b, MO_8)
> SETANYEQZ(vsetanyeqz_h, MO_16)
> SETANYEQZ(vsetanyeqz_w, MO_32)
> SETANYEQZ(vsetanyeqz_d, MO_64)
> 
> and
> vsetanyeqz.b    $fcc5  $vr11
>    v11    : {edc0004d576eef5b, ec03ec0fec03ea47}
> ------------------
> do_match2
> bits is 8
> m1 is ec03ec0fec03ea47
> m0 is edc0004d576eef5b
> ones is 1010101
> sings is 80808080
> cmp1 is 0
> cmp0 is edc0004d576eef5b
> cmp1 is ec03ec0fec03ea47
> cmp0 is 10000
> cmp1 is 3000100
> ret is 0
> 
> But the result is not correct for vsetanyeqz.b. :-)

Well, 'ones' as printed above is only 4 bytes instead of 8, similarly 'sings'.  That would 
certainly explain why it did not detect a zero in byte 5 of 'm0'.

Some problem with your conversion of that function?


r~


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-13 10:06           ` Richard Henderson
@ 2023-04-14  3:22             ` gaosong
  2023-04-14  3:47               ` gaosong
  0 siblings, 1 reply; 114+ messages in thread
From: gaosong @ 2023-04-14  3:22 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/13 at 6:06 PM, Richard Henderson wrote:
> On 4/13/23 04:53, gaosong wrote:
>>
>> On 2023/4/12 at 2:53 PM, Richard Henderson wrote:
>>>
>>>>>> +#define SETANYEQZ(NAME, BIT, E) \
>>>>>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>>>>> +{                                                                   \
>>>>>> +    int i; \
>>>>>> +    bool ret = false;                                               \
>>>>>> +    VReg *Vj = &(env->fpr[vj].vreg); \
>>>>>> +                                                                    \
>>>>>> +    for (i = 0; i < LSX_LEN/BIT; i++) {                             \
>>>>>> +        ret |= (Vj->E(i) == 0);                                     \
>>>>>> + } \
>>>>>> +    env->cf[cd & 0x7] = ret;                                        \
>>>>>> +}
>>>>>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>>>>>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>>>>>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>>>>>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>>>>>
>>>>> These could be inlined, though slightly harder.
>>>>> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>>>>>
>>>> Do you mean an inline like trans_vseteqz_v or just an inline helper 
>>>> function?
>>>
>>> I meant inline tcg code generation, instead of a call to a helper.
>>> But even if we keep this in a helper, see do_match2 for avoiding the 
>>> loop over bytes. 
>> Ok,
>> e.g
>> #define SETANYEQZ(NAME, MO)                                  \
>> void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>> {                                                               \
>>      int i;            \
>>      bool ret = false; \
>>      VReg *Vj = &(env->fpr[vj].vreg); \
>> \
>>      ret = do_match2(0, (uint64_t)Vj->D(0), (uint64_t)Vj->D(1), MO); \
>>      env->cf[cd & 0x7] = ret;      \
>> }
>> SETANYEQZ(vsetanyeqz_b, MO_8)
>> SETANYEQZ(vsetanyeqz_h, MO_16)
>> SETANYEQZ(vsetanyeqz_w, MO_32)
>> SETANYEQZ(vsetanyeqz_d, MO_64)
>>
>> and
>> vsetanyeqz.b    $fcc5  $vr11
>>    v11    : {edc0004d576eef5b, ec03ec0fec03ea47}
>> ------------------
>> do_match2
>> bits is 8
>> m1 is ec03ec0fec03ea47
>> m0 is edc0004d576eef5b
>> ones is 1010101
>> sings is 80808080
>> cmp1 is 0
>> cmp0 is edc0004d576eef5b
>> cmp1 is ec03ec0fec03ea47
>> cmp0 is 10000
>> cmp1 is 3000100
>> ret is 0
>>
>> But the result is not correct for vsetanyeqz.b. :-)
>
> Well, 'ones' as printed above is only 4 bytes instead of 8, similarly 
> 'sings'.  That would certainly explain why it did not detect a zero in 
> byte 5 of 'm0'.
>
> Some problem with your conversion of that function?
>
I copied do_match2 from arm, and my host is an x86 machine.

...
uint64_t ones = dup_const(esz, 1);   // esz = MO_8
uint64_t signs = ones << ( bits  -1 );   // bits = 8
...


dup_const() returns 0x101010101010101,
but 'ones' ends up set to 0x1010101.


Thread 1 "qemu-loongarch6" hit Breakpoint 1, helper_vsetanyeqz_b (env=0x555555a50910, cd=6, vj=3) at ../target/loongarch/lsx_helper.c:2906
2906    SETANYEQZ(vsetanyeqz_b, MO_8, B)
(gdb) s
do_match2 (n=0, m0=14467753019624114359, m1=14467753019624114359, esz=0) at ../target/loongarch/lsx_helper.c:2868
2868        uint64_t bits = 8 << esz;
(gdb) s
2869        uint64_t ones = dup_const(esz, 1);
(gdb) s
dup_const (vece=0, c=1) at ../tcg/tcg-op-gvec.c:374
374        switch (vece) {
(gdb) finish
Run till exit from #0  dup_const (vece=0, c=1) at ../tcg/tcg-op-gvec.c:374
do_match2 (n=0, m0=14467753019624114359, m1=14467753019624114359, esz=0) at ../target/loongarch/lsx_helper.c:2869
2869        uint64_t ones = dup_const(esz, 1);
Value returned is $16 = 72340172838076673
(gdb) disassemble $pc
Dump of assembler code for function do_match2:
    0x00005555555fffdf <+0>:    push   %rbp
    0x00005555555fffe0 <+1>:    mov    %rsp,%rbp
    0x00005555555fffe3 <+4>:    sub    $0x50,%rsp
    0x00005555555fffe7 <+8>:    mov    %rdi,-0x38(%rbp)
    0x00005555555fffeb <+12>:    mov    %rsi,-0x40(%rbp)
    0x00005555555fffef <+16>:    mov    %rdx,-0x48(%rbp)
    0x00005555555ffff3 <+20>:    mov    %ecx,-0x4c(%rbp)
    0x00005555555ffff6 <+23>:    mov    -0x4c(%rbp),%eax
    0x00005555555ffff9 <+26>:    mov    $0x8,%edx
    0x00005555555ffffe <+31>:    mov    %eax,%ecx
    0x0000555555600000 <+33>:    shl    %cl,%edx
    0x0000555555600002 <+35>:    mov    %edx,%eax
    0x0000555555600004 <+37>:    cltq
    0x0000555555600006 <+39>:    mov    %rax,-0x28(%rbp)
    0x000055555560000a <+43>:    mov    -0x4c(%rbp),%eax
    0x000055555560000d <+46>:    mov    $0x1,%esi
    0x0000555555600012 <+51>:    mov    %eax,%edi
    0x0000555555600014 <+53>:    mov    $0x0,%eax
    0x0000555555600019 <+58>:    callq  0x5555556342c3 <dup_const>
=> 0x000055555560001e <+63>:    cltq
    0x0000555555600020 <+65>:    mov    %rax,-0x20(%rbp)
    0x0000555555600024 <+69>:    mov    -0x28(%rbp),%rax
    0x0000555555600028 <+73>:    sub    $0x1,%eax
    0x000055555560002b <+76>:    mov    -0x20(%rbp),%rdx
    0x000055555560002f <+80>:    mov    %eax,%ecx
    0x0000555555600031 <+82>:    shl    %cl,%rdx
    0x0000555555600034 <+85>:    mov    %rdx,%rax
    0x0000555555600037 <+88>:    mov    %rax,-0x18(%rbp)
    0x000055555560003b <+92>:    lea 0x129df7(%rip),%rdi        # 
0x555555729e39
    0x0000555555600042 <+99>:    callq  0x555555583af0 <puts@plt>
    0x0000555555600047 <+104>:    mov    -0x4c(%rbp),%eax
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) p/x $rax
$17 = 0x101010101010101
(gdb) si
0x0000555555600020    2869        uint64_t ones = dup_const(esz, 1);
(gdb) p/x $rax
$18 = 0x1010101
(gdb) disassemble $pc
Dump of assembler code for function do_match2:
    0x00005555555fffdf <+0>:    push   %rbp
    0x00005555555fffe0 <+1>:    mov    %rsp,%rbp
    0x00005555555fffe3 <+4>:    sub    $0x50,%rsp
    0x00005555555fffe7 <+8>:    mov    %rdi,-0x38(%rbp)
    0x00005555555fffeb <+12>:    mov    %rsi,-0x40(%rbp)
    0x00005555555fffef <+16>:    mov    %rdx,-0x48(%rbp)
    0x00005555555ffff3 <+20>:    mov    %ecx,-0x4c(%rbp)
    0x00005555555ffff6 <+23>:    mov    -0x4c(%rbp),%eax
    0x00005555555ffff9 <+26>:    mov    $0x8,%edx
    0x00005555555ffffe <+31>:    mov    %eax,%ecx
    0x0000555555600000 <+33>:    shl    %cl,%edx
    0x0000555555600002 <+35>:    mov    %edx,%eax
    0x0000555555600004 <+37>:    cltq
    0x0000555555600006 <+39>:    mov    %rax,-0x28(%rbp)
    0x000055555560000a <+43>:    mov    -0x4c(%rbp),%eax
    0x000055555560000d <+46>:    mov    $0x1,%esi
    0x0000555555600012 <+51>:    mov    %eax,%edi
    0x0000555555600014 <+53>:    mov    $0x0,%eax
    0x0000555555600019 <+58>:    callq  0x5555556342c3 <dup_const>
    0x000055555560001e <+63>:    cltq
=> 0x0000555555600020 <+65>:    mov    %rax,-0x20(%rbp)
    0x0000555555600024 <+69>:    mov    -0x28(%rbp),%rax
    0x0000555555600028 <+73>:    sub    $0x1,%eax
    0x000055555560002b <+76>:    mov    -0x20(%rbp),%rdx
    0x000055555560002f <+80>:    mov    %eax,%ecx
    0x0000555555600031 <+82>:    shl    %cl,%rdx
    0x0000555555600034 <+85>:    mov    %rdx,%rax
    0x0000555555600037 <+88>:    mov    %rax,-0x18(%rbp)
    0x000055555560003b <+92>:    lea 0x129df7(%rip),%rdi        # 
0x555555729e39
    0x0000555555600042 <+99>:    callq  0x555555583af0 <puts@plt>
    0x0000555555600047 <+104>:    mov    -0x4c(%rbp),%eax
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) p/x ones
$19 = 0x7fffffffc850
(gdb) si
2871        uint64_t signs = ones << (bits - 1);
(gdb) p/x $rax
$20 = 0x1010101
(gdb) p/x ones
$21 = 0x1010101

After executing the 'cltq' insn, 'ones' is not what we want.

Thanks.
Song Gao
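
For readers following the gdb session above: cltq sign-extends %eax into
%rax, so only the low 32 bits of the value returned in %rax survive. A
compiler emits it after a call when it believes the callee returned a
32-bit int. The sketch below reproduces that effect in portable C. The
helper name after_cltq is hypothetical and only illustrates the
conversion; it is not taken from the QEMU source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: models what 'cltq' does to a value left in %rax.
 * The low 32 bits are reinterpreted as a signed int and sign-extended
 * back to 64 bits; the original upper 32 bits are lost.
 * (The unsigned-to-signed narrowing is implementation-defined in C, but
 * wraps as expected on mainstream two's-complement compilers.)
 */
static uint64_t after_cltq(uint64_t rax)
{
    int32_t eax = (int32_t)(uint32_t)rax;   /* keep only %eax */
    return (uint64_t)(int64_t)eax;          /* sign-extend into %rax */
}
```

With the value from the session, after_cltq(0x0101010101010101) gives
0x01010101, which matches the 'ones' printed at line 2871.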



* Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset
  2023-04-14  3:22             ` gaosong
@ 2023-04-14  3:47               ` gaosong
  0 siblings, 0 replies; 114+ messages in thread
From: gaosong @ 2023-04-14  3:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel


On 2023/4/14 11:22 AM, gaosong wrote:
>
> On 2023/4/13 6:06 PM, Richard Henderson wrote:
>> On 4/13/23 04:53, gaosong wrote:
>>>
>>> On 2023/4/12 2:53 PM, Richard Henderson wrote:
>>>>
>>>>>>> +#define SETANYEQZ(NAME, BIT, E) \
>>>>>>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>>>>>> +{ \
>>>>>>> +    int i; \
>>>>>>> +    bool ret = false; \
>>>>>>> +    VReg *Vj = &(env->fpr[vj].vreg); \
>>>>>>> + \
>>>>>>> +    for (i = 0; i < LSX_LEN/BIT; i++) { \
>>>>>>> +        ret |= (Vj->E(i) == 0); \
>>>>>>> +    } \
>>>>>>> +    env->cf[cd & 0x7] = ret; \
>>>>>>> +}
>>>>>>> +SETANYEQZ(vsetanyeqz_b, 8, B)
>>>>>>> +SETANYEQZ(vsetanyeqz_h, 16, H)
>>>>>>> +SETANYEQZ(vsetanyeqz_w, 32, W)
>>>>>>> +SETANYEQZ(vsetanyeqz_d, 64, D)
>>>>>>
>>>>>> These could be inlined, though slightly harder.
>>>>>> C.f. target/arm/sve_helper.c, do_match2 (your n == 0).
>>>>>>
>>>>> Do you mean an inline like trans_vseteqz_v or just an inline 
>>>>> helper function?
>>>>
>>>> I meant inline tcg code generation, instead of a call to a helper.
>>>> But even if we keep this in a helper, see do_match2 for avoiding 
>>>> the loop over bytes. 
>>> Ok,
>>> e.g
>>> #define SETANYEQZ(NAME, MO) \
>>> void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
>>> { \
>>>      int i; \
>>>      bool ret = false; \
>>>      VReg *Vj = &(env->fpr[vj].vreg); \
>>> \
>>>      ret = do_match2(0, (uint64_t)Vj->D(0), (uint64_t)Vj->D(1), MO); \
>>>      env->cf[cd & 0x7] = ret; \
>>> }
>>> SETANYEQZ(vsetanyeqz_b, MO_8)
>>> SETANYEQZ(vsetanyeqz_h, MO_16)
>>> SETANYEQZ(vsetanyeqz_w, MO_32)
>>> SETANYEQZ(vsetanyeqz_d, MO_64)
>>>
>>> and
>>> vsetanyeqz.b    $fcc5  $vr11
>>>    v11    : {edc0004d576eef5b, ec03ec0fec03ea47}
>>> ------------------
>>> do_match2
>>> bits is 8
>>> m1 is ec03ec0fec03ea47
>>> m0 is edc0004d576eef5b
>>> ones is 1010101
>>> sings is 80808080
>>> cmp1 is 0
>>> cmp0 is edc0004d576eef5b
>>> cmp1 is ec03ec0fec03ea47
>>> cmp0 is 10000
>>> cmp1 is 3000100
>>> ret is 0
>>>
>>> but the result is not correct for vsetanyeqz.b. :-)
>>
>> Well, 'ones' as printed above is only 4 bytes instead of 8, similarly 
>> 'sings'.  That would certainly explain why it did not detect a zero 
>> in byte 5 of 'm0'.
>>
>> Some problem with your conversion of that function?
>>
> I copied do_match2 from arm, and my host is an x86 machine.
>
> ...
> uint64_t ones = dup_const(esz, 1);    // esz = MO_8
> uint64_t signs = ones << (bits - 1);  // bits = 8
> ...
>
> dup_const() returns 0x101010101010101, but 'ones' ends up as 0x1010101.
>
>
Oh, I didn't include the 'tcg/tcg.h' header file.

Thanks.
Song gao
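
To illustrate why the missing header mattered: the truncated 'ones' seen
earlier is consistent with the compiler treating the undeclared
dup_const() as returning int, and a 4-byte 'ones'/'signs' pair cannot
see a zero in the upper half of a 64-bit lane. Below is a minimal,
self-contained sketch of the zero-byte test (the n == 0 case). The
function name any_zero_byte and the explicit 'ones'/'signs' parameters
are assumptions made for illustration; this is not the do_match2 code
itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Zero-byte test over two 64-bit halves. 'ones' and 'signs' are passed
 * in so that the effect of truncated constants can be observed.
 * With full-width constants (0x0101..01 and 0x8080..80) the result is
 * true exactly when some byte of m0 or m1 is zero.
 */
static bool any_zero_byte(uint64_t m0, uint64_t m1,
                          uint64_t ones, uint64_t signs)
{
    /* Subtracting 1 from each byte sets its top bit when the byte was
       0; masking with ~m keeps that bit only if it was clear before. */
    uint64_t cmp0 = (m0 - ones) & ~m0;
    uint64_t cmp1 = (m1 - ones) & ~m1;

    return ((cmp0 | cmp1) & signs) != 0;
}
```

Using the register contents from the test case (m0 = 0xedc0004d576eef5b,
m1 = 0xec03ec0fec03ea47), the full-width constants report a match for
the zero in byte 5 of m0, while ones = 0x01010101 and signs = 0x80808080
report none, as in the debug output.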



* Re: [RFC PATCH v2 43/44] target/loongarch: Implement vldi
  2023-04-04  3:39   ` Richard Henderson
@ 2023-04-18  9:03     ` Song Gao
  0 siblings, 0 replies; 114+ messages in thread
From: Song Gao @ 2023-04-18  9:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

On 2023/4/4 11:39 AM, Richard Henderson wrote:
> On 3/27/23 20:06, Song Gao wrote:
>> +static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
>> +{
>> +    int sel, vece;
>> +    uint64_t value;
>> +    CHECK_SXE;
>> +
>> +    sel = (a->imm >> 12) & 0x1;
>> +
>> +    if (sel) {
>> +        /* VSETI.D */
>> +        value = vldi_get_value(ctx, a->imm);
>> +        vece = MO_64;
>> +    } else {
>> +       /*
>> +        * VLDI.B/H/W/D
>> +        *  a->imm bit [11:10] is vece.
>> +        *  a->imm bit [9:0] is value;
>> +        */
>> +       value = ((int32_t)(a->imm << 22)) >> 22;
>> +       vece = (a->imm >> 10) & 0x3;
>> +    }
>> +
>> +    tcg_gen_gvec_dup_i64(vece, vreg_full_offset(a->vd), 16, 16,
>> +                         tcg_constant_i64(value));
>> +    return true;
>> +}
>
> I think you should finish this decode in insns.decode,
> especially since we are using that for disassembly.
>
You can ignore these comments; I will drop them. We only have vldi, with
no separate vseti.d or vldi.xxx insns.

Thanks.
Song Gao
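
For reference, the field extraction in the quoted hunk can be sketched
in standalone C. The struct and function names below are hypothetical,
and the sign extension uses an xor/subtract form instead of the
shift pair in the patch, so that no signed value is shifted left. Treat
it as an illustration of the bit layout described in the quoted code
comment, not as the final decode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of the 13-bit immediate. */
struct vldi_fields {
    unsigned sel;    /* bit 12 */
    unsigned vece;   /* bits [11:10], used only when sel == 0 */
    int32_t value;   /* bits [9:0], sign-extended */
};

static struct vldi_fields vldi_decode(uint32_t imm)
{
    struct vldi_fields f;

    assert(imm < 0x2000);            /* the immediate is 13 bits wide */
    f.sel = (imm >> 12) & 0x1;
    f.vece = (imm >> 10) & 0x3;
    /* Sign-extend 10 bits: flip the sign bit, then subtract its weight. */
    f.value = (int32_t)((imm & 0x3ff) ^ 0x200) - 0x200;
    return f;
}
```

For example, vldi_decode(0x7ff) gives sel = 0, vece = 1 and value = -1.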



* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-01  5:03   ` Richard Henderson
  2023-04-03 12:55     ` gaosong
@ 2023-04-19  9:31     ` Song Gao
  2023-04-19 11:06       ` Richard Henderson
  1 sibling, 1 reply; 114+ messages in thread
From: Song Gao @ 2023-04-19  9:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi, Richard

On 2023/4/1 1:03 PM, Richard Henderson wrote:
>
>> + tcg_gen_dupi_vec(vece, t1, max);
>> +    tcg_gen_smin_vec(vece, t, t, t1);
>> +}
>> +
>> +static void do_vsat_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
>> +                      int64_t imm, uint32_t oprsz, uint32_t maxsz)
>> +{
>> +    static const TCGOpcode vecop_list[] = {
>> +        INDEX_op_smax_vec, INDEX_op_smin_vec, 0
>> +        };
>> +    static const GVecGen2i op[4] = {
>> +        {
>> +            .fniv = gen_vsat_s,
>> +            .fnoi = gen_helper_vsat_b,
>> +            .opt_opc = vecop_list,
>> +            .vece = MO_8
>> +        },
>> +        {
>> +            .fniv = gen_vsat_s,
>> +            .fnoi = gen_helper_vsat_h,
>> +            .opt_opc = vecop_list,
>> +            .vece = MO_16
>> +        },
>> +        {
>> +            .fniv = gen_vsat_s,
>> +            .fnoi = gen_helper_vsat_w,
>> +            .opt_opc = vecop_list,
>> +            .vece = MO_32
>> +        },
>> +        {
>> +            .fniv = gen_vsat_s,
>> +            .fnoi = gen_helper_vsat_d,
>> +            .opt_opc = vecop_list,
>> +            .vece = MO_64
>> +        },
>> +    };
>> +
>> +    tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
>
> Better to expand imm to max here, rather than both inside gen_vsat_s 
> and the runtime do_vsats_*.
>
> Likewise for the unsigned versions. 

I tried to expand imm to max here for the unsigned versions:

{
     uint64_t max;

     ...

     static const GVecGen2i op[4] = {
         {
             //.fniv = gen_vsat_u,
             .fnoi = gen_helper_vsat_bu,
             .opt_opc = vecop_list,
             .vece = MO_8
         },
         {
             //.fniv = gen_vsat_u,
             .fnoi = gen_helper_vsat_hu,
             .opt_opc = vecop_list,
             .vece = MO_16
         },
         {
             //.fniv = gen_vsat_u,
             .fnoi = gen_helper_vsat_wu,
             .opt_opc = vecop_list,
             .vece = MO_32
         },
         {
             //.fniv = gen_vsat_u,
             .fnoi = gen_helper_vsat_du,
             .opt_opc = vecop_list,
             .vece = MO_64
         },
     };

     max = (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1;
     tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, max, &op[vece]);

}

and I hit a tcg_debug_assert():


Thread 1 "qemu-loongarch6" received signal SIGABRT, Aborted.
0x00007ffff60b337f in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff60b337f in raise () from /lib64/libc.so.6
#1  0x00007ffff609ddb5 in abort () from /lib64/libc.so.6
#2  0x00007ffff609dc89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff60aba76 in __assert_fail () from /lib64/libc.so.6
#4  0x0000555555632fcf in simd_desc (oprsz=16, maxsz=16, data=134217727) 
at ../tcg/tcg-op-gvec.c:91
#5  0x000055555563312b in tcg_gen_gvec_2i_ool (dofs=768, aofs=432, 
c=0xb20, oprsz=16, maxsz=16, data=134217727, fn=0x5555555b5c00 
<gen_helper_vsat_wu>)
     at ../tcg/tcg-op-gvec.c:139
#6  0x0000555555636085 in tcg_gen_gvec_2i (dofs=768, aofs=432, oprsz=16, 
maxsz=16, c=134217727, g=0x5555559c25b0 <op+112>) at 
../tcg/tcg-op-gvec.c:1316
#7  0x00005555555e1ef5 in do_vsat_u (vece=2, vd_ofs=768, vj_ofs=432, 
imm=26, oprsz=16, maxsz=16) at 
../target/loongarch/insn_trans/trans_lsx.c.inc:2828
#8  0x00005555555db25e in gvec_vv_i (ctx=0x7fffffffcc00, 
a=0x7fffffffcb00, mop=MO_32, func=0x5555555e1e73 <do_vsat_u>) at 
../target/loongarch/insn_trans/trans_lsx.c.inc:121
#9  0x00005555555e1f80 in trans_vsat_wu (ctx=0x7fffffffcc00, 
a=0x7fffffffcb00) at ../target/loongarch/insn_trans/trans_lsx.c.inc:2833
#10 0x00005555555d2650 in decode (ctx=0x7fffffffcc00, insn=1932061023) 
at libqemu-loongarch64-linux-user.fa.p/decode-insns.c.inc:8967
#11 0x00005555555e8fca in loongarch_tr_translate_insn 
(dcbase=0x7fffffffcc00, cs=0x555555a4e5a0) at 
../target/loongarch/translate.c:230
#12 0x000055555565e9ae in translator_loop (cpu=0x555555a4e5a0, 
tb=0x7fffe409f340 <code_gen_buffer+652051>, max_insns=0x7fffffffccfc, 
pc=274886330028, host_pc=0x40008086ac,
     ops=0x5555559c0960 <loongarch_tr_ops>, db=0x7fffffffcc00) at 
../accel/tcg/translator.c:84
#13 0x00005555555e91d5 in gen_intermediate_code (cs=0x555555a4e5a0, 
tb=0x7fffe409f340 <code_gen_buffer+652051>, max_insns=0x7fffffffccfc, 
pc=274886330028, host_pc=0x40008086ac)
     at ../target/loongarch/translate.c:286
#14 0x000055555565d38b in setjmp_gen_code (env=0x555555a4e8f0, 
tb=0x7fffe409f340 <code_gen_buffer+652051>, pc=274886330028, 
host_pc=0x40008086ac, max_insns=0x7fffffffccfc,
     ti=0x7fffffffcd18) at ../accel/tcg/translate-all.c:285
#15 0x000055555565d64a in tb_gen_code (cpu=0x555555a4e5a0, 
pc=274886330028, cs_base=0, flags=0, cflags=0) at 
../accel/tcg/translate-all.c:365
#16 0x00005555556556d6 in cpu_exec_loop (cpu=0x555555a4e5a0, 
sc=0x7fffffffce40) at ../accel/tcg/cpu-exec.c:977
#17 0x0000555555655859 in cpu_exec_setjmp (cpu=0x555555a4e5a0, 
sc=0x7fffffffce40) at ../accel/tcg/cpu-exec.c:1034
#18 0x00005555556558eb in cpu_exec (cpu=0x555555a4e5a0) at 
../accel/tcg/cpu-exec.c:1060
#19 0x00005555555a75da in cpu_loop (env=0x555555a4e8f0) at 
../linux-user/loongarch64/cpu_loop.c:22
#20 0x000055555567bc18 in main (argc=5, argv=0x7fffffffd708, 
envp=0x7fffffffd738) at ../linux-user/main.c:957
(gdb) frame 4
#4  0x0000555555632fcf in simd_desc (oprsz=16, maxsz=16, data=134217727) 
at ../tcg/tcg-op-gvec.c:91
91        tcg_debug_assert(data == sextract32(data, 0, SIMD_DATA_BITS));
(gdb) p/x data
$1 = 0x7ffffff
(gdb) frame 7
#7  0x00005555555e1ef5 in do_vsat_u (vece=2, vd_ofs=768, vj_ofs=432, 
imm=26, oprsz=16, maxsz=16) at 
../target/loongarch/insn_trans/trans_lsx.c.inc:2828
2828        tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, max, &op[vece]);
(gdb) p/x max
$2 = 0x7ffffff

qemu-loongarch64: ../tcg/tcg-op-gvec.c:91: simd_desc: Assertion `data == 
sextract32(data, 0, (32 - ((0 + 8) + 2)))' failed.

Could I drop this tcg_debug_assert()?

Thanks.
Song Gao
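
To make the failure concrete: the assertion text above expands the data
width to 32 - ((0 + 8) + 2) = 22 bits, and the check data ==
sextract32(data, 0, 22) amounts to asking whether data fits in a signed
22-bit field. The sketch below redoes that arithmetic in standalone C.
The names DATA_BITS, vsat_u_max and fits_data_field are hypothetical
stand-ins, not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width taken from the assertion text: 32 - ((0 + 8) + 2) = 22. */
#define DATA_BITS 22

/* Saturation bound for the unsigned variants, as in the snippet above. */
static uint64_t vsat_u_max(unsigned imm)
{
    assert(imm <= 0x3f);
    return (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1;
}

/* True when 'data' is representable as a DATA_BITS-wide signed field. */
static bool fits_data_field(int32_t data)
{
    int32_t lo = -(1 << (DATA_BITS - 1));
    int32_t hi = (1 << (DATA_BITS - 1)) - 1;

    return data >= lo && data <= hi;
}
```

For imm = 26, as in frame 7, vsat_u_max() gives 0x7ffffff, which needs
27 bits and so fails fits_data_field(). The raw imm (at most 0x3f)
always fits, but the expanded bound does not once imm exceeds 20.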

* Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat
  2023-04-19  9:31     ` Song Gao
@ 2023-04-19 11:06       ` Richard Henderson
  0 siblings, 0 replies; 114+ messages in thread
From: Richard Henderson @ 2023-04-19 11:06 UTC (permalink / raw)
  To: Song Gao, qemu-devel

On 4/19/23 11:31, Song Gao wrote:
> On 2023/4/1 1:03 PM, Richard Henderson wrote:
>> Better to expand imm to max here, rather than both inside gen_vsat_s and the runtime 
>> do_vsats_*.
>>
>> Likewise for the unsigned versions. 
> 
> I tried to expand imm to max here for the unsigned versions:
> 
> {
> 
>      uint64_t max;
> 
>      ...
> 
>      static const GVecGen2i op[4] = {
>          {
>              //.fniv = gen_vsat_u,
>              .fnoi = gen_helper_vsat_bu,
>              .opt_opc = vecop_list,
>              .vece = MO_8
>          },
>          {
>              //.fniv = gen_vsat_u,
>              .fnoi = gen_helper_vsat_hu,
>              .opt_opc = vecop_list,
>              .vece = MO_16
>          },
>          {
>              //.fniv = gen_vsat_u,
>              .fnoi = gen_helper_vsat_wu,
>              .opt_opc = vecop_list,
>              .vece = MO_32
>          },
>          {
>              //.fniv = gen_vsat_u,
>              .fnoi = gen_helper_vsat_du,
>              .opt_opc = vecop_list,
>              .vece = MO_64
>          },
>      };
> 
>      max = (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1;
>      tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, max, &op[vece]);
> 
> }
> 
> 
> and I hit a tcg_debug_assert():
> 
> 
> Thread 1 "qemu-loongarch6" received signal SIGABRT, Aborted.
> 0x00007ffff60b337f in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff60b337f in raise () from /lib64/libc.so.6
> #1  0x00007ffff609ddb5 in abort () from /lib64/libc.so.6
> #2  0x00007ffff609dc89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
> #3  0x00007ffff60aba76 in __assert_fail () from /lib64/libc.so.6
> #4  0x0000555555632fcf in simd_desc (oprsz=16, maxsz=16, data=134217727) at 
> ../tcg/tcg-op-gvec.c:91

You should use tcg_gen_gvec_2s, and pass tcg_constant_i64(max).


r~


end of thread, other threads:[~2023-04-19 11:08 UTC | newest]

Thread overview: 114+ messages
2023-03-28  3:05 [RFC PATCH v2 00/44] Add LoongArch LSX instructions Song Gao
2023-03-28  3:05 ` [RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg Song Gao
2023-03-28 19:56   ` Richard Henderson
2023-03-29  2:28     ` gaosong
2023-03-28  3:05 ` [RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX Song Gao
2023-03-28 19:33   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX Song Gao
2023-03-28 19:35   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE maccro for check LSX enable Song Gao
2023-03-28 19:42   ` Richard Henderson
2023-03-29  2:28     ` gaosong
2023-03-28  3:05 ` [RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub Song Gao
2023-03-28 19:50   ` Richard Henderson
2023-03-28 19:59   ` Richard Henderson
2023-03-29  9:59     ` gaosong
2023-03-29 15:22       ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi Song Gao
2023-03-28 19:58   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 07/44] target/loongarch: Implement vneg Song Gao
2023-03-28 20:02   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub Song Gao
2023-03-28 20:03   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw Song Gao
2023-03-28 20:17   ` Richard Henderson
2023-03-29  3:24     ` gaosong
2023-03-29 15:25       ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw Song Gao
2023-03-28 20:28   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr Song Gao
2023-03-28 20:31   ` Richard Henderson
2023-03-28  3:05 ` [RFC PATCH v2 12/44] target/loongarch: Implement vabsd Song Gao
2023-03-28 20:32   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 13/44] target/loongarch: Implement vadda Song Gao
2023-03-28 20:33   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin Song Gao
2023-03-28 20:39   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od} Song Gao
2023-03-28 20:46   ` Richard Henderson
2023-04-06 12:09     ` gaosong
2023-04-06 16:52       ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od} Song Gao
2023-03-28 20:50   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod Song Gao
2023-03-28 20:52   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 18/44] target/loongarch: Implement vsat Song Gao
2023-04-01  5:03   ` Richard Henderson
2023-04-03 12:55     ` gaosong
2023-04-03 20:13       ` Richard Henderson
2023-04-04  2:11         ` gaosong
2023-04-04  3:46           ` Richard Henderson
2023-04-19  9:31     ` Song Gao
2023-04-19 11:06       ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 19/44] target/loongarch: Implement vexth Song Gao
2023-04-01  5:07   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 20/44] target/loongarch: Implement vsigncov Song Gao
2023-04-01  5:11   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz Song Gao
2023-04-01  5:20   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions Song Gao
2023-04-01  5:31   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr Song Gao
2023-04-01  5:38   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl Song Gao
2023-04-01  5:40   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar Song Gao
2023-04-01  5:42   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran Song Gao
2023-04-01  5:46   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn Song Gao
2023-04-01  5:53   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran Song Gao
2023-04-02  3:26   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn Song Gao
2023-04-02  3:31   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz Song Gao
2023-04-02  3:34   ` Richard Henderson
2023-04-07  7:40     ` gaosong
2023-04-07 16:46       ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 31/44] target/loongarch: Implement vpcnt Song Gao
2023-04-02  3:35   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev Song Gao
2023-04-02  5:14   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 33/44] target/loongarch: Implement vfrstp Song Gao
2023-04-02  5:17   ` Richard Henderson
2023-04-03  2:27     ` gaosong
2023-03-28  3:06 ` [RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions Song Gao
2023-04-02  5:19   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions Song Gao
2023-04-02  5:22   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt Song Gao
2023-04-02  5:27   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp Song Gao
2023-04-04  0:47   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset Song Gao
2023-04-04  1:03   ` Richard Henderson
2023-04-11 11:37     ` gaosong
2023-04-12  6:53       ` Richard Henderson
2023-04-13  2:53         ` gaosong
2023-04-13 10:06           ` Richard Henderson
2023-04-14  3:22             ` gaosong
2023-04-14  3:47               ` gaosong
2023-03-28  3:06 ` [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr Song Gao
2023-04-04  1:09   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick Song Gao
2023-04-04  1:17   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf Song Gao
2023-04-04  1:31   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 42/44] target/loongarch: Implement vld vst Song Gao
2023-04-04  3:35   ` Richard Henderson
2023-03-28  3:06 ` [RFC PATCH v2 43/44] target/loongarch: Implement vldi Song Gao
2023-04-04  3:39   ` Richard Henderson
2023-04-18  9:03     ` Song Gao
2023-03-28  3:06 ` [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr Song Gao
2023-04-04  3:44   ` Richard Henderson
