* [PATCH v2 00/22] Add LoongArch linux-user emulation support
@ 2021-07-21  9:52 Song Gao
  2021-07-21  9:52 ` [PATCH v2 01/22] target/loongarch: Add README Song Gao
                   ` (21 more replies)
  0 siblings, 22 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

Hi,

This series only adds linux-user emulation support for LoongArch,
so it does not add system emulation documentation under docs/system.
We'll add that in a future series.

Changes since v1:
  * Patch 1, remove unnecessary introduction;
  * Patch 3, follow the ARM/AVR pattern to add new CPU features;
  * Patch 6, remove decode_lsx();
  * Patches 7-18, delete opcode definitions, modify translation functions;
  * Patches 20-22, split v1 patch 20 into v2 patches 20-22.

In the next series, we will add privileged instruction emulation,
board emulation and TCG tests. Please review.

Regards
Song Gao

Song Gao (22):
  target/loongarch: Add README
  target/loongarch: Add CSR registers definition
  target/loongarch: Add core definition
  target/loongarch: Add interrupt handling support
  target/loongarch: Add memory management support
  target/loongarch: Add main translation routines
  target/loongarch: Add fixed point arithmetic instruction translation
  target/loongarch: Add fixed point shift instruction translation
  target/loongarch: Add fixed point bit instruction translation
  target/loongarch: Add fixed point load/store instruction translation
  target/loongarch: Add fixed point atomic instruction translation
  target/loongarch: Add fixed point extra instruction translation
  target/loongarch: Add floating point arithmetic instruction
    translation
  target/loongarch: Add floating point comparison instruction
    translation
  target/loongarch: Add floating point conversion instruction
    translation
  target/loongarch: Add floating point move instruction translation
  target/loongarch: Add floating point load/store instruction
    translation
  target/loongarch: Add branch instruction translation
  target/loongarch: Add disassembler
  LoongArch Linux User Emulation
  configs: Add loongarch linux-user config
  target/loongarch: Add target build support

 MAINTAINERS                                |    7 +
 configs/targets/loongarch64-linux-user.mak |    3 +
 disas/loongarch.c                          | 2511 +++++++++++++
 disas/meson.build                          |    1 +
 include/disas/dis-asm.h                    |    2 +
 include/elf.h                              |    2 +
 linux-user/elfload.c                       |   58 +
 linux-user/loongarch64/cpu_loop.c          |  177 +
 linux-user/loongarch64/signal.c            |  193 +
 linux-user/loongarch64/sockbits.h          |    1 +
 linux-user/loongarch64/syscall_nr.h        |  307 ++
 linux-user/loongarch64/target_cpu.h        |   36 +
 linux-user/loongarch64/target_elf.h        |   14 +
 linux-user/loongarch64/target_fcntl.h      |   12 +
 linux-user/loongarch64/target_signal.h     |   28 +
 linux-user/loongarch64/target_structs.h    |   49 +
 linux-user/loongarch64/target_syscall.h    |   46 +
 linux-user/loongarch64/termbits.h          |  229 ++
 linux-user/syscall_defs.h                  |   10 +-
 meson.build                                |    1 +
 target/loongarch/README                    |    5 +
 target/loongarch/cpu-csr.h                 |  724 ++++
 target/loongarch/cpu-param.h               |   21 +
 target/loongarch/cpu-qom.h                 |   40 +
 target/loongarch/cpu.c                     |  319 ++
 target/loongarch/cpu.h                     |  299 ++
 target/loongarch/fpu_helper.c              | 1435 +++++++
 target/loongarch/fpu_helper.h              |   34 +
 target/loongarch/helper.h                  |  158 +
 target/loongarch/insns.decode              |  480 +++
 target/loongarch/meson.build               |   19 +
 target/loongarch/op_helper.c               |  230 ++
 target/loongarch/tlb_helper.c              |  103 +
 target/loongarch/trans.inc.c               | 5536 ++++++++++++++++++++++++++++
 target/loongarch/translate.c               |  558 +++
 target/loongarch/translate.h               |   50 +
 target/meson.build                         |    1 +
 37 files changed, 13695 insertions(+), 4 deletions(-)
 create mode 100644 configs/targets/loongarch64-linux-user.mak
 create mode 100644 disas/loongarch.c
 create mode 100644 linux-user/loongarch64/cpu_loop.c
 create mode 100644 linux-user/loongarch64/signal.c
 create mode 100644 linux-user/loongarch64/sockbits.h
 create mode 100644 linux-user/loongarch64/syscall_nr.h
 create mode 100644 linux-user/loongarch64/target_cpu.h
 create mode 100644 linux-user/loongarch64/target_elf.h
 create mode 100644 linux-user/loongarch64/target_fcntl.h
 create mode 100644 linux-user/loongarch64/target_signal.h
 create mode 100644 linux-user/loongarch64/target_structs.h
 create mode 100644 linux-user/loongarch64/target_syscall.h
 create mode 100644 linux-user/loongarch64/termbits.h
 create mode 100644 target/loongarch/README
 create mode 100644 target/loongarch/cpu-csr.h
 create mode 100644 target/loongarch/cpu-param.h
 create mode 100644 target/loongarch/cpu-qom.h
 create mode 100644 target/loongarch/cpu.c
 create mode 100644 target/loongarch/cpu.h
 create mode 100644 target/loongarch/fpu_helper.c
 create mode 100644 target/loongarch/fpu_helper.h
 create mode 100644 target/loongarch/helper.h
 create mode 100644 target/loongarch/insns.decode
 create mode 100644 target/loongarch/meson.build
 create mode 100644 target/loongarch/op_helper.c
 create mode 100644 target/loongarch/tlb_helper.c
 create mode 100644 target/loongarch/trans.inc.c
 create mode 100644 target/loongarch/translate.c
 create mode 100644 target/loongarch/translate.h

-- 
1.8.3.1




* [PATCH v2 01/22] target/loongarch: Add README
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
@ 2021-07-21  9:52 ` Song Gao
  2021-07-21  9:52 ` [PATCH v2 02/22] target/loongarch: Add CSR registers definition Song Gao
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch gives an introduction to the LoongArch target.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 MAINTAINERS             | 5 +++++
 target/loongarch/README | 5 +++++
 2 files changed, 10 insertions(+)
 create mode 100644 target/loongarch/README

diff --git a/MAINTAINERS b/MAINTAINERS
index 4256ad1..ae87a74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -210,6 +210,11 @@ F: disas/hppa.c
 F: hw/net/*i82596*
 F: include/hw/net/lasi_82596.h
 
+LoongArch TCG CPUs
+M: Song Gao <gaosong@loongson.cn>
+S: Maintained
+F: target/loongarch/
+
 M68K TCG CPUs
 M: Laurent Vivier <laurent@vivier.eu>
 S: Maintained
diff --git a/target/loongarch/README b/target/loongarch/README
new file mode 100644
index 0000000..fe7a36f
--- /dev/null
+++ b/target/loongarch/README
@@ -0,0 +1,5 @@
+LoongArch is the general processor architecture of Loongson.
+
+The following versions of the LoongArch core are supported:
+    core: 3A5000
+    https://github.com/loongson/LoongArch-Documentation/releases/download/LoongArch-Vol1-v3/LoongArch-Vol1-v1.00-EN.pdf
-- 
1.8.3.1




* [PATCH v2 02/22] target/loongarch: Add CSR registers definition
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
  2021-07-21  9:52 ` [PATCH v2 01/22] target/loongarch: Add README Song Gao
@ 2021-07-21  9:52 ` Song Gao
  2021-07-21  9:52 ` [PATCH v2 03/22] target/loongarch: Add core definition Song Gao
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch adds basic CSR register definitions, copied from the
kernel's arch/loongarch/include/asm/loongarchregs.h.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
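Reviewer note: each multi-bit CSR field in cpu-csr.h is described by a
_SHIFT/_WIDTH pair plus a pre-shifted mask. The sketch below shows one way
such a pair could be used to read and write a field. The macro values are
copied from this patch; csr_field_get(), csr_field_set() and
csr_field_demo() are hypothetical names made up for illustration and are
not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the cpu-csr.h hunk in this patch. */
#define CSR_CRMD_PLV_SHIFT   0
#define CSR_CRMD_PLV_WIDTH   2
#define CSR_CRMD_IE_SHIFT    2
#define CSR_CRMD_IE          (0x1UL << CSR_CRMD_IE_SHIFT)
#define CSR_ESTAT_EXC_SH     16
#define CSR_ESTAT_EXC_WIDTH  5
#define PLV_USER             3
#define EXCCODE_SYS          11

/* Hypothetical helper: read a `width`-bit field starting at bit `shift`. */
static inline uint64_t csr_field_get(uint64_t reg, unsigned shift,
                                     unsigned width)
{
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (reg >> shift) & mask;
}

/* Hypothetical helper: return `reg` with the field replaced by `val`. */
static inline uint64_t csr_field_set(uint64_t reg, unsigned shift,
                                     unsigned width, uint64_t val)
{
    uint64_t mask = ((width >= 64) ? ~0ULL : ((1ULL << width) - 1)) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Illustration only: build a CRMD value and decode an ESTAT value. */
void csr_field_demo(void)
{
    uint64_t crmd = 0;

    /* Select user privilege level, then enable interrupts. */
    crmd = csr_field_set(crmd, CSR_CRMD_PLV_SHIFT, CSR_CRMD_PLV_WIDTH,
                         PLV_USER);
    crmd |= CSR_CRMD_IE;
    assert(crmd == 0x7);
    assert(csr_field_get(crmd, CSR_CRMD_PLV_SHIFT, CSR_CRMD_PLV_WIDTH)
           == PLV_USER);

    /* Place a syscall exception code in ESTAT and read it back. */
    uint64_t estat = (uint64_t)EXCCODE_SYS << CSR_ESTAT_EXC_SH;
    assert(csr_field_get(estat, CSR_ESTAT_EXC_SH, CSR_ESTAT_EXC_WIDTH)
           == EXCCODE_SYS);
}
```

Inside QEMU itself, the existing extract64() and deposit64() helpers from
include/qemu/bitops.h take the same (start, length) arguments and would
probably be preferred over hand-written helpers like these.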
 target/loongarch/cpu-csr.h | 724 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 724 insertions(+)
 create mode 100644 target/loongarch/cpu-csr.h

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
new file mode 100644
index 0000000..87273b1
--- /dev/null
+++ b/target/loongarch/cpu-csr.h
@@ -0,0 +1,724 @@
+/*
+ * QEMU LoongArch CSR
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef _CPU_CSR_H_
+#define _CPU_CSR_H_
+
+/*
+ * Basic CSR register definitions,
+ * copied from kernel arch/loongarch/include/asm/loongarchregs.h
+ */
+
+#define LOONGARCH_CSR_CRMD           0x0 /* Current mode info */
+#define  CSR_CRMD_DACM_SHIFT         7
+#define  CSR_CRMD_DACM_WIDTH         2
+#define  CSR_CRMD_DACM               (0x3UL << CSR_CRMD_DACM_SHIFT)
+#define  CSR_CRMD_DACF_SHIFT         5
+#define  CSR_CRMD_DACF_WIDTH         2
+#define  CSR_CRMD_DACF               (0x3UL << CSR_CRMD_DACF_SHIFT)
+#define  CSR_CRMD_PG_SHIFT           4
+#define  CSR_CRMD_PG                 (0x1UL << CSR_CRMD_PG_SHIFT)
+#define  CSR_CRMD_DA_SHIFT           3
+#define  CSR_CRMD_DA                 (0x1UL << CSR_CRMD_DA_SHIFT)
+#define  CSR_CRMD_IE_SHIFT           2
+#define  CSR_CRMD_IE                 (0x1UL << CSR_CRMD_IE_SHIFT)
+#define  CSR_CRMD_PLV_SHIFT          0
+#define  CSR_CRMD_PLV_WIDTH          2
+#define  CSR_CRMD_PLV                (0x3UL << CSR_CRMD_PLV_SHIFT)
+
+#define PLV_USER                     3
+#define PLV_KERN                     0
+#define PLV_MASK                     0x3
+
+#define LOONGARCH_CSR_PRMD           0x1 /* Prev-exception mode info */
+#define  CSR_PRMD_PIE_SHIFT          2
+#define  CSR_PRMD_PIE                (0x1UL << CSR_PRMD_PIE_SHIFT)
+#define  CSR_PRMD_PPLV_SHIFT         0
+#define  CSR_PRMD_PPLV_WIDTH         2
+#define  CSR_PRMD_PPLV               (0x3UL << CSR_PRMD_PPLV_SHIFT)
+
+#define LOONGARCH_CSR_EUEN           0x2 /* Extended unit enable */
+#define  CSR_EUEN_LBTEN_SHIFT        3
+#define  CSR_EUEN_LBTEN              (0x1UL << CSR_EUEN_LBTEN_SHIFT)
+#define  CSR_EUEN_LASXEN_SHIFT       2
+#define  CSR_EUEN_LASXEN             (0x1UL << CSR_EUEN_LASXEN_SHIFT)
+#define  CSR_EUEN_LSXEN_SHIFT        1
+#define  CSR_EUEN_LSXEN              (0x1UL << CSR_EUEN_LSXEN_SHIFT)
+#define  CSR_EUEN_FPEN_SHIFT         0
+#define  CSR_EUEN_FPEN               (0x1UL << CSR_EUEN_FPEN_SHIFT)
+
+#define LOONGARCH_CSR_MISC           0x3 /* Misc config */
+
+#define LOONGARCH_CSR_ECFG           0x4 /* Exception config */
+#define  CSR_ECFG_VS_SHIFT           16
+#define  CSR_ECFG_VS_WIDTH           3
+#define  CSR_ECFG_VS                 (0x7UL << CSR_ECFG_VS_SHIFT)
+#define  CSR_ECFG_IM_SHIFT           0
+#define  CSR_ECFG_IM_WIDTH           13
+#define  CSR_ECFG_IM                 (0x1fffUL << CSR_ECFG_IM_SHIFT)
+
+#define  CSR_ECFG_IPMASK             0x00001fff
+
+#define LOONGARCH_CSR_ESTAT          0x5 /* Exception status */
+#define  CSR_ESTAT_ESUBCODE_SHIFT    22
+#define  CSR_ESTAT_ESUBCODE_WIDTH    9
+#define  CSR_ESTAT_ESUBCODE          (0x1ffULL << CSR_ESTAT_ESUBCODE_SHIFT)
+#define  CSR_ESTAT_EXC_SH            16
+#define  CSR_ESTAT_EXC_WIDTH         5
+#define  CSR_ESTAT_EXC               (0x1fULL << CSR_ESTAT_EXC_SH)
+#define  CSR_ESTAT_IS_SHIFT          0
+#define  CSR_ESTAT_IS_WIDTH          15
+#define  CSR_ESTAT_IS                (0x7fffULL << CSR_ESTAT_IS_SHIFT)
+
+#define  CSR_ESTAT_IPMASK            0x00001fff
+
+/*
+ * ExStatus.ExcCode
+ */
+#define  EXCCODE_INT_START           64
+#define  EXCCODE_RSV                 0    /* Reserved */
+#define  EXCCODE_TLBL                1    /* TLB miss on a load */
+#define  EXCCODE_TLBS                2    /* TLB miss on a store */
+#define  EXCCODE_TLBI                3    /* TLB miss on an ifetch */
+#define  EXCCODE_TLBM                4    /* TLB modified fault */
+#define  EXCCODE_TLBRI               5    /* TLB Read-Inhibit exception */
+#define  EXCCODE_TLBXI               6    /* TLB Execution-Inhibit exception */
+#define  EXCCODE_TLBPE               7    /* TLB Privilege Error */
+#define  EXCCODE_ADE                 8    /* Address Error */
+#define  EXCCODE_ALE                 9    /* Unaligned access */
+#define  EXCCODE_OOB                 10   /* Out of bounds */
+#define  EXCCODE_SYS                 11   /* System call */
+#define  EXCCODE_BP                  12   /* Breakpoint */
+#define  EXCCODE_INE                 13   /* Inst. Not Exist */
+#define  EXCCODE_IPE                 14   /* Inst. Privileged Error */
+#define  EXCCODE_FPDIS               15   /* FPU Disabled */
+#define  EXCCODE_LSXDIS              16   /* LSX Disabled */
+#define  EXCCODE_LASXDIS             17   /* LASX Disabled */
+#define  EXCCODE_FPE                 18   /* Floating Point Exception */
+#define  EXCCODE_WATCH               19   /* Watch address reference */
+#define  EXCCODE_BTDIS               20   /* Binary Trans. Disabled */
+#define  EXCCODE_BTE                 21   /* Binary Trans. Exception */
+#define  EXCCODE_PSI                 22   /* Guest Privileged Error */
+#define  EXCCODE_HYP                 23   /* Hypercall */
+#define  EXCCODE_GCM                 24   /* Guest CSR modified */
+#define  EXCCODE_SE                  25   /* Security */
+
+#define LOONGARCH_CSR_ERA            0x6  /* Error PC */
+
+#define LOONGARCH_CSR_BADV           0x7  /* Bad virtual address */
+
+#define LOONGARCH_CSR_BADI           0x8  /* Bad instruction */
+
+#define LOONGARCH_CSR_EEPN           0xc  /* Exception enter base address */
+
+/* TLB related CSR register */
+#define LOONGARCH_CSR_TLBIDX         0x10 /* TLB Index, EHINV, PageSize, NP */
+#define  CSR_TLBIDX_EHINV_SHIFT      31
+#define  CSR_TLBIDX_EHINV            (0x1ULL << CSR_TLBIDX_EHINV_SHIFT)
+#define  CSR_TLBIDX_PS_SHIFT         24
+#define  CSR_TLBIDX_PS_WIDTH         6
+#define  CSR_TLBIDX_PS               (0x3fULL << CSR_TLBIDX_PS_SHIFT)
+#define  CSR_TLBIDX_IDX_SHIFT        0
+#define  CSR_TLBIDX_IDX_WIDTH        12
+#define  CSR_TLBIDX_IDX              (0xfffULL << CSR_TLBIDX_IDX_SHIFT)
+#define  CSR_TLBIDX_SIZEM            0x3f000000
+#define  CSR_TLBIDX_SIZE             CSR_TLBIDX_PS_SHIFT
+#define  CSR_TLBIDX_IDXM             0xfff
+
+#define LOONGARCH_CSR_TLBEHI         0x11 /* TLB EntryHi without ASID */
+
+#define LOONGARCH_CSR_TLBELO0        0x12 /* TLB EntryLo0 */
+#define  CSR_TLBLO0_RPLV_SHIFT       63
+#define  CSR_TLBLO0_RPLV             (0x1ULL << CSR_TLBLO0_RPLV_SHIFT)
+#define  CSR_TLBLO0_XI_SHIFT         62
+#define  CSR_TLBLO0_XI               (0x1ULL << CSR_TLBLO0_XI_SHIFT)
+#define  CSR_TLBLO0_RI_SHIFT         61
+#define  CSR_TLBLO0_RI               (0x1ULL << CSR_TLBLO0_RI_SHIFT)
+#define  CSR_TLBLO0_PPN_SHIFT        12
+#define  CSR_TLBLO0_PPN_WIDTH        36 /* ignore lower 12bits */
+#define  CSR_TLBLO0_PPN              (0xfffffffffULL << CSR_TLBLO0_PPN_SHIFT)
+#define  CSR_TLBLO0_GLOBAL_SHIFT     6
+#define  CSR_TLBLO0_GLOBAL           (0x1ULL << CSR_TLBLO0_GLOBAL_SHIFT)
+#define  CSR_TLBLO0_CCA_SHIFT        4
+#define  CSR_TLBLO0_CCA_WIDTH        2
+#define  CSR_TLBLO0_CCA              (0x3ULL << CSR_TLBLO0_CCA_SHIFT)
+#define  CSR_TLBLO0_PLV_SHIFT        2
+#define  CSR_TLBLO0_PLV_WIDTH        2
+#define  CSR_TLBLO0_PLV              (0x3ULL << CSR_TLBLO0_PLV_SHIFT)
+#define  CSR_TLBLO0_WE_SHIFT         1
+#define  CSR_TLBLO0_WE               (0x1ULL << CSR_TLBLO0_WE_SHIFT)
+#define  CSR_TLBLO0_V_SHIFT          0
+#define  CSR_TLBLO0_V                (0x1ULL << CSR_TLBLO0_V_SHIFT)
+
+#define LOONGARCH_CSR_TLBELO1        0x13 /* TLB EntryLo1 */
+#define  CSR_TLBLO1_RPLV_SHIFT       63
+#define  CSR_TLBLO1_RPLV             (0x1ULL << CSR_TLBLO1_RPLV_SHIFT)
+#define  CSR_TLBLO1_XI_SHIFT         62
+#define  CSR_TLBLO1_XI               (0x1ULL << CSR_TLBLO1_XI_SHIFT)
+#define  CSR_TLBLO1_RI_SHIFT         61
+#define  CSR_TLBLO1_RI               (0x1ULL << CSR_TLBLO1_RI_SHIFT)
+#define  CSR_TLBLO1_PPN_SHIFT        12
+#define  CSR_TLBLO1_PPN_WIDTH        36
+#define  CSR_TLBLO1_PPN              (0xfffffffffULL << CSR_TLBLO1_PPN_SHIFT)
+#define  CSR_TLBLO1_GLOBAL_SHIFT     6
+#define  CSR_TLBLO1_GLOBAL           (0x1ULL << CSR_TLBLO1_GLOBAL_SHIFT)
+#define  CSR_TLBLO1_CCA_SHIFT        4
+#define  CSR_TLBLO1_CCA_WIDTH        2
+#define  CSR_TLBLO1_CCA              (0x3ULL << CSR_TLBLO1_CCA_SHIFT)
+#define  CSR_TLBLO1_PLV_SHIFT        2
+#define  CSR_TLBLO1_PLV_WIDTH        2
+#define  CSR_TLBLO1_PLV              (0x3ULL << CSR_TLBLO1_PLV_SHIFT)
+#define  CSR_TLBLO1_WE_SHIFT         1
+#define  CSR_TLBLO1_WE               (0x1ULL << CSR_TLBLO1_WE_SHIFT)
+#define  CSR_TLBLO1_V_SHIFT          0
+#define  CSR_TLBLO1_V                (0x1ULL << CSR_TLBLO1_V_SHIFT)
+
+#define LOONGARCH_CSR_ASID           0x18 /* 64 ASID */
+#define  CSR_ASID_BIT_SHIFT          16 /* ASIDBits */
+#define  CSR_ASID_BIT_WIDTH          8
+#define  CSR_ASID_BIT                (0xffULL << CSR_ASID_BIT_SHIFT)
+#define  CSR_ASID_ASID_SHIFT         0
+#define  CSR_ASID_ASID_WIDTH         10
+#define  CSR_ASID_ASID               (0x3ffULL << CSR_ASID_ASID_SHIFT)
+
+/* Page table base address when badv[47] = 0 */
+#define LOONGARCH_CSR_PGDL           0x19
+/* Page table base address when badv[47] = 1 */
+#define LOONGARCH_CSR_PGDH           0x1a
+
+#define LOONGARCH_CSR_PGD            0x1b /* Page table base */
+
+#define LOONGARCH_CSR_PWCTL0         0x1c /* PWCtl0 */
+#define  CSR_PWCTL0_PTEW_SHIFT       30
+#define  CSR_PWCTL0_PTEW_WIDTH       2
+#define  CSR_PWCTL0_PTEW             (0x3ULL << CSR_PWCTL0_PTEW_SHIFT)
+#define  CSR_PWCTL0_DIR1WIDTH_SHIFT  25
+#define  CSR_PWCTL0_DIR1WIDTH_WIDTH  5
+#define  CSR_PWCTL0_DIR1WIDTH        (0x1fULL << CSR_PWCTL0_DIR1WIDTH_SHIFT)
+#define  CSR_PWCTL0_DIR1BASE_SHIFT   20
+#define  CSR_PWCTL0_DIR1BASE_WIDTH   5
+#define  CSR_PWCTL0_DIR1BASE         (0x1fULL << CSR_PWCTL0_DIR1BASE_SHIFT)
+#define  CSR_PWCTL0_DIR0WIDTH_SHIFT  15
+#define  CSR_PWCTL0_DIR0WIDTH_WIDTH  5
+#define  CSR_PWCTL0_DIR0WIDTH        (0x1fULL << CSR_PWCTL0_DIR0WIDTH_SHIFT)
+#define  CSR_PWCTL0_DIR0BASE_SHIFT   10
+#define  CSR_PWCTL0_DIR0BASE_WIDTH   5
+#define  CSR_PWCTL0_DIR0BASE         (0x1fULL << CSR_PWCTL0_DIR0BASE_SHIFT)
+#define  CSR_PWCTL0_PTWIDTH_SHIFT    5
+#define  CSR_PWCTL0_PTWIDTH_WIDTH    5
+#define  CSR_PWCTL0_PTWIDTH          (0x1fULL << CSR_PWCTL0_PTWIDTH_SHIFT)
+#define  CSR_PWCTL0_PTBASE_SHIFT     0
+#define  CSR_PWCTL0_PTBASE_WIDTH     5
+#define  CSR_PWCTL0_PTBASE           (0x1fULL << CSR_PWCTL0_PTBASE_SHIFT)
+
+#define LOONGARCH_CSR_PWCTL1         0x1d /* PWCtl1 */
+#define  CSR_PWCTL1_DIR3WIDTH_SHIFT  18
+#define  CSR_PWCTL1_DIR3WIDTH_WIDTH  5
+#define  CSR_PWCTL1_DIR3WIDTH        (0x1fULL << CSR_PWCTL1_DIR3WIDTH_SHIFT)
+#define  CSR_PWCTL1_DIR3BASE_SHIFT   12
+#define  CSR_PWCTL1_DIR3BASE_WIDTH   5
+#define  CSR_PWCTL1_DIR3BASE         (0x1fULL << CSR_PWCTL1_DIR3BASE_SHIFT)
+#define  CSR_PWCTL1_DIR2WIDTH_SHIFT  6
+#define  CSR_PWCTL1_DIR2WIDTH_WIDTH  5
+#define  CSR_PWCTL1_DIR2WIDTH        (0x1fULL << CSR_PWCTL1_DIR2WIDTH_SHIFT)
+#define  CSR_PWCTL1_DIR2BASE_SHIFT   0
+#define  CSR_PWCTL1_DIR2BASE_WIDTH   5
+#define  CSR_PWCTL1_DIR2BASE         (0x1fULL << CSR_PWCTL1_DIR2BASE_SHIFT)
+
+#define LOONGARCH_CSR_STLBPGSIZE     0x1e
+#define  CSR_STLBPGSIZE_PS_WIDTH     6
+#define  CSR_STLBPGSIZE_PS           (0x3f)
+
+#define LOONGARCH_CSR_RVACFG         0x1f
+#define  CSR_RVACFG_RDVA_WIDTH       4
+#define  CSR_RVACFG_RDVA             (0xf)
+
+/* Config CSR registers */
+#define LOONGARCH_CSR_CPUID          0x20 /* CPU core number */
+#define  CSR_CPUID_CID_WIDTH         9
+#define  CSR_CPUID_CID               (0x1ff)
+
+#define LOONGARCH_CSR_PRCFG1         0x21 /* Config1 */
+#define  CSR_CONF1_VSMAX_SHIFT       12
+#define  CSR_CONF1_VSMAX_WIDTH       3
+#define  CSR_CONF1_VSMAX             (7ULL << CSR_CONF1_VSMAX_SHIFT)
+#define  CSR_CONF1_TMRBITS_SHIFT     4
+#define  CSR_CONF1_TMRBITS_WIDTH     8
+#define  CSR_CONF1_TMRBITS           (0xffULL << CSR_CONF1_TMRBITS_SHIFT)
+#define  CSR_CONF1_KSNUM_SHIFT       0
+#define  CSR_CONF1_KSNUM_WIDTH       4
+#define  CSR_CONF1_KSNUM             (0x8)
+
+#define LOONGARCH_CSR_PRCFG2         0x22 /* Config2 */
+#define  CSR_CONF2_PGMASK_SUPP       0x3ffff000
+
+#define LOONGARCH_CSR_PRCFG3         0x23 /* Config3 */
+#define  CSR_CONF3_STLBIDX_SHIFT     20
+#define  CSR_CONF3_STLBIDX_WIDTH     6
+#define  CSR_CONF3_STLBIDX           (0x3fULL << CSR_CONF3_STLBIDX_SHIFT)
+#define  CSR_CONF3_STLBWAYS_SHIFT    12
+#define  CSR_CONF3_STLBWAYS_WIDTH    8
+#define  CSR_CONF3_STLBWAYS          (0xffULL << CSR_CONF3_STLBWAYS_SHIFT)
+#define  CSR_CONF3_MTLBSIZE_SHIFT    4
+#define  CSR_CONF3_MTLBSIZE_WIDTH    8
+#define  CSR_CONF3_MTLBSIZE          (0xffULL << CSR_CONF3_MTLBSIZE_SHIFT)
+#define  CSR_CONF3_TLBORG_SHIFT      0
+#define  CSR_CONF3_TLBORG_WIDTH      4
+#define  CSR_CONF3_TLBORG            (0xfULL << CSR_CONF3_TLBORG_SHIFT)
+
+/* Kscratch registers */
+#define LOONGARCH_CSR_KS0            0x30
+#define LOONGARCH_CSR_KS1            0x31
+#define LOONGARCH_CSR_KS2            0x32
+#define LOONGARCH_CSR_KS3            0x33
+#define LOONGARCH_CSR_KS4            0x34
+#define LOONGARCH_CSR_KS5            0x35
+#define LOONGARCH_CSR_KS6            0x36
+#define LOONGARCH_CSR_KS7            0x37
+#define LOONGARCH_CSR_KS8            0x38
+
+/* Timer registers */
+#define LOONGARCH_CSR_TMID           0x40 /* Timer ID */
+
+#define LOONGARCH_CSR_TCFG           0x41 /* Timer config */
+#define  CSR_TCFG_VAL_SHIFT          2
+#define  CSR_TCFG_VAL_WIDTH          48
+#define  CSR_TCFG_VAL                (0x3fffffffffffULL << CSR_TCFG_VAL_SHIFT)
+#define  CSR_TCFG_PERIOD_SHIFT       1
+#define  CSR_TCFG_PERIOD             (0x1ULL << CSR_TCFG_PERIOD_SHIFT)
+#define  CSR_TCFG_EN                 (0x1)
+
+#define LOONGARCH_CSR_TVAL           0x42 /* Timer value */
+
+#define LOONGARCH_CSR_CNTC           0x43 /* Timer offset */
+
+#define LOONGARCH_CSR_TINTCLR        0x44 /* Timer interrupt clear */
+#define  CSR_TINTCLR_TI_SHIFT        0
+#define  CSR_TINTCLR_TI              (1 << CSR_TINTCLR_TI_SHIFT)
+
+/* LLBCTL register */
+#define LOONGARCH_CSR_LLBIT          0x60 /* LLBit control */
+#define  CSR_LLBIT_ROLLB_SHIFT       0
+#define  CSR_LLBIT_ROLLB             (1ULL << CSR_LLBIT_ROLLB_SHIFT)
+#define  CSR_LLBIT_WCLLB_SHIFT       1
+#define  CSR_LLBIT_WCLLB             (1ULL << CSR_LLBIT_WCLLB_SHIFT)
+#define  CSR_LLBIT_KLO_SHIFT         2
+#define  CSR_LLBIT_KLO               (1ULL << CSR_LLBIT_KLO_SHIFT)
+
+/* Implementation dependent */
+#define LOONGARCH_CSR_IMPCTL1        0x80 /* LoongArch config1 */
+#define  CSR_MISPEC_SHIFT            20
+#define  CSR_MISPEC_WIDTH            8
+#define  CSR_MISPEC                  (0xffULL << CSR_MISPEC_SHIFT)
+#define  CSR_SSEN_SHIFT              18
+#define  CSR_SSEN                    (1ULL << CSR_SSEN_SHIFT)
+#define  CSR_SCRAND_SHIFT            17
+#define  CSR_SCRAND                  (1ULL << CSR_SCRAND_SHIFT)
+#define  CSR_LLEXCL_SHIFT            16
+#define  CSR_LLEXCL                  (1ULL << CSR_LLEXCL_SHIFT)
+#define  CSR_DISVC_SHIFT             15
+#define  CSR_DISVC                   (1ULL << CSR_DISVC_SHIFT)
+#define  CSR_VCLRU_SHIFT             14
+#define  CSR_VCLRU                   (1ULL << CSR_VCLRU_SHIFT)
+#define  CSR_DCLRU_SHIFT             13
+#define  CSR_DCLRU                   (1ULL << CSR_DCLRU_SHIFT)
+#define  CSR_FASTLDQ_SHIFT           12
+#define  CSR_FASTLDQ                 (1ULL << CSR_FASTLDQ_SHIFT)
+#define  CSR_USERCAC_SHIFT           11
+#define  CSR_USERCAC                 (1ULL << CSR_USERCAC_SHIFT)
+#define  CSR_ANTI_MISPEC_SHIFT       10
+#define  CSR_ANTI_MISPEC             (1ULL << CSR_ANTI_MISPEC_SHIFT)
+#define  CSR_ANTI_FLUSHSFB_SHIFT     9
+#define  CSR_ANTI_FLUSHSFB           (1ULL << CSR_ANTI_FLUSHSFB_SHIFT)
+#define  CSR_STFILL_SHIFT            8
+#define  CSR_STFILL                  (1ULL << CSR_STFILL_SHIFT)
+#define  CSR_LIFEP_SHIFT             7
+#define  CSR_LIFEP                   (1ULL << CSR_LIFEP_SHIFT)
+#define  CSR_LLSYNC_SHIFT            6
+#define  CSR_LLSYNC                  (1ULL << CSR_LLSYNC_SHIFT)
+#define  CSR_BRBTDIS_SHIFT           5
+#define  CSR_BRBTDIS                 (1ULL << CSR_BRBTDIS_SHIFT)
+#define  CSR_RASDIS_SHIFT            4
+#define  CSR_RASDIS                  (1ULL << CSR_RASDIS_SHIFT)
+#define  CSR_STPRE_SHIFT             2
+#define  CSR_STPRE_WIDTH             2
+#define  CSR_STPRE                   (3ULL << CSR_STPRE_SHIFT)
+#define  CSR_INSTPRE_SHIFT           1
+#define  CSR_INSTPRE                 (1ULL << CSR_INSTPRE_SHIFT)
+#define  CSR_DATAPRE_SHIFT           0
+#define  CSR_DATAPRE                 (1ULL << CSR_DATAPRE_SHIFT)
+
+#define LOONGARCH_CSR_IMPCTL2        0x81 /* LoongArch config2 */
+#define  CSR_IMPCTL2_MTLB_SHIFT      0
+#define  CSR_IMPCTL2_MTLB            (1ULL << CSR_IMPCTL2_MTLB_SHIFT)
+#define  CSR_IMPCTL2_STLB_SHIFT      1
+#define  CSR_IMPCTL2_STLB            (1ULL << CSR_IMPCTL2_STLB_SHIFT)
+#define  CSR_IMPCTL2_DTLB_SHIFT      2
+#define  CSR_IMPCTL2_DTLB            (1ULL << CSR_IMPCTL2_DTLB_SHIFT)
+#define  CSR_IMPCTL2_ITLB_SHIFT      3
+#define  CSR_IMPCTL2_ITLB            (1ULL << CSR_IMPCTL2_ITLB_SHIFT)
+#define  CSR_IMPCTL2_BTAC_SHIFT      4
+#define  CSR_IMPCTL2_BTAC            (1ULL << CSR_IMPCTL2_BTAC_SHIFT)
+
+#define LOONGARCH_CSR_GNMI           0x82
+
+/* TLB refill registers */
+#define LOONGARCH_CSR_TLBRENT        0x88 /* TLB refill exception address */
+#define LOONGARCH_CSR_TLBRBADV       0x89 /* TLB refill badvaddr */
+#define LOONGARCH_CSR_TLBRERA        0x8a /* TLB refill ERA */
+#define LOONGARCH_CSR_TLBRSAVE       0x8b /* KScratch for TLB refill */
+#define LOONGARCH_CSR_TLBRELO0       0x8c /* TLB refill entrylo0 */
+#define LOONGARCH_CSR_TLBRELO1       0x8d /* TLB refill entrylo1 */
+#define LOONGARCH_CSR_TLBREHI        0x8e /* TLB refill entryhi */
+#define LOONGARCH_CSR_TLBRPRMD       0x8f /* TLB refill mode info */
+
+/* Machine error registers */
+#define LOONGARCH_CSR_ERRCTL         0x90 /* ERRCTL */
+#define LOONGARCH_CSR_ERRINFO        0x91 /* Error info1 */
+#define LOONGARCH_CSR_ERRINFO1       0x92 /* Error info2 */
+#define LOONGARCH_CSR_ERRENT         0x93 /* Error exception base address */
+#define LOONGARCH_CSR_ERRERA         0x94 /* Error exception PC */
+#define LOONGARCH_CSR_ERRSAVE        0x95 /* KScratch machine error exception */
+
+#define LOONGARCH_CSR_CTAG           0x98 /* TagLo + TagHi */
+
+/* Shadow MCSR : 0xc0 ~ 0xff */
+#define LOONGARCH_CSR_MCSR0          0xc0 /* CPUCFG0 and CPUCFG1 */
+#define  MCSR0_INT_IMPL_SHIFT        58
+#define  MCSR0_INT_IMPL              0
+#define  MCSR0_IOCSR_BRD_SHIFT       57
+#define  MCSR0_IOCSR_BRD             (1ULL << MCSR0_IOCSR_BRD_SHIFT)
+#define  MCSR0_HUGEPG_SHIFT          56
+#define  MCSR0_HUGEPG                (1ULL << MCSR0_HUGEPG_SHIFT)
+#define  MCSR0_RPLVTLB_SHIFT         55
+#define  MCSR0_RPLVTLB               (1ULL << MCSR0_RPLVTLB_SHIFT)
+#define  MCSR0_EXEPROT_SHIFT         54
+#define  MCSR0_EXEPROT               (1ULL << MCSR0_EXEPROT_SHIFT)
+#define  MCSR0_RI_SHIFT              53
+#define  MCSR0_RI                    (1ULL << MCSR0_RI_SHIFT)
+#define  MCSR0_UAL_SHIFT             52
+#define  MCSR0_UAL                   (1ULL << MCSR0_UAL_SHIFT)
+#define  MCSR0_VABIT_SHIFT           44
+#define  MCSR0_VABIT_WIDTH           8
+#define  MCSR0_VABIT                 (0xffULL << MCSR0_VABIT_SHIFT)
+#define  VABIT_DEFAULT               0x2f
+#define  MCSR0_PABIT_SHIFT           36
+#define  MCSR0_PABIT_WIDTH           8
+#define  MCSR0_PABIT                 (0xffULL << MCSR0_PABIT_SHIFT)
+#define  PABIT_DEFAULT               0x2f
+#define  MCSR0_IOCSR_SHIFT           35
+#define  MCSR0_IOCSR                 (1ULL << MCSR0_IOCSR_SHIFT)
+#define  MCSR0_PAGING_SHIFT          34
+#define  MCSR0_PAGING                (1ULL << MCSR0_PAGING_SHIFT)
+#define  MCSR0_GR64_SHIFT            33
+#define  MCSR0_GR64                  (1ULL << MCSR0_GR64_SHIFT)
+#define  GR64_DEFAULT                1
+#define  MCSR0_GR32_SHIFT            32
+#define  MCSR0_GR32                  (1ULL << MCSR0_GR32_SHIFT)
+#define  GR32_DEFAULT                0
+#define  MCSR0_PRID_WIDTH            32
+#define  MCSR0_PRID                  0x14C010
+
+#define LOONGARCH_CSR_MCSR1          0xc1 /* CPUCFG2 and CPUCFG3 */
+#define  MCSR1_HPFOLD_SHIFT          43
+#define  MCSR1_HPFOLD                (1ULL << MCSR1_HPFOLD_SHIFT)
+#define  MCSR1_SPW_LVL_SHIFT         40
+#define  MCSR1_SPW_LVL_WIDTH         3
+#define  MCSR1_SPW_LVL               (7ULL << MCSR1_SPW_LVL_SHIFT)
+#define  MCSR1_ICACHET_SHIFT         39
+#define  MCSR1_ICACHET               (1ULL << MCSR1_ICACHET_SHIFT)
+#define  MCSR1_ITLBT_SHIFT           38
+#define  MCSR1_ITLBT                 (1ULL << MCSR1_ITLBT_SHIFT)
+#define  MCSR1_LLDBAR_SHIFT          37
+#define  MCSR1_LLDBAR                (1ULL << MCSR1_LLDBAR_SHIFT)
+#define  MCSR1_SCDLY_SHIFT           36
+#define  MCSR1_SCDLY                 (1ULL << MCSR1_SCDLY_SHIFT)
+#define  MCSR1_LLEXC_SHIFT           35
+#define  MCSR1_LLEXC                 (1ULL << MCSR1_LLEXC_SHIFT)
+#define  MCSR1_UCACC_SHIFT           34
+#define  MCSR1_UCACC                 (1ULL << MCSR1_UCACC_SHIFT)
+#define  MCSR1_SFB_SHIFT             33
+#define  MCSR1_SFB                   (1ULL << MCSR1_SFB_SHIFT)
+#define  MCSR1_CCDMA_SHIFT           32
+#define  MCSR1_CCDMA                 (1ULL << MCSR1_CCDMA_SHIFT)
+#define  MCSR1_LAMO_SHIFT            22
+#define  MCSR1_LAMO                  (1ULL << MCSR1_LAMO_SHIFT)
+#define  MCSR1_LSPW_SHIFT            21
+#define  MCSR1_LSPW                  (1ULL << MCSR1_LSPW_SHIFT)
+#define  MCSR1_LOONGARCHBT_SHIFT     20
+#define  MCSR1_LOONGARCHBT           (1ULL << MCSR1_LOONGARCHBT_SHIFT)
+#define  MCSR1_ARMBT_SHIFT           19
+#define  MCSR1_ARMBT                 (1ULL << MCSR1_ARMBT_SHIFT)
+#define  MCSR1_X86BT_SHIFT           18
+#define  MCSR1_X86BT                 (1ULL << MCSR1_X86BT_SHIFT)
+#define  MCSR1_LLFTPVERS_SHIFT       15
+#define  MCSR1_LLFTPVERS_WIDTH       3
+#define  MCSR1_LLFTPVERS             (7ULL << MCSR1_LLFTPVERS_SHIFT)
+#define  MCSR1_LLFTP_SHIFT           14
+#define  MCSR1_LLFTP                 (1ULL << MCSR1_LLFTP_SHIFT)
+#define  MCSR1_VZVERS_SHIFT          11
+#define  MCSR1_VZVERS_WIDTH          3
+#define  MCSR1_VZVERS                (7ULL << MCSR1_VZVERS_SHIFT)
+#define  MCSR1_VZ_SHIFT              10
+#define  MCSR1_VZ                    (1ULL << MCSR1_VZ_SHIFT)
+#define  MCSR1_CRYPTO_SHIFT          9
+#define  MCSR1_CRYPTO                (1ULL << MCSR1_CRYPTO_SHIFT)
+#define  MCSR1_COMPLEX_SHIFT         8
+#define  MCSR1_COMPLEX               (1ULL << MCSR1_COMPLEX_SHIFT)
+#define  MCSR1_LASX_SHIFT            7
+#define  MCSR1_LASX                  (1ULL << MCSR1_LASX_SHIFT)
+#define  MCSR1_LSX_SHIFT             6
+#define  MCSR1_LSX                   (1ULL << MCSR1_LSX_SHIFT)
+#define  MCSR1_FPVERS_SHIFT          3
+#define  MCSR1_FPVERS_WIDTH          3
+#define  MCSR1_FPVERS                (7ULL << MCSR1_FPVERS_SHIFT)
+#define  MCSR1_FPDP_SHIFT            2
+#define  MCSR1_FPDP                  (1ULL << MCSR1_FPDP_SHIFT)
+#define  MCSR1_FPSP_SHIFT            1
+#define  MCSR1_FPSP                  (1ULL << MCSR1_FPSP_SHIFT)
+#define  MCSR1_FP_SHIFT              0
+#define  MCSR1_FP                    (1ULL << MCSR1_FP_SHIFT)
+
+#define LOONGARCH_CSR_MCSR2          0xc2 /* CPUCFG4 and CPUCFG5 */
+#define  MCSR2_CCDIV_SHIFT           48
+#define  MCSR2_CCDIV_WIDTH           16
+#define  MCSR2_CCDIV                 (0xffffULL << MCSR2_CCDIV_SHIFT)
+#define  MCSR2_CCMUL_SHIFT           32
+#define  MCSR2_CCMUL_WIDTH           16
+#define  MCSR2_CCMUL                 (0xffffULL << MCSR2_CCMUL_SHIFT)
+#define  MCSR2_CCFREQ_WIDTH          32
+#define  MCSR2_CCFREQ                (0xffffffff)
+#define  CCFREQ_DEFAULT              0x5f5e100 /* 100 MHz */
+
+#define LOONGARCH_CSR_MCSR3          0xc3 /* CPUCFG6 */
+#define  MCSR3_UPM_SHIFT             14
+#define  MCSR3_UPM                   (1ULL << MCSR3_UPM_SHIFT)
+#define  MCSR3_PMBITS_SHIFT          8
+#define  MCSR3_PMBITS_WIDTH          6
+#define  MCSR3_PMBITS                (0x3fULL << MCSR3_PMBITS_SHIFT)
+#define  PMBITS_DEFAULT              0x40
+#define  MCSR3_PMNUM_SHIFT           4
+#define  MCSR3_PMNUM_WIDTH           4
+#define  MCSR3_PMNUM                 (0xfULL << MCSR3_PMNUM_SHIFT)
+#define  MCSR3_PAMVER_SHIFT          1
+#define  MCSR3_PAMVER_WIDTH          3
+#define  MCSR3_PAMVER                (0x7ULL << MCSR3_PAMVER_SHIFT)
+#define  MCSR3_PMP_SHIFT             0
+#define  MCSR3_PMP                   (1ULL << MCSR3_PMP_SHIFT)
+
+#define LOONGARCH_CSR_MCSR8          0xc8 /* CPUCFG16 and CPUCFG17 */
+#define  MCSR8_L1I_SIZE_SHIFT        56
+#define  MCSR8_L1I_SIZE_WIDTH        7
+#define  MCSR8_L1I_SIZE              (0x7fULL << MCSR8_L1I_SIZE_SHIFT)
+#define  MCSR8_L1I_IDX_SHIFT         48
+#define  MCSR8_L1I_IDX_WIDTH         8
+#define  MCSR8_L1I_IDX               (0xffULL << MCSR8_L1I_IDX_SHIFT)
+#define  MCSR8_L1I_WAY_SHIFT         32
+#define  MCSR8_L1I_WAY_WIDTH         16
+#define  MCSR8_L1I_WAY               (0xffffULL << MCSR8_L1I_WAY_SHIFT)
+#define  MCSR8_L3DINCL_SHIFT         16
+#define  MCSR8_L3DINCL               (1ULL << MCSR8_L3DINCL_SHIFT)
+#define  MCSR8_L3DPRIV_SHIFT         15
+#define  MCSR8_L3DPRIV               (1ULL << MCSR8_L3DPRIV_SHIFT)
+#define  MCSR8_L3DPRE_SHIFT          14
+#define  MCSR8_L3DPRE                (1ULL << MCSR8_L3DPRE_SHIFT)
+#define  MCSR8_L3IUINCL_SHIFT        13
+#define  MCSR8_L3IUINCL              (1ULL << MCSR8_L3IUINCL_SHIFT)
+#define  MCSR8_L3IUPRIV_SHIFT        12
+#define  MCSR8_L3IUPRIV              (1ULL << MCSR8_L3IUPRIV_SHIFT)
+#define  MCSR8_L3IUUNIFY_SHIFT       11
+#define  MCSR8_L3IUUNIFY             (1ULL << MCSR8_L3IUUNIFY_SHIFT)
+#define  MCSR8_L3IUPRE_SHIFT         10
+#define  MCSR8_L3IUPRE               (1ULL << MCSR8_L3IUPRE_SHIFT)
+#define  MCSR8_L2DINCL_SHIFT         9
+#define  MCSR8_L2DINCL               (1ULL << MCSR8_L2DINCL_SHIFT)
+#define  MCSR8_L2DPRIV_SHIFT         8
+#define  MCSR8_L2DPRIV               (1ULL << MCSR8_L2DPRIV_SHIFT)
+#define  MCSR8_L2DPRE_SHIFT          7
+#define  MCSR8_L2DPRE                (1ULL << MCSR8_L2DPRE_SHIFT)
+#define  MCSR8_L2IUINCL_SHIFT        6
+#define  MCSR8_L2IUINCL              (1ULL << MCSR8_L2IUINCL_SHIFT)
+#define  MCSR8_L2IUPRIV_SHIFT        5
+#define  MCSR8_L2IUPRIV              (1ULL << MCSR8_L2IUPRIV_SHIFT)
+#define  MCSR8_L2IUUNIFY_SHIFT       4
+#define  MCSR8_L2IUUNIFY             (1ULL << MCSR8_L2IUUNIFY_SHIFT)
+#define  MCSR8_L2IUPRE_SHIFT         3
+#define  MCSR8_L2IUPRE               (1ULL << MCSR8_L2IUPRE_SHIFT)
+#define  MCSR8_L1DPRE_SHIFT          2
+#define  MCSR8_L1DPRE                (1ULL << MCSR8_L1DPRE_SHIFT)
+#define  MCSR8_L1IUUNIFY_SHIFT       1
+#define  MCSR8_L1IUUNIFY             (1ULL << MCSR8_L1IUUNIFY_SHIFT)
+#define  MCSR8_L1IUPRE_SHIFT         0
+#define  MCSR8_L1IUPRE               (1ULL << MCSR8_L1IUPRE_SHIFT)
+
+#define LOONGARCH_CSR_MCSR9          0xc9 /* CPUCFG18 and CPUCFG19 */
+#define  MCSR9_L2U_SIZE_SHIFT        56
+#define  MCSR9_L2U_SIZE_WIDTH        7
+#define  MCSR9_L2U_SIZE              (0x7fULL << MCSR9_L2U_SIZE_SHIFT)
+#define  MCSR9_L2U_IDX_SHIFT         48
+#define  MCSR9_L2U_IDX_WIDTH         8
+#define  MCSR9_L2U_IDX               (0xffULL << MCSR9_L2U_IDX_SHIFT)
+#define  MCSR9_L2U_WAY_SHIFT         32
+#define  MCSR9_L2U_WAY_WIDTH         16
+#define  MCSR9_L2U_WAY               (0xffffULL << MCSR9_L2U_WAY_SHIFT)
+#define  MCSR9_L1D_SIZE_SHIFT        24
+#define  MCSR9_L1D_SIZE_WIDTH        7
+#define  MCSR9_L1D_SIZE              (0x7fULL << MCSR9_L1D_SIZE_SHIFT)
+#define  MCSR9_L1D_IDX_SHIFT         16
+#define  MCSR9_L1D_IDX_WIDTH         8
+#define  MCSR9_L1D_IDX               (0xffULL << MCSR9_L1D_IDX_SHIFT)
+#define  MCSR9_L1D_WAY_SHIFT         0
+#define  MCSR9_L1D_WAY_WIDTH         16
+#define  MCSR9_L1D_WAY               (0xffffULL << MCSR9_L1D_WAY_SHIFT)
+
+#define LOONGARCH_CSR_MCSR10         0xca /* CPUCFG20 */
+#define  MCSR10_L3U_SIZE_SHIFT       24
+#define  MCSR10_L3U_SIZE_WIDTH       7
+#define  MCSR10_L3U_SIZE             (0x7fULL << MCSR10_L3U_SIZE_SHIFT)
+#define  MCSR10_L3U_IDX_SHIFT        16
+#define  MCSR10_L3U_IDX_WIDTH        8
+#define  MCSR10_L3U_IDX              (0xffULL << MCSR10_L3U_IDX_SHIFT)
+#define  MCSR10_L3U_WAY_SHIFT        0
+#define  MCSR10_L3U_WAY_WIDTH        16
+#define  MCSR10_L3U_WAY              (0xffffULL << MCSR10_L3U_WAY_SHIFT)
+
+#define LOONGARCH_CSR_MCSR24         0xf0 /* CPUCFG48 */
+#define  MCSR24_RAMCG_SHIFT          3
+#define  MCSR24_RAMCG                (1ULL << MCSR24_RAMCG_SHIFT)
+#define  MCSR24_VFPUCG_SHIFT         2
+#define  MCSR24_VFPUCG               (1ULL << MCSR24_VFPUCG_SHIFT)
+#define  MCSR24_NAPEN_SHIFT          1
+#define  MCSR24_NAPEN                (1ULL << MCSR24_NAPEN_SHIFT)
+#define  MCSR24_MCSRLOCK_SHIFT       0
+#define  MCSR24_MCSRLOCK             (1ULL << MCSR24_MCSRLOCK_SHIFT)
+
+/* Uncached accelerated window registers */
+#define LOONGARCH_CSR_UCAWIN         0x100 /* read only info */
+#define LOONGARCH_CSR_UCAWIN0_LO     0x102 /* win0 low */
+#define LOONGARCH_CSR_UCAWIN0_HI     0x103 /* win0 high */
+#define LOONGARCH_CSR_UCAWIN1_LO     0x104 /* win1 low */
+#define LOONGARCH_CSR_UCAWIN1_HI     0x105 /* win1 high */
+#define LOONGARCH_CSR_UCAWIN2_LO     0x106 /* win2 low */
+#define LOONGARCH_CSR_UCAWIN2_HI     0x107 /* win2 high */
+#define LOONGARCH_CSR_UCAWIN3_LO     0x108 /* win3 low */
+#define LOONGARCH_CSR_UCAWIN3_HI     0x109 /* win3 high */
+
+/* Direct map window registers */
+#define LOONGARCH_CSR_DMWIN0         0x180 /* direct map win0: MEM & IF */
+#define LOONGARCH_CSR_DMWIN1         0x181 /* direct map win1: MEM & IF */
+#define LOONGARCH_CSR_DMWIN2         0x182 /* direct map win2: MEM */
+#define LOONGARCH_CSR_DMWIN3         0x183 /* direct map win3: MEM */
+
+/* Performance counter registers */
+#define LOONGARCH_CSR_PERFCTRL0      0x200 /* perf event 0 config */
+#define LOONGARCH_CSR_PERFCNTR0      0x201 /* perf event 0 count value */
+#define LOONGARCH_CSR_PERFCTRL1      0x202 /* perf event 1 config */
+#define LOONGARCH_CSR_PERFCNTR1      0x203 /* perf event 1 count value */
+#define LOONGARCH_CSR_PERFCTRL2      0x204 /* perf event 2 config */
+#define LOONGARCH_CSR_PERFCNTR2      0x205 /* perf event 2 count value */
+#define LOONGARCH_CSR_PERFCTRL3      0x206 /* perf event 3 config */
+#define LOONGARCH_CSR_PERFCNTR3      0x207 /* perf event 3 count value */
+#define  CSR_PERFCTRL_PLV0           (1ULL << 16)
+#define  CSR_PERFCTRL_PLV1           (1ULL << 17)
+#define  CSR_PERFCTRL_PLV2           (1ULL << 18)
+#define  CSR_PERFCTRL_PLV3           (1ULL << 19)
+#define  CSR_PERFCTRL_IE             (1ULL << 20)
+#define  CSR_PERFCTRL_EVENT          0x3ff
+
+/* CSR register */
+#define CPU_LOONGARCH_CSR    \
+    uint64_t CSR_CRMD;       \
+    uint64_t CSR_PRMD;       \
+    uint64_t CSR_EUEN;       \
+    uint64_t CSR_MISC;       \
+    uint64_t CSR_ECFG;       \
+    uint64_t CSR_ESTAT;      \
+    uint64_t CSR_ERA;        \
+    uint64_t CSR_BADV;       \
+    uint64_t CSR_BADI;       \
+    uint64_t CSR_EEPN;       \
+    uint64_t CSR_TLBIDX;     \
+    uint64_t CSR_TLBEHI;     \
+    uint64_t CSR_TLBELO0;    \
+    uint64_t CSR_TLBELO1;    \
+    uint64_t CSR_ASID;       \
+    uint64_t CSR_PGDL;       \
+    uint64_t CSR_PGDH;       \
+    uint64_t CSR_PGD;        \
+    uint64_t CSR_PWCTL0;     \
+    uint64_t CSR_PWCTL1;     \
+    uint64_t CSR_STLBPGSIZE; \
+    uint64_t CSR_RVACFG;     \
+    uint64_t CSR_CPUID;      \
+    uint64_t CSR_PRCFG1;     \
+    uint64_t CSR_PRCFG2;     \
+    uint64_t CSR_PRCFG3;     \
+    uint64_t CSR_KS0;        \
+    uint64_t CSR_KS1;        \
+    uint64_t CSR_KS2;        \
+    uint64_t CSR_KS3;        \
+    uint64_t CSR_KS4;        \
+    uint64_t CSR_KS5;        \
+    uint64_t CSR_KS6;        \
+    uint64_t CSR_KS7;        \
+    uint64_t CSR_KS8;        \
+    uint64_t CSR_TMID;       \
+    uint64_t CSR_TCFG;       \
+    uint64_t CSR_TVAL;       \
+    uint64_t CSR_CNTC;       \
+    uint64_t CSR_TINTCLR;    \
+    uint64_t CSR_LLBIT;      \
+    uint64_t CSR_IMPCTL1;    \
+    uint64_t CSR_IMPCTL2;    \
+    uint64_t CSR_GNMI;       \
+    uint64_t CSR_TLBRENT;    \
+    uint64_t CSR_TLBRBADV;   \
+    uint64_t CSR_TLBRERA;    \
+    uint64_t CSR_TLBRSAVE;   \
+    uint64_t CSR_TLBRELO0;   \
+    uint64_t CSR_TLBRELO1;   \
+    uint64_t CSR_TLBREHI;    \
+    uint64_t CSR_TLBRPRMD;   \
+    uint64_t CSR_ERRCTL;     \
+    uint64_t CSR_ERRINFO;    \
+    uint64_t CSR_ERRINFO1;   \
+    uint64_t CSR_ERRENT;     \
+    uint64_t CSR_ERRERA;     \
+    uint64_t CSR_ERRSAVE;    \
+    uint64_t CSR_CTAG;       \
+    uint64_t CSR_MCSR0;      \
+    uint64_t CSR_MCSR1;      \
+    uint64_t CSR_MCSR2;      \
+    uint64_t CSR_MCSR3;      \
+    uint64_t CSR_MCSR8;      \
+    uint64_t CSR_MCSR9;      \
+    uint64_t CSR_MCSR10;     \
+    uint64_t CSR_MCSR24;     \
+    uint64_t CSR_UCAWIN;     \
+    uint64_t CSR_UCAWIN0_LO; \
+    uint64_t CSR_UCAWIN0_HI; \
+    uint64_t CSR_UCAWIN1_LO; \
+    uint64_t CSR_UCAWIN1_HI; \
+    uint64_t CSR_UCAWIN2_LO; \
+    uint64_t CSR_UCAWIN2_HI; \
+    uint64_t CSR_UCAWIN3_LO; \
+    uint64_t CSR_UCAWIN3_HI; \
+    uint64_t CSR_DMWIN0;     \
+    uint64_t CSR_DMWIN1;     \
+    uint64_t CSR_DMWIN2;     \
+    uint64_t CSR_DMWIN3;     \
+    uint64_t CSR_PERFCTRL0;  \
+    uint64_t CSR_PERFCNTR0;  \
+    uint64_t CSR_PERFCTRL1;  \
+    uint64_t CSR_PERFCNTR1;  \
+    uint64_t CSR_PERFCTRL2;  \
+    uint64_t CSR_PERFCNTR2;  \
+    uint64_t CSR_PERFCTRL3;  \
+    uint64_t CSR_PERFCNTR3;  \
+
+#endif
-- 
1.8.3.1




* [PATCH v2 03/22] target/loongarch: Add core definition
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
  2021-07-21  9:52 ` [PATCH v2 01/22] target/loongarch: Add README Song Gao
  2021-07-21  9:52 ` [PATCH v2 02/22] target/loongarch: Add CSR registers definition Song Gao
@ 2021-07-21  9:52 ` Song Gao
  2021-07-22 22:43   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 04/22] target/loongarch: Add interrupt handling support Song Gao
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch adds the target state header, target definitions,
and initialization routines.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
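Illustration only, not part of the patch: a minimal sketch of how the
SHIFT/mask pairs from cpu-csr.h (patch 02) decode the MCSR2 value that
set_loongarch_csr() assigns. The macro bodies are copied from patch 02;
the names MCSR2_RESET_VALUE, csr_field() and the mcsr2_*() wrappers are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from target/loongarch/cpu-csr.h (patch 02). */
#define MCSR2_CCDIV_SHIFT   48
#define MCSR2_CCDIV         (0xffffULL << MCSR2_CCDIV_SHIFT)
#define MCSR2_CCMUL_SHIFT   32
#define MCSR2_CCMUL         (0xffffULL << MCSR2_CCMUL_SHIFT)
#define MCSR2_CCFREQ        (0xffffffff)

/* Value assigned to CSR_MCSR2 in set_loongarch_csr(); the macro name is
 * hypothetical. */
#define MCSR2_RESET_VALUE   0x1000105f5e100ULL

/* Hypothetical helper: isolate a field with its mask, then shift it down. */
static inline uint64_t csr_field(uint64_t reg, uint64_t mask, unsigned shift)
{
    assert(shift < 64);
    return (reg & mask) >> shift;
}

static inline uint64_t mcsr2_ccdiv(uint64_t reg)
{
    return csr_field(reg, MCSR2_CCDIV, MCSR2_CCDIV_SHIFT);
}

static inline uint64_t mcsr2_ccmul(uint64_t reg)
{
    return csr_field(reg, MCSR2_CCMUL, MCSR2_CCMUL_SHIFT);
}

static inline uint64_t mcsr2_ccfreq(uint64_t reg)
{
    /* CCFREQ occupies the low 32 bits, so no shift is needed. */
    return csr_field(reg, MCSR2_CCFREQ, 0);
}
```

For the value above this gives CCDIV = 1, CCMUL = 1 and
CCFREQ = 0x5f5e100 (100000000), which matches CCFREQ_DEFAULT.
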
 target/loongarch/cpu-param.h |  21 ++++
 target/loongarch/cpu-qom.h   |  40 ++++++
 target/loongarch/cpu.c       | 293 +++++++++++++++++++++++++++++++++++++++++++
 target/loongarch/cpu.h       | 265 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 619 insertions(+)
 create mode 100644 target/loongarch/cpu-param.h
 create mode 100644 target/loongarch/cpu-qom.h
 create mode 100644 target/loongarch/cpu.c
 create mode 100644 target/loongarch/cpu.h
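
Illustration only, not part of the patch: a sketch of the fcsr0 field
handling that the GET_FP_*/SET_FP_*/UPDATE_FP_FLAGS macros in cpu.h
implement, written as inline functions. The shifts (0, 16, 24) and the
5-bit field width come from those macros; the FCSR0_*_SHIFT names,
fcsr0_get(), fcsr0_set() and fcsr0_record() are hypothetical, and
combining a Cause overwrite with a Flags accumulation in one helper is
an assumption about how callers might use the macros together.

```c
#include <assert.h>
#include <stdint.h>

/* Exception bits, as defined in cpu.h. */
#define FP_INEXACT        1
#define FP_UNDERFLOW      2
#define FP_OVERFLOW       4
#define FP_DIV0           8
#define FP_INVALID        16

/* Hypothetical names for the positions used by the cpu.h macros. */
#define FCSR0_ENABLE_SHIFT  0
#define FCSR0_FLAGS_SHIFT   16
#define FCSR0_CAUSE_SHIFT   24
#define FCSR0_FIELD_MASK    0x1fu

/* Read one 5-bit field, like GET_FP_CAUSE/GET_FP_FLAGS/GET_FP_ENABLE. */
static inline uint32_t fcsr0_get(uint32_t reg, unsigned shift)
{
    return (reg >> shift) & FCSR0_FIELD_MASK;
}

/* Replace one 5-bit field, like SET_FP_CAUSE/SET_FP_FLAGS/SET_FP_ENABLE. */
static inline uint32_t fcsr0_set(uint32_t reg, unsigned shift, uint32_t v)
{
    assert(shift <= FCSR0_CAUSE_SHIFT);
    return (reg & ~(FCSR0_FIELD_MASK << shift)) |
           ((v & FCSR0_FIELD_MASK) << shift);
}

/*
 * Assumed usage: Cause is overwritten with the latest exception bits,
 * while Flags only ever gains bits (the UPDATE_FP_FLAGS behaviour).
 */
static inline uint32_t fcsr0_record(uint32_t reg, uint32_t excp)
{
    reg = fcsr0_set(reg, FCSR0_CAUSE_SHIFT, excp);
    reg |= (excp & FCSR0_FIELD_MASK) << FCSR0_FLAGS_SHIFT;
    return reg;
}
```

Recording FP_INEXACT and then FP_OVERFLOW this way leaves Cause holding
only FP_OVERFLOW, while Flags holds both bits; every bit touched lies
inside FCSR0_M2 (0x1f1f0000).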

diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
new file mode 100644
index 0000000..582ee29
--- /dev/null
+++ b/target/loongarch/cpu-param.h
@@ -0,0 +1,21 @@
+/*
+ * LoongArch cpu parameters for qemu.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_CPU_PARAM_H
+#define LOONGARCH_CPU_PARAM_H 1
+
+#ifdef TARGET_LOONGARCH64
+#define TARGET_LONG_BITS 64
+#define TARGET_PHYS_ADDR_SPACE_BITS 48
+#define TARGET_VIRT_ADDR_SPACE_BITS 48
+#endif
+
+#define TARGET_PAGE_BITS 12
+#define NB_MMU_MODES 4
+
+#endif
diff --git a/target/loongarch/cpu-qom.h b/target/loongarch/cpu-qom.h
new file mode 100644
index 0000000..307ab13
--- /dev/null
+++ b/target/loongarch/cpu-qom.h
@@ -0,0 +1,40 @@
+/*
+ * QEMU LoongArch CPU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef QEMU_LOONGARCH_CPU_QOM_H
+#define QEMU_LOONGARCH_CPU_QOM_H
+
+#include "hw/core/cpu.h"
+#include "qom/object.h"
+
+#ifdef TARGET_LOONGARCH64
+#define TYPE_LOONGARCH_CPU "loongarch64-cpu"
+#else
+#error LoongArch 32bit emulation is not implemented yet
+#endif
+
+OBJECT_DECLARE_TYPE(LoongArchCPU, LoongArchCPUClass,
+                    LOONGARCH_CPU)
+
+/**
+ * LoongArchCPUClass:
+ * @parent_realize: The parent class' realize handler.
+ * @parent_reset: The parent class' reset handler.
+ *
+ * A LoongArch CPU model.
+ */
+struct LoongArchCPUClass {
+    /*< private >*/
+    CPUClass parent_class;
+    /*< public >*/
+
+    DeviceRealize parent_realize;
+    DeviceReset parent_reset;
+};
+
+#endif
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
new file mode 100644
index 0000000..4db2d0f
--- /dev/null
+++ b/target/loongarch/cpu.c
@@ -0,0 +1,293 @@
+/*
+ * QEMU LoongArch CPU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/qemu-print.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "sysemu/qtest.h"
+#include "exec/exec-all.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "cpu.h"
+#include "cpu-csr.h"
+#include "cpu-qom.h"
+
+static const char * const excp_names[EXCP_LAST + 1] = {
+    [EXCP_INTE] = "Interrupt error",
+    [EXCP_ADE] = "Address error",
+    [EXCP_SYSCALL] = "Syscall",
+    [EXCP_BREAK] = "Break",
+    [EXCP_FPDIS] = "FPU Disabled",
+    [EXCP_INE] = "Inst. Not Exist",
+    [EXCP_TRAP] = "Trap",
+    [EXCP_FPE] = "Floating Point Exception",
+    [EXCP_TLBM] = "TLB modified fault",
+    [EXCP_TLBL] = "TLB miss on a load",
+    [EXCP_TLBS] = "TLB miss on a store",
+    [EXCP_TLBPE] = "TLB Privilege Error",
+    [EXCP_TLBXI] = "TLB Execution-Inhibit exception",
+    [EXCP_TLBRI] = "TLB Read-Inhibit exception",
+};
+
+const char *loongarch_exception_name(int32_t exception)
+{
+    if (exception < 0 || exception > EXCP_LAST) {
+        return "unknown";
+    }
+    return excp_names[exception];
+}
+
+target_ulong exception_resume_pc(CPULoongArchState *env)
+{
+    target_ulong bad_pc;
+
+    bad_pc = env->active_tc.PC;
+
+    return bad_pc;
+}
+
+void QEMU_NORETURN do_raise_exception_err(CPULoongArchState *env,
+                                          uint32_t exception,
+                                          int error_code,
+                                          uintptr_t pc)
+{
+    CPUState *cs = env_cpu(env);
+
+    qemu_log_mask(CPU_LOG_INT, "%s: %d (%s) %d\n",
+                  __func__,
+                  exception,
+                  loongarch_exception_name(exception),
+                  error_code);
+    cs->exception_index = exception;
+    env->error_code = error_code;
+
+    cpu_loop_exit_restore(cs, pc);
+}
+
+static void loongarch_cpu_set_pc(CPUState *cs, vaddr value)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    CPULoongArchState *env = &cpu->env;
+
+    env->active_tc.PC = value & ~(target_ulong)1;
+}
+
+#ifdef CONFIG_TCG
+static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
+                                              const TranslationBlock *tb)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    CPULoongArchState *env = &cpu->env;
+
+    env->active_tc.PC = tb->pc;
+    env->hflags &= ~LOONGARCH_HFLAG_BMASK;
+    env->hflags |= tb->flags & LOONGARCH_HFLAG_BMASK;
+}
+#endif /* CONFIG_TCG */
+
+static bool loongarch_cpu_has_work(CPUState *cs)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    CPULoongArchState *env = &cpu->env;
+    bool has_work = false;
+
+    if ((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+        cpu_loongarch_hw_interrupts_pending(env)) {
+        has_work = true;
+    }
+
+    return has_work;
+}
+
+static void set_loongarch_feature(CPULoongArchState *env, int feature)
+{
+    env->features |= (1ULL << feature);
+}
+
+static void set_loongarch_csr(CPULoongArchState *env)
+{
+    env->CSR_PRCFG1 = 0x72f8;
+    env->CSR_PRCFG2 = 0x3ffff000;
+    env->CSR_PRCFG3 = 0x8073f2;
+    env->CSR_CRMD = 0xa8;
+    env->CSR_ECFG = 0x70000;
+    env->CSR_STLBPGSIZE = 0xe;
+    env->CSR_RVACFG = 0x0;
+    env->CSR_MCSR0 = 0x3f2f2fe0014c010;
+    env->CSR_MCSR1 = 0xcff0060c3cf;
+    env->CSR_MCSR2 = 0x1000105f5e100;
+    env->CSR_MCSR3 = 0x0;
+    env->CSR_MCSR8 = 0x608000300002c3d;
+    env->CSR_MCSR9 = 0x608000f06080003;
+    env->CSR_MCSR10 = 0x60f000f;
+    env->CSR_MCSR24 = 0x0;
+}
+
+/* LoongArch CPU definitions */
+static void loongarch_3a5000_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    set_loongarch_feature(env, LA_FEATURE_3A5000);
+    set_loongarch_csr(env);
+}
+
+static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
+{
+    const char *typename = object_class_get_name(OBJECT_CLASS(data));
+
+    qemu_printf("%s\n", typename);
+}
+
+void loongarch_cpu_list(void)
+{
+    GSList *list;
+    list = object_class_get_list_sorted(TYPE_LOONGARCH_CPU, false);
+    g_slist_foreach(list, loongarch_cpu_list_entry, NULL);
+    g_slist_free(list);
+}
+
+static void fpu_init(CPULoongArchState *env)
+{
+    memcpy(&env->active_fpu, &env->fpus[0], sizeof(env->active_fpu));
+}
+
+static void loongarch_cpu_reset(DeviceState *dev)
+{
+    CPUState *cs = CPU(dev);
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    LoongArchCPUClass *lacc = LOONGARCH_CPU_GET_CLASS(cpu);
+    CPULoongArchState *env = &cpu->env;
+
+    lacc->parent_reset(dev);
+
+    memset(env, 0, offsetof(CPULoongArchState, end_reset_fields));
+
+    set_loongarch_csr(env);
+    env->current_tc = 0;
+    env->active_fpu.fcsr0_mask = 0x1f1f03df;
+    env->active_fpu.fcsr0 = 0x0;
+
+    compute_hflags(env);
+    cs->exception_index = EXCP_NONE;
+}
+
+static void loongarch_cpu_disas_set_info(CPUState *s, disassemble_info *info)
+{
+    info->print_insn = print_insn_loongarch;
+}
+
+static void loongarch_cpu_realizefn(DeviceState *dev, Error **errp)
+{
+    CPUState *cs = CPU(dev);
+    LoongArchCPU *cpu = LOONGARCH_CPU(dev);
+    CPULoongArchState *env = &cpu->env;
+    LoongArchCPUClass *lacc = LOONGARCH_CPU_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    env->exception_base = 0x1C000000;
+
+    fpu_init(env);
+
+    cpu_reset(cs);
+    qemu_init_vcpu(cs);
+
+    lacc->parent_realize(dev, errp);
+}
+
+static void loongarch_cpu_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+
+    cpu_set_cpustate_pointers(cpu);
+    cpu->clock = qdev_init_clock_in(DEVICE(obj), "clk-in", NULL, cpu, 0);
+}
+
+static char *loongarch_cpu_type_name(const char *cpu_model)
+{
+    return g_strdup_printf(LOONGARCH_CPU_TYPE_NAME("%s"), cpu_model);
+}
+
+static ObjectClass *loongarch_cpu_class_by_name(const char *cpu_model)
+{
+    ObjectClass *oc;
+    char *typename;
+
+    typename = loongarch_cpu_type_name(cpu_model);
+    oc = object_class_by_name(typename);
+    g_free(typename);
+    return oc;
+}
+
+static Property loongarch_cpu_properties[] = {
+    DEFINE_PROP_INT32("core-id", LoongArchCPU, core_id, -1),
+    DEFINE_PROP_UINT32("id", LoongArchCPU, id, UNASSIGNED_CPU_ID),
+    DEFINE_PROP_INT32("node-id", LoongArchCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
+    DEFINE_PROP_END_OF_LIST()
+};
+
+#ifdef CONFIG_TCG
+#include "hw/core/tcg-cpu-ops.h"
+
+static struct TCGCPUOps loongarch_tcg_ops = {
+    .initialize = loongarch_tcg_init,
+    .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
+};
+#endif /* CONFIG_TCG */
+
+static void loongarch_cpu_class_init(ObjectClass *c, void *data)
+{
+    LoongArchCPUClass *lacc = LOONGARCH_CPU_CLASS(c);
+    CPUClass *cc = CPU_CLASS(c);
+    DeviceClass *dc = DEVICE_CLASS(c);
+
+    device_class_set_parent_realize(dc, loongarch_cpu_realizefn,
+                                    &lacc->parent_realize);
+    device_class_set_parent_reset(dc, loongarch_cpu_reset, &lacc->parent_reset);
+    device_class_set_props(dc, loongarch_cpu_properties);
+
+    cc->class_by_name = loongarch_cpu_class_by_name;
+    cc->has_work = loongarch_cpu_has_work;
+    cc->dump_state = loongarch_cpu_dump_state;
+    cc->set_pc = loongarch_cpu_set_pc;
+    cc->disas_set_info = loongarch_cpu_disas_set_info;
+#ifdef CONFIG_TCG
+    cc->tcg_ops = &loongarch_tcg_ops;
+#endif
+}
+
+#define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
+    { \
+        .parent = TYPE_LOONGARCH_CPU, \
+        .instance_init = initfn, \
+        .name = LOONGARCH_CPU_TYPE_NAME(model), \
+    }
+
+static const TypeInfo loongarch_cpu_type_infos[] = {
+    {
+        .name = TYPE_LOONGARCH_CPU,
+        .parent = TYPE_CPU,
+        .instance_size = sizeof(LoongArchCPU),
+        .instance_init = loongarch_cpu_initfn,
+        .abstract = true,
+        .class_size = sizeof(LoongArchCPUClass),
+        .class_init = loongarch_cpu_class_init,
+    },
+    DEFINE_LOONGARCH_CPU_TYPE("Loongson-3A5000", loongarch_3a5000_initfn),
+};
+
+DEFINE_TYPES(loongarch_cpu_type_infos)
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
new file mode 100644
index 0000000..ab1aeb6
--- /dev/null
+++ b/target/loongarch/cpu.h
@@ -0,0 +1,265 @@
+/*
+ * QEMU LoongArch CPU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_CPU_H
+#define LOONGARCH_CPU_H
+
+#include "exec/cpu-defs.h"
+#include "fpu/softfloat-types.h"
+#include "hw/clock.h"
+#include "cpu-qom.h"
+#include "cpu-csr.h"
+
+#define TCG_GUEST_DEFAULT_MO (0)
+#define UNASSIGNED_CPU_ID 0xFFFFFFFF
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+    float64  fd;   /* ieee double precision */
+    float32  fs[2];/* ieee single precision */
+    uint64_t d;    /* binary double fixed-point */
+    uint32_t w[2]; /* binary single fixed-point */
+};
+
+/*
+ * define FP_ENDIAN_IDX to access the same location
+ * in the fpr_t union regardless of the host endianness
+ */
+#if defined(HOST_WORDS_BIGENDIAN)
+#  define FP_ENDIAN_IDX 1
+#else
+#  define FP_ENDIAN_IDX 0
+#endif
+
+typedef struct CPULoongArchFPUContext CPULoongArchFPUContext;
+struct CPULoongArchFPUContext {
+    /* Floating point registers */
+    fpr_t fpr[32];
+    float_status fp_status;
+
+    bool cf[8];
+    /*
+     * fcsr0
+     * 31:29 |28:24 |23:21 |20:16 |15:10 |9:8 |7  |6  |5 |4:0
+     *        Cause         Flags         RM   DAE TM     Enables
+     */
+    uint32_t fcsr0;
+    uint32_t fcsr0_mask;
+    uint32_t vcsr16;
+
+#define FCSR0_M1    0xdf         /* FCSR1 mask, DAE, TM and Enables */
+#define FCSR0_M2    0x1f1f0000   /* FCSR2 mask, Cause and Flags */
+#define FCSR0_M3    0x300        /* FCSR3 mask, Round Mode */
+#define FCSR0_RM    8            /* Round Mode bit num on fcsr0 */
+#define GET_FP_CAUSE(reg)        (((reg) >> 24) & 0x1f)
+#define GET_FP_ENABLE(reg)       (((reg) >>  0) & 0x1f)
+#define GET_FP_FLAGS(reg)        (((reg) >> 16) & 0x1f)
+#define SET_FP_CAUSE(reg, v)      do { (reg) = ((reg) & ~(0x1f << 24)) | \
+                                               (((v) & 0x1f) << 24);     \
+                                     } while (0)
+#define SET_FP_ENABLE(reg, v)     do { (reg) = ((reg) & ~(0x1f <<  0)) | \
+                                               (((v) & 0x1f) << 0);      \
+                                     } while (0)
+#define SET_FP_FLAGS(reg, v)      do { (reg) = ((reg) & ~(0x1f << 16)) | \
+                                               (((v) & 0x1f) << 16);     \
+                                     } while (0)
+#define UPDATE_FP_FLAGS(reg, v) do { (reg) |= (((v) & 0x1f) << 16); } while (0)
+#define FP_INEXACT        1
+#define FP_UNDERFLOW      2
+#define FP_OVERFLOW       4
+#define FP_DIV0           8
+#define FP_INVALID        16
+};
+
+#define TARGET_INSN_START_EXTRA_WORDS 2
+#define LOONGARCH_FPU_MAX 1
+#define N_IRQS      14
+
+enum loongarch_feature {
+    LA_FEATURE_3A5000,
+};
+
+typedef struct TCState TCState;
+struct TCState {
+    target_ulong gpr[32];
+    target_ulong PC;
+};
+
+typedef struct CPULoongArchState CPULoongArchState;
+struct CPULoongArchState {
+    TCState active_tc;
+    CPULoongArchFPUContext active_fpu;
+
+    uint32_t current_tc;
+    uint64_t scr[4];
+    uint32_t current_fpu;
+
+    /* LoongArch CSR register */
+    CPU_LOONGARCH_CSR
+    target_ulong lladdr; /* LL virtual address compared against SC */
+    target_ulong llval;
+
+    CPULoongArchFPUContext fpus[LOONGARCH_FPU_MAX];
+
+    /* QEMU */
+    int error_code;
+    uint32_t hflags;    /* CPU State */
+#define TLB_NOMATCH   0x1
+#define INST_INAVAIL  0x2 /* Invalid instruction word for BadInstr */
+    /* TMASK defines different execution modes */
+#define LOONGARCH_HFLAG_TMASK  0x1F5807FF
+#define LOONGARCH_HFLAG_KU     0x00003 /* kernel/supervisor/user mode mask   */
+#define LOONGARCH_HFLAG_UM     0x00003 /* user mode flag                     */
+#define LOONGARCH_HFLAG_KM     0x00000 /* kernel mode flag                   */
+#define LOONGARCH_HFLAG_64     0x00008 /* 64-bit instructions enabled        */
+#define LOONGARCH_HFLAG_FPU    0x00020 /* FPU enabled                        */
+#define LOONGARCH_HFLAG_F64    0x00040 /* 64-bit FPU enabled                 */
+#define LOONGARCH_HFLAG_BMASK  0x3800
+#define LOONGARCH_HFLAG_B      0x00800 /* Unconditional branch               */
+#define LOONGARCH_HFLAG_BC     0x01000 /* Conditional branch                 */
+#define LOONGARCH_HFLAG_BR     0x02000 /* branch to register (can't link TB) */
+#define LOONGARCH_HFLAG_FRE   0x2000000 /* FRE enabled */
+#define LOONGARCH_HFLAG_ELPA  0x4000000
+    target_ulong btarget;        /* Jump / branch target               */
+    target_ulong bcond;          /* Branch condition (if needed)       */
+
+    /* Fields up to this point are cleared by a CPU reset */
+    struct {} end_reset_fields;
+
+    /* Fields after this point are preserved across CPU reset. */
+    uint64_t features;
+    void *irq[N_IRQS];
+    QEMUTimer *timer; /* Internal timer */
+    target_ulong exception_base; /* ExceptionBase input to the core */
+};
+
+/**
+ * LoongArchCPU:
+ * @env: #CPULoongArchState
+ * @clock: this CPU input clock (may be connected
+ *         to an output clock from another device).
+ *
+ * A LoongArch CPU.
+ */
+struct LoongArchCPU {
+    /*< private >*/
+    CPUState parent_obj;
+    /*< public >*/
+
+    Clock *clock;
+    CPUNegativeOffsetState neg;
+    CPULoongArchState env;
+    uint32_t id;
+    int32_t node_id; /* NUMA node this CPU belongs to */
+    int32_t core_id;
+};
+
+target_ulong exception_resume_pc(CPULoongArchState *env);
+
+static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
+                                        target_ulong *pc,
+                                        target_ulong *cs_base,
+                                        uint32_t *flags)
+{
+    *pc = env->active_tc.PC;
+    *cs_base = 0;
+    *flags = env->hflags & (LOONGARCH_HFLAG_TMASK | LOONGARCH_HFLAG_BMASK);
+}
+
+static inline LoongArchCPU *loongarch_env_get_cpu(CPULoongArchState *env)
+{
+    return container_of(env, LoongArchCPU, env);
+}
+
+#define ENV_GET_CPU(e) CPU(loongarch_env_get_cpu(e))
+
+void loongarch_cpu_list(void);
+
+#define CPU_INTERRUPT_WAKE CPU_INTERRUPT_TGT_INT_0
+
+#define cpu_signal_handler cpu_loongarch_signal_handler
+#define cpu_list loongarch_cpu_list
+
+/* MMU modes definitions */
+#define MMU_MODE0_SUFFIX _kernel
+#define MMU_MODE1_SUFFIX _super
+#define MMU_MODE2_SUFFIX _user
+#define MMU_MODE3_SUFFIX _error
+#define MMU_USER_IDX 2
+
+static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
+{
+    return MMU_USER_IDX;
+}
+
+typedef CPULoongArchState CPUArchState;
+typedef LoongArchCPU ArchCPU;
+
+#include "exec/cpu-all.h"
+
+/* Exceptions */
+enum {
+    EXCP_NONE          = -1,
+    EXCP_INTE          = 0,
+    EXCP_ADE,
+    EXCP_SYSCALL,
+    EXCP_BREAK,
+    EXCP_FPDIS,
+    EXCP_INE,
+    EXCP_TRAP,
+    EXCP_FPE,
+    EXCP_TLBM,
+    EXCP_TLBL,
+    EXCP_TLBS,
+    EXCP_TLBPE,
+    EXCP_TLBXI,
+    EXCP_TLBRI,
+
+    EXCP_LAST = EXCP_TLBRI,
+};
+
+int cpu_loongarch_signal_handler(int host_signum, void *pinfo, void *puc);
+
+#define LOONGARCH_CPU_TYPE_SUFFIX "-" TYPE_LOONGARCH_CPU
+#define LOONGARCH_CPU_TYPE_NAME(model) model LOONGARCH_CPU_TYPE_SUFFIX
+#define CPU_RESOLVING_TYPE TYPE_LOONGARCH_CPU
+
+#include "exec/memattrs.h"
+
+void loongarch_tcg_init(void);
+
+void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
+
+void QEMU_NORETURN do_raise_exception_err(CPULoongArchState *env,
+                                          uint32_t exception,
+                                          int error_code,
+                                          uintptr_t pc);
+
+static inline void QEMU_NORETURN do_raise_exception(CPULoongArchState *env,
+                                                    uint32_t exception,
+                                                    uintptr_t pc)
+{
+    do_raise_exception_err(env, exception, 0, pc);
+}
+
+static inline void compute_hflags(CPULoongArchState *env)
+{
+    env->hflags &= ~(LOONGARCH_HFLAG_64 | LOONGARCH_HFLAG_FPU |
+                     LOONGARCH_HFLAG_KU | LOONGARCH_HFLAG_ELPA);
+
+    env->hflags |= (env->CSR_CRMD & CSR_CRMD_PLV);
+    env->hflags |= LOONGARCH_HFLAG_64;
+
+    if (env->CSR_EUEN & CSR_EUEN_FPEN) {
+        env->hflags |= LOONGARCH_HFLAG_FPU;
+    }
+}
+
+const char *loongarch_exception_name(int32_t exception);
+
+#endif /* LOONGARCH_CPU_H */
-- 
1.8.3.1




* [PATCH v2 04/22] target/loongarch: Add interrupt handling support
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (2 preceding siblings ...)
  2021-07-21  9:52 ` [PATCH v2 03/22] target/loongarch: Add core definition Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-22 22:47   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 05/22] target/loongarch: Add memory management support Song Gao
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch introduces the functions loongarch_cpu_do_interrupt()
and loongarch_cpu_exec_interrupt().

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
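Illustration only, not part of the patch: a sketch of the decision that
loongarch_cpu_exec_interrupt() makes from the two helpers added to
cpu.h. The real code reads CSR_CRMD, CSR_ESTAT and CSR_ECFG from
CPULoongArchState and uses CSR_CRMD_IE_SHIFT, CSR_ESTAT_IPMASK and
CSR_ECFG_IPMASK from cpu-csr.h. The struct, the function names and the
numeric values of the SKETCH_* constants below are assumptions made
only so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in cpu-csr.h. */
#define SKETCH_CRMD_IE_SHIFT  2
#define SKETCH_IPMASK         0x1fffULL

/* Hypothetical stand-in for the three CSRs the helpers look at. */
struct sketch_csrs {
    uint64_t crmd;   /* current mode, holds the global interrupt enable */
    uint64_t estat;  /* pending interrupt lines */
    uint64_t ecfg;   /* per-line interrupt enables */
};

/* Mirrors cpu_loongarch_hw_interrupts_enabled(): test the IE bit. */
static inline bool sketch_irq_enabled(const struct sketch_csrs *c)
{
    assert(c != NULL);
    return (c->crmd >> SKETCH_CRMD_IE_SHIFT) & 1;
}

/* Mirrors cpu_loongarch_hw_interrupts_pending(): pending AND enabled. */
static inline bool sketch_irq_pending(const struct sketch_csrs *c)
{
    assert(c != NULL);
    uint64_t pending = c->estat & SKETCH_IPMASK;
    uint64_t enabled = c->ecfg & SKETCH_IPMASK;

    return (pending & enabled) != 0;
}

/* The condition under which exec_interrupt reports the IRQ as taken. */
static inline bool sketch_take_irq(const struct sketch_csrs *c)
{
    return sketch_irq_enabled(c) && sketch_irq_pending(c);
}
```

Note that loongarch_cpu_has_work() in patch 03 checks only the pending
condition, whereas loongarch_cpu_exec_interrupt() also requires the
global enable bit.
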
 target/loongarch/cpu.c | 23 +++++++++++++++++++++++
 target/loongarch/cpu.h | 25 +++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 4db2d0f..8eaa778 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -79,6 +79,28 @@ static void loongarch_cpu_set_pc(CPUState *cs, vaddr value)
     env->active_tc.PC = value & ~(target_ulong)1;
 }
 
+bool loongarch_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
+{
+    if (interrupt_request & CPU_INTERRUPT_HARD) {
+        LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+        CPULoongArchState *env = &cpu->env;
+
+        if (cpu_loongarch_hw_interrupts_enabled(env) &&
+            cpu_loongarch_hw_interrupts_pending(env)) {
+            cs->exception_index = EXCP_INTE;
+            env->error_code = 0;
+            loongarch_cpu_do_interrupt(cs);
+            return true;
+        }
+    }
+    return false;
+}
+
+void loongarch_cpu_do_interrupt(CPUState *cs)
+{
+    cs->exception_index = EXCP_NONE;
+}
+
 #ifdef CONFIG_TCG
 static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
                                               const TranslationBlock *tb)
@@ -246,6 +268,7 @@ static Property loongarch_cpu_properties[] = {
 static struct TCGCPUOps loongarch_tcg_ops = {
     .initialize = loongarch_tcg_init,
     .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
+    .cpu_exec_interrupt = loongarch_cpu_exec_interrupt,
 };
 #endif /* CONFIG_TCG */
 
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ab1aeb6..1db8bb5 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -231,6 +231,31 @@ int cpu_loongarch_signal_handler(int host_signum, void *pinfo, void *puc);
 
 #include "exec/memattrs.h"
 
+void loongarch_cpu_do_interrupt(CPUState *cpu);
+bool loongarch_cpu_exec_interrupt(CPUState *cpu, int int_req);
+
+static inline bool cpu_loongarch_hw_interrupts_enabled(CPULoongArchState *env)
+{
+    bool ret = 0;
+
+    ret = env->CSR_CRMD & (1 << CSR_CRMD_IE_SHIFT);
+
+    return ret;
+}
+
+static inline bool cpu_loongarch_hw_interrupts_pending(CPULoongArchState *env)
+{
+    int32_t pending;
+    int32_t status;
+    bool r;
+
+    pending = env->CSR_ESTAT & CSR_ESTAT_IPMASK;
+    status  = env->CSR_ECFG & CSR_ECFG_IPMASK;
+
+    r = (pending & status) != 0;
+    return r;
+}
+
 void loongarch_tcg_init(void);
 
 void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
-- 
1.8.3.1




* [PATCH v2 05/22] target/loongarch: Add memory management support
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (3 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 04/22] target/loongarch: Add interrupt handling support Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-22 22:48   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 06/22] target/loongarch: Add main translation routines Song Gao
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch introduces one memory-management-related function:
- loongarch_cpu_tlb_fill()
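
As a reading aid, here is a stand-alone model of how
raise_mmu_exception() picks an exception from a TLB lookup result.
Note that in this patch loongarch_cpu_tlb_fill() always passes
TLBRET_BADADDR, so only the address-error path is exercised for
linux-user. The SK_TLBRET_ values mirror the enum in tlb_helper.c; the
SK_EXCP_ numbers and the function name are made up for the sketch and
do not match the real EXCP_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* TLB lookup results, numbered as in tlb_helper.c. */
enum {
    SK_TLBRET_PE = -7,
    SK_TLBRET_XI = -6,
    SK_TLBRET_RI = -5,
    SK_TLBRET_DIRTY = -4,
    SK_TLBRET_INVALID = -3,
    SK_TLBRET_NOMATCH = -2,
    SK_TLBRET_BADADDR = -1,
    SK_TLBRET_MATCH = 0
};

/* Hypothetical exception numbers, for illustration only. */
enum {
    SK_EXCP_ADE = 1,
    SK_EXCP_TLBL,
    SK_EXCP_TLBS,
    SK_EXCP_TLBM,
    SK_EXCP_TLBXI,
    SK_EXCP_TLBRI,
    SK_EXCP_TLBPE
};

static inline int sketch_tlb_error_to_excp(int tlb_error, bool is_store)
{
    /* The sketch treats a successful match as a caller error. */
    assert(tlb_error != SK_TLBRET_MATCH);

    switch (tlb_error) {
    case SK_TLBRET_NOMATCH:   /* no TLB entry for a mapped address */
    case SK_TLBRET_INVALID:   /* entry found, but valid bit clear */
        return is_store ? SK_EXCP_TLBS : SK_EXCP_TLBL;
    case SK_TLBRET_DIRTY:
        return SK_EXCP_TLBM;
    case SK_TLBRET_XI:
        return SK_EXCP_TLBXI;
    case SK_TLBRET_RI:
        return SK_EXCP_TLBRI;
    case SK_TLBRET_PE:
        return SK_EXCP_TLBPE;
    case SK_TLBRET_BADADDR:
    default:                  /* unknown codes fall back to address error */
        return SK_EXCP_ADE;
    }
}
```

Only the NOMATCH and INVALID cases depend on the access type; every
other result maps to a single exception.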

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c        |   1 +
 target/loongarch/cpu.h        |   9 ++++
 target/loongarch/tlb_helper.c | 103 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+)
 create mode 100644 target/loongarch/tlb_helper.c

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 8eaa778..6269dd9 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -269,6 +269,7 @@ static struct TCGCPUOps loongarch_tcg_ops = {
     .initialize = loongarch_tcg_init,
     .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
     .cpu_exec_interrupt = loongarch_cpu_exec_interrupt,
+    .tlb_fill = loongarch_cpu_tlb_fill,
 };
 #endif /* CONFIG_TCG */
 
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 1db8bb5..5c06122 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -287,4 +287,13 @@ static inline void compute_hflags(CPULoongArchState *env)
 
 const char *loongarch_exception_name(int32_t exception);
 
+/* tlb_helper.c */
+bool loongarch_cpu_tlb_fill(CPUState *cs,
+                            vaddr address,
+                            int size,
+                            MMUAccessType access_type,
+                            int mmu_idx,
+                            bool probe,
+                            uintptr_t retaddr);
+
 #endif /* LOONGARCH_CPU_H */
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
new file mode 100644
index 0000000..b59a995
--- /dev/null
+++ b/target/loongarch/tlb_helper.c
@@ -0,0 +1,103 @@
+/*
+ * LoongArch tlb emulation helpers for qemu.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "cpu-csr.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "exec/log.h"
+
+enum {
+    TLBRET_PE = -7,
+    TLBRET_XI = -6,
+    TLBRET_RI = -5,
+    TLBRET_DIRTY = -4,
+    TLBRET_INVALID = -3,
+    TLBRET_NOMATCH = -2,
+    TLBRET_BADADDR = -1,
+    TLBRET_MATCH = 0
+};
+
+static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
+                                MMUAccessType access_type, int tlb_error)
+{
+    CPUState *cs = env_cpu(env);
+    int exception = 0, error_code = 0;
+
+    if (access_type == MMU_INST_FETCH) {
+        error_code |= INST_INAVAIL;
+    }
+
+    switch (tlb_error) {
+    default:
+    case TLBRET_BADADDR:
+        exception = EXCP_ADE;
+        break;
+    case TLBRET_NOMATCH:
+        /* No TLB match for a mapped address */
+        if (access_type == MMU_DATA_STORE) {
+            exception = EXCP_TLBS;
+        } else {
+            exception = EXCP_TLBL;
+        }
+        error_code |= TLB_NOMATCH;
+        break;
+    case TLBRET_INVALID:
+        /* TLB match with no valid bit */
+        if (access_type == MMU_DATA_STORE) {
+            exception = EXCP_TLBS;
+        } else {
+            exception = EXCP_TLBL;
+        }
+        break;
+    case TLBRET_DIRTY:
+        exception = EXCP_TLBM;
+        break;
+    case TLBRET_XI:
+        /* Execute-Inhibit Exception */
+        exception = EXCP_TLBXI;
+        break;
+    case TLBRET_RI:
+        /* Read-Inhibit Exception */
+        exception = EXCP_TLBRI;
+        break;
+    case TLBRET_PE:
+        /* Privileged Exception */
+        exception = EXCP_TLBPE;
+        break;
+    }
+
+    if (tlb_error == TLBRET_NOMATCH) {
+        env->CSR_TLBRBADV = address;
+        env->CSR_TLBREHI = address & (TARGET_PAGE_MASK << 1);
+        cs->exception_index = exception;
+        env->error_code = error_code;
+        return;
+    }
+
+    /* Raise exception */
+    env->CSR_BADV = address;
+    cs->exception_index = exception;
+    env->error_code = error_code;
+    env->CSR_TLBEHI = address & (TARGET_PAGE_MASK << 1);
+}
+
+bool loongarch_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
+                       MMUAccessType access_type, int mmu_idx,
+                       bool probe, uintptr_t retaddr)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    CPULoongArchState *env = &cpu->env;
+    int ret = TLBRET_BADADDR;
+
+    /* data access */
+    raise_mmu_exception(env, address, access_type, ret);
+    do_raise_exception_err(env, cs->exception_index, env->error_code, retaddr);
+}
-- 
1.8.3.1




* [PATCH v2 06/22] target/loongarch: Add main translation routines
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (4 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 05/22] target/loongarch: Add memory management support Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-22 23:50   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation Song Gao
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch adds the main translation routines and
basic helper functions for translation.
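
One of the basic helpers worth spelling out is the 32-bit FPR access:
gen_store_fpr32() and gen_store_fpr32h() use a deposit so that writing
one half of a 64-bit FP register leaves the other half untouched. The
plain-C model below shows the intended bit-level effect only. It is not
TCG code and not part of the patch; all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write the low `len` bits of `field` into `dst` starting at bit `pos`. */
static inline uint64_t sketch_deposit64(uint64_t dst, unsigned pos,
                                        unsigned len, uint64_t field)
{
    assert(len > 0 && len <= 64 && pos <= 64 - len);
    uint64_t mask = (len == 64 ? ~0ULL : ((1ULL << len) - 1)) << pos;

    return (dst & ~mask) | ((field << pos) & mask);
}

/* Model of gen_store_fpr32(): replace bits [31:0], keep bits [63:32]. */
static inline uint64_t sketch_store_fpr32(uint64_t fpr, uint32_t val)
{
    return sketch_deposit64(fpr, 0, 32, val);
}

/* Model of gen_store_fpr32h(): replace bits [63:32], keep bits [31:0]. */
static inline uint64_t sketch_store_fpr32h(uint64_t fpr, uint32_t val)
{
    return sketch_deposit64(fpr, 32, 32, val);
}

/* Models of gen_load_fpr32() and gen_load_fpr32h(). */
static inline uint32_t sketch_load_fpr32(uint64_t fpr)
{
    return (uint32_t)fpr;
}

static inline uint32_t sketch_load_fpr32h(uint64_t fpr)
{
    return (uint32_t)(fpr >> 32);
}
```

Storing 0xdeadbeef into the low half of 0x1111222233334444 gives
0x11112222deadbeef; storing it into the high half gives
0xdeadbeef33334444.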

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/helper.h    |  10 +
 target/loongarch/op_helper.c |  27 +++
 target/loongarch/translate.c | 485 +++++++++++++++++++++++++++++++++++++++++++
 target/loongarch/translate.h |  49 +++++
 4 files changed, 571 insertions(+)
 create mode 100644 target/loongarch/helper.h
 create mode 100644 target/loongarch/op_helper.c
 create mode 100644 target/loongarch/translate.c
 create mode 100644 target/loongarch/translate.h

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
new file mode 100644
index 0000000..6c7e19b
--- /dev/null
+++ b/target/loongarch/helper.h
@@ -0,0 +1,10 @@
+/*
+ * QEMU LoongArch CPU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
+DEF_HELPER_2(raise_exception, noreturn, env, i32)
diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
new file mode 100644
index 0000000..b2cbdd7
--- /dev/null
+++ b/target/loongarch/op_helper.c
@@ -0,0 +1,27 @@
+/*
+ * LoongArch emulation helpers for qemu.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "cpu.h"
+#include "qemu/host-utils.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+
+/* Exceptions helpers */
+void helper_raise_exception_err(CPULoongArchState *env, uint32_t exception,
+                                int error_code)
+{
+    do_raise_exception_err(env, exception, error_code, 0);
+}
+
+void helper_raise_exception(CPULoongArchState *env, uint32_t exception)
+{
+    do_raise_exception(env, exception, GETPC());
+}
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
new file mode 100644
index 0000000..531f7e1
--- /dev/null
+++ b/target/loongarch/translate.c
@@ -0,0 +1,485 @@
+/*
+ * LoongArch emulation for QEMU - main translation routines.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "tcg/tcg-op.h"
+#include "exec/translator.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "semihosting/semihost.h"
+
+#include "exec/translator.h"
+#include "exec/log.h"
+#include "qemu/qemu-print.h"
+#include "fpu_helper.h"
+#include "translate.h"
+
+/* global register indices */
+TCGv cpu_gpr[32], cpu_PC;
+TCGv btarget, bcond;
+static TCGv cpu_lladdr, cpu_llval;
+static TCGv_i32 hflags;
+TCGv_i32 fpu_fcsr0;
+TCGv_i64 fpu_f64[32];
+
+#include "exec/gen-icount.h"
+
+#define DISAS_STOP       DISAS_TARGET_0
+#define DISAS_EXIT       DISAS_TARGET_1
+
+static const char * const regnames[] = {
+    "r0", "ra", "tp", "sp", "a0", "a1", "a2", "a3",
+    "a4", "a5", "a6", "a7", "t0", "t1", "t2", "t3",
+    "t4", "t5", "t6", "t7", "t8", "x0", "fp", "s0",
+    "s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8",
+};
+
+static const char * const fregnames[] = {
+    "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",
+    "f8",  "f9",  "f10", "f11", "f12", "f13", "f14", "f15",
+    "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",
+    "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
+};
+
+/* General purpose registers moves. */
+void gen_load_gpr(TCGv t, int reg)
+{
+    if (reg == 0) {
+        tcg_gen_movi_tl(t, 0);
+    } else {
+        tcg_gen_mov_tl(t, cpu_gpr[reg]);
+    }
+}
+
+static inline void gen_save_pc(target_ulong pc)
+{
+    tcg_gen_movi_tl(cpu_PC, pc);
+}
+
+static inline void save_cpu_state(DisasContext *ctx, int do_save_pc)
+{
+    if (do_save_pc && ctx->base.pc_next != ctx->saved_pc) {
+        gen_save_pc(ctx->base.pc_next);
+        ctx->saved_pc = ctx->base.pc_next;
+    }
+    if (ctx->hflags != ctx->saved_hflags) {
+        tcg_gen_movi_i32(hflags, ctx->hflags);
+        ctx->saved_hflags = ctx->hflags;
+        switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+        case LOONGARCH_HFLAG_BR:
+            break;
+        case LOONGARCH_HFLAG_BC:
+        case LOONGARCH_HFLAG_B:
+            tcg_gen_movi_tl(btarget, ctx->btarget);
+            break;
+        }
+    }
+}
+
+static inline void restore_cpu_state(CPULoongArchState *env, DisasContext *ctx)
+{
+    ctx->saved_hflags = ctx->hflags;
+    switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+    case LOONGARCH_HFLAG_BR:
+        break;
+    case LOONGARCH_HFLAG_BC:
+    case LOONGARCH_HFLAG_B:
+        ctx->btarget = env->btarget;
+        break;
+    }
+}
+
+void generate_exception_err(DisasContext *ctx, int excp, int err)
+{
+    TCGv_i32 texcp = tcg_const_i32(excp);
+    TCGv_i32 terr = tcg_const_i32(err);
+    save_cpu_state(ctx, 1);
+    gen_helper_raise_exception_err(cpu_env, texcp, terr);
+    tcg_temp_free_i32(terr);
+    tcg_temp_free_i32(texcp);
+    ctx->base.is_jmp = DISAS_NORETURN;
+}
+
+void generate_exception_end(DisasContext *ctx, int excp)
+{
+    generate_exception_err(ctx, excp, 0);
+}
+
+void gen_reserved_instruction(DisasContext *ctx)
+{
+    generate_exception_end(ctx, EXCP_INE);
+}
+
+void gen_load_fpr32(TCGv_i32 t, int reg)
+{
+    tcg_gen_extrl_i64_i32(t, fpu_f64[reg]);
+}
+
+void gen_store_fpr32(TCGv_i32 t, int reg)
+{
+    TCGv_i64 t64 = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(t64, t);
+    tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 0, 32);
+    tcg_temp_free_i64(t64);
+}
+
+static void gen_load_fpr32h(TCGv_i32 t, int reg)
+{
+    tcg_gen_extrh_i64_i32(t, fpu_f64[reg]);
+}
+
+static void gen_store_fpr32h(TCGv_i32 t, int reg)
+{
+    TCGv_i64 t64 = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(t64, t);
+    tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
+    tcg_temp_free_i64(t64);
+}
+
+void gen_load_fpr64(TCGv_i64 t, int reg)
+{
+    tcg_gen_mov_i64(t, fpu_f64[reg]);
+}
+
+void gen_store_fpr64(TCGv_i64 t, int reg)
+{
+    tcg_gen_mov_i64(fpu_f64[reg], t);
+}
+
+void gen_op_addr_add(TCGv ret, TCGv arg0, TCGv arg1)
+{
+    tcg_gen_add_tl(ret, arg0, arg1);
+}
+
+void check_fpu_enabled(DisasContext *ctx)
+{
+    /* Nop */
+}
+
+/*
+ * This code generates a "reserved instruction" exception if 64-bit
+ * instructions are not enabled.
+ */
+void check_loongarch_64(DisasContext *ctx)
+{
+    if (unlikely(!(ctx->hflags & LOONGARCH_HFLAG_64))) {
+        gen_reserved_instruction(ctx);
+    }
+}
+
+void gen_base_offset_addr(TCGv addr, int base, int offset)
+{
+    if (base == 0) {
+        tcg_gen_movi_tl(addr, offset);
+    } else if (offset == 0) {
+        gen_load_gpr(addr, base);
+    } else {
+        tcg_gen_movi_tl(addr, offset);
+        gen_op_addr_add(addr, cpu_gpr[base], addr);
+    }
+}
+
+
+static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
+{
+    return true;
+}
+
+static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
+{
+    if (use_goto_tb(ctx, dest)) {
+        tcg_gen_goto_tb(n);
+        gen_save_pc(dest);
+        tcg_gen_exit_tb(ctx->base.tb, n);
+    } else {
+        gen_save_pc(dest);
+        tcg_gen_lookup_and_goto_ptr();
+    }
+}
+
+static inline void clear_branch_hflags(DisasContext *ctx)
+{
+    ctx->hflags &= ~LOONGARCH_HFLAG_BMASK;
+    if (ctx->base.is_jmp == DISAS_NEXT) {
+        save_cpu_state(ctx, 0);
+    } else {
+        /*
+         * It is not safe to save ctx->hflags as hflags may be changed
+         * in execution time.
+         */
+        tcg_gen_andi_i32(hflags, hflags, ~LOONGARCH_HFLAG_BMASK);
+    }
+}
+
+static void gen_branch(DisasContext *ctx, int insn_bytes)
+{
+    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+        int proc_hflags = ctx->hflags & LOONGARCH_HFLAG_BMASK;
+        /* Branches completion */
+        clear_branch_hflags(ctx);
+        ctx->base.is_jmp = DISAS_NORETURN;
+        switch (proc_hflags & LOONGARCH_HFLAG_BMASK) {
+        case LOONGARCH_HFLAG_B:
+            /* unconditional branch */
+            gen_goto_tb(ctx, 0, ctx->btarget);
+            break;
+        case LOONGARCH_HFLAG_BC:
+            /* Conditional branch */
+            {
+                TCGLabel *l1 = gen_new_label();
+
+                tcg_gen_brcondi_tl(TCG_COND_NE, bcond, 0, l1);
+                gen_goto_tb(ctx, 1, ctx->base.pc_next + insn_bytes);
+                gen_set_label(l1);
+                gen_goto_tb(ctx, 0, ctx->btarget);
+            }
+            break;
+        case LOONGARCH_HFLAG_BR:
+            /* unconditional branch to register */
+            tcg_gen_mov_tl(cpu_PC, btarget);
+            tcg_gen_lookup_and_goto_ptr();
+            break;
+        default:
+            fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
+            abort();
+        }
+    }
+}
+
+static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
+                                            CPUState *cs)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+    CPULoongArchState *env = cs->env_ptr;
+
+    ctx->page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
+    ctx->saved_pc = -1;
+    ctx->btarget = 0;
+    /* Restore state from the tb context.  */
+    ctx->hflags = (uint32_t)ctx->base.tb->flags;
+    restore_cpu_state(env, ctx);
+    ctx->mem_idx = LOONGARCH_HFLAG_UM;
+    ctx->default_tcg_memop_mask = MO_UNALN;
+}
+
+static void loongarch_tr_tb_start(DisasContextBase *dcbase, CPUState *cs)
+{
+}
+
+static void loongarch_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    tcg_gen_insn_start(ctx->base.pc_next, ctx->hflags & LOONGARCH_HFLAG_BMASK,
+                       ctx->btarget);
+}
+
+static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
+                                          CPUState *cs,
+                                          const CPUBreakpoint *bp)
+{
+    return true;
+}
+
+static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
+{
+    CPULoongArchState *env = cs->env_ptr;
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+    int insn_bytes = 4;
+
+    ctx->opcode = cpu_ldl_code(env, ctx->base.pc_next);
+
+    if (!decode(ctx, ctx->opcode)) {
+        fprintf(stderr, "Error: unknown opcode. 0x%lx: 0x%x\n",
+                ctx->base.pc_next, ctx->opcode);
+        generate_exception_end(ctx, EXCP_INE);
+    }
+
+    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+        gen_branch(ctx, insn_bytes);
+    }
+    ctx->base.pc_next += insn_bytes;
+}
+
+static void loongarch_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    switch (ctx->base.is_jmp) {
+    case DISAS_STOP:
+        gen_save_pc(ctx->base.pc_next);
+        tcg_gen_lookup_and_goto_ptr();
+        break;
+    case DISAS_NEXT:
+    case DISAS_TOO_MANY:
+        save_cpu_state(ctx, 0);
+        gen_goto_tb(ctx, 0, ctx->base.pc_next);
+        break;
+    case DISAS_EXIT:
+        tcg_gen_exit_tb(NULL, 0);
+        break;
+    case DISAS_NORETURN:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void loongarch_tr_disas_log(const DisasContextBase *dcbase, CPUState *cs)
+{
+    qemu_log("IN: %s\n", lookup_symbol(dcbase->pc_first));
+    log_target_disas(cs, dcbase->pc_first, dcbase->tb->size);
+}
+
+static const TranslatorOps loongarch_tr_ops = {
+    .init_disas_context = loongarch_tr_init_disas_context,
+    .tb_start           = loongarch_tr_tb_start,
+    .insn_start         = loongarch_tr_insn_start,
+    .breakpoint_check   = loongarch_tr_breakpoint_check,
+    .translate_insn     = loongarch_tr_translate_insn,
+    .tb_stop            = loongarch_tr_tb_stop,
+    .disas_log          = loongarch_tr_disas_log,
+};
+
+void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
+{
+    DisasContext ctx;
+
+    translator_loop(&loongarch_tr_ops, &ctx.base, cs, tb, max_insns);
+}
+
+static void fpu_dump_state(CPULoongArchState *env, FILE * f, int flags)
+{
+    int i;
+    int is_fpu64 = 1;
+
+#define printfpr(fp)                                              \
+    do {                                                          \
+        if (is_fpu64)                                             \
+            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
+                        " fd:%13g fs:%13g psu: %13g\n",           \
+                        (fp)->w[FP_ENDIAN_IDX], (fp)->d,          \
+                        (double)(fp)->fd,                         \
+                        (double)(fp)->fs[FP_ENDIAN_IDX],          \
+                        (double)(fp)->fs[!FP_ENDIAN_IDX]);        \
+        else {                                                    \
+            fpr_t tmp;                                            \
+            tmp.w[FP_ENDIAN_IDX] = (fp)->w[FP_ENDIAN_IDX];        \
+            tmp.w[!FP_ENDIAN_IDX] = ((fp) + 1)->w[FP_ENDIAN_IDX]; \
+            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
+                        " fd:%13g fs:%13g psu:%13g\n",            \
+                        tmp.w[FP_ENDIAN_IDX], tmp.d,              \
+                        (double)tmp.fd,                           \
+                        (double)tmp.fs[FP_ENDIAN_IDX],            \
+                        (double)tmp.fs[!FP_ENDIAN_IDX]);          \
+        }                                                         \
+    } while (0)
+
+
+    qemu_fprintf(f,
+                 "FCSR0 0x%08x  SR.FR %d  fp_status 0x%02x\n",
+                 env->active_fpu.fcsr0, is_fpu64,
+                 get_float_exception_flags(&env->active_fpu.fp_status));
+    for (i = 0; i < 32; (is_fpu64) ? i++ : (i += 2)) {
+        qemu_fprintf(f, "%3s: ", fregnames[i]);
+        printfpr(&env->active_fpu.fpr[i]);
+    }
+
+#undef printfpr
+}
+
+void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+    CPULoongArchState *env = &cpu->env;
+    int i;
+
+    qemu_fprintf(f, "pc=0x" TARGET_FMT_lx " ds %04x "
+                 TARGET_FMT_lx " " TARGET_FMT_ld "\n",
+                 env->active_tc.PC, env->hflags, env->btarget, env->bcond);
+    for (i = 0; i < 32; i++) {
+        if ((i & 3) == 0) {
+            qemu_fprintf(f, "GPR%02d:", i);
+        }
+        qemu_fprintf(f, " %s " TARGET_FMT_lx,
+                     regnames[i], env->active_tc.gpr[i]);
+        if ((i & 3) == 3) {
+            qemu_fprintf(f, "\n");
+        }
+    }
+
+    qemu_fprintf(f, "EUEN            0x%lx\n", env->CSR_EUEN);
+    qemu_fprintf(f, "ESTAT           0x%lx\n", env->CSR_ESTAT);
+    qemu_fprintf(f, "ERA             0x%lx\n", env->CSR_ERA);
+    qemu_fprintf(f, "CRMD            0x%lx\n", env->CSR_CRMD);
+    qemu_fprintf(f, "PRMD            0x%lx\n", env->CSR_PRMD);
+    qemu_fprintf(f, "BadVAddr        0x%lx\n", env->CSR_BADV);
+    qemu_fprintf(f, "TLB refill ERA  0x%lx\n", env->CSR_TLBRERA);
+    qemu_fprintf(f, "TLB refill BadV 0x%lx\n", env->CSR_TLBRBADV);
+    qemu_fprintf(f, "EEPN            0x%lx\n", env->CSR_EEPN);
+    qemu_fprintf(f, "BadInstr        0x%lx\n", env->CSR_BADI);
+    qemu_fprintf(f, "PRCFG1    0x%lx\nPRCFG2     0x%lx\nPRCFG3     0x%lx\n",
+                 env->CSR_PRCFG1, env->CSR_PRCFG2, env->CSR_PRCFG3);
+    if ((flags & CPU_DUMP_FPU) && (env->hflags & LOONGARCH_HFLAG_FPU)) {
+        fpu_dump_state(env, f, flags);
+    }
+}
+
+void loongarch_tcg_init(void)
+{
+    int i;
+
+    for (i = 0; i < 32; i++)
+        cpu_gpr[i] = tcg_global_mem_new(cpu_env,
+                                        offsetof(CPULoongArchState,
+                                                 active_tc.gpr[i]),
+                                        regnames[i]);
+
+    for (i = 0; i < 32; i++) {
+        int off = offsetof(CPULoongArchState, active_fpu.fpr[i].d);
+        fpu_f64[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
+    }
+
+    cpu_PC = tcg_global_mem_new(cpu_env,
+                                offsetof(CPULoongArchState,
+                                         active_tc.PC), "PC");
+    bcond = tcg_global_mem_new(cpu_env,
+                               offsetof(CPULoongArchState, bcond), "bcond");
+    btarget = tcg_global_mem_new(cpu_env,
+                                 offsetof(CPULoongArchState, btarget),
+                                 "btarget");
+    hflags = tcg_global_mem_new_i32(cpu_env,
+                                    offsetof(CPULoongArchState, hflags),
+                                    "hflags");
+    fpu_fcsr0 = tcg_global_mem_new_i32(cpu_env,
+                                   offsetof(CPULoongArchState,
+                                            active_fpu.fcsr0), "fcsr0");
+    cpu_lladdr = tcg_global_mem_new(cpu_env,
+                                    offsetof(CPULoongArchState, lladdr),
+                                    "lladdr");
+    cpu_llval = tcg_global_mem_new(cpu_env,
+                                   offsetof(CPULoongArchState, llval),
+                                   "llval");
+}
+
+void restore_state_to_opc(CPULoongArchState *env, TranslationBlock *tb,
+                          target_ulong *data)
+{
+    env->active_tc.PC = data[0];
+    env->hflags &= ~LOONGARCH_HFLAG_BMASK;
+    env->hflags |= data[1];
+    switch (env->hflags & LOONGARCH_HFLAG_BMASK) {
+    case LOONGARCH_HFLAG_BR:
+        break;
+    case LOONGARCH_HFLAG_BC:
+    case LOONGARCH_HFLAG_B:
+        env->btarget = data[2];
+        break;
+    }
+}
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
new file mode 100644
index 0000000..333c3bf
--- /dev/null
+++ b/target/loongarch/translate.h
@@ -0,0 +1,49 @@
+/*
+ * LoongArch translation routines.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef TARGET_LOONGARCH_TRANSLATE_H
+#define TARGET_LOONGARCH_TRANSLATE_H
+
+#include "exec/translator.h"
+
+#define LOONGARCH_DEBUG_DISAS 0
+
+typedef struct DisasContext {
+    DisasContextBase base;
+    target_ulong saved_pc;
+    target_ulong page_start;
+    uint32_t opcode;
+    /* Routine used to access memory */
+    int mem_idx;
+    MemOp default_tcg_memop_mask;
+    uint32_t hflags, saved_hflags;
+    target_ulong btarget;
+} DisasContext;
+
+void generate_exception_err(DisasContext *ctx, int excp, int err);
+void generate_exception_end(DisasContext *ctx, int excp);
+void gen_reserved_instruction(DisasContext *ctx);
+
+void check_insn(DisasContext *ctx, uint64_t flags);
+void check_loongarch_64(DisasContext *ctx);
+void check_fpu_enabled(DisasContext *ctx);
+
+void gen_base_offset_addr(TCGv addr, int base, int offset);
+void gen_load_gpr(TCGv t, int reg);
+void gen_load_fpr32(TCGv_i32 t, int reg);
+void gen_load_fpr64(TCGv_i64 t, int reg);
+void gen_store_fpr32(TCGv_i32 t, int reg);
+void gen_store_fpr64(TCGv_i64 t, int reg);
+void gen_op_addr_add(TCGv ret, TCGv arg0, TCGv arg1);
+
+extern TCGv cpu_gpr[32], cpu_PC;
+extern TCGv_i32 fpu_fcsr0;
+extern TCGv_i64 fpu_f64[32];
+extern TCGv bcond;
+
+#endif
-- 
1.8.3.1




* [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (5 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 06/22] target/loongarch: Add main translation routines Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-21 17:38   ` Philippe Mathieu-Daudé
  2021-07-23  0:46   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 08/22] target/loongarch: Add fixed point shift " Song Gao
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements fixed point arithmetic instruction translation.

This includes:
- ADD.{W/D}, SUB.{W/D}
- ADDI.{W/D}, ADDU16I.D
- ALSL.{W[U]/D}
- LU12I.W, LU32I.D, LU52I.D
- SLT[U], SLT[U]I
- PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
- AND, OR, NOR, XOR, ANDN, ORN
- MUL.{W/D}, MULH.{W[U]/D[U]}
- MULW.D.W[U]
- DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
- ANDI, ORI, XORI
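
To make the field layout in insns.decode easier to check by hand, the
sketch below extracts the operand fields the same way the decode file
describes them (%rd 0:5, %rj 5:5, %rk 10:5, %si12 10:s12) and matches
the fixed bits of add_w and addi_w. It is a hand-written model and not
the code that decodetree generates; the sketch_ names and SKETCH_
macros are hypothetical, and the mask/value pairs were derived from the
patterns in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed bits taken from the add_w and addi_w patterns in insns.decode. */
#define SKETCH_ADD_W_MASK   0xffff8000u  /* top 17 bits are fixed */
#define SKETCH_ADD_W_VALUE  0x00100000u
#define SKETCH_ADDI_W_MASK  0xffc00000u  /* top 10 bits are fixed */
#define SKETCH_ADDI_W_VALUE 0x02800000u

/* Unsigned field of `len` bits starting at bit `pos`. */
static inline uint32_t sketch_extract32(uint32_t insn, unsigned pos,
                                        unsigned len)
{
    assert(len > 0 && len < 32 && pos + len <= 32);
    return (insn >> pos) & ((1u << len) - 1);
}

/* Signed (two's complement) field of `len` bits starting at bit `pos`. */
static inline int32_t sketch_sextract32(uint32_t insn, unsigned pos,
                                        unsigned len)
{
    uint32_t v = sketch_extract32(insn, pos, len);
    uint32_t sign = 1u << (len - 1);

    return (int32_t)(v ^ sign) - (int32_t)sign;
}

static inline uint32_t sketch_rd(uint32_t insn)
{
    return sketch_extract32(insn, 0, 5);
}

static inline uint32_t sketch_rj(uint32_t insn)
{
    return sketch_extract32(insn, 5, 5);
}

static inline uint32_t sketch_rk(uint32_t insn)
{
    return sketch_extract32(insn, 10, 5);
}

static inline int32_t sketch_si12(uint32_t insn)
{
    return sketch_sextract32(insn, 10, 12);
}
```

For example, the word 0x001018a4 matches the add_w pattern with rd=4,
rj=5, rk=6, which is "add.w a0, a1, a2" under the register names in
translate.c. The word 0x02bffca4 matches addi_w with rd=4, rj=5 and
si12=-1.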

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insns.decode |   89 ++++
 target/loongarch/trans.inc.c  | 1090 +++++++++++++++++++++++++++++++++++++++++
 target/loongarch/translate.c  |   12 +
 target/loongarch/translate.h  |    1 +
 4 files changed, 1192 insertions(+)
 create mode 100644 target/loongarch/insns.decode
 create mode 100644 target/loongarch/trans.inc.c

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
new file mode 100644
index 0000000..1e0b755
--- /dev/null
+++ b/target/loongarch/insns.decode
@@ -0,0 +1,89 @@
+#
+# LoongArch instruction decode definitions.
+#
+# Copyright (c) 2021 Loongson Technology Corporation Limited
+#
+# SPDX-License-Identifier: LGPL-2.1+
+#
+
+#
+# Fields
+#
+%rd      0:5
+%rj      5:5
+%rk      10:5
+%sa2     15:2
+%si12    10:s12
+%ui12    10:12
+%si16    10:s16
+%si20    5:s20
+
+#
+# Argument sets
+#
+&fmt_rdrjrk         rd rj rk
+&fmt_rdrjsi12       rd rj si12
+&fmt_rdrjrksa2      rd rj rk sa2
+&fmt_rdrjsi16       rd rj si16
+&fmt_rdrjui12       rd rj ui12
+&fmt_rdsi20         rd si20
+
+#
+# Formats
+#
+@fmt_rdrjrk          .... ........ ..... ..... ..... .....    &fmt_rdrjrk         %rd %rj %rk
+@fmt_rdrjsi12        .... ...... ............ ..... .....     &fmt_rdrjsi12       %rd %rj %si12
+@fmt_rdrjui12        .... ...... ............ ..... .....     &fmt_rdrjui12       %rd %rj %ui12
+@fmt_rdrjrksa2       .... ........ ... .. ..... ..... .....   &fmt_rdrjrksa2      %rd %rj %rk %sa2
+@fmt_rdrjsi16        .... .. ................ ..... .....     &fmt_rdrjsi16       %rd %rj %si16
+@fmt_rdsi20          .... ... .................... .....      &fmt_rdsi20         %rd %si20
+
+#
+# Fixed point arithmetic operation instruction
+#
+add_w            0000 00000001 00000 ..... ..... .....    @fmt_rdrjrk
+add_d            0000 00000001 00001 ..... ..... .....    @fmt_rdrjrk
+sub_w            0000 00000001 00010 ..... ..... .....    @fmt_rdrjrk
+sub_d            0000 00000001 00011 ..... ..... .....    @fmt_rdrjrk
+slt              0000 00000001 00100 ..... ..... .....    @fmt_rdrjrk
+sltu             0000 00000001 00101 ..... ..... .....    @fmt_rdrjrk
+slti             0000 001000 ............ ..... .....     @fmt_rdrjsi12
+sltui            0000 001001 ............ ..... .....     @fmt_rdrjsi12
+nor              0000 00000001 01000 ..... ..... .....    @fmt_rdrjrk
+and              0000 00000001 01001 ..... ..... .....    @fmt_rdrjrk
+or               0000 00000001 01010 ..... ..... .....    @fmt_rdrjrk
+xor              0000 00000001 01011 ..... ..... .....    @fmt_rdrjrk
+orn              0000 00000001 01100 ..... ..... .....    @fmt_rdrjrk
+andn             0000 00000001 01101 ..... ..... .....    @fmt_rdrjrk
+mul_w            0000 00000001 11000 ..... ..... .....    @fmt_rdrjrk
+mulh_w           0000 00000001 11001 ..... ..... .....    @fmt_rdrjrk
+mulh_wu          0000 00000001 11010 ..... ..... .....    @fmt_rdrjrk
+mul_d            0000 00000001 11011 ..... ..... .....    @fmt_rdrjrk
+mulh_d           0000 00000001 11100 ..... ..... .....    @fmt_rdrjrk
+mulh_du          0000 00000001 11101 ..... ..... .....    @fmt_rdrjrk
+mulw_d_w         0000 00000001 11110 ..... ..... .....    @fmt_rdrjrk
+mulw_d_wu        0000 00000001 11111 ..... ..... .....    @fmt_rdrjrk
+div_w            0000 00000010 00000 ..... ..... .....    @fmt_rdrjrk
+mod_w            0000 00000010 00001 ..... ..... .....    @fmt_rdrjrk
+div_wu           0000 00000010 00010 ..... ..... .....    @fmt_rdrjrk
+mod_wu           0000 00000010 00011 ..... ..... .....    @fmt_rdrjrk
+div_d            0000 00000010 00100 ..... ..... .....    @fmt_rdrjrk
+mod_d            0000 00000010 00101 ..... ..... .....    @fmt_rdrjrk
+div_du           0000 00000010 00110 ..... ..... .....    @fmt_rdrjrk
+mod_du           0000 00000010 00111 ..... ..... .....    @fmt_rdrjrk
+alsl_w           0000 00000000 010 .. ..... ..... .....   @fmt_rdrjrksa2
+alsl_wu          0000 00000000 011 .. ..... ..... .....   @fmt_rdrjrksa2
+alsl_d           0000 00000010 110 .. ..... ..... .....   @fmt_rdrjrksa2
+lu12i_w          0001 010 .................... .....      @fmt_rdsi20
+lu32i_d          0001 011 .................... .....      @fmt_rdsi20
+lu52i_d          0000 001100 ............ ..... .....     @fmt_rdrjsi12
+pcaddi           0001 100 .................... .....      @fmt_rdsi20
+pcalau12i        0001 101 .................... .....      @fmt_rdsi20
+pcaddu12i        0001 110 .................... .....      @fmt_rdsi20
+pcaddu18i        0001 111 .................... .....      @fmt_rdsi20
+addi_w           0000 001010 ............ ..... .....     @fmt_rdrjsi12
+addi_d           0000 001011 ............ ..... .....     @fmt_rdrjsi12
+addu16i_d        0001 00 ................ ..... .....     @fmt_rdrjsi16
+andi             0000 001101 ............ ..... .....     @fmt_rdrjui12
+ori              0000 001110 ............ ..... .....     @fmt_rdrjui12
+xori             0000 001111 ............ ..... .....     @fmt_rdrjui12
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
new file mode 100644
index 0000000..8faef62
--- /dev/null
+++ b/target/loongarch/trans.inc.c
@@ -0,0 +1,1090 @@
+/*
+ * LoongArch translate functions
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+/* Fixed point arithmetic operation instruction translation */
+static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0 && a->rk != 0) {
+        tcg_gen_add_tl(Rd, Rj, Rk);
+        tcg_gen_ext32s_tl(Rd, Rd);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_ext32s_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_ext32s_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_add_d(DisasContext *ctx, arg_add_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    if (a->rj != 0 && a->rk != 0) {
+        tcg_gen_add_tl(Rd, Rj, Rk);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_mov_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_mov_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_sub_w(DisasContext *ctx, arg_sub_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0 && a->rk != 0) {
+        tcg_gen_sub_tl(Rd, Rj, Rk);
+        tcg_gen_ext32s_tl(Rd, Rd);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_neg_tl(Rd, Rk);
+        tcg_gen_ext32s_tl(Rd, Rd);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_ext32s_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_sub_d(DisasContext *ctx, arg_sub_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    if (a->rj != 0 && a->rk != 0) {
+        tcg_gen_sub_tl(Rd, Rj, Rk);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_neg_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_mov_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_slt(DisasContext *ctx, arg_slt *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+
+    tcg_gen_setcond_tl(TCG_COND_LT, Rd, t0, t1);
+
+    return true;
+}
+
+static bool trans_sltu(DisasContext *ctx, arg_sltu *a)
+{
+
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+
+    tcg_gen_setcond_tl(TCG_COND_LTU, Rd, t0, t1);
+
+    return true;
+}
+
+static bool trans_slti(DisasContext *ctx, arg_slti *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    target_ulong uimm = (target_long)(a->si12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_setcondi_tl(TCG_COND_LT, Rd, t0, uimm);
+
+    return true;
+}
+
+static bool trans_sltui(DisasContext *ctx, arg_sltui *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    target_ulong uimm = (target_long)(a->si12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_setcondi_tl(TCG_COND_LTU, Rd, t0, uimm);
+
+    return true;
+}
+
+static bool trans_nor(DisasContext *ctx, arg_nor *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0 && a->rk != 0) {
+        tcg_gen_nor_tl(Rd, Rj, Rk);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_not_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_not_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, ~((target_ulong)0));
+    }
+
+    return true;
+}
+
+static bool trans_and(DisasContext *ctx, arg_and *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (likely(a->rj != 0 && a->rk != 0)) {
+        tcg_gen_and_tl(Rd, Rj, Rk);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_or(DisasContext *ctx, arg_or *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (likely(a->rj != 0 && a->rk != 0)) {
+        tcg_gen_or_tl(Rd, Rj, Rk);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_mov_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_mov_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_xor(DisasContext *ctx, arg_xor *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (likely(a->rj != 0 && a->rk != 0)) {
+        tcg_gen_xor_tl(Rd, Rj, Rk);
+    } else if (a->rj == 0 && a->rk != 0) {
+        tcg_gen_mov_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+        tcg_gen_mov_tl(Rd, Rj);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_orn(DisasContext *ctx, arg_orn *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    TCGv t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rk);
+
+    tcg_gen_not_tl(t0, t0);
+    tcg_gen_or_tl(Rd, Rj, t0);
+
+    tcg_temp_free(t0);
+    return true;
+}
+
+static bool trans_andn(DisasContext *ctx, arg_andn *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    TCGv t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rk);
+
+    tcg_gen_not_tl(t0, t0);
+    tcg_gen_and_tl(Rd, Rj, t0);
+
+    tcg_temp_free(t0);
+    return true;
+}
+
+static bool trans_mul_w(DisasContext *ctx, arg_mul_w *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new_i32();
+    t3 = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t2, t0);
+    tcg_gen_trunc_tl_i32(t3, t1);
+    tcg_gen_mul_i32(t2, t2, t3);
+    tcg_gen_ext_i32_tl(Rd, t2);
+
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+
+    return true;
+}
+
+static bool trans_mulh_w(DisasContext *ctx, arg_mulh_w *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new_i32();
+    t3 = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t2, t0);
+    tcg_gen_trunc_tl_i32(t3, t1);
+    tcg_gen_muls2_i32(t2, t3, t2, t3);
+    tcg_gen_ext_i32_tl(Rd, t3);
+
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+
+    return true;
+}
+
+static bool trans_mulh_wu(DisasContext *ctx, arg_mulh_wu *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new_i32();
+    t3 = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t2, t0);
+    tcg_gen_trunc_tl_i32(t3, t1);
+    tcg_gen_mulu2_i32(t2, t3, t2, t3);
+    tcg_gen_ext_i32_tl(Rd, t3);
+
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+
+    return true;
+}
+
+static bool trans_mul_d(DisasContext *ctx, arg_mul_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+
+    check_loongarch_64(ctx);
+    tcg_gen_mul_i64(Rd, t0, t1);
+
+    return true;
+}
+
+static bool trans_mulh_d(DisasContext *ctx, arg_mulh_d *a)
+{
+    TCGv t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new();
+
+    check_loongarch_64(ctx);
+    tcg_gen_muls2_i64(t2, Rd, t0, t1);
+
+    tcg_temp_free(t2);
+
+    return true;
+}
+
+static bool trans_mulh_du(DisasContext *ctx, arg_mulh_du *a)
+{
+    TCGv t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new();
+
+    check_loongarch_64(ctx);
+    tcg_gen_mulu2_i64(t2, Rd, t0, t1);
+
+    tcg_temp_free(t2);
+
+    return true;
+}
+
+static bool trans_mulw_d_w(DisasContext *ctx, arg_mulw_d_w *a)
+{
+    TCGv_i64 t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32s_i64(t0, t0);
+    tcg_gen_ext32s_i64(t1, t1);
+    tcg_gen_mul_i64(t2, t0, t1);
+    tcg_gen_mov_tl(Rd, t2);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+
+    return true;
+}
+
+static bool trans_mulw_d_wu(DisasContext *ctx, arg_mulw_d_wu *a)
+{
+    TCGv_i64 t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32u_i64(t0, t0);
+    tcg_gen_ext32u_i64(t1, t1);
+    tcg_gen_mul_i64(t2, t0, t1);
+    tcg_gen_mov_tl(Rd, t2);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+
+    return true;
+}
+
+static bool trans_div_w(DisasContext *ctx, arg_div_w *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    t2 = tcg_temp_new();
+    t3 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32s_tl(t0, t0);
+    tcg_gen_ext32s_tl(t1, t1);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, INT_MIN);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1);
+    tcg_gen_and_tl(t2, t2, t3);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+    tcg_gen_or_tl(t2, t2, t3);
+    tcg_gen_movi_tl(t3, 0);
+    tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t3, t2, t1);
+    tcg_gen_div_tl(Rd, t0, t1);
+    tcg_gen_ext32s_tl(Rd, Rd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_mod_w(DisasContext *ctx, arg_mod_w *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    t2 = tcg_temp_new();
+    t3 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32s_tl(t0, t0);
+    tcg_gen_ext32s_tl(t1, t1);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, INT_MIN);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1);
+    tcg_gen_and_tl(t2, t2, t3);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+    tcg_gen_or_tl(t2, t2, t3);
+    tcg_gen_movi_tl(t3, 0);
+    tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t3, t2, t1);
+    tcg_gen_rem_tl(Rd, t0, t1);
+    tcg_gen_ext32s_tl(Rd, Rd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_div_wu(DisasContext *ctx, arg_div_wu *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    t2 = tcg_const_tl(0);
+    t3 = tcg_const_tl(1);
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32u_tl(t0, t0);
+    tcg_gen_ext32u_tl(t1, t1);
+    tcg_gen_movcond_tl(TCG_COND_EQ, t1, t1, t2, t3, t1);
+    tcg_gen_divu_tl(Rd, t0, t1);
+    tcg_gen_ext32s_tl(Rd, Rd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_mod_wu(DisasContext *ctx, arg_mod_wu *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    t2 = tcg_const_tl(0);
+    t3 = tcg_const_tl(1);
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rk);
+
+    tcg_gen_ext32u_tl(t0, t0);
+    tcg_gen_ext32u_tl(t1, t1);
+    tcg_gen_movcond_tl(TCG_COND_EQ, t1, t1, t2, t3, t1);
+    tcg_gen_remu_tl(Rd, t0, t1);
+    tcg_gen_ext32s_tl(Rd, Rd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_div_d(DisasContext *ctx, arg_div_d *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new();
+    t3 = tcg_temp_new();
+
+    check_loongarch_64(ctx);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, -1LL << 63);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1LL);
+    tcg_gen_and_tl(t2, t2, t3);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+    tcg_gen_or_tl(t2, t2, t3);
+    tcg_gen_movi_tl(t3, 0);
+    tcg_gen_movcond_tl(TCG_COND_NE, t3, t2, t3, t2, t1);
+    tcg_gen_div_tl(Rd, t0, t3);
+
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_mod_d(DisasContext *ctx, arg_mod_d *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_temp_new();
+    t3 = tcg_temp_new();
+
+    check_loongarch_64(ctx);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, -1LL << 63);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1LL);
+    tcg_gen_and_tl(t2, t2, t3);
+    tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+    tcg_gen_or_tl(t2, t2, t3);
+    tcg_gen_movi_tl(t3, 0);
+    tcg_gen_movcond_tl(TCG_COND_NE, t3, t2, t3, t2, t1);
+    tcg_gen_rem_tl(Rd, t0, t3);
+
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_div_du(DisasContext *ctx, arg_div_du *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_const_tl(0);
+    t3 = tcg_const_tl(1);
+
+    check_loongarch_64(ctx);
+    tcg_gen_movcond_tl(TCG_COND_EQ, t3, t1, t2, t3, t1);
+    tcg_gen_divu_i64(Rd, t0, t3);
+
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_mod_du(DisasContext *ctx, arg_mod_du *a)
+{
+    TCGv t0, t1, t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+    t2 = tcg_const_tl(0);
+    t3 = tcg_const_tl(1);
+
+    check_loongarch_64(ctx);
+    tcg_gen_movcond_tl(TCG_COND_EQ, t3, t1, t2, t3, t1);
+    tcg_gen_remu_i64(Rd, t0, t3);
+
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+
+    return true;
+}
+
+static bool trans_alsl_w(DisasContext *ctx, arg_alsl_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rk);
+
+    gen_load_gpr(t0, a->rj);
+
+    tcg_gen_shli_tl(t0, t0, a->sa2 + 1);
+    tcg_gen_add_tl(Rd, t0, t1);
+    tcg_gen_ext32s_tl(Rd, Rd);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_alsl_wu(DisasContext *ctx, arg_alsl_wu *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rk);
+
+    gen_load_gpr(t0, a->rj);
+
+    tcg_gen_shli_tl(t0, t0, a->sa2 + 1);
+    tcg_gen_add_tl(t0, t0, t1);
+    tcg_gen_ext32u_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_alsl_d(DisasContext *ctx, arg_alsl_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rk);
+
+    gen_load_gpr(t0, a->rj);
+
+    check_loongarch_64(ctx);
+    tcg_gen_shli_tl(t0, t0, a->sa2 + 1);
+    tcg_gen_add_tl(Rd, t0, t1);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_lu12i_w(DisasContext *ctx, arg_lu12i_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    tcg_gen_movi_tl(Rd, a->si20 << 12);
+
+    return true;
+}
+
+static bool trans_lu32i_d(DisasContext *ctx, arg_lu32i_d *a)
+{
+    TCGv_i64 t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+    tcg_gen_movi_tl(t0, a->si20);
+    tcg_gen_concat_tl_i64(t1, Rd, t0);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_lu52i_d(DisasContext *ctx, arg_lu52i_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t1, a->rj);
+
+    tcg_gen_movi_tl(t0, a->si12);
+    tcg_gen_shli_tl(t0, t0, 52);
+    tcg_gen_andi_tl(t1, t1, 0xfffffffffffffU);
+    tcg_gen_or_tl(Rd, t0, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_pcaddi(DisasContext *ctx, arg_pcaddi *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    target_ulong pc = ctx->base.pc_next;
+    target_ulong addr = pc + (a->si20 << 2);
+    tcg_gen_movi_tl(Rd, addr);
+
+    return true;
+}
+
+static bool trans_pcalau12i(DisasContext *ctx, arg_pcalau12i *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    target_ulong pc = ctx->base.pc_next;
+    target_ulong addr = (pc + (a->si20 << 12)) & ~0xfff;
+    tcg_gen_movi_tl(Rd, addr);
+
+    return true;
+}
+
+static bool trans_pcaddu12i(DisasContext *ctx, arg_pcaddu12i *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    target_ulong pc = ctx->base.pc_next;
+    target_ulong addr = pc + (a->si20 << 12);
+    tcg_gen_movi_tl(Rd, addr);
+
+    return true;
+}
+
+static bool trans_pcaddu18i(DisasContext *ctx, arg_pcaddu18i *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    target_ulong pc = ctx->base.pc_next;
+    target_ulong addr = pc + ((target_ulong)(a->si20) << 18);
+    tcg_gen_movi_tl(Rd, addr);
+
+    return true;
+}
+
+static bool trans_addi_w(DisasContext *ctx, arg_addi_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    target_ulong uimm = (target_long)(a->si12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0) {
+        tcg_gen_addi_tl(Rd, Rj, uimm);
+        tcg_gen_ext32s_tl(Rd, Rd);
+    } else {
+        tcg_gen_movi_tl(Rd, uimm);
+    }
+
+    return true;
+}
+
+static bool trans_addi_d(DisasContext *ctx, arg_addi_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    target_ulong uimm = (target_long)(a->si12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    if (a->rj != 0) {
+        tcg_gen_addi_tl(Rd, Rj, uimm);
+    } else {
+        tcg_gen_movi_tl(Rd, uimm);
+    }
+
+    return true;
+}
+
+static bool trans_addu16i_d(DisasContext *ctx, arg_addu16i_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0) {
+        tcg_gen_addi_tl(Rd, Rj, a->si16 << 16);
+    } else {
+        tcg_gen_movi_tl(Rd, a->si16 << 16);
+    }
+    return true;
+}
+
+static bool trans_andi(DisasContext *ctx, arg_andi *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    target_ulong uimm = (uint16_t)(a->ui12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (likely(a->rj != 0)) {
+        tcg_gen_andi_tl(Rd, Rj, uimm);
+    } else {
+        tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}
+
+static bool trans_ori(DisasContext *ctx, arg_ori *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    target_ulong uimm = (uint16_t)(a->ui12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->rj != 0) {
+        tcg_gen_ori_tl(Rd, Rj, uimm);
+    } else {
+        tcg_gen_movi_tl(Rd, uimm);
+    }
+
+    return true;
+}
+
+static bool trans_xori(DisasContext *ctx, arg_xori *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    target_ulong uimm = (uint16_t)(a->ui12);
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (likely(a->rj != 0)) {
+        tcg_gen_xori_tl(Rd, Rj, uimm);
+    } else {
+        tcg_gen_movi_tl(Rd, uimm);
+    }
+
+    return true;
+}
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 531f7e1..b60bdc2 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -57,6 +57,15 @@ void gen_load_gpr(TCGv t, int reg)
     }
 }
 
+TCGv get_gpr(int regno)
+{
+    if (regno == 0) {
+        return tcg_constant_tl(0);
+    } else {
+        return cpu_gpr[regno];
+    }
+}
+
 static inline void gen_save_pc(target_ulong pc)
 {
     tcg_gen_movi_tl(cpu_PC, pc);
@@ -287,6 +296,9 @@ static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
     return true;
 }
 
+#include "decode-insns.c.inc"
+#include "trans.inc.c"
+
 static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 {
     CPULoongArchState *env = cs->env_ptr;
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 333c3bf..ef4d4e7 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -35,6 +35,7 @@ void check_fpu_enabled(DisasContext *ctx);
 
 void gen_base_offset_addr(TCGv addr, int base, int offset);
 void gen_load_gpr(TCGv t, int reg);
+TCGv get_gpr(int regno);
 void gen_load_fpr32(TCGv_i32 t, int reg);
 void gen_load_fpr64(TCGv_i64 t, int reg);
 void gen_store_fpr32(TCGv_i32 t, int reg);
-- 
1.8.3.1




* [PATCH v2 08/22] target/loongarch: Add fixed point shift instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (6 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  0:51   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 09/22] target/loongarch: Add fixed point bit " Song Gao
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements fixed point shift instruction translation.

This includes:
- SLL.W, SRL.W, SRA.W, ROTR.W
- SLLI.W, SRLI.W, SRAI.W, ROTRI.W
- SLL.D, SRL.D, SRA.D, ROTR.D
- SLLI.D, SRLI.D, SRAI.D, ROTRI.D

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
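For reviewers, here is a minimal reference-model sketch in plain C (not TCG)
of the behaviour the 32-bit shift forms are assumed to have: the .W
instructions operate on rj[31:0], take the shift amount from rk[4:0], and
write back the 32-bit result sign-extended to 64 bits, while the .D forms use
rk[5:0] on the full register. The ref_* names are hypothetical and are not
part of this series; the sketch is only meant as something to compare
generated code against.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 32 bits of v to 64 bits. */
static inline uint64_t ref_sext32(uint64_t v)
{
    return ((v & 0xffffffffULL) ^ 0x80000000ULL) - 0x80000000ULL;
}

/* SLL.W: shift rj[31:0] left by rk[4:0], sign-extend the 32-bit result. */
static inline uint64_t ref_sll_w(uint64_t rj, uint64_t rk)
{
    return ref_sext32((uint32_t)rj << (rk & 0x1f));
}

/* SRL.W: logical right shift of rj[31:0] by rk[4:0]. */
static inline uint64_t ref_srl_w(uint64_t rj, uint64_t rk)
{
    return ref_sext32((uint32_t)rj >> (rk & 0x1f));
}

/* SRA.W: arithmetic right shift of rj[31:0], without signed '>>'. */
static inline uint64_t ref_sra_w(uint64_t rj, uint64_t rk)
{
    uint32_t v = (uint32_t)rj;
    unsigned sh = rk & 0x1f;
    uint32_t r = v >> sh;

    if ((v & 0x80000000u) && sh) {
        r |= ~(0xffffffffu >> sh);   /* replicate the sign bit */
    }
    return ref_sext32(r);
}

/* ROTR.W: rotate rj[31:0] right by rk[4:0]. */
static inline uint64_t ref_rotr_w(uint64_t rj, uint64_t rk)
{
    uint32_t v = (uint32_t)rj;
    unsigned sh = rk & 0x1f;
    uint32_t r = sh ? (v >> sh) | (v << (32 - sh)) : v;

    return ref_sext32(r);
}

/* ROTR.D: rotate the full 64-bit register right by rk[5:0]. */
static inline uint64_t ref_rotr_d(uint64_t rj, uint64_t rk)
{
    unsigned sh = rk & 0x3f;

    return sh ? (rj >> sh) | (rj << (64 - sh)) : rj;
}

/* A few spot checks of the model itself. */
static inline void ref_shift_selfcheck(void)
{
    assert(ref_sll_w(1, 31) == 0xffffffff80000000ULL);
    assert(ref_sra_w(0x80000000ULL, 4) == 0xfffffffff8000000ULL);
    assert(ref_rotr_w(0x12345678ULL, 8) == 0x78123456ULL);
}
```

Note that the model deliberately ignores bits 63:32 of rj for the .W forms,
so an input with non-canonical upper bits still yields a canonical result.
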
 target/loongarch/insns.decode |  26 +++
 target/loongarch/trans.inc.c  | 363 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 389 insertions(+)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 1e0b755..9302576 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -17,6 +17,8 @@
 %ui12    10:12
 %si16    10:s16
 %si20    5:s20
+%ui5     10:5
+%ui6     10:6
 
 #
 # Argument sets
@@ -27,6 +29,8 @@
 &fmt_rdrjsi16       rd rj si16
 &fmt_rdrjui12       rd rj ui12
 &fmt_rdsi20         rd si20
+&fmt_rdrjui5        rd rj ui5
+&fmt_rdrjui6        rd rj ui6
 
 #
 # Formats
@@ -37,6 +41,8 @@
 @fmt_rdrjrksa2       .... ........ ... .. ..... ..... .....   &fmt_rdrjrksa2      %rd %rj %rk %sa2
 @fmt_rdrjsi16        .... .. ................ ..... .....     &fmt_rdrjsi16       %rd %rj %si16
 @fmt_rdsi20          .... ... .................... .....      &fmt_rdsi20         %rd %si20
+@fmt_rdrjui5         .... ........ ..... ..... ..... .....    &fmt_rdrjui5        %rd %rj %ui5
+@fmt_rdrjui6         .... ........ .... ...... ..... .....    &fmt_rdrjui6        %rd %rj %ui6
 
 #
 # Fixed point arithmetic operation instruction
@@ -87,3 +93,23 @@ addu16i_d        0001 00 ................ ..... .....     @fmt_rdrjsi16
 andi             0000 001101 ............ ..... .....     @fmt_rdrjui12
 ori              0000 001110 ............ ..... .....     @fmt_rdrjui12
 xori             0000 001111 ............ ..... .....     @fmt_rdrjui12
+
+#
+# Fixed point shift operation instruction
+#
+sll_w            0000 00000001 01110 ..... ..... .....    @fmt_rdrjrk
+srl_w            0000 00000001 01111 ..... ..... .....    @fmt_rdrjrk
+sra_w            0000 00000001 10000 ..... ..... .....    @fmt_rdrjrk
+sll_d            0000 00000001 10001 ..... ..... .....    @fmt_rdrjrk
+srl_d            0000 00000001 10010 ..... ..... .....    @fmt_rdrjrk
+sra_d            0000 00000001 10011 ..... ..... .....    @fmt_rdrjrk
+rotr_w           0000 00000001 10110 ..... ..... .....    @fmt_rdrjrk
+rotr_d           0000 00000001 10111 ..... ..... .....    @fmt_rdrjrk
+slli_w           0000 00000100 00001 ..... ..... .....    @fmt_rdrjui5
+slli_d           0000 00000100 0001 ...... ..... .....    @fmt_rdrjui6
+srli_w           0000 00000100 01001 ..... ..... .....    @fmt_rdrjui5
+srli_d           0000 00000100 0101 ...... ..... .....    @fmt_rdrjui6
+srai_w           0000 00000100 10001 ..... ..... .....    @fmt_rdrjui5
+srai_d           0000 00000100 1001 ...... ..... .....    @fmt_rdrjui6
+rotri_w          0000 00000100 11001 ..... ..... .....    @fmt_rdrjui5
+rotri_d          0000 00000100 1101 ...... ..... .....    @fmt_rdrjui6
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 8faef62..62e9396 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -1088,3 +1088,366 @@ static bool trans_xori(DisasContext *ctx, arg_xori *a)
 
     return true;
 }
+
+/* Fixed point shift operation instruction translation */
+static bool trans_sll_w(DisasContext *ctx, arg_sll_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    tcg_gen_andi_tl(t0, t0, 0x1f);
+    tcg_gen_shl_tl(t0, t1, t0);
+    tcg_gen_ext32s_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_srl_w(DisasContext *ctx, arg_srl_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rk);
+    gen_load_gpr(t1, a->rj);
+
+    tcg_gen_ext32u_tl(t1, t1);
+    tcg_gen_andi_tl(t0, t0, 0x1f);
+    tcg_gen_shr_tl(t0, t1, t0);
+    tcg_gen_ext32s_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_sra_w(DisasContext *ctx, arg_sra_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    tcg_gen_andi_tl(t0, t0, 0x1f);
+    tcg_gen_sar_tl(Rd, t1, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_sll_d(DisasContext *ctx, arg_sll_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    check_loongarch_64(ctx);
+    tcg_gen_andi_tl(t0, t0, 0x3f);
+    tcg_gen_shl_tl(Rd, t1, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+static bool trans_srl_d(DisasContext *ctx, arg_srl_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    check_loongarch_64(ctx);
+    tcg_gen_andi_tl(t0, t0, 0x3f);
+    tcg_gen_shr_tl(Rd, t1, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_sra_d(DisasContext *ctx, arg_sra_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    check_loongarch_64(ctx);
+    tcg_gen_andi_tl(t0, t0, 0x3f);
+    tcg_gen_sar_tl(Rd, t1, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_rotr_w(DisasContext *ctx, arg_rotr_w *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 t2, t3;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+    t2 = tcg_temp_new_i32();
+    t3 = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t2, t0);
+    tcg_gen_trunc_tl_i32(t3, t1);
+    tcg_gen_andi_i32(t2, t2, 0x1f);
+    tcg_gen_rotr_i32(t2, t3, t2);
+    tcg_gen_ext_i32_tl(Rd, t2);
+
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+
+    return true;
+}
+
+static bool trans_rotr_d(DisasContext *ctx, arg_rotr_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rk);
+
+    check_loongarch_64(ctx);
+    tcg_gen_andi_tl(t0, t0, 0x3f);
+    tcg_gen_rotr_tl(Rd, t1, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_slli_w(DisasContext *ctx, arg_slli_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    TCGv t0 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    tcg_gen_shli_tl(t0, t0, a->ui5);
+    tcg_gen_ext32s_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_slli_d(DisasContext *ctx, arg_slli_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    TCGv t0 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    tcg_gen_shli_tl(Rd, t0, a->ui6);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_srli_w(DisasContext *ctx, arg_srli_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    target_ulong uimm = ((uint16_t)(a->ui5)) & 0x1f;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+
+    if (uimm != 0) {
+        tcg_gen_ext32u_tl(t0, t0);
+        tcg_gen_shri_tl(Rd, t0, uimm);
+    } else {
+        tcg_gen_ext32s_tl(Rd, t0);
+    }
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_srli_d(DisasContext *ctx, arg_srli_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_shri_tl(Rd, t0, a->ui6);
+
+    return true;
+}
+
+static bool trans_srai_w(DisasContext *ctx, arg_srai_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    target_ulong uimm = ((uint16_t)(a->ui5)) & 0x1f;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_sari_tl(Rd, t0, uimm);
+
+    return true;
+}
+
+static bool trans_srai_d(DisasContext *ctx, arg_srai_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    check_loongarch_64(ctx);
+    tcg_gen_sari_tl(Rd, t0, a->ui6);
+
+    return true;
+}
+
+static bool trans_rotri_w(DisasContext *ctx, arg_rotri_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    target_ulong uimm = ((uint16_t)(a->ui5)) & 0x1f;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    if (uimm != 0) {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+
+        tcg_gen_trunc_tl_i32(t1, t0);
+        tcg_gen_rotri_i32(t1, t1, uimm);
+        tcg_gen_ext_i32_tl(Rd, t1);
+
+        tcg_temp_free_i32(t1);
+    } else {
+        tcg_gen_ext32s_tl(Rd, t0);
+    }
+
+    return true;
+}
+
+static bool trans_rotri_d(DisasContext *ctx, arg_rotri_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    check_loongarch_64(ctx);
+    tcg_gen_rotri_tl(Rd, t0, a->ui6);
+
+    return true;
+}
-- 
1.8.3.1
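
For cross-checking the *.W shift translations above, a small host-side
reference model in plain C may be useful. This is only a sketch, under the
assumption that on a 64-bit target the *.W shifts use the low 32 bits of
rj, mask the shift amount with 0x1f (as the andi in the patch does), and
write back the sign-extended 32-bit result. The ref_* names and sext32()
are made up for illustration and are not part of this series.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on
 * implementation-defined signed conversions. */
static inline uint64_t sext32(uint32_t v)
{
    return (uint64_t)v | ((v & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0);
}

/* SLL.W: shift the low 32 bits left, sign-extend the 32-bit result. */
static inline uint64_t ref_sll_w(uint64_t rj, uint64_t rk)
{
    return sext32((uint32_t)rj << (rk & 0x1f));
}

/* SRL.W: logical right shift of the low 32 bits. */
static inline uint64_t ref_srl_w(uint64_t rj, uint64_t rk)
{
    return sext32((uint32_t)rj >> (rk & 0x1f));
}

/* SRA.W: arithmetic right shift of the low 32 bits, written with
 * unsigned arithmetic so the result is fully defined in C. */
static inline uint64_t ref_sra_w(uint64_t rj, uint64_t rk)
{
    uint32_t v = (uint32_t)rj;
    unsigned s = rk & 0x1f;
    uint32_t r = v >> s;

    if ((v & 0x80000000u) && s != 0) {
        r |= ~(UINT32_MAX >> s);    /* replicate the sign bit */
    }
    return sext32(r);
}

/* ROTR.W: rotate the low 32 bits right. */
static inline uint64_t ref_rotr_w(uint64_t rj, uint64_t rk)
{
    uint32_t v = (uint32_t)rj;
    unsigned s = rk & 0x1f;

    return sext32(s ? (v >> s) | (v << (32 - s)) : v);
}
```

For instance, ref_sra_w(0x80000000, 31) gives all ones even though the
upper half of the input is zero, which is the case where it matters
whether the translation sign-extends its input first.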




* [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (7 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 08/22] target/loongarch: Add fixed point shift " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-21 17:46   ` Philippe Mathieu-Daudé
  2021-07-23  1:29   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 10/22] target/loongarch: Add fixed point load/store " Song Gao
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements fixed-point bit instruction translation.

This includes:
- EXT.W.{B/H}
- CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
- BYTEPICK.{W/D}
- REVB.{2H/4H/2W/D}
- REVH.{2W/D}
- BITREV.{4B/8B}, BITREV.{W/D}
- BSTRINS.{W/D}, BSTRPICK.{W/D}
- MASKEQZ, MASKNEZ

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
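As a note for reviewers, the intended semantics of the bit-string
instructions can be sketched in plain C. This is a hedged reference model
rather than the implementation: it assumes BSTRPICK extracts rj[msb:lsb]
zero-extended, BSTRINS replaces rd[msb:lsb] with rj[msb-lsb:0], and the .W
forms sign-extend the 32-bit result. The ref_* names, low_mask() and
sext32() are hypothetical helpers introduced only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'len' bits set, for 1 <= len <= 64. */
static inline uint64_t low_mask(unsigned len)
{
    return len >= 64 ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
}

/* Sign-extend a 32-bit value to 64 bits. */
static inline uint64_t sext32(uint32_t v)
{
    return (uint64_t)v | ((v & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0);
}

/* BSTRPICK.D: zero-extended rj[msb:lsb]. */
static inline uint64_t ref_bstrpick_d(uint64_t rj, unsigned msb, unsigned lsb)
{
    assert(lsb <= msb && msb < 64);
    return (rj >> lsb) & low_mask(msb - lsb + 1);
}

/* BSTRINS.D: rd with bits [msb:lsb] replaced by rj[msb-lsb:0]. */
static inline uint64_t ref_bstrins_d(uint64_t rd, uint64_t rj,
                                     unsigned msb, unsigned lsb)
{
    assert(lsb <= msb && msb < 64);
    uint64_t field = low_mask(msb - lsb + 1) << lsb;
    return (rd & ~field) | ((rj << lsb) & field);
}

/* BSTRPICK.W: same extraction, result sign-extended from 32 bits. */
static inline uint64_t ref_bstrpick_w(uint64_t rj, unsigned msb, unsigned lsb)
{
    assert(lsb <= msb && msb < 32);
    return sext32((uint32_t)ref_bstrpick_d(rj, msb, lsb));
}

/* BSTRINS.W: same insertion, result sign-extended from 32 bits. */
static inline uint64_t ref_bstrins_w(uint64_t rd, uint64_t rj,
                                     unsigned msb, unsigned lsb)
{
    assert(lsb <= msb && msb < 32);
    return sext32((uint32_t)ref_bstrins_d(rd, rj, msb, lsb));
}
```

The field width is msb - lsb + 1 in all four cases, which corresponds to
the length argument passed to tcg_gen_extract_tl()/tcg_gen_deposit_tl().
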
 target/loongarch/helper.h     |  10 +
 target/loongarch/insns.decode |  45 +++
 target/loongarch/op_helper.c  | 119 ++++++++
 target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 839 insertions(+)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6c7e19b..bbbcc26 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -8,3 +8,13 @@
 
 DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
 DEF_HELPER_2(raise_exception, noreturn, env, i32)
+
+DEF_HELPER_2(cto_w, tl, env, tl)
+DEF_HELPER_2(ctz_w, tl, env, tl)
+DEF_HELPER_2(cto_d, tl, env, tl)
+DEF_HELPER_2(ctz_d, tl, env, tl)
+DEF_HELPER_2(bitrev_w, tl, env, tl)
+DEF_HELPER_2(bitrev_d, tl, env, tl)
+
+DEF_HELPER_FLAGS_1(loongarch_bitswap, TCG_CALL_NO_RWG_SE, tl, tl)
+DEF_HELPER_FLAGS_1(loongarch_dbitswap, TCG_CALL_NO_RWG_SE, tl, tl)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9302576..ec599a9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -13,12 +13,17 @@
 %rj      5:5
 %rk      10:5
 %sa2     15:2
+%sa3     15:3
 %si12    10:s12
 %ui12    10:12
 %si16    10:s16
 %si20    5:s20
 %ui5     10:5
 %ui6     10:6
+%msbw    16:5
+%lsbw    10:5
+%msbd    16:6
+%lsbd    10:6
 
 #
 # Argument sets
@@ -31,6 +36,10 @@
 &fmt_rdsi20         rd si20
 &fmt_rdrjui5        rd rj ui5
 &fmt_rdrjui6        rd rj ui6
+&fmt_rdrj           rd rj
+&fmt_rdrjrksa3      rd rj rk sa3
+&fmt_rdrjmsbwlsbw   rd rj msbw lsbw
+&fmt_rdrjmsbdlsbd   rd rj msbd lsbd
 
 #
 # Formats
@@ -43,6 +52,10 @@
 @fmt_rdsi20          .... ... .................... .....      &fmt_rdsi20         %rd %si20
 @fmt_rdrjui5         .... ........ ..... ..... ..... .....    &fmt_rdrjui5        %rd %rj %ui5
 @fmt_rdrjui6         .... ........ .... ...... ..... .....    &fmt_rdrjui6        %rd %rj %ui6
+@fmt_rdrj            .... ........ ..... ..... ..... .....    &fmt_rdrj           %rd %rj
+@fmt_rdrjmsbwlsbw    .... ....... ..... . ..... ..... .....   &fmt_rdrjmsbwlsbw   %rd %rj %msbw %lsbw
+@fmt_rdrjmsbdlsbd    .... ...... ...... ...... ..... .....    &fmt_rdrjmsbdlsbd   %rd %rj %msbd %lsbd
+@fmt_rdrjrksa3       .... ........ .. ... ..... ..... .....   &fmt_rdrjrksa3      %rd %rj %rk %sa3
 
 #
 # Fixed point arithmetic operation instruction
@@ -113,3 +126,35 @@ srai_w           0000 00000100 10001 ..... ..... .....    @fmt_rdrjui5
 srai_d           0000 00000100 1001 ...... ..... .....    @fmt_rdrjui6
 rotri_w          0000 00000100 11001 ..... ..... .....    @fmt_rdrjui5
 rotri_d          0000 00000100 1101 ...... ..... .....    @fmt_rdrjui6
+
+#
+# Fixed point bit operation instruction
+#
+ext_w_h          0000 00000000 00000 10110 ..... .....    @fmt_rdrj
+ext_w_b          0000 00000000 00000 10111 ..... .....    @fmt_rdrj
+clo_w            0000 00000000 00000 00100 ..... .....    @fmt_rdrj
+clz_w            0000 00000000 00000 00101 ..... .....    @fmt_rdrj
+cto_w            0000 00000000 00000 00110 ..... .....    @fmt_rdrj
+ctz_w            0000 00000000 00000 00111 ..... .....    @fmt_rdrj
+clo_d            0000 00000000 00000 01000 ..... .....    @fmt_rdrj
+clz_d            0000 00000000 00000 01001 ..... .....    @fmt_rdrj
+cto_d            0000 00000000 00000 01010 ..... .....    @fmt_rdrj
+ctz_d            0000 00000000 00000 01011 ..... .....    @fmt_rdrj
+revb_2h          0000 00000000 00000 01100 ..... .....    @fmt_rdrj
+revb_4h          0000 00000000 00000 01101 ..... .....    @fmt_rdrj
+revb_2w          0000 00000000 00000 01110 ..... .....    @fmt_rdrj
+revb_d           0000 00000000 00000 01111 ..... .....    @fmt_rdrj
+revh_2w          0000 00000000 00000 10000 ..... .....    @fmt_rdrj
+revh_d           0000 00000000 00000 10001 ..... .....    @fmt_rdrj
+bitrev_4b        0000 00000000 00000 10010 ..... .....    @fmt_rdrj
+bitrev_8b        0000 00000000 00000 10011 ..... .....    @fmt_rdrj
+bitrev_w         0000 00000000 00000 10100 ..... .....    @fmt_rdrj
+bitrev_d         0000 00000000 00000 10101 ..... .....    @fmt_rdrj
+bytepick_w       0000 00000000 100 .. ..... ..... .....   @fmt_rdrjrksa2
+bytepick_d       0000 00000000 11 ... ..... ..... .....   @fmt_rdrjrksa3
+maskeqz          0000 00000001 00110 ..... ..... .....    @fmt_rdrjrk
+masknez          0000 00000001 00111 ..... ..... .....    @fmt_rdrjrk
+bstrins_w        0000 0000011 ..... 0 ..... ..... .....   @fmt_rdrjmsbwlsbw
+bstrpick_w       0000 0000011 ..... 1 ..... ..... .....   @fmt_rdrjmsbwlsbw
+bstrins_d        0000 000010 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
+bstrpick_d       0000 000011 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index b2cbdd7..07c3d52 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -25,3 +25,122 @@ void helper_raise_exception(CPULoongArchState *env, uint32_t exception)
 {
     do_raise_exception(env, exception, GETPC());
 }
+
+target_ulong helper_cto_w(CPULoongArchState *env, target_ulong rj)
+{
+    uint32_t v = (uint32_t)rj;
+    int temp = 0;
+
+    while ((v & 0x1) == 1) {
+        temp++;
+        v = v >> 1;
+    }
+
+    return (target_ulong)temp;
+}
+
+target_ulong helper_ctz_w(CPULoongArchState *env, target_ulong rj)
+{
+    uint32_t v = (uint32_t)rj;
+
+    if (v == 0) {
+        return 32;
+    }
+
+    int temp = 0;
+    while ((v & 0x1) == 0) {
+        temp++;
+        v = v >> 1;
+    }
+
+    return (target_ulong)temp;
+}
+
+target_ulong helper_cto_d(CPULoongArchState *env, target_ulong rj)
+{
+    uint64_t v = rj;
+    int temp = 0;
+
+    while ((v & 0x1) == 1) {
+        temp++;
+        v = v >> 1;
+    }
+
+    return (target_ulong)temp;
+}
+
+target_ulong helper_ctz_d(CPULoongArchState *env, target_ulong rj)
+{
+    uint64_t v = rj;
+
+    if (v == 0) {
+        return 64;
+    }
+
+    int temp = 0;
+    while ((v & 0x1) == 0) {
+        temp++;
+        v = v >> 1;
+    }
+
+    return (target_ulong)temp;
+}
+
+target_ulong helper_bitrev_w(CPULoongArchState *env, target_ulong rj)
+{
+    uint32_t v = (uint32_t)rj;
+    const int SIZE = 32;
+    uint8_t bytes[SIZE];
+
+    int i;
+    for (i = 0; i < SIZE; i++) {
+        bytes[i] = v & 0x1;
+        v = v >> 1;
+    }
+    /* v == 0 */
+    for (i = 0; i < SIZE; i++) {
+        v = v | ((uint32_t)bytes[i] << (SIZE - 1 - i));
+    }
+
+    return (target_ulong)(int32_t)v;
+}
+
+target_ulong helper_bitrev_d(CPULoongArchState *env, target_ulong rj)
+{
+    uint64_t v = rj;
+    const int SIZE = 64;
+    uint8_t bytes[SIZE];
+
+    int i;
+    for (i = 0; i < SIZE; i++) {
+        bytes[i] = v & 0x1;
+        v = v >> 1;
+    }
+    /* v == 0 */
+    for (i = 0; i < SIZE; i++) {
+        v = v | ((uint64_t)bytes[i] << (SIZE - 1 - i));
+    }
+
+    return (target_ulong)v;
+}
+
+static inline target_ulong bitswap(target_ulong v)
+{
+    v = ((v >> 1) & (target_ulong)0x5555555555555555ULL) |
+        ((v & (target_ulong)0x5555555555555555ULL) << 1);
+    v = ((v >> 2) & (target_ulong)0x3333333333333333ULL) |
+        ((v & (target_ulong)0x3333333333333333ULL) << 2);
+    v = ((v >> 4) & (target_ulong)0x0F0F0F0F0F0F0F0FULL) |
+        ((v & (target_ulong)0x0F0F0F0F0F0F0F0FULL) << 4);
+    return v;
+}
+
+target_ulong helper_loongarch_dbitswap(target_ulong rj)
+{
+    return bitswap(rj);
+}
+
+target_ulong helper_loongarch_bitswap(target_ulong rt)
+{
+    return (int32_t)bitswap(rt);
+}
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 62e9396..8c5ba63 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -1451,3 +1451,668 @@ static bool trans_rotri_d(DisasContext *ctx, arg_rotri_d *a)
 
     return true;
 }
+
+/* Fixed point bit operation instruction translation */
+static bool trans_ext_w_h(DisasContext *ctx, arg_ext_w_h *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_ext16s_tl(Rd, t0);
+
+    return true;
+}
+
+static bool trans_ext_w_b(DisasContext *ctx, arg_ext_w_b *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rj);
+
+    tcg_gen_ext8s_tl(Rd, t0);
+
+    return true;
+}
+
+static bool trans_clo_w(DisasContext *ctx, arg_clo_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    gen_load_gpr(Rd, a->rj);
+
+    tcg_gen_not_tl(Rd, Rd);
+    tcg_gen_ext32u_tl(Rd, Rd);
+    tcg_gen_clzi_tl(Rd, Rd, TARGET_LONG_BITS);
+    tcg_gen_subi_tl(Rd, Rd, TARGET_LONG_BITS - 32);
+
+    return true;
+}
+
+static bool trans_clz_w(DisasContext *ctx, arg_clz_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    gen_load_gpr(Rd, a->rj);
+
+    tcg_gen_ext32u_tl(Rd, Rd);
+    tcg_gen_clzi_tl(Rd, Rd, TARGET_LONG_BITS);
+    tcg_gen_subi_tl(Rd, Rd, TARGET_LONG_BITS - 32);
+
+    return true;
+}
+
+static bool trans_cto_w(DisasContext *ctx, arg_cto_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_cto_w(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ctz_w(DisasContext *ctx, arg_ctz_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_ctz_w(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+static bool trans_clo_d(DisasContext *ctx, arg_clo_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    gen_load_gpr(Rd, a->rj);
+    tcg_gen_not_tl(Rd, Rd);
+    tcg_gen_clzi_i64(Rd, Rd, 64);
+
+    return true;
+}
+
+static bool trans_clz_d(DisasContext *ctx, arg_clz_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    gen_load_gpr(Rd, a->rj);
+    tcg_gen_clzi_i64(Rd, Rd, 64);
+
+    return true;
+}
+
+static bool trans_cto_d(DisasContext *ctx, arg_cto_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_cto_d(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ctz_d(DisasContext *ctx, arg_ctz_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_ctz_d(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_revb_2h(DisasContext *ctx, arg_revb_2h *a)
+{
+    TCGv t0, t1, mask;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    mask = tcg_const_tl(0x00FF00FF);
+
+    gen_load_gpr(t0, a->rj);
+
+    tcg_gen_shri_tl(t1, t0, 8);
+    tcg_gen_and_tl(t1, t1, mask);
+    tcg_gen_and_tl(t0, t0, mask);
+    tcg_gen_shli_tl(t0, t0, 8);
+    tcg_gen_or_tl(t0, t0, t1);
+    tcg_gen_ext32s_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(mask);
+
+    return true;
+}
+
+static bool trans_revb_4h(DisasContext *ctx, arg_revb_4h *a)
+{
+    TCGv t0, t1, mask;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    mask = tcg_const_tl(0x00FF00FF00FF00FFULL);
+
+    gen_load_gpr(t0, a->rj);
+
+    check_loongarch_64(ctx);
+    tcg_gen_shri_tl(t1, t0, 8);
+    tcg_gen_and_tl(t1, t1, mask);
+    tcg_gen_and_tl(t0, t0, mask);
+    tcg_gen_shli_tl(t0, t0, 8);
+    tcg_gen_or_tl(Rd, t0, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(mask);
+
+    return true;
+}
+
+static bool trans_revb_2w(DisasContext *ctx, arg_revb_2w *a)
+{
+    TCGv_i64 t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rd);
+
+    tcg_gen_ext32u_i64(t1, t2);
+    tcg_gen_bswap32_i64(t0, t1);
+    tcg_gen_shri_i64(t1, t2, 32);
+    tcg_gen_bswap32_i64(t1, t1);
+    tcg_gen_concat32_i64(Rd, t0, t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    return true;
+}
+
+static bool trans_revb_d(DisasContext *ctx, arg_revb_d *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    check_loongarch_64(ctx);
+    tcg_gen_bswap64_i64(Rd, Rj);
+
+    return true;
+}
+
+static bool trans_revh_2w(DisasContext *ctx, arg_revh_2w *a)
+{
+    TCGv_i64 t0, t1, t2, mask;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = get_gpr(a->rj);
+    mask = tcg_const_i64(0x0000ffff0000ffffull);
+
+    gen_load_gpr(t1, a->rd);
+
+    tcg_gen_shri_i64(t0, t2, 16);
+    tcg_gen_and_i64(t1, t2, mask);
+    tcg_gen_and_i64(t0, t0, mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(Rd, t1, t0);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(mask);
+
+    return true;
+}
+
+static bool trans_revh_d(DisasContext *ctx, arg_revh_d *a)
+{
+    TCGv t0, t1, mask;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+    mask = tcg_const_tl(0x0000FFFF0000FFFFULL);
+
+    gen_load_gpr(t0, a->rj);
+
+    check_loongarch_64(ctx);
+    tcg_gen_shri_tl(t1, t0, 16);
+    tcg_gen_and_tl(t1, t1, mask);
+    tcg_gen_and_tl(t0, t0, mask);
+    tcg_gen_shli_tl(t0, t0, 16);
+    tcg_gen_or_tl(t0, t0, t1);
+    tcg_gen_shri_tl(t1, t0, 32);
+    tcg_gen_shli_tl(t0, t0, 32);
+    tcg_gen_or_tl(Rd, t0, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(mask);
+
+    return true;
+}
+
+static bool trans_bitrev_4b(DisasContext *ctx, arg_bitrev_4b *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_loongarch_bitswap(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bitrev_8b(DisasContext *ctx, arg_bitrev_8b *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    check_loongarch_64(ctx);
+    gen_helper_loongarch_dbitswap(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bitrev_w(DisasContext *ctx, arg_bitrev_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    gen_helper_bitrev_w(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bitrev_d(DisasContext *ctx, arg_bitrev_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, a->rj);
+
+    check_loongarch_64(ctx);
+    gen_helper_bitrev_d(Rd, cpu_env, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bytepick_w(DisasContext *ctx, arg_bytepick_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (a->sa2 == 0 || ((a->sa2) * 8) == 32) {
+        if (a->sa2 == 0) {
+            t0 = get_gpr(a->rk);
+        } else {
+            t0 = get_gpr(a->rj);
+        }
+        tcg_gen_ext32s_tl(Rd, t0);
+    } else {
+        t0 = get_gpr(a->rk);
+
+        TCGv t1 = get_gpr(a->rj);
+        TCGv_i64 t2 = tcg_temp_new_i64();
+
+        tcg_gen_concat_tl_i64(t2, t1, t0);
+        tcg_gen_shri_i64(t2, t2, 32 - ((a->sa2) * 8));
+        tcg_gen_ext32s_i64(Rd, t2);
+
+        tcg_temp_free_i64(t2);
+    }
+
+    return true;
+}
+
+static bool trans_bytepick_d(DisasContext *ctx, arg_bytepick_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    check_loongarch_64(ctx);
+    if (a->sa3 == 0 || ((a->sa3) * 8) == 64) {
+        if (a->sa3 == 0) {
+            gen_load_gpr(t0, a->rk);
+        } else {
+            gen_load_gpr(t0, a->rj);
+        }
+        tcg_gen_mov_tl(Rd, t0);
+    } else {
+        TCGv t1 = tcg_temp_new();
+
+        gen_load_gpr(t0, a->rk);
+        gen_load_gpr(t1, a->rj);
+
+        tcg_gen_shli_tl(t0, t0, ((a->sa3) * 8));
+        tcg_gen_shri_tl(t1, t1, 64 - ((a->sa3) * 8));
+        tcg_gen_or_tl(Rd, t1, t0);
+
+        tcg_temp_free(t1);
+    }
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_maskeqz(DisasContext *ctx, arg_maskeqz *a)
+{
+    TCGv t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+    t2 = tcg_const_tl(0);
+
+    tcg_gen_movcond_tl(TCG_COND_NE, Rd, t0, t2, t1, t2);
+
+    tcg_temp_free(t2);
+
+    return true;
+}
+
+static bool trans_masknez(DisasContext *ctx, arg_masknez *a)
+{
+    TCGv t0, t1, t2;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+    t2 = tcg_const_tl(0);
+
+    tcg_gen_movcond_tl(TCG_COND_EQ, Rd, t0, t2, t1, t2);
+
+    tcg_temp_free(t2);
+
+    return true;
+}
+
+static bool trans_bstrins_d(DisasContext *ctx, arg_bstrins_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    int lsb = a->lsbd;
+    int msb = a->msbd;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (lsb > msb) {
+        return false;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rd);
+
+    tcg_gen_deposit_tl(t0, t0, t1, lsb, msb - lsb + 1);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bstrpick_d(DisasContext *ctx, arg_bstrpick_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    int lsb = a->lsbd;
+    int msb = a->msbd;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (lsb > msb) {
+        return false;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rd);
+
+    tcg_gen_extract_tl(t0, t1, lsb, msb - lsb + 1);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bstrins_w(DisasContext *ctx, arg_bstrins_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    int lsb = a->lsbw;
+    int msb = a->msbw;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (lsb > msb) {
+        return false;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    gen_load_gpr(t0, a->rd);
+
+    tcg_gen_deposit_tl(t0, t0, t1, lsb, msb - lsb + 1);
+    tcg_gen_ext32s_tl(t0, t0);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_bstrpick_w(DisasContext *ctx, arg_bstrpick_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    int lsb = a->lsbw;
+    int msb = a->msbw;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    if (lsb > msb) {
+        return false;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = get_gpr(a->rj);
+
+    /*
+     * Pick rj[msb:lsb] zero-extended, then sign-extend the 32-bit
+     * result, as the other *.W translations here do.
+     */
+    tcg_gen_extract_tl(t0, t1, lsb, msb - lsb + 1);
+    tcg_gen_ext32s_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
-- 
1.8.3.1
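
To sanity-check the count and bit-reverse helpers in this patch, a compact
reference in plain C could look like the following. It is a sketch under
two assumptions: counting trailing ones of v is the same as counting
trailing zeros of ~v, and a full 32-bit reversal can be built by extending
the mask-and-shift steps that bitswap() already uses with a byte swap. All
ref_* names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Count trailing zero bits; returns 64 for a zero input. */
static inline unsigned ref_ctz64(uint64_t v)
{
    unsigned n = 0;

    if (v == 0) {
        return 64;
    }
    while ((v & 1) == 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* CTZ.D / CTO.D on a 64-bit value. */
static inline unsigned ref_ctz_d(uint64_t v) { return ref_ctz64(v); }
static inline unsigned ref_cto_d(uint64_t v) { return ref_ctz64(~v); }

/* CTZ.W / CTO.W on the low 32 bits; a zero input counts as 32. */
static inline unsigned ref_ctz_w(uint32_t v) { return v ? ref_ctz64(v) : 32; }
static inline unsigned ref_cto_w(uint32_t v) { return ref_ctz_w(~v); }

/* Reverse all 32 bits: swap neighbours, pairs, nibbles, bytes, halves. */
static inline uint32_t ref_bitrev32(uint32_t v)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
    return (v >> 16) | (v << 16);
}
```

The first three steps of ref_bitrev32() reverse the bits within each byte,
which is what BITREV.4B needs; the last two steps then reverse the byte
order to give the BITREV.W result before sign-extension.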




* [PATCH v2 10/22] target/loongarch: Add fixed point load/store instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (8 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 09/22] target/loongarch: Add fixed point bit " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  1:45   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 11/22] target/loongarch: Add fixed point atomic " Song Gao
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements fixed-point load/store instruction translation.

This includes:
- LD.{B[U]/H[U]/W[U]/D}, ST.{B/H/W/D}
- LDX.{B[U]/H[U]/W[U]/D}, STX.{B/H/W/D}
- LDPTR.{W/D}, STPTR.{W/D}
- PRELD
- LD{GT/LE}.{B/H/W/D}, ST{GT/LE}.{B/H/W/D}
- DBAR, IBAR

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
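For readers less familiar with decodetree field specs such as
"%si12 10:s12" or "%si14 10:s14", the following plain C sketch shows what
a signed field extraction amounts to, and how the LD.* effective address
follows from it (base register plus sign-extended si12, as passed to
gen_base_offset_addr() in this patch). field_s() and ref_ld_addr() are
hypothetical names used only here; they are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits starting at bit 'pos' of insn and sign-extend them. */
static inline int32_t field_s(uint32_t insn, unsigned pos, unsigned len)
{
    assert(len >= 1 && len <= 32 && pos + len <= 32);

    uint32_t mask = len == 32 ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    uint32_t raw  = (insn >> pos) & mask;
    uint32_t sign = UINT32_C(1) << (len - 1);

    /* (raw ^ sign) - sign maps the field onto its two's complement value. */
    return (int32_t)((int64_t)(raw ^ sign) - (int64_t)sign);
}

/* Effective address for the si12 forms: rj + SignExtend(si12). */
static inline uint64_t ref_ld_addr(uint64_t rj, uint32_t insn)
{
    return rj + (uint64_t)(int64_t)field_s(insn, 10, 12);
}
```

With the ld_b pattern from insns.decode, an instruction word of 0x283FFC00
carries si12 = -1, so a base of 0x1000 gives an address of 0xFFF.
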
 target/loongarch/helper.h     |   3 +
 target/loongarch/insns.decode |  58 ++++
 target/loongarch/op_helper.c  |  15 +
 target/loongarch/trans.inc.c  | 758 ++++++++++++++++++++++++++++++++++++++++++
 target/loongarch/translate.c  |  29 ++
 5 files changed, 863 insertions(+)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index bbbcc26..5cd38c8 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -18,3 +18,6 @@ DEF_HELPER_2(bitrev_d, tl, env, tl)
 
 DEF_HELPER_FLAGS_1(loongarch_bitswap, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(loongarch_dbitswap, TCG_CALL_NO_RWG_SE, tl, tl)
+
+DEF_HELPER_3(asrtle_d, void, env, tl, tl)
+DEF_HELPER_3(asrtgt_d, void, env, tl, tl)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ec599a9..08fd232 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -24,6 +24,9 @@
 %lsbw    10:5
 %msbd    16:6
 %lsbd    10:6
+%si14    10:s14
+%hint    0:5
+%whint   0:15
 
 #
 # Argument sets
@@ -40,6 +43,9 @@
 &fmt_rdrjrksa3      rd rj rk sa3
 &fmt_rdrjmsbwlsbw   rd rj msbw lsbw
 &fmt_rdrjmsbdlsbd   rd rj msbd lsbd
+&fmt_rdrjsi14       rd rj si14
+&fmt_hintrjsi12     hint rj si12
+&fmt_whint          whint
 
 #
 # Formats
@@ -56,6 +62,9 @@
 @fmt_rdrjmsbwlsbw    .... ....... ..... . ..... ..... .....   &fmt_rdrjmsbwlsbw   %rd %rj %msbw %lsbw
 @fmt_rdrjmsbdlsbd    .... ...... ...... ...... ..... .....    &fmt_rdrjmsbdlsbd   %rd %rj %msbd %lsbd
 @fmt_rdrjrksa3       .... ........ .. ... ..... ..... .....   &fmt_rdrjrksa3      %rd %rj %rk %sa3
+@fmt_hintrjsi12      .... ...... ............ ..... .....     &fmt_hintrjsi12     %hint %rj %si12
+@fmt_whint           .... ........ ..... ...............      &fmt_whint          %whint
+@fmt_rdrjsi14        .... .... .............. ..... .....     &fmt_rdrjsi14       %rd %rj %si14
 
 #
 # Fixed point arithmetic operation instruction
@@ -158,3 +167,52 @@ bstrins_w        0000 0000011 ..... 0 ..... ..... .....   @fmt_rdrjmsbwlsbw
 bstrpick_w       0000 0000011 ..... 1 ..... ..... .....   @fmt_rdrjmsbwlsbw
 bstrins_d        0000 000010 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
 bstrpick_d       0000 000011 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
+
+#
+# Fixed point load/store instruction
+#
+ld_b             0010 100000 ............ ..... .....     @fmt_rdrjsi12
+ld_h             0010 100001 ............ ..... .....     @fmt_rdrjsi12
+ld_w             0010 100010 ............ ..... .....     @fmt_rdrjsi12
+ld_d             0010 100011 ............ ..... .....     @fmt_rdrjsi12
+st_b             0010 100100 ............ ..... .....     @fmt_rdrjsi12
+st_h             0010 100101 ............ ..... .....     @fmt_rdrjsi12
+st_w             0010 100110 ............ ..... .....     @fmt_rdrjsi12
+st_d             0010 100111 ............ ..... .....     @fmt_rdrjsi12
+ld_bu            0010 101000 ............ ..... .....     @fmt_rdrjsi12
+ld_hu            0010 101001 ............ ..... .....     @fmt_rdrjsi12
+ld_wu            0010 101010 ............ ..... .....     @fmt_rdrjsi12
+ldx_b            0011 10000000 00000 ..... ..... .....    @fmt_rdrjrk
+ldx_h            0011 10000000 01000 ..... ..... .....    @fmt_rdrjrk
+ldx_w            0011 10000000 10000 ..... ..... .....    @fmt_rdrjrk
+ldx_d            0011 10000000 11000 ..... ..... .....    @fmt_rdrjrk
+stx_b            0011 10000001 00000 ..... ..... .....    @fmt_rdrjrk
+stx_h            0011 10000001 01000 ..... ..... .....    @fmt_rdrjrk
+stx_w            0011 10000001 10000 ..... ..... .....    @fmt_rdrjrk
+stx_d            0011 10000001 11000 ..... ..... .....    @fmt_rdrjrk
+ldx_bu           0011 10000010 00000 ..... ..... .....    @fmt_rdrjrk
+ldx_hu           0011 10000010 01000 ..... ..... .....    @fmt_rdrjrk
+ldx_wu           0011 10000010 10000 ..... ..... .....    @fmt_rdrjrk
+preld            0010 101011 ............ ..... .....     @fmt_hintrjsi12
+dbar             0011 10000111 00100 ...............      @fmt_whint
+ibar             0011 10000111 00101 ...............      @fmt_whint
+ldptr_w          0010 0100 .............. ..... .....     @fmt_rdrjsi14
+stptr_w          0010 0101 .............. ..... .....     @fmt_rdrjsi14
+ldptr_d          0010 0110 .............. ..... .....     @fmt_rdrjsi14
+stptr_d          0010 0111 .............. ..... .....     @fmt_rdrjsi14
+ldgt_b           0011 10000111 10000 ..... ..... .....    @fmt_rdrjrk
+ldgt_h           0011 10000111 10001 ..... ..... .....    @fmt_rdrjrk
+ldgt_w           0011 10000111 10010 ..... ..... .....    @fmt_rdrjrk
+ldgt_d           0011 10000111 10011 ..... ..... .....    @fmt_rdrjrk
+ldle_b           0011 10000111 10100 ..... ..... .....    @fmt_rdrjrk
+ldle_h           0011 10000111 10101 ..... ..... .....    @fmt_rdrjrk
+ldle_w           0011 10000111 10110 ..... ..... .....    @fmt_rdrjrk
+ldle_d           0011 10000111 10111 ..... ..... .....    @fmt_rdrjrk
+stgt_b           0011 10000111 11000 ..... ..... .....    @fmt_rdrjrk
+stgt_h           0011 10000111 11001 ..... ..... .....    @fmt_rdrjrk
+stgt_w           0011 10000111 11010 ..... ..... .....    @fmt_rdrjrk
+stgt_d           0011 10000111 11011 ..... ..... .....    @fmt_rdrjrk
+stle_b           0011 10000111 11100 ..... ..... .....    @fmt_rdrjrk
+stle_h           0011 10000111 11101 ..... ..... .....    @fmt_rdrjrk
+stle_w           0011 10000111 11110 ..... ..... .....    @fmt_rdrjrk
+stle_d           0011 10000111 11111 ..... ..... .....    @fmt_rdrjrk
diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index 07c3d52..738e067 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -144,3 +144,18 @@ target_ulong helper_loongarch_bitswap(target_ulong rt)
 {
     return (int32_t)bitswap(rt);
 }
+
+/* loongarch assert op */
+void helper_asrtle_d(CPULoongArchState *env, target_ulong rj, target_ulong rk)
+{
+    if (rj > rk) {
+        do_raise_exception(env, EXCP_ADE, GETPC());
+    }
+}
+
+void helper_asrtgt_d(CPULoongArchState *env, target_ulong rj, target_ulong rk)
+{
+    if (rj <= rk) {
+        do_raise_exception(env, EXCP_ADE, GETPC());
+    }
+}
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 8c5ba63..e38001b 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -2116,3 +2116,761 @@ static bool trans_bstrpick_w(DisasContext *ctx, arg_bstrpick_w *a)
 
     return true;
 }
+
+/* Fixed point load/store instruction translation */
+static bool trans_ld_b(DisasContext *ctx, arg_ld_b *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_SB);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ld_h(DisasContext *ctx, arg_ld_h *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TESW |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ld_w(DisasContext *ctx, arg_ld_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TESL |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ld_d(DisasContext *ctx, arg_ld_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_st_b(DisasContext *ctx, arg_st_b *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_8);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_st_h(DisasContext *ctx, arg_st_h *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEUW |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_st_w(DisasContext *ctx, arg_st_w *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEUL |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_st_d(DisasContext *ctx, arg_st_d *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+static bool trans_ld_bu(DisasContext *ctx, arg_ld_bu *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_UB);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ld_hu(DisasContext *ctx, arg_ld_hu *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TEUW |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ld_wu(DisasContext *ctx, arg_ld_wu *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TEUL |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_ldx_b(DisasContext *ctx, arg_ldx_b *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_SB);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_h(DisasContext *ctx, arg_ldx_h *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_TESW |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_w(DisasContext *ctx, arg_ldx_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_TESL |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_d(DisasContext *ctx, arg_ldx_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_stx_b(DisasContext *ctx, arg_stx_b *a)
+{
+    TCGv t0, t1;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_8);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_stx_h(DisasContext *ctx, arg_stx_h *a)
+{
+    TCGv t0, t1;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEUW |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_stx_w(DisasContext *ctx, arg_stx_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEUL |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_stx_d(DisasContext *ctx, arg_stx_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_bu(DisasContext *ctx, arg_ldx_bu *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_UB);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_hu(DisasContext *ctx, arg_ldx_hu *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_TEUW |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldx_wu(DisasContext *ctx, arg_ldx_wu *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_op_addr_add(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t1, t0, mem_idx, MO_TEUL |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t1);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_preld(DisasContext *ctx, arg_preld *a)
+{
+    /* Treat as NOP. */
+    return true;
+}
+
+static bool trans_dbar(DisasContext *ctx, arg_dbar * a)
+{
+    gen_loongarch_sync(a->whint);
+    return true;
+}
+
+static bool trans_ibar(DisasContext *ctx, arg_ibar *a)
+{
+    /*
+     * IBAR is a no-op in QEMU,
+     * however we need to end the translation block
+     */
+    ctx->base.is_jmp = DISAS_STOP;
+    return true;
+}
+
+static bool trans_ldptr_w(DisasContext *ctx, arg_ldptr_w *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TESL |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_stptr_w(DisasContext *ctx, arg_stptr_w *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEUL |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_ldptr_d(DisasContext *ctx, arg_ldptr_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+    int mem_idx = ctx->mem_idx;
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    tcg_gen_qemu_ld_tl(t0, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_stptr_d(DisasContext *ctx, arg_stptr_d *a)
+{
+    TCGv t0, t1;
+    int mem_idx = ctx->mem_idx;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    gen_load_gpr(t1, a->rd);
+    tcg_gen_qemu_st_tl(t1, t0, mem_idx, MO_TEQ |
+                       ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+#define ASRTGT                                \
+    do {                                      \
+        TCGv t1 = get_gpr(a->rj);             \
+        TCGv t2 = get_gpr(a->rk);             \
+        gen_helper_asrtgt_d(cpu_env, t1, t2); \
+    } while (0)
+
+#define ASRTLE                                \
+    do {                                      \
+        TCGv t1 = get_gpr(a->rj);             \
+        TCGv t2 = get_gpr(a->rk);             \
+        gen_helper_asrtle_d(cpu_env, t1, t2); \
+    } while (0)
+
+#define DECL_ARG(name)   \
+    arg_ ## name arg = { \
+        .rd = a->rd,     \
+        .rj = a->rj,     \
+        .rk = a->rk,     \
+    };
+
+static bool trans_ldgt_b(DisasContext *ctx, arg_ldgt_b *a)
+{
+    ASRTGT;
+    DECL_ARG(ldx_b)
+    trans_ldx_b(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldgt_h(DisasContext *ctx, arg_ldgt_h *a)
+{
+    ASRTGT;
+    DECL_ARG(ldx_h)
+    trans_ldx_h(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldgt_w(DisasContext *ctx, arg_ldgt_w *a)
+{
+    ASRTGT;
+    DECL_ARG(ldx_w)
+    trans_ldx_w(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldgt_d(DisasContext *ctx, arg_ldgt_d *a)
+{
+    ASRTGT;
+    DECL_ARG(ldx_d)
+    trans_ldx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldle_b(DisasContext *ctx, arg_ldle_b *a)
+{
+    ASRTLE;
+    DECL_ARG(ldx_b)
+    trans_ldx_b(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldle_h(DisasContext *ctx, arg_ldle_h *a)
+{
+    ASRTLE;
+    DECL_ARG(ldx_h)
+    trans_ldx_h(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldle_w(DisasContext *ctx, arg_ldle_w *a)
+{
+    ASRTLE;
+    DECL_ARG(ldx_w)
+    trans_ldx_w(ctx, &arg);
+    return true;
+}
+
+static bool trans_ldle_d(DisasContext *ctx, arg_ldle_d *a)
+{
+    ASRTLE;
+    DECL_ARG(ldx_d)
+    trans_ldx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_stgt_b(DisasContext *ctx, arg_stgt_b *a)
+{
+    ASRTGT;
+    DECL_ARG(stx_b)
+    trans_stx_b(ctx, &arg);
+    return true;
+}
+
+static bool trans_stgt_h(DisasContext *ctx, arg_stgt_h *a)
+{
+    ASRTGT;
+    DECL_ARG(stx_h)
+    trans_stx_h(ctx, &arg);
+    return true;
+}
+
+static bool trans_stgt_w(DisasContext *ctx, arg_stgt_w *a)
+{
+    ASRTGT;
+    DECL_ARG(stx_w)
+    trans_stx_w(ctx, &arg);
+    return true;
+}
+
+static bool trans_stgt_d(DisasContext *ctx, arg_stgt_d *a)
+{
+    ASRTGT;
+    DECL_ARG(stx_d)
+    trans_stx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_stle_b(DisasContext *ctx, arg_stle_b *a)
+{
+    ASRTLE;
+    DECL_ARG(stx_b)
+    trans_stx_b(ctx, &arg);
+    return true;
+}
+
+static bool trans_stle_h(DisasContext *ctx, arg_stle_h *a)
+{
+    ASRTLE;
+    DECL_ARG(stx_h)
+    trans_stx_h(ctx, &arg);
+    return true;
+}
+
+static bool trans_stle_w(DisasContext *ctx, arg_stle_w *a)
+{
+    ASRTLE;
+    DECL_ARG(stx_w)
+    trans_stx_w(ctx, &arg);
+    return true;
+}
+
+static bool trans_stle_d(DisasContext *ctx, arg_stle_d *a)
+{
+    ASRTLE;
+    DECL_ARG(stx_d)
+    trans_stx_d(ctx, &arg);
+    return true;
+}
+
+#undef DECL_ARG
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index b60bdc2..6ce2d6a 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -277,6 +277,35 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
     ctx->default_tcg_memop_mask = MO_UNALN;
 }
 
+/* loongarch sync */
+static void gen_loongarch_sync(int stype)
+{
+    TCGBar tcg_mo = TCG_BAR_SC;
+
+    switch (stype) {
+    case 0x4: /* SYNC_WMB */
+        tcg_mo |= TCG_MO_ST_ST;
+        break;
+    case 0x10: /* SYNC_MB */
+        tcg_mo |= TCG_MO_ALL;
+        break;
+    case 0x11: /* SYNC_ACQUIRE */
+        tcg_mo |= TCG_MO_LD_LD | TCG_MO_LD_ST;
+        break;
+    case 0x12: /* SYNC_RELEASE */
+        tcg_mo |= TCG_MO_ST_ST | TCG_MO_LD_ST;
+        break;
+    case 0x13: /* SYNC_RMB */
+        tcg_mo |= TCG_MO_LD_LD;
+        break;
+    default:
+        tcg_mo |= TCG_MO_ALL;
+        break;
+    }
+
+    tcg_gen_mb(tcg_mo);
+}
+
 static void loongarch_tr_tb_start(DisasContextBase *dcbase, CPUState *cs)
 {
 }
-- 
1.8.3.1




* [PATCH v2 11/22] target/loongarch: Add fixed point atomic instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (9 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 10/22] target/loongarch: Add fixed point load/store " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  1:49   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 12/22] target/loongarch: Add fixed point extra " Song Gao
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements fixed-point atomic instruction translation.

This includes:
- LL.{W/D}, SC.{W/D}
- AM{SWAP/ADD/AND/OR/XOR/MAX/MIN}[_DB].{W/D}
- AM{MAX/MIN}[_DB].{WU/DU}
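
As a rough illustration of the fetch-and-op behaviour that the AM* translations
below rely on (they map onto tcg_gen_atomic_xchg_tl and tcg_gen_atomic_fetch_*_tl
with a sign-extending 32-bit memop for the .W forms), here is a minimal C11
sketch. The model_* names are hypothetical and are not part of QEMU or of this
patch. The sketch assumes each instruction returns the old memory value in rd
and stores the combined value back. C11 has no atomic fetch-max, so the max case
is modelled with a compare-and-swap loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of AMADD.W: add rk to *addr, return the old word
 * sign-extended to 64 bits (mirrors the MO_TESL memop in the patch). */
static int64_t model_amadd_w(_Atomic int32_t *addr, int32_t rk)
{
    return (int64_t)atomic_fetch_add(addr, rk);
}

/* Hypothetical model of AMSWAP.W: store rk, return the old word. */
static int64_t model_amswap_w(_Atomic int32_t *addr, int32_t rk)
{
    return (int64_t)atomic_exchange(addr, rk);
}

/* Hypothetical model of AMMAX.W: store max(old, rk), return the old word.
 * Implemented as a CAS loop because C11 lacks atomic_fetch_max. */
static int64_t model_ammax_w(_Atomic int32_t *addr, int32_t rk)
{
    int32_t old = atomic_load(addr);
    while (old < rk && !atomic_compare_exchange_weak(addr, &old, rk)) {
        /* A failed CAS refreshed 'old'; re-test and retry. */
    }
    return (int64_t)old;
}

/* Small self-check exercising the three models on one memory word. */
static void model_demo(void)
{
    _Atomic int32_t mem = 5;

    assert(model_amadd_w(&mem, 3) == 5);    /* old value returned */
    assert(atomic_load(&mem) == 8);         /* 5 + 3 stored */
    assert(model_amswap_w(&mem, -1) == 8);
    assert(atomic_load(&mem) == -1);
    assert(model_ammax_w(&mem, 7) == -1);   /* sign-extended old value */
    assert(atomic_load(&mem) == 7);
}
```

Under these assumptions the _DB variants differ only in that a full barrier is
emitted before the atomic operation, as gen_loongarch_sync(0x10) does in the
translation code.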

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insns.decode |  44 +++++++++
 target/loongarch/trans.inc.c  | 210 ++++++++++++++++++++++++++++++++++++++++++
 target/loongarch/translate.c  |  32 +++++++
 3 files changed, 286 insertions(+)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 08fd232..574c055 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -216,3 +216,47 @@ stle_b           0011 10000111 11100 ..... ..... .....    @fmt_rdrjrk
 stle_h           0011 10000111 11101 ..... ..... .....    @fmt_rdrjrk
 stle_w           0011 10000111 11110 ..... ..... .....    @fmt_rdrjrk
 stle_d           0011 10000111 11111 ..... ..... .....    @fmt_rdrjrk
+
+#
+# Fixed point atomic instruction
+#
+ll_w             0010 0000 .............. ..... .....     @fmt_rdrjsi14
+sc_w             0010 0001 .............. ..... .....     @fmt_rdrjsi14
+ll_d             0010 0010 .............. ..... .....     @fmt_rdrjsi14
+sc_d             0010 0011 .............. ..... .....     @fmt_rdrjsi14
+amswap_w         0011 10000110 00000 ..... ..... .....    @fmt_rdrjrk
+amswap_d         0011 10000110 00001 ..... ..... .....    @fmt_rdrjrk
+amadd_w          0011 10000110 00010 ..... ..... .....    @fmt_rdrjrk
+amadd_d          0011 10000110 00011 ..... ..... .....    @fmt_rdrjrk
+amand_w          0011 10000110 00100 ..... ..... .....    @fmt_rdrjrk
+amand_d          0011 10000110 00101 ..... ..... .....    @fmt_rdrjrk
+amor_w           0011 10000110 00110 ..... ..... .....    @fmt_rdrjrk
+amor_d           0011 10000110 00111 ..... ..... .....    @fmt_rdrjrk
+amxor_w          0011 10000110 01000 ..... ..... .....    @fmt_rdrjrk
+amxor_d          0011 10000110 01001 ..... ..... .....    @fmt_rdrjrk
+ammax_w          0011 10000110 01010 ..... ..... .....    @fmt_rdrjrk
+ammax_d          0011 10000110 01011 ..... ..... .....    @fmt_rdrjrk
+ammin_w          0011 10000110 01100 ..... ..... .....    @fmt_rdrjrk
+ammin_d          0011 10000110 01101 ..... ..... .....    @fmt_rdrjrk
+ammax_wu         0011 10000110 01110 ..... ..... .....    @fmt_rdrjrk
+ammax_du         0011 10000110 01111 ..... ..... .....    @fmt_rdrjrk
+ammin_wu         0011 10000110 10000 ..... ..... .....    @fmt_rdrjrk
+ammin_du         0011 10000110 10001 ..... ..... .....    @fmt_rdrjrk
+amswap_db_w      0011 10000110 10010 ..... ..... .....    @fmt_rdrjrk
+amswap_db_d      0011 10000110 10011 ..... ..... .....    @fmt_rdrjrk
+amadd_db_w       0011 10000110 10100 ..... ..... .....    @fmt_rdrjrk
+amadd_db_d       0011 10000110 10101 ..... ..... .....    @fmt_rdrjrk
+amand_db_w       0011 10000110 10110 ..... ..... .....    @fmt_rdrjrk
+amand_db_d       0011 10000110 10111 ..... ..... .....    @fmt_rdrjrk
+amor_db_w        0011 10000110 11000 ..... ..... .....    @fmt_rdrjrk
+amor_db_d        0011 10000110 11001 ..... ..... .....    @fmt_rdrjrk
+amxor_db_w       0011 10000110 11010 ..... ..... .....    @fmt_rdrjrk
+amxor_db_d       0011 10000110 11011 ..... ..... .....    @fmt_rdrjrk
+ammax_db_w       0011 10000110 11100 ..... ..... .....    @fmt_rdrjrk
+ammax_db_d       0011 10000110 11101 ..... ..... .....    @fmt_rdrjrk
+ammin_db_w       0011 10000110 11110 ..... ..... .....    @fmt_rdrjrk
+ammin_db_d       0011 10000110 11111 ..... ..... .....    @fmt_rdrjrk
+ammax_db_wu      0011 10000111 00000 ..... ..... .....    @fmt_rdrjrk
+ammax_db_du      0011 10000111 00001 ..... ..... .....    @fmt_rdrjrk
+ammin_db_wu      0011 10000111 00010 ..... ..... .....    @fmt_rdrjrk
+ammin_db_du      0011 10000111 00011 ..... ..... .....    @fmt_rdrjrk
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index e38001b..a87da4a 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -2874,3 +2874,213 @@ static bool trans_stle_d(DisasContext *ctx, arg_stle_d *a)
 }
 
 #undef DECL_ARG
+
+/* Fixed point atomic instruction translation */
+static bool trans_ll_w(DisasContext *ctx, arg_ll_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    tcg_gen_mov_tl(t1, t0);
+    tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
+    tcg_gen_st_tl(t1, cpu_env, offsetof(CPULoongArchState, lladdr));
+    tcg_gen_st_tl(t0, cpu_env, offsetof(CPULoongArchState, llval));
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_sc_w(DisasContext *ctx, arg_sc_w *a)
+{
+    gen_loongarch_st_cond(ctx, a->rd, a->rj, a->si14 << 2, MO_TESL, false);
+    return true;
+}
+
+static bool trans_ll_d(DisasContext *ctx, arg_ll_d *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_base_offset_addr(t0, a->rj, a->si14 << 2);
+    tcg_gen_mov_tl(t1, t0);
+    tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
+    tcg_gen_st_tl(t1, cpu_env, offsetof(CPULoongArchState, lladdr));
+    tcg_gen_st_tl(t0, cpu_env, offsetof(CPULoongArchState, llval));
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_sc_d(DisasContext *ctx, arg_sc_d *a)
+{
+    gen_loongarch_st_cond(ctx, a->rd, a->rj, a->si14 << 2, MO_TEQ, false);
+    return true;
+}
+
+#define TRANS_AM_W(name, op)                                      \
+static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
+{                                                                 \
+    TCGv addr, val, ret;                                          \
+    TCGv Rd = cpu_gpr[a->rd];                                     \
+    int mem_idx = ctx->mem_idx;                                   \
+                                                                  \
+    if (a->rd == 0) {                                             \
+        return true;                                              \
+    }                                                             \
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
+        printf("%s: warning, register equal\n", __func__);        \
+        return false;                                             \
+    }                                                             \
+                                                                  \
+    addr = get_gpr(a->rj);                                        \
+    val = get_gpr(a->rk);                                         \
+    ret = tcg_temp_new();                                         \
+                                                                  \
+    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TESL |   \
+                            ctx->default_tcg_memop_mask);         \
+    tcg_gen_mov_tl(Rd, ret);                                      \
+                                                                  \
+    tcg_temp_free(ret);                                           \
+                                                                  \
+    return true;                                                  \
+}
+#define TRANS_AM_D(name, op)                                      \
+static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
+{                                                                 \
+    TCGv addr, val, ret;                                          \
+    TCGv Rd = cpu_gpr[a->rd];                                     \
+    int mem_idx = ctx->mem_idx;                                   \
+                                                                  \
+    if (a->rd == 0) {                                             \
+        return true;                                              \
+    }                                                             \
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
+        printf("%s: warning, register equal\n", __func__);        \
+        return false;                                             \
+    }                                                             \
+    addr = get_gpr(a->rj);                                        \
+    val = get_gpr(a->rk);                                         \
+    ret = tcg_temp_new();                                         \
+                                                                  \
+    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TEQ |    \
+                            ctx->default_tcg_memop_mask);         \
+    tcg_gen_mov_tl(Rd, ret);                                      \
+                                                                  \
+    tcg_temp_free(ret);                                           \
+                                                                  \
+    return true;                                                  \
+}
+#define TRANS_AM(name, op)   \
+    TRANS_AM_W(name##_w, op) \
+    TRANS_AM_D(name##_d, op)
+TRANS_AM(amswap, xchg)      /* trans_amswap_w, trans_amswap_d */
+TRANS_AM(amadd, fetch_add)  /* trans_amadd_w, trans_amadd_d   */
+TRANS_AM(amand, fetch_and)  /* trans_amand_w, trans_amand_d   */
+TRANS_AM(amor, fetch_or)    /* trans_amor_w, trans_amor_d     */
+TRANS_AM(amxor, fetch_xor)  /* trans_amxor_w, trans_amxor_d   */
+TRANS_AM(ammax, fetch_smax) /* trans_ammax_w, trans_ammax_d   */
+TRANS_AM(ammin, fetch_smin) /* trans_ammin_w, trans_ammin_d   */
+TRANS_AM_W(ammax_wu, fetch_umax)    /* trans_ammax_wu */
+TRANS_AM_D(ammax_du, fetch_umax)    /* trans_ammax_du */
+TRANS_AM_W(ammin_wu, fetch_umin)    /* trans_ammin_wu */
+TRANS_AM_D(ammin_du, fetch_umin)    /* trans_ammin_du */
+#undef TRANS_AM
+#undef TRANS_AM_W
+#undef TRANS_AM_D
+
+#define TRANS_AM_DB_W(name, op)                                   \
+static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
+{                                                                 \
+    TCGv addr, val, ret;                                          \
+    TCGv Rd = cpu_gpr[a->rd];                                     \
+    int mem_idx = ctx->mem_idx;                                   \
+                                                                  \
+    if (a->rd == 0) {                                             \
+        return true;                                              \
+    }                                                             \
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
+        printf("%s: warning, register equal\n", __func__);        \
+        return false;                                             \
+    }                                                             \
+                                                                  \
+    addr = get_gpr(a->rj);                                        \
+    val = get_gpr(a->rk);                                         \
+    ret = tcg_temp_new();                                         \
+                                                                  \
+    gen_loongarch_sync(0x10);                                     \
+    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TESL |   \
+                            ctx->default_tcg_memop_mask);         \
+    tcg_gen_mov_tl(Rd, ret);                                      \
+                                                                  \
+    tcg_temp_free(ret);                                           \
+                                                                  \
+    return true;                                                  \
+}
+#define TRANS_AM_DB_D(name, op)                                   \
+static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
+{                                                                 \
+    TCGv addr, val, ret;                                          \
+    TCGv Rd = cpu_gpr[a->rd];                                     \
+    int mem_idx = ctx->mem_idx;                                   \
+                                                                  \
+    if (a->rd == 0) {                                             \
+        return true;                                              \
+    }                                                             \
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
+        printf("%s: warning, register equal\n", __func__);        \
+        return false;                                             \
+    }                                                             \
+                                                                  \
+    addr = get_gpr(a->rj);                                        \
+    val = get_gpr(a->rk);                                         \
+    ret = tcg_temp_new();                                         \
+                                                                  \
+    gen_loongarch_sync(0x10);                                     \
+    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TEQ |    \
+                            ctx->default_tcg_memop_mask);         \
+    tcg_gen_mov_tl(Rd, ret);                                      \
+                                                                  \
+    tcg_temp_free(ret);                                           \
+                                                                  \
+    return true;                                                  \
+}
+#define TRANS_AM_DB(name, op)      \
+    TRANS_AM_DB_W(name##_db_w, op) \
+    TRANS_AM_DB_D(name##_db_d, op)
+TRANS_AM_DB(amswap, xchg)      /* trans_amswap_db_w, trans_amswap_db_d */
+TRANS_AM_DB(amadd, fetch_add)  /* trans_amadd_db_w, trans_amadd_db_d   */
+TRANS_AM_DB(amand, fetch_and)  /* trans_amand_db_w, trans_amand_db_d   */
+TRANS_AM_DB(amor, fetch_or)    /* trans_amor_db_w, trans_amor_db_d     */
+TRANS_AM_DB(amxor, fetch_xor)  /* trans_amxor_db_w, trans_amxor_db_d   */
+TRANS_AM_DB(ammax, fetch_smax) /* trans_ammax_db_w, trans_ammax_db_d   */
+TRANS_AM_DB(ammin, fetch_smin) /* trans_ammin_db_w, trans_ammin_db_d   */
+TRANS_AM_DB_W(ammax_db_wu, fetch_umax)    /* trans_ammax_db_wu */
+TRANS_AM_DB_D(ammax_db_du, fetch_umax)    /* trans_ammax_db_du */
+TRANS_AM_DB_W(ammin_db_wu, fetch_umin)    /* trans_ammin_db_wu */
+TRANS_AM_DB_D(ammin_db_du, fetch_umin)    /* trans_ammin_db_du */
+#undef TRANS_AM_DB
+#undef TRANS_AM_DB_W
+#undef TRANS_AM_DB_D
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 6ce2d6a..2d3547f 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -306,6 +306,38 @@ static void gen_loongarch_sync(int stype)
     tcg_gen_mb(tcg_mo);
 }
 
+/* loongarch st cond */
+static void gen_loongarch_st_cond(DisasContext *ctx, int rd, int base,
+                                  int offset, MemOp tcg_mo, bool eva)
+{
+    TCGv Rd = cpu_gpr[rd];
+    TCGv t0 = tcg_temp_new();
+    TCGv addr = tcg_temp_new();
+    TCGv val = tcg_temp_new();
+    TCGLabel *l1 = gen_new_label();
+    TCGLabel *done = gen_new_label();
+
+    /* compare the address against that of the preceding LL */
+    gen_base_offset_addr(addr, base, offset);
+    tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l1);
+    tcg_gen_movi_tl(t0, 0);
+    tcg_gen_mov_tl(Rd, t0);
+    tcg_gen_br(done);
+
+    gen_set_label(l1);
+    /* generate cmpxchg */
+    gen_load_gpr(val, rd);
+    tcg_gen_atomic_cmpxchg_tl(t0, cpu_lladdr, cpu_llval, val,
+                              eva ? LOONGARCH_HFLAG_UM : ctx->mem_idx, tcg_mo);
+    tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_llval);
+    tcg_gen_mov_tl(Rd, t0);
+
+    gen_set_label(done);
+    tcg_temp_free(t0);
+    tcg_temp_free(addr);
+    tcg_temp_free(val);
+}
+
 static void loongarch_tr_tb_start(DisasContextBase *dcbase, CPUState *cs)
 {
 }
-- 
1.8.3.1
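
Reading aid for gen_loongarch_st_cond() in the patch above: the store-conditional is emulated by first comparing the effective address with the one recorded by the preceding LL, and then doing a compare-and-swap against the value that LL observed. The following is only a minimal C11 sketch of that idea. struct ll_state, emu_ll() and emu_sc() are hypothetical names made up for illustration (they are not QEMU APIs), and the sketch models plain host memory rather than guest memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reservation state; stands in for cpu_lladdr/cpu_llval. */
struct ll_state {
    _Atomic uint64_t *addr;   /* address recorded by the last LL */
    uint64_t val;             /* value that LL observed */
};

/* Load-linked: remember where we loaded from and what we saw. */
static uint64_t emu_ll(struct ll_state *s, _Atomic uint64_t *p)
{
    s->addr = p;
    s->val = atomic_load(p);
    return s->val;
}

/* Store-conditional: returns true on success, false on failure. */
static bool emu_sc(struct ll_state *s, _Atomic uint64_t *p, uint64_t newval)
{
    uint64_t expected = s->val;

    if (p != s->addr) {
        return false;   /* address differs from the preceding LL */
    }
    /* Succeeds only if memory still holds the value LL observed. */
    return atomic_compare_exchange_strong(p, &expected, newval);
}
```

Because the check is value-based, a sketch like this cannot notice an intervening write that restores the old value (the usual ABA caveat of cmpxchg-based LL/SC emulation).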



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (10 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 11/22] target/loongarch: Add fixed point atomic " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  5:12   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 13/22] target/loongarch: Add floating point arithmetic " Song Gao
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements the fixed point extra instruction translation.

This includes:
- CRC[C].W.{B/H/W/D}.W
- SYSCALL
- BREAK
- ASRT{LE/GT}.D
- RDTIME{L/H}.W, RDTIME.D
- CPUCFG

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/helper.h     |   4 +
 target/loongarch/insns.decode |  25 +++++
 target/loongarch/op_helper.c  |  69 +++++++++++++
 target/loongarch/trans.inc.c  | 235 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 333 insertions(+)
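
Note on the CRC helpers in this patch: helper_crc32() and helper_crc32c() mask the data operand to sz bytes, store it little-endian, and cancel the inversions done by the library routines, so the net effect appears to be a raw (non-inverted) reflected CRC update over the low sz bytes. The sketch below shows that computation bit by bit, under those assumptions. crc_update_raw() is a hypothetical name, not something defined by this series, and it takes the reflected polynomial as a parameter (0xEDB88320 for CRC-32, 0x82F63B78 for CRC-32C).

```c
#include <stdint.h>

/*
 * Raw reflected CRC update (hypothetical helper, for illustration only).
 * Feeds the low nbytes bytes of m, least significant byte first, into crc.
 * No pre- or post-inversion is applied. nbytes must be 1, 2, 4 or 8.
 */
static uint32_t crc_update_raw(uint32_t crc, uint64_t m, unsigned nbytes,
                               uint32_t poly)
{
    for (unsigned i = 0; i < nbytes; i++) {
        crc ^= (uint8_t)(m >> (8 * i));          /* little-endian order */
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
        }
    }
    return crc;
}
```

Starting from 0xffffffff, feeding the ASCII string "123456789" and inverting the result gives the standard check values for both polynomials, which is a convenient way to sanity-check the byte sizes passed by the trans_crc*() functions. The helpers in the patch additionally sign-extend the 32-bit result via the (int32_t) cast.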

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 5cd38c8..a60f293 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -21,3 +21,7 @@ DEF_HELPER_FLAGS_1(loongarch_dbitswap, TCG_CALL_NO_RWG_SE, tl, tl)
 
 DEF_HELPER_3(asrtle_d, void, env, tl, tl)
 DEF_HELPER_3(asrtgt_d, void, env, tl, tl)
+
+DEF_HELPER_3(crc32, tl, tl, tl, i32)
+DEF_HELPER_3(crc32c, tl, tl, tl, i32)
+DEF_HELPER_2(cpucfg, tl, env, tl)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 574c055..66bc314 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -27,6 +27,7 @@
 %si14    10:s14
 %hint    0:5
 %whint   0:15
+%code    0:15
 
 #
 # Argument sets
@@ -46,6 +47,8 @@
 &fmt_rdrjsi14       rd rj si14
 &fmt_hintrjsi12     hint rj si12
 &fmt_whint          whint
+&fmt_rjrk           rj rk
+&fmt_code           code
 
 #
 # Formats
@@ -65,6 +68,8 @@
 @fmt_hintrjsi12      .... ...... ............ ..... .....     &fmt_hintrjsi12     %hint %rj %si12
 @fmt_whint           .... ........ ..... ...............      &fmt_whint          %whint
 @fmt_rdrjsi14        .... .... .............. ..... .....     &fmt_rdrjsi14       %rd %rj %si14
+@fmt_rjrk            .... ........ ..... ..... ..... .....    &fmt_rjrk           %rj %rk
+@fmt_code            .... ........ ..... ...............      &fmt_code           %code
 
 #
 # Fixed point arithmetic operation instruction
@@ -260,3 +265,23 @@ ammax_db_wu      0011 10000111 00000 ..... ..... .....    @fmt_rdrjrk
 ammax_db_du      0011 10000111 00001 ..... ..... .....    @fmt_rdrjrk
 ammin_db_wu      0011 10000111 00010 ..... ..... .....    @fmt_rdrjrk
 ammin_db_du      0011 10000111 00011 ..... ..... .....    @fmt_rdrjrk
+
+#
+# Fixed point extra instruction
+#
+crc_w_b_w        0000 00000010 01000 ..... ..... .....    @fmt_rdrjrk
+crc_w_h_w        0000 00000010 01001 ..... ..... .....    @fmt_rdrjrk
+crc_w_w_w        0000 00000010 01010 ..... ..... .....    @fmt_rdrjrk
+crc_w_d_w        0000 00000010 01011 ..... ..... .....    @fmt_rdrjrk
+crcc_w_b_w       0000 00000010 01100 ..... ..... .....    @fmt_rdrjrk
+crcc_w_h_w       0000 00000010 01101 ..... ..... .....    @fmt_rdrjrk
+crcc_w_w_w       0000 00000010 01110 ..... ..... .....    @fmt_rdrjrk
+crcc_w_d_w       0000 00000010 01111 ..... ..... .....    @fmt_rdrjrk
+break            0000 00000010 10100 ...............      @fmt_code
+syscall          0000 00000010 10110 ...............      @fmt_code
+asrtle_d         0000 00000000 00010 ..... ..... 00000    @fmt_rjrk
+asrtgt_d         0000 00000000 00011 ..... ..... 00000    @fmt_rjrk
+rdtimel_w        0000 00000000 00000 11000 ..... .....    @fmt_rdrj
+rdtimeh_w        0000 00000000 00000 11001 ..... .....    @fmt_rdrj
+rdtime_d         0000 00000000 00000 11010 ..... .....    @fmt_rdrj
+cpucfg           0000 00000000 00000 11011 ..... .....    @fmt_rdrj
diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index 738e067..5bf2806 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -13,6 +13,8 @@
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
+#include "qemu/crc32c.h"
+#include <zlib.h>
 
 /* Exceptions helpers */
 void helper_raise_exception_err(CPULoongArchState *env, uint32_t exception,
@@ -159,3 +161,70 @@ void helper_asrtgt_d(CPULoongArchState *env, target_ulong rj, target_ulong rk)
         do_raise_exception(env, EXCP_ADE, GETPC());
     }
 }
+
+target_ulong helper_crc32(target_ulong val, target_ulong m, uint32_t sz)
+{
+    uint8_t buf[8];
+    target_ulong mask = ((sz * 8) == 64) ? -1ULL : ((1ULL << (sz * 8)) - 1);
+
+    m &= mask;
+    stq_le_p(buf, m);
+    return (int32_t) (crc32(val ^ 0xffffffff, buf, sz) ^ 0xffffffff);
+}
+
+target_ulong helper_crc32c(target_ulong val, target_ulong m, uint32_t sz)
+{
+    uint8_t buf[8];
+    target_ulong mask = ((sz * 8) == 64) ? -1ULL : ((1ULL << (sz * 8)) - 1);
+    m &= mask;
+    stq_le_p(buf, m);
+    return (int32_t) (crc32c(val, buf, sz) ^ 0xffffffff);
+}
+
+target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
+{
+    target_ulong r = 0;
+
+    switch (rj) {
+    case 0:
+        r = env->CSR_MCSR0 & 0xffffffff;
+        break;
+    case 1:
+        r = (env->CSR_MCSR0 & 0xffffffff00000000) >> 32;
+        break;
+    case 2:
+        r = env->CSR_MCSR1 & 0xffffffff;
+        break;
+    case 3:
+        r = (env->CSR_MCSR1 & 0xffffffff00000000) >> 32;
+        break;
+    case 4:
+        r = env->CSR_MCSR2 & 0xffffffff;
+        break;
+    case 5:
+        r = (env->CSR_MCSR2 & 0xffffffff00000000) >> 32;
+        break;
+    case 6:
+        r = env->CSR_MCSR3 & 0xffffffff;
+        break;
+    case 10:
+        r = env->CSR_MCSR8 & 0xffffffff;
+        break;
+    case 11:
+        r = (env->CSR_MCSR8 & 0xffffffff00000000) >> 32;
+        break;
+    case 12:
+        r = env->CSR_MCSR9 & 0xffffffff;
+        break;
+    case 13:
+        r = (env->CSR_MCSR9 & 0xffffffff00000000) >> 32;
+        break;
+    case 14:
+        r = env->CSR_MCSR10 & 0xffffffff;
+        break;
+    case 30:
+        r = env->CSR_MCSR24 & 0xffffffff;
+        break;
+    }
+    return r;
+}
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index a87da4a..366877e 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -3084,3 +3084,238 @@ TRANS_AM_DB_D(ammin_db_du, fetch_umin)    /* trans_ammin_db_du */
 #undef TRANS_AM_DB
 #undef TRANS_AM_DB_W
 #undef TRANS_AM_DB_D
+
+/* Fixed point extra instruction translation */
+static bool trans_crc_w_b_w(DisasContext *ctx, arg_crc_w_b_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(1); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crc_w_h_w(DisasContext *ctx, arg_crc_w_h_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(2); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crc_w_w_w(DisasContext *ctx, arg_crc_w_w_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(4); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crc_w_d_w(DisasContext *ctx, arg_crc_w_d_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(8); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crcc_w_b_w(DisasContext *ctx, arg_crcc_w_b_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(1); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32c(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crcc_w_h_w(DisasContext *ctx, arg_crcc_w_h_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(2); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32c(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crcc_w_w_w(DisasContext *ctx, arg_crcc_w_w_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(4); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32c(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_crcc_w_d_w(DisasContext *ctx, arg_crcc_w_d_w *a)
+{
+    TCGv t0, t1;
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 tsz = tcg_const_i32(8); /* operand size in bytes */
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = get_gpr(a->rk);
+    t1 = get_gpr(a->rj);
+
+    gen_helper_crc32c(Rd, t0, t1, tsz);
+
+    tcg_temp_free_i32(tsz);
+
+    return true;
+}
+
+static bool trans_break(DisasContext *ctx, arg_break *a)
+{
+    generate_exception_end(ctx, EXCP_BREAK);
+    return true;
+}
+
+static bool trans_syscall(DisasContext *ctx, arg_syscall *a)
+{
+    generate_exception_end(ctx, EXCP_SYSCALL);
+    return true;
+}
+
+static bool trans_asrtle_d(DisasContext *ctx, arg_asrtle_d * a)
+{
+    TCGv t0, t1;
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+
+    gen_helper_asrtle_d(cpu_env, t0, t1);
+
+    return true;
+}
+
+static bool trans_asrtgt_d(DisasContext *ctx, arg_asrtgt_d * a)
+{
+    TCGv t0, t1;
+
+    t0 = get_gpr(a->rj);
+    t1 = get_gpr(a->rk);
+
+    gen_helper_asrtgt_d(cpu_env, t0, t1);
+
+    return true;
+}
+
+static bool trans_rdtimel_w(DisasContext *ctx, arg_rdtimel_w *a)
+{
+    /* Nop */
+    return true;
+}
+
+static bool trans_rdtimeh_w(DisasContext *ctx, arg_rdtimeh_w *a)
+{
+    /* Nop */
+    return true;
+}
+
+static bool trans_rdtime_d(DisasContext *ctx, arg_rdtime_d *a)
+{
+    /* Nop */
+    return true;
+}
+
+static bool trans_cpucfg(DisasContext *ctx, arg_cpucfg *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    t0 = get_gpr(a->rj);
+
+    gen_helper_cpucfg(Rd, cpu_env, t0);
+
+    return true;
+}
-- 
1.8.3.1
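
Note on helper_cpucfg() in the patch above: for indices 0 to 6 the switch returns the low or high 32-bit half of a 64-bit CSR_MCSRn field, with word n taken from register n / 2 and the half selected by n & 1. The remaining case labels (10 to 14 and 30) would follow the same rule only if read as hexadecimal (0x10 to 0x14, 0x30). That is an observation about the arithmetic, not a statement about what the author intended. A table-free sketch of the even/odd rule, assuming the registers are laid out as a plain array, could look like this. cfg_word() is a hypothetical name and is not part of the series.

```c
#include <stdint.h>

/*
 * Return 32-bit configuration word n from an array of 64-bit registers
 * (hypothetical layout, for illustration only): even n selects the low
 * half of regs[n / 2], odd n selects the high half.
 */
static uint32_t cfg_word(const uint64_t *regs, unsigned nregs, unsigned n)
{
    unsigned reg = n / 2;

    if (reg >= nregs) {
        return 0;   /* unlisted words read as zero, like r = 0 in the helper */
    }
    return (uint32_t)(regs[reg] >> ((n & 1) ? 32 : 0));
}
```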



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (11 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 12/22] target/loongarch: Add fixed point extra " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  5:44   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 14/22] target/loongarch: Add floating point comparison " Song Gao
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements the floating point arithmetic instruction translation.

This includes:
- F{ADD/SUB/MUL/DIV}.{S/D}
- F{MADD/MSUB/NMADD/NMSUB}.{S/D}
- F{MAX/MIN}.{S/D}
- F{MAXA/MINA}.{S/D}
- F{ABS/NEG}.{S/D}
- F{SQRT/RECIP/RSQRT}.{S/D}
- F{SCALEB/LOGB/COPYSIGN}.{S/D}
- FCLASS.{S/D}

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu.c        |   2 +
 target/loongarch/fpu_helper.c | 380 ++++++++++++++++++++
 target/loongarch/fpu_helper.h |  34 ++
 target/loongarch/helper.h     |  47 +++
 target/loongarch/insns.decode |  56 +++
 target/loongarch/trans.inc.c  | 806 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1325 insertions(+)
 create mode 100644 target/loongarch/fpu_helper.c
 create mode 100644 target/loongarch/fpu_helper.h
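
Note on FCLASS.{S/D}: fp_class_s() and fp_class_d() in this patch return a one-hot class mask built from the FP_CLASS_* defines. The sketch below reproduces that decision tree for single precision directly on the bit pattern, without softfloat. classify_f32() and the CLS_* names are hypothetical, and the NaN test assumes the IEEE 754-2008 convention in which a set fraction MSB means a quiet NaN; in QEMU that choice depends on the float_status configuration.

```c
#include <stdint.h>

/* Class bits, with the same values as the FP_CLASS_* defines in the patch. */
enum {
    CLS_SNAN = 0x001, CLS_QNAN = 0x002,
    CLS_NEG_INF = 0x004, CLS_NEG_NORMAL = 0x008,
    CLS_NEG_SUBNORMAL = 0x010, CLS_NEG_ZERO = 0x020,
    CLS_POS_INF = 0x040, CLS_POS_NORMAL = 0x080,
    CLS_POS_SUBNORMAL = 0x100, CLS_POS_ZERO = 0x200,
};

/* Classify a binary32 bit pattern (hypothetical helper, illustration only). */
static uint32_t classify_f32(uint32_t bits)
{
    uint32_t sign = bits >> 31;
    uint32_t exp = (bits >> 23) & 0xff;
    uint32_t frac = bits & 0x7fffff;

    if (exp == 0xff && frac != 0) {
        /* Assumption: fraction MSB set means quiet NaN. */
        return (frac & 0x400000) ? CLS_QNAN : CLS_SNAN;
    }
    if (exp == 0xff) {
        return sign ? CLS_NEG_INF : CLS_POS_INF;
    }
    if (exp == 0 && frac == 0) {
        return sign ? CLS_NEG_ZERO : CLS_POS_ZERO;
    }
    if (exp == 0) {
        return sign ? CLS_NEG_SUBNORMAL : CLS_POS_SUBNORMAL;
    }
    return sign ? CLS_NEG_NORMAL : CLS_POS_NORMAL;
}
```

As in the patch, NaNs are tested before the sign, so a NaN never reports a signed class.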

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 6269dd9..e696fda 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -18,6 +18,7 @@
 #include "cpu.h"
 #include "cpu-csr.h"
 #include "cpu-qom.h"
+#include "fpu_helper.h"
 
 static const char * const excp_names[EXCP_LAST + 1] = {
     [EXCP_INTE] = "Interrupt error",
@@ -199,6 +200,7 @@ static void loongarch_cpu_reset(DeviceState *dev)
     env->active_fpu.fcsr0 = 0x0;
 
     compute_hflags(env);
+    restore_fp_status(env);
     cs->exception_index = EXCP_NONE;
 }
 
diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
new file mode 100644
index 0000000..399a98b
--- /dev/null
+++ b/target/loongarch/fpu_helper.c
@@ -0,0 +1,380 @@
+/*
+ * LoongArch floating point emulation helpers for QEMU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "fpu_helper.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "fpu/softfloat.h"
+
+#define FP_TO_INT32_OVERFLOW 0x7fffffff
+#define FP_TO_INT64_OVERFLOW 0x7fffffffffffffffULL
+
+#define FP_CLASS_SIGNALING_NAN      0x001
+#define FP_CLASS_QUIET_NAN          0x002
+#define FP_CLASS_NEGATIVE_INFINITY  0x004
+#define FP_CLASS_NEGATIVE_NORMAL    0x008
+#define FP_CLASS_NEGATIVE_SUBNORMAL 0x010
+#define FP_CLASS_NEGATIVE_ZERO      0x020
+#define FP_CLASS_POSITIVE_INFINITY  0x040
+#define FP_CLASS_POSITIVE_NORMAL    0x080
+#define FP_CLASS_POSITIVE_SUBNORMAL 0x100
+#define FP_CLASS_POSITIVE_ZERO      0x200
+
+/* Map the fcsr0 rounding mode field to softfloat rounding modes */
+const FloatRoundMode ieee_rm[4] = {
+    float_round_nearest_even,
+    float_round_to_zero,
+    float_round_up,
+    float_round_down
+};
+
+int ieee_ex_to_loongarch(int xcpt)
+{
+    int ret = 0;
+    if (xcpt) {
+        if (xcpt & float_flag_invalid) {
+            ret |= FP_INVALID;
+        }
+        if (xcpt & float_flag_overflow) {
+            ret |= FP_OVERFLOW;
+        }
+        if (xcpt & float_flag_underflow) {
+            ret |= FP_UNDERFLOW;
+        }
+        if (xcpt & float_flag_divbyzero) {
+            ret |= FP_DIV0;
+        }
+        if (xcpt & float_flag_inexact) {
+            ret |= FP_INEXACT;
+        }
+    }
+    return ret;
+}
+
+static inline void update_fcsr0(CPULoongArchState *env, uintptr_t pc)
+{
+    int tmp = ieee_ex_to_loongarch(get_float_exception_flags(
+                                  &env->active_fpu.fp_status));
+
+    SET_FP_CAUSE(env->active_fpu.fcsr0, tmp);
+    if (tmp) {
+        set_float_exception_flags(0, &env->active_fpu.fp_status);
+
+        if (GET_FP_ENABLE(env->active_fpu.fcsr0) & tmp) {
+            do_raise_exception(env, EXCP_FPE, pc);
+        } else {
+            UPDATE_FP_FLAGS(env->active_fpu.fcsr0, tmp);
+        }
+    }
+}
+
+uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp)
+{
+    fp = float64_sqrt(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp;
+}
+
+uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp)
+{
+    fp = float32_sqrt(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp;
+}
+
+uint64_t helper_fp_abs_d(uint64_t fp)
+{
+    return float64_abs(fp);
+}
+uint32_t helper_fp_abs_s(uint32_t fp)
+{
+    return float32_abs(fp);
+}
+
+uint64_t helper_fp_neg_d(uint64_t fp)
+{
+    return float64_chs(fp);
+}
+uint32_t helper_fp_neg_s(uint32_t fp)
+{
+    return float32_chs(fp);
+}
+
+uint64_t helper_fp_recip_d(CPULoongArchState *env, uint64_t fp)
+{
+    uint64_t fp1;
+
+    fp1 = float64_div(float64_one, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
+
+uint32_t helper_fp_recip_s(CPULoongArchState *env, uint32_t fp)
+{
+    uint32_t fp1;
+
+    fp1 = float32_div(float32_one, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
+
+uint64_t helper_fp_rsqrt_d(CPULoongArchState *env, uint64_t fp)
+{
+    uint64_t fp1;
+
+    fp1 = float64_sqrt(fp, &env->active_fpu.fp_status);
+    fp1 = float64_div(float64_one, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
+
+uint32_t helper_fp_rsqrt_s(CPULoongArchState *env, uint32_t fp)
+{
+    uint32_t fp1;
+
+    fp1 = float32_sqrt(fp, &env->active_fpu.fp_status);
+    fp1 = float32_div(float32_one, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
+
+uint32_t fp_class_s(uint32_t arg, float_status *status)
+{
+    if (float32_is_signaling_nan(arg, status)) {
+        return FP_CLASS_SIGNALING_NAN;
+    } else if (float32_is_quiet_nan(arg, status)) {
+        return FP_CLASS_QUIET_NAN;
+    } else if (float32_is_neg(arg)) {
+        if (float32_is_infinity(arg)) {
+            return FP_CLASS_NEGATIVE_INFINITY;
+        } else if (float32_is_zero(arg)) {
+            return FP_CLASS_NEGATIVE_ZERO;
+        } else if (float32_is_zero_or_denormal(arg)) {
+            return FP_CLASS_NEGATIVE_SUBNORMAL;
+        } else {
+            return FP_CLASS_NEGATIVE_NORMAL;
+        }
+    } else {
+        if (float32_is_infinity(arg)) {
+            return FP_CLASS_POSITIVE_INFINITY;
+        } else if (float32_is_zero(arg)) {
+            return FP_CLASS_POSITIVE_ZERO;
+        } else if (float32_is_zero_or_denormal(arg)) {
+            return FP_CLASS_POSITIVE_SUBNORMAL;
+        } else {
+            return FP_CLASS_POSITIVE_NORMAL;
+        }
+    }
+}
+
+uint32_t helper_fp_class_s(CPULoongArchState *env, uint32_t arg)
+{
+    return fp_class_s(arg, &env->active_fpu.fp_status);
+}
+
+uint64_t fp_class_d(uint64_t arg, float_status *status)
+{
+    if (float64_is_signaling_nan(arg, status)) {
+        return FP_CLASS_SIGNALING_NAN;
+    } else if (float64_is_quiet_nan(arg, status)) {
+        return FP_CLASS_QUIET_NAN;
+    } else if (float64_is_neg(arg)) {
+        if (float64_is_infinity(arg)) {
+            return FP_CLASS_NEGATIVE_INFINITY;
+        } else if (float64_is_zero(arg)) {
+            return FP_CLASS_NEGATIVE_ZERO;
+        } else if (float64_is_zero_or_denormal(arg)) {
+            return FP_CLASS_NEGATIVE_SUBNORMAL;
+        } else {
+            return FP_CLASS_NEGATIVE_NORMAL;
+        }
+    } else {
+        if (float64_is_infinity(arg)) {
+            return FP_CLASS_POSITIVE_INFINITY;
+        } else if (float64_is_zero(arg)) {
+            return FP_CLASS_POSITIVE_ZERO;
+        } else if (float64_is_zero_or_denormal(arg)) {
+            return FP_CLASS_POSITIVE_SUBNORMAL;
+        } else {
+            return FP_CLASS_POSITIVE_NORMAL;
+        }
+    }
+}
+
+uint64_t helper_fp_class_d(CPULoongArchState *env, uint64_t arg)
+{
+    return fp_class_d(arg, &env->active_fpu.fp_status);
+}
+
+uint64_t helper_fp_add_d(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
+{
+    uint64_t fp2;
+
+    fp2 = float64_add(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint32_t helper_fp_add_s(CPULoongArchState *env, uint32_t fp, uint32_t fp1)
+{
+    uint32_t fp2;
+
+    fp2 = float32_add(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint64_t helper_fp_sub_d(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
+{
+    uint64_t fp2;
+
+    fp2 = float64_sub(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint32_t helper_fp_sub_s(CPULoongArchState *env, uint32_t fp, uint32_t fp1)
+{
+    uint32_t fp2;
+
+    fp2 = float32_sub(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint64_t helper_fp_mul_d(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
+{
+    uint64_t fp2;
+
+    fp2 = float64_mul(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint32_t helper_fp_mul_s(CPULoongArchState *env, uint32_t fp, uint32_t fp1)
+{
+    uint32_t fp2;
+
+    fp2 = float32_mul(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint64_t helper_fp_div_d(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
+{
+    uint64_t fp2;
+
+    fp2 = float64_div(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint32_t helper_fp_div_s(CPULoongArchState *env, uint32_t fp, uint32_t fp1)
+{
+    uint32_t fp2;
+
+    fp2 = float32_div(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint64_t helper_fp_exp2_d(CPULoongArchState *env,
+                          uint64_t fp, uint64_t fp1)
+{
+    uint64_t fp2;
+    int64_t n = (int64_t)fp1;
+
+    fp2 = float64_scalbn(fp,
+                         n >  0x1000 ?  0x1000 :
+                         n < -0x1000 ? -0x1000 : n,
+                         &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+uint32_t helper_fp_exp2_s(CPULoongArchState *env,
+                          uint32_t fp, uint32_t fp1)
+{
+    uint32_t fp2;
+    int32_t n = (int32_t)fp1;
+
+    fp2 = float32_scalbn(fp,
+                         n >  0x200 ?  0x200 :
+                         n < -0x200 ? -0x200 : n,
+                         &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp2;
+}
+
+#define FP_MINMAX(name, bits, minmaxfunc)                                 \
+uint ## bits ## _t helper_fp_ ## name(CPULoongArchState *env,             \
+                                      uint ## bits ## _t fs,              \
+                                      uint ## bits ## _t ft)              \
+{                                                                         \
+    uint ## bits ## _t fdret;                                             \
+                                                                          \
+    fdret = float ## bits ## _ ## minmaxfunc(fs, ft,                      \
+                                             &env->active_fpu.fp_status); \
+    update_fcsr0(env, GETPC());                                           \
+    return fdret;                                                         \
+}
+
+FP_MINMAX(max_s, 32, maxnum)
+FP_MINMAX(max_d, 64, maxnum)
+FP_MINMAX(maxa_s, 32, maxnummag)
+FP_MINMAX(maxa_d, 64, maxnummag)
+FP_MINMAX(min_s, 32, minnum)
+FP_MINMAX(min_d, 64, minnum)
+FP_MINMAX(mina_s, 32, minnummag)
+FP_MINMAX(mina_d, 64, minnummag)
+#undef FP_MINMAX
+
+#define FP_FMADDSUB(name, bits, muladd_arg)                       \
+uint ## bits ## _t helper_fp_ ## name(CPULoongArchState *env,     \
+                                      uint ## bits ## _t fs,      \
+                                      uint ## bits ## _t ft,      \
+                                      uint ## bits ## _t fd)      \
+{                                                                 \
+    uint ## bits ## _t fdret;                                     \
+                                                                  \
+    fdret = float ## bits ## _muladd(fs, ft, fd, muladd_arg,      \
+                                     &env->active_fpu.fp_status); \
+    update_fcsr0(env, GETPC());                                   \
+    return fdret;                                                 \
+}
+
+FP_FMADDSUB(madd_s, 32, 0)
+FP_FMADDSUB(madd_d, 64, 0)
+FP_FMADDSUB(msub_s, 32, float_muladd_negate_c)
+FP_FMADDSUB(msub_d, 64, float_muladd_negate_c)
+FP_FMADDSUB(nmadd_s, 32, float_muladd_negate_result)
+FP_FMADDSUB(nmadd_d, 64, float_muladd_negate_result)
+FP_FMADDSUB(nmsub_s, 32, float_muladd_negate_result | float_muladd_negate_c)
+FP_FMADDSUB(nmsub_d, 64, float_muladd_negate_result | float_muladd_negate_c)
+#undef FP_FMADDSUB
+
+uint32_t helper_fp_logb_s(CPULoongArchState *env, uint32_t fp)
+{
+    uint32_t fp1;
+
+    fp1 = float32_log2(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
+
+uint64_t helper_fp_logb_d(CPULoongArchState *env, uint64_t fp)
+{
+    uint64_t fp1;
+
+    fp1 = float64_log2(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp1;
+}
diff --git a/target/loongarch/fpu_helper.h b/target/loongarch/fpu_helper.h
new file mode 100644
index 0000000..2537a3d
--- /dev/null
+++ b/target/loongarch/fpu_helper.h
@@ -0,0 +1,34 @@
+/*
+ * QEMU LoongArch CPU
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "fpu/softfloat-helpers.h"
+#include "cpu.h"
+
+extern const FloatRoundMode ieee_rm[4];
+
+uint32_t fp_class_s(uint32_t arg, float_status *fst);
+uint64_t fp_class_d(uint64_t arg, float_status *fst);
+
+int ieee_ex_to_loongarch(int xcpt);
+
+static inline void restore_rounding_mode(CPULoongArchState *env)
+{
+    set_float_rounding_mode(ieee_rm[(env->active_fpu.fcsr0 >> FCSR0_RM) & 0x3],
+                            &env->active_fpu.fp_status);
+}
+
+static inline void restore_flush_mode(CPULoongArchState *env)
+{
+    set_flush_to_zero(0, &env->active_fpu.fp_status);
+}
+
+static inline void restore_fp_status(CPULoongArchState *env)
+{
+    restore_rounding_mode(env);
+    restore_flush_mode(env);
+}
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a60f293..e945177 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -25,3 +25,50 @@ DEF_HELPER_3(asrtgt_d, void, env, tl, tl)
 DEF_HELPER_3(crc32, tl, tl, tl, i32)
 DEF_HELPER_3(crc32c, tl, tl, tl, i32)
 DEF_HELPER_2(cpucfg, tl, env, tl)
+
+/* Floating-point helper */
+DEF_HELPER_3(fp_add_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_add_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_sub_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_sub_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_mul_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_mul_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_div_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_div_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_max_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_max_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_maxa_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_maxa_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_min_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_min_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_mina_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_mina_d, i64, env, i64, i64)
+
+DEF_HELPER_4(fp_madd_s, i32, env, i32, i32, i32)
+DEF_HELPER_4(fp_madd_d, i64, env, i64, i64, i64)
+DEF_HELPER_4(fp_msub_s, i32, env, i32, i32, i32)
+DEF_HELPER_4(fp_msub_d, i64, env, i64, i64, i64)
+DEF_HELPER_4(fp_nmadd_s, i32, env, i32, i32, i32)
+DEF_HELPER_4(fp_nmadd_d, i64, env, i64, i64, i64)
+DEF_HELPER_4(fp_nmsub_s, i32, env, i32, i32, i32)
+DEF_HELPER_4(fp_nmsub_d, i64, env, i64, i64, i64)
+
+DEF_HELPER_3(fp_exp2_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_exp2_d, i64, env, i64, i64)
+DEF_HELPER_2(fp_logb_s, i32, env, i32)
+DEF_HELPER_2(fp_logb_d, i64, env, i64)
+
+DEF_HELPER_1(fp_abs_s, i32, i32)
+DEF_HELPER_1(fp_abs_d, i64, i64)
+DEF_HELPER_1(fp_neg_s, i32, i32)
+DEF_HELPER_1(fp_neg_d, i64, i64)
+
+DEF_HELPER_2(fp_sqrt_s, i32, env, i32)
+DEF_HELPER_2(fp_sqrt_d, i64, env, i64)
+DEF_HELPER_2(fp_rsqrt_s, i32, env, i32)
+DEF_HELPER_2(fp_rsqrt_d, i64, env, i64)
+DEF_HELPER_2(fp_recip_s, i32, env, i32)
+DEF_HELPER_2(fp_recip_d, i64, env, i64)
+
+DEF_HELPER_FLAGS_2(fp_class_s, TCG_CALL_NO_RWG_SE, i32, env, i32)
+DEF_HELPER_FLAGS_2(fp_class_d, TCG_CALL_NO_RWG_SE, i64, env, i64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 66bc314..9e6a727 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -28,6 +28,10 @@
 %hint    0:5
 %whint   0:15
 %code    0:15
+%fd      0:5
+%fj      5:5
+%fk      10:5
+%fa      15:5
 
 #
 # Argument sets
@@ -49,6 +53,9 @@
 &fmt_whint          whint
 &fmt_rjrk           rj rk
 &fmt_code           code
+&fmt_fdfjfk         fd fj fk
+&fmt_fdfjfkfa       fd fj fk fa
+&fmt_fdfj           fd fj
 
 #
 # Formats
@@ -70,6 +77,9 @@
 @fmt_rdrjsi14        .... .... .............. ..... .....     &fmt_rdrjsi14       %rd %rj %si14
 @fmt_rjrk            .... ........ ..... ..... ..... .....    &fmt_rjrk           %rj %rk
 @fmt_code            .... ........ ..... ...............      &fmt_code           %code
+@fmt_fdfjfk          .... ........ ..... ..... ..... .....    &fmt_fdfjfk         %fd %fj %fk
+@fmt_fdfjfkfa        .... ........ ..... ..... ..... .....    &fmt_fdfjfkfa       %fd %fj %fk %fa
+@fmt_fdfj            .... ........ ..... ..... ..... .....    &fmt_fdfj           %fd %fj
 
 #
 # Fixed point arithmetic operation instruction
@@ -285,3 +295,49 @@ rdtimel_w        0000 00000000 00000 11000 ..... .....    @fmt_rdrj
 rdtimeh_w        0000 00000000 00000 11001 ..... .....    @fmt_rdrj
 rdtime_d         0000 00000000 00000 11010 ..... .....    @fmt_rdrj
 cpucfg           0000 00000000 00000 11011 ..... .....    @fmt_rdrj
+
+#
+# Floating point arithmetic operation instruction
+#
+fadd_s           0000 00010000 00001 ..... ..... .....    @fmt_fdfjfk
+fadd_d           0000 00010000 00010 ..... ..... .....    @fmt_fdfjfk
+fsub_s           0000 00010000 00101 ..... ..... .....    @fmt_fdfjfk
+fsub_d           0000 00010000 00110 ..... ..... .....    @fmt_fdfjfk
+fmul_s           0000 00010000 01001 ..... ..... .....    @fmt_fdfjfk
+fmul_d           0000 00010000 01010 ..... ..... .....    @fmt_fdfjfk
+fdiv_s           0000 00010000 01101 ..... ..... .....    @fmt_fdfjfk
+fdiv_d           0000 00010000 01110 ..... ..... .....    @fmt_fdfjfk
+fmadd_s          0000 10000001 ..... ..... ..... .....    @fmt_fdfjfkfa
+fmadd_d          0000 10000010 ..... ..... ..... .....    @fmt_fdfjfkfa
+fmsub_s          0000 10000101 ..... ..... ..... .....    @fmt_fdfjfkfa
+fmsub_d          0000 10000110 ..... ..... ..... .....    @fmt_fdfjfkfa
+fnmadd_s         0000 10001001 ..... ..... ..... .....    @fmt_fdfjfkfa
+fnmadd_d         0000 10001010 ..... ..... ..... .....    @fmt_fdfjfkfa
+fnmsub_s         0000 10001101 ..... ..... ..... .....    @fmt_fdfjfkfa
+fnmsub_d         0000 10001110 ..... ..... ..... .....    @fmt_fdfjfkfa
+fmax_s           0000 00010000 10001 ..... ..... .....    @fmt_fdfjfk
+fmax_d           0000 00010000 10010 ..... ..... .....    @fmt_fdfjfk
+fmin_s           0000 00010000 10101 ..... ..... .....    @fmt_fdfjfk
+fmin_d           0000 00010000 10110 ..... ..... .....    @fmt_fdfjfk
+fmaxa_s          0000 00010000 11001 ..... ..... .....    @fmt_fdfjfk
+fmaxa_d          0000 00010000 11010 ..... ..... .....    @fmt_fdfjfk
+fmina_s          0000 00010000 11101 ..... ..... .....    @fmt_fdfjfk
+fmina_d          0000 00010000 11110 ..... ..... .....    @fmt_fdfjfk
+fabs_s           0000 00010001 01000 00001 ..... .....    @fmt_fdfj
+fabs_d           0000 00010001 01000 00010 ..... .....    @fmt_fdfj
+fneg_s           0000 00010001 01000 00101 ..... .....    @fmt_fdfj
+fneg_d           0000 00010001 01000 00110 ..... .....    @fmt_fdfj
+fsqrt_s          0000 00010001 01000 10001 ..... .....    @fmt_fdfj
+fsqrt_d          0000 00010001 01000 10010 ..... .....    @fmt_fdfj
+frecip_s         0000 00010001 01000 10101 ..... .....    @fmt_fdfj
+frecip_d         0000 00010001 01000 10110 ..... .....    @fmt_fdfj
+frsqrt_s         0000 00010001 01000 11001 ..... .....    @fmt_fdfj
+frsqrt_d         0000 00010001 01000 11010 ..... .....    @fmt_fdfj
+fscaleb_s        0000 00010001 00001 ..... ..... .....    @fmt_fdfjfk
+fscaleb_d        0000 00010001 00010 ..... ..... .....    @fmt_fdfjfk
+flogb_s          0000 00010001 01000 01001 ..... .....    @fmt_fdfj
+flogb_d          0000 00010001 01000 01010 ..... .....    @fmt_fdfj
+fcopysign_s      0000 00010001 00101 ..... ..... .....    @fmt_fdfjfk
+fcopysign_d      0000 00010001 00110 ..... ..... .....    @fmt_fdfjfk
+fclass_s         0000 00010001 01000 01101 ..... .....    @fmt_fdfj
+fclass_d         0000 00010001 01000 01110 ..... .....    @fmt_fdfj
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 366877e..786d2a6 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -3319,3 +3319,809 @@ static bool trans_cpucfg(DisasContext *ctx, arg_cpucfg *a)
 
     return true;
 }
+
+/* Floating point arithmetic operation instruction translation */
+static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fadd_d(DisasContext *ctx, arg_fadd_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_add_d(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fsub_s(DisasContext *ctx, arg_fsub_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_sub_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fsub_d(DisasContext *ctx, arg_fsub_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_sub_d(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fmul_s(DisasContext *ctx, arg_fmul_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_mul_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fmul_d(DisasContext *ctx, arg_fmul_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_mul_d(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fdiv_s(DisasContext *ctx, arg_fdiv_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_div_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fdiv_d(DisasContext *ctx, arg_fdiv_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_div_d(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fmadd_s(DisasContext *ctx, arg_fmadd_s *a)
+{
+    TCGv_i32 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fp2 = tcg_temp_new_i32();
+    fp3 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_load_fpr32(fp2, a->fa);
+    gen_helper_fp_madd_s(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr32(fp3, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fp2);
+    tcg_temp_free_i32(fp3);
+
+    return true;
+}
+
+static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+    fp3 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_load_fpr64(fp2, a->fa);
+    check_fpu_enabled(ctx);
+    gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr64(fp3, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i64(fp2);
+    tcg_temp_free_i64(fp3);
+
+    return true;
+}
+
+static bool trans_fmsub_s(DisasContext *ctx, arg_fmsub_s *a)
+{
+    TCGv_i32 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fp2 = tcg_temp_new_i32();
+    fp3 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_load_fpr32(fp2, a->fa);
+    gen_helper_fp_msub_s(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr32(fp3, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fp2);
+    tcg_temp_free_i32(fp3);
+
+    return true;
+}
+
+static bool trans_fmsub_d(DisasContext *ctx, arg_fmsub_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+    fp3 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_load_fpr64(fp2, a->fa);
+    gen_helper_fp_msub_d(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr64(fp3, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i64(fp2);
+    tcg_temp_free_i64(fp3);
+
+    return true;
+}
+
+static bool trans_fnmadd_s(DisasContext *ctx, arg_fnmadd_s *a)
+{
+    TCGv_i32 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fp2 = tcg_temp_new_i32();
+    fp3 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_load_fpr32(fp2, a->fa);
+    gen_helper_fp_nmadd_s(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr32(fp3, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fp2);
+    tcg_temp_free_i32(fp3);
+
+    return true;
+}
+
+static bool trans_fnmadd_d(DisasContext *ctx, arg_fnmadd_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+    fp3 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_load_fpr64(fp2, a->fa);
+    gen_helper_fp_nmadd_d(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr64(fp3, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i64(fp2);
+    tcg_temp_free_i64(fp3);
+
+    return true;
+}
+
+static bool trans_fnmsub_s(DisasContext *ctx, arg_fnmsub_s *a)
+{
+    TCGv_i32 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fp2 = tcg_temp_new_i32();
+    fp3 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_load_fpr32(fp2, a->fa);
+    gen_helper_fp_nmsub_s(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr32(fp3, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fp2);
+    tcg_temp_free_i32(fp3);
+
+    return true;
+}
+
+static bool trans_fnmsub_d(DisasContext *ctx, arg_fnmsub_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+    fp3 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_load_fpr64(fp2, a->fa);
+    gen_helper_fp_nmsub_d(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr64(fp3, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i64(fp2);
+    tcg_temp_free_i64(fp3);
+
+    return true;
+}
+
+static bool trans_fmax_s(DisasContext *ctx, arg_fmax_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_max_s(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp1, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fmax_d(DisasContext *ctx, arg_fmax_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_max_d(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp1, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fmin_s(DisasContext *ctx, arg_fmin_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_min_s(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp1, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fmin_d(DisasContext *ctx, arg_fmin_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_min_d(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp1, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fmaxa_s(DisasContext *ctx, arg_fmaxa_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_maxa_s(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp1, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fmaxa_d(DisasContext *ctx, arg_fmaxa_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_maxa_d(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp1, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fmina_s(DisasContext *ctx, arg_fmina_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_mina_s(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp1, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fmina_d(DisasContext *ctx, arg_fmina_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_mina_d(fp1, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp1, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fabs_s(DisasContext *ctx, arg_fabs_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_abs_s(fp0, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fabs_d(DisasContext *ctx, arg_fabs_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_abs_d(fp0, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fneg_s(DisasContext *ctx, arg_fneg_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_neg_s(fp0, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fneg_d(DisasContext *ctx, arg_fneg_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_neg_d(fp0, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fsqrt_s(DisasContext *ctx, arg_fsqrt_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_sqrt_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fsqrt_d(DisasContext *ctx, arg_fsqrt_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_sqrt_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_frecip_s(DisasContext *ctx, arg_frecip_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_recip_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_frecip_d(DisasContext *ctx, arg_frecip_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_recip_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_frsqrt_s(DisasContext *ctx, arg_frsqrt_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_rsqrt_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_frsqrt_d(DisasContext *ctx, arg_frsqrt_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_rsqrt_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fscaleb_s(DisasContext *ctx, arg_fscaleb_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_exp2_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_fscaleb_d(DisasContext *ctx, arg_fscaleb_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_helper_fp_exp2_d(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_flogb_s(DisasContext *ctx, arg_flogb_s *a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_logb_s(fp1, cpu_env, fp0);
+    gen_store_fpr32(fp1, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}
+
+static bool trans_flogb_d(DisasContext *ctx, arg_flogb_d *a)
+{
+    TCGv_i64 fp0, fp1;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_logb_d(fp1, cpu_env, fp0);
+    gen_store_fpr64(fp1, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+
+    return true;
+}
+
+static bool trans_fcopysign_s(DisasContext *ctx, arg_fcopysign_s *a)
+{
+    TCGv_i32 fp0, fp1, fp2;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fp2 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    tcg_gen_deposit_i32(fp2, fp1, fp0, 0, 31);
+    gen_store_fpr32(fp2, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fp2);
+
+    return true;
+}
+
+static bool trans_fcopysign_d(DisasContext *ctx, arg_fcopysign_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    tcg_gen_deposit_i64(fp2, fp1, fp0, 0, 63);
+    gen_store_fpr64(fp2, a->fd);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i64(fp2);
+
+    return true;
+}
+
+static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_class_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fclass_d(DisasContext *ctx, arg_fclass_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_class_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
-- 
1.8.3.1
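
The fcopysign translations above rely on a single deposit op: the low
31 (or 63) bits of fj are placed into fk, so fk's sign bit survives and
fj's magnitude replaces the rest. A minimal sketch of that bit
manipulation on raw single-precision bits follows. The names
deposit32_sketch, fcopysign_s_sketch, float_bits and bits_float are
hypothetical and exist only for this illustration; this is plain host C,
not TCG code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a deposit operation: return base with bits
 * [ofs, ofs + len) replaced by the low len bits of field.
 * Assumes 0 < len < 32 and ofs + len <= 32.
 */
static uint32_t deposit32_sketch(uint32_t base, uint32_t field,
                                 unsigned ofs, unsigned len)
{
    uint32_t mask = ((UINT32_C(1) << len) - 1) << ofs;

    return (base & ~mask) | ((field << ofs) & mask);
}

/* Magnitude of fj, sign of fk, operating on raw IEEE 754 single bits. */
static uint32_t fcopysign_s_sketch(uint32_t fj, uint32_t fk)
{
    /* Keep bit 31 of fk, overwrite bits 0..30 with those of fj. */
    return deposit32_sketch(fk, fj, 0, 31);
}

/* Convert between float values and their bit patterns without aliasing. */
static uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_float(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}
```

For example, bits_float(fcopysign_s_sketch(float_bits(1.5f),
float_bits(-2.0f))) yields -1.5f. Because the operation only moves bits,
it needs no helper call and cannot raise floating point exceptions, which
is presumably why the patch emits it inline.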




* [PATCH v2 14/22] target/loongarch: Add floating point comparison instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (12 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 13/22] target/loongarch: Add floating point arithmetic " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:11   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 15/22] target/loongarch: Add floating point conversion " Song Gao
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements floating point comparison instruction translation.

This includes:
- FCMP.cond.{S/D}
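
As a rough guide to the conditions handled by the helpers in this patch,
the sketch below evaluates each one with the host's C99 comparison
macros. It is only an illustration: the enum and the function
fcmp_s_sketch are hypothetical names, it uses host floats rather than
softfloat, and it does not model FCSR flag updates or the difference
between the quiet (c*) and signaling (s*) variants. True is returned as
all-ones and false as zero, matching the helpers' return convention.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition names mirroring the helper suffixes. */
typedef enum {
    COND_AF, COND_UN, COND_EQ, COND_UEQ, COND_LT, COND_ULT,
    COND_LE, COND_ULE, COND_NE, COND_OR, COND_UNE
} fcmp_cond;

/* Evaluate one comparison condition on operands a and b. */
static uint32_t fcmp_s_sketch(fcmp_cond cond, float a, float b)
{
    bool un = isunordered(a, b);   /* at least one operand is NaN */
    bool lt = isless(a, b);        /* false if unordered */
    bool gt = isgreater(a, b);     /* false if unordered */
    bool eq = !un && !lt && !gt;   /* ordered and neither is smaller */
    bool r;

    switch (cond) {
    case COND_AF:  r = false;          break;   /* always false */
    case COND_UN:  r = un;             break;
    case COND_EQ:  r = eq;             break;
    case COND_UEQ: r = un || eq;       break;
    case COND_LT:  r = lt;             break;
    case COND_ULT: r = un || lt;       break;
    case COND_LE:  r = lt || eq;       break;
    case COND_ULE: r = un || lt || eq; break;
    case COND_NE:  r = lt || gt;       break;
    case COND_OR:  r = !un;            break;   /* ordered */
    case COND_UNE: r = un || lt || gt; break;
    default:       r = false;          break;
    }

    return r ? UINT32_MAX : 0;
}
```

With a NaN operand, COND_LT gives 0 while COND_ULT gives all-ones, and
COND_OR gives 0; the "u" prefixed conditions differ from their plain
counterparts only in how they treat unordered operands.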

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/fpu_helper.c | 613 ++++++++++++++++++++++++++++++++++++++++++
 target/loongarch/helper.h     |  49 ++++
 target/loongarch/insns.decode |  10 +
 target/loongarch/trans.inc.c  | 184 +++++++++++++
 4 files changed, 856 insertions(+)

diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 399a98b..0b6a07e 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -378,3 +378,616 @@ uint64_t helper_fp_logb_d(CPULoongArchState *env, uint64_t fp)
     update_fcsr0(env, GETPC());
     return fp1;
 }
+
+void helper_movreg2cf_i32(CPULoongArchState *env, uint32_t cd, uint32_t src)
+{
+    env->active_fpu.cf[cd & 0x7] = src & 0x1;
+}
+
+void helper_movreg2cf_i64(CPULoongArchState *env, uint32_t cd, uint64_t src)
+{
+    env->active_fpu.cf[cd & 0x7] = src & 0x1;
+}
+
+/* fcmp.cond.s */
+uint32_t helper_fp_cmp_caf_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = (float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status), 0);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cun_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_ceq_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_eq_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cueq_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_eq_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_clt_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cult_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cle_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cule_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cne_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_lt_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cor_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_le_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_cune_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+
+uint32_t helper_fp_cmp_saf_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = (float32_unordered(fp1, fp, &env->active_fpu.fp_status), 0);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sun_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered(fp1, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_seq_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_eq(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sueq_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_eq(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_slt_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sult_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sle_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sule_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sne_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_lt(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sor_s(CPULoongArchState *env, uint32_t fp,
+                             uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_le(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint32_t helper_fp_cmp_sune_s(CPULoongArchState *env, uint32_t fp,
+                              uint32_t fp1)
+{
+    uint64_t ret;
+    ret = float32_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt(fp1, fp, &env->active_fpu.fp_status) ||
+          float32_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+/* fcmp.cond.d */
+uint64_t helper_fp_cmp_caf_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = (float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status), 0);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cun_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_ceq_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_eq_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cueq_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_eq_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_clt_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cult_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cle_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cule_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cne_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_lt_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cor_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_le_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_le_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_cune_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt_quiet(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt_quiet(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_saf_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = (float64_unordered(fp1, fp, &env->active_fpu.fp_status), 0);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sun_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered(fp1, fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_seq_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_eq(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sueq_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_eq(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_slt_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sult_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sle_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sule_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sne_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_lt(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sor_d(CPULoongArchState *env, uint64_t fp,
+                             uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_le(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_le(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+uint64_t helper_fp_cmp_sune_d(CPULoongArchState *env, uint64_t fp,
+                              uint64_t fp1)
+{
+    uint64_t ret;
+    ret = float64_unordered(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt(fp1, fp, &env->active_fpu.fp_status) ||
+          float64_lt(fp, fp1, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    if (ret) {
+        return -1;
+    } else {
+        return 0;
+    }
+}
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e945177..b1a81c5 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -72,3 +72,52 @@ DEF_HELPER_2(fp_recip_d, i64, env, i64)
 
 DEF_HELPER_FLAGS_2(fp_class_s, TCG_CALL_NO_RWG_SE, i32, env, i32)
 DEF_HELPER_FLAGS_2(fp_class_d, TCG_CALL_NO_RWG_SE, i64, env, i64)
+
+/* fcmp.cond.s/d */
+DEF_HELPER_3(fp_cmp_caf_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_caf_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cun_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cun_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_ceq_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_ceq_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cueq_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cueq_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_clt_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_clt_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cult_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cult_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cle_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cle_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cule_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cule_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cne_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cne_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cor_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cor_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_cune_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_cune_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_saf_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_saf_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sun_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sun_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_seq_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_seq_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sueq_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sueq_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_slt_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_slt_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sult_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sult_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sle_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sle_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sule_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sule_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sne_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sne_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sor_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sor_s, i32, env, i32, i32)
+DEF_HELPER_3(fp_cmp_sune_d, i64, env, i64, i64)
+DEF_HELPER_3(fp_cmp_sune_s, i32, env, i32, i32)
+
+DEF_HELPER_3(movreg2cf_i32, void, env, i32, i32)
+DEF_HELPER_3(movreg2cf_i64, void, env, i32, i64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9e6a727..8aadcfd 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -32,6 +32,8 @@
 %fj      5:5
 %fk      10:5
 %fa      15:5
+%cd      0:3
+%fcond   15:5
 
 #
 # Argument sets
@@ -56,6 +58,7 @@
 &fmt_fdfjfk         fd fj fk
 &fmt_fdfjfkfa       fd fj fk fa
 &fmt_fdfj           fd fj
+&fmt_cdfjfkfcond    cd fj fk fcond
 
 #
 # Formats
@@ -80,6 +83,7 @@
 @fmt_fdfjfk          .... ........ ..... ..... ..... .....    &fmt_fdfjfk         %fd %fj %fk
 @fmt_fdfjfkfa        .... ........ ..... ..... ..... .....    &fmt_fdfjfkfa       %fd %fj %fk %fa
 @fmt_fdfj            .... ........ ..... ..... ..... .....    &fmt_fdfj           %fd %fj
+@fmt_cdfjfkfcond     .... ........ ..... ..... ..... .. ...   &fmt_cdfjfkfcond    %cd %fj %fk %fcond
 
 #
 # Fixed point arithmetic operation instruction
@@ -341,3 +345,9 @@ fcopysign_s      0000 00010001 00101 ..... ..... .....    @fmt_fdfjfk
 fcopysign_d      0000 00010001 00110 ..... ..... .....    @fmt_fdfjfk
 fclass_s         0000 00010001 01000 01101 ..... .....    @fmt_fdfj
 fclass_d         0000 00010001 01000 01110 ..... .....    @fmt_fdfj
+
+#
+# Floating point compare instruction
+#
+fcmp_cond_s      0000 11000001 ..... ..... ..... 00 ...   @fmt_cdfjfkfcond
+fcmp_cond_d      0000 11000010 ..... ..... ..... 00 ...   @fmt_cdfjfkfcond
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 786d2a6..a4efc05 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -4125,3 +4125,187 @@ static bool trans_fclass_d(DisasContext *ctx, arg_fclass_d *a)
 
     return true;
 }
+
+/* Floating point compare instruction translation */
+static bool trans_fcmp_cond_s(DisasContext *ctx, arg_fcmp_cond_s *a)
+{
+    TCGv_i32 fp0, fp1, fcc;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+    fcc = tcg_const_i32(a->cd);
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+
+    switch (a->fcond) {
+    case  0:
+        gen_helper_fp_cmp_caf_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  1:
+        gen_helper_fp_cmp_saf_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  2:
+        gen_helper_fp_cmp_clt_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  3:
+        gen_helper_fp_cmp_slt_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  4:
+        gen_helper_fp_cmp_ceq_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  5:
+        gen_helper_fp_cmp_seq_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  6:
+        gen_helper_fp_cmp_cle_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  7:
+        gen_helper_fp_cmp_sle_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  8:
+        gen_helper_fp_cmp_cun_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case  9:
+        gen_helper_fp_cmp_sun_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 10:
+        gen_helper_fp_cmp_cult_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 11:
+        gen_helper_fp_cmp_sult_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 12:
+        gen_helper_fp_cmp_cueq_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 13:
+        gen_helper_fp_cmp_sueq_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 14:
+        gen_helper_fp_cmp_cule_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 15:
+        gen_helper_fp_cmp_sule_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 16:
+        gen_helper_fp_cmp_cne_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 17:
+        gen_helper_fp_cmp_sne_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 20:
+        gen_helper_fp_cmp_cor_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 21:
+        gen_helper_fp_cmp_sor_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 24:
+        gen_helper_fp_cmp_cune_s(fp0, cpu_env, fp0, fp1);
+        break;
+    case 25:
+        gen_helper_fp_cmp_sune_s(fp0, cpu_env, fp0, fp1);
+        break;
+    default:
+        abort();
+    }
+    gen_helper_movreg2cf_i32(cpu_env, fcc, fp0);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+    tcg_temp_free_i32(fcc);
+
+    return true;
+}
+
+static bool trans_fcmp_cond_d(DisasContext *ctx, arg_fcmp_cond_d *a)
+{
+    TCGv_i64 fp0, fp1;
+    TCGv_i32 fcc;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fcc = tcg_const_i32(a->cd);
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+
+    switch (a->fcond) {
+    case  0:
+        gen_helper_fp_cmp_caf_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  1:
+        gen_helper_fp_cmp_saf_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  2:
+        gen_helper_fp_cmp_clt_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  3:
+        gen_helper_fp_cmp_slt_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  4:
+        gen_helper_fp_cmp_ceq_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  5:
+        gen_helper_fp_cmp_seq_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  6:
+        gen_helper_fp_cmp_cle_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  7:
+        gen_helper_fp_cmp_sle_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  8:
+        gen_helper_fp_cmp_cun_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case  9:
+        gen_helper_fp_cmp_sun_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 10:
+        gen_helper_fp_cmp_cult_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 11:
+        gen_helper_fp_cmp_sult_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 12:
+        gen_helper_fp_cmp_cueq_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 13:
+        gen_helper_fp_cmp_sueq_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 14:
+        gen_helper_fp_cmp_cule_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 15:
+        gen_helper_fp_cmp_sule_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 16:
+        gen_helper_fp_cmp_cne_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 17:
+        gen_helper_fp_cmp_sne_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 20:
+        gen_helper_fp_cmp_cor_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 21:
+        gen_helper_fp_cmp_sor_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 24:
+        gen_helper_fp_cmp_cune_d(fp0, cpu_env, fp0, fp1);
+        break;
+    case 25:
+        gen_helper_fp_cmp_sune_d(fp0, cpu_env, fp0, fp1);
+        break;
+    default:
+        abort();
+    }
+    gen_helper_movreg2cf_i64(cpu_env, fcc, fp0);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i64(fp1);
+    tcg_temp_free_i32(fcc);
+
+    return true;
+}
-- 
1.8.3.1
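
Note on the predicate composition used by the fcmp helpers in this patch: the
result is an all-ones mask (the helpers' "return -1" in an unsigned type) when
the condition holds and 0 otherwise, and the "u" variants additionally accept
unordered operands. The sketch below restates that composition with native C
doubles instead of softfloat. All sketch_* names are hypothetical and exist
only for illustration; exception-flag handling (the quiet vs. signaling
distinction) is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper: true when either operand is a NaN. */
static int sketch_unordered(double a, double b)
{
    return a != a || b != b;
}

/* Analogue of fcmp.cueq.d: unordered OR equal. */
static uint64_t sketch_cmp_cueq_d(double a, double b)
{
    return (sketch_unordered(a, b) || a == b) ? UINT64_MAX : 0;
}

/* Analogue of fcmp.cne.d: two ordered less-than tests, false for NaN. */
static uint64_t sketch_cmp_cne_d(double a, double b)
{
    return (a < b || b < a) ? UINT64_MAX : 0;
}

/* Analogue of fcmp.cor.d: ordered means a <= b or b <= a. */
static uint64_t sketch_cmp_cor_d(double a, double b)
{
    return (a <= b || b <= a) ? UINT64_MAX : 0;
}

/* Analogue of fcmp.cune.d: unordered OR not equal. */
static uint64_t sketch_cmp_cune_d(double a, double b)
{
    return (sketch_unordered(a, b) || a < b || b < a) ? UINT64_MAX : 0;
}
```

Building "ne" and "or" from two ordered comparisons means a NaN operand makes
both sub-tests false, so no separate unordered check is needed for them.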




* [PATCH v2 15/22] target/loongarch: Add floating point conversion instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (13 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 14/22] target/loongarch: Add floating point comparison " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:16   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 16/22] target/loongarch: Add floating point move " Song Gao
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements translation of the floating point conversion instructions.

This includes:
- FCVT.S.D, FCVT.D.S
- FFINT.{S/D}.{W/L}, FTINT.{W/L}.{S/D}
- FTINT{RM/RP/RZ/RNE}.{W/L}.{S/D}
- FRINT.{S/D}

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
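Note on the FTINT{RM/RP/RZ/RNE} helpers below: each one converts under a fixed
rounding mode and substitutes a sentinel value when the conversion is invalid
or overflows. The standalone sketch here illustrates that pattern with plain C
arithmetic rather than softfloat. Everything named sketch_* or SK_* is
hypothetical, and the sentinel value is an assumption, since
FP_TO_INT32_OVERFLOW is defined outside this patch.

```c
#include <stdint.h>

/* Hypothetical rounding-mode tags mirroring the RM/RP/RZ/RNE suffixes. */
enum sketch_round { SK_RM, SK_RP, SK_RZ, SK_RNE };

/* Assumed sentinel; the real FP_TO_INT32_OVERFLOW may differ. */
#define SK_INT32_OVERFLOW INT32_MAX

static int32_t sketch_ftint_w_d(double src, enum sketch_round mode)
{
    /* NaN, or a value that cannot land in int32 under any rounding mode. */
    if (src != src || src >= 2147483648.0 || src <= -2147483649.0) {
        return SK_INT32_OVERFLOW;
    }

    int64_t t = (int64_t)src;       /* a C cast truncates toward zero */
    double frac = src - (double)t;  /* same sign as src, magnitude < 1 */

    switch (mode) {
    case SK_RM:                     /* toward minus infinity */
        if (frac < 0) {
            t -= 1;
        }
        break;
    case SK_RP:                     /* toward plus infinity */
        if (frac > 0) {
            t += 1;
        }
        break;
    case SK_RZ:                     /* toward zero: already truncated */
        break;
    case SK_RNE:                    /* to nearest, ties to even */
        if (frac > 0.5 || (frac == 0.5 && (t & 1))) {
            t += 1;
        } else if (frac < -0.5 || (frac == -0.5 && (t & 1))) {
            t -= 1;
        }
        break;
    }

    /* Rounding may still have stepped just outside the int32 range. */
    if (t > INT32_MAX || t < INT32_MIN) {
        return SK_INT32_OVERFLOW;
    }
    return (int32_t)t;
}
```

For example, sketch_ftint_w_d(-2.5, SK_RM) yields -3 while the SK_RZ mode
yields -2, and an input of 1e10 yields the sentinel in every mode.
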
 target/loongarch/fpu_helper.c | 362 ++++++++++++++++++++++++++++++++++
 target/loongarch/helper.h     |  29 +++
 target/loongarch/insns.decode |  32 +++
 target/loongarch/trans.inc.c  | 449 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 872 insertions(+)

diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 0b6a07e..162085a 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -991,3 +991,365 @@ uint64_t helper_fp_cmp_sune_d(CPULoongArchState *env, uint64_t fp,
         return 0;
     }
 }
+
+/* floating point conversion */
+uint64_t helper_fp_cvt_d_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    dest = float32_to_float64(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_fint_d_w(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    dest = int32_to_float64(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_fint_d_l(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    dest = int64_to_float64(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_cvt_s_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    dest = float64_to_float32(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_fint_s_w(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    dest = int32_to_float32(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_fint_s_l(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    dest = int64_to_float32(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrm_l_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
+    dest = float64_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrm_l_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
+    dest = float32_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrm_w_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
+    dest = float64_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrm_w_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
+    dest = float32_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrp_l_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_up, &env->active_fpu.fp_status);
+    dest = float64_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrp_l_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_up, &env->active_fpu.fp_status);
+    dest = float32_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrp_w_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_up, &env->active_fpu.fp_status);
+    dest = float64_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrp_w_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_up, &env->active_fpu.fp_status);
+    dest = float32_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrz_l_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    dest = float64_to_int64_round_to_zero(src,
+                                         &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrz_l_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    dest = float32_to_int64_round_to_zero(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrz_w_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    dest = float64_to_int32_round_to_zero(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrz_w_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    dest = float32_to_int32_round_to_zero(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrne_l_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_nearest_even,
+                            &env->active_fpu.fp_status);
+    dest = float64_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tintrne_l_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    set_float_rounding_mode(float_round_nearest_even,
+                            &env->active_fpu.fp_status);
+    dest = float32_to_int64(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrne_w_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_nearest_even,
+                            &env->active_fpu.fp_status);
+    dest = float64_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tintrne_w_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    set_float_rounding_mode(float_round_nearest_even,
+                            &env->active_fpu.fp_status);
+    dest = float32_to_int32(src, &env->active_fpu.fp_status);
+    restore_rounding_mode(env);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tint_l_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    dest = float64_to_int64(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_tint_l_s(CPULoongArchState *env, uint32_t src)
+{
+    uint64_t dest;
+
+    dest = float32_to_int64(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT64_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tint_w_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    dest = float32_to_int32(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_tint_w_d(CPULoongArchState *env, uint64_t src)
+{
+    uint32_t dest;
+
+    dest = float64_to_int32(src, &env->active_fpu.fp_status);
+    if (get_float_exception_flags(&env->active_fpu.fp_status)
+        & (float_flag_invalid | float_flag_overflow)) {
+        dest = FP_TO_INT32_OVERFLOW;
+    }
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint32_t helper_fp_rint_s(CPULoongArchState *env, uint32_t src)
+{
+    uint32_t dest;
+
+    dest = float32_round_to_int(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
+
+uint64_t helper_fp_rint_d(CPULoongArchState *env, uint64_t src)
+{
+    uint64_t dest;
+
+    dest = float64_round_to_int(src, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return dest;
+}
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b1a81c5..9ec2b53 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -69,6 +69,8 @@ DEF_HELPER_2(fp_rsqrt_s, i32, env, i32)
 DEF_HELPER_2(fp_rsqrt_d, i64, env, i64)
 DEF_HELPER_2(fp_recip_s, i32, env, i32)
 DEF_HELPER_2(fp_recip_d, i64, env, i64)
+DEF_HELPER_2(fp_rint_s, i32, env, i32)
+DEF_HELPER_2(fp_rint_d, i64, env, i64)
 
 DEF_HELPER_FLAGS_2(fp_class_s, TCG_CALL_NO_RWG_SE, i32, env, i32)
 DEF_HELPER_FLAGS_2(fp_class_d, TCG_CALL_NO_RWG_SE, i64, env, i64)
@@ -121,3 +123,30 @@ DEF_HELPER_3(fp_cmp_sune_s, i32, env, i32, i32)
 
 DEF_HELPER_3(movreg2cf_i32, void, env, i32, i32)
 DEF_HELPER_3(movreg2cf_i64, void, env, i32, i64)
+
+DEF_HELPER_2(fp_cvt_d_s, i64, env, i32)
+DEF_HELPER_2(fp_cvt_s_d, i32, env, i64)
+DEF_HELPER_2(fp_fint_d_w, i64, env, i32)
+DEF_HELPER_2(fp_fint_d_l, i64, env, i64)
+DEF_HELPER_2(fp_fint_s_w, i32, env, i32)
+DEF_HELPER_2(fp_fint_s_l, i32, env, i64)
+DEF_HELPER_2(fp_tintrm_l_s, i64, env, i32)
+DEF_HELPER_2(fp_tintrm_l_d, i64, env, i64)
+DEF_HELPER_2(fp_tintrm_w_s, i32, env, i32)
+DEF_HELPER_2(fp_tintrm_w_d, i32, env, i64)
+DEF_HELPER_2(fp_tintrp_l_s, i64, env, i32)
+DEF_HELPER_2(fp_tintrp_l_d, i64, env, i64)
+DEF_HELPER_2(fp_tintrp_w_s, i32, env, i32)
+DEF_HELPER_2(fp_tintrp_w_d, i32, env, i64)
+DEF_HELPER_2(fp_tintrz_l_s, i64, env, i32)
+DEF_HELPER_2(fp_tintrz_l_d, i64, env, i64)
+DEF_HELPER_2(fp_tintrz_w_s, i32, env, i32)
+DEF_HELPER_2(fp_tintrz_w_d, i32, env, i64)
+DEF_HELPER_2(fp_tintrne_l_s, i64, env, i32)
+DEF_HELPER_2(fp_tintrne_l_d, i64, env, i64)
+DEF_HELPER_2(fp_tintrne_w_s, i32, env, i32)
+DEF_HELPER_2(fp_tintrne_w_d, i32, env, i64)
+DEF_HELPER_2(fp_tint_l_s, i64, env, i32)
+DEF_HELPER_2(fp_tint_l_d, i64, env, i64)
+DEF_HELPER_2(fp_tint_w_s, i32, env, i32)
+DEF_HELPER_2(fp_tint_w_d, i32, env, i64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8aadcfd..c6fd762 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -351,3 +351,35 @@ fclass_d         0000 00010001 01000 01110 ..... .....    @fmt_fdfj
 #
 fcmp_cond_s      0000 11000001 ..... ..... ..... 00 ...   @fmt_cdfjfkfcond
 fcmp_cond_d      0000 11000010 ..... ..... ..... 00 ...   @fmt_cdfjfkfcond
+
+#
+# Floating point conversion instruction
+#
+fcvt_s_d         0000 00010001 10010 00110 ..... .....    @fmt_fdfj
+fcvt_d_s         0000 00010001 10010 01001 ..... .....    @fmt_fdfj
+ftintrm_w_s      0000 00010001 10100 00001 ..... .....    @fmt_fdfj
+ftintrm_w_d      0000 00010001 10100 00010 ..... .....    @fmt_fdfj
+ftintrm_l_s      0000 00010001 10100 01001 ..... .....    @fmt_fdfj
+ftintrm_l_d      0000 00010001 10100 01010 ..... .....    @fmt_fdfj
+ftintrp_w_s      0000 00010001 10100 10001 ..... .....    @fmt_fdfj
+ftintrp_w_d      0000 00010001 10100 10010 ..... .....    @fmt_fdfj
+ftintrp_l_s      0000 00010001 10100 11001 ..... .....    @fmt_fdfj
+ftintrp_l_d      0000 00010001 10100 11010 ..... .....    @fmt_fdfj
+ftintrz_w_s      0000 00010001 10101 00001 ..... .....    @fmt_fdfj
+ftintrz_w_d      0000 00010001 10101 00010 ..... .....    @fmt_fdfj
+ftintrz_l_s      0000 00010001 10101 01001 ..... .....    @fmt_fdfj
+ftintrz_l_d      0000 00010001 10101 01010 ..... .....    @fmt_fdfj
+ftintrne_w_s     0000 00010001 10101 10001 ..... .....    @fmt_fdfj
+ftintrne_w_d     0000 00010001 10101 10010 ..... .....    @fmt_fdfj
+ftintrne_l_s     0000 00010001 10101 11001 ..... .....    @fmt_fdfj
+ftintrne_l_d     0000 00010001 10101 11010 ..... .....    @fmt_fdfj
+ftint_w_s        0000 00010001 10110 00001 ..... .....    @fmt_fdfj
+ftint_w_d        0000 00010001 10110 00010 ..... .....    @fmt_fdfj
+ftint_l_s        0000 00010001 10110 01001 ..... .....    @fmt_fdfj
+ftint_l_d        0000 00010001 10110 01010 ..... .....    @fmt_fdfj
+ffint_s_w        0000 00010001 11010 00100 ..... .....    @fmt_fdfj
+ffint_s_l        0000 00010001 11010 00110 ..... .....    @fmt_fdfj
+ffint_d_w        0000 00010001 11010 01000 ..... .....    @fmt_fdfj
+ffint_d_l        0000 00010001 11010 01010 ..... .....    @fmt_fdfj
+frint_s          0000 00010001 11100 10001 ..... .....    @fmt_fdfj
+frint_d          0000 00010001 11100 10010 ..... .....    @fmt_fdfj
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index a4efc05..aa9920e 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -4309,3 +4309,452 @@ static bool trans_fcmp_cond_d(DisasContext *ctx, arg_fcmp_cond_d *a)
 
     return true;
 }
+
+/* Floating point conversion instruction */
+static bool trans_fcvt_s_d(DisasContext *ctx, arg_fcvt_s_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_cvt_s_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_fcvt_d_s(DisasContext *ctx, arg_fcvt_d_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_cvt_d_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrm_w_s(DisasContext *ctx, arg_ftintrm_w_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_tintrm_w_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrm_w_d(DisasContext *ctx, arg_ftintrm_w_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_tintrm_w_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrm_l_s(DisasContext *ctx, arg_ftintrm_l_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_tintrm_l_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrm_l_d(DisasContext *ctx, arg_ftintrm_l_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_tintrm_l_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrp_w_s(DisasContext *ctx, arg_ftintrp_w_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_tintrp_w_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrp_w_d(DisasContext *ctx, arg_ftintrp_w_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_tintrp_w_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrp_l_s(DisasContext *ctx, arg_ftintrp_l_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_tintrp_l_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrp_l_d(DisasContext *ctx, arg_ftintrp_l_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_tintrp_l_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrz_w_s(DisasContext *ctx, arg_ftintrz_w_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_tintrz_w_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrz_w_d(DisasContext *ctx, arg_ftintrz_w_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_tintrz_w_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrz_l_s(DisasContext *ctx, arg_ftintrz_l_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_tintrz_l_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrz_l_d(DisasContext *ctx, arg_ftintrz_l_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_tintrz_l_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrne_w_s(DisasContext *ctx, arg_ftintrne_w_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_tintrne_w_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ftintrne_w_d(DisasContext *ctx, arg_ftintrne_w_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_tintrne_w_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrne_l_s(DisasContext *ctx, arg_ftintrne_l_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_tintrne_l_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftintrne_l_d(DisasContext *ctx, arg_ftintrne_l_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_tintrne_l_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_ftint_w_s(DisasContext *ctx, arg_ftint_w_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_tint_w_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ftint_w_d(DisasContext *ctx, arg_ftint_w_d *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_tint_w_d(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftint_l_s(DisasContext *ctx, arg_ftint_l_s *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_tint_l_s(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ftint_l_d(DisasContext *ctx, arg_ftint_l_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_tint_l_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_ffint_s_w(DisasContext *ctx, arg_ffint_s_w *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_fint_s_w(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_ffint_s_l(DisasContext *ctx, arg_ffint_s_l *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp64, a->fj);
+    gen_helper_fp_fint_s_l(fp32, cpu_env, fp64);
+    gen_store_fpr32(fp32, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ffint_d_w(DisasContext *ctx, arg_ffint_d_w *a)
+{
+    TCGv_i32 fp32 = tcg_temp_new_i32();
+    TCGv_i64 fp64 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp32, a->fj);
+    gen_helper_fp_fint_d_w(fp64, cpu_env, fp32);
+    gen_store_fpr64(fp64, a->fd);
+
+    tcg_temp_free_i32(fp32);
+    tcg_temp_free_i64(fp64);
+
+    return true;
+}
+
+static bool trans_ffint_d_l(DisasContext *ctx, arg_ffint_d_l *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_fint_d_l(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_frint_s(DisasContext *ctx, arg_frint_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_helper_fp_rint_s(fp0, cpu_env, fp0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_frint_d(DisasContext *ctx, arg_frint_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_fp_rint_d(fp0, cpu_env, fp0);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
-- 
1.8.3.1
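
As a reading aid for the FTINT* family in the patch above: the RM, RP, RZ and RNE suffixes select rounding toward minus infinity, toward plus infinity, toward zero, and to nearest with ties to even. The plain FTINT forms, as I understand the ISA, use the rounding mode currently configured in the FCSR. The sketch below models only that difference for the double-to-int64 case. It is not how the helpers are implemented (those go through softfloat), the names ftint_l_d_model and round_mode are made up for illustration, and NaN, infinities and out-of-range inputs are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the four explicit rounding modes. */
enum round_mode { ROUND_RM, ROUND_RP, ROUND_RZ, ROUND_RNE };

/*
 * Convert a double to int64_t under an explicit rounding mode.
 * Sketch only: assumes x is finite and fits in int64_t.
 */
int64_t ftint_l_d_model(double x, enum round_mode mode)
{
    int64_t t = (int64_t)x;                    /* C cast truncates toward zero */
    int64_t lo = ((double)t > x) ? t - 1 : t;  /* largest integer <= x */
    int64_t hi = ((double)t < x) ? t + 1 : t;  /* smallest integer >= x */
    double frac = x - (double)lo;              /* distance from lo, in [0, 1) */

    switch (mode) {
    case ROUND_RM:
        return lo;
    case ROUND_RP:
        return hi;
    case ROUND_RZ:
        return t;
    case ROUND_RNE:
        if (frac > 0.5) {
            return hi;
        }
        if (frac < 0.5) {
            return lo;
        }
        return (lo % 2 == 0) ? lo : hi;        /* exact tie: pick the even one */
    }
    return t;
}

/* Small self-check showing how the four modes differ on the same input. */
void ftint_model_selfcheck(void)
{
    assert(ftint_l_d_model(-1.5, ROUND_RM) == -2);
    assert(ftint_l_d_model(-1.5, ROUND_RP) == -1);
    assert(ftint_l_d_model(-1.5, ROUND_RZ) == -1);
    assert(ftint_l_d_model(-1.5, ROUND_RNE) == -2);
}
```

The same input, -1.5, gives -2, -1, -1 and -2 for RM, RP, RZ and RNE respectively, which is a quick sanity check when testing the helpers.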




* [PATCH v2 16/22] target/loongarch: Add floating point move instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (14 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 15/22] target/loongarch: Add floating point conversion " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:29   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 17/22] target/loongarch: Add floating point load/store " Song Gao
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements floating point move instruction translation.

This includes:
- FMOV.{S/D}
- FSEL
- MOVGR2FR.{W/D}, MOVGR2FRH.W
- MOVFR2GR.{S/D}, MOVFRH2GR.S
- MOVGR2FCSR, MOVFCSR2GR
- MOVFR2CF, MOVCF2FR
- MOVGR2CF, MOVCF2GR
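
The condition-flag and FCSR semantics behind the list above can be sketched in plain C. This is only a model of what the helpers in this patch do: struct fpu_model and the model_* names are hypothetical stand-ins, and the mask passed to model_write_masked is an arbitrary example value, not the real FCSR0_M1/M2/M3 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the FPU state the helpers operate on. */
struct fpu_model {
    uint8_t  cf[8];   /* eight one-bit condition flags */
    uint32_t fcsr0;   /* control/status register */
};

/* MOVGR2CF / MOVFR2CF: only index bits [2:0] and source bit 0 are used. */
void model_movreg2cf(struct fpu_model *fpu, uint32_t cd, uint64_t src)
{
    fpu->cf[cd & 0x7] = src & 0x1;
}

/* MOVCF2GR / MOVCF2FR: the flag is returned zero-extended. */
uint64_t model_movcf2reg(const struct fpu_model *fpu, uint32_t cj)
{
    return fpu->cf[cj & 0x7];
}

/* FSEL: the result is fk when flag ca is set, otherwise fj. */
uint64_t model_fsel(const struct fpu_model *fpu, uint64_t fj, uint64_t fk,
                    uint32_t ca)
{
    return fpu->cf[ca & 0x7] ? fk : fj;
}

/*
 * MOVGR2FCSR to a sub-register view: replace only the bits selected by
 * mask and keep the rest of fcsr0 unchanged.
 */
void model_write_masked(struct fpu_model *fpu, uint32_t val, uint32_t mask)
{
    fpu->fcsr0 = (val & mask) | (fpu->fcsr0 & ~mask);
}
```

Writing 0xff to flag 3 stores 1, after which model_fsel() with ca = 3 returns fk. With fcsr0 = 0xffff0000, a masked write of 0x1234 under mask 0xff leaves 0xffff0034.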

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/fpu_helper.c |  80 +++++++++++++
 target/loongarch/helper.h     |   6 +
 target/loongarch/insns.decode |  41 +++++++
 target/loongarch/trans.inc.c  | 270 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 397 insertions(+)

diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 162085a..7662715 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -379,6 +379,11 @@ uint64_t helper_fp_logb_d(CPULoongArchState *env, uint64_t fp)
     return fp1;
 }
 
+void helper_movreg2cf(CPULoongArchState *env, uint32_t cd, target_ulong src)
+{
+    env->active_fpu.cf[cd & 0x7] = src & 0x1;
+}
+
 void helper_movreg2cf_i32(CPULoongArchState *env, uint32_t cd, uint32_t src)
 {
     env->active_fpu.cf[cd & 0x7] = src & 0x1;
@@ -1353,3 +1358,78 @@ uint64_t helper_fp_rint_d(CPULoongArchState *env, uint64_t src)
     update_fcsr0(env, GETPC());
     return dest;
 }
+
+target_ulong helper_fsel(CPULoongArchState *env, target_ulong fj,
+                         target_ulong fk, uint32_t ca)
+{
+    if (env->active_fpu.cf[ca & 0x7]) {
+        return fk;
+    } else {
+        return fj;
+    }
+}
+
+void helper_movgr2fcsr(CPULoongArchState *env, target_ulong arg1,
+                       uint32_t fcsr)
+{
+    switch (fcsr) {
+    case 0:
+        env->active_fpu.fcsr0 = arg1;
+        break;
+    case 1:
+        env->active_fpu.fcsr0 = (arg1 & FCSR0_M1) |
+                                (env->active_fpu.fcsr0 & ~FCSR0_M1);
+        break;
+    case 2:
+        env->active_fpu.fcsr0 = (arg1 & FCSR0_M2) |
+                                (env->active_fpu.fcsr0 & ~FCSR0_M2);
+        break;
+    case 3:
+        env->active_fpu.fcsr0 = (arg1 & FCSR0_M3) |
+                                (env->active_fpu.fcsr0 & ~FCSR0_M3);
+        break;
+    case 16:
+        env->active_fpu.vcsr16 = arg1;
+        break;
+    default:
+        printf("%s: warning, fcsr '%d' not supported\n", __func__, fcsr);
+        assert(0);
+        break;
+    }
+    restore_fp_status(env);
+    set_float_exception_flags(0, &env->active_fpu.fp_status);
+}
+
+target_ulong helper_movfcsr2gr(CPULoongArchState *env, uint32_t reg)
+{
+    target_ulong r = 0;
+
+    switch (reg) {
+    case 0:
+        r = (uint32_t)env->active_fpu.fcsr0;
+        break;
+    case 1:
+        r = (env->active_fpu.fcsr0 & FCSR0_M1);
+        break;
+    case 2:
+        r = (env->active_fpu.fcsr0 & FCSR0_M2);
+        break;
+    case 3:
+        r = (env->active_fpu.fcsr0 & FCSR0_M3);
+        break;
+    case 16:
+        r = (uint32_t)env->active_fpu.vcsr16;
+        break;
+    default:
+        printf("%s: warning, fcsr '%d' not supported\n", __func__, reg);
+        assert(0);
+        break;
+    }
+
+    return r;
+}
+
+target_ulong helper_movcf2reg(CPULoongArchState *env, uint32_t cj)
+{
+    return (target_ulong)env->active_fpu.cf[cj & 0x7];
+}
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 9ec2b53..eedf174 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -150,3 +150,9 @@ DEF_HELPER_2(fp_tint_l_s, i64, env, i32)
 DEF_HELPER_2(fp_tint_l_d, i64, env, i64)
 DEF_HELPER_2(fp_tint_w_s, i32, env, i32)
 DEF_HELPER_2(fp_tint_w_d, i32, env, i64)
+
+DEF_HELPER_4(fsel, i64, env, i64, i64, i32)
+DEF_HELPER_3(movreg2cf, void, env, i32, tl)
+DEF_HELPER_2(movcf2reg, tl, env, i32)
+DEF_HELPER_2(movfcsr2gr, tl, env, i32)
+DEF_HELPER_3(movgr2fcsr, void, env, tl, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c6fd762..febf89a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -34,6 +34,10 @@
 %fa      15:5
 %cd      0:3
 %fcond   15:5
+%cj      5:3
+%ca      15:3
+%fcsrd   0:5
+%fcsrs   5:5
 
 #
 # Argument sets
@@ -59,6 +63,15 @@
 &fmt_fdfjfkfa       fd fj fk fa
 &fmt_fdfj           fd fj
 &fmt_cdfjfkfcond    cd fj fk fcond
+&fmt_fdfjfkca       fd fj fk ca
+&fmt_fdrj           fd rj
+&fmt_rdfj           rd fj
+&fmt_fcsrdrj        fcsrd rj
+&fmt_rdfcsrs        rd fcsrs
+&fmt_cdfj           cd fj
+&fmt_fdcj           fd cj
+&fmt_cdrj           cd rj
+&fmt_rdcj           rd cj
 
 #
 # Formats
@@ -84,6 +97,15 @@
 @fmt_fdfjfkfa        .... ........ ..... ..... ..... .....    &fmt_fdfjfkfa       %fd %fj %fk %fa
 @fmt_fdfj            .... ........ ..... ..... ..... .....    &fmt_fdfj           %fd %fj
 @fmt_cdfjfkfcond     .... ........ ..... ..... ..... .. ...   &fmt_cdfjfkfcond    %cd %fj %fk %fcond
+@fmt_fdfjfkca        .... ........ .. ... ..... ..... .....   &fmt_fdfjfkca       %fd %fj %fk %ca
+@fmt_fdrj            .... ........ ..... ..... ..... .....    &fmt_fdrj           %fd %rj
+@fmt_rdfj            .... ........ ..... ..... ..... .....    &fmt_rdfj           %rd %fj
+@fmt_fcsrdrj         .... ........ ..... ..... ..... .....    &fmt_fcsrdrj        %fcsrd %rj
+@fmt_rdfcsrs         .... ........ ..... ..... ..... .....    &fmt_rdfcsrs        %rd %fcsrs
+@fmt_cdfj            .... ........ ..... ..... ..... .. ...   &fmt_cdfj           %cd %fj
+@fmt_fdcj            .... ........ ..... ..... .. ... .....   &fmt_fdcj           %fd %cj
+@fmt_cdrj            .... ........ ..... ..... ..... .. ...   &fmt_cdrj           %cd %rj
+@fmt_rdcj            .... ........ ..... ..... .. ... .....   &fmt_rdcj           %rd %cj
 
 #
 # Fixed point arithmetic operation instruction
@@ -383,3 +405,22 @@ ffint_d_w        0000 00010001 11010 01000 ..... .....    @fmt_fdfj
 ffint_d_l        0000 00010001 11010 01010 ..... .....    @fmt_fdfj
 frint_s          0000 00010001 11100 10001 ..... .....    @fmt_fdfj
 frint_d          0000 00010001 11100 10010 ..... .....    @fmt_fdfj
+
+#
+# Floating point move instruction
+#
+fmov_s           0000 00010001 01001 00101 ..... .....    @fmt_fdfj
+fmov_d           0000 00010001 01001 00110 ..... .....    @fmt_fdfj
+fsel             0000 11010000 00 ... ..... ..... .....   @fmt_fdfjfkca
+movgr2fr_w       0000 00010001 01001 01001 ..... .....    @fmt_fdrj
+movgr2fr_d       0000 00010001 01001 01010 ..... .....    @fmt_fdrj
+movgr2frh_w      0000 00010001 01001 01011 ..... .....    @fmt_fdrj
+movfr2gr_s       0000 00010001 01001 01101 ..... .....    @fmt_rdfj
+movfr2gr_d       0000 00010001 01001 01110 ..... .....    @fmt_rdfj
+movfrh2gr_s      0000 00010001 01001 01111 ..... .....    @fmt_rdfj
+movgr2fcsr       0000 00010001 01001 10000 ..... .....    @fmt_fcsrdrj
+movfcsr2gr       0000 00010001 01001 10010 ..... .....    @fmt_rdfcsrs
+movfr2cf         0000 00010001 01001 10100 ..... 00 ...   @fmt_cdfj
+movcf2fr         0000 00010001 01001 10101 00 ... .....   @fmt_fdcj
+movgr2cf         0000 00010001 01001 10110 ..... 00 ...   @fmt_cdrj
+movcf2gr         0000 00010001 01001 10111 00 ... .....   @fmt_rdcj
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index aa9920e..56677f8 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -4758,3 +4758,273 @@ static bool trans_frint_d(DisasContext *ctx, arg_frint_d *a)
 
     return true;
 }
+
+/* Floating point move instruction translation */
+static bool trans_fmov_s(DisasContext *ctx, arg_fmov_s *a)
+{
+    TCGv_i32 fp0;
+
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fmov_d(DisasContext *ctx, arg_fmov_d *a)
+{
+    TCGv_i64 fp0;
+
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fsel(DisasContext *ctx, arg_fsel *a)
+{
+    TCGv_i64 fj, fk, fd;
+    TCGv_i32 ca;
+
+    fj = tcg_temp_new_i64();
+    fk = tcg_temp_new_i64();
+    fd = tcg_temp_new_i64();
+    ca = tcg_const_i32(a->ca);
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fj, a->fj);
+    gen_load_fpr64(fk, a->fk);
+    gen_helper_fsel(fd, cpu_env, fj, fk, ca);
+    gen_store_fpr64(fd, a->fd);
+
+    tcg_temp_free_i64(fj);
+    tcg_temp_free_i64(fk);
+    tcg_temp_free_i64(fd);
+    tcg_temp_free_i32(ca);
+
+    return true;
+}
+
+static bool trans_movgr2fr_w(DisasContext *ctx, arg_movgr2fr_w *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+
+    t0 = get_gpr(a->rj);
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    tcg_gen_trunc_tl_i32(fp0, t0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_movgr2fr_d(DisasContext *ctx, arg_movgr2fr_d *a)
+{
+    TCGv t0;
+
+    t0 = get_gpr(a->rj);
+
+    check_fpu_enabled(ctx);
+    gen_store_fpr64(t0, a->fd);
+
+    return true;
+}
+
+static bool trans_movgr2frh_w(DisasContext *ctx, arg_movgr2frh_w *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+
+    t0 = get_gpr(a->rj);
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    tcg_gen_trunc_tl_i32(fp0, t0);
+    gen_store_fpr32h(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_movfr2gr_s(DisasContext *ctx, arg_movfr2gr_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    tcg_gen_ext_i32_tl(t0, fp0);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_movfr2gr_d(DisasContext *ctx, arg_movfr2gr_d *a)
+{
+    TCGv t0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(t0, a->fj);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_movfrh2gr_s(DisasContext *ctx, arg_movfrh2gr_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32h(fp0, a->fj);
+    tcg_gen_ext_i32_tl(t0, fp0);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+static bool trans_movgr2fcsr(DisasContext *ctx, arg_movgr2fcsr *a)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv_i32 t1 = tcg_const_i32(a->fcsrd);
+
+    check_fpu_enabled(ctx);
+    gen_load_gpr(t0, a->rj);
+    save_cpu_state(ctx, 0);
+    gen_helper_movgr2fcsr(cpu_env, t0, t1);
+    /* Stop translation as we may have changed hflags */
+    ctx->base.is_jmp = DISAS_STOP;
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(t1);
+
+    return true;
+}
+
+static bool trans_movfcsr2gr(DisasContext *ctx, arg_movfcsr2gr *a)
+{
+    TCGv t0;
+    TCGv_i32 t1;
+    TCGv Rd = cpu_gpr[a->rd];
+
+    if (a->rd == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    t1 = tcg_const_i32(a->fcsrs);
+
+    gen_helper_movfcsr2gr(t0, cpu_env, t1);
+    tcg_gen_mov_tl(Rd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(t1);
+
+    return true;
+}
+
+static bool trans_movfr2cf(DisasContext *ctx, arg_movfr2cf *a)
+{
+    TCGv_i64 fp0 = tcg_temp_new_i64();
+    TCGv_i32 cd  = tcg_const_i32(a->cd);
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_helper_movreg2cf(cpu_env, cd, fp0);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free_i32(cd);
+
+    return true;
+}
+
+static bool trans_movcf2fr(DisasContext *ctx, arg_movcf2fr *a)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv_i32 cj = tcg_const_i32(a->cj);
+
+    check_fpu_enabled(ctx);
+    gen_helper_movcf2reg(t0, cpu_env, cj);
+    gen_store_fpr64(t0, a->fd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(cj);
+    return true;
+}
+
+static bool trans_movgr2cf(DisasContext *ctx, arg_movgr2cf *a)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv_i32 cd = tcg_const_i32(a->cd);
+
+    check_fpu_enabled(ctx);
+    gen_load_gpr(t0, a->rj);
+    gen_helper_movreg2cf(cpu_env, cd, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(cd);
+
+    return true;
+}
+
+static bool trans_movcf2gr(DisasContext *ctx, arg_movcf2gr *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv_i32 cj = tcg_const_i32(a->cj);
+
+    check_fpu_enabled(ctx);
+    gen_helper_movcf2reg(Rd, cpu_env, cj);
+
+    tcg_temp_free_i32(cj);
+
+    return true;
+}
-- 
1.8.3.1




* [PATCH v2 17/22] target/loongarch: Add floating point load/store instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (15 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 16/22] target/loongarch: Add floating point move " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:34   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 18/22] target/loongarch: Add branch " Song Gao
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements floating point load/store instruction translation.

This includes:
- FLD.{S/D}, FST.{S/D}
- FLDX.{S/D}, FSTX.{S/D}
- FLD{GT/LE}.{S/D}, FST{GT/LE}.{S/D}
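
For the immediate-offset forms (FLD/FST), the effective address is the base register plus a sign-extended 12-bit immediate. The sketch below decodes the FLD.S layout by hand to make that explicit. It assumes the field positions implied by the format in this patch (4 + 6 opcode bits, then si12 at bits [21:10], rj at [9:5], fd at [4:0]). struct fld_args, decode_fld_s and effective_addr are hypothetical names, and the real decoder is generated by decodetree rather than written like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical argument set, mirroring the fd/rj/si12 fields. */
struct fld_args {
    int     fd;
    int     rj;
    int32_t si12;
};

/* Returns 1 and fills *a if insn matches the fld_s pattern, else 0. */
int decode_fld_s(uint32_t insn, struct fld_args *a)
{
    if ((insn >> 22) != 0xAC) {        /* top ten bits: 0010 101100 */
        return 0;
    }
    a->fd = insn & 0x1f;
    a->rj = (insn >> 5) & 0x1f;
    a->si12 = (int32_t)((insn >> 10) & 0xfff);
    if (a->si12 & 0x800) {             /* sign-extend from bit 11 */
        a->si12 -= 0x1000;
    }
    return 1;
}

/* Base plus sign-extended offset, with 64-bit wraparound. */
uint64_t effective_addr(uint64_t base, int32_t si12)
{
    return base + (uint64_t)(int64_t)si12;
}
```

An encoding with fd = 1, rj = 2 and immediate field 0xffc decodes to si12 = -4, so a base of 0x1000 yields the address 0xffc.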

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insns.decode |  24 ++++
 target/loongarch/trans.inc.c  | 257 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index febf89a..ea776c2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -72,6 +72,8 @@
 &fmt_fdcj           fd cj
 &fmt_cdrj           cd rj
 &fmt_rdcj           rd cj
+&fmt_fdrjrk         fd rj rk
+&fmt_fdrjsi12       fd rj si12
 
 #
 # Formats
@@ -106,6 +108,8 @@
 @fmt_fdcj            .... ........ ..... ..... .. ... .....   &fmt_fdcj           %fd %cj
 @fmt_cdrj            .... ........ ..... ..... ..... .. ...   &fmt_cdrj           %cd %rj
 @fmt_rdcj            .... ........ ..... ..... .. ... .....   &fmt_rdcj           %rd %cj
+@fmt_fdrjrk          .... ........ ..... ..... ..... .....    &fmt_fdrjrk         %fd %rj %rk
+@fmt_fdrjsi12        .... ...... ............ ..... .....     &fmt_fdrjsi12       %fd %rj %si12
 
 #
 # Fixed point arithmetic operation instruction
@@ -424,3 +428,23 @@ movfr2cf         0000 00010001 01001 10100 ..... 00 ...   @fmt_cdfj
 movcf2fr         0000 00010001 01001 10101 00 ... .....   @fmt_fdcj
 movgr2cf         0000 00010001 01001 10110 ..... 00 ...   @fmt_cdrj
 movcf2gr         0000 00010001 01001 10111 00 ... .....   @fmt_rdcj
+
+#
+# Floating point load/store instruction
+#
+fld_s            0010 101100 ............ ..... .....     @fmt_fdrjsi12
+fst_s            0010 101101 ............ ..... .....     @fmt_fdrjsi12
+fld_d            0010 101110 ............ ..... .....     @fmt_fdrjsi12
+fst_d            0010 101111 ............ ..... .....     @fmt_fdrjsi12
+fldx_s           0011 10000011 00000 ..... ..... .....    @fmt_fdrjrk
+fldx_d           0011 10000011 01000 ..... ..... .....    @fmt_fdrjrk
+fstx_s           0011 10000011 10000 ..... ..... .....    @fmt_fdrjrk
+fstx_d           0011 10000011 11000 ..... ..... .....    @fmt_fdrjrk
+fldgt_s          0011 10000111 01000 ..... ..... .....    @fmt_fdrjrk
+fldgt_d          0011 10000111 01001 ..... ..... .....    @fmt_fdrjrk
+fldle_s          0011 10000111 01010 ..... ..... .....    @fmt_fdrjrk
+fldle_d          0011 10000111 01011 ..... ..... .....    @fmt_fdrjrk
+fstgt_s          0011 10000111 01100 ..... ..... .....    @fmt_fdrjrk
+fstgt_d          0011 10000111 01101 ..... ..... .....    @fmt_fdrjrk
+fstle_s          0011 10000111 01110 ..... ..... .....    @fmt_fdrjrk
+fstle_d          0011 10000111 01111 ..... ..... .....    @fmt_fdrjrk
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 56677f8..8adfdd3 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -5028,3 +5028,260 @@ static bool trans_movcf2gr(DisasContext *ctx, arg_movcf2gr *a)
 
     return true;
 }
+
+/* Floating point load/store instruction translation */
+static bool trans_fld_s(DisasContext *ctx, arg_fld_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_i32(fp0, t0, ctx->mem_idx, MO_TESL |
+                        ctx->default_tcg_memop_mask);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(fp0);
+
+
+    return true;
+}
+
+static bool trans_fst_s(DisasContext *ctx, arg_fst_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_fpr32(fp0, a->fd);
+    tcg_gen_qemu_st_i32(fp0, t0, ctx->mem_idx, MO_TEUL |
+                        ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fld_d(DisasContext *ctx, arg_fld_d *a)
+{
+    TCGv t0;
+    TCGv_i64 fp0;
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    tcg_gen_qemu_ld_i64(fp0, t0, ctx->mem_idx, MO_TEQ |
+                        ctx->default_tcg_memop_mask);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fst_d(DisasContext *ctx, arg_fst_d *a)
+{
+    TCGv t0;
+    TCGv_i64 fp0;
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_base_offset_addr(t0, a->rj, a->si12);
+    gen_load_fpr64(fp0, a->fd);
+    tcg_gen_qemu_st_i64(fp0, t0, ctx->mem_idx, MO_TEQ |
+                        ctx->default_tcg_memop_mask);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fldx_s(DisasContext *ctx, arg_fldx_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rj == 0 && a->rk == 0) {
+        /* Nop */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    tcg_gen_add_tl(t0, Rj, Rk);
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TESL);
+    tcg_gen_trunc_tl_i32(fp0, t0);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fldx_d(DisasContext *ctx, arg_fldx_d *a)
+{
+    TCGv t0;
+    TCGv_i64 fp0;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rj == 0 && a->rk == 0) {
+        /* Nop: return before allocating temps so none are leaked */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i64();
+
+    tcg_gen_add_tl(t0, Rj, Rk);
+    tcg_gen_qemu_ld_i64(fp0, t0, ctx->mem_idx, MO_TEQ);
+    gen_store_fpr64(fp0, a->fd);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(fp0);
+
+    return true;
+}
+
+static bool trans_fstx_s(DisasContext *ctx, arg_fstx_s *a)
+{
+    TCGv t0;
+    TCGv_i32 fp0;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rj == 0 && a->rk == 0) {
+        /* Nop: return before allocating temps so none are leaked */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i32();
+
+    tcg_gen_add_tl(t0, Rj, Rk);
+    gen_load_fpr32(fp0, a->fd);
+    tcg_gen_qemu_st_i32(fp0, t0, ctx->mem_idx, MO_TEUL);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(fp0);
+
+    return true;
+}
+
+static bool trans_fstx_d(DisasContext *ctx, arg_fstx_d *a)
+{
+    TCGv t0;
+    TCGv_i64 fp0;
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rj == 0 && a->rk == 0) {
+        /* Nop: return before allocating temps so none are leaked */
+        return true;
+    }
+
+    t0 = tcg_temp_new();
+    fp0 = tcg_temp_new_i64();
+
+    tcg_gen_add_tl(t0, Rj, Rk);
+    gen_load_fpr64(fp0, a->fd);
+    tcg_gen_qemu_st_i64(fp0, t0, ctx->mem_idx, MO_TEQ);
+
+    tcg_temp_free_i64(fp0);
+    tcg_temp_free(t0);
+
+    return true;
+}
+
+
+#define DECL_ARG2(name)  \
+    arg_ ## name arg = { \
+        .fd = a->fd,     \
+        .rj = a->rj,     \
+        .rk = a->rk,     \
+    };
+
+static bool trans_fldgt_s(DisasContext *ctx, arg_fldgt_s *a)
+{
+    ASRTGT;
+    DECL_ARG2(fldx_s);
+    trans_fldx_s(ctx, &arg);
+    return true;
+}
+
+static bool trans_fldgt_d(DisasContext *ctx, arg_fldgt_d *a)
+{
+    ASRTGT;
+    DECL_ARG2(fldx_d);
+    trans_fldx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_fldle_s(DisasContext *ctx, arg_fldle_s *a)
+{
+    ASRTLE;
+    DECL_ARG2(fldx_s);
+    trans_fldx_s(ctx, &arg);
+    return true;
+}
+
+static bool trans_fldle_d(DisasContext *ctx, arg_fldle_d *a)
+{
+    ASRTLE;
+    DECL_ARG2(fldx_d);
+    trans_fldx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_fstgt_s(DisasContext *ctx, arg_fstgt_s *a)
+{
+    ASRTGT;
+    DECL_ARG2(fstx_s);
+    trans_fstx_s(ctx, &arg);
+    return true;
+}
+
+static bool trans_fstgt_d(DisasContext *ctx, arg_fstgt_d *a)
+{
+    ASRTGT;
+    DECL_ARG2(fstx_d);
+    trans_fstx_d(ctx, &arg);
+    return true;
+}
+
+static bool trans_fstle_s(DisasContext *ctx, arg_fstle_s *a)
+{
+    ASRTLE;
+    DECL_ARG2(fstx_s);
+    trans_fstx_s(ctx, &arg);
+    return true;
+}
+
+static bool trans_fstle_d(DisasContext *ctx, arg_fstle_d *a)
+{
+    ASRTLE;
+    DECL_ARG2(fstx_d);
+    trans_fstx_d(ctx, &arg);
+    return true;
+}
+
+#undef DECL_ARG2
-- 
1.8.3.1




* [PATCH v2 18/22] target/loongarch: Add branch instruction translation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (16 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 17/22] target/loongarch: Add floating point load/store " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:38   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 19/22] target/loongarch: Add disassembler Song Gao
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch implements branch instruction translation.

This includes:
- BEQ, BNE, BLT[U], BGE[U]
- BEQZ, BNEZ
- B
- BL
- JIRL
- BCEQZ, BCNEZ

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/insns.decode |  30 +++++
 target/loongarch/trans.inc.c  | 249 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 279 insertions(+)
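
Note for readers: the three offset fields added to insns.decode below
(%offs16, %offs21, %offs) all feed the same target computation used by the
trans_* functions, btarget = pc + (offs << 2). The following is a minimal
standalone sketch of that arithmetic, not code from this patch. The helper
names (sext, bits, offs16, offs21, offs26, branch_target) are hypothetical,
and it assumes the decodetree convention that the first listed sub-field is
the most significant one.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `len` bits of `v` (hypothetical helper). */
static int32_t sext(uint32_t v, int len)
{
    assert(len > 0 && len < 32);
    uint32_t sign = 1u << (len - 1);
    v &= (sign << 1) - 1;
    return (int32_t)((v ^ sign) - sign);
}

/* Extract `len` bits of `insn` starting at bit `pos`. */
static uint32_t bits(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* %offs16  10:s16 : one signed 16-bit field at bits [25:10]. */
static int32_t offs16(uint32_t insn)
{
    return sext(bits(insn, 10, 16), 16);
}

/* %offs21  0:s5 10:16 : signed high part in [4:0], low 16 bits in [25:10]. */
static int32_t offs21(uint32_t insn)
{
    return sext(bits(insn, 0, 5), 5) * 65536 + (int32_t)bits(insn, 10, 16);
}

/* %offs  0:s10 10:16 : signed high part in [9:0], low 16 bits in [25:10]. */
static int32_t offs26(uint32_t insn)
{
    return sext(bits(insn, 0, 10), 10) * 65536 + (int32_t)bits(insn, 10, 16);
}

/* Offsets count instructions, so scale by 4 bytes before adding to the PC. */
static uint64_t branch_target(uint64_t pc, int32_t offs)
{
    return pc + (uint64_t)((int64_t)offs * 4);
}
```

For example, under these assumptions the word 0x53ffffff decodes as "b" with
an offset of -1, so from pc 0x1000 the target would be 0xffc. Multiplication
is used instead of a left shift only to avoid shifting a negative value in
plain C.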

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ea776c2..077063e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -38,6 +38,9 @@
 %ca      15:3
 %fcsrd   0:5
 %fcsrs   5:5
+%offs21  0:s5 10:16
+%offs16  10:s16
+%offs    0:s10 10:16
 
 #
 # Argument sets
@@ -74,6 +77,11 @@
 &fmt_rdcj           rd cj
 &fmt_fdrjrk         fd rj rk
 &fmt_fdrjsi12       fd rj si12
+&fmt_rjoffs21       rj offs21
+&fmt_cjoffs21       cj offs21
+&fmt_rdrjoffs16     rd rj offs16
+&fmt_offs           offs
+&fmt_rjrdoffs16     rj rd offs16
 
 #
 # Formats
@@ -110,6 +118,11 @@
 @fmt_rdcj            .... ........ ..... ..... .. ... .....   &fmt_rdcj           %rd %cj
 @fmt_fdrjrk          .... ........ ..... ..... ..... .....    &fmt_fdrjrk         %fd %rj %rk
 @fmt_fdrjsi12        .... ...... ............ ..... .....     &fmt_fdrjsi12       %fd %rj %si12
+@fmt_rjoffs21        .... .. ................ ..... .....     &fmt_rjoffs21       %rj %offs21
+@fmt_cjoffs21        .... .. ................ .. ... .....    &fmt_cjoffs21       %cj %offs21
+@fmt_rdrjoffs16      .... .. ................ ..... .....     &fmt_rdrjoffs16     %rd %rj %offs16
+@fmt_offs            .... .. ..........................       &fmt_offs           %offs
+@fmt_rjrdoffs16      .... .. ................ ..... .....     &fmt_rjrdoffs16     %rj %rd %offs16
 
 #
 # Fixed point arithmetic operation instruction
@@ -448,3 +461,20 @@ fstgt_s          0011 10000111 01100 ..... ..... .....    @fmt_fdrjrk
 fstgt_d          0011 10000111 01101 ..... ..... .....    @fmt_fdrjrk
 fstle_s          0011 10000111 01110 ..... ..... .....    @fmt_fdrjrk
 fstle_d          0011 10000111 01111 ..... ..... .....    @fmt_fdrjrk
+
+#
+# Branch instructions
+#
+beqz             0100 00 ................ ..... .....     @fmt_rjoffs21
+bnez             0100 01 ................ ..... .....     @fmt_rjoffs21
+bceqz            0100 10 ................ 00 ... .....    @fmt_cjoffs21
+bcnez            0100 10 ................ 01 ... .....    @fmt_cjoffs21
+jirl             0100 11 ................ ..... .....     @fmt_rdrjoffs16
+b                0101 00 ..........................       @fmt_offs
+bl               0101 01 ..........................       @fmt_offs
+beq              0101 10 ................ ..... .....     @fmt_rjrdoffs16
+bne              0101 11 ................ ..... .....     @fmt_rjrdoffs16
+blt              0110 00 ................ ..... .....     @fmt_rjrdoffs16
+bge              0110 01 ................ ..... .....     @fmt_rjrdoffs16
+bltu             0110 10 ................ ..... .....     @fmt_rjrdoffs16
+bgeu             0110 11 ................ ..... .....     @fmt_rjrdoffs16
diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
index 8adfdd3..0c67c54 100644
--- a/target/loongarch/trans.inc.c
+++ b/target/loongarch/trans.inc.c
@@ -5285,3 +5285,252 @@ static bool trans_fstle_d(DisasContext *ctx, arg_fstle_d *a)
 }
 
 #undef DECL_ARG2
+
+/* Branch Instructions translation */
+static bool trans_beqz(DisasContext *ctx, arg_beqz *a)
+{
+    TCGv t0, t1;
+    int bcond_flag = 0;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_const_i64(0);
+
+    if (a->rj != 0) {
+        gen_load_gpr(t0, a->rj);
+        bcond_flag = 1;
+    }
+
+    if (bcond_flag == 0) {
+        ctx->hflags |= LOONGARCH_HFLAG_B;
+    } else {
+        tcg_gen_setcond_tl(TCG_COND_EQ, bcond, t0, t1);
+        ctx->hflags |= LOONGARCH_HFLAG_BC;
+    }
+    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bnez(DisasContext *ctx, arg_bnez *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_const_i64(0);
+
+    if (a->rj != 0) {
+        gen_load_gpr(t0, a->rj);
+        tcg_gen_setcond_tl(TCG_COND_NE, bcond, t0, t1);
+        ctx->hflags |= LOONGARCH_HFLAG_BC;
+    }
+    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bceqz(DisasContext *ctx, arg_bceqz *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 cj;
+
+    cj = tcg_const_i32(a->cj);
+    t0 = tcg_temp_new();
+    t1 = tcg_const_i64(0);
+
+    gen_helper_movcf2reg(t0, cpu_env, cj);
+    tcg_gen_setcond_tl(TCG_COND_EQ, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
+
+    tcg_temp_free_i32(cj);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bcnez(DisasContext *ctx, arg_bcnez *a)
+{
+    TCGv t0, t1;
+    TCGv_i32 cj;
+
+    cj = tcg_const_i32(a->cj);
+    t0 = tcg_temp_new();
+    t1 = tcg_const_i64(0);
+
+    gen_helper_movcf2reg(t0, cpu_env, cj);
+    tcg_gen_setcond_tl(TCG_COND_NE, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
+
+    tcg_temp_free_i32(cj);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_b(DisasContext *ctx, arg_b *a)
+{
+    ctx->hflags |= LOONGARCH_HFLAG_B;
+    ctx->btarget = ctx->base.pc_next + (a->offs << 2);
+
+    return true;
+}
+
+static bool trans_bl(DisasContext *ctx, arg_bl *a)
+{
+    ctx->btarget = ctx->base.pc_next + (a->offs << 2);
+    tcg_gen_movi_tl(cpu_gpr[1], ctx->base.pc_next + 4);
+    ctx->hflags |= LOONGARCH_HFLAG_B;
+    gen_branch(ctx, 4);
+
+    return true;
+}
+
+static bool trans_blt(DisasContext *ctx, arg_blt *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rd);
+
+    tcg_gen_setcond_tl(TCG_COND_LT, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bge(DisasContext *ctx, arg_bge *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rd);
+
+    tcg_gen_setcond_tl(TCG_COND_GE, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bltu(DisasContext *ctx, arg_bltu *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rd);
+
+    tcg_gen_setcond_tl(TCG_COND_LTU, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bgeu(DisasContext *ctx, arg_bgeu *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    gen_load_gpr(t0, a->rj);
+    gen_load_gpr(t1, a->rd);
+
+    tcg_gen_setcond_tl(TCG_COND_GEU, bcond, t0, t1);
+    ctx->hflags |= LOONGARCH_HFLAG_BC;
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_beq(DisasContext *ctx, arg_beq *a)
+{
+    TCGv t0, t1;
+    int bcond_flag = 0;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    if (a->rj != a->rd) {
+        gen_load_gpr(t0, a->rj);
+        gen_load_gpr(t1, a->rd);
+        bcond_flag = 1;
+    }
+
+    if (bcond_flag == 0) {
+        ctx->hflags |= LOONGARCH_HFLAG_B;
+    } else {
+        tcg_gen_setcond_tl(TCG_COND_EQ, bcond, t0, t1);
+        ctx->hflags |= LOONGARCH_HFLAG_BC;
+    }
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_bne(DisasContext *ctx, arg_bne *a)
+{
+    TCGv t0, t1;
+
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+
+    if (a->rj != a->rd) {
+        gen_load_gpr(t0, a->rj);
+        gen_load_gpr(t1, a->rd);
+        tcg_gen_setcond_tl(TCG_COND_NE, bcond, t0, t1);
+        ctx->hflags |= LOONGARCH_HFLAG_BC;
+    }
+    ctx->btarget = ctx->base.pc_next + (a->offs16 << 2);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+
+    return true;
+}
+
+static bool trans_jirl(DisasContext *ctx, arg_jirl *a)
+{
+    gen_base_offset_addr(btarget, a->rj, a->offs16 << 2);
+    if (a->rd != 0) {
+        tcg_gen_movi_tl(cpu_gpr[a->rd], ctx->base.pc_next + 4);
+    }
+    ctx->hflags |= LOONGARCH_HFLAG_BR;
+    gen_branch(ctx, 4);
+
+    return true;
+}
-- 
1.8.3.1




* [PATCH v2 19/22] target/loongarch: Add disassembler
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (17 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 18/22] target/loongarch: Add branch " Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:40   ` Richard Henderson
  2021-08-12 10:33   ` Philippe Mathieu-Daudé
  2021-07-21  9:53 ` [PATCH v2 20/22] LoongArch Linux User Emulation Song Gao
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch adds support for disassembling via the option '-d in_asm'.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 MAINTAINERS             |    1 +
 disas/loongarch.c       | 2511 +++++++++++++++++++++++++++++++++++++++++++++++
 disas/meson.build       |    1 +
 include/disas/dis-asm.h |    2 +
 meson.build             |    1 +
 5 files changed, 2516 insertions(+)
 create mode 100644 disas/loongarch.c
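
Note for readers: the new file is table driven. Each la_op value indexes an
opcode_data[] entry that carries a mnemonic and a small format string such as
"nt0,1,2". The sketch below shows one way such a string could be interpreted.
It is an illustration only: the demo_* names are hypothetical, and the
meaning of the format characters ('n' = mnemonic, 't' = tab, '0'/'1'/'2' =
general registers, '3'/'4'/'5' = the same three operand slots printed as
floating point registers) is inferred from the la_fmt_* macro names, not
taken from the formatter in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum { DEMO_ILLEGAL = 0, DEMO_ADD_W = 1, DEMO_FABS_S = 2 } demo_op;

typedef struct {
    const char *name;
    const char *format;
} demo_opcode_data;

/* Indexed by demo_op, mirroring how la_op would index opcode_data[]. */
static const demo_opcode_data demo_table[] = {
    [DEMO_ILLEGAL] = { "illegal", "nt" },
    [DEMO_ADD_W]   = { "add.w",   "nt0,1,2" },
    [DEMO_FABS_S]  = { "fabs.s",  "nt3,4" },
};

typedef struct {
    demo_op op;
    uint8_t r1, r2, r3;
} demo_decode;

/* Append `s` to the NUL-terminated string in `buf` without overflowing it. */
static void demo_append(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);
    if (used < len) {
        snprintf(buf + used, len - used, "%s", s);
    }
}

/* Walk the format string one character at a time and emit each operand. */
static void demo_format(char *buf, size_t len, const demo_decode *dec)
{
    const demo_opcode_data *d = &demo_table[dec->op];
    char tmp[32];

    assert(len > 0);
    buf[0] = '\0';
    for (const char *f = d->format; *f; f++) {
        switch (*f) {
        case 'n': snprintf(tmp, sizeof(tmp), "%s", d->name); break;
        case 't': snprintf(tmp, sizeof(tmp), "\t"); break;
        case '0': snprintf(tmp, sizeof(tmp), "$r%u", (unsigned)dec->r1); break;
        case '1': snprintf(tmp, sizeof(tmp), "$r%u", (unsigned)dec->r2); break;
        case '2': snprintf(tmp, sizeof(tmp), "$r%u", (unsigned)dec->r3); break;
        case '3': snprintf(tmp, sizeof(tmp), "$f%u", (unsigned)dec->r1); break;
        case '4': snprintf(tmp, sizeof(tmp), "$f%u", (unsigned)dec->r2); break;
        case '5': snprintf(tmp, sizeof(tmp), "$f%u", (unsigned)dec->r3); break;
        default:  snprintf(tmp, sizeof(tmp), "%c", *f); break; /* literal */
        }
        demo_append(buf, len, tmp);
    }
}
```

With a decode record of { DEMO_ADD_W, 4, 5, 6 }, this sketch produces the
mnemonic, a tab, and "$r4,$r5,$r6". Keeping the operand layout in a string
means a new instruction needs only a table row, provided its operand shape
already has a format.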

diff --git a/MAINTAINERS b/MAINTAINERS
index ae87a74..612fdfb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -214,6 +214,7 @@ LoongArch TCG CPUS
 M: Song Gao <gaosong@loongson.cn>
 S: Maintained
 F: target/loongarch/
+F: disas/loongarch.c
 
 M68K TCG CPUs
 M: Laurent Vivier <laurent@vivier.eu>
diff --git a/disas/loongarch.c b/disas/loongarch.c
new file mode 100644
index 0000000..eb7475b
--- /dev/null
+++ b/disas/loongarch.c
@@ -0,0 +1,2511 @@
+/*
+ * QEMU LoongArch Disassembler
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited.
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "disas/dis-asm.h"
+
+#define INSNLEN 4
+
+/* enums */
+typedef enum {
+    la_op_illegal = 0,
+    la_op_clo_w = 1,
+    la_op_clz_w = 2,
+    la_op_cto_w = 3,
+    la_op_ctz_w = 4,
+    la_op_clo_d = 5,
+    la_op_clz_d = 6,
+    la_op_cto_d = 7,
+    la_op_ctz_d = 8,
+    la_op_revb_2h = 9,
+    la_op_revb_4h = 10,
+    la_op_revb_2w = 11,
+    la_op_revb_d = 12,
+    la_op_revh_2w = 13,
+    la_op_revh_d = 14,
+    la_op_bitrev_4b = 15,
+    la_op_bitrev_8b = 16,
+    la_op_bitrev_w = 17,
+    la_op_bitrev_d = 18,
+    la_op_ext_w_h = 19,
+    la_op_ext_w_b = 20,
+    la_op_rdtime_d = 21,
+    la_op_cpucfg = 22,
+    la_op_asrtle_d = 23,
+    la_op_asrtgt_d = 24,
+    la_op_alsl_w = 25,
+    la_op_alsl_wu = 26,
+    la_op_bytepick_w = 27,
+    la_op_bytepick_d = 28,
+    la_op_add_w = 29,
+    la_op_add_d = 30,
+    la_op_sub_w = 31,
+    la_op_sub_d = 32,
+    la_op_slt = 33,
+    la_op_sltu = 34,
+    la_op_maskeqz = 35,
+    la_op_masknez = 36,
+    la_op_nor = 37,
+    la_op_and = 38,
+    la_op_or = 39,
+    la_op_xor = 40,
+    la_op_orn = 41,
+    la_op_andn = 42,
+    la_op_sll_w = 43,
+    la_op_srl_w = 44,
+    la_op_sra_w = 45,
+    la_op_sll_d = 46,
+    la_op_srl_d = 47,
+    la_op_sra_d = 48,
+    la_op_rotr_w = 49,
+    la_op_rotr_d = 50,
+    la_op_mul_w = 51,
+    la_op_mulh_w = 52,
+    la_op_mulh_wu = 53,
+    la_op_mul_d = 54,
+    la_op_mulh_d = 55,
+    la_op_mulh_du = 56,
+    la_op_mulw_d_w = 57,
+    la_op_mulw_d_wu = 58,
+    la_op_div_w = 59,
+    la_op_mod_w = 60,
+    la_op_div_wu = 61,
+    la_op_mod_wu = 62,
+    la_op_div_d = 63,
+    la_op_mod_d = 64,
+    la_op_div_du = 65,
+    la_op_mod_du = 66,
+    la_op_crc_w_b_w = 67,
+    la_op_crc_w_h_w = 68,
+    la_op_crc_w_w_w = 69,
+    la_op_crc_w_d_w = 70,
+    la_op_crcc_w_b_w = 71,
+    la_op_crcc_w_h_w = 72,
+    la_op_crcc_w_w_w = 73,
+    la_op_crcc_w_d_w = 74,
+    la_op_break = 75,
+    la_op_syscall = 76,
+    la_op_alsl_d = 77,
+    la_op_slli_w = 78,
+    la_op_slli_d = 79,
+    la_op_srli_w = 80,
+    la_op_srli_d = 81,
+    la_op_srai_w = 82,
+    la_op_srai_d = 83,
+    la_op_rotri_w = 84,
+    la_op_rotri_d = 85,
+    la_op_bstrins_w = 86,
+    la_op_bstrpick_w = 87,
+    la_op_bstrins_d = 88,
+    la_op_bstrpick_d = 89,
+    la_op_fadd_s = 90,
+    la_op_fadd_d = 91,
+    la_op_fsub_s = 92,
+    la_op_fsub_d = 93,
+    la_op_fmul_s = 94,
+    la_op_fmul_d = 95,
+    la_op_fdiv_s = 96,
+    la_op_fdiv_d = 97,
+    la_op_fmax_s = 98,
+    la_op_fmax_d = 99,
+    la_op_fmin_s = 100,
+    la_op_fmin_d = 101,
+    la_op_fmaxa_s = 102,
+    la_op_fmaxa_d = 103,
+    la_op_fmina_s = 104,
+    la_op_fmina_d = 105,
+    la_op_fscaleb_s = 106,
+    la_op_fscaleb_d = 107,
+    la_op_fcopysign_s = 108,
+    la_op_fcopysign_d = 109,
+    la_op_fabs_s = 110,
+    la_op_fabs_d = 111,
+    la_op_fneg_s = 112,
+    la_op_fneg_d = 113,
+    la_op_flogb_s = 114,
+    la_op_flogb_d = 115,
+    la_op_fclass_s = 116,
+    la_op_fclass_d = 117,
+    la_op_fsqrt_s = 118,
+    la_op_fsqrt_d = 119,
+    la_op_frecip_s = 120,
+    la_op_frecip_d = 121,
+    la_op_frsqrt_s = 122,
+    la_op_frsqrt_d = 123,
+    la_op_fmov_s = 124,
+    la_op_fmov_d = 125,
+    la_op_movgr2fr_w = 126,
+    la_op_movgr2fr_d = 127,
+    la_op_movgr2frh_w = 128,
+    la_op_movfr2gr_s = 129,
+    la_op_movfr2gr_d = 130,
+    la_op_movfrh2gr_s = 131,
+    la_op_movgr2fcsr = 132,
+    la_op_movfcsr2gr = 133,
+    la_op_movfr2cf = 134,
+    la_op_movcf2fr = 135,
+    la_op_movgr2cf = 136,
+    la_op_movcf2gr = 137,
+    la_op_fcvt_s_d = 138,
+    la_op_fcvt_d_s = 139,
+    la_op_ftintrm_w_s = 140,
+    la_op_ftintrm_w_d = 141,
+    la_op_ftintrm_l_s = 142,
+    la_op_ftintrm_l_d = 143,
+    la_op_ftintrp_w_s = 144,
+    la_op_ftintrp_w_d = 145,
+    la_op_ftintrp_l_s = 146,
+    la_op_ftintrp_l_d = 147,
+    la_op_ftintrz_w_s = 148,
+    la_op_ftintrz_w_d = 149,
+    la_op_ftintrz_l_s = 150,
+    la_op_ftintrz_l_d = 151,
+    la_op_ftintrne_w_s = 152,
+    la_op_ftintrne_w_d = 153,
+    la_op_ftintrne_l_s = 154,
+    la_op_ftintrne_l_d = 155,
+    la_op_ftint_w_s = 156,
+    la_op_ftint_w_d = 157,
+    la_op_ftint_l_s = 158,
+    la_op_ftint_l_d = 159,
+    la_op_ffint_s_w = 160,
+    la_op_ffint_s_l = 161,
+    la_op_ffint_d_w = 162,
+    la_op_ffint_d_l = 163,
+    la_op_frint_s = 164,
+    la_op_frint_d = 165,
+    la_op_slti = 166,
+    la_op_sltui = 167,
+    la_op_addi_w = 168,
+    la_op_addi_d = 169,
+    la_op_lu52i_d = 170,
+    la_op_addi = 171,
+    la_op_ori = 172,
+    la_op_xori = 173,
+    la_op_rdtimel_w = 174,
+    la_op_rdtimeh_w = 175,
+    la_op_fmadd_s = 176,
+    la_op_fmadd_d = 177,
+    la_op_fmsub_s = 178,
+    la_op_fmsub_d = 179,
+    la_op_fnmadd_s = 180,
+    la_op_fnmadd_d = 181,
+    la_op_fnmsub_s = 182,
+    la_op_fnmsub_d = 183,
+    la_op_fcmp_cond_s = 184,
+    la_op_fcmp_cond_d = 185,
+    la_op_fsel = 186,
+    la_op_addu16i_d = 187,
+    la_op_lu12i_w = 188,
+    la_op_lu32i_d = 189,
+    la_op_pcaddi = 190,
+    la_op_pcalau12i = 191,
+    la_op_pcaddu12i = 192,
+    la_op_pcaddu18i = 193,
+    la_op_ll_w = 194,
+    la_op_sc_w = 195,
+    la_op_ll_d = 196,
+    la_op_sc_d = 197,
+    la_op_ldptr_w = 198,
+    la_op_stptr_w = 199,
+    la_op_ldptr_d = 200,
+    la_op_stptr_d = 201,
+    la_op_ld_b = 202,
+    la_op_ld_h = 203,
+    la_op_ld_w = 204,
+    la_op_ld_d = 205,
+    la_op_st_b = 206,
+    la_op_st_h = 207,
+    la_op_st_w = 208,
+    la_op_st_d = 209,
+    la_op_ld_bu = 210,
+    la_op_ld_hu = 211,
+    la_op_ld_wu = 212,
+    la_op_preld = 213,
+    la_op_fld_s = 214,
+    la_op_fst_s = 215,
+    la_op_fld_d = 216,
+    la_op_fst_d = 217,
+    la_op_ldx_b = 218,
+    la_op_ldx_h = 219,
+    la_op_ldx_w = 220,
+    la_op_ldx_d = 221,
+    la_op_stx_b = 222,
+    la_op_stx_h = 223,
+    la_op_stx_w = 224,
+    la_op_stx_d = 225,
+    la_op_ldx_bu = 226,
+    la_op_ldx_hu = 227,
+    la_op_ldx_wu = 228,
+    la_op_fldx_s = 229,
+    la_op_fldx_d = 230,
+    la_op_fstx_s = 231,
+    la_op_fstx_d = 232,
+    la_op_amswap_w = 233,
+    la_op_amswap_d = 234,
+    la_op_amadd_w = 235,
+    la_op_amadd_d = 236,
+    la_op_amand_w = 237,
+    la_op_amand_d = 238,
+    la_op_amor_w = 239,
+    la_op_amor_d = 240,
+    la_op_amxor_w = 241,
+    la_op_amxor_d = 242,
+    la_op_ammax_w = 243,
+    la_op_ammax_d = 244,
+    la_op_ammin_w = 245,
+    la_op_ammin_d = 246,
+    la_op_ammax_wu = 247,
+    la_op_ammax_du = 248,
+    la_op_ammin_wu = 249,
+    la_op_ammin_du = 250,
+    la_op_amswap_db_w = 251,
+    la_op_amswap_db_d = 252,
+    la_op_amadd_db_w = 253,
+    la_op_amadd_db_d = 254,
+    la_op_amand_db_w = 255,
+    la_op_amand_db_d = 256,
+    la_op_amor_db_w = 257,
+    la_op_amor_db_d = 258,
+    la_op_amxor_db_w = 259,
+    la_op_amxor_db_d = 260,
+    la_op_ammax_db_w = 261,
+    la_op_ammax_db_d = 262,
+    la_op_ammin_db_w = 263,
+    la_op_ammin_db_d = 264,
+    la_op_ammax_db_wu = 265,
+    la_op_ammax_db_du = 266,
+    la_op_ammin_db_wu = 267,
+    la_op_ammin_db_du = 268,
+    la_op_dbar = 269,
+    la_op_ibar = 270,
+    la_op_fldgt_s = 271,
+    la_op_fldgt_d = 272,
+    la_op_fldle_s = 273,
+    la_op_fldle_d = 274,
+    la_op_fstgt_s = 275,
+    la_op_fstgt_d = 276,
+    ls_op_fstle_s = 277,
+    la_op_fstle_d = 278,
+    la_op_ldgt_b = 279,
+    la_op_ldgt_h = 280,
+    la_op_ldgt_w = 281,
+    la_op_ldgt_d = 282,
+    la_op_ldle_b = 283,
+    la_op_ldle_h = 284,
+    la_op_ldle_w = 285,
+    la_op_ldle_d = 286,
+    la_op_stgt_b = 287,
+    la_op_stgt_h = 288,
+    la_op_stgt_w = 289,
+    la_op_stgt_d = 290,
+    la_op_stle_b = 291,
+    la_op_stle_h = 292,
+    la_op_stle_w = 293,
+    la_op_stle_d = 294,
+    la_op_beqz = 295,
+    la_op_bnez = 296,
+    la_op_bceqz = 297,
+    la_op_bcnez = 298,
+    la_op_jirl = 299,
+    la_op_b = 300,
+    la_op_bl = 301,
+    la_op_beq = 302,
+    la_op_bne = 303,
+    la_op_blt = 304,
+    la_op_bge = 305,
+    la_op_bltu = 306,
+    la_op_bgeu = 307,
+
+} la_op;
+
+typedef enum {
+    la_codec_illegal,
+    la_codec_empty,
+    la_codec_2r,
+    la_codec_2r_u5,
+    la_codec_2r_u6,
+    la_codec_2r_2bw,
+    la_codec_2r_2bd,
+    la_codec_3r,
+    la_codec_3r_rd0,
+    la_codec_3r_sa2,
+    la_codec_3r_sa3,
+    la_codec_4r,
+    la_codec_r_im20,
+    la_codec_2r_im16,
+    la_codec_2r_im14,
+    la_codec_r_im14,
+    la_codec_2r_im12,
+    la_codec_im5_r_im12,
+    la_codec_2r_im8,
+    la_codec_r_sd,
+    la_codec_r_sj,
+    la_codec_r_cd,
+    la_codec_r_cj,
+    la_codec_r_seq,
+    la_codec_code,
+    la_codec_whint,
+    la_codec_invtlb,
+    la_codec_r_ofs21,
+    la_codec_cj_ofs21,
+    la_codec_ofs26,
+    la_codec_cond,
+    la_codec_sel,
+
+} la_codec;
+
+#define la_fmt_illegal         "nte"
+#define la_fmt_empty           "nt"
+#define la_fmt_sd_rj           "ntA,1"
+#define la_fmt_rd_sj           "nt0,B"
+#define la_fmt_rd_rj           "nt0,1"
+#define la_fmt_rj_rk           "nt1,2"
+#define la_fmt_rj_seq          "nt1,x"
+#define la_fmt_rd_si20         "nt0,i(x)"
+#define la_fmt_rd_rj_ui5       "nt0,1,C"
+#define la_fmt_rd_rj_ui6       "nt0,1,C"
+#define la_fmt_rd_rj_level     "nt0,1,x"
+#define la_fmt_rd_rj_msbw_lsbw "nt0,1,C,D"
+#define la_fmt_rd_rj_msbd_lsbd "nt0,1,C,D"
+#define la_fmt_rd_rj_si12      "nt0,1,i(x)"
+#define la_fmt_hint_rj_si12    "ntE,1,i(x)"
+#define la_fmt_rd_rj_csr       "nt0,1,x"
+#define la_fmt_rd_csr          "nt0,x"
+#define la_fmt_rd_rj_si14      "nt0,1,i(x)"
+#define la_fmt_rd_rj_si16      "nt0,1,i(x)"
+#define la_fmt_rd_rj_rk        "nt0,1,2"
+#define la_fmt_fd_rj_rk        "nt3,1,2"
+#define la_fmt_rd_rj_rk_sa2    "nt0,1,2,D"
+#define la_fmt_rd_rj_rk_sa3    "nt0,1,2,D"
+#define la_fmt_fd_rj           "nt3,1"
+#define la_fmt_rd_fj           "nt0,4"
+#define la_fmt_fd_fj           "nt3,4"
+#define la_fmt_fd_fj_si12      "nt3,4,i(x)"
+#define la_fmt_fcsrd_rj        "ntF,1"
+#define la_fmt_rd_fcsrs        "nt0,G"
+#define la_fmt_cd_fj           "ntH,4"
+#define la_fmt_fd_cj           "nt3,I"
+#define la_fmt_fd_fj_fk        "nt3,4,5"
+#define la_fmt_code            "ntJ"
+#define la_fmt_whint           "ntx"
+#define la_fmt_invtlb          "ntx,1,2"
+#define la_fmt_offs26          "nto(X)p"
+#define la_fmt_rj_offs21       "nt1,o(X)p"
+#define la_fmt_cj_offs21       "ntQ,o(X)p"
+#define la_fmt_rd_rj_offs16    "nt0,1,o(X)"
+#define la_fmt_rj_rd_offs16    "nt1,0,o(X)p"
+#define la_fmt_s_cd_fj_fk      "K.stH,4,5"
+#define la_fmt_d_cd_fj_fk      "K.dtH,4,5"
+#define la_fmt_fd_fj_fk_fa     "nt3,4,5,6"
+#define la_fmt_fd_fj_fk_ca     "nt3,4,5,L"
+#define la_fmt_cop_rj_si12     "ntM,1,i(x)"
+
+/* structures */
+typedef struct {
+    uint32_t pc;
+    uint32_t insn;
+    int32_t imm;
+    int32_t imm2;
+    uint16_t op;
+    uint16_t code;
+    uint8_t codec;
+    uint8_t r1;
+    uint8_t r2;
+    uint8_t r3;
+    uint8_t r4;
+    uint8_t bit;
+} la_decode;
+
+typedef struct {
+    const char * const name;
+    const la_codec codec;
+    const char * const format;
+} la_opcode_data;
+
+/* reg names */
+const char * const loongarch_r_normal_name[32] = {
+  "$r0", "$r1", "$r2", "$r3", "$r4", "$r5", "$r6", "$r7",
+  "$r8", "$r9", "$r10", "$r11", "$r12", "$r13", "$r14", "$r15",
+  "$r16", "$r17", "$r18", "$r19", "$r20", "$r21", "$r22", "$r23",
+  "$r24", "$r25", "$r26", "$r27", "$r28", "$r29", "$r30", "$r31",
+};
+
+const char * const loongarch_f_normal_name[32] = {
+  "$f0", "$f1", "$f2", "$f3", "$f4", "$f5", "$f6", "$f7",
+  "$f8", "$f9", "$f10", "$f11", "$f12", "$f13", "$f14", "$f15",
+  "$f16", "$f17", "$f18", "$f19", "$f20", "$f21", "$f22", "$f23",
+  "$f24", "$f25", "$f26", "$f27", "$f28", "$f29", "$f30", "$f31",
+};
+
+const char * const loongarch_cr_normal_name[4] = {
+  "$scr0", "$scr1", "$scr2", "$scr3",
+};
+
+const char * const loongarch_c_normal_name[8] = {
+  "$fcc0", "$fcc1", "$fcc2", "$fcc3", "$fcc4", "$fcc5", "$fcc6", "$fcc7",
+};
+
+/* instruction data */
+const la_opcode_data opcode_data[] = {
+    { "illegal", la_codec_illegal, la_fmt_illegal },
+    { "clo.w", la_codec_2r, la_fmt_rd_rj },
+    { "clz.w", la_codec_2r, la_fmt_rd_rj },
+    { "cto.w", la_codec_2r, la_fmt_rd_rj },
+    { "ctz.w", la_codec_2r, la_fmt_rd_rj },
+    { "clo.d", la_codec_2r, la_fmt_rd_rj },
+    { "clz.d", la_codec_2r, la_fmt_rd_rj },
+    { "cto.d", la_codec_2r, la_fmt_rd_rj },
+    { "ctz.d", la_codec_2r, la_fmt_rd_rj },
+    { "revb.2h", la_codec_2r, la_fmt_rd_rj },
+    { "revb.4h", la_codec_2r, la_fmt_rd_rj },
+    { "revb.2w", la_codec_2r, la_fmt_rd_rj },
+    { "revb.d", la_codec_2r, la_fmt_rd_rj },
+    { "revh.2w", la_codec_2r, la_fmt_rd_rj },
+    { "revh.d", la_codec_2r, la_fmt_rd_rj },
+    { "bitrev.4b", la_codec_2r, la_fmt_rd_rj },
+    { "bitrev.8b", la_codec_2r, la_fmt_rd_rj },
+    { "bitrev.w", la_codec_2r, la_fmt_rd_rj },
+    { "bitrev.d", la_codec_2r, la_fmt_rd_rj },
+    { "ext.w.h", la_codec_2r, la_fmt_rd_rj },
+    { "ext.w.b", la_codec_2r, la_fmt_rd_rj },
+    { "rdtime.d", la_codec_2r, la_fmt_rd_rj },
+    { "cpucfg", la_codec_2r, la_fmt_rd_rj },
+    { "asrtle.d", la_codec_3r_rd0, la_fmt_rj_rk },
+    { "asrtgt.d", la_codec_3r_rd0, la_fmt_rj_rk },
+    { "alsl.w", la_codec_3r_sa2, la_fmt_rd_rj_rk_sa2 },
+    { "alsl.wu", la_codec_3r_sa2, la_fmt_rd_rj_rk_sa2 },
+    { "bytepick.w", la_codec_3r_sa2, la_fmt_rd_rj_rk_sa2 },
+    { "bytepick.d", la_codec_3r_sa3, la_fmt_rd_rj_rk_sa3 },
+    { "add.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "add.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sub.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sub.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "slt", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sltu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "maskeqz", la_codec_3r, la_fmt_rd_rj_rk },
+    { "masknez", la_codec_3r, la_fmt_rd_rj_rk },
+    { "nor", la_codec_3r, la_fmt_rd_rj_rk },
+    { "and", la_codec_3r, la_fmt_rd_rj_rk },
+    { "or", la_codec_3r, la_fmt_rd_rj_rk },
+    { "xor", la_codec_3r, la_fmt_rd_rj_rk },
+    { "orn", la_codec_3r, la_fmt_rd_rj_rk },
+    { "andn", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sll.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "srl.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sra.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sll.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "srl.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "sra.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "rotr.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "rotr.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mul.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulh.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulh.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mul.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulh.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulh.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulw.d.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mulw.d.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "div.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mod.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "div.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mod.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "div.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mod.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "div.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "mod.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crc.w.b.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crc.w.h.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crc.w.w.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crc.w.d.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crcc.w.b.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crcc.w.h.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crcc.w.w.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "crcc.w.d.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "break", la_codec_code, la_fmt_code },
+    { "syscall", la_codec_code, la_fmt_code },
+    { "alsl.d", la_codec_3r_sa2, la_fmt_rd_rj_rk_sa2 },
+    { "slli.w", la_codec_2r_u5, la_fmt_rd_rj_ui5 },
+    { "slli.d", la_codec_2r_u6, la_fmt_rd_rj_ui6 },
+    { "srli.w", la_codec_2r_u5, la_fmt_rd_rj_ui5 },
+    { "srli.d", la_codec_2r_u6, la_fmt_rd_rj_ui6 },
+    { "srai.w", la_codec_2r_u5, la_fmt_rd_rj_ui5 },
+    { "srai.d", la_codec_2r_u6, la_fmt_rd_rj_ui6 },
+    { "rotri.w", la_codec_2r_u5, la_fmt_rd_rj_ui5 },
+    { "rotri.d", la_codec_2r_u6, la_fmt_rd_rj_ui6 },
+    { "bstrins.w", la_codec_2r_2bw, la_fmt_rd_rj_msbw_lsbw },
+    { "bstrpick.w", la_codec_2r_2bw, la_fmt_rd_rj_msbw_lsbw },
+    { "bstrins.d", la_codec_2r_2bd, la_fmt_rd_rj_msbd_lsbd },
+    { "bstrpick.d", la_codec_2r_2bd, la_fmt_rd_rj_msbd_lsbd },
+    { "fadd.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fadd.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fsub.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fsub.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmul.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmul.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fdiv.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fdiv.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmax.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmax.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmin.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmin.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmaxa.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmaxa.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmina.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fmina.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fscaleb.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fscaleb.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fcopysign.s", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fcopysign.d", la_codec_3r, la_fmt_fd_fj_fk },
+    { "fabs.s", la_codec_2r, la_fmt_fd_fj },
+    { "fabs.d", la_codec_2r, la_fmt_fd_fj },
+    { "fneg.s", la_codec_2r, la_fmt_fd_fj },
+    { "fneg.d", la_codec_2r, la_fmt_fd_fj },
+    { "flogb.s", la_codec_2r, la_fmt_fd_fj },
+    { "flogb.d", la_codec_2r, la_fmt_fd_fj },
+    { "fclass.s", la_codec_2r, la_fmt_fd_fj },
+    { "fclass.d", la_codec_2r, la_fmt_fd_fj },
+    { "fsqrt.s", la_codec_2r, la_fmt_fd_fj },
+    { "fsqrt.d", la_codec_2r, la_fmt_fd_fj },
+    { "frecip.s", la_codec_2r, la_fmt_fd_fj },
+    { "frecip.d", la_codec_2r, la_fmt_fd_fj },
+    { "frsqrt.s", la_codec_2r, la_fmt_fd_fj },
+    { "frsqrt.d", la_codec_2r, la_fmt_fd_fj },
+    { "fmov.s", la_codec_2r, la_fmt_fd_fj },
+    { "fmov.d", la_codec_2r, la_fmt_fd_fj },
+    { "movgr2fr.w", la_codec_2r, la_fmt_fd_rj },
+    { "movgr2fr.d", la_codec_2r, la_fmt_fd_rj },
+    { "movgr2frh.w", la_codec_2r, la_fmt_fd_rj },
+    { "movfr2gr.s", la_codec_2r, la_fmt_rd_fj },
+    { "movfr2gr.d", la_codec_2r, la_fmt_rd_fj },
+    { "movfrh2gr.s", la_codec_2r, la_fmt_rd_fj },
+    { "movgr2fcsr", la_codec_2r, la_fmt_fcsrd_rj },
+    { "movfcsr2gr", la_codec_2r, la_fmt_rd_fcsrs },
+    { "movfr2cf", la_codec_r_cd, la_fmt_cd_fj },
+    { "movcf2fr", la_codec_r_cj, la_fmt_fd_cj },
+    { "movgr2cf", la_codec_r_cd, la_fmt_cd_fj },
+    { "movcf2gr", la_codec_r_cj, la_fmt_fd_cj },
+    { "fcvt.s.d", la_codec_2r, la_fmt_fd_fj },
+    { "fcvt.d.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrm.w.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrm.w.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrm.l.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrm.l.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrp.w.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrp.w.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrp.l.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrp.l.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrz.w.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrz.w.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrz.l.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrz.l.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrne.w.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrne.w.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrne.l.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftintrne.l.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftint.w.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftint.w.d", la_codec_2r, la_fmt_fd_fj },
+    { "ftint.l.s", la_codec_2r, la_fmt_fd_fj },
+    { "ftint.l.d", la_codec_2r, la_fmt_fd_fj },
+    { "ffint.s.w", la_codec_2r, la_fmt_fd_fj },
+    { "ffint.s.l", la_codec_2r, la_fmt_fd_fj },
+    { "ffint.d.w", la_codec_2r, la_fmt_fd_fj },
+    { "ffint.d.l", la_codec_2r, la_fmt_fd_fj },
+    { "frint.s", la_codec_2r, la_fmt_fd_fj },
+    { "frint.d", la_codec_2r, la_fmt_fd_fj },
+    { "slti", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "sltui", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "addi.w", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "addi.d", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "lu52i.d", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "andi", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ori", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "xori", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "rdtimel.w", la_codec_2r, la_fmt_rd_rj },
+    { "rdtimeh.w", la_codec_2r, la_fmt_rd_rj },
+    { "fmadd.s", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fmadd.d", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fmsub.s", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fmsub.d", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fnmadd.s", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fnmadd.d", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fnmsub.s", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fnmsub.d", la_codec_4r, la_fmt_fd_fj_fk_fa },
+    { "fcmp.cond.s", la_codec_cond, la_fmt_s_cd_fj_fk },
+    { "fcmp.cond.d", la_codec_cond, la_fmt_d_cd_fj_fk },
+    { "fsel", la_codec_sel, la_fmt_fd_fj_fk_ca },
+    { "addu16i.d", la_codec_2r_im16, la_fmt_rd_rj_si16 },
+    { "lu12i.w", la_codec_r_im20, la_fmt_rd_si20 },
+    { "lu32i.d", la_codec_r_im20, la_fmt_rd_si20 },
+    { "pcaddi", la_codec_r_im20, la_fmt_rd_si20 },
+    { "pcalau12i", la_codec_r_im20, la_fmt_rd_si20 },
+    { "pcaddu12i", la_codec_r_im20, la_fmt_rd_si20 },
+    { "pcaddu18i", la_codec_r_im20, la_fmt_rd_si20 },
+    { "ll.w", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "sc.w", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "ll.d", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "sc.d", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "ldptr.w", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "stptr.w", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "ldptr.d", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "stptr.d", la_codec_2r_im14, la_fmt_rd_rj_si14 },
+    { "ld.b", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.h", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.w", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.d", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "st.b", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "st.h", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "st.w", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "st.d", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.bu", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.hu", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "ld.wu", la_codec_2r_im12, la_fmt_rd_rj_si12 },
+    { "preld", la_codec_2r_im12, la_fmt_hint_rj_si12 },
+    { "fld.s", la_codec_2r_im12, la_fmt_fd_fj_si12 },
+    { "fst.s", la_codec_2r_im12, la_fmt_fd_fj_si12 },
+    { "fld.d", la_codec_2r_im12, la_fmt_fd_fj_si12 },
+    { "fst.d", la_codec_2r_im12, la_fmt_fd_fj_si12 },
+    { "ldx.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stx.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stx.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stx.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stx.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.bu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.hu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldx.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "fldx.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fldx.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstx.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstx.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "amswap.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amswap.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amadd.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amadd.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amand.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amand.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amor.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amor.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amxor.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amxor.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amswap.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amswap.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amadd.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amadd.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amand.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amand.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amor.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amor.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amxor.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "amxor.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.db.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.db.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.db.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammax.db.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.db.wu", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ammin.db.du", la_codec_3r, la_fmt_rd_rj_rk },
+    { "dbar", la_codec_whint, la_fmt_whint },
+    { "ibar", la_codec_whint, la_fmt_whint },
+    { "fldgt.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fldgt.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fldle.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fldle.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstgt.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstgt.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstle.s", la_codec_3r, la_fmt_fd_rj_rk },
+    { "fstle.d", la_codec_3r, la_fmt_fd_rj_rk },
+    { "ldgt.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldgt.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldgt.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldgt.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldle.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldle.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldle.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "ldle.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stgt.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stgt.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stgt.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stgt.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stle.b", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stle.h", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stle.w", la_codec_3r, la_fmt_rd_rj_rk },
+    { "stle.d", la_codec_3r, la_fmt_rd_rj_rk },
+    { "beqz", la_codec_r_ofs21, la_fmt_rj_offs21 },
+    { "bnez", la_codec_r_ofs21, la_fmt_rj_offs21 },
+    { "bceqz", la_codec_cj_ofs21, la_fmt_cj_offs21 },
+    { "bcnez", la_codec_cj_ofs21, la_fmt_cj_offs21 },
+    { "jirl", la_codec_2r_im16, la_fmt_rd_rj_offs16 },
+    { "b", la_codec_ofs26, la_fmt_offs26 },
+    { "bl", la_codec_ofs26, la_fmt_offs26 },
+    { "beq", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+    { "bne", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+    { "blt", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+    { "bge", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+    { "bltu", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+    { "bgeu", la_codec_2r_im16, la_fmt_rj_rd_offs16 },
+
+};
+
+
+/* Decode the opcode: match bits [31:26] first, then narrower sub-fields. */
+static void decode_insn_opcode(la_decode *dec)
+{
+    uint32_t insn = dec->insn;
+    uint16_t op = la_op_illegal;
+    switch ((insn >> 26) & 0x3f) {
+    case 0x0:
+        switch ((insn >> 22) & 0xf) {
+        case 0x0:
+            switch ((insn >> 18) & 0xf) {
+            case 0x0:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    switch ((insn >> 10) & 0x1f) {
+                    case 0x4:
+                        op = la_op_clo_w;
+                        break;
+                    case 0x5:
+                        op = la_op_clz_w;
+                        break;
+                    case 0x6:
+                        op = la_op_cto_w;
+                        break;
+                    case 0x7:
+                        op = la_op_ctz_w;
+                        break;
+                    case 0x8:
+                        op = la_op_clo_d;
+                        break;
+                    case 0x9:
+                        op = la_op_clz_d;
+                        break;
+                    case 0xa:
+                        op = la_op_cto_d;
+                        break;
+                    case 0xb:
+                        op = la_op_ctz_d;
+                        break;
+                    case 0xc:
+                        op = la_op_revb_2h;
+                        break;
+                    case 0xd:
+                        op = la_op_revb_4h;
+                        break;
+                    case 0xe:
+                        op = la_op_revb_2w;
+                        break;
+                    case 0xf:
+                        op = la_op_revb_d;
+                        break;
+                    case 0x10:
+                        op = la_op_revh_2w;
+                        break;
+                    case 0x11:
+                        op = la_op_revh_d;
+                        break;
+                    case 0x12:
+                        op = la_op_bitrev_4b;
+                        break;
+                    case 0x13:
+                        op = la_op_bitrev_8b;
+                        break;
+                    case 0x14:
+                        op = la_op_bitrev_w;
+                        break;
+                    case 0x15:
+                        op = la_op_bitrev_d;
+                        break;
+                    case 0x16:
+                        op = la_op_ext_w_h;
+                        break;
+                    case 0x17:
+                        op = la_op_ext_w_b;
+                        break;
+                    case 0x18:
+                        op = la_op_rdtimel_w;
+                        break;
+                    case 0x19:
+                        op = la_op_rdtimeh_w;
+                        break;
+                    case 0x1a:
+                        op = la_op_rdtime_d;
+                        break;
+                    case 0x1b:
+                        op = la_op_cpucfg;
+                        break;
+                    }
+                    break;
+                case 0x2:
+                    switch (insn & 0x0000001f) {
+                    case 0x00000000:
+                        op = la_op_asrtle_d;
+                        break;
+                    }
+                    break;
+                case 0x3:
+                    switch (insn & 0x0000001f) {
+                    case 0x00000000:
+                        op = la_op_asrtgt_d;
+                        break;
+                    }
+                    break;
+                }
+                break;
+            case 0x1:
+                switch ((insn >> 17) & 0x1) {
+                case 0x0:
+                    op = la_op_alsl_w;
+                    break;
+                case 0x1:
+                    op = la_op_alsl_wu;
+                    break;
+                }
+                break;
+            case 0x2:
+                switch ((insn >> 17) & 0x1) {
+                case 0x0:
+                    op = la_op_bytepick_w;
+                    break;
+                }
+                break;
+            case 0x3:
+                op = la_op_bytepick_d;
+                break;
+            case 0x4:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_add_w;
+                    break;
+                case 0x1:
+                    op = la_op_add_d;
+                    break;
+                case 0x2:
+                    op = la_op_sub_w;
+                    break;
+                case 0x3:
+                    op = la_op_sub_d;
+                    break;
+                case 0x4:
+                    op = la_op_slt;
+                    break;
+                case 0x5:
+                    op = la_op_sltu;
+                    break;
+                case 0x6:
+                    op = la_op_maskeqz;
+                    break;
+                case 0x7:
+                    op = la_op_masknez;
+                    break;
+                }
+                break;
+            case 0x5:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_nor;
+                    break;
+                case 0x1:
+                    op = la_op_and;
+                    break;
+                case 0x2:
+                    op = la_op_or;
+                    break;
+                case 0x3:
+                    op = la_op_xor;
+                    break;
+                case 0x4:
+                    op = la_op_orn;
+                    break;
+                case 0x5:
+                    op = la_op_andn;
+                    break;
+                case 0x6:
+                    op = la_op_sll_w;
+                    break;
+                case 0x7:
+                    op = la_op_srl_w;
+                    break;
+                }
+                break;
+            case 0x6:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_sra_w;
+                    break;
+                case 0x1:
+                    op = la_op_sll_d;
+                    break;
+                case 0x2:
+                    op = la_op_srl_d;
+                    break;
+                case 0x3:
+                    op = la_op_sra_d;
+                    break;
+                case 0x6:
+                    op = la_op_rotr_w;
+                    break;
+                case 0x7:
+                    op = la_op_rotr_d;
+                    break;
+                }
+                break;
+            case 0x7:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_mul_w;
+                    break;
+                case 0x1:
+                    op = la_op_mulh_w;
+                    break;
+                case 0x2:
+                    op = la_op_mulh_wu;
+                    break;
+                case 0x3:
+                    op = la_op_mul_d;
+                    break;
+                case 0x4:
+                    op = la_op_mulh_d;
+                    break;
+                case 0x5:
+                    op = la_op_mulh_du;
+                    break;
+                case 0x6:
+                    op = la_op_mulw_d_w;
+                    break;
+                case 0x7:
+                    op = la_op_mulw_d_wu;
+                    break;
+                }
+                break;
+            case 0x8:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_div_w;
+                    break;
+                case 0x1:
+                    op = la_op_mod_w;
+                    break;
+                case 0x2:
+                    op = la_op_div_wu;
+                    break;
+                case 0x3:
+                    op = la_op_mod_wu;
+                    break;
+                case 0x4:
+                    op = la_op_div_d;
+                    break;
+                case 0x5:
+                    op = la_op_mod_d;
+                    break;
+                case 0x6:
+                    op = la_op_div_du;
+                    break;
+                case 0x7:
+                    op = la_op_mod_du;
+                    break;
+                }
+                break;
+            case 0x9:
+                switch ((insn >> 15) & 0x7) {
+                case 0x0:
+                    op = la_op_crc_w_b_w;
+                    break;
+                case 0x1:
+                    op = la_op_crc_w_h_w;
+                    break;
+                case 0x2:
+                    op = la_op_crc_w_w_w;
+                    break;
+                case 0x3:
+                    op = la_op_crc_w_d_w;
+                    break;
+                case 0x4:
+                    op = la_op_crcc_w_b_w;
+                    break;
+                case 0x5:
+                    op = la_op_crcc_w_h_w;
+                    break;
+                case 0x6:
+                    op = la_op_crcc_w_w_w;
+                    break;
+                case 0x7:
+                    op = la_op_crcc_w_d_w;
+                    break;
+                }
+                break;
+            case 0xa:
+                switch ((insn >> 15) & 0x7) {
+                case 0x4:
+                    op = la_op_break;
+                    break;
+                case 0x6:
+                    op = la_op_syscall;
+                    break;
+                }
+                break;
+            case 0xb:
+                switch ((insn >> 17) & 0x1) {
+                case 0x0:
+                    op = la_op_alsl_d;
+                    break;
+                }
+                break;
+            }
+            break;
+        case 0x1:
+            switch ((insn >> 21) & 0x1) {
+            case 0x0:
+                switch ((insn >> 16) & 0x1f) {
+                case 0x0:
+                    switch ((insn >> 15) & 0x1) {
+                    case 0x1:
+                        op = la_op_slli_w;
+                        break;
+                    }
+                    break;
+                case 0x1:
+                    op = la_op_slli_d;
+                    break;
+                case 0x4:
+                    switch ((insn >> 15) & 0x1) {
+                    case 0x1:
+                        op = la_op_srli_w;
+                        break;
+                    }
+                    break;
+                case 0x5:
+                    op = la_op_srli_d;
+                    break;
+                case 0x8:
+                    switch ((insn >> 15) & 0x1) {
+                    case 0x1:
+                        op = la_op_srai_w;
+                        break;
+                    }
+                    break;
+                case 0x9:
+                    op = la_op_srai_d;
+                    break;
+                case 0xc:
+                    switch ((insn >> 15) & 0x1) {
+                    case 0x1:
+                        op = la_op_rotri_w;
+                        break;
+                    }
+                    break;
+                case 0xd:
+                    op = la_op_rotri_d;
+                    break;
+                }
+                break;
+            case 0x1:
+                switch ((insn >> 15) & 0x1) {
+                case 0x0:
+                    op = la_op_bstrins_w;
+                    break;
+                case 0x1:
+                    op = la_op_bstrpick_w;
+                    break;
+                }
+                break;
+            }
+            break;
+        case 0x2:
+            op = la_op_bstrins_d;
+            break;
+        case 0x3:
+            op = la_op_bstrpick_d;
+            break;
+        case 0x4:
+            switch ((insn >> 15) & 0x7f) {
+            case 0x1:
+                op = la_op_fadd_s;
+                break;
+            case 0x2:
+                op = la_op_fadd_d;
+                break;
+            case 0x5:
+                op = la_op_fsub_s;
+                break;
+            case 0x6:
+                op = la_op_fsub_d;
+                break;
+            case 0x9:
+                op = la_op_fmul_s;
+                break;
+            case 0xa:
+                op = la_op_fmul_d;
+                break;
+            case 0xd:
+                op = la_op_fdiv_s;
+                break;
+            case 0xe:
+                op = la_op_fdiv_d;
+                break;
+            case 0x11:
+                op = la_op_fmax_s;
+                break;
+            case 0x12:
+                op = la_op_fmax_d;
+                break;
+            case 0x15:
+                op = la_op_fmin_s;
+                break;
+            case 0x16:
+                op = la_op_fmin_d;
+                break;
+            case 0x19:
+                op = la_op_fmaxa_s;
+                break;
+            case 0x1a:
+                op = la_op_fmaxa_d;
+                break;
+            case 0x1d:
+                op = la_op_fmina_s;
+                break;
+            case 0x1e:
+                op = la_op_fmina_d;
+                break;
+            case 0x21:
+                op = la_op_fscaleb_s;
+                break;
+            case 0x22:
+                op = la_op_fscaleb_d;
+                break;
+            case 0x25:
+                op = la_op_fcopysign_s;
+                break;
+            case 0x26:
+                op = la_op_fcopysign_d;
+                break;
+            case 0x28:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x1:
+                    op = la_op_fabs_s;
+                    break;
+                case 0x2:
+                    op = la_op_fabs_d;
+                    break;
+                case 0x5:
+                    op = la_op_fneg_s;
+                    break;
+                case 0x6:
+                    op = la_op_fneg_d;
+                    break;
+                case 0x9:
+                    op = la_op_flogb_s;
+                    break;
+                case 0xa:
+                    op = la_op_flogb_d;
+                    break;
+                case 0xd:
+                    op = la_op_fclass_s;
+                    break;
+                case 0xe:
+                    op = la_op_fclass_d;
+                    break;
+                case 0x11:
+                    op = la_op_fsqrt_s;
+                    break;
+                case 0x12:
+                    op = la_op_fsqrt_d;
+                    break;
+                case 0x15:
+                    op = la_op_frecip_s;
+                    break;
+                case 0x16:
+                    op = la_op_frecip_d;
+                    break;
+                case 0x19:
+                    op = la_op_frsqrt_s;
+                    break;
+                case 0x1a:
+                    op = la_op_frsqrt_d;
+                    break;
+                }
+                break;
+            case 0x29:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x5:
+                    op = la_op_fmov_s;
+                    break;
+                case 0x6:
+                    op = la_op_fmov_d;
+                    break;
+                case 0x9:
+                    op = la_op_movgr2fr_w;
+                    break;
+                case 0xa:
+                    op = la_op_movgr2fr_d;
+                    break;
+                case 0xb:
+                    op = la_op_movgr2frh_w;
+                    break;
+                case 0xd:
+                    op = la_op_movfr2gr_s;
+                    break;
+                case 0xe:
+                    op = la_op_movfr2gr_d;
+                    break;
+                case 0xf:
+                    op = la_op_movfrh2gr_s;
+                    break;
+                case 0x10:
+                    op = la_op_movgr2fcsr;
+                    break;
+                case 0x12:
+                    op = la_op_movfcsr2gr;
+                    break;
+                case 0x14:
+                    switch ((insn >> 3) & 0x3) {
+                    case 0x0:
+                        op = la_op_movfr2cf;
+                        break;
+                    }
+                    break;
+                case 0x15:
+                    switch ((insn >> 8) & 0x3) {
+                    case 0x0:
+                        op = la_op_movcf2fr;
+                        break;
+                    }
+                    break;
+                case 0x16:
+                    switch ((insn >> 3) & 0x3) {
+                    case 0x0:
+                        op = la_op_movgr2cf;
+                        break;
+                    }
+                    break;
+                case 0x17:
+                    switch ((insn >> 8) & 0x3) {
+                    case 0x0:
+                        op = la_op_movcf2gr;
+                        break;
+                    }
+                    break;
+                }
+                break;
+            case 0x32:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x6:
+                    op = la_op_fcvt_s_d;
+                    break;
+                case 0x9:
+                    op = la_op_fcvt_d_s;
+                    break;
+                }
+                break;
+            case 0x34:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x1:
+                    op = la_op_ftintrm_w_s;
+                    break;
+                case 0x2:
+                    op = la_op_ftintrm_w_d;
+                    break;
+                case 0x9:
+                    op = la_op_ftintrm_l_s;
+                    break;
+                case 0xa:
+                    op = la_op_ftintrm_l_d;
+                    break;
+                case 0x11:
+                    op = la_op_ftintrp_w_s;
+                    break;
+                case 0x12:
+                    op = la_op_ftintrp_w_d;
+                    break;
+                case 0x19:
+                    op = la_op_ftintrp_l_s;
+                    break;
+                case 0x1a:
+                    op = la_op_ftintrp_l_d;
+                    break;
+                }
+                break;
+            case 0x35:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x1:
+                    op = la_op_ftintrz_w_s;
+                    break;
+                case 0x2:
+                    op = la_op_ftintrz_w_d;
+                    break;
+                case 0x9:
+                    op = la_op_ftintrz_l_s;
+                    break;
+                case 0xa:
+                    op = la_op_ftintrz_l_d;
+                    break;
+                case 0x11:
+                    op = la_op_ftintrne_w_s;
+                    break;
+                case 0x12:
+                    op = la_op_ftintrne_w_d;
+                    break;
+                case 0x19:
+                    op = la_op_ftintrne_l_s;
+                    break;
+                case 0x1a:
+                    op = la_op_ftintrne_l_d;
+                    break;
+                }
+                break;
+            case 0x36:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x1:
+                    op = la_op_ftint_w_s;
+                    break;
+                case 0x2:
+                    op = la_op_ftint_w_d;
+                    break;
+                case 0x9:
+                    op = la_op_ftint_l_s;
+                    break;
+                case 0xa:
+                    op = la_op_ftint_l_d;
+                    break;
+                }
+                break;
+            case 0x3a:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x4:
+                    op = la_op_ffint_s_w;
+                    break;
+                case 0x6:
+                    op = la_op_ffint_s_l;
+                    break;
+                case 0x8:
+                    op = la_op_ffint_d_w;
+                    break;
+                case 0xa:
+                    op = la_op_ffint_d_l;
+                    break;
+                }
+                break;
+            case 0x3c:
+                switch ((insn >> 10) & 0x1f) {
+                case 0x11:
+                    op = la_op_frint_s;
+                    break;
+                case 0x12:
+                    op = la_op_frint_d;
+                    break;
+                }
+                break;
+            }
+            break;
+        case 0x8:
+            op = la_op_slti;
+            break;
+        case 0x9:
+            op = la_op_sltui;
+            break;
+        case 0xa:
+            op = la_op_addi_w;
+            break;
+        case 0xb:
+            op = la_op_addi_d;
+            break;
+        case 0xc:
+            op = la_op_lu52i_d;
+            break;
+        case 0xd:
+            op = la_op_addi;
+            break;
+        case 0xe:
+            op = la_op_ori;
+            break;
+        case 0xf:
+            op = la_op_xori;
+            break;
+        }
+        break;
+    case 0x2:
+        switch ((insn >> 20) & 0x3f) {
+        case 0x1:
+            op = la_op_fmadd_s;
+            break;
+        case 0x2:
+            op = la_op_fmadd_d;
+            break;
+        case 0x5:
+            op = la_op_fmsub_s;
+            break;
+        case 0x6:
+            op = la_op_fmsub_d;
+            break;
+        case 0x9:
+            op = la_op_fnmadd_s;
+            break;
+        case 0xa:
+            op = la_op_fnmadd_d;
+            break;
+        case 0xd:
+            op = la_op_fnmsub_s;
+            break;
+        case 0xe:
+            op = la_op_fnmsub_d;
+            break;
+        }
+        break;
+    case 0x3:
+        switch ((insn >> 20) & 0x3f) {
+        case 0x1:
+            switch ((insn >> 3) & 0x3) {
+            case 0x0:
+                op = la_op_fcmp_cond_s;
+                break;
+            }
+            break;
+        case 0x2:
+            switch ((insn >> 3) & 0x3) {
+            case 0x0:
+                op = la_op_fcmp_cond_d;
+                break;
+            }
+            break;
+        case 0x10:
+            switch ((insn >> 18) & 0x3) {
+            case 0x0:
+                op = la_op_fsel;
+                break;
+            }
+            break;
+        }
+        break;
+    case 0x4:
+        op = la_op_addu16i_d;
+        break;
+    case 0x5:
+        switch ((insn >> 25) & 0x1) {
+        case 0x0:
+            op = la_op_lu12i_w;
+            break;
+        case 0x1:
+            op = la_op_lu32i_d;
+            break;
+        }
+        break;
+    case 0x6:
+        switch ((insn >> 25) & 0x1) {
+        case 0x0:
+            op = la_op_pcaddi;
+            break;
+        case 0x1:
+            op = la_op_pcalau12i;
+            break;
+        }
+        break;
+    case 0x7:
+        switch ((insn >> 25) & 0x1) {
+        case 0x0:
+            op = la_op_pcaddu12i;
+            break;
+        case 0x1:
+            op = la_op_pcaddu18i;
+            break;
+        }
+        break;
+    case 0x8:
+        switch ((insn >> 24) & 0x3) {
+        case 0x0:
+            op = la_op_ll_w;
+            break;
+        case 0x1:
+            op = la_op_sc_w;
+            break;
+        case 0x2:
+            op = la_op_ll_d;
+            break;
+        case 0x3:
+            op = la_op_sc_d;
+            break;
+        }
+        break;
+    case 0x9:
+        switch ((insn >> 24) & 0x3) {
+        case 0x0:
+            op = la_op_ldptr_w;
+            break;
+        case 0x1:
+            op = la_op_stptr_w;
+            break;
+        case 0x2:
+            op = la_op_ldptr_d;
+            break;
+        case 0x3:
+            op = la_op_stptr_d;
+            break;
+        }
+        break;
+    case 0xa:
+        switch ((insn >> 22) & 0xf) {
+        case 0x0:
+            op = la_op_ld_b;
+            break;
+        case 0x1:
+            op = la_op_ld_h;
+            break;
+        case 0x2:
+            op = la_op_ld_w;
+            break;
+        case 0x3:
+            op = la_op_ld_d;
+            break;
+        case 0x4:
+            op = la_op_st_b;
+            break;
+        case 0x5:
+            op = la_op_st_h;
+            break;
+        case 0x6:
+            op = la_op_st_w;
+            break;
+        case 0x7:
+            op = la_op_st_d;
+            break;
+        case 0x8:
+            op = la_op_ld_bu;
+            break;
+        case 0x9:
+            op = la_op_ld_hu;
+            break;
+        case 0xa:
+            op = la_op_ld_wu;
+            break;
+        case 0xb:
+            op = la_op_preld;
+            break;
+        case 0xc:
+            op = la_op_fld_s;
+            break;
+        case 0xd:
+            op = la_op_fst_s;
+            break;
+        case 0xe:
+            op = la_op_fld_d;
+            break;
+        case 0xf:
+            op = la_op_fst_d;
+            break;
+        }
+        break;
+    case 0xe:
+        switch ((insn >> 15) & 0x7ff) {
+        case 0x0:
+            op = la_op_ldx_b;
+            break;
+        case 0x8:
+            op = la_op_ldx_h;
+            break;
+        case 0x10:
+            op = la_op_ldx_w;
+            break;
+        case 0x18:
+            op = la_op_ldx_d;
+            break;
+        case 0x20:
+            op = la_op_stx_b;
+            break;
+        case 0x28:
+            op = la_op_stx_h;
+            break;
+        case 0x30:
+            op = la_op_stx_w;
+            break;
+        case 0x38:
+            op = la_op_stx_d;
+            break;
+        case 0x40:
+            op = la_op_ldx_bu;
+            break;
+        case 0x48:
+            op = la_op_ldx_hu;
+            break;
+        case 0x50:
+            op = la_op_ldx_wu;
+            break;
+        case 0x60:
+            op = la_op_fldx_s;
+            break;
+        case 0x68:
+            op = la_op_fldx_d;
+            break;
+        case 0x70:
+            op = la_op_fstx_s;
+            break;
+        case 0x78:
+            op = la_op_fstx_d;
+            break;
+        case 0xc0:
+            op = la_op_amswap_w;
+            break;
+        case 0xc1:
+            op = la_op_amswap_d;
+            break;
+        case 0xc2:
+            op = la_op_amadd_w;
+            break;
+        case 0xc3:
+            op = la_op_amadd_d;
+            break;
+        case 0xc4:
+            op = la_op_amand_w;
+            break;
+        case 0xc5:
+            op = la_op_amand_d;
+            break;
+        case 0xc6:
+            op = la_op_amor_w;
+            break;
+        case 0xc7:
+            op = la_op_amor_d;
+            break;
+        case 0xc8:
+            op = la_op_amxor_w;
+            break;
+        case 0xc9:
+            op = la_op_amxor_d;
+            break;
+        case 0xca:
+            op = la_op_ammax_w;
+            break;
+        case 0xcb:
+            op = la_op_ammax_d;
+            break;
+        case 0xcc:
+            op = la_op_ammin_w;
+            break;
+        case 0xcd:
+            op = la_op_ammin_d;
+            break;
+        case 0xce:
+            op = la_op_ammax_wu;
+            break;
+        case 0xcf:
+            op = la_op_ammax_du;
+            break;
+        case 0xd0:
+            op = la_op_ammin_wu;
+            break;
+        case 0xd1:
+            op = la_op_ammin_du;
+            break;
+        case 0xd2:
+            op = la_op_amswap_db_w;
+            break;
+        case 0xd3:
+            op = la_op_amswap_db_d;
+            break;
+        case 0xd4:
+            op = la_op_amadd_db_w;
+            break;
+        case 0xd5:
+            op = la_op_amadd_db_d;
+            break;
+        case 0xd6:
+            op = la_op_amand_db_w;
+            break;
+        case 0xd7:
+            op = la_op_amand_db_d;
+            break;
+        case 0xd8:
+            op = la_op_amor_db_w;
+            break;
+        case 0xd9:
+            op = la_op_amor_db_d;
+            break;
+        case 0xda:
+            op = la_op_amxor_db_w;
+            break;
+        case 0xdb:
+            op = la_op_amxor_db_d;
+            break;
+        case 0xdc:
+            op = la_op_ammax_db_w;
+            break;
+        case 0xdd:
+            op = la_op_ammax_db_d;
+            break;
+        case 0xde:
+            op = la_op_ammin_db_w;
+            break;
+        case 0xdf:
+            op = la_op_ammin_db_d;
+            break;
+        case 0xe0:
+            op = la_op_ammax_db_wu;
+            break;
+        case 0xe1:
+            op = la_op_ammax_db_du;
+            break;
+        case 0xe2:
+            op = la_op_ammin_db_wu;
+            break;
+        case 0xe3:
+            op = la_op_ammin_db_du;
+            break;
+        case 0xe4:
+            op = la_op_dbar;
+            break;
+        case 0xe5:
+            op = la_op_ibar;
+            break;
+        case 0xe8:
+            op = la_op_fldgt_s;
+            break;
+        case 0xe9:
+            op = la_op_fldgt_d;
+            break;
+        case 0xea:
+            op = la_op_fldle_s;
+            break;
+        case 0xeb:
+            op = la_op_fldle_d;
+            break;
+        case 0xec:
+            op = la_op_fstgt_s;
+            break;
+        case 0xed:
+            op = la_op_fstgt_d;
+            break;
+        case 0xee:
+            op = la_op_fstle_s;
+            break;
+        case 0xef:
+            op = la_op_fstle_d;
+            break;
+        case 0xf0:
+            op = la_op_ldgt_b;
+            break;
+        case 0xf1:
+            op = la_op_ldgt_h;
+            break;
+        case 0xf2:
+            op = la_op_ldgt_w;
+            break;
+        case 0xf3:
+            op = la_op_ldgt_d;
+            break;
+        case 0xf4:
+            op = la_op_ldle_b;
+            break;
+        case 0xf5:
+            op = la_op_ldle_h;
+            break;
+        case 0xf6:
+            op = la_op_ldle_w;
+            break;
+        case 0xf7:
+            op = la_op_ldle_d;
+            break;
+        case 0xf8:
+            op = la_op_stgt_b;
+            break;
+        case 0xf9:
+            op = la_op_stgt_h;
+            break;
+        case 0xfa:
+            op = la_op_stgt_w;
+            break;
+        case 0xfb:
+            op = la_op_stgt_d;
+            break;
+        case 0xfc:
+            op = la_op_stle_b;
+            break;
+        case 0xfd:
+            op = la_op_stle_h;
+            break;
+        case 0xfe:
+            op = la_op_stle_w;
+            break;
+        case 0xff:
+            op = la_op_stle_d;
+            break;
+        }
+        break;
+    case 0x10:
+        op = la_op_beqz;
+        break;
+    case 0x11:
+        op = la_op_bnez;
+        break;
+    case 0x12:
+        switch ((insn >> 8) & 0x3) {
+        case 0x0:
+            op = la_op_bceqz;
+            break;
+        case 0x1:
+            op = la_op_bcnez;
+            break;
+        }
+        break;
+    case 0x13:
+        op = la_op_jirl;
+        break;
+    case 0x14:
+        op = la_op_b;
+        break;
+    case 0x15:
+        op = la_op_bl;
+        break;
+    case 0x16:
+        op = la_op_beq;
+        break;
+    case 0x17:
+        op = la_op_bne;
+        break;
+    case 0x18:
+        op = la_op_blt;
+        break;
+    case 0x19:
+        op = la_op_bge;
+        break;
+    case 0x1a:
+        op = la_op_bltu;
+        break;
+    case 0x1b:
+        op = la_op_bgeu;
+        break;
+    default:
+        op = la_op_illegal;
+        break;
+    }
+    dec->op = op;
+}
+
+/* operand extractors */
+#define IM_5  5
+#define IM_8  8
+#define IM_12 12
+#define IM_14 14
+#define IM_15 15
+#define IM_16 16
+#define IM_20 20
+#define IM_21 21
+#define IM_26 26
+
+static uint32_t operand_r1(uint32_t insn)
+{
+    return insn & 0x1f;
+}
+
+static uint32_t operand_r2(uint32_t insn)
+{
+    return (insn >> 5) & 0x1f;
+}
+
+static uint32_t operand_r3(uint32_t insn)
+{
+    return (insn >> 10) & 0x1f;
+}
+
+static uint32_t operand_r4(uint32_t insn)
+{
+    return (insn >> 15) & 0x1f;
+}
+
+static uint32_t operand_u6(uint32_t insn)
+{
+    return (insn >> 10) & 0x3f;
+}
+
+static uint32_t operand_bw1(uint32_t insn)
+{
+    return (insn >> 10) & 0x1f;
+}
+
+static uint32_t operand_bw2(uint32_t insn)
+{
+    return (insn >> 16) & 0x1f;
+}
+
+static uint32_t operand_bd1(uint32_t insn)
+{
+    return (insn >> 10) & 0x3f;
+}
+
+static uint32_t operand_bd2(uint32_t insn)
+{
+    return (insn >> 16) & 0x3f;
+}
+
+static uint32_t operand_sa2(uint32_t insn)
+{
+    return (insn >> 15) & 0x3;
+}
+
+static uint32_t operand_sa3(uint32_t insn)
+{
+    return (insn >> 15) & 0x7;
+}
+
+static int32_t operand_im20(uint32_t insn)
+{
+    int32_t imm = (int32_t)((insn >> 5) & 0xfffff);
+    return imm >= (1 << 19) ? imm - (1 << 20) : imm;
+}
+
+static int32_t operand_im16(uint32_t insn)
+{
+    int32_t imm = (int32_t)((insn >> 10) & 0xffff);
+    return imm >= (1 << 15) ? imm - (1 << 16) : imm;
+}
+
+static int32_t operand_im14(uint32_t insn)
+{
+    int32_t imm = (int32_t)((insn >> 10) & 0x3fff);
+    return imm >= (1 << 13) ? imm - (1 << 14) : imm;
+}
+
+static int32_t operand_im12(uint32_t insn)
+{
+    int32_t imm = (int32_t)((insn >> 10) & 0xfff);
+    return imm >= (1 << 11) ? imm - (1 << 12) : imm;
+}
+
+static int32_t operand_im8(uint32_t insn)
+{
+    int32_t imm = (int32_t)((insn >> 10) & 0xff);
+    return imm >= (1 << 7) ? imm - (1 << 8) : imm;
+}
+
+static uint32_t operand_sd(uint32_t insn)
+{
+    return insn & 0x3;
+}
+
+static uint32_t operand_sj(uint32_t insn)
+{
+    return (insn >> 5) & 0x3;
+}
+
+static uint32_t operand_cd(uint32_t insn)
+{
+    return insn & 0x7;
+}
+
+static uint32_t operand_cj(uint32_t insn)
+{
+    return (insn >> 5) & 0x7;
+}
+
+static uint32_t operand_code(uint32_t insn)
+{
+    return insn & 0x7fff;
+}
+
+static int32_t operand_whint(uint32_t insn)
+{
+    int32_t imm = (int32_t)(insn & 0x7fff);
+    return imm >= (1 << 14) ? imm - (1 << 15) : imm;
+}
+
+static int32_t operand_invop(uint32_t insn)
+{
+    int32_t imm = (int32_t)(insn & 0x1f);
+    return imm >= (1 << 4) ? imm - (1 << 5) : imm;
+}
+
+static int32_t operand_ofs21(uint32_t insn)
+{
+    int32_t imm = (((int32_t)insn & 0x1f) << 16) |
+        ((insn >> 10) & 0xffff);
+    return imm >= (1 << 20) ? imm - (1 << 21) : imm;
+}
+
+static int32_t operand_ofs26(uint32_t insn)
+{
+    int32_t imm = (((int32_t)insn & 0x3ff) << 16) |
+        ((insn >> 10) & 0xffff);
+    return imm >= (1 << 25) ? imm - (1 << 26) : imm;
+}
+
+static uint32_t operand_fcond(uint32_t insn)
+{
+    return (insn >> 15) & 0x1f;
+}
+
+static uint32_t operand_sel(uint32_t insn)
+{
+    return (insn >> 15) & 0x7;
+}
+
+/* decode operands */
+static void decode_insn_operands(la_decode *dec)
+{
+    uint32_t insn = dec->insn;
+    dec->codec = opcode_data[dec->op].codec;
+    switch (dec->codec) {
+    case la_codec_illegal:
+    case la_codec_empty:
+        break;
+    case la_codec_2r:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        break;
+    case la_codec_2r_u5:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        break;
+    case la_codec_2r_u6:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_u6(insn);
+        break;
+    case la_codec_2r_2bw:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_bw1(insn);
+        dec->r4 = operand_bw2(insn);
+        break;
+    case la_codec_2r_2bd:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_bd1(insn);
+        dec->r4 = operand_bd2(insn);
+        break;
+    case la_codec_3r:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        break;
+    case la_codec_3r_rd0:
+        dec->r1 = 0;
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        break;
+    case la_codec_3r_sa2:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        dec->r4 = operand_sa2(insn);
+        break;
+    case la_codec_3r_sa3:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        dec->r4 = operand_sa3(insn);
+        break;
+    case la_codec_4r:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        dec->r4 = operand_r4(insn);
+        break;
+    case la_codec_r_im20:
+        dec->r1 = operand_r1(insn);
+        dec->imm = operand_im20(insn);
+        dec->bit = IM_20;
+        break;
+    case la_codec_2r_im16:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->imm = operand_im16(insn);
+        dec->bit = IM_16;
+        break;
+    case la_codec_2r_im14:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->imm = operand_im14(insn);
+        dec->bit = IM_14;
+        break;
+    case la_codec_r_im14:
+        dec->r1 = operand_r1(insn);
+        dec->imm = operand_im14(insn);
+        dec->bit = IM_14;
+        break;
+    case la_codec_im5_r_im12:
+        dec->imm2 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->imm = operand_im12(insn);
+        dec->bit = IM_12;
+        break;
+    case la_codec_2r_im12:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->imm = operand_im12(insn);
+        dec->bit = IM_12;
+        break;
+    case la_codec_2r_im8:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->imm = operand_im8(insn);
+        dec->bit = IM_8;
+        break;
+    case la_codec_r_sd:
+        dec->r1 = operand_sd(insn);
+        dec->r2 = operand_r2(insn);
+        break;
+    case la_codec_r_sj:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_sj(insn);
+        break;
+    case la_codec_r_cd:
+        dec->r1 = operand_cd(insn);
+        dec->r2 = operand_r2(insn);
+        break;
+    case la_codec_r_cj:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_cj(insn);
+        break;
+    case la_codec_r_seq:
+        dec->r1 = 0;
+        dec->r2 = operand_r1(insn);
+        dec->imm = operand_im8(insn);
+        dec->bit = IM_8;
+        break;
+    case la_codec_code:
+        dec->code = operand_code(insn);
+        break;
+    case la_codec_whint:
+        dec->imm = operand_whint(insn);
+        dec->bit = IM_15;
+        break;
+    case la_codec_invtlb:
+        dec->imm = operand_invop(insn);
+        dec->bit = IM_5;
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        break;
+    case la_codec_r_ofs21:
+        dec->imm = operand_ofs21(insn);
+        dec->bit = IM_21;
+        dec->r2 = operand_r2(insn);
+        break;
+    case la_codec_cj_ofs21:
+        dec->imm = operand_ofs21(insn);
+        dec->bit = IM_21;
+        dec->r2 = operand_cj(insn);
+        break;
+    case la_codec_ofs26:
+        dec->imm = operand_ofs26(insn);
+        dec->bit = IM_26;
+        break;
+    case la_codec_cond:
+        dec->r1 = operand_cd(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        dec->r4 = operand_fcond(insn);
+        break;
+    case la_codec_sel:
+        dec->r1 = operand_r1(insn);
+        dec->r2 = operand_r2(insn);
+        dec->r3 = operand_r3(insn);
+        dec->r4 = operand_sel(insn);
+        break;
+    }
+}
+
+/* format instruction */
+static void append(char *s1, const char *s2, size_t n)
+{
+    size_t l1 = strlen(s1);
+    if (l1 + 1 < n) {
+        strncat(s1, s2, n - l1 - 1);
+    }
+}
+
+static void format_insn(char *buf, size_t buflen, size_t tab, la_decode *dec)
+{
+    char tmp[16];
+    const char *fmt;
+
+    fmt = opcode_data[dec->op].format;
+    while (*fmt) {
+        switch (*fmt) {
+        case 'n': /* name */
+            append(buf, opcode_data[dec->op].name, buflen);
+            break;
+        case 's':
+            append(buf, "s", buflen);
+            break;
+        case 'd':
+            append(buf, "d", buflen);
+            break;
+        case 'e': /* illegal */
+            snprintf(tmp, sizeof(tmp), "%x", dec->insn);
+            append(buf, tmp, buflen);
+            break;
+        case 't':
+            while (strlen(buf) < tab) {
+                append(buf, " ", buflen);
+            }
+            break;
+        case '(':
+            append(buf, "(", buflen);
+            break;
+        case ',':
+            append(buf, ",", buflen);
+            break;
+        case '.':
+            append(buf, ".", buflen);
+            break;
+        case ')':
+            append(buf, ")", buflen);
+            break;
+        case '0': /* rd */
+            append(buf, loongarch_r_normal_name[dec->r1], buflen);
+            break;
+        case '1': /* rj */
+            append(buf, loongarch_r_normal_name[dec->r2], buflen);
+            break;
+        case '2': /* rk */
+            append(buf, loongarch_r_normal_name[dec->r3], buflen);
+            break;
+        case '3': /* fd */
+            append(buf, loongarch_f_normal_name[dec->r1], buflen);
+            break;
+        case '4': /* fj */
+            append(buf, loongarch_f_normal_name[dec->r2], buflen);
+            break;
+        case '5': /* fk */
+            append(buf, loongarch_f_normal_name[dec->r3], buflen);
+            break;
+        case '6': /* fa */
+            append(buf, loongarch_f_normal_name[dec->r4], buflen);
+            break;
+        case 'A': /* sd */
+            append(buf, loongarch_cr_normal_name[dec->r1], buflen);
+            break;
+        case 'B': /* sj */
+            append(buf, loongarch_cr_normal_name[dec->r2], buflen);
+            break;
+        case 'C': /* r3 */
+            snprintf(tmp, sizeof(tmp), "%x", dec->r3);
+            append(buf, tmp, buflen);
+            break;
+        case 'D': /* r4 */
+            snprintf(tmp, sizeof(tmp), "%x", dec->r4);
+            append(buf, tmp, buflen);
+            break;
+        case 'E': /* r1 */
+            snprintf(tmp, sizeof(tmp), "%x", dec->r1);
+            append(buf, tmp, buflen);
+            break;
+        case 'F': /* fcsrd */
+            append(buf, loongarch_r_normal_name[dec->r1], buflen);
+            break;
+        case 'G': /* fcsrs */
+            append(buf, loongarch_r_normal_name[dec->r2], buflen);
+            break;
+        case 'H': /* cd */
+            append(buf, loongarch_c_normal_name[dec->r1], buflen);
+            break;
+        case 'I': /* cj */
+            append(buf, loongarch_c_normal_name[dec->r2], buflen);
+            break;
+        case 'J': /* code */
+            snprintf(tmp, sizeof(tmp), "0x%x", dec->code);
+            append(buf, tmp, buflen);
+            break;
+        case 'K': /* cond */
+            switch (dec->r4) {
+            case 0x0:
+                append(buf, "caf", buflen);
+                break;
+            case 0x1:
+                append(buf, "saf", buflen);
+                break;
+            case 0x2:
+                append(buf, "clt", buflen);
+                break;
+            case 0x3:
+                append(buf, "slt", buflen);
+                break;
+            case 0x4:
+                append(buf, "ceq", buflen);
+                break;
+            case 0x5:
+                append(buf, "seq", buflen);
+                break;
+            case 0x6:
+                append(buf, "cle", buflen);
+                break;
+            case 0x7:
+                append(buf, "sle", buflen);
+                break;
+            case 0x8:
+                append(buf, "cun", buflen);
+                break;
+            case 0x9:
+                append(buf, "sun", buflen);
+                break;
+            case 0xA:
+                append(buf, "cult", buflen);
+                break;
+            case 0xB:
+                append(buf, "sult", buflen);
+                break;
+            case 0xC:
+                append(buf, "cueq", buflen);
+                break;
+            case 0xD:
+                append(buf, "sueq", buflen);
+                break;
+            case 0xE:
+                append(buf, "cule", buflen);
+                break;
+            case 0xF:
+                append(buf, "sule", buflen);
+                break;
+            case 0x10:
+                append(buf, "cne", buflen);
+                break;
+            case 0x11:
+                append(buf, "sne", buflen);
+                break;
+            case 0x14:
+                append(buf, "cor", buflen);
+                break;
+            case 0x15:
+                append(buf, "sor", buflen);
+                break;
+            case 0x18:
+                append(buf, "cune", buflen);
+                break;
+            case 0x19:
+                append(buf, "sune", buflen);
+                break;
+            }
+            break;
+        case 'L': /* ca */
+            append(buf, loongarch_c_normal_name[dec->r4], buflen);
+            break;
+        case 'M': /* cop */
+            snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm2) & 0x1f);
+            append(buf, tmp, buflen);
+            break;
+        case 'i': /* sixx d */
+            snprintf(tmp, sizeof(tmp), "%d", dec->imm);
+            append(buf, tmp, buflen);
+            break;
+        case 'o': /* offset */
+            snprintf(tmp, sizeof(tmp), "%d", (dec->imm) << 2);
+            append(buf, tmp, buflen);
+            break;
+        case 'x': /* sixx x */
+            switch (dec->bit) {
+            case IM_5:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0x1f);
+                append(buf, tmp, buflen);
+                break;
+            case IM_8:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0xff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_12:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0xfff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_14:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0x3fff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_15:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0x7fff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_16:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0xffff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_20:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) & 0xfffff);
+                append(buf, tmp, buflen);
+                break;
+            default:
+                snprintf(tmp, sizeof(tmp), "0x%x", dec->imm);
+                append(buf, tmp, buflen);
+                break;
+            }
+            break;
+        case 'X': /* offset x */
+            switch (dec->bit) {
+            case IM_16:
+                snprintf(tmp, sizeof(tmp), "0x%x",
+                    ((dec->imm) << 2) & 0xffff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_21:
+                snprintf(tmp, sizeof(tmp), "0x%x",
+                    ((dec->imm) << 2) & 0x1fffff);
+                append(buf, tmp, buflen);
+                break;
+            case IM_26:
+                snprintf(tmp, sizeof(tmp), "0x%x",
+                    ((dec->imm) << 2) & 0x3ffffff);
+                append(buf, tmp, buflen);
+                break;
+            default:
+                snprintf(tmp, sizeof(tmp), "0x%x", (dec->imm) << 2);
+                append(buf, tmp, buflen);
+                break;
+            }
+            break;
+        case 'p': /* pc */
+            snprintf(tmp, sizeof(tmp), "  # 0x%"PRIx32"",
+                dec->pc + ((dec->imm) << 2));
+            append(buf, tmp, buflen);
+            break;
+        default:
+            break;
+        }
+        fmt++;
+    }
+}
+
+/* disassemble instruction */
+static void
+disasm_insn(char *buf, size_t buflen, bfd_vma pc, unsigned long int insn)
+{
+    la_decode dec = { 0 };
+    dec.pc = pc;
+    dec.insn = insn;
+    decode_insn_opcode(&dec);
+    decode_insn_operands(&dec);
+    format_insn(buf, buflen, 16, &dec);
+}
+
+int
+print_insn_loongarch(bfd_vma memaddr, struct disassemble_info *info)
+{
+    char buf[128] = { 0 };
+    bfd_byte buffer[INSNLEN];
+    unsigned long insn;
+    int status;
+
+    status = (*info->read_memory_func)(memaddr, buffer, INSNLEN, info);
+    if (status == 0) {
+        insn = (uint32_t) bfd_getl32(buffer);
+        (*info->fprintf_func)(info->stream, "%08" PRIx32 " ", (uint32_t)insn);
+    } else {
+        (*info->memory_error_func)(status, memaddr, info);
+        return -1;
+    }
+    disasm_insn(buf, sizeof(buf), memaddr, insn);
+    (*info->fprintf_func)(info->stream, "\t%s", buf);
+    return INSNLEN;
+}
diff --git a/disas/meson.build b/disas/meson.build
index 449f99e..a1bd8b8 100644
--- a/disas/meson.build
+++ b/disas/meson.build
@@ -9,6 +9,7 @@ common_ss.add(when: 'CONFIG_CRIS_DIS', if_true: files('cris.c'))
 common_ss.add(when: 'CONFIG_HEXAGON_DIS', if_true: files('hexagon.c'))
 common_ss.add(when: 'CONFIG_HPPA_DIS', if_true: files('hppa.c'))
 common_ss.add(when: 'CONFIG_I386_DIS', if_true: files('i386.c'))
+common_ss.add(when: 'CONFIG_LOONGARCH_DIS', if_true: files('loongarch.c'))
 common_ss.add(when: 'CONFIG_M68K_DIS', if_true: files('m68k.c'))
 common_ss.add(when: 'CONFIG_MICROBLAZE_DIS', if_true: files('microblaze.c'))
 common_ss.add(when: 'CONFIG_MIPS_DIS', if_true: files('mips.c'))
diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index 524f291..009a03a 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -253,6 +253,7 @@ enum bfd_architecture
 #define bfd_mach_rx            0x75
 #define bfd_mach_rx_v2         0x76
 #define bfd_mach_rx_v3         0x77
+  bfd_arch_loongarch,
   bfd_arch_last
   };
 #define bfd_mach_s390_31 31
@@ -462,6 +463,7 @@ int print_insn_riscv32          (bfd_vma, disassemble_info*);
 int print_insn_riscv64          (bfd_vma, disassemble_info*);
 int print_insn_rx(bfd_vma, disassemble_info *);
 int print_insn_hexagon(bfd_vma, disassemble_info *);
+int print_insn_loongarch(bfd_vma, disassemble_info *);
 
 #ifdef CONFIG_CAPSTONE
 bool cap_disas_target(disassemble_info *info, uint64_t pc, size_t size);
diff --git a/meson.build b/meson.build
index 2f37709..8c50fda 100644
--- a/meson.build
+++ b/meson.build
@@ -1499,6 +1499,7 @@ disassemblers = {
   'sh4' : ['CONFIG_SH4_DIS'],
   'sparc' : ['CONFIG_SPARC_DIS'],
   'xtensa' : ['CONFIG_XTENSA_DIS'],
+  'loongarch' : ['CONFIG_LOONGARCH_DIS'],
 }
 if link_language == 'cpp'
   disassemblers += {
-- 
1.8.3.1
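
A note on the signed operand extractors above (operand_im20() through
operand_ofs26()): each one masks out an N-bit field and then sign-extends
it by hand, and the boundary value 1 << (N - 1) is the easy one to get
wrong, since it must decode as the most negative value. The sketch below
is not part of the patch. sign_extend() and extract_si12() are
hypothetical names used only for illustration; extract_si12() assumes
the 12-bit field at bits [21:10] that operand_im12() reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: sign-extend the low
 * `bits` bits of `field`. Intended for 1 <= bits <= 31.
 */
static int32_t sign_extend(uint32_t field, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);

    field &= (sign << 1) - 1;   /* keep only the low `bits` bits */
    /* Flipping the sign bit and subtracting it maps the field to signed. */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Hypothetical stand-in for a 12-bit signed immediate at bits [21:10]. */
static int32_t extract_si12(uint32_t insn)
{
    return sign_extend(insn >> 10, 12);
}
```

With this shape, a field value of 0x7ff stays 2047, 0x800 becomes -2048
and 0xfff becomes -1, so one helper could cover every field width instead
of a separate comparison per width.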




* [PATCH v2 20/22] LoongArch Linux User Emulation
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (18 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 19/22] target/loongarch: Add disassembler Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-21  9:53 ` [PATCH v2 21/22] configs: Add loongarch linux-user config Song Gao
  2021-07-21  9:53 ` [PATCH v2 22/22] target/loongarch: Add target build suport Song Gao
  21 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

Implementation of linux user emulation for LoongArch.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 MAINTAINERS                             |   1 +
 include/elf.h                           |   2 +
 linux-user/elfload.c                    |  58 ++++++
 linux-user/loongarch64/cpu_loop.c       | 177 ++++++++++++++++++
 linux-user/loongarch64/signal.c         | 193 ++++++++++++++++++++
 linux-user/loongarch64/sockbits.h       |   1 +
 linux-user/loongarch64/syscall_nr.h     | 307 ++++++++++++++++++++++++++++++++
 linux-user/loongarch64/target_cpu.h     |  36 ++++
 linux-user/loongarch64/target_elf.h     |  14 ++
 linux-user/loongarch64/target_fcntl.h   |  12 ++
 linux-user/loongarch64/target_signal.h  |  28 +++
 linux-user/loongarch64/target_structs.h |  49 +++++
 linux-user/loongarch64/target_syscall.h |  46 +++++
 linux-user/loongarch64/termbits.h       | 229 ++++++++++++++++++++++++
 linux-user/syscall_defs.h               |  10 +-
 15 files changed, 1159 insertions(+), 4 deletions(-)
 create mode 100644 linux-user/loongarch64/cpu_loop.c
 create mode 100644 linux-user/loongarch64/signal.c
 create mode 100644 linux-user/loongarch64/sockbits.h
 create mode 100644 linux-user/loongarch64/syscall_nr.h
 create mode 100644 linux-user/loongarch64/target_cpu.h
 create mode 100644 linux-user/loongarch64/target_elf.h
 create mode 100644 linux-user/loongarch64/target_fcntl.h
 create mode 100644 linux-user/loongarch64/target_signal.h
 create mode 100644 linux-user/loongarch64/target_structs.h
 create mode 100644 linux-user/loongarch64/target_syscall.h
 create mode 100644 linux-user/loongarch64/termbits.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 612fdfb..8e43916 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -214,6 +214,7 @@ LoongArch TCG CPUS
 M: Song Gao <gaosong@loongson.cn>
 S: Maintained
 F: target/loongarch/
+F: linux-user/loongarch64/
 F: disas/loongarch.c
 
 M68K TCG CPUs
diff --git a/include/elf.h b/include/elf.h
index 811bf4a..3a4bcb6 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -182,6 +182,8 @@ typedef struct mips_elf_abiflags_v0 {
 
 #define EM_NANOMIPS     249     /* Wave Computing nanoMIPS */
 
+#define EM_LOONGARCH    258     /* LoongArch */
+
 /*
  * This is an interim value that we will use until the committee comes
  * up with a final number.
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 42ef2a1..8278e8d 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -909,6 +909,64 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
 
 #endif
 
+#ifdef TARGET_LOONGARCH64
+
+#define ELF_START_MMAP 0x80000000
+
+#define ELF_CLASS   ELFCLASS64
+#define ELF_ARCH    EM_LOONGARCH
+
+#define elf_check_arch(x) ((x) == EM_LOONGARCH)
+static inline void init_thread(struct target_pt_regs *regs,
+                               struct image_info *infop)
+{
+    regs->csr_crmd = 2 << 3;
+    regs->csr_era = infop->entry;
+    regs->regs[3] = infop->start_stack;
+}
+
+/* See linux kernel: arch/loongarch/include/asm/elf.h.  */
+#define ELF_NREG 45
+typedef target_elf_greg_t target_elf_gregset_t[ELF_NREG];
+
+/* See linux kernel: arch/loongarch/include/asm/reg.h.  */
+enum {
+    TARGET_EF_R0 = 0,
+    TARGET_EF_CSR_ERA = TARGET_EF_R0 + 32,
+    TARGET_EF_CSR_BADVADDR = TARGET_EF_R0 + 33,
+};
+
+/* See linux kernel: arch/loongarch/kernel/process.c:loongarch_dump_regs64. */
+static void elf_core_copy_regs(target_elf_gregset_t *regs,
+                               const CPULoongArchState *env)
+{
+    int i;
+
+    for (i = 0; i < TARGET_EF_R0; i++) {
+        (*regs)[i] = 0;
+    }
+    (*regs)[TARGET_EF_R0] = 0;
+
+    for (i = 1; i < ARRAY_SIZE(env->active_tc.gpr); i++) {
+        (*regs)[TARGET_EF_R0 + i] = tswapreg(env->active_tc.gpr[i]);
+    }
+
+    (*regs)[TARGET_EF_CSR_ERA] = tswapreg(env->active_tc.PC);
+    (*regs)[TARGET_EF_CSR_BADVADDR] = tswapreg(env->CSR_BADV);
+}
+
+#define USE_ELF_CORE_DUMP
+#define ELF_EXEC_PAGESIZE        4096
+
+#define ELF_HWCAP get_elf_hwcap()
+
+static uint32_t get_elf_hwcap(void)
+{
+    return 0;
+}
+
+#endif /* TARGET_LOONGARCH64 */
+
 #ifdef TARGET_MIPS
 
 #define ELF_START_MMAP 0x80000000
diff --git a/linux-user/loongarch64/cpu_loop.c b/linux-user/loongarch64/cpu_loop.c
new file mode 100644
index 0000000..91f35bc
--- /dev/null
+++ b/linux-user/loongarch64/cpu_loop.c
@@ -0,0 +1,177 @@
+/*
+ * QEMU LoongArch user cpu loop.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "qemu-common.h"
+#include "cpu_loop-common.h"
+#include "elf.h"
+
+/* Break codes */
+enum {
+    BRK_OVERFLOW = 6,
+    BRK_DIVZERO = 7
+};
+
+static int do_break(CPULoongArchState *env, target_siginfo_t *info,
+                    unsigned int code)
+{
+    int ret = -1;
+
+    switch (code) {
+    case BRK_OVERFLOW:
+    case BRK_DIVZERO:
+        info->si_signo = TARGET_SIGFPE;
+        info->si_errno = 0;
+        info->si_code = (code == BRK_OVERFLOW) ? FPE_INTOVF : FPE_INTDIV;
+        queue_signal(env, info->si_signo, QEMU_SI_FAULT, info);
+        ret = 0;
+        break;
+    default:
+        info->si_signo = TARGET_SIGTRAP;
+        info->si_errno = 0;
+        queue_signal(env, info->si_signo, QEMU_SI_FAULT, info);
+        ret = 0;
+        break;
+    }
+
+    return ret;
+}
+
+void cpu_loop(CPULoongArchState *env)
+{
+    CPUState *cs = CPU(loongarch_env_get_cpu(env));
+    target_siginfo_t info;
+    int trapnr;
+    abi_long ret;
+
+    for (;;) {
+        cpu_exec_start(cs);
+        trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
+        switch (trapnr) {
+        case EXCP_SYSCALL:
+            env->active_tc.PC += 4;
+            ret = do_syscall(env, env->active_tc.gpr[11],
+                             env->active_tc.gpr[4], env->active_tc.gpr[5],
+                             env->active_tc.gpr[6], env->active_tc.gpr[7],
+                             env->active_tc.gpr[8], env->active_tc.gpr[9],
+                             -1, -1);
+            if (ret == -TARGET_ERESTARTSYS) {
+                env->active_tc.PC -= 4;
+                break;
+            }
+            if (ret == -TARGET_QEMU_ESIGRETURN) {
+                /*
+                 * Returning from a successful sigreturn syscall.
+                 * Avoid clobbering register state.
+                 */
+                break;
+            }
+            env->active_tc.gpr[4] = ret;
+            break;
+        case EXCP_TLBL:
+        case EXCP_TLBS:
+        case EXCP_ADE:
+            info.si_signo = TARGET_SIGSEGV;
+            info.si_errno = 0;
+            info.si_code = TARGET_SEGV_MAPERR;
+            info._sifields._sigfault._addr = env->CSR_BADV;
+            queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            break;
+        case EXCP_FPDIS:
+        case EXCP_INE:
+            info.si_signo = TARGET_SIGILL;
+            info.si_errno = 0;
+            info.si_code = 0;
+            queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            break;
+        case EXCP_FPE:
+            info.si_signo = TARGET_SIGFPE;
+            info.si_errno = 0;
+            info.si_code = TARGET_FPE_FLTUNK;
+            if (GET_FP_CAUSE(env->active_fpu.fcsr0) & FP_INVALID) {
+                info.si_code = TARGET_FPE_FLTINV;
+            } else if (GET_FP_CAUSE(env->active_fpu.fcsr0) & FP_DIV0) {
+                info.si_code = TARGET_FPE_FLTDIV;
+            } else if (GET_FP_CAUSE(env->active_fpu.fcsr0) & FP_OVERFLOW) {
+                info.si_code = TARGET_FPE_FLTOVF;
+            } else if (GET_FP_CAUSE(env->active_fpu.fcsr0) & FP_UNDERFLOW) {
+                info.si_code = TARGET_FPE_FLTUND;
+            } else if (GET_FP_CAUSE(env->active_fpu.fcsr0) & FP_INEXACT) {
+                info.si_code = TARGET_FPE_FLTRES;
+            }
+            queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            break;
+        case EXCP_BREAK:
+            {
+                abi_ulong trap_instr;
+                unsigned int code;
+
+                ret = get_user_u32(trap_instr, env->active_tc.PC);
+                if (ret != 0) {
+                    abort();
+                    goto error;
+                }
+
+                code = trap_instr & 0x7fff;
+
+                if (do_break(env, &info, code) != 0) {
+                    abort();
+                    goto error;
+                }
+            }
+            break;
+        case EXCP_TRAP:
+            {
+                abi_ulong trap_instr;
+                unsigned int code = 0;
+
+                ret = get_user_u32(trap_instr, env->active_tc.PC);
+
+                if (ret != 0) {
+                    abort();
+                    goto error;
+                }
+
+                /* The immediate versions don't provide a code. */
+                if (!(trap_instr & 0xFC000000)) {
+                    code = ((trap_instr >> 6) & ((1 << 10) - 1));
+                }
+
+                if (do_break(env, &info, code) != 0) {
+                    abort();
+                    goto error;
+                }
+            }
+            break;
+        case EXCP_ATOMIC:
+            cpu_exec_step_atomic(cs);
+            break;
+        default:
+error:
+            EXCP_DUMP(env, "qemu: unhandled CPU exception 0x%x - aborting\n",
+                      trapnr);
+            abort();
+        }
+        process_pending_signals(env);
+    }
+}
+
+void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
+{
+    int i;
+
+    for (i = 0; i < 32; i++) {
+        env->active_tc.gpr[i] = regs->regs[i];
+    }
+    env->active_tc.PC = regs->csr_era & ~(target_ulong)1;
+
+}
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
new file mode 100644
index 0000000..716cee3
--- /dev/null
+++ b/linux-user/loongarch64/signal.c
@@ -0,0 +1,193 @@
+/*
+ * LoongArch emulation of Linux signals
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "signal-common.h"
+#include "linux-user/trace.h"
+
+#define FPU_REG_WIDTH   256
+union fpureg {
+    uint32_t   val32[FPU_REG_WIDTH / 32];
+    uint64_t   val64[FPU_REG_WIDTH / 64];
+};
+
+struct target_sigcontext {
+    uint64_t   sc_pc;
+    uint64_t   sc_regs[32];
+    uint32_t   sc_flags;
+
+    uint32_t   sc_fcsr;
+    uint32_t   sc_vcsr;
+    uint64_t   sc_fcc;
+    union fpureg    sc_fpregs[32] __attribute__((aligned(32)));
+
+    uint32_t   sc_reserved;
+
+};
+
+struct sigframe {
+    uint32_t sf_ass[4];             /* argument save space for o32 */
+    uint32_t sf_code[2];            /* signal trampoline */
+    struct target_sigcontext sf_sc;
+    target_sigset_t sf_mask;
+};
+
+struct target_ucontext {
+    target_ulong tuc_flags;
+    target_ulong tuc_link;
+    target_stack_t tuc_stack;
+    target_ulong pad0;
+    struct target_sigcontext tuc_mcontext;
+    target_sigset_t tuc_sigmask;
+};
+
+struct target_rt_sigframe {
+    uint32_t rs_ass[4];            /* argument save space for o32 */
+    uint32_t rs_code[2];           /* signal trampoline */
+    struct target_siginfo rs_info;
+    struct target_ucontext rs_uc;
+};
+
+/* Install trampoline to jump back from signal handler */
+static inline int install_sigtramp(unsigned int *tramp, unsigned int syscall)
+{
+    int err = 0;
+
+    /*
+     * Set up the return code ...
+     *
+     *         li      a7, 139
+     *         syscall
+     */
+
+    __put_user(0x03822c0b, tramp + 0);  /* ori  a7, zero, 0x8b */
+    __put_user(0x002b0000, tramp + 1);  /* syscall 0 */
+    return err;
+}
+
+static inline void setup_sigcontext(CPULoongArchState *regs,
+                                    struct target_sigcontext *sc)
+{
+    int i;
+
+    __put_user(exception_resume_pc(regs), &sc->sc_pc);
+    regs->hflags &= ~LOONGARCH_HFLAG_BMASK;
+
+    __put_user(0, &sc->sc_regs[0]);
+    for (i = 1; i < 32; ++i) {
+        __put_user(regs->active_tc.gpr[i], &sc->sc_regs[i]);
+    }
+
+    for (i = 0; i < 32; ++i) {
+        __put_user(regs->active_fpu.fpr[i].d, &sc->sc_fpregs[i].val64[0]);
+    }
+}
+
+static inline void
+restore_sigcontext(CPULoongArchState *regs, struct target_sigcontext *sc)
+{
+    int i;
+
+    __get_user(regs->CSR_ERA, &sc->sc_pc);
+
+    for (i = 1; i < 32; ++i) {
+        __get_user(regs->active_tc.gpr[i], &sc->sc_regs[i]);
+    }
+
+    for (i = 0; i < 32; ++i) {
+        __get_user(regs->active_fpu.fpr[i].d, &sc->sc_fpregs[i].val64[0]);
+    }
+}
+
+/*
+ * Determine which stack to use.
+ */
+static inline abi_ulong
+get_sigframe(struct target_sigaction *ka, CPULoongArchState *regs,
+             size_t frame_size)
+{
+    unsigned long sp;
+
+    sp = target_sigsp(get_sp_from_cpustate(regs) - 32, ka);
+
+    return (sp - frame_size) & ~7;
+}
+
+void setup_rt_frame(int sig, struct target_sigaction *ka,
+                    target_siginfo_t *info,
+                    target_sigset_t *set, CPULoongArchState *env)
+{
+    struct target_rt_sigframe *frame;
+    abi_ulong frame_addr;
+    int i;
+
+    frame_addr = get_sigframe(ka, env, sizeof(*frame));
+    trace_user_setup_rt_frame(env, frame_addr);
+    if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
+        goto give_sigsegv;
+    }
+
+    install_sigtramp(frame->rs_code, TARGET_NR_rt_sigreturn);
+
+    tswap_siginfo(&frame->rs_info, info);
+
+    __put_user(0, &frame->rs_uc.tuc_flags);
+    __put_user(0, &frame->rs_uc.tuc_link);
+    target_save_altstack(&frame->rs_uc.tuc_stack, env);
+
+    setup_sigcontext(env, &frame->rs_uc.tuc_mcontext);
+
+    for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+        __put_user(set->sig[i], &frame->rs_uc.tuc_sigmask.sig[i]);
+    }
+
+    env->active_tc.gpr[4] = sig;
+    env->active_tc.gpr[5] = frame_addr
+                             + offsetof(struct target_rt_sigframe, rs_info);
+    env->active_tc.gpr[6] = frame_addr
+                             + offsetof(struct target_rt_sigframe, rs_uc);
+    env->active_tc.gpr[3] = frame_addr;
+    env->active_tc.gpr[1] = frame_addr
+                             + offsetof(struct target_rt_sigframe, rs_code);
+
+    env->active_tc.PC = env->active_tc.gpr[20] = ka->_sa_handler;
+    unlock_user_struct(frame, frame_addr, 1);
+    return;
+
+give_sigsegv:
+    unlock_user_struct(frame, frame_addr, 1);
+    force_sigsegv(sig);
+}
+
+long do_rt_sigreturn(CPULoongArchState *env)
+{
+    struct target_rt_sigframe *frame;
+    abi_ulong frame_addr;
+    sigset_t blocked;
+
+    frame_addr = env->active_tc.gpr[3];
+    trace_user_do_rt_sigreturn(env, frame_addr);
+    if (!lock_user_struct(VERIFY_READ, frame, frame_addr, 1)) {
+        goto badframe;
+    }
+
+    target_to_host_sigset(&blocked, &frame->rs_uc.tuc_sigmask);
+    set_sigmask(&blocked);
+
+    restore_sigcontext(env, &frame->rs_uc.tuc_mcontext);
+    target_restore_altstack(&frame->rs_uc.tuc_stack, env);
+
+    env->active_tc.PC = env->CSR_ERA;
+    env->CSR_ERA = 0;
+    return -TARGET_QEMU_ESIGRETURN;
+
+badframe:
+    force_sig(TARGET_SIGSEGV);
+    return -TARGET_QEMU_ESIGRETURN;
+}
diff --git a/linux-user/loongarch64/sockbits.h b/linux-user/loongarch64/sockbits.h
new file mode 100644
index 0000000..0e4c8f0
--- /dev/null
+++ b/linux-user/loongarch64/sockbits.h
@@ -0,0 +1 @@
+#include "../generic/sockbits.h"
diff --git a/linux-user/loongarch64/syscall_nr.h b/linux-user/loongarch64/syscall_nr.h
new file mode 100644
index 0000000..64cb17d
--- /dev/null
+++ b/linux-user/loongarch64/syscall_nr.h
@@ -0,0 +1,307 @@
+#ifndef LINUX_USER_LOONGARCH_SYSCALL_NR_H
+#define LINUX_USER_LOONGARCH_SYSCALL_NR_H
+
+/* Copied from the Linux kernel: include/uapi/asm-generic/unistd.h */
+#define TARGET_NR_io_setup 0
+#define TARGET_NR_io_destroy 1
+#define TARGET_NR_io_submit 2
+#define TARGET_NR_io_cancel 3
+#define TARGET_NR_io_getevents 4
+#define TARGET_NR_setxattr 5
+#define TARGET_NR_lsetxattr 6
+#define TARGET_NR_fsetxattr 7
+#define TARGET_NR_getxattr 8
+#define TARGET_NR_lgetxattr 9
+#define TARGET_NR_fgetxattr 10
+#define TARGET_NR_listxattr 11
+#define TARGET_NR_llistxattr 12
+#define TARGET_NR_flistxattr 13
+#define TARGET_NR_removexattr 14
+#define TARGET_NR_lremovexattr 15
+#define TARGET_NR_fremovexattr 16
+#define TARGET_NR_getcwd 17
+#define TARGET_NR_lookup_dcookie 18
+#define TARGET_NR_eventfd2 19
+#define TARGET_NR_epoll_create1 20
+#define TARGET_NR_epoll_ctl 21
+#define TARGET_NR_epoll_pwait 22
+#define TARGET_NR_dup 23
+#define TARGET_NR_dup3 24
+#define TARGET_NR_fcntl 25
+#define TARGET_NR_inotify_init1 26
+#define TARGET_NR_inotify_add_watch 27
+#define TARGET_NR_inotify_rm_watch 28
+#define TARGET_NR_ioctl 29
+#define TARGET_NR_ioprio_set 30
+#define TARGET_NR_ioprio_get 31
+#define TARGET_NR_flock 32
+#define TARGET_NR_mknodat 33
+#define TARGET_NR_mkdirat 34
+#define TARGET_NR_unlinkat 35
+#define TARGET_NR_symlinkat 36
+#define TARGET_NR_linkat 37
+#define TARGET_NR_renameat 38
+#define TARGET_NR_umount2 39
+#define TARGET_NR_mount 40
+#define TARGET_NR_pivot_root 41
+#define TARGET_NR_nfsservctl 42
+#define TARGET_NR_statfs 43
+#define TARGET_NR_fstatfs 44
+#define TARGET_NR_truncate 45
+#define TARGET_NR_ftruncate 46
+#define TARGET_NR_fallocate 47
+#define TARGET_NR_faccessat 48
+#define TARGET_NR_chdir 49
+#define TARGET_NR_fchdir 50
+#define TARGET_NR_chroot 51
+#define TARGET_NR_fchmod 52
+#define TARGET_NR_fchmodat 53
+#define TARGET_NR_fchownat 54
+#define TARGET_NR_fchown 55
+#define TARGET_NR_openat 56
+#define TARGET_NR_close 57
+#define TARGET_NR_vhangup 58
+#define TARGET_NR_pipe2 59
+#define TARGET_NR_quotactl 60
+#define TARGET_NR_getdents64 61
+#define TARGET_NR_lseek 62
+#define TARGET_NR_read 63
+#define TARGET_NR_write 64
+#define TARGET_NR_readv 65
+#define TARGET_NR_writev 66
+#define TARGET_NR_pread64 67
+#define TARGET_NR_pwrite64 68
+#define TARGET_NR_preadv 69
+#define TARGET_NR_pwritev 70
+#define TARGET_NR_sendfile 71
+#define TARGET_NR_pselect6 72
+#define TARGET_NR_ppoll 73
+#define TARGET_NR_signalfd4 74
+#define TARGET_NR_vmsplice 75
+#define TARGET_NR_splice 76
+#define TARGET_NR_tee 77
+#define TARGET_NR_readlinkat 78
+#define TARGET_NR_newfstatat 79
+#define TARGET_NR_fstat 80
+#define TARGET_NR_sync 81
+#define TARGET_NR_fsync 82
+#define TARGET_NR_fdatasync 83
+#define TARGET_NR_sync_file_range 84
+#define TARGET_NR_timerfd_create 85
+#define TARGET_NR_timerfd_settime 86
+#define TARGET_NR_timerfd_gettime 87
+#define TARGET_NR_utimensat 88
+#define TARGET_NR_acct 89
+#define TARGET_NR_capget 90
+#define TARGET_NR_capset 91
+#define TARGET_NR_personality 92
+#define TARGET_NR_exit 93
+#define TARGET_NR_exit_group 94
+#define TARGET_NR_waitid 95
+#define TARGET_NR_set_tid_address 96
+#define TARGET_NR_unshare 97
+#define TARGET_NR_futex 98
+#define TARGET_NR_set_robust_list 99
+#define TARGET_NR_get_robust_list 100
+#define TARGET_NR_nanosleep 101
+#define TARGET_NR_getitimer 102
+#define TARGET_NR_setitimer 103
+#define TARGET_NR_kexec_load 104
+#define TARGET_NR_init_module 105
+#define TARGET_NR_delete_module 106
+#define TARGET_NR_timer_create 107
+#define TARGET_NR_timer_gettime 108
+#define TARGET_NR_timer_getoverrun 109
+#define TARGET_NR_timer_settime 110
+#define TARGET_NR_timer_delete 111
+#define TARGET_NR_clock_settime 112
+#define TARGET_NR_clock_gettime 113
+#define TARGET_NR_clock_getres 114
+#define TARGET_NR_clock_nanosleep 115
+#define TARGET_NR_syslog 116
+#define TARGET_NR_ptrace 117
+#define TARGET_NR_sched_setparam 118
+#define TARGET_NR_sched_setscheduler 119
+#define TARGET_NR_sched_getscheduler 120
+#define TARGET_NR_sched_getparam 121
+#define TARGET_NR_sched_setaffinity 122
+#define TARGET_NR_sched_getaffinity 123
+#define TARGET_NR_sched_yield 124
+#define TARGET_NR_sched_get_priority_max 125
+#define TARGET_NR_sched_get_priority_min 126
+#define TARGET_NR_sched_rr_get_interval 127
+#define TARGET_NR_restart_syscall 128
+#define TARGET_NR_kill 129
+#define TARGET_NR_tkill 130
+#define TARGET_NR_tgkill 131
+#define TARGET_NR_sigaltstack 132
+#define TARGET_NR_rt_sigsuspend 133
+#define TARGET_NR_rt_sigaction 134
+#define TARGET_NR_rt_sigprocmask 135
+#define TARGET_NR_rt_sigpending 136
+#define TARGET_NR_rt_sigtimedwait 137
+#define TARGET_NR_rt_sigqueueinfo 138
+#define TARGET_NR_rt_sigreturn 139
+#define TARGET_NR_setpriority 140
+#define TARGET_NR_getpriority 141
+#define TARGET_NR_reboot 142
+#define TARGET_NR_setregid 143
+#define TARGET_NR_setgid 144
+#define TARGET_NR_setreuid 145
+#define TARGET_NR_setuid 146
+#define TARGET_NR_setresuid 147
+#define TARGET_NR_getresuid 148
+#define TARGET_NR_setresgid 149
+#define TARGET_NR_getresgid 150
+#define TARGET_NR_setfsuid 151
+#define TARGET_NR_setfsgid 152
+#define TARGET_NR_times 153
+#define TARGET_NR_setpgid 154
+#define TARGET_NR_getpgid 155
+#define TARGET_NR_getsid 156
+#define TARGET_NR_setsid 157
+#define TARGET_NR_getgroups 158
+#define TARGET_NR_setgroups 159
+#define TARGET_NR_uname 160
+#define TARGET_NR_sethostname 161
+#define TARGET_NR_setdomainname 162
+#define TARGET_NR_getrlimit 163
+#define TARGET_NR_setrlimit 164
+#define TARGET_NR_getrusage 165
+#define TARGET_NR_umask 166
+#define TARGET_NR_prctl 167
+#define TARGET_NR_getcpu 168
+#define TARGET_NR_gettimeofday 169
+#define TARGET_NR_settimeofday 170
+#define TARGET_NR_adjtimex 171
+#define TARGET_NR_getpid 172
+#define TARGET_NR_getppid 173
+#define TARGET_NR_getuid 174
+#define TARGET_NR_geteuid 175
+#define TARGET_NR_getgid 176
+#define TARGET_NR_getegid 177
+#define TARGET_NR_gettid 178
+#define TARGET_NR_sysinfo 179
+#define TARGET_NR_mq_open 180
+#define TARGET_NR_mq_unlink 181
+#define TARGET_NR_mq_timedsend 182
+#define TARGET_NR_mq_timedreceive 183
+#define TARGET_NR_mq_notify 184
+#define TARGET_NR_mq_getsetattr 185
+#define TARGET_NR_msgget 186
+#define TARGET_NR_msgctl 187
+#define TARGET_NR_msgrcv 188
+#define TARGET_NR_msgsnd 189
+#define TARGET_NR_semget 190
+#define TARGET_NR_semctl 191
+#define TARGET_NR_semtimedop 192
+#define TARGET_NR_semop 193
+#define TARGET_NR_shmget 194
+#define TARGET_NR_shmctl 195
+#define TARGET_NR_shmat 196
+#define TARGET_NR_shmdt 197
+#define TARGET_NR_socket 198
+#define TARGET_NR_socketpair 199
+#define TARGET_NR_bind 200
+#define TARGET_NR_listen 201
+#define TARGET_NR_accept 202
+#define TARGET_NR_connect 203
+#define TARGET_NR_getsockname 204
+#define TARGET_NR_getpeername 205
+#define TARGET_NR_sendto 206
+#define TARGET_NR_recvfrom 207
+#define TARGET_NR_setsockopt 208
+#define TARGET_NR_getsockopt 209
+#define TARGET_NR_shutdown 210
+#define TARGET_NR_sendmsg 211
+#define TARGET_NR_recvmsg 212
+#define TARGET_NR_readahead 213
+#define TARGET_NR_brk 214
+#define TARGET_NR_munmap 215
+#define TARGET_NR_mremap 216
+#define TARGET_NR_add_key 217
+#define TARGET_NR_request_key 218
+#define TARGET_NR_keyctl 219
+#define TARGET_NR_clone 220
+#define TARGET_NR_execve 221
+#define TARGET_NR_mmap 222
+#define TARGET_NR_fadvise64 223
+#define TARGET_NR_swapon 224
+#define TARGET_NR_swapoff 225
+#define TARGET_NR_mprotect 226
+#define TARGET_NR_msync 227
+#define TARGET_NR_mlock 228
+#define TARGET_NR_munlock 229
+#define TARGET_NR_mlockall 230
+#define TARGET_NR_munlockall 231
+#define TARGET_NR_mincore 232
+#define TARGET_NR_madvise 233
+#define TARGET_NR_remap_file_pages 234
+#define TARGET_NR_mbind 235
+#define TARGET_NR_get_mempolicy 236
+#define TARGET_NR_set_mempolicy 237
+#define TARGET_NR_migrate_pages 238
+#define TARGET_NR_move_pages 239
+#define TARGET_NR_rt_tgsigqueueinfo 240
+#define TARGET_NR_perf_event_open 241
+#define TARGET_NR_accept4 242
+#define TARGET_NR_recvmmsg 243
+#define TARGET_NR_arch_specific_syscall 244
+#define TARGET_NR_wait4 260
+#define TARGET_NR_prlimit64 261
+#define TARGET_NR_fanotify_init 262
+#define TARGET_NR_fanotify_mark 263
+#define TARGET_NR_name_to_handle_at 264
+#define TARGET_NR_open_by_handle_at 265
+#define TARGET_NR_clock_adjtime 266
+#define TARGET_NR_syncfs 267
+#define TARGET_NR_setns 268
+#define TARGET_NR_sendmmsg 269
+#define TARGET_NR_process_vm_readv 270
+#define TARGET_NR_process_vm_writev 271
+#define TARGET_NR_kcmp 272
+#define TARGET_NR_finit_module 273
+#define TARGET_NR_sched_setattr 274
+#define TARGET_NR_sched_getattr 275
+#define TARGET_NR_renameat2 276
+#define TARGET_NR_seccomp 277
+#define TARGET_NR_getrandom 278
+#define TARGET_NR_memfd_create 279
+#define TARGET_NR_bpf 280
+#define TARGET_NR_execveat 281
+#define TARGET_NR_userfaultfd 282
+#define TARGET_NR_membarrier 283
+#define TARGET_NR_mlock2 284
+#define TARGET_NR_copy_file_range 285
+#define TARGET_NR_preadv2 286
+#define TARGET_NR_pwritev2 287
+#define TARGET_NR_pkey_mprotect 288
+#define TARGET_NR_pkey_alloc 289
+#define TARGET_NR_pkey_free 290
+#define TARGET_NR_statx 291
+#define TARGET_NR_io_pgetevents 292
+#define TARGET_NR_rseq 293
+#define TARGET_NR_kexec_file_load 294
+#define TARGET_NR_pidfd_send_signal 424
+#define TARGET_NR_io_uring_setup 425
+#define TARGET_NR_io_uring_enter 426
+#define TARGET_NR_io_uring_register 427
+#define TARGET_NR_open_tree 428
+#define TARGET_NR_move_mount 429
+#define TARGET_NR_fsopen 430
+#define TARGET_NR_fsconfig 431
+#define TARGET_NR_fsmount 432
+#define TARGET_NR_fspick 433
+#define TARGET_NR_pidfd_open 434
+#define TARGET_NR_clone3 435
+#define TARGET_NR_close_range 436
+#define TARGET_NR_openat2 437
+#define TARGET_NR_pidfd_getfd 438
+#define TARGET_NR_faccessat2 439
+#define TARGET_NR_process_madvise 440
+#define TARGET_NR_epoll_pwait2 441
+#define TARGET_NR_mount_setattr 442
+
+#define TARGET_NR_syscalls 443
+
+#endif
diff --git a/linux-user/loongarch64/target_cpu.h b/linux-user/loongarch64/target_cpu.h
new file mode 100644
index 0000000..c4c06dd
--- /dev/null
+++ b/linux-user/loongarch64/target_cpu.h
@@ -0,0 +1,36 @@
+/*
+ * LoongArch specific CPU ABI and functions for linux-user
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_TARGET_CPU_H
+#define LOONGARCH_TARGET_CPU_H
+
+static inline void cpu_clone_regs_child(CPULoongArchState *env,
+                                        target_ulong newsp, unsigned flags)
+{
+    if (newsp) {
+        env->active_tc.gpr[3] = newsp;
+    }
+    env->active_tc.gpr[7] = 0;
+    env->active_tc.gpr[4] = 0;
+}
+
+static inline void cpu_clone_regs_parent(CPULoongArchState *env,
+                                         unsigned flags)
+{
+}
+
+static inline void cpu_set_tls(CPULoongArchState *env, target_ulong newtls)
+{
+    env->active_tc.gpr[2] = newtls;
+}
+
+static inline abi_ulong get_sp_from_cpustate(CPULoongArchState *state)
+{
+    return state->active_tc.gpr[3];
+}
+#endif
diff --git a/linux-user/loongarch64/target_elf.h b/linux-user/loongarch64/target_elf.h
new file mode 100644
index 0000000..7c88394
--- /dev/null
+++ b/linux-user/loongarch64/target_elf.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+
+#ifndef LOONGARCH_TARGET_ELF_H
+#define LOONGARCH_TARGET_ELF_H
+static inline const char *cpu_get_model(uint32_t eflags)
+{
+    return "Loongson-3A5000";
+}
+#endif
diff --git a/linux-user/loongarch64/target_fcntl.h b/linux-user/loongarch64/target_fcntl.h
new file mode 100644
index 0000000..b810293
--- /dev/null
+++ b/linux-user/loongarch64/target_fcntl.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_TARGET_FCNTL_H
+#define LOONGARCH_TARGET_FCNTL_H
+
+#include "../generic/fcntl.h"
+
+#endif  /* LOONGARCH_TARGET_FCNTL_H */
diff --git a/linux-user/loongarch64/target_signal.h b/linux-user/loongarch64/target_signal.h
new file mode 100644
index 0000000..713c26a
--- /dev/null
+++ b/linux-user/loongarch64/target_signal.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_TARGET_SIGNAL_H
+#define LOONGARCH_TARGET_SIGNAL_H
+
+/* This struct describes the alternate stack used during signal handling. */
+typedef struct target_sigaltstack {
+        abi_long ss_sp;
+        abi_int ss_flags;
+        abi_ulong ss_size;
+} target_stack_t;
+
+/*
+ * sigaltstack controls
+ */
+#define TARGET_SS_ONSTACK     1
+#define TARGET_SS_DISABLE     2
+
+#define TARGET_MINSIGSTKSZ    2048
+#define TARGET_SIGSTKSZ       8192
+
+#include "../generic/signal.h"
+
+#endif /* LOONGARCH_TARGET_SIGNAL_H */
diff --git a/linux-user/loongarch64/target_structs.h b/linux-user/loongarch64/target_structs.h
new file mode 100644
index 0000000..818e8d6
--- /dev/null
+++ b/linux-user/loongarch64/target_structs.h
@@ -0,0 +1,49 @@
+/*
+ * LoongArch specific structures for linux-user
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_TARGET_STRUCTS_H
+#define LOONGARCH_TARGET_STRUCTS_H
+
+struct target_ipc_perm {
+    abi_int __key;                      /* Key.  */
+    abi_uint uid;                       /* Owner's user ID.  */
+    abi_uint gid;                       /* Owner's group ID.  */
+    abi_uint cuid;                      /* Creator's user ID.  */
+    abi_uint cgid;                      /* Creator's group ID.  */
+    abi_uint mode;                      /* Read/write permission.  */
+    abi_ushort __seq;                   /* Sequence number.  */
+    abi_ushort __pad1;
+    abi_ulong __unused1;
+    abi_ulong __unused2;
+};
+
+struct target_shmid_ds {
+    struct target_ipc_perm shm_perm;    /* operation permission struct */
+    abi_long shm_segsz;                 /* size of segment in bytes */
+    abi_ulong shm_atime;                /* time of last shmat() */
+    abi_ulong shm_dtime;                /* time of last shmdt() */
+    abi_ulong shm_ctime;                /* time of last change by shmctl() */
+    abi_int shm_cpid;                   /* pid of creator */
+    abi_int shm_lpid;                   /* pid of last shmop */
+    abi_ulong shm_nattch;               /* number of current attaches */
+    abi_ulong __unused1;
+    abi_ulong __unused2;
+};
+
+#define TARGET_SEMID64_DS
+
+struct target_semid64_ds {
+    struct target_ipc_perm sem_perm;
+    abi_ulong sem_otime;
+    abi_ulong sem_ctime;
+    abi_ulong sem_nsems;
+    abi_ulong __unused1;
+    abi_ulong __unused2;
+};
+
+#endif
diff --git a/linux-user/loongarch64/target_syscall.h b/linux-user/loongarch64/target_syscall.h
new file mode 100644
index 0000000..b98ac12
--- /dev/null
+++ b/linux-user/loongarch64/target_syscall.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_TARGET_SYSCALL_H
+#define LOONGARCH_TARGET_SYSCALL_H
+
+/*
+ * this struct defines the way the registers are stored on the
+ * stack during a system call.
+ */
+
+struct target_pt_regs {
+    /* Saved main processor registers. */
+    target_ulong regs[32];
+
+    /* Saved special registers. */
+    target_ulong csr_crmd;
+    target_ulong csr_prmd;
+    target_ulong csr_euen;
+    target_ulong csr_ecfg;
+    target_ulong csr_estat;
+    target_ulong csr_era;
+    target_ulong csr_badvaddr;
+    target_ulong orig_a0;
+    target_ulong __last[0];
+};
+
+#define UNAME_MACHINE "loongarch"
+#define UNAME_MINIMUM_RELEASE "4.19.0"
+
+#define TARGET_MINSIGSTKSZ 2048
+#define TARGET_MCL_CURRENT 1
+#define TARGET_MCL_FUTURE  2
+#define TARGET_MCL_ONFAULT 4
+
+#define TARGET_FORCE_SHMLBA
+
+static inline abi_ulong target_shmlba(CPULoongArchState *env)
+{
+    return 0x40000;
+}
+
+#endif
diff --git a/linux-user/loongarch64/termbits.h b/linux-user/loongarch64/termbits.h
new file mode 100644
index 0000000..33e74ed
--- /dev/null
+++ b/linux-user/loongarch64/termbits.h
@@ -0,0 +1,229 @@
+#ifndef LINUX_USER_LOONGARCH_TERMBITS_H
+#define LINUX_USER_LOONGARCH_TERMBITS_H
+
+#define TARGET_NCCS 19
+
+typedef unsigned char   target_cc_t;        /* cc_t */
+typedef unsigned int    target_speed_t;     /* speed_t */
+typedef unsigned int    target_tcflag_t;    /* tcflag_t */
+
+struct target_termios {
+    target_tcflag_t c_iflag;               /* input mode flags */
+    target_tcflag_t c_oflag;               /* output mode flags */
+    target_tcflag_t c_cflag;               /* control mode flags */
+    target_tcflag_t c_lflag;               /* local mode flags */
+    target_cc_t c_line;                    /* line discipline */
+    target_cc_t c_cc[TARGET_NCCS];         /* control characters */
+};
+
+/* c_iflag bits */
+#define TARGET_IGNBRK  0000001
+#define TARGET_BRKINT  0000002
+#define TARGET_IGNPAR  0000004
+#define TARGET_PARMRK  0000010
+#define TARGET_INPCK   0000020
+#define TARGET_ISTRIP  0000040
+#define TARGET_INLCR   0000100
+#define TARGET_IGNCR   0000200
+#define TARGET_ICRNL   0000400
+#define TARGET_IUCLC   0001000
+#define TARGET_IXON    0002000
+#define TARGET_IXANY   0004000
+#define TARGET_IXOFF   0010000
+#define TARGET_IMAXBEL 0020000
+#define TARGET_IUTF8   0040000
+
+/* c_oflag bits */
+#define TARGET_OPOST   0000001
+#define TARGET_OLCUC   0000002
+#define TARGET_ONLCR   0000004
+#define TARGET_OCRNL   0000010
+#define TARGET_ONOCR   0000020
+#define TARGET_ONLRET  0000040
+#define TARGET_OFILL   0000100
+#define TARGET_OFDEL   0000200
+#define TARGET_NLDLY   0000400
+#define   TARGET_NL0   0000000
+#define   TARGET_NL1   0000400
+#define TARGET_CRDLY   0003000
+#define   TARGET_CR0   0000000
+#define   TARGET_CR1   0001000
+#define   TARGET_CR2   0002000
+#define   TARGET_CR3   0003000
+#define TARGET_TABDLY  0014000
+#define   TARGET_TAB0  0000000
+#define   TARGET_TAB1  0004000
+#define   TARGET_TAB2  0010000
+#define   TARGET_TAB3  0014000
+#define   TARGET_XTABS 0014000
+#define TARGET_BSDLY   0020000
+#define   TARGET_BS0   0000000
+#define   TARGET_BS1   0020000
+#define TARGET_VTDLY   0040000
+#define   TARGET_VT0   0000000
+#define   TARGET_VT1   0040000
+#define TARGET_FFDLY   0100000
+#define   TARGET_FF0   0000000
+#define   TARGET_FF1   0100000
+
+/* c_cflag bit meaning */
+#define TARGET_CBAUD   0010017
+#define  TARGET_B0     0000000         /* hang up */
+#define  TARGET_B50    0000001
+#define  TARGET_B75    0000002
+#define  TARGET_B110   0000003
+#define  TARGET_B134   0000004
+#define  TARGET_B150   0000005
+#define  TARGET_B200   0000006
+#define  TARGET_B300   0000007
+#define  TARGET_B600   0000010
+#define  TARGET_B1200  0000011
+#define  TARGET_B1800  0000012
+#define  TARGET_B2400  0000013
+#define  TARGET_B4800  0000014
+#define  TARGET_B9600  0000015
+#define  TARGET_B19200 0000016
+#define  TARGET_B38400 0000017
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
+#define TARGET_CSIZE   0000060
+#define   TARGET_CS5   0000000
+#define   TARGET_CS6   0000020
+#define   TARGET_CS7   0000040
+#define   TARGET_CS8   0000060
+#define TARGET_CSTOPB  0000100
+#define TARGET_CREAD   0000200
+#define TARGET_PARENB  0000400
+#define TARGET_PARODD  0001000
+#define TARGET_HUPCL   0002000
+#define TARGET_CLOCAL  0004000
+#define TARGET_CBAUDEX 0010000
+#define  TARGET_B57600  0010001
+#define  TARGET_B115200 0010002
+#define  TARGET_B230400 0010003
+#define  TARGET_B460800 0010004
+#define TARGET_CIBAUD    002003600000  /* input baud rate (not used) */
+#define TARGET_CMSPAR    010000000000  /* mark or space (stick) parity */
+#define TARGET_CRTSCTS   020000000000  /* flow control */
+
+/* c_lflag bits */
+#define TARGET_ISIG    0000001
+#define TARGET_ICANON  0000002
+#define TARGET_XCASE   0000004
+#define TARGET_ECHO    0000010
+#define TARGET_ECHOE   0000020
+#define TARGET_ECHOK   0000040
+#define TARGET_ECHONL  0000100
+#define TARGET_NOFLSH  0000200
+#define TARGET_TOSTOP  0000400
+#define TARGET_ECHOCTL 0001000
+#define TARGET_ECHOPRT 0002000
+#define TARGET_ECHOKE  0004000
+#define TARGET_FLUSHO  0010000
+#define TARGET_PENDIN  0040000
+#define TARGET_IEXTEN  0100000
+#define TARGET_EXTPROC 0200000
+
+/* c_cc character offsets */
+#define TARGET_VINTR    0
+#define TARGET_VQUIT    1
+#define TARGET_VERASE   2
+#define TARGET_VKILL    3
+#define TARGET_VEOF     4
+#define TARGET_VTIME    5
+#define TARGET_VMIN     6
+#define TARGET_VSWTC    7
+#define TARGET_VSTART   8
+#define TARGET_VSTOP    9
+#define TARGET_VSUSP    10
+#define TARGET_VEOL     11
+#define TARGET_VREPRINT 12
+#define TARGET_VDISCARD 13
+#define TARGET_VWERASE  14
+#define TARGET_VLNEXT   15
+#define TARGET_VEOL2    16
+
+/* ioctls */
+
+#define TARGET_TCGETS           0x5401
+#define TARGET_TCSETS           0x5402
+#define TARGET_TCSETSW          0x5403
+#define TARGET_TCSETSF          0x5404
+#define TARGET_TCGETA           0x5405
+#define TARGET_TCSETA           0x5406
+#define TARGET_TCSETAW          0x5407
+#define TARGET_TCSETAF          0x5408
+#define TARGET_TCSBRK           0x5409
+#define TARGET_TCXONC           0x540A
+#define TARGET_TCFLSH           0x540B
+
+#define TARGET_TIOCEXCL         0x540C
+#define TARGET_TIOCNXCL         0x540D
+#define TARGET_TIOCSCTTY        0x540E
+#define TARGET_TIOCGPGRP        0x540F
+#define TARGET_TIOCSPGRP        0x5410
+#define TARGET_TIOCOUTQ         0x5411
+#define TARGET_TIOCSTI          0x5412
+#define TARGET_TIOCGWINSZ       0x5413
+#define TARGET_TIOCSWINSZ       0x5414
+#define TARGET_TIOCMGET         0x5415
+#define TARGET_TIOCMBIS         0x5416
+#define TARGET_TIOCMBIC         0x5417
+#define TARGET_TIOCMSET         0x5418
+#define TARGET_TIOCGSOFTCAR     0x5419
+#define TARGET_TIOCSSOFTCAR     0x541A
+#define TARGET_FIONREAD         0x541B
+#define TARGET_TIOCINQ          TARGET_FIONREAD
+#define TARGET_TIOCLINUX        0x541C
+#define TARGET_TIOCCONS         0x541D
+#define TARGET_TIOCGSERIAL      0x541E
+#define TARGET_TIOCSSERIAL      0x541F
+#define TARGET_TIOCPKT          0x5420
+#define TARGET_FIONBIO          0x5421
+#define TARGET_TIOCNOTTY        0x5422
+#define TARGET_TIOCSETD         0x5423
+#define TARGET_TIOCGETD         0x5424
+#define TARGET_TCSBRKP          0x5425 /* Needed for POSIX tcsendbreak() */
+#define TARGET_TIOCTTYGSTRUCT   0x5426 /* For debugging only */
+#define TARGET_TIOCSBRK         0x5427 /* BSD compatibility */
+#define TARGET_TIOCCBRK         0x5428 /* BSD compatibility */
+#define TARGET_TIOCGSID         0x5429 /* Return the session ID of FD */
+#define TARGET_TIOCGPTN         TARGET_IOR('T', 0x30, unsigned int)
+        /* Get Pty Number (of pty-mux device) */
+#define TARGET_TIOCSPTLCK       TARGET_IOW('T', 0x31, int)
+        /* Lock/unlock Pty */
+#define TARGET_TIOCGPTPEER      TARGET_IO('T', 0x41)
+        /* Safely open the slave */
+
+#define TARGET_FIONCLEX         0x5450  /* these numbers need to be adjusted. */
+#define TARGET_FIOCLEX          0x5451
+#define TARGET_FIOASYNC         0x5452
+#define TARGET_TIOCSERCONFIG    0x5453
+#define TARGET_TIOCSERGWILD     0x5454
+#define TARGET_TIOCSERSWILD     0x5455
+#define TARGET_TIOCGLCKTRMIOS   0x5456
+#define TARGET_TIOCSLCKTRMIOS   0x5457
+#define TARGET_TIOCSERGSTRUCT   0x5458 /* For debugging only */
+#define TARGET_TIOCSERGETLSR    0x5459 /* Get line status register */
+#define TARGET_TIOCSERGETMULTI  0x545A /* Get multiport config  */
+#define TARGET_TIOCSERSETMULTI  0x545B /* Set multiport config */
+
+#define TARGET_TIOCMIWAIT      0x545C
+        /* wait for a change on serial input line(s) */
+#define TARGET_TIOCGICOUNT     0x545D
+        /* read serial port inline interrupt counts */
+#define TARGET_TIOCGHAYESESP   0x545E  /* Get Hayes ESP configuration */
+#define TARGET_TIOCSHAYESESP   0x545F  /* Set Hayes ESP configuration */
+
+/* Used for packet mode */
+#define TARGET_TIOCPKT_DATA              0
+#define TARGET_TIOCPKT_FLUSHREAD         1
+#define TARGET_TIOCPKT_FLUSHWRITE        2
+#define TARGET_TIOCPKT_STOP              4
+#define TARGET_TIOCPKT_START             8
+#define TARGET_TIOCPKT_NOSTOP           16
+#define TARGET_TIOCPKT_DOSTOP           32
+
+#define TARGET_TIOCSER_TEMT    0x01 /* Transmitter physically empty */
+
+#endif
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index a5ce487..92e1a35 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -74,7 +74,7 @@
     || defined(TARGET_M68K) || defined(TARGET_CRIS) \
     || defined(TARGET_S390X) || defined(TARGET_OPENRISC) \
     || defined(TARGET_NIOS2) || defined(TARGET_RISCV) \
-    || defined(TARGET_XTENSA)
+    || defined(TARGET_XTENSA) || defined(TARGET_LOONGARCH64)
 
 #define TARGET_IOC_SIZEBITS	14
 #define TARGET_IOC_DIRBITS	2
@@ -450,7 +450,7 @@ struct target_dirent64 {
 #define TARGET_SIG_IGN	((abi_long)1)	/* ignore signal */
 #define TARGET_SIG_ERR	((abi_long)-1)	/* error return from signal */
 
-#ifdef TARGET_MIPS
+#if defined(TARGET_MIPS) || defined(TARGET_LOONGARCH64)
 #define TARGET_NSIG	   128
 #else
 #define TARGET_NSIG	   64
@@ -2129,7 +2129,8 @@ struct target_stat64  {
     abi_ulong __unused5;
 };
 
-#elif defined(TARGET_OPENRISC) || defined(TARGET_NIOS2) || defined(TARGET_RISCV)
+#elif defined(TARGET_OPENRISC) || defined(TARGET_NIOS2) || \
+      defined(TARGET_RISCV) || defined(TARGET_LOONGARCH64)
 
 /* These are the asm-generic versions of the stat and stat64 structures */
 
@@ -2327,7 +2328,8 @@ struct target_statfs64 {
 };
 #elif (defined(TARGET_PPC64) || defined(TARGET_X86_64) || \
        defined(TARGET_SPARC64) || defined(TARGET_AARCH64) || \
-       defined(TARGET_RISCV)) && !defined(TARGET_ABI32)
+       defined(TARGET_RISCV) || defined(TARGET_LOONGARCH64)) && \
+       !defined(TARGET_ABI32)
 struct target_statfs {
 	abi_long f_type;
 	abi_long f_bsize;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 21/22] configs: Add loongarch linux-user config
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (19 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 20/22] LoongArch Linux User Emulation Song Gao
@ 2021-07-21  9:53 ` Song Gao
  2021-07-23  6:43   ` Richard Henderson
  2021-07-21  9:53 ` [PATCH v2 22/22] target/loongarch: Add target build suport Song Gao
  21 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

Add the default configuration file for loongarch64 linux-user.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 configs/targets/loongarch64-linux-user.mak | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 configs/targets/loongarch64-linux-user.mak

diff --git a/configs/targets/loongarch64-linux-user.mak b/configs/targets/loongarch64-linux-user.mak
new file mode 100644
index 0000000..5b0acfa
--- /dev/null
+++ b/configs/targets/loongarch64-linux-user.mak
@@ -0,0 +1,3 @@
+# Default configuration for loongarch64-linux-user
+TARGET_ARCH=loongarch64
+TARGET_BASE_ARCH=loongarch
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 22/22] target/loongarch: Add target build suport
  2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
                   ` (20 preceding siblings ...)
  2021-07-21  9:53 ` [PATCH v2 21/22] configs: Add loongarch linux-user config Song Gao
@ 2021-07-21  9:53 ` Song Gao
  21 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-21  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, richard.henderson,
	laurent, maobibo, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

This patch adds build support for the loongarch-linux-user target.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/meson.build | 19 +++++++++++++++++++
 target/meson.build           |  1 +
 2 files changed, 20 insertions(+)
 create mode 100644 target/loongarch/meson.build

diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
new file mode 100644
index 0000000..54d5ca2
--- /dev/null
+++ b/target/loongarch/meson.build
@@ -0,0 +1,19 @@
+gen = decodetree.process('insns.decode')
+
+loongarch_ss = ss.source_set()
+loongarch_ss.add(files(
+  'cpu.c',
+))
+loongarch_tcg_ss = ss.source_set()
+loongarch_tcg_ss.add(gen)
+loongarch_tcg_ss.add(files(
+  'fpu_helper.c',
+  'op_helper.c',
+  'tlb_helper.c',
+  'translate.c',
+))
+loongarch_tcg_ss.add(zlib)
+
+loongarch_ss.add_all(when: 'CONFIG_TCG', if_true: [loongarch_tcg_ss])
+
+target_arch += {'loongarch': loongarch_ss}
diff --git a/target/meson.build b/target/meson.build
index 2f69402..a53a604 100644
--- a/target/meson.build
+++ b/target/meson.build
@@ -5,6 +5,7 @@ subdir('cris')
 subdir('hexagon')
 subdir('hppa')
 subdir('i386')
+subdir('loongarch')
 subdir('m68k')
 subdir('microblaze')
 subdir('mips')
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-21  9:53 ` [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation Song Gao
@ 2021-07-21 17:38   ` Philippe Mathieu-Daudé
  2021-07-21 17:49     ` Philippe Mathieu-Daudé
  2021-07-23  0:46   ` Richard Henderson
  1 sibling, 1 reply; 76+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-07-21 17:38 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, richard.henderson, laurent,
	maobibo, yangxiaojuan, alistair.francis, pbonzini, alex.bennee

On 7/21/21 11:53 AM, Song Gao wrote:
> This patch implement fixed point arithemtic instruction translation.
> 
> This includes:
> - ADD.{W/D}, SUB.{W/D}
> - ADDI.{W/D}, ADDU16ID
> - ALSL.{W[U]/D}
> - LU12I.W, LU32I.D LU52I.D
> - SLT[U], SLT[U]I
> - PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
> - AND, OR, NOR, XOR, ANDN, ORN
> - MUL.{W/D}, MULH.{W[U]/D[U]}
> - MULW.D.W[U]
> - DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
> - ANDI, ORI, XORI
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>  target/loongarch/insns.decode |   89 ++++
>  target/loongarch/trans.inc.c  | 1090 +++++++++++++++++++++++++++++++++++++++++
>  target/loongarch/translate.c  |   12 +
>  target/loongarch/translate.h  |    1 +
>  4 files changed, 1192 insertions(+)
>  create mode 100644 target/loongarch/insns.decode
>  create mode 100644 target/loongarch/trans.inc.c

Please don't include all .inc.c in one big translate.c...

> diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
> index 531f7e1..b60bdc2 100644
> --- a/target/loongarch/translate.c
> +++ b/target/loongarch/translate.c
> @@ -57,6 +57,15 @@ void gen_load_gpr(TCGv t, int reg)
>      }
>  }
>  
> +TCGv get_gpr(int regno)
> +{
> +    if (regno == 0) {
> +        return tcg_constant_tl(0);
> +    } else {
> +        return cpu_gpr[regno];
> +    }
> +}
> +
>  static inline void gen_save_pc(target_ulong pc)

... expose this one ...

>  {
>      tcg_gen_movi_tl(cpu_PC, pc);
> @@ -287,6 +296,9 @@ static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
>      return true;
>  }
>  
> +#include "decode-insns.c.inc"

... and move this include to "trans.c".

> +#include "trans.inc.c"

removing this include.

>  static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
>  {
>      CPULoongArchState *env = cs->env_ptr;
> diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
> index 333c3bf..ef4d4e7 100644
> --- a/target/loongarch/translate.h
> +++ b/target/loongarch/translate.h
> @@ -35,6 +35,7 @@ void check_fpu_enabled(DisasContext *ctx);
>  
>  void gen_base_offset_addr(TCGv addr, int base, int offset);
>  void gen_load_gpr(TCGv t, int reg);
> +TCGv get_gpr(int regno);
>  void gen_load_fpr32(TCGv_i32 t, int reg);
>  void gen_load_fpr64(TCGv_i64 t, int reg);
>  void gen_store_fpr32(TCGv_i32 t, int reg);
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-21  9:53 ` [PATCH v2 09/22] target/loongarch: Add fixed point bit " Song Gao
@ 2021-07-21 17:46   ` Philippe Mathieu-Daudé
  2021-07-22  8:17     ` Song Gao
  2021-07-23  1:29   ` Richard Henderson
  1 sibling, 1 reply; 76+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-07-21 17:46 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, richard.henderson, laurent,
	maobibo, yangxiaojuan, alistair.francis, pbonzini, alex.bennee

On 7/21/21 11:53 AM, Song Gao wrote:
> This patch implement fixed point bit instruction translation.
> 
> This includes:
> - EXT.W.{B/H}
> - CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
> - BYTEPICK.{W/D}
> - REVB.{2H/4H/2W/D}
> - REVH.{2W/D}
> - BITREV.{4B/8B}, BITREV.{W/D}
> - BSTRINS.{W/D}, BSTRPICK.{W/D}
> - MASKEQZ, MASKNEZ
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>  target/loongarch/helper.h     |  10 +
>  target/loongarch/insns.decode |  45 +++
>  target/loongarch/op_helper.c  | 119 ++++++++
>  target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 839 insertions(+)

> diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
> index b2cbdd7..07c3d52 100644
> --- a/target/loongarch/op_helper.c
> +++ b/target/loongarch/op_helper.c
> @@ -25,3 +25,122 @@ void helper_raise_exception(CPULoongArchState *env, uint32_t exception)
>  {
>      do_raise_exception(env, exception, GETPC());
>  }
> +
> +target_ulong helper_cto_w(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint32_t v = (uint32_t)rj;
> +    int temp = 0;
> +
> +    while ((v & 0x1) == 1) {
> +        temp++;
> +        v = v >> 1;
> +    }

Why not use cto32() from "qemu/host-utils.h"?

> +
> +    return (target_ulong)temp;
> +}
> +
> +target_ulong helper_ctz_w(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint32_t v = (uint32_t)rj;
> +
> +    if (v == 0) {
> +        return 32;
> +    }
> +
> +    int temp = 0;
> +    while ((v & 0x1) == 0) {
> +        temp++;
> +        v = v >> 1;
> +    }

ctz32

> +
> +    return (target_ulong)temp;
> +}
> +
> +target_ulong helper_cto_d(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint64_t v = rj;
> +    int temp = 0;
> +
> +    while ((v & 0x1) == 1) {
> +        temp++;
> +        v = v >> 1;
> +    }

cto64

> +
> +    return (target_ulong)temp;
> +}
> +
> +target_ulong helper_ctz_d(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint64_t v = rj;
> +
> +    if (v == 0) {
> +        return 64;
> +    }
> +
> +    int temp = 0;
> +    while ((v & 0x1) == 0) {
> +        temp++;
> +        v = v >> 1;
> +    }

and ctz64?

> +
> +    return (target_ulong)temp;
> +}
> +
> +target_ulong helper_bitrev_w(CPULoongArchState *env, target_ulong rj)
> +{
> +    int32_t v = (int32_t)rj;
> +    const int SIZE = 32;
> +    uint8_t bytes[SIZE];
> +
> +    int i;
> +    for (i = 0; i < SIZE; i++) {
> +        bytes[i] = v & 0x1;
> +        v = v >> 1;
> +    }
> +    /* v == 0 */
> +    for (i = 0; i < SIZE; i++) {
> +        v = v | ((uint32_t)bytes[i] << (SIZE - 1 - i));
> +    }
> +
> +    return (target_ulong)(int32_t)v;
> +}
> +
> +target_ulong helper_bitrev_d(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint64_t v = rj;
> +    const int SIZE = 64;
> +    uint8_t bytes[SIZE];
> +
> +    int i;
> +    for (i = 0; i < SIZE; i++) {
> +        bytes[i] = v & 0x1;
> +        v = v >> 1;
> +    }
> +    /* v == 0 */
> +    for (i = 0; i < SIZE; i++) {
> +        v = v | ((uint64_t)bytes[i] << (SIZE - 1 - i));
> +    }
> +
> +    return (target_ulong)v;
> +}
> +
> +static inline target_ulong bitswap(target_ulong v)
> +{
> +    v = ((v >> 1) & (target_ulong)0x5555555555555555ULL) |
> +        ((v & (target_ulong)0x5555555555555555ULL) << 1);
> +    v = ((v >> 2) & (target_ulong)0x3333333333333333ULL) |
> +        ((v & (target_ulong)0x3333333333333333ULL) << 2);
> +    v = ((v >> 4) & (target_ulong)0x0F0F0F0F0F0F0F0FULL) |
> +        ((v & (target_ulong)0x0F0F0F0F0F0F0F0FULL) << 4);
> +    return v;

Is this revbit64?

> +}
> +
> +target_ulong helper_loongarch_dbitswap(target_ulong rj)
> +{
> +    return bitswap(rj);
> +}
> +
> +target_ulong helper_loongarch_bitswap(target_ulong rt)
> +{
> +    return (int32_t)bitswap(rt);
> +}


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-21 17:38   ` Philippe Mathieu-Daudé
@ 2021-07-21 17:49     ` Philippe Mathieu-Daudé
  2021-07-22  7:41       ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-07-21 17:49 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, richard.henderson, laurent,
	maobibo, yangxiaojuan, alistair.francis, pbonzini, alex.bennee

On 7/21/21 7:38 PM, Philippe Mathieu-Daudé wrote:
> On 7/21/21 11:53 AM, Song Gao wrote:
>> This patch implement fixed point arithemtic instruction translation.

Typo arithmetic.

>>
>> This includes:
>> - ADD.{W/D}, SUB.{W/D}
>> - ADDI.{W/D}, ADDU16ID
>> - ALSL.{W[U]/D}
>> - LU12I.W, LU32I.D LU52I.D
>> - SLT[U], SLT[U]I
>> - PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
>> - AND, OR, NOR, XOR, ANDN, ORN
>> - MUL.{W/D}, MULH.{W[U]/D[U]}
>> - MULW.D.W[U]
>> - DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
>> - ANDI, ORI, XORI
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>  target/loongarch/insns.decode |   89 ++++
>>  target/loongarch/trans.inc.c  | 1090 +++++++++++++++++++++++++++++++++++++++++
>>  target/loongarch/translate.c  |   12 +
>>  target/loongarch/translate.h  |    1 +
>>  4 files changed, 1192 insertions(+)
>>  create mode 100644 target/loongarch/insns.decode
>>  create mode 100644 target/loongarch/trans.inc.c
> 
> Please don't include all .inc.c in one big translate.c...
> 
>> diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
>> index 531f7e1..b60bdc2 100644
>> --- a/target/loongarch/translate.c
>> +++ b/target/loongarch/translate.c
>> @@ -57,6 +57,15 @@ void gen_load_gpr(TCGv t, int reg)
>>      }
>>  }
>>  
>> +TCGv get_gpr(int regno)
>> +{
>> +    if (regno == 0) {
>> +        return tcg_constant_tl(0);
>> +    } else {
>> +        return cpu_gpr[regno];
>> +    }
>> +}
>> +
>>  static inline void gen_save_pc(target_ulong pc)
> 
> ... expose this one ...
> 
>>  {
>>      tcg_gen_movi_tl(cpu_PC, pc);
>> @@ -287,6 +296,9 @@ static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
>>      return true;
>>  }
>>  
>> +#include "decode-insns.c.inc"
> 
> ... and move this include to "trans.c".

Since you have the luck of adding a new architecture, you could
start cleanly from scratch and add groups of instructions, so
this patch would add "trans_arithmetic.c", etc. in the series.

>> +#include "trans.inc.c"
> 
> removing this include.
> 
>>  static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
>>  {
>>      CPULoongArchState *env = cs->env_ptr;
>> diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
>> index 333c3bf..ef4d4e7 100644
>> --- a/target/loongarch/translate.h
>> +++ b/target/loongarch/translate.h
>> @@ -35,6 +35,7 @@ void check_fpu_enabled(DisasContext *ctx);
>>  
>>  void gen_base_offset_addr(TCGv addr, int base, int offset);
>>  void gen_load_gpr(TCGv t, int reg);
>> +TCGv get_gpr(int regno);
>>  void gen_load_fpr32(TCGv_i32 t, int reg);
>>  void gen_load_fpr64(TCGv_i64 t, int reg);
>>  void gen_store_fpr32(TCGv_i32 t, int reg);
>>
> 
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-21 17:49     ` Philippe Mathieu-Daudé
@ 2021-07-22  7:41       ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-22  7:41 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: peter.maydell, thuth, chenhuacai, richard.henderson, qemu-devel,
	maobibo, laurent, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

Hi, Philippe,

On 07/22/2021 01:49 AM, Philippe Mathieu-Daudé wrote:
> On 7/21/21 7:38 PM, Philippe Mathieu-Daudé wrote:
>> On 7/21/21 11:53 AM, Song Gao wrote:
>>> This patch implement fixed point arithemtic instruction translation.
> 
> Typo arithmetic.
> 
>>>
>>> This includes:
>>> - ADD.{W/D}, SUB.{W/D}
>>> - ADDI.{W/D}, ADDU16ID
>>> - ALSL.{W[U]/D}
>>> - LU12I.W, LU32I.D LU52I.D
>>> - SLT[U], SLT[U]I
>>> - PCADDI, PCADDU12I, PCADDU18I, PCALAU12I
>>> - AND, OR, NOR, XOR, ANDN, ORN
>>> - MUL.{W/D}, MULH.{W[U]/D[U]}
>>> - MULW.D.W[U]
>>> - DIV.{W[U]/D[U]}, MOD.{W[U]/D[U]}
>>> - ANDI, ORI, XORI
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>  target/loongarch/insns.decode |   89 ++++
>>>  target/loongarch/trans.inc.c  | 1090 +++++++++++++++++++++++++++++++++++++++++
>>>  target/loongarch/translate.c  |   12 +
>>>  target/loongarch/translate.h  |    1 +
>>>  4 files changed, 1192 insertions(+)
>>>  create mode 100644 target/loongarch/insns.decode
>>>  create mode 100644 target/loongarch/trans.inc.c
>>
>> Please don't include all .inc.c in one big translate.c...
>>
>>> diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
>>> index 531f7e1..b60bdc2 100644
>>> --- a/target/loongarch/translate.c
>>> +++ b/target/loongarch/translate.c
>>> @@ -57,6 +57,15 @@ void gen_load_gpr(TCGv t, int reg)
>>>      }
>>>  }
>>>  
>>> +TCGv get_gpr(int regno)
>>> +{
>>> +    if (regno == 0) {
>>> +        return tcg_constant_tl(0);
>>> +    } else {
>>> +        return cpu_gpr[regno];
>>> +    }
>>> +}
>>> +
>>>  static inline void gen_save_pc(target_ulong pc)
>>
>> ... expose this one ...
>>
>>>  {
>>>      tcg_gen_movi_tl(cpu_PC, pc);
>>> @@ -287,6 +296,9 @@ static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
>>>      return true;
>>>  }
>>>  
>>> +#include "decode-insns.c.inc"
>>
>> ... and move this include to "trans.c".
> 
> Since you have the luck to add a new architecture, you could
> start cleanly from scratch and add group of instructions, so
> this patch would add "trans_arithmetic.c", etc.. in the series.
> 

Got it. The file trans.inc.c does seem too big ...

Thanks,
Song Gao




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-21 17:46   ` Philippe Mathieu-Daudé
@ 2021-07-22  8:17     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-22  8:17 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: peter.maydell, thuth, chenhuacai, richard.henderson, qemu-devel,
	maobibo, laurent, yangxiaojuan, alistair.francis, pbonzini,
	alex.bennee

Hi, Philippe

On 07/22/2021 01:46 AM, Philippe Mathieu-Daudé wrote:
> On 7/21/21 11:53 AM, Song Gao wrote:
>> This patch implement fixed point bit instruction translation.
>>
>> This includes:
>> - EXT.W.{B/H}
>> - CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
>> - BYTEPICK.{W/D}
>> - REVB.{2H/4H/2W/D}
>> - REVH.{2W/D}
>> - BITREV.{4B/8B}, BITREV.{W/D}
>> - BSTRINS.{W/D}, BSTRPICK.{W/D}
>> - MASKEQZ, MASKNEZ
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>  target/loongarch/helper.h     |  10 +
>>  target/loongarch/insns.decode |  45 +++
>>  target/loongarch/op_helper.c  | 119 ++++++++
>>  target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 839 insertions(+)
> 
>> diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
>> index b2cbdd7..07c3d52 100644
>> --- a/target/loongarch/op_helper.c
>> +++ b/target/loongarch/op_helper.c
>> @@ -25,3 +25,122 @@ void helper_raise_exception(CPULoongArchState *env, uint32_t exception)
>>  {
>>      do_raise_exception(env, exception, GETPC());
>>  }
>> +
>> +target_ulong helper_cto_w(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint32_t v = (uint32_t)rj;
>> +    int temp = 0;
>> +
>> +    while ((v & 0x1) == 1) {
>> +        temp++;
>> +        v = v >> 1;
>> +    }
> 
> Why not use cto32() from "qemu/host-utils.h"
>
>> +
>> +    return (target_ulong)temp;
>> +}
>> +
>> +target_ulong helper_ctz_w(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint32_t v = (uint32_t)rj;
>> +
>> +    if (v == 0) {
>> +        return 32;
>> +    }
>> +
>> +    int temp = 0;
>> +    while ((v & 0x1) == 0) {
>> +        temp++;
>> +        v = v >> 1;
>> +    }
> 
> ctz32
> 
>> +
>> +    return (target_ulong)temp;
>> +}
>> +
>> +target_ulong helper_cto_d(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint64_t v = rj;
>> +    int temp = 0;
>> +
>> +    while ((v & 0x1) == 1) {
>> +        temp++;
>> +        v = v >> 1;
>> +    }
> 
> cto64
> 
>> +
>> +    return (target_ulong)temp;
>> +}
>> +
>> +target_ulong helper_ctz_d(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint64_t v = rj;
>> +
>> +    if (v == 0) {
>> +        return 64;
>> +    }
>> +
>> +    int temp = 0;
>> +    while ((v & 0x1) == 0) {
>> +        temp++;
>> +        v = v >> 1;
>> +    }
> 
> and ctz64?
> 

Yes, I hadn't noticed "qemu/host-utils.h" before. Thanks for your kind help!

>> +
>> +    return (target_ulong)temp;
>> +}
>> +
>> +target_ulong helper_bitrev_w(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    int32_t v = (int32_t)rj;
>> +    const int SIZE = 32;
>> +    uint8_t bytes[SIZE];
>> +
>> +    int i;
>> +    for (i = 0; i < SIZE; i++) {
>> +        bytes[i] = v & 0x1;
>> +        v = v >> 1;
>> +    }
>> +    /* v == 0 */
>> +    for (i = 0; i < SIZE; i++) {
>> +        v = v | ((uint32_t)bytes[i] << (SIZE - 1 - i));
>> +    }
>> +
>> +    return (target_ulong)(int32_t)v;
>> +}
>> +
>> +target_ulong helper_bitrev_d(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint64_t v = rj;
>> +    const int SIZE = 64;
>> +    uint8_t bytes[SIZE];
>> +
>> +    int i;
>> +    for (i = 0; i < SIZE; i++) {
>> +        bytes[i] = v & 0x1;
>> +        v = v >> 1;
>> +    }
>> +    /* v == 0 */
>> +    for (i = 0; i < SIZE; i++) {
>> +        v = v | ((uint64_t)bytes[i] << (SIZE - 1 - i));
>> +    }
>> +
>> +    return (target_ulong)v;
>> +}
>> +
>> +static inline target_ulong bitswap(target_ulong v)
>> +{
>> +    v = ((v >> 1) & (target_ulong)0x5555555555555555ULL) |
>> +        ((v & (target_ulong)0x5555555555555555ULL) << 1);
>> +    v = ((v >> 2) & (target_ulong)0x3333333333333333ULL) |
>> +        ((v & (target_ulong)0x3333333333333333ULL) << 2);
>> +    v = ((v >> 4) & (target_ulong)0x0F0F0F0F0F0F0F0FULL) |
>> +        ((v & (target_ulong)0x0F0F0F0F0F0F0F0FULL) << 4);
>> +    return v;
> 
> Is this revbit64?
> 

No, helper_bitrev_d is the revbit64 equivalent (the LoongArch insn is 'bitrev.d rd, rj').

The bitswap function implements the 'bitrev.4b/8b rd, rj' instructions,
which reverse the bits within each byte:

    BITREV.4B:
      bstr32[31:24] = BITREV(GR[rj][31:24])
      bstr32[23:16] = BITREV(GR[rj][23:16])
      bstr32[15: 8] = BITREV(GR[rj][15: 8])
      bstr32[ 7: 0] = BITREV(GR[rj][ 7: 0])
      GR[rd] = SignExtend(bstr32, GRLEN)
     
    BITREV.8B:
      GR[rd][63:56] = BITREV(GR[rj][63:56])
      GR[rd][55:48] = BITREV(GR[rj][55:48])
      GR[rd][47:40] = BITREV(GR[rj][47:40])
      GR[rd][39:32] = BITREV(GR[rj][39:32])
      GR[rd][31:24] = BITREV(GR[rj][31:24])
      GR[rd][23:16] = BITREV(GR[rj][23:16])
      GR[rd][15: 8] = BITREV(GR[rj][15: 8])
      GR[rd][ 7: 0] = BITREV(GR[rj][ 7: 0])

A detailed description is in section 2.2.3.6 of [1].

[1] : https://github.com/loongson/LoongArch-Documentation/releases/download/LoongArch-Vol1-v3/LoongArch-Vol1-v1.00-EN.pdf
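
The difference between the per-byte and the whole-register reversal can be sketched in standalone C. This is only an illustration under stated assumptions: the *_sketch() names are hypothetical, the width is fixed at 64 bits instead of target_ulong, __builtin_bswap64 is a GCC/Clang builtin, and the sign extension relies on the usual two's-complement conversion to int32_t.

```c
#include <stdint.h>

/*
 * BITREV.8B: reverse the bits inside every byte; byte positions stay put.
 * Same mask-and-shift steps as bitswap() in the patch.
 */
static inline uint64_t bitrev_8b_sketch(uint64_t v)
{
    v = ((v >> 1) & 0x5555555555555555ULL) |
        ((v & 0x5555555555555555ULL) << 1);
    v = ((v >> 2) & 0x3333333333333333ULL) |
        ((v & 0x3333333333333333ULL) << 2);
    v = ((v >> 4) & 0x0F0F0F0F0F0F0F0FULL) |
        ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
    return v;
}

/* BITREV.4B: per-byte reversal of the low 32 bits, then sign-extend. */
static inline uint64_t bitrev_4b_sketch(uint64_t v)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)bitrev_8b_sketch(v);
}

/* BITREV.D: reverse all 64 bits, i.e. per-byte reversal plus a byte swap. */
static inline uint64_t bitrev_d_sketch(uint64_t v)
{
    return __builtin_bswap64(bitrev_8b_sketch(v));
}
```

For an input of 1, the per-byte form moves the set bit to bit 7 of the same byte, while the whole-register form moves it to bit 63.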

Thanks
Song Gao



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/22] target/loongarch: Add core definition
  2021-07-21  9:52 ` [PATCH v2 03/22] target/loongarch: Add core definition Song Gao
@ 2021-07-22 22:43   ` Richard Henderson
  2021-07-26  8:47     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-22 22:43 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:52 PM, Song Gao wrote:
> This patch add target state header, target definitions
> and initialization routines.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/cpu-param.h |  21 ++++
>   target/loongarch/cpu-qom.h   |  40 ++++++
>   target/loongarch/cpu.c       | 293 +++++++++++++++++++++++++++++++++++++++++++
>   target/loongarch/cpu.h       | 265 ++++++++++++++++++++++++++++++++++++++
>   4 files changed, 619 insertions(+)
>   create mode 100644 target/loongarch/cpu-param.h
>   create mode 100644 target/loongarch/cpu-qom.h
>   create mode 100644 target/loongarch/cpu.c
>   create mode 100644 target/loongarch/cpu.h
> 
> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
> new file mode 100644
> index 0000000..582ee29
> --- /dev/null
> +++ b/target/loongarch/cpu-param.h
> @@ -0,0 +1,21 @@
> +/*
> + * LoongArch cpu parameters for qemu.
> + *
> + * Copyright (c) 2021 Loongson Technology Corporation Limited
> + *
> + * SPDX-License-Identifier: LGPL-2.1+
> + */
> +
> +#ifndef LOONGARCH_CPU_PARAM_H
> +#define LOONGARCH_CPU_PARAM_H 1
> +
> +#ifdef TARGET_LOONGARCH64
> +#define TARGET_LONG_BITS 64

Why the ifdef for TARGET_LOONGARCH64?
Nothing will compile without that set.

> +#ifdef CONFIG_TCG
> +static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
> +                                              const TranslationBlock *tb)
> +{
> +    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
> +    CPULoongArchState *env = &cpu->env;
> +
> +    env->active_tc.PC = tb->pc;
> +    env->hflags &= ~LOONGARCH_HFLAG_BMASK;
> +    env->hflags |= tb->flags & LOONGARCH_HFLAG_BMASK;
> +}

Loongarch has no branch delay slots, so you should not have replicated the mips branch 
delay slot handling.  There should be no BMASK at all.

> +#ifdef CONFIG_TCG
> +#include "hw/core/tcg-cpu-ops.h"
> +
> +static struct TCGCPUOps loongarch_tcg_ops = {
> +    .initialize = loongarch_tcg_init,
> +    .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
> +};
> +#endif /* CONFIG_TCG */

May I presume that Loongarch has virtualization hardware, and will eventually support KVM? 
  If not, there is no need for CONFIG_TCG anywhere.

> +#define TCG_GUEST_DEFAULT_MO (0)
> +#define UNASSIGNED_CPU_ID 0xFFFFFFFF
> +
> +typedef union fpr_t fpr_t;
> +union fpr_t {
> +    float64  fd;   /* ieee double precision */
> +    float32  fs[2];/* ieee single precision */
> +    uint64_t d;    /* binary double fixed-point */
> +    uint32_t w[2]; /* binary single fixed-point */
> +};

For what it's worth, we already have a CPU_DoubleU type that could be used.  But frankly, 
float64 *is* uint64_t, so there's very little use in putting them together into a union. 
It would seem that you don't even use fs and w for more than fpu_dump_state, and you're 
even doing it wrong there.

> +typedef struct CPULoongArchFPUContext CPULoongArchFPUContext;
> +struct CPULoongArchFPUContext {
> +    /* Floating point registers */
> +    fpr_t fpr[32];
> +    float_status fp_status;
> +
> +    bool cf[8];
> +    /*
> +     * fcsr0
> +     * 31:29 |28:24 |23:21 |20:16 |15:10 |9:8 |7  |6  |5 |4:0
> +     *        Cause         Flags         RM   DAE TM     Enables
> +     */
> +    uint32_t fcsr0;
> +    uint32_t fcsr0_mask;
> +    uint32_t vcsr16;
> +
> +#define FCSR0_M1    0xdf         /* FCSR1 mask, DAE, TM and Enables */
> +#define FCSR0_M2    0x1f1f0000   /* FCSR2 mask, Cause and Flags */
> +#define FCSR0_M3    0x300        /* FCSR3 mask, Round Mode */
> +#define FCSR0_RM    8            /* Round Mode bit num on fcsr0 */
> +#define GET_FP_CAUSE(reg)        (((reg) >> 24) & 0x1f)
> +#define GET_FP_ENABLE(reg)       (((reg) >>  0) & 0x1f)
> +#define GET_FP_FLAGS(reg)        (((reg) >> 16) & 0x1f)
> +#define SET_FP_CAUSE(reg, v)      do { (reg) = ((reg) & ~(0x1f << 24)) | \
> +                                               ((v & 0x1f) << 24);       \
> +                                     } while (0)
> +#define SET_FP_ENABLE(reg, v)     do { (reg) = ((reg) & ~(0x1f <<  0)) | \
> +                                               ((v & 0x1f) << 0);        \
> +                                     } while (0)
> +#define SET_FP_FLAGS(reg, v)      do { (reg) = ((reg) & ~(0x1f << 16)) | \
> +                                               ((v & 0x1f) << 16);       \
> +                                     } while (0)
> +#define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 16); } while (0)
> +#define FP_INEXACT        1
> +#define FP_UNDERFLOW      2
> +#define FP_OVERFLOW       4
> +#define FP_DIV0           8
> +#define FP_INVALID        16
> +};
> +
> +#define TARGET_INSN_START_EXTRA_WORDS 2
> +#define LOONGARCH_FPU_MAX 1
> +#define N_IRQS      14
> +
> +enum loongarch_feature {
> +    LA_FEATURE_3A5000,
> +};
> +
> +typedef struct TCState TCState;
> +struct TCState {
> +    target_ulong gpr[32];
> +    target_ulong PC;
> +};
> +
> +typedef struct CPULoongArchState CPULoongArchState;
> +struct CPULoongArchState {
> +    TCState active_tc;
> +    CPULoongArchFPUContext active_fpu;

Please don't replicate the mips foolishness with active_tc and active_fpu.  There is no 
inactive_fpu with which to contrast this.  Just include these fields directly into the 
main CPULoongArchState structure.

> +
> +    uint32_t current_tc;
> +    uint64_t scr[4];
> +    uint32_t current_fpu;
> +
> +    /* LoongArch CSR register */
> +    CPU_LOONGARCH_CSR
> +    target_ulong lladdr; /* LL virtual address compared against SC */
> +    target_ulong llval;
> +
> +    CPULoongArchFPUContext fpus[LOONGARCH_FPU_MAX];

More copying from MIPS?  What is this for?


> +
> +    /* QEMU */
> +    int error_code;
> +    uint32_t hflags;    /* CPU State */
> +#define TLB_NOMATCH   0x1
> +#define INST_INAVAIL  0x2 /* Invalid instruction word for BadInstr */
> +    /* TMASK defines different execution modes */
> +#define LOONGARCH_HFLAG_TMASK  0x1F5807FF
> +#define LOONGARCH_HFLAG_KU     0x00003 /* kernel/supervisor/user mode mask   */
> +#define LOONGARCH_HFLAG_UM     0x00003 /* user mode flag                     */
> +#define LOONGARCH_HFLAG_KM     0x00000 /* kernel mode flag                   */
> +#define LOONGARCH_HFLAG_64     0x00008 /* 64-bit instructions enabled        */

Is there a 32-bit mode for LoongArch?  I don't see this bit in CRMD.  This bit overlaps 
the "Direct address translation mode enable bit", which does sound like it should be 
present in tb->flags.

> +#define LOONGARCH_HFLAG_FPU    0x00020 /* FPU enabled                        */
> +#define LOONGARCH_HFLAG_F64    0x00040 /* 64-bit FPU enabled                 */

I don't see that there is a mode-switch for a 32-bit fpu either.

> +#define LOONGARCH_HFLAG_BMASK  0x3800
> +#define LOONGARCH_HFLAG_B      0x00800 /* Unconditional branch               */
> +#define LOONGARCH_HFLAG_BC     0x01000 /* Conditional branch                 */
> +#define LOONGARCH_HFLAG_BR     0x02000 /* branch to register (can't link TB) */

None of the BMASK stuff applies to LoongArch.


> +#define LOONGARCH_HFLAG_FRE   0x2000000 /* FRE enabled */
> +#define LOONGARCH_HFLAG_ELPA  0x4000000
> +    target_ulong btarget;        /* Jump / branch target               */
> +    target_ulong bcond;          /* Branch condition (if needed)       */

Nor this.

> +static inline LoongArchCPU *loongarch_env_get_cpu(CPULoongArchState *env)
> +{
> +    return container_of(env, LoongArchCPU, env);
> +}
> +
> +#define ENV_GET_CPU(e) CPU(loongarch_env_get_cpu(e))

You have copied this from a very old version of qemu.  These were replaced by generic 
functions in include/exec/cpu-all.h.

> +void loongarch_tcg_init(void);
> +
> +void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
> +
> +void QEMU_NORETURN do_raise_exception_err(CPULoongArchState *env,
> +                                          uint32_t exception,
> +                                          int error_code,
> +                                          uintptr_t pc);
> +
> +static inline void QEMU_NORETURN do_raise_exception(CPULoongArchState *env,
> +                                                    uint32_t exception,
> +                                                    uintptr_t pc)
> +{
> +    do_raise_exception_err(env, exception, 0, pc);
> +}
> +
> +static inline void compute_hflags(CPULoongArchState *env)
> +{
> +    env->hflags &= ~(LOONGARCH_HFLAG_64 | LOONGARCH_HFLAG_FPU |
> +                     LOONGARCH_HFLAG_KU | LOONGARCH_HFLAG_ELPA);
> +
> +    env->hflags |= (env->CSR_CRMD & CSR_CRMD_PLV);
> +    env->hflags |= LOONGARCH_HFLAG_64;
> +
> +    if (env->CSR_EUEN & CSR_EUEN_FPEN) {
> +        env->hflags |= LOONGARCH_HFLAG_FPU;
> +    }
> +}
> +
> +const char *loongarch_exception_name(int32_t exception);

These should not be declared in cpu.h.


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 04/22] target/loongarch: Add interrupt handling support
  2021-07-21  9:53 ` [PATCH v2 04/22] target/loongarch: Add interrupt handling support Song Gao
@ 2021-07-22 22:47   ` Richard Henderson
  2021-07-26  9:23     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-22 22:47 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +bool loongarch_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
> +{
> +    if (interrupt_request & CPU_INTERRUPT_HARD) {
> +        LoongArchCPU *cpu = LOONGARCH_CPU(cs);
> +        CPULoongArchState *env = &cpu->env;
> +
> +        if (cpu_loongarch_hw_interrupts_enabled(env) &&
> +            cpu_loongarch_hw_interrupts_pending(env)) {
> +            cs->exception_index = EXCP_INTE;
> +            env->error_code = 0;
> +            loongarch_cpu_do_interrupt(cs);
> +            return true;
> +        }
> +    }
> +    return false;
> +}

Not sure what you're doing here, with user-only.  None of these conditions apply.


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 05/22] target/loongarch: Add memory management support
  2021-07-21  9:53 ` [PATCH v2 05/22] target/loongarch: Add memory management support Song Gao
@ 2021-07-22 22:48   ` Richard Henderson
  2021-07-26  9:25     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-22 22:48 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> This patch introduces one memory-management-related functions
> - loongarch_cpu_tlb_fill()
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/cpu.c        |   1 +
>   target/loongarch/cpu.h        |   9 ++++
>   target/loongarch/tlb_helper.c | 103 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 113 insertions(+)
>   create mode 100644 target/loongarch/tlb_helper.c
> 
> diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
> index 8eaa778..6269dd9 100644
> --- a/target/loongarch/cpu.c
> +++ b/target/loongarch/cpu.c
> @@ -269,6 +269,7 @@ static struct TCGCPUOps loongarch_tcg_ops = {
>       .initialize = loongarch_tcg_init,
>       .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
>       .cpu_exec_interrupt = loongarch_cpu_exec_interrupt,
> +    .tlb_fill = loongarch_cpu_tlb_fill,
>   };
>   #endif /* CONFIG_TCG */
>   
> diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
> index 1db8bb5..5c06122 100644
> --- a/target/loongarch/cpu.h
> +++ b/target/loongarch/cpu.h
> @@ -287,4 +287,13 @@ static inline void compute_hflags(CPULoongArchState *env)
>   
>   const char *loongarch_exception_name(int32_t exception);
>   
> +/* tlb_helper.c */
> +bool loongarch_cpu_tlb_fill(CPUState *cs,
> +                            vaddr address,
> +                            int size,
> +                            MMUAccessType access_type,
> +                            int mmu_idx,
> +                            bool probe,
> +                            uintptr_t retaddr);
> +
>   #endif /* LOONGARCH_CPU_H */
> diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
> new file mode 100644
> index 0000000..b59a995
> --- /dev/null
> +++ b/target/loongarch/tlb_helper.c
> @@ -0,0 +1,103 @@
> +/*
> + * LoongArch tlb emulation helpers for qemu.
> + *
> + * Copyright (c) 2021 Loongson Technology Corporation Limited
> + *
> + * SPDX-License-Identifier: LGPL-2.1+
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "cpu-csr.h"
> +#include "exec/helper-proto.h"
> +#include "exec/exec-all.h"
> +#include "exec/cpu_ldst.h"
> +#include "exec/log.h"
> +
> +enum {
> +    TLBRET_PE = -7,
> +    TLBRET_XI = -6,
> +    TLBRET_RI = -5,
> +    TLBRET_DIRTY = -4,
> +    TLBRET_INVALID = -3,
> +    TLBRET_NOMATCH = -2,
> +    TLBRET_BADADDR = -1,
> +    TLBRET_MATCH = 0
> +};
> +
> +static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
> +                                MMUAccessType access_type, int tlb_error)
> +{
> +    CPUState *cs = env_cpu(env);
> +    int exception = 0, error_code = 0;
> +
> +    if (access_type == MMU_INST_FETCH) {
> +        error_code |= INST_INAVAIL;
> +    }
> +
> +    switch (tlb_error) {
> +    default:
> +    case TLBRET_BADADDR:
> +        exception = EXCP_ADE;
> +        break;
> +    case TLBRET_NOMATCH:
> +        /* No TLB match for a mapped address */
> +        if (access_type == MMU_DATA_STORE) {
> +            exception = EXCP_TLBS;
> +        } else {
> +            exception = EXCP_TLBL;
> +        }
> +        error_code |= TLB_NOMATCH;
> +        break;
> +    case TLBRET_INVALID:
> +        /* TLB match with no valid bit */
> +        if (access_type == MMU_DATA_STORE) {
> +            exception = EXCP_TLBS;
> +        } else {
> +            exception = EXCP_TLBL;
> +        }
> +        break;
> +    case TLBRET_DIRTY:
> +        exception = EXCP_TLBM;
> +        break;
> +    case TLBRET_XI:
> +        /* Execute-Inhibit Exception */
> +        exception = EXCP_TLBXI;
> +        break;
> +    case TLBRET_RI:
> +        /* Read-Inhibit Exception */
> +        exception = EXCP_TLBRI;
> +        break;
> +    case TLBRET_PE:
> +        /* Privileged Exception */
> +        exception = EXCP_TLBPE;
> +        break;
> +    }
> +
> +    if (tlb_error == TLBRET_NOMATCH) {
> +        env->CSR_TLBRBADV = address;
> +        env->CSR_TLBREHI = address & (TARGET_PAGE_MASK << 1);
> +        cs->exception_index = exception;
> +        env->error_code = error_code;
> +        return;
> +    }
> +
> +    /* Raise exception */
> +    env->CSR_BADV = address;
> +    cs->exception_index = exception;
> +    env->error_code = error_code;
> +    env->CSR_TLBEHI = address & (TARGET_PAGE_MASK << 1);
> +}
> +
> +bool loongarch_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
> +                       MMUAccessType access_type, int mmu_idx,
> +                       bool probe, uintptr_t retaddr)
> +{
> +    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
> +    CPULoongArchState *env = &cpu->env;
> +    int ret = TLBRET_BADADDR;
> +
> +    /* data access */
> +    raise_mmu_exception(env, address, access_type, ret);
> +    do_raise_exception_err(env, cs->exception_index, env->error_code, retaddr);
> +}

Again, almost all of this does not apply for user-only.

r~

> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 06/22] target/loongarch: Add main translation routines
  2021-07-21  9:53 ` [PATCH v2 06/22] target/loongarch: Add main translation routines Song Gao
@ 2021-07-22 23:50   ` Richard Henderson
  2021-07-26  9:39     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-22 23:50 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +/* General purpose registers moves. */
> +void gen_load_gpr(TCGv t, int reg)
> +{
> +    if (reg == 0) {
> +        tcg_gen_movi_tl(t, 0);
> +    } else {
> +        tcg_gen_mov_tl(t, cpu_gpr[reg]);
> +    }
> +}

Please have a look at

https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org/

for a better way to handle the zero register.


> +static inline void save_cpu_state(DisasContext *ctx, int do_save_pc)
> +{
> +    if (do_save_pc && ctx->base.pc_next != ctx->saved_pc) {
> +        gen_save_pc(ctx->base.pc_next);
> +        ctx->saved_pc = ctx->base.pc_next;
> +    }
> +    if (ctx->hflags != ctx->saved_hflags) {
> +        tcg_gen_movi_i32(hflags, ctx->hflags);
> +        ctx->saved_hflags = ctx->hflags;
> +        switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
> +        case LOONGARCH_HFLAG_BR:
> +            break;
> +        case LOONGARCH_HFLAG_BC:
> +        case LOONGARCH_HFLAG_B:
> +            tcg_gen_movi_tl(btarget, ctx->btarget);
> +            break;
> +        }
> +    }
> +}

Drop all the hflags handling.
It's all copied from mips delay slot handling.

> +
> +static inline void restore_cpu_state(CPULoongArchState *env, DisasContext *ctx)
> +{
> +    ctx->saved_hflags = ctx->hflags;
> +    switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
> +    case LOONGARCH_HFLAG_BR:
> +        break;
> +    case LOONGARCH_HFLAG_BC:
> +    case LOONGARCH_HFLAG_B:
> +        ctx->btarget = env->btarget;
> +        break;
> +    }
> +}

Likewise.

> +static void gen_load_fpr32h(TCGv_i32 t, int reg)
> +{
> +    tcg_gen_extrh_i64_i32(t, fpu_f64[reg]);
> +}
> +
> +static void gen_store_fpr32h(TCGv_i32 t, int reg)
> +{
> +    TCGv_i64 t64 = tcg_temp_new_i64();
> +    tcg_gen_extu_i32_i64(t64, t);
> +    tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
> +    tcg_temp_free_i64(t64);
> +}

There is no general-purpose high-part fpr stuff.  There's only movgr2frh and movfrh2gr, 
and you can simplify both if you drop the transition through TCGv_i32.

> +void gen_op_addr_add(TCGv ret, TCGv arg0, TCGv arg1)
> +{
> +    tcg_gen_add_tl(ret, arg0, arg1);
> +}

No point in this, since loongarch has no 32-bit address mode.

> +void gen_base_offset_addr(TCGv addr, int base, int offset)
> +{
> +    if (base == 0) {
> +        tcg_gen_movi_tl(addr, offset);
> +    } else if (offset == 0) {
> +        gen_load_gpr(addr, base);
> +    } else {
> +        tcg_gen_movi_tl(addr, offset);
> +        gen_op_addr_add(addr, cpu_gpr[base], addr);
> +    }
> +}

Using the interfaces I quote above from my riscv cleanup,
this can be tidied to

     tcg_gen_addi_tl(addr, gpr_src(base), offset);

> +static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
> +{
> +    return true;
> +}

You must now use translate_use_goto_tb, which will not always return true.  You will see 
assertion failures otherwise.

> +static inline void clear_branch_hflags(DisasContext *ctx)
> +{
> +    ctx->hflags &= ~LOONGARCH_HFLAG_BMASK;
> +    if (ctx->base.is_jmp == DISAS_NEXT) {
> +        save_cpu_state(ctx, 0);
> +    } else {
> +        /*
> +         * It is not safe to save ctx->hflags as hflags may be changed
> +         * in execution time.
> +         */
> +        tcg_gen_andi_i32(hflags, hflags, ~LOONGARCH_HFLAG_BMASK);
> +    }
> +}

Not required.

> +static void gen_branch(DisasContext *ctx, int insn_bytes)
> +{
> +    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
> +        int proc_hflags = ctx->hflags & LOONGARCH_HFLAG_BMASK;
> +        /* Branches completion */
> +        clear_branch_hflags(ctx);
> +        ctx->base.is_jmp = DISAS_NORETURN;
> +        switch (proc_hflags & LOONGARCH_HFLAG_BMASK) {
> +        case LOONGARCH_HFLAG_B:
> +            /* unconditional branch */
> +            gen_goto_tb(ctx, 0, ctx->btarget);
> +            break;
> +        case LOONGARCH_HFLAG_BC:
> +            /* Conditional branch */
> +            {
> +                TCGLabel *l1 = gen_new_label();
> +
> +                tcg_gen_brcondi_tl(TCG_COND_NE, bcond, 0, l1);
> +                gen_goto_tb(ctx, 1, ctx->base.pc_next + insn_bytes);
> +                gen_set_label(l1);
> +                gen_goto_tb(ctx, 0, ctx->btarget);
> +            }
> +            break;
> +        case LOONGARCH_HFLAG_BR:
> +            /* unconditional branch to register */
> +            tcg_gen_mov_tl(cpu_PC, btarget);
> +            tcg_gen_lookup_and_goto_ptr();
> +            break;
> +        default:
> +            fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
> +            abort();
> +        }
> +    }
> +}

Split this up into the various trans_* branch routines, without the setting of HFLAG.

> +static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
> +                                            CPUState *cs)
> +{
> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
> +    CPULoongArchState *env = cs->env_ptr;
> +
> +    ctx->page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
> +    ctx->saved_pc = -1;
> +    ctx->btarget = 0;
> +    /* Restore state from the tb context.  */
> +    ctx->hflags = (uint32_t)ctx->base.tb->flags;
> +    restore_cpu_state(env, ctx);
> +    ctx->mem_idx = LOONGARCH_HFLAG_UM;

This is not an mmu index.  You didn't notice the error because you're only doing user-mode.

You're missing a check for page crossing.
Generally, for fixed-width ISAs like this, we do

     /* Bound the number of insns to execute to those left on the page.  */
     int bound = -(ctx->base.pc_first | TARGET_PAGE_MASK) / 4;
     ctx->base.max_insns = MIN(ctx->base.max_insns, bound);

here in init_disas_context.
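
To make the arithmetic in that bound concrete, here is a stand-alone sketch in plain C
(not QEMU code).  The 4 KiB page size, the 4-byte instruction width and all SKETCH_*
and function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 4-byte instructions. */
#define SKETCH_PAGE_BITS  12
#define SKETCH_PAGE_MASK  (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))
#define SKETCH_INSN_BYTES 4

/* Number of whole instructions from pc_first to the end of its page. */
static int insns_left_on_page(uint64_t pc_first)
{
    /*
     * pc | PAGE_MASK sets every bit above the page offset.  Negating that
     * value modulo 2^64 yields page_size - offset, i.e. the bytes left.
     */
    uint64_t bytes_left = 0 - (pc_first | SKETCH_PAGE_MASK);

    assert(bytes_left > 0 && bytes_left <= ((uint64_t)1 << SKETCH_PAGE_BITS));
    return (int)(bytes_left / SKETCH_INSN_BYTES);
}

/* Clamp a translation budget to what is left on the page. */
static int bound_max_insns(int max_insns, uint64_t pc_first)
{
    int bound = insns_left_on_page(pc_first);

    return max_insns < bound ? max_insns : bound;
}
```

With these assumptions, a block starting at the last instruction slot of a page is
limited to a single instruction, so translation never runs across the page boundary.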

> +static void loongarch_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
> +{
> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
> +
> +    tcg_gen_insn_start(ctx->base.pc_next, ctx->hflags & LOONGARCH_HFLAG_BMASK,
> +                       ctx->btarget);

No hflags/btarget stuff.  Drop TARGET_INSN_START_EXTRA_WORDS.

> +static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
> +                                          CPUState *cs,
> +                                          const CPUBreakpoint *bp)
> +{
> +    return true;
> +}

Broken, but now handled generically, so remove it.


> +static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
> +{
> +    CPULoongArchState *env = cs->env_ptr;
> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
> +    int insn_bytes = 4;
> +
> +    ctx->opcode = cpu_ldl_code(env, ctx->base.pc_next);
> +
> +    if (!decode(ctx, ctx->opcode)) {
> +        fprintf(stderr, "Error: unkown opcode. 0x%lx: 0x%x\n",
> +                ctx->base.pc_next, ctx->opcode);

No fprintfs.  Use qemu_log_mask with LOG_UNIMP or LOG_GUEST_ERROR.

> +    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
> +        gen_branch(ctx, insn_bytes);
> +    }

Drop this, as I mentioned above.

> +static void fpu_dump_state(CPULoongArchState *env, FILE * f, int flags)
> +{
> +    int i;
> +    int is_fpu64 = 1;
> +
> +#define printfpr(fp)                                              \
> +    do {                                                          \
> +        if (is_fpu64)                                             \
> +            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
> +                        " fd:%13g fs:%13g psu: %13g\n",           \
> +                        (fp)->w[FP_ENDIAN_IDX], (fp)->d,          \
> +                        (double)(fp)->fd,                         \
> +                        (double)(fp)->fs[FP_ENDIAN_IDX],          \
> +                        (double)(fp)->fs[!FP_ENDIAN_IDX]);        \
> +        else {                                                    \
> +            fpr_t tmp;                                            \
> +            tmp.w[FP_ENDIAN_IDX] = (fp)->w[FP_ENDIAN_IDX];        \
> +            tmp.w[!FP_ENDIAN_IDX] = ((fp) + 1)->w[FP_ENDIAN_IDX]; \
> +            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
> +                        " fd:%13g fs:%13g psu:%13g\n",            \
> +                        tmp.w[FP_ENDIAN_IDX], tmp.d,              \
> +                        (double)tmp.fd,                           \
> +                        (double)tmp.fs[FP_ENDIAN_IDX],            \
> +                        (double)tmp.fs[!FP_ENDIAN_IDX]);          \
> +        }                                                         \
> +    } while (0)

This is broken.  You're performing an integer to fp conversion of something that is 
already a floating-point value, not printing the floating-point value itself.  It's broken 
in the mips code as well.

In addition, is_fpu64 is pointless for loongarch.
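
The difference between converting and reinterpreting can be seen in a few lines of plain
C.  This is only a sketch: sketch_float64 stands in for QEMU's float64 (an integer type
holding the raw bits), and both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for QEMU's float64: an integer type holding raw IEEE bits. */
typedef uint64_t sketch_float64;

/* Wrong for printing: converts the integer value of the bit pattern. */
static double as_double_by_cast(sketch_float64 bits)
{
    return (double)bits;
}

/* Reinterprets the same 64 bits as an IEEE double, with no conversion. */
static double as_double_by_bits(sketch_float64 bits)
{
    double d;

    assert(sizeof(d) == sizeof(bits));
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

For the bit pattern 0x3ff0000000000000, the reinterpreting version yields 1.0, while the
cast yields a number in the 4.6e18 range, which is what the quoted macro would print.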

> +void loongarch_tcg_init(void)
> +{
> +    int i;
> +
> +    for (i = 0; i < 32; i++)
> +        cpu_gpr[i] = tcg_global_mem_new(cpu_env,
> +                                        offsetof(CPULoongArchState,
> +                                                 active_tc.gpr[i]),
> +                                        regnames[i]);

Missing braces.
Do not create a temp for the zero register.

> +    bcond = tcg_global_mem_new(cpu_env,
> +                               offsetof(CPULoongArchState, bcond), "bcond");
> +    btarget = tcg_global_mem_new(cpu_env,
> +                                 offsetof(CPULoongArchState, btarget),
> +                                 "btarget");
> +    hflags = tcg_global_mem_new_i32(cpu_env,
> +                                    offsetof(CPULoongArchState, hflags),
> +                                    "hflags");

Drop these.


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-21  9:53 ` [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation Song Gao
  2021-07-21 17:38   ` Philippe Mathieu-Daudé
@ 2021-07-23  0:46   ` Richard Henderson
  2021-07-26 11:56     ` Song Gao
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  0:46 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +/* Fixed point arithmetic operation instruction translation */
> +static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv Rj = cpu_gpr[a->rj];
> +    TCGv Rk = cpu_gpr[a->rk];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    if (a->rj != 0 && a->rk != 0) {
> +        tcg_gen_add_tl(Rd, Rj, Rk);
> +        tcg_gen_ext32s_tl(Rd, Rd);
> +    } else if (a->rj == 0 && a->rk != 0) {
> +        tcg_gen_mov_tl(Rd, Rk);
> +    } else if (a->rj != 0 && a->rk == 0) {
> +        tcg_gen_mov_tl(Rd, Rj);
> +    } else {
> +        tcg_gen_movi_tl(Rd, 0);
> +    }
> +
> +    return true;
> +}

Do not do all of this "if reg(n) zero" testing.

Use a common function to perform the gpr lookup, and a small callback function for the 
operation.  Often, the callback function already exists within include/tcg/tcg-op.h.

Please see the riscv cleanup patch set I referenced in my reply to patch 6.
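
The shape of that structure can be modelled in plain C.  The sketch below is an
interpreter-style model, not TCG code: SketchCPU, do_rrr and the op_* callbacks are
hypothetical names, and gpr_src/gpr_dst only mimic the idea of a shared register lookup.
One thing it shows is that a single shared path always applies the 32-bit sign
extension, which the special-cased mov branches in the quoted trans_add_w skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t gpr[32];   /* slot 0 is never accessed through the helpers */
} SketchCPU;

/* Source lookup: register 0 always reads as zero. */
static uint64_t gpr_src(const SketchCPU *cpu, int reg)
{
    assert(reg >= 0 && reg < 32);
    return reg == 0 ? 0 : cpu->gpr[reg];
}

/* Destination lookup: writes to register 0 land in a scratch slot. */
static uint64_t *gpr_dst(SketchCPU *cpu, int reg, uint64_t *scratch)
{
    assert(reg >= 0 && reg < 32);
    return reg == 0 ? scratch : &cpu->gpr[reg];
}

typedef uint64_t (*binop_fn)(uint64_t, uint64_t);

/* One shared body for every three-register instruction. */
static bool do_rrr(SketchCPU *cpu, int rd, int rj, int rk, binop_fn fn)
{
    uint64_t scratch;

    *gpr_dst(cpu, rd, &scratch) = fn(gpr_src(cpu, rj), gpr_src(cpu, rk));
    return true;
}

/* add.w: 32-bit add, result sign-extended to 64 bits. */
static uint64_t op_add_w(uint64_t a, uint64_t b)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)(a + b);
}

/* orn and andn: OR / AND with the complement of the second operand. */
static uint64_t op_orn(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t op_andn(uint64_t a, uint64_t b) { return a & ~b; }
```

Each instruction then reduces to one call such as do_rrr(cpu, rd, rj, rk, op_add_w),
with no per-instruction tests for register 0.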

> +static bool trans_orn(DisasContext *ctx, arg_orn *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv Rj = cpu_gpr[a->rj];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    TCGv t0 = tcg_temp_new();
> +    gen_load_gpr(t0, a->rk);
> +
> +    tcg_gen_not_tl(t0, t0);
> +    tcg_gen_or_tl(Rd, Rj, t0);

tcg_gen_orc_tl.

> +static bool trans_andn(DisasContext *ctx, arg_andn *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv Rj = cpu_gpr[a->rj];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    TCGv t0 = tcg_temp_new();
> +    gen_load_gpr(t0, a->rk);
> +
> +    tcg_gen_not_tl(t0, t0);
> +    tcg_gen_and_tl(Rd, Rj, t0);

tcg_gen_andc_tl.

> +static bool trans_mul_d(DisasContext *ctx, arg_mul_d *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = get_gpr(a->rj);
> +    t1 = get_gpr(a->rk);
> +
> +    check_loongarch_64(ctx);

Architecture checks go first, before you've decided the operation is a nop.

> +static bool trans_mulh_d(DisasContext *ctx, arg_mulh_d *a)
> +{
> +    TCGv t0, t1, t2;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = get_gpr(a->rj);
> +    t1 = get_gpr(a->rk);
> +    t2 = tcg_temp_new();
> +
> +    check_loongarch_64(ctx);
> +    tcg_gen_muls2_i64(t2, Rd, t0, t1);

If you actually supported LA32, you'd notice this doesn't compile.  Are you planning to 
support LA32 in the future?

> +static bool trans_lu32i_d(DisasContext *ctx, arg_lu32i_d *a)
> +{
> +    TCGv_i64 t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +
> +    tcg_gen_movi_tl(t0, a->si20);
> +    tcg_gen_concat_tl_i64(t1, Rd, t0);
> +    tcg_gen_mov_tl(Rd, t1);

Hmm.  Better as

   tcg_gen_deposit_tl(Rd, Rd, tcg_constant_tl(a->si20), 32, 32);

> +static bool trans_lu52i_d(DisasContext *ctx, arg_lu52i_d *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new();
> +    t1 = tcg_temp_new();
> +
> +    gen_load_gpr(t1, a->rj);
> +
> +    tcg_gen_movi_tl(t0, a->si12);
> +    tcg_gen_shli_tl(t0, t0, 52);
> +    tcg_gen_andi_tl(t1, t1, 0xfffffffffffffU);
> +    tcg_gen_or_tl(Rd, t0, t1);

Definitely better as

   tcg_gen_deposit_tl(Rd, Rd, tcg_constant_tl(a->si12), 52, 12);
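
For readers unfamiliar with deposit, the following plain C model shows what the operation
computes for these two instructions.  It is a sketch, not QEMU code: sketch_deposit64 and
the model_* functions are made-up names.  Note that the quoted lu52i.d code takes the low
52 bits from rj, so the model uses the rj value as the base operand there; whether the
one-liner above intended that is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

/* Replace `length` bits of `base`, starting at bit `start`, with `field`. */
static uint64_t sketch_deposit64(uint64_t base, int start, int length,
                                 uint64_t field)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~(uint64_t)0 >> (64 - length)) << start;
    return (base & ~mask) | ((field << start) & mask);
}

/* lu32i.d: keep bits 31:0 of rd, put sign-extended si20 in bits 63:32. */
static uint64_t model_lu32i_d(uint64_t rd, int32_t si20)
{
    return sketch_deposit64(rd, 32, 32, (uint64_t)(int64_t)si20);
}

/* lu52i.d: keep bits 51:0 of rj, put si12 in bits 63:52. */
static uint64_t model_lu52i_d(uint64_t rj, int32_t si12)
{
    return sketch_deposit64(rj, 52, 12, (uint64_t)(int64_t)si12);
}
```

Because the deposit masks the field to its width, a sign-extended immediate can be passed
as is, without any separate shift-and-mask sequence.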

> +static bool trans_addi_w(DisasContext *ctx, arg_addi_w *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv Rj = cpu_gpr[a->rj];
> +    target_ulong uimm = (target_long)(a->si12);
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    if (a->rj != 0) {
> +        tcg_gen_addi_tl(Rd, Rj, uimm);
> +        tcg_gen_ext32s_tl(Rd, Rd);
> +    } else {
> +        tcg_gen_movi_tl(Rd, uimm);
> +    }
> +
> +    return true;
> +}

Again, there should be a common function for all of the two-register-immediate operations. 
  The callback here is exactly the same as for trans_add_w.

> +static bool trans_xori(DisasContext *ctx, arg_xori *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv Rj = cpu_gpr[a->rj];
> +
> +    target_ulong uimm = (uint16_t)(a->ui12);

You shouldn't need these sorts of casts.


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 08/22] target/loongarch: Add fixed point shift instruction translation
  2021-07-21  9:53 ` [PATCH v2 08/22] target/loongarch: Add fixed point shift " Song Gao
@ 2021-07-23  0:51   ` Richard Henderson
  2021-07-26 11:57     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  0:51 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +/* Fixed point shift operation instruction translation */
> +static bool trans_sll_w(DisasContext *ctx, arg_sll_w *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new();
> +    t1 = get_gpr(a->rj);
> +
> +    gen_load_gpr(t0, a->rk);
> +
> +    tcg_gen_andi_tl(t0, t0, 0x1f);
> +    tcg_gen_shl_tl(t0, t1, t0);
> +    tcg_gen_ext32s_tl(Rd, t0);
> +
> +    tcg_temp_free(t0);
> +
> +    return true;
> +}

Again, you should be using common helper functions for this instead of replicating the 
same pattern 16 times.

r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-21  9:53 ` [PATCH v2 09/22] target/loongarch: Add fixed point bit " Song Gao
  2021-07-21 17:46   ` Philippe Mathieu-Daudé
@ 2021-07-23  1:29   ` Richard Henderson
  2021-07-26 12:22     ` Song Gao
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  1:29 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> This patch implement fixed point bit instruction translation.
> 
> This includes:
> - EXT.W.{B/H}
> - CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
> - BYTEPICK.{W/D}
> - REVB.{2H/4H/2W/D}
> - REVH.{2W/D}
> - BITREV.{4B/8B}, BITREV.{W/D}
> - BSTRINS.{W/D}, BSTRPICK.{W/D}
> - MASKEQZ, MASKNEZ
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/helper.h     |  10 +
>   target/loongarch/insns.decode |  45 +++
>   target/loongarch/op_helper.c  | 119 ++++++++
>   target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 839 insertions(+)
> 
> diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
> index 6c7e19b..bbbcc26 100644
> --- a/target/loongarch/helper.h
> +++ b/target/loongarch/helper.h
> @@ -8,3 +8,13 @@
>   
>   DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
>   DEF_HELPER_2(raise_exception, noreturn, env, i32)
> +
> +DEF_HELPER_2(cto_w, tl, env, tl)
> +DEF_HELPER_2(ctz_w, tl, env, tl)
> +DEF_HELPER_2(cto_d, tl, env, tl)
> +DEF_HELPER_2(ctz_d, tl, env, tl)

The count leading and trailing zero operations are built into tcg.  Counting leading or 
trailing ones only needs a NOT of the input first, which turns it into a count of zeros.
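
A small standalone C sketch of that identity, assuming portable loop-based counts rather than the tcg builtins (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 when x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (0x80000000u >> n))) {
        n++;
    }
    return n;
}

/* Count trailing zeros of a 32-bit value; returns 32 when x == 0. */
static unsigned ctz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (1u << n))) {
        n++;
    }
    return n;
}

/* Counting ones is counting zeros of the inverted input. */
static unsigned clo32(uint32_t x) { return clz32(~x); }
static unsigned cto32(uint32_t x) { return ctz32(~x); }
```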

> +DEF_HELPER_2(bitrev_w, tl, env, tl)
> +DEF_HELPER_2(bitrev_d, tl, env, tl)

These should use TCG_CALL_NO_RWG_SE.

> +target_ulong helper_bitrev_w(CPULoongArchState *env, target_ulong rj)
> +{
> +    int32_t v = (int32_t)rj;
> +    const int SIZE = 32;
> +    uint8_t bytes[SIZE];
> +
> +    int i;
> +    for (i = 0; i < SIZE; i++) {
> +        bytes[i] = v & 0x1;
> +        v = v >> 1;
> +    }
> +    /* v == 0 */
> +    for (i = 0; i < SIZE; i++) {
> +        v = v | ((uint32_t)bytes[i] << (SIZE - 1 - i));
> +    }
> +
> +    return (target_ulong)(int32_t)v;
> +}

   return (int32_t)revbit32(rj);


> +target_ulong helper_bitrev_d(CPULoongArchState *env, target_ulong rj)
> +{
> +    uint64_t v = rj;
> +    const int SIZE = 64;
> +    uint8_t bytes[SIZE];
> +
> +    int i;
> +    for (i = 0; i < SIZE; i++) {
> +        bytes[i] = v & 0x1;
> +        v = v >> 1;
> +    }
> +    /* v == 0 */
> +    for (i = 0; i < SIZE; i++) {
> +        v = v | ((uint64_t)bytes[i] << (SIZE - 1 - i));
> +    }
> +
> +    return (target_ulong)v;
> +}

   return revbit64(rj);
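
For reference, a standalone C sketch of what a 32-bit bit reversal computes, written both as the quoted loop would and as a sequence of mask-and-swap steps. This is a model for comparison only; the function names are invented here and this is not the body of QEMU's `revbit32`.

```c
#include <assert.h>
#include <stdint.h>

/* Naive reversal: move one bit per iteration, as the quoted helper does. */
static uint32_t reverse_bits32_loop(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* Same result in five steps: swap bits, pairs, nibbles, bytes, halves. */
static uint32_t reverse_bits32_swap(uint32_t v)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
    v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
    return (v >> 16) | (v << 16);
}
```

Stopping after the first three steps reverses the bits within each byte, which is what the quoted `bitswap` routine does.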

> +static inline target_ulong bitswap(target_ulong v)
> +{
> +    v = ((v >> 1) & (target_ulong)0x5555555555555555ULL) |
> +        ((v & (target_ulong)0x5555555555555555ULL) << 1);
> +    v = ((v >> 2) & (target_ulong)0x3333333333333333ULL) |
> +        ((v & (target_ulong)0x3333333333333333ULL) << 2);
> +    v = ((v >> 4) & (target_ulong)0x0F0F0F0F0F0F0F0FULL) |
> +        ((v & (target_ulong)0x0F0F0F0F0F0F0F0FULL) << 4);
> +    return v;
> +}
> +
> +target_ulong helper_loongarch_dbitswap(target_ulong rj)
> +{
> +    return bitswap(rj);
> +}
> +
> +target_ulong helper_loongarch_bitswap(target_ulong rt)
> +{
> +    return (int32_t)bitswap(rt);
> +}

I assume these are for the bitrev.4b and bitrev.8b insns?
It would be better to name them correctly.


> +/* Fixed point bit operation instruction translation */
> +static bool trans_ext_w_h(DisasContext *ctx, arg_ext_w_h *a)
> +{
> +    TCGv t0;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = get_gpr(a->rj);
> +
> +    tcg_gen_ext16s_tl(Rd, t0);

Again, you should have a common routine for handling these unary operations.

> +static bool trans_clo_w(DisasContext *ctx, arg_clo_w *a)
> +{
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    gen_load_gpr(Rd, a->rj);
> +
> +    tcg_gen_not_tl(Rd, Rd);
> +    tcg_gen_ext32u_tl(Rd, Rd);
> +    tcg_gen_clzi_tl(Rd, Rd, TARGET_LONG_BITS);
> +    tcg_gen_subi_tl(Rd, Rd, TARGET_LONG_BITS - 32);

So, you're actually using the tcg builtins here, and the helper you created isn't used.

> +static bool trans_cto_w(DisasContext *ctx, arg_cto_w *a)
> +{
> +    TCGv t0;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new();
> +    gen_load_gpr(t0, a->rj);
> +
> +    gen_helper_cto_w(Rd, cpu_env, t0);

Here you should have used the tcg builtin.

> +static bool trans_ctz_w(DisasContext *ctx, arg_ctz_w *a)
> +{
> +    TCGv t0;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new();
> +    gen_load_gpr(t0, a->rj);
> +
> +    gen_helper_ctz_w(Rd, cpu_env, t0);

Likewise.

> +static bool trans_revb_2w(DisasContext *ctx, arg_revb_2w *a)
> +{
> +    TCGv_i64 t0, t1, t2;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    t2 = get_gpr(a->rj);
> +
> +    gen_load_gpr(t0, a->rd);
> +
> +    tcg_gen_ext32u_i64(t1, t2);
> +    tcg_gen_bswap32_i64(t0, t1);
> +    tcg_gen_shri_i64(t1, t2, 32);
> +    tcg_gen_bswap32_i64(t1, t1);
> +    tcg_gen_concat32_i64(Rd, t0, t1);

tcg_gen_bswap64_i64(Rd, Rj);
tcg_gen_rotri_i64(Rd, Rd, 32);
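
To see why the two forms agree, here is a standalone C model (plain integer code, not TCG) that compares the per-word byte swap from the quoted patch with a 64-bit byte swap followed by a rotate by 32. All function names are local to this sketch.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t bswap64(uint64_t v)
{
    /* Swapping 8 bytes = swap each word, then exchange the two words. */
    return ((uint64_t)bswap32((uint32_t)v) << 32) | bswap32((uint32_t)(v >> 32));
}

static uint64_t rotr64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Quoted approach: byte-swap each 32-bit word and put it back in place. */
static uint64_t revb_2w_split(uint64_t rj)
{
    uint64_t lo = bswap32((uint32_t)rj);
    uint64_t hi = bswap32((uint32_t)(rj >> 32));
    return (hi << 32) | lo;
}

/* Two-op form: full byte swap, then rotate by 32 to undo the word exchange. */
static uint64_t revb_2w_rot(uint64_t rj)
{
    return rotr64(bswap64(rj), 32);
}
```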

> +static bool trans_bytepick_d(DisasContext *ctx, arg_bytepick_d *a)
> +{
> +    TCGv t0;
> +    TCGv Rd = cpu_gpr[a->rd];
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }
> +
> +    t0 = tcg_temp_new();
> +
> +    check_loongarch_64(ctx);
> +    if (a->sa3 == 0 || ((a->sa3) * 8) == 64) {
> +        if (a->sa3 == 0) {
> +            gen_load_gpr(t0, a->rk);
> +        } else {
> +            gen_load_gpr(t0, a->rj);
> +        }
> +            tcg_gen_mov_tl(Rd, t0);
> +    } else {
> +        TCGv t1 = tcg_temp_new();
> +
> +        gen_load_gpr(t0, a->rk);
> +        gen_load_gpr(t1, a->rj);
> +
> +        tcg_gen_shli_tl(t0, t0, ((a->sa3) * 8));
> +        tcg_gen_shri_tl(t1, t1, 64 - ((a->sa3) * 8));
> +        tcg_gen_or_tl(Rd, t1, t0);
> +
> +        tcg_temp_free(t1);
> +    }

tcg_gen_extract2_i64(Rd, Rk, Rj, a->sa3 * 8);
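
A standalone C model of a double-word extract, for checking a single-op replacement against the quoted shift/or sequence. `extract2_64` models "take 64 bits from the 128-bit value hi:lo, starting at bit ofs". The operand order and offset passed to it below are derived from the quoted patch code in this sketch, not from the manual, so they are worth double-checking when writing the real one-liner.

```c
#include <assert.h>
#include <stdint.h>

/* 64 bits taken from the 128-bit concatenation hi:lo, starting at bit ofs. */
static uint64_t extract2_64(uint64_t lo, uint64_t hi, unsigned ofs)
{
    assert(ofs <= 64);
    if (ofs == 0) {
        return lo;
    }
    if (ofs == 64) {
        return hi;
    }
    return (lo >> ofs) | (hi << (64 - ofs));
}

/* The quoted sequence: rk shifted left, rj shifted right, then OR. */
static uint64_t bytepick_d_patch(uint64_t rj, uint64_t rk, unsigned sa3)
{
    unsigned n = (sa3 & 7) * 8;
    if (n == 0) {
        return rk;              /* special case in the quoted code */
    }
    return (rk << n) | (rj >> (64 - n));
}

/* Same result as one extract, with no special cases in the caller. */
static uint64_t bytepick_d_extract2(uint64_t rj, uint64_t rk, unsigned sa3)
{
    return extract2_64(rj, rk, 64 - (sa3 & 7) * 8);
}
```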


r~



* Re: [PATCH v2 10/22] target/loongarch: Add fixed point load/store instruction translation
  2021-07-21  9:53 ` [PATCH v2 10/22] target/loongarch: Add fixed point load/store " Song Gao
@ 2021-07-23  1:45   ` Richard Henderson
  2021-07-26 12:25     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  1:45 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> This patch implement fixed point load/store instruction translation.
> 
> This includes:
> - LD.{B[U]/H[U]/W[U]/D}, ST.{B/H/W/D}
> - LDX.{B[U]/H[U]/W[U]/D}, STX.{B/H/W/D}
> - LDPTR.{W/D}, STPTR.{W/D}
> - PRELD
> - LD{GT/LE}.{B/H/W/D}, ST{GT/LE}.{B/H/W/D}
> - DBAR, IBAR
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/helper.h     |   3 +
>   target/loongarch/insns.decode |  58 ++++
>   target/loongarch/op_helper.c  |  15 +
>   target/loongarch/trans.inc.c  | 758 ++++++++++++++++++++++++++++++++++++++++++
>   target/loongarch/translate.c  |  29 ++
>   5 files changed, 863 insertions(+)
> 
> diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
> index bbbcc26..5cd38c8 100644
> --- a/target/loongarch/helper.h
> +++ b/target/loongarch/helper.h
> @@ -18,3 +18,6 @@ DEF_HELPER_2(bitrev_d, tl, env, tl)
>   
>   DEF_HELPER_FLAGS_1(loongarch_bitswap, TCG_CALL_NO_RWG_SE, tl, tl)
>   DEF_HELPER_FLAGS_1(loongarch_dbitswap, TCG_CALL_NO_RWG_SE, tl, tl)
> +
> +DEF_HELPER_3(asrtle_d, void, env, tl, tl)
> +DEF_HELPER_3(asrtgt_d, void, env, tl, tl)
> diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
> index ec599a9..08fd232 100644
> --- a/target/loongarch/insns.decode
> +++ b/target/loongarch/insns.decode
> @@ -24,6 +24,9 @@
>   %lsbw    10:5
>   %msbd    16:6
>   %lsbd    10:6
> +%si14    10:s14
> +%hint    0:5
> +%whint   0:15
>   
>   #
>   # Argument sets
> @@ -40,6 +43,9 @@
>   &fmt_rdrjrksa3      rd rj rk sa3
>   &fmt_rdrjmsbwlsbw   rd rj msbw lsbw
>   &fmt_rdrjmsbdlsbd   rd rj msbd lsbd
> +&fmt_rdrjsi14       rd rj si14
> +&fmt_hintrjsi12     hint rj si12
> +&fmt_whint          whint
>   
>   #
>   # Formats
> @@ -56,6 +62,9 @@
>   @fmt_rdrjmsbwlsbw    .... ....... ..... . ..... ..... .....   &fmt_rdrjmsbwlsbw   %rd %rj %msbw %lsbw
>   @fmt_rdrjmsbdlsbd    .... ...... ...... ...... ..... .....    &fmt_rdrjmsbdlsbd   %rd %rj %msbd %lsbd
>   @fmt_rdrjrksa3       .... ........ .. ... ..... ..... .....   &fmt_rdrjrksa3      %rd %rj %rk %sa3
> +@fmt_hintrjsi12      .... ...... ............ ..... .....     &fmt_hintrjsi12     %hint %rj %si12
> +@fmt_whint           .... ........ ..... ...............      &fmt_whint          %whint
> +@fmt_rdrjsi14        .... .... .............. ..... .....     &fmt_rdrjsi14       %rd %rj %si14
>   
>   #
>   # Fixed point arithmetic operation instruction
> @@ -158,3 +167,52 @@ bstrins_w        0000 0000011 ..... 0 ..... ..... .....   @fmt_rdrjmsbwlsbw
>   bstrpick_w       0000 0000011 ..... 1 ..... ..... .....   @fmt_rdrjmsbwlsbw
>   bstrins_d        0000 000010 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
>   bstrpick_d       0000 000011 ...... ...... ..... .....    @fmt_rdrjmsbdlsbd
> +
> +#
> +# Fixed point load/store instruction
> +#
> +ld_b             0010 100000 ............ ..... .....     @fmt_rdrjsi12
> +ld_h             0010 100001 ............ ..... .....     @fmt_rdrjsi12
> +ld_w             0010 100010 ............ ..... .....     @fmt_rdrjsi12
> +ld_d             0010 100011 ............ ..... .....     @fmt_rdrjsi12
> +st_b             0010 100100 ............ ..... .....     @fmt_rdrjsi12
> +st_h             0010 100101 ............ ..... .....     @fmt_rdrjsi12
> +st_w             0010 100110 ............ ..... .....     @fmt_rdrjsi12
> +st_d             0010 100111 ............ ..... .....     @fmt_rdrjsi12
> +ld_bu            0010 101000 ............ ..... .....     @fmt_rdrjsi12
> +ld_hu            0010 101001 ............ ..... .....     @fmt_rdrjsi12
> +ld_wu            0010 101010 ............ ..... .....     @fmt_rdrjsi12
> +ldx_b            0011 10000000 00000 ..... ..... .....    @fmt_rdrjrk
> +ldx_h            0011 10000000 01000 ..... ..... .....    @fmt_rdrjrk
> +ldx_w            0011 10000000 10000 ..... ..... .....    @fmt_rdrjrk
> +ldx_d            0011 10000000 11000 ..... ..... .....    @fmt_rdrjrk
> +stx_b            0011 10000001 00000 ..... ..... .....    @fmt_rdrjrk
> +stx_h            0011 10000001 01000 ..... ..... .....    @fmt_rdrjrk
> +stx_w            0011 10000001 10000 ..... ..... .....    @fmt_rdrjrk
> +stx_d            0011 10000001 11000 ..... ..... .....    @fmt_rdrjrk
> +ldx_bu           0011 10000010 00000 ..... ..... .....    @fmt_rdrjrk
> +ldx_hu           0011 10000010 01000 ..... ..... .....    @fmt_rdrjrk
> +ldx_wu           0011 10000010 10000 ..... ..... .....    @fmt_rdrjrk
> +preld            0010 101011 ............ ..... .....     @fmt_hintrjsi12
> +dbar             0011 10000111 00100 ...............      @fmt_whint
> +ibar             0011 10000111 00101 ...............      @fmt_whint
> +ldptr_w          0010 0100 .............. ..... .....     @fmt_rdrjsi14
> +stptr_w          0010 0101 .............. ..... .....     @fmt_rdrjsi14
> +ldptr_d          0010 0110 .............. ..... .....     @fmt_rdrjsi14
> +stptr_d          0010 0111 .............. ..... .....     @fmt_rdrjsi14
> +ldgt_b           0011 10000111 10000 ..... ..... .....    @fmt_rdrjrk
> +ldgt_h           0011 10000111 10001 ..... ..... .....    @fmt_rdrjrk
> +ldgt_w           0011 10000111 10010 ..... ..... .....    @fmt_rdrjrk
> +ldgt_d           0011 10000111 10011 ..... ..... .....    @fmt_rdrjrk
> +ldle_b           0011 10000111 10100 ..... ..... .....    @fmt_rdrjrk
> +ldle_h           0011 10000111 10101 ..... ..... .....    @fmt_rdrjrk
> +ldle_w           0011 10000111 10110 ..... ..... .....    @fmt_rdrjrk
> +ldle_d           0011 10000111 10111 ..... ..... .....    @fmt_rdrjrk
> +stgt_b           0011 10000111 11000 ..... ..... .....    @fmt_rdrjrk
> +stgt_h           0011 10000111 11001 ..... ..... .....    @fmt_rdrjrk
> +stgt_w           0011 10000111 11010 ..... ..... .....    @fmt_rdrjrk
> +stgt_d           0011 10000111 11011 ..... ..... .....    @fmt_rdrjrk
> +stle_b           0011 10000111 11100 ..... ..... .....    @fmt_rdrjrk
> +stle_h           0011 10000111 11101 ..... ..... .....    @fmt_rdrjrk
> +stle_w           0011 10000111 11110 ..... ..... .....    @fmt_rdrjrk
> +stle_d           0011 10000111 11111 ..... ..... .....    @fmt_rdrjrk
> diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
> index 07c3d52..738e067 100644
> --- a/target/loongarch/op_helper.c
> +++ b/target/loongarch/op_helper.c
> @@ -144,3 +144,18 @@ target_ulong helper_loongarch_bitswap(target_ulong rt)
>   {
>       return (int32_t)bitswap(rt);
>   }
> +
> +/* loongarch assert op */
> +void helper_asrtle_d(CPULoongArchState *env, target_ulong rj, target_ulong rk)
> +{
> +    if (rj > rk) {
> +        do_raise_exception(env, EXCP_ADE, GETPC());
> +    }
> +}
> +
> +void helper_asrtgt_d(CPULoongArchState *env, target_ulong rj, target_ulong rk)
> +{
> +    if (rj <= rk) {
> +        do_raise_exception(env, EXCP_ADE, GETPC());
> +    }
> +}
> diff --git a/target/loongarch/trans.inc.c b/target/loongarch/trans.inc.c
> index 8c5ba63..e38001b 100644
> --- a/target/loongarch/trans.inc.c
> +++ b/target/loongarch/trans.inc.c
> @@ -2116,3 +2116,761 @@ static bool trans_bstrpick_w(DisasContext *ctx, arg_bstrpick_w *a)
>   
>       return true;
>   }
> +
> +/* Fixed point load/store instruction translation */
> +static bool trans_ld_b(DisasContext *ctx, arg_ld_b *a)
> +{
> +    TCGv t0;
> +    TCGv Rd = cpu_gpr[a->rd];
> +    int mem_idx = ctx->mem_idx;
> +
> +    if (a->rd == 0) {
> +        /* Nop */
> +        return true;
> +    }

A load into the zero register is not a nop.  It is a load with the result discarded.  One 
should still fault if the load is to an invalid address.

You should be using a common routine, passing in the MO_* operand.

> +#define ASRTGT                                \
> +    do {                                      \
> +        TCGv t1 = get_gpr(a->rj);             \
> +        TCGv t2 = get_gpr(a->rk);             \
> +        gen_helper_asrtgt_d(cpu_env, t1, t2); \
> +    } while (0)
> +
> +#define ASRTLE                                \
> +    do {                                      \
> +        TCGv t1 = get_gpr(a->rj);             \
> +        TCGv t2 = get_gpr(a->rk);             \
> +        gen_helper_asrtle_d(cpu_env, t1, t2); \
> +    } while (0)
> +
> +#define DECL_ARG(name)   \
> +    arg_ ## name arg = { \
> +        .rd = a->rd,     \
> +        .rj = a->rj,     \
> +        .rk = a->rk,     \
> +    };
> +
> +static bool trans_ldgt_b(DisasContext *ctx, arg_ldgt_b *a)
> +{
> +    ASRTGT;
> +    DECL_ARG(ldx_b)
> +    trans_ldx_b(ctx, &arg);
> +    return true;
> +}

Use of a common routine would avoid the macro ugliness.


r~



* Re: [PATCH v2 11/22] target/loongarch: Add fixed point atomic instruction translation
  2021-07-21  9:53 ` [PATCH v2 11/22] target/loongarch: Add fixed point atomic " Song Gao
@ 2021-07-23  1:49   ` Richard Henderson
  2021-07-26 12:25     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  1:49 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +#define TRANS_AM_W(name, op)                                      \
> +static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
> +{                                                                 \
> +    TCGv addr, val, ret;                                          \
> +    TCGv Rd = cpu_gpr[a->rd];                                     \
> +    int mem_idx = ctx->mem_idx;                                   \
> +                                                                  \
> +    if (a->rd == 0) {                                             \
> +        return true;                                              \
> +    }                                                             \
> +    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
> +        printf("%s: warning, register equal\n", __func__);        \
> +        return false;                                             \
> +    }                                                             \
> +                                                                  \
> +    addr = get_gpr(a->rj);                                        \
> +    val = get_gpr(a->rk);                                         \
> +    ret = tcg_temp_new();                                         \
> +                                                                  \
> +    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TESL |   \
> +                            ctx->default_tcg_memop_mask);         \
> +    tcg_gen_mov_tl(Rd, ret);                                      \
> +                                                                  \
> +    tcg_temp_free(ret);                                           \
> +                                                                  \
> +    return true;                                                  \
> +}

No printf.  Use a common routine instead of macros.


r~



* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-21  9:53 ` [PATCH v2 12/22] target/loongarch: Add fixed point extra " Song Gao
@ 2021-07-23  5:12   ` Richard Henderson
  2021-07-26 12:57     ` Song Gao
  2021-08-04  7:40     ` Song Gao
  0 siblings, 2 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  5:12 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
> +{
> +    target_ulong r = 0;
> +
> +    switch (rj) {
> +    case 0:
> +        r = env->CSR_MCSR0 & 0xffffffff;
> +        break;
> +    case 1:
> +        r = (env->CSR_MCSR0 & 0xffffffff00000000) >> 32;
> +        break;

Why do you represent all of these as high and low portions of a 64-bit internal value, 
when the manual describes them as 32-bit values?


> +/* Fixed point extra instruction translation */
> +static bool trans_crc_w_b_w(DisasContext *ctx, arg_crc_w_b_w *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv_i32 tsz = tcg_const_i32(1 << 1);

This size is wrong.  It should be 1, not 1 << 1 (2).


> +static bool trans_crc_w_w_w(DisasContext *ctx, arg_crc_w_w_w *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv_i32 tsz = tcg_const_i32(1 << 4);

Because this size most certainly should not be 16...

> +static bool trans_crc_w_d_w(DisasContext *ctx, arg_crc_w_d_w *a)
> +{
> +    TCGv t0, t1;
> +    TCGv Rd = cpu_gpr[a->rd];
> +    TCGv_i32 tsz = tcg_const_i32(1 << 8);

... and this size should not be 256.  Both well larger than the 8 byte buffer that you've 
allocated.

Also, you need a helper so that you don't have 8 copies of this code.
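
A minimal standalone sketch of such a shared helper, assuming the size argument is simply the operand width in bytes (1, 2, 4 or 8). The bitwise CRC routine, the reflected polynomial 0xEDB88320 and the names `crc32_update` and `crc_step` are choices made for this example, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bitwise reflected CRC-32 update over `size` bytes of `buf`.
 * No initial or final inversion is applied here; callers decide that.
 */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, unsigned size)
{
    for (unsigned i = 0; i < size; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
        }
    }
    return crc;
}

/* Hypothetical shared helper: size is the operand width in bytes. */
static uint32_t crc_step(uint32_t crc, uint64_t data, unsigned size)
{
    uint8_t buf[8];

    /* Anything larger would read past the 8-byte buffer. */
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (unsigned i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(data >> (8 * i));   /* little-endian byte order */
    }
    return crc32_update(crc, buf, size);
}
```

With the width expressed in bytes, the byte, halfword, word and doubleword variants differ only in the constant they pass.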

> +static bool trans_asrtle_d(DisasContext *ctx, arg_asrtle_d * a)
> +{
> +    TCGv t0, t1;
> +
> +    t0 = get_gpr(a->rj);
> +    t1 = get_gpr(a->rk);
> +
> +    gen_helper_asrtle_d(cpu_env, t0, t1);
> +
> +    return true;
> +}
> +
> +static bool trans_asrtgt_d(DisasContext *ctx, arg_asrtgt_d * a)
> +{
> +    TCGv t0, t1;
> +
> +    t0 = get_gpr(a->rj);
> +    t1 = get_gpr(a->rk);
> +
> +    gen_helper_asrtgt_d(cpu_env, t0, t1);
> +
> +    return true;
> +}

I'm not sure why both of these instructions are in the ISA, since

   ASRTLE X,Y <-> ASRTGT Y,X

but we certainly don't need two different helpers.
Just swap the arguments for one of them.

> +static bool trans_rdtimel_w(DisasContext *ctx, arg_rdtimel_w *a)
> +{
> +    /* Nop */
> +    return true;
> +}
> +
> +static bool trans_rdtimeh_w(DisasContext *ctx, arg_rdtimeh_w *a)
> +{
> +    /* Nop */
> +    return true;
> +}
> +
> +static bool trans_rdtime_d(DisasContext *ctx, arg_rdtime_d *a)
> +{
> +    /* Nop */
> +    return true;
> +}

If you don't want to implement these right now, you should at least initialize the 
destination register to 0, or something.


r~



* Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
  2021-07-21  9:53 ` [PATCH v2 13/22] target/loongarch: Add floating point arithmetic " Song Gao
@ 2021-07-23  5:44   ` Richard Henderson
  2021-07-27  7:17     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  5:44 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp)
> +{
> +    fp = float64_sqrt(fp, &env->active_fpu.fp_status);
> +    update_fcsr0(env, GETPC());
> +    return fp;
> +}
> +
> +uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp)
> +{
> +    fp = float32_sqrt(fp, &env->active_fpu.fp_status);
> +    update_fcsr0(env, GETPC());
> +    return fp;
> +}

I believe you will find it easier to take and return uint64_t, even for 32-bit operations. 
  The manual says that the high bits may contain any value, so in my opinion you should 
not work hard to preserve the high bits, as you currently do with

> +    gen_load_fpr32(fp0, a->fj);
> +    gen_load_fpr32(fp1, a->fk);
> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
> +    gen_store_fpr32(fp0, a->fd);

I think this should be as simple as

   gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
                       cpu_fpu[a->fj], cpu_fpu[a->fk]);

I also think that loongarch should learn from risc-v and change the architecture to 
"nan-box" single-precision results -- fill the high 32-bits with 1s.  This is an SNaN 
representation for double-precision and will immediately fail when incorrectly using a 
single-precision value as a double-precision input.

Thankfully the current architecture is backward compatible with nan-boxing.
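
A standalone C sketch of the nan-boxing idea on raw register bits, assuming IEEE 754 binary32/binary64 encodings. The function names are invented for this example. It only shows that a boxed single keeps its low 32 bits and decodes as a NaN when the whole register is read as a double.

```c
#include <assert.h>
#include <stdint.h>

/* Box a single-precision bit pattern: fill the high 32 bits with ones. */
static uint64_t nanbox_s(uint32_t single_bits)
{
    return 0xffffffff00000000ULL | single_bits;
}

/* A register holds a properly boxed single if its high half is all ones. */
static int is_nanboxed(uint64_t reg)
{
    return (reg >> 32) == 0xffffffffu;
}

/* Binary64 NaN test on raw bits: exponent all ones, fraction non-zero. */
static int is_double_nan(uint64_t reg)
{
    return ((reg >> 52) & 0x7ff) == 0x7ff &&
           (reg & 0x000fffffffffffffULL) != 0;
}
```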

> +/* Floating point arithmetic operation instruction translation */
> +static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
> +{
> +    TCGv_i32 fp0, fp1;
> +
> +    fp0 = tcg_temp_new_i32();
> +    fp1 = tcg_temp_new_i32();
> +
> +    check_fpu_enabled(ctx);
> +    gen_load_fpr32(fp0, a->fj);
> +    gen_load_fpr32(fp1, a->fk);
> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
> +    gen_store_fpr32(fp0, a->fd);
> +
> +    tcg_temp_free_i32(fp0);
> +    tcg_temp_free_i32(fp1);
> +
> +    return true;
> +}

Again, you should use some helper functions to reduce the repetition.

> +static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
> +{
> +    TCGv_i64 fp0, fp1, fp2, fp3;
> +
> +    fp0 = tcg_temp_new_i64();
> +    fp1 = tcg_temp_new_i64();
> +    fp2 = tcg_temp_new_i64();
> +    fp3 = tcg_temp_new_i64();
> +
> +    check_fpu_enabled(ctx);
> +    gen_load_fpr64(fp0, a->fj);
> +    gen_load_fpr64(fp1, a->fk);
> +    gen_load_fpr64(fp2, a->fa);
> +    check_fpu_enabled(ctx);

Repeating check_fpu_enabled.

> +    gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2);
> +    gen_store_fpr64(fp3, a->fd);

I think you might as well pass in the float_muladd_* constant to a single helper rather 
than having 4 different helpers.
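
Roughly, in a standalone C model: one routine plus a flags argument covers all four multiply-add variants. The flag names below are invented for this sketch and only modelled on the idea of the float_muladd_* constants. The arithmetic is an unfused `a * b + c` on doubles, so this illustrates the control flow, not softfloat behaviour.

```c
#include <assert.h>

/* Hypothetical flags for this model only. */
enum {
    MULADD_NEGATE_C      = 1,
    MULADD_NEGATE_RESULT = 2,
};

/* One routine for madd, msub, nmadd and nmsub, selected by flags. */
static double muladd_model(double a, double b, double c, int flags)
{
    assert((flags & ~(MULADD_NEGATE_C | MULADD_NEGATE_RESULT)) == 0);
    if (flags & MULADD_NEGATE_C) {
        c = -c;
    }
    double r = a * b + c;       /* not fused; model only */
    return (flags & MULADD_NEGATE_RESULT) ? -r : r;
}
```

In this model fmadd passes 0, fmsub passes MULADD_NEGATE_C, fnmadd passes MULADD_NEGATE_RESULT, and fnmsub passes both.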


r~



* Re: [PATCH v2 14/22] target/loongarch: Add floating point comparison instruction translation
  2021-07-21  9:53 ` [PATCH v2 14/22] target/loongarch: Add floating point comparison " Song Gao
@ 2021-07-23  6:11   ` Richard Henderson
  2021-07-27  7:56     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:11 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +void helper_movreg2cf_i32(CPULoongArchState *env, uint32_t cd, uint32_t src)
> +{
> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
> +}
> +
> +void helper_movreg2cf_i64(CPULoongArchState *env, uint32_t cd, uint64_t src)
> +{
> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
> +}
> +
> +/* fcmp.cond.s */
> +uint32_t helper_fp_cmp_caf_s(CPULoongArchState *env, uint32_t fp,
> +                             uint32_t fp1)
> +{
> +    uint64_t ret;
> +    ret = (float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status), 0);
> +    update_fcsr0(env, GETPC());
> +    if (ret) {
> +        return -1;
> +    } else {
> +        return 0;
> +    }
> +}

I don't understand why you have split the compare from the store to cf?

I don't understand why you're returning -1 instead of 1, when the result is supposed to be 
a boolean.

Alternately, I don't understand why you want a helper function to perform a simple byte 
store operation.  You could easily store a byte with tcg_gen_st8_{i32,i64}.

> +uint32_t helper_fp_cmp_cueq_s(CPULoongArchState *env, uint32_t fp,
> +                              uint32_t fp1)
> +{
> +    uint64_t ret;
> +    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
> +          float32_eq_quiet(fp, fp1, &env->active_fpu.fp_status);

You're better off using

     FloatRelation cmp = float32_compare_quiet(fp0, fp1, status);
     update_fcsr0(env, GETPC());
     return cmp == float_relation_unordered ||
            cmp == float_relation_equal;

Similarly with every other place you use two comparisons.

Indeed, one could conceivably condense everything into exactly four helper functions: two 
using float{32,64}_compare_quiet and two using float{32,64}_compare (signalling).  A 4th 
argument would be a bitmask of the different true conditions, exactly as listed in Table 9.

Since FloatRelation is in {-1, 0, 1, 2}, one could write

   return (mask >> (cmp + 1)) & 1;
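
A standalone C model of the mask trick, using a local enum with the same values {-1, 0, 1, 2} and native double comparisons in place of softfloat. The bit names (`CMP_LT` and so on) and `make_nan` are made up for this sketch. Whether the bit order matches the condition encoding in the manual is not checked here.

```c
#include <assert.h>

/* Local stand-in with the values quoted above: {-1, 0, 1, 2}. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
} Relation;

/* Bit (relation + 1) of the mask says whether that outcome counts as true. */
enum {
    CMP_LT = 1 << 0,
    CMP_EQ = 1 << 1,
    CMP_GT = 1 << 2,
    CMP_UN = 1 << 3,
};

static Relation compare(double a, double b)
{
    if (a != a || b != b) {
        return REL_UNORDERED;   /* at least one operand is a NaN */
    }
    if (a < b) {
        return REL_LESS;
    }
    return a > b ? REL_GREATER : REL_EQUAL;
}

/* One comparison, then a table lookup in the mask. */
static int cmp_mask(double a, double b, unsigned mask)
{
    Relation rel = compare(a, b);
    return (mask >> (rel + 1)) & 1;
}

/* Test aid: produce a NaN at run time. */
static double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}
```

For example, an "unordered or equal" condition is just the mask `CMP_UN | CMP_EQ`.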


r~



* Re: [PATCH v2 15/22] target/loongarch: Add floating point conversion instruction translation
  2021-07-21  9:53 ` [PATCH v2 15/22] target/loongarch: Add floating point conversion " Song Gao
@ 2021-07-23  6:16   ` Richard Henderson
  2021-07-27  7:57     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:16 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +uint64_t helper_fp_tintrm_l_d(CPULoongArchState *env, uint64_t src)
> +{
> +    uint64_t dest;
> +
> +    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
> +    dest = float64_to_int64(src, &env->active_fpu.fp_status);
> +    restore_rounding_mode(env);

Better off to save the current rounding mode with get_float_rounding_mode, and restore it 
afterward.

See 63d06e90e65d5f119039044e986a81007954a466.
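
The save/override/restore pattern, sketched as a standalone C model. `ToyStatus`, `RoundMode` and `toy_to_int64` are stand-ins invented for this example, not softfloat types. The point is only that the helper restores whatever mode was active on entry instead of recomputing it from the control register.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ROUND_TO_ZERO, ROUND_DOWN, ROUND_UP } RoundMode;

/* Toy stand-in for a floating-point status block. */
typedef struct {
    RoundMode rounding_mode;
} ToyStatus;

/* Convert using the mode currently stored in the status block. */
static int64_t toy_to_int64(double x, const ToyStatus *s)
{
    assert(x > -9.0e18 && x < 9.0e18);      /* keep the cast in range */
    int64_t t = (int64_t)x;                 /* C conversion truncates */
    if (s->rounding_mode == ROUND_DOWN && (double)t > x) {
        t -= 1;
    } else if (s->rounding_mode == ROUND_UP && (double)t < x) {
        t += 1;
    }
    return t;
}

/* Force round-down for one conversion, then put the old mode back. */
static int64_t convert_round_down(double x, ToyStatus *s)
{
    RoundMode saved = s->rounding_mode;     /* save */
    s->rounding_mode = ROUND_DOWN;          /* override */
    int64_t result = toy_to_int64(x, s);
    s->rounding_mode = saved;               /* restore */
    return result;
}
```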


r~



* Re: [PATCH v2 16/22] target/loongarch: Add floating point move instruction translation
  2021-07-21  9:53 ` [PATCH v2 16/22] target/loongarch: Add floating point move " Song Gao
@ 2021-07-23  6:29   ` Richard Henderson
  2021-07-27  8:06     ` Song Gao
  2021-08-12  9:20     ` Song Gao
  0 siblings, 2 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:29 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> This patch implements floating point move instruction translation.
> 
> This includes:
> - FMOV.{S/D}
> - FSEL
> - MOVGR2FR.{W/D}, MOVGR2FRH.W
> - MOVFR2GR.{S/D}, MOVFRH2GR.S
> - MOVGR2FCSR, MOVFCSR2GR
> - MOVFR2CF, MOVCF2FR
> - MOVGR2CF, MOVCF2GR
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/fpu_helper.c |  80 +++++++++++++
>   target/loongarch/helper.h     |   6 +
>   target/loongarch/insns.decode |  41 +++++++
>   target/loongarch/trans.inc.c  | 270 ++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 397 insertions(+)
> 
> diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
> index 162085a..7662715 100644
> --- a/target/loongarch/fpu_helper.c
> +++ b/target/loongarch/fpu_helper.c
> @@ -379,6 +379,11 @@ uint64_t helper_fp_logb_d(CPULoongArchState *env, uint64_t fp)
>       return fp1;
>   }
>   
> +void helper_movreg2cf(CPULoongArchState *env, uint32_t cd, target_ulong src)
> +{
> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
> +}

tcg_gen_andi_tl + tcg_gen_st8_tl.

> +target_ulong helper_fsel(CPULoongArchState *env, target_ulong fj,
> +                         target_ulong fk, uint32_t ca)
> +{
> +    if (env->active_fpu.cf[ca & 0x7]) {
> +        return fk;
> +    } else {
> +        return fj;
> +    }
> +}

tcg_gen_movcond_i64.

> +void helper_movgr2fcsr(CPULoongArchState *env, target_ulong arg1,
> +                       uint32_t fcsr)
> +{
> +    switch (fcsr) {
> +    case 0:
> +        env->active_fpu.fcsr0 = arg1;
> +        break;
> +    case 1:
> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M1) |
> +                                (env->active_fpu.fcsr0 & ~FCSR0_M1);
> +        break;
> +    case 2:
> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M2) |
> +                                (env->active_fpu.fcsr0 & ~FCSR0_M2);
> +        break;
> +    case 3:
> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M3) |
> +                                (env->active_fpu.fcsr0 & ~FCSR0_M3);
> +        break;

This is easily implemented inline, followed by a single helper call to re-load the 
rounding mode (if required by the mask).

> +    case 16:
> +        env->active_fpu.vcsr16 = arg1;
> +        break;

The documentation I have does not describe the vector stuff?

> +    default:
> +        printf("%s: warning, fcsr '%d' not supported\n", __func__, fcsr);
> +        assert(0);
> +        break;

No printfs, no assert.  This should have been caught by

> +target_ulong helper_movcf2reg(CPULoongArchState *env, uint32_t cj)
> +{
> +    return (target_ulong)env->active_fpu.cf[cj & 0x7];
> +}

tcg_gen_ld8u_tl.


r~



* Re: [PATCH v2 17/22] target/loongarch: Add floating point load/store instruction translation
  2021-07-21  9:53 ` [PATCH v2 17/22] target/loongarch: Add floating point load/store " Song Gao
@ 2021-07-23  6:34   ` Richard Henderson
  2021-07-27  8:07     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:34 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +static bool trans_fldx_d(DisasContext *ctx, arg_fldx_d *a)
> +{
> +    TCGv t0;
> +    TCGv_i64 fp0;
> +    TCGv Rj = cpu_gpr[a->rj];
> +    TCGv Rk = cpu_gpr[a->rk];
> +
> +    t0 = tcg_temp_new();
> +    fp0 = tcg_temp_new_i64();
> +
> +    if (a->rj == 0 && a->rk == 0) {
> +        /* Nop */
> +        return true;
> +    }

This is not true.  This is simply a read from address 0 + 0 = 0.
Similarly for all of the other indexed memory operations.

And again, you should be using helpers to reduce the replication here.


r~



* Re: [PATCH v2 18/22] target/loongarch: Add branch instruction translation
  2021-07-21  9:53 ` [PATCH v2 18/22] target/loongarch: Add branch " Song Gao
@ 2021-07-23  6:38   ` Richard Henderson
  2021-07-27  8:07     ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:38 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> +/* Branch Instructions translation */
> +static bool trans_beqz(DisasContext *ctx, arg_beqz *a)
> +{
> +    TCGv t0, t1;
> +    int bcond_flag = 0;
> +
> +    t0 = tcg_temp_new();
> +    t1 = tcg_const_i64(0);
> +
> +    if (a->rj != 0) {
> +        gen_load_gpr(t0, a->rj);
> +        bcond_flag = 1;
> +    }
> +
> +    if (bcond_flag == 0) {
> +        ctx->hflags |= LOONGARCH_HFLAG_B;
> +    } else {
> +        tcg_gen_setcond_tl(TCG_COND_EQ, bcond, t0, t1);
> +        ctx->hflags |= LOONGARCH_HFLAG_BC;
> +    }
> +    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
> +
> +    tcg_temp_free(t0);
> +    tcg_temp_free(t1);
> +
> +    return true;
> +}

Drop all of the branch delay slot stuff.
Use a common routine and pass in the TCGCond.
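
As a sketch of the shape being asked for, the following standalone C model uses one routine for every conditional branch, parameterised by the condition. BranchCond, cond_holds and branch_next_pc are hypothetical names standing in for TCGCond and the real translator code; nothing here is QEMU API. The branch is resolved immediately, with no flags or pending-target state carried to a following instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGCond; only a few conditions are modelled. */
typedef enum {
    COND_EQ, COND_NE, COND_LT, COND_GE, COND_LTU, COND_GEU,
} BranchCond;

/* Evaluate one condition on two 64-bit register values. */
static bool cond_holds(BranchCond cond, uint64_t a, uint64_t b)
{
    switch (cond) {
    case COND_EQ:  return a == b;
    case COND_NE:  return a != b;
    case COND_LT:  return (int64_t)a < (int64_t)b;   /* signed compare */
    case COND_GE:  return (int64_t)a >= (int64_t)b;  /* signed compare */
    case COND_LTU: return a < b;                     /* unsigned compare */
    case COND_GEU: return a >= b;                    /* unsigned compare */
    }
    return false;
}

/*
 * Common routine shared by all conditional branches: returns the next PC.
 * offs is the already-scaled byte offset; the fall-through is pc + 4.
 */
static uint64_t branch_next_pc(BranchCond cond, uint64_t a, uint64_t b,
                               uint64_t pc, int64_t offs)
{
    return cond_holds(cond, a, b) ? pc + (uint64_t)offs : pc + 4;
}
```

In this model beqz is just branch_next_pc(COND_EQ, rj_value, 0, pc, offs), and bnez differs only in the condition passed in.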


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 19/22] target/loongarch: Add disassembler
  2021-07-21  9:53 ` [PATCH v2 19/22] target/loongarch: Add disassembler Song Gao
@ 2021-07-23  6:40   ` Richard Henderson
  2021-08-12 10:33   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:40 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> This patch add support for disassembling via option '-d in_asm'.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   MAINTAINERS             |    1 +
>   disas/loongarch.c       | 2511 +++++++++++++++++++++++++++++++++++++++++++++++
>   disas/meson.build       |    1 +
>   include/disas/dis-asm.h |    2 +
>   meson.build             |    1 +
>   5 files changed, 2516 insertions(+)
>   create mode 100644 disas/loongarch.c

A quick browse looks fine.

Acked-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 21/22] configs: Add loongarch linux-user config
  2021-07-21  9:53 ` [PATCH v2 21/22] configs: Add loongarch linux-user config Song Gao
@ 2021-07-23  6:43   ` Richard Henderson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-23  6:43 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/20/21 11:53 PM, Song Gao wrote:
> Add loongarch64 linux-user default configs file.
> 
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
>   configs/targets/loongarch64-linux-user.mak | 3 +++
>   1 file changed, 3 insertions(+)
>   create mode 100644 configs/targets/loongarch64-linux-user.mak

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/22] target/loongarch: Add core definition
  2021-07-22 22:43   ` Richard Henderson
@ 2021-07-26  8:47     ` Song Gao
  2021-07-26 15:32       ` Richard Henderson
  0 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-26  8:47 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi, Richard.

On 07/23/2021 06:43 AM, Richard Henderson wrote:
> On 7/20/21 11:52 PM, Song Gao wrote:
>> This patch add target state header, target definitions
>> and initialization routines.
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>   target/loongarch/cpu-param.h |  21 ++++
>>   target/loongarch/cpu-qom.h   |  40 ++++++
>>   target/loongarch/cpu.c       | 293 +++++++++++++++++++++++++++++++++++++++++++
>>   target/loongarch/cpu.h       | 265 ++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 619 insertions(+)
>>   create mode 100644 target/loongarch/cpu-param.h
>>   create mode 100644 target/loongarch/cpu-qom.h
>>   create mode 100644 target/loongarch/cpu.c
>>   create mode 100644 target/loongarch/cpu.h
>>
>> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
>> new file mode 100644
>> index 0000000..582ee29
>> --- /dev/null
>> +++ b/target/loongarch/cpu-param.h
>> @@ -0,0 +1,21 @@
>> +/*
>> + * LoongArch cpu parameters for qemu.
>> + *
>> + * Copyright (c) 2021 Loongson Technology Corporation Limited
>> + *
>> + * SPDX-License-Identifier: LGPL-2.1+
>> + */
>> +
>> +#ifndef LOONGARCH_CPU_PARAM_H
>> +#define LOONGARCH_CPU_PARAM_H 1
>> +
>> +#ifdef TARGET_LOONGARCH64
>> +#define TARGET_LONG_BITS 64
> 
> Why the ifdef for TARGET_LOONGARCH64?
> Nothing will compile without that set.
> 

OK, I'll remove it.

>> +#ifdef CONFIG_TCG
>> +static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
>> +                                              const TranslationBlock *tb)
>> +{
>> +    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
>> +    CPULoongArchState *env = &cpu->env;
>> +
>> +    env->active_tc.PC = tb->pc;
>> +    env->hflags &= ~LOONGARCH_HFLAG_BMASK;
>> +    env->hflags |= tb->flags & LOONGARCH_HFLAG_BMASK;
>> +}
> 
> Loongarch has no branch delay slots, so you should not have replicated the mips branch delay slot handling.  There should be no BMASK at all.
>
OK
 
>> +#ifdef CONFIG_TCG
>> +#include "hw/core/tcg-cpu-ops.h"
>> +
>> +static struct TCGCPUOps loongarch_tcg_ops = {
>> +    .initialize = loongarch_tcg_init,
>> +    .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
>> +};
>> +#endif /* CONFIG_TCG */
> 
> May I presume that Loongarch has virtualization hardware, and will eventually support KVM?  If not, there is no need for CONFIG_TCG anywhere.
>
Yes, LoongArch has virtualization hardware. We plan to support KVM in QEMU in the future.
 
>> +#define TCG_GUEST_DEFAULT_MO (0)
>> +#define UNASSIGNED_CPU_ID 0xFFFFFFFF
>> +
>> +typedef union fpr_t fpr_t;
>> +union fpr_t {
>> +    float64  fd;   /* ieee double precision */
>> +    float32  fs[2];/* ieee single precision */
>> +    uint64_t d;    /* binary double fixed-point */
>> +    uint32_t w[2]; /* binary single fixed-point */
>> +};
> 
> For what it's worth, we already have a CPU_DoubleU type that could be used.  But frankly, float64 *is* uint64_t, so there's very little use in putting them together into a union. It would seem that you don't even use fs and w for more than fpu_dump_state, and you're even doing it wrong there.
>
OK, I'll correct it.
 
>> +typedef struct CPULoongArchFPUContext CPULoongArchFPUContext;
>> +struct CPULoongArchFPUContext {
>> +    /* Floating point registers */
>> +    fpr_t fpr[32];
>> +    float_status fp_status;
>> +
>> +    bool cf[8];
>> +    /*
>> +     * fcsr0
>> +     * 31:29 |28:24 |23:21 |20:16 |15:10 |9:8 |7  |6  |5 |4:0
>> +     *        Cause         Flags         RM   DAE TM     Enables
>> +     */
>> +    uint32_t fcsr0;
>> +    uint32_t fcsr0_mask;
>> +    uint32_t vcsr16;
>> +
>> +#define FCSR0_M1    0xdf         /* FCSR1 mask, DAE, TM and Enables */
>> +#define FCSR0_M2    0x1f1f0000   /* FCSR2 mask, Cause and Flags */
>> +#define FCSR0_M3    0x300        /* FCSR3 mask, Round Mode */
>> +#define FCSR0_RM    8            /* Round Mode bit num on fcsr0 */
>> +#define GET_FP_CAUSE(reg)        (((reg) >> 24) & 0x1f)
>> +#define GET_FP_ENABLE(reg)       (((reg) >>  0) & 0x1f)
>> +#define GET_FP_FLAGS(reg)        (((reg) >> 16) & 0x1f)
>> +#define SET_FP_CAUSE(reg, v)      do { (reg) = ((reg) & ~(0x1f << 24)) | \
>> +                                               ((v & 0x1f) << 24);       \
>> +                                     } while (0)
>> +#define SET_FP_ENABLE(reg, v)     do { (reg) = ((reg) & ~(0x1f <<  0)) | \
>> +                                               ((v & 0x1f) << 0);        \
>> +                                     } while (0)
>> +#define SET_FP_FLAGS(reg, v)      do { (reg) = ((reg) & ~(0x1f << 16)) | \
>> +                                               ((v & 0x1f) << 16);       \
>> +                                     } while (0)
>> +#define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 16); } while (0)
>> +#define FP_INEXACT        1
>> +#define FP_UNDERFLOW      2
>> +#define FP_OVERFLOW       4
>> +#define FP_DIV0           8
>> +#define FP_INVALID        16
>> +};
>> +
>> +#define TARGET_INSN_START_EXTRA_WORDS 2
>> +#define LOONGARCH_FPU_MAX 1
>> +#define N_IRQS      14
>> +
>> +enum loongarch_feature {
>> +    LA_FEATURE_3A5000,
>> +};
>> +
>> +typedef struct TCState TCState;
>> +struct TCState {
>> +    target_ulong gpr[32];
>> +    target_ulong PC;
>> +};
>> +
>> +typedef struct CPULoongArchState CPULoongArchState;
>> +struct CPULoongArchState {
>> +    TCState active_tc;
>> +    CPULoongArchFPUContext active_fpu;
> 
> Please don't replicate the mips foolishness with active_tc and active_fpu.  There is no inactive_fpu with which to contrast this.  Just include these fields directly into the main CPULoongArchState structure.
> 

OK.

>> +
>> +    uint32_t current_tc;
>> +    uint64_t scr[4];
>> +    uint32_t current_fpu;
>> +
>> +    /* LoongArch CSR register */
>> +    CPU_LOONGARCH_CSR
>> +    target_ulong lladdr; /* LL virtual address compared against SC */
>> +    target_ulong llval;
>> +
>> +    CPULoongArchFPUContext fpus[LOONGARCH_FPU_MAX];
> 
> More copying from MIPS?  What is this for?
>
Oh, it seems so.

> 
>> +
>> +    /* QEMU */
>> +    int error_code;
>> +    uint32_t hflags;    /* CPU State */
>> +#define TLB_NOMATCH   0x1
>> +#define INST_INAVAIL  0x2 /* Invalid instruction word for BadInstr */
>> +    /* TMASK defines different execution modes */
>> +#define LOONGARCH_HFLAG_TMASK  0x1F5807FF
>> +#define LOONGARCH_HFLAG_KU     0x00003 /* kernel/supervisor/user mode mask   */
>> +#define LOONGARCH_HFLAG_UM     0x00003 /* user mode flag                     */
>> +#define LOONGARCH_HFLAG_KM     0x00000 /* kernel mode flag                   */
>> +#define LOONGARCH_HFLAG_64     0x00008 /* 64-bit instructions enabled        */
> 
> Is there a 32-bit mode for LoongArch?  I don't see this bit in CRMD.  This bit overlaps the "Direct address translation mode enable bit".  Which does sound like it should be present in tb->flags,
>

No.
 
>> +#define LOONGARCH_HFLAG_FPU    0x00020 /* FPU enabled                        */
>> +#define LOONGARCH_HFLAG_F64    0x00040 /* 64-bit FPU enabled                 */
> 
> I don't see that there is a mode-switch for a 32-bit fpu either.
> 
>> +#define LOONGARCH_HFLAG_BMASK  0x3800
>> +#define LOONGARCH_HFLAG_B      0x00800 /* Unconditional branch               */
>> +#define LOONGARCH_HFLAG_BC     0x01000 /* Conditional branch                 */
>> +#define LOONGARCH_HFLAG_BR     0x02000 /* branch to register (can't link TB) */
> 
> None of the BMASK stuff applies to LoongArch.
>> 
>> +#define LOONGARCH_HFLAG_FRE   0x2000000 /* FRE enabled */
>> +#define LOONGARCH_HFLAG_ELPA  0x4000000
>> +    target_ulong btarget;        /* Jump / branch target               */
>> +    target_ulong bcond;          /* Branch condition (if needed)       */
> 
> Nor this.
OK, I'll remove them.

> 
>> +static inline LoongArchCPU *loongarch_env_get_cpu(CPULoongArchState *env)
>> +{
>> +    return container_of(env, LoongArchCPU, env);
>> +}
>> +
>> +#define ENV_GET_CPU(e) CPU(loongarch_env_get_cpu(e))
> 
> You have copied this from a very old version of qemu.  These were replaced by generic functions in include/exec/cpu-all.h.

Right, we copied this from version 3.10, which is really old.

> 
>> +void loongarch_tcg_init(void);
>> +
>> +void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
>> +
>> +void QEMU_NORETURN do_raise_exception_err(CPULoongArchState *env,
>> +                                          uint32_t exception,
>> +                                          int error_code,
>> +                                          uintptr_t pc);
>> +
>> +static inline void QEMU_NORETURN do_raise_exception(CPULoongArchState *env,
>> +                                                    uint32_t exception,
>> +                                                    uintptr_t pc)
>> +{
>> +    do_raise_exception_err(env, exception, 0, pc);
>> +}
>> +
>> +static inline void compute_hflags(CPULoongArchState *env)
>> +{
>> +    env->hflags &= ~(LOONGARCH_HFLAG_64 | LOONGARCH_HFLAG_FPU |
>> +                     LOONGARCH_HFLAG_KU | LOONGARCH_HFLAG_ELPA);
>> +
>> +    env->hflags |= (env->CSR_CRMD & CSR_CRMD_PLV);
>> +    env->hflags |= LOONGARCH_HFLAG_64;
>> +
>> +    if (env->CSR_EUEN & CSR_EUEN_FPEN) {
>> +        env->hflags |= LOONGARCH_HFLAG_FPU;
>> +    }
>> +}
>> +
>> +const char *loongarch_exception_name(int32_t exception);
> 
> These should not be declared in cpu.h.
>

Hmm, but where should we declare them? The ARM target, for example, declares such functions in internals.h. Is that OK?


Thanks 
Song Gao



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 04/22] target/loongarch: Add interrupt handling support
  2021-07-22 22:47   ` Richard Henderson
@ 2021-07-26  9:23     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-26  9:23 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 06:47 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +bool loongarch_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
>> +{
>> +    if (interrupt_request & CPU_INTERRUPT_HARD) {
>> +        LoongArchCPU *cpu = LOONGARCH_CPU(cs);
>> +        CPULoongArchState *env = &cpu->env;
>> +
>> +        if (cpu_loongarch_hw_interrupts_enabled(env) &&
>> +            cpu_loongarch_hw_interrupts_pending(env)) {
>> +            cs->exception_index = EXCP_INTE;
>> +            env->error_code = 0;
>> +            loongarch_cpu_do_interrupt(cs);
>> +            return true;
>> +        }
>> +    }
>> +    return false;
>> +}
> 
> Not sure what you're doing here, with user-only.  None of these conditions apply.
> 

OK, I'll remove it.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 05/22] target/loongarch: Add memory management support
  2021-07-22 22:48   ` Richard Henderson
@ 2021-07-26  9:25     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-26  9:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 06:48 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> This patch introduces one memory-management-related functions
>> - loongarch_cpu_tlb_fill()
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>   target/loongarch/cpu.c        |   1 +
>>   target/loongarch/cpu.h        |   9 ++++
>>   target/loongarch/tlb_helper.c | 103 ++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 113 insertions(+)
>>   create mode 100644 target/loongarch/tlb_helper.c
>>
>> diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
>> index 8eaa778..6269dd9 100644
>> --- a/target/loongarch/cpu.c
>> +++ b/target/loongarch/cpu.c
>> @@ -269,6 +269,7 @@ static struct TCGCPUOps loongarch_tcg_ops = {
>>       .initialize = loongarch_tcg_init,
>>       .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
>>       .cpu_exec_interrupt = loongarch_cpu_exec_interrupt,
>> +    .tlb_fill = loongarch_cpu_tlb_fill,
>>   };
>>   #endif /* CONFIG_TCG */
>>   diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
>> index 1db8bb5..5c06122 100644
>> --- a/target/loongarch/cpu.h
>> +++ b/target/loongarch/cpu.h
>> @@ -287,4 +287,13 @@ static inline void compute_hflags(CPULoongArchState *env)
>>     const char *loongarch_exception_name(int32_t exception);
>>   +/* tlb_helper.c */
>> +bool loongarch_cpu_tlb_fill(CPUState *cs,
>> +                            vaddr address,
>> +                            int size,
>> +                            MMUAccessType access_type,
>> +                            int mmu_idx,
>> +                            bool probe,
>> +                            uintptr_t retaddr);
>> +
>>   #endif /* LOONGARCH_CPU_H */
>> diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
>> new file mode 100644
>> index 0000000..b59a995
>> --- /dev/null
>> +++ b/target/loongarch/tlb_helper.c
>> @@ -0,0 +1,103 @@
>> +/*
>> + * LoongArch tlb emulation helpers for qemu.
>> + *
>> + * Copyright (c) 2021 Loongson Technology Corporation Limited
>> + *
>> + * SPDX-License-Identifier: LGPL-2.1+
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "cpu.h"
>> +#include "cpu-csr.h"
>> +#include "exec/helper-proto.h"
>> +#include "exec/exec-all.h"
>> +#include "exec/cpu_ldst.h"
>> +#include "exec/log.h"
>> +
>> +enum {
>> +    TLBRET_PE = -7,
>> +    TLBRET_XI = -6,
>> +    TLBRET_RI = -5,
>> +    TLBRET_DIRTY = -4,
>> +    TLBRET_INVALID = -3,
>> +    TLBRET_NOMATCH = -2,
>> +    TLBRET_BADADDR = -1,
>> +    TLBRET_MATCH = 0
>> +};
>> +
>> +static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
>> +                                MMUAccessType access_type, int tlb_error)
>> +{
>> +    CPUState *cs = env_cpu(env);
>> +    int exception = 0, error_code = 0;
>> +
>> +    if (access_type == MMU_INST_FETCH) {
>> +        error_code |= INST_INAVAIL;
>> +    }
>> +
>> +    switch (tlb_error) {
>> +    default:
>> +    case TLBRET_BADADDR:
>> +        exception = EXCP_ADE;
>> +        break;
>> +    case TLBRET_NOMATCH:
>> +        /* No TLB match for a mapped address */
>> +        if (access_type == MMU_DATA_STORE) {
>> +            exception = EXCP_TLBS;
>> +        } else {
>> +            exception = EXCP_TLBL;
>> +        }
>> +        error_code |= TLB_NOMATCH;
>> +        break;
>> +    case TLBRET_INVALID:
>> +        /* TLB match with no valid bit */
>> +        if (access_type == MMU_DATA_STORE) {
>> +            exception = EXCP_TLBS;
>> +        } else {
>> +            exception = EXCP_TLBL;
>> +        }
>> +        break;
>> +    case TLBRET_DIRTY:
>> +        exception = EXCP_TLBM;
>> +        break;
>> +    case TLBRET_XI:
>> +        /* Execute-Inhibit Exception */
>> +        exception = EXCP_TLBXI;
>> +        break;
>> +    case TLBRET_RI:
>> +        /* Read-Inhibit Exception */
>> +        exception = EXCP_TLBRI;
>> +        break;
>> +    case TLBRET_PE:
>> +        /* Privileged Exception */
>> +        exception = EXCP_TLBPE;
>> +        break;
>> +    }
>> +
>> +    if (tlb_error == TLBRET_NOMATCH) {
>> +        env->CSR_TLBRBADV = address;
>> +        env->CSR_TLBREHI = address & (TARGET_PAGE_MASK << 1);
>> +        cs->exception_index = exception;
>> +        env->error_code = error_code;
>> +        return;
>> +    }
>> +
>> +    /* Raise exception */
>> +    env->CSR_BADV = address;
>> +    cs->exception_index = exception;
>> +    env->error_code = error_code;
>> +    env->CSR_TLBEHI = address & (TARGET_PAGE_MASK << 1);
>> +}
>> +
>> +bool loongarch_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
>> +                       MMUAccessType access_type, int mmu_idx,
>> +                       bool probe, uintptr_t retaddr)
>> +{
>> +    LoongArchCPU *cpu = LOONGARCH_CPU(cs);
>> +    CPULoongArchState *env = &cpu->env;
>> +    int ret = TLBRET_BADADDR;
>> +
>> +    /* data access */
>> +    raise_mmu_exception(env, address, access_type, ret);
>> +    do_raise_exception_err(env, cs->exception_index, env->error_code, retaddr);
>> +}
> 
> Again, almost all of this does not apply for user-only.
> 
> r~
> 
>>

OK, I'll remove it.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 06/22] target/loongarch: Add main translation routines
  2021-07-22 23:50   ` Richard Henderson
@ 2021-07-26  9:39     ` Song Gao
  2021-07-26 15:35       ` Richard Henderson
  0 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-26  9:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

Hi, Richard.

On 07/23/2021 07:50 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +/* General purpose registers moves. */
>> +void gen_load_gpr(TCGv t, int reg)
>> +{
>> +    if (reg == 0) {
>> +        tcg_gen_movi_tl(t, 0);
>> +    } else {
>> +        tcg_gen_mov_tl(t, cpu_gpr[reg]);
>> +    }
>> +}
> 
> Please have a look at
> 
> https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org/
> 
> for a better way to handle the zero register.
> 

OK, I'll look at it carefully.

>> +static inline void save_cpu_state(DisasContext *ctx, int do_save_pc)
>> +{
>> +    if (do_save_pc && ctx->base.pc_next != ctx->saved_pc) {
>> +        gen_save_pc(ctx->base.pc_next);
>> +        ctx->saved_pc = ctx->base.pc_next;
>> +    }
>> +    if (ctx->hflags != ctx->saved_hflags) {
>> +        tcg_gen_movi_i32(hflags, ctx->hflags);
>> +        ctx->saved_hflags = ctx->hflags;
>> +        switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
>> +        case LOONGARCH_HFLAG_BR:
>> +            break;
>> +        case LOONGARCH_HFLAG_BC:
>> +        case LOONGARCH_HFLAG_B:
>> +            tcg_gen_movi_tl(btarget, ctx->btarget);
>> +            break;
>> +        }
>> +    }
>> +}
> 
> Drop all the hflags handling.
> It's all copied from mips delay slot handling.
> 

OK.

>> +
>> +static inline void restore_cpu_state(CPULoongArchState *env, DisasContext *ctx)
>> +{
>> +    ctx->saved_hflags = ctx->hflags;
>> +    switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
>> +    case LOONGARCH_HFLAG_BR:
>> +        break;
>> +    case LOONGARCH_HFLAG_BC:
>> +    case LOONGARCH_HFLAG_B:
>> +        ctx->btarget = env->btarget;
>> +        break;
>> +    }
>> +}
> 
> Likewise.
> 
>> +static void gen_load_fpr32h(TCGv_i32 t, int reg)
>> +{
>> +    tcg_gen_extrh_i64_i32(t, fpu_f64[reg]);
>> +}
>> +
>> +static void gen_store_fpr32h(TCGv_i32 t, int reg)
>> +{
>> +    TCGv_i64 t64 = tcg_temp_new_i64();
>> +    tcg_gen_extu_i32_i64(t64, t);
>> +    tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
>> +    tcg_temp_free_i64(t64);
>> +}
> 
> There is no general-purpose high-part fpr stuff.  There's only movgr2frh and movfrh2gr, and you can simplify both if you drop the transition through TCGv_i32.
> 
OK.

>> +void gen_op_addr_add(TCGv ret, TCGv arg0, TCGv arg1)
>> +{
>> +    tcg_gen_add_tl(ret, arg0, arg1);
>> +}
> 
> No point in this, since loongarch has no 32-bit address mode.
> 
OK.

>> +void gen_base_offset_addr(TCGv addr, int base, int offset)
>> +{
>> +    if (base == 0) {
>> +        tcg_gen_movi_tl(addr, offset);
>> +    } else if (offset == 0) {
>> +        gen_load_gpr(addr, base);
>> +    } else {
>> +        tcg_gen_movi_tl(addr, offset);
>> +        gen_op_addr_add(addr, cpu_gpr[base], addr);
>> +    }
>> +}
> 
> Using the interfaces I quote above from my riscv cleanup,
> this can be tidied to
> 
>     tcg_gen_addi_tl(addr, gpr_src(base), offset);
> 

The 'riscv cleanup' series is the one at https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org/, right?


>> +static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
>> +{
>> +    return true;
>> +}
> 
> You must now use translate_use_goto_tb, which will not always return true.  You will see assertion failures otherwise.
> 

I have seen the patch already.

>> +static inline void clear_branch_hflags(DisasContext *ctx)
>> +{
>> +    ctx->hflags &= ~LOONGARCH_HFLAG_BMASK;
>> +    if (ctx->base.is_jmp == DISAS_NEXT) {
>> +        save_cpu_state(ctx, 0);
>> +    } else {
>> +        /*
>> +         * It is not safe to save ctx->hflags as hflags may be changed
>> +         * in execution time.
>> +         */
>> +        tcg_gen_andi_i32(hflags, hflags, ~LOONGARCH_HFLAG_BMASK);
>> +    }
>> +}
> 
> Not required.
> 
>> +static void gen_branch(DisasContext *ctx, int insn_bytes)
>> +{
>> +    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
>> +        int proc_hflags = ctx->hflags & LOONGARCH_HFLAG_BMASK;
>> +        /* Branches completion */
>> +        clear_branch_hflags(ctx);
>> +        ctx->base.is_jmp = DISAS_NORETURN;
>> +        switch (proc_hflags & LOONGARCH_HFLAG_BMASK) {
>> +        case LOONGARCH_HFLAG_B:
>> +            /* unconditional branch */
>> +            gen_goto_tb(ctx, 0, ctx->btarget);
>> +            break;
>> +        case LOONGARCH_HFLAG_BC:
>> +            /* Conditional branch */
>> +            {
>> +                TCGLabel *l1 = gen_new_label();
>> +
>> +                tcg_gen_brcondi_tl(TCG_COND_NE, bcond, 0, l1);
>> +                gen_goto_tb(ctx, 1, ctx->base.pc_next + insn_bytes);
>> +                gen_set_label(l1);
>> +                gen_goto_tb(ctx, 0, ctx->btarget);
>> +            }
>> +            break;
>> +        case LOONGARCH_HFLAG_BR:
>> +            /* unconditional branch to register */
>> +            tcg_gen_mov_tl(cpu_PC, btarget);
>> +            tcg_gen_lookup_and_goto_ptr();
>> +            break;
>> +        default:
>> +            fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
>> +            abort();
>> +        }
>> +    }
>> +}
> 
> Split this up into the various trans_* branch routines, without the setting of HFLAG.
> 
>> +static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
>> +                                            CPUState *cs)
>> +{
>> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
>> +    CPULoongArchState *env = cs->env_ptr;
>> +
>> +    ctx->page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
>> +    ctx->saved_pc = -1;
>> +    ctx->btarget = 0;
>> +    /* Restore state from the tb context.  */
>> +    ctx->hflags = (uint32_t)ctx->base.tb->flags;
>> +    restore_cpu_state(env, ctx);
>> +    ctx->mem_idx = LOONGARCH_HFLAG_UM;
> 
> This is not an mmu index.  You didn't notice the error because you're only doing user-mode.
> 
> You're missing a check for page crossing.
> Generally, for fixed-width ISAs like this, we do
> 
>     /* Bound the number of insns to execute to those left on the page.  */
>     int bound = -(ctx->base.pc_first | TARGET_PAGE_MASK) / 4;
>     ctx->base.max_insns = MIN(ctx->base.max_insns, bound);
> 
> here in init_disas_context.
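
The arithmetic in that bound can be checked in isolation. The sketch below is plain C, not QEMU code: PAGE_BITS, PAGE_MASK and insns_left_on_page are assumed names, and it assumes 4 KiB pages and 4-byte instructions. OR-ing the PC with the page mask sets every bit above the page offset, so negating the result gives the number of bytes from the PC to the end of its page.

```c
#include <stdint.h>

#define PAGE_BITS 12                                  /* assumed: 4 KiB pages */
#define PAGE_MASK (~(((uint64_t)1 << PAGE_BITS) - 1)) /* high bits set */

/*
 * Number of 4-byte instructions from pc_first to the end of its page.
 * (pc_first | PAGE_MASK) equals (page offset - page size) modulo 2^64,
 * so the unsigned negation is the count of bytes left on the page.
 */
static int insns_left_on_page(uint64_t pc_first)
{
    return (int)(-(pc_first | PAGE_MASK) / 4);
}
```

For a PC at the start of a page this gives 1024, and for the last instruction slot on a page (offset 0xffc) it gives 1.
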
> 
>> +static void loongarch_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
>> +{
>> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
>> +
>> +    tcg_gen_insn_start(ctx->base.pc_next, ctx->hflags & LOONGARCH_HFLAG_BMASK,
>> +                       ctx->btarget);
> 
> No hflags/btarget stuff.  Drop TARGET_INSN_START_EXTRA_WORDS.
> 
>> +static bool loongarch_tr_breakpoint_check(DisasContextBase *dcbase,
>> +                                          CPUState *cs,
>> +                                          const CPUBreakpoint *bp)
>> +{
>> +    return true;
>> +}
> 
> Broken, but now handled generically, so remove it.
> 
> 
OK.

>> +static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
>> +{
>> +    CPULoongArchState *env = cs->env_ptr;
>> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
>> +    int insn_bytes = 4;
>> +
>> +    ctx->opcode = cpu_ldl_code(env, ctx->base.pc_next);
>> +
>> +    if (!decode(ctx, ctx->opcode)) {
>> +        fprintf(stderr, "Error: unkown opcode. 0x%lx: 0x%x\n",
>> +                ctx->base.pc_next, ctx->opcode);
> 
> No fprintfs.  Use qemu_log_mask with LOG_UNIMP or LOG_GUEST_ERROR.
> 
OK.
>> +    if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
>> +        gen_branch(ctx, insn_bytes);
>> +    }
> 
> Drop this, as I mentioned above.
> 
OK.

>> +static void fpu_dump_state(CPULoongArchState *env, FILE * f, int flags)
>> +{
>> +    int i;
>> +    int is_fpu64 = 1;
>> +
>> +#define printfpr(fp)                                              \
>> +    do {                                                          \
>> +        if (is_fpu64)                                             \
>> +            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
>> +                        " fd:%13g fs:%13g psu: %13g\n",           \
>> +                        (fp)->w[FP_ENDIAN_IDX], (fp)->d,          \
>> +                        (double)(fp)->fd,                         \
>> +                        (double)(fp)->fs[FP_ENDIAN_IDX],          \
>> +                        (double)(fp)->fs[!FP_ENDIAN_IDX]);        \
>> +        else {                                                    \
>> +            fpr_t tmp;                                            \
>> +            tmp.w[FP_ENDIAN_IDX] = (fp)->w[FP_ENDIAN_IDX];        \
>> +            tmp.w[!FP_ENDIAN_IDX] = ((fp) + 1)->w[FP_ENDIAN_IDX]; \
>> +            qemu_fprintf(f, "w:%08x d:%016" PRIx64                \
>> +                        " fd:%13g fs:%13g psu:%13g\n",            \
>> +                        tmp.w[FP_ENDIAN_IDX], tmp.d,              \
>> +                        (double)tmp.fd,                           \
>> +                        (double)tmp.fs[FP_ENDIAN_IDX],            \
>> +                        (double)tmp.fs[!FP_ENDIAN_IDX]);          \
>> +        }                                                         \
>> +    } while (0)
> 
> This is broken.  You're performing an integer to fp conversion of something that is already a floating-point value, not printing the floating-point value itself.  It's broken in the mips code as well.
> 
> In addition, is_fpu64 is pointless for loongarch.
> 
Yes.
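
The difference between the two readings of a register can be shown with a small standalone example. It is plain C with hypothetical function names, and it assumes the host double is IEEE binary64. The register holds the bit pattern of a double in a 64-bit integer; a cast converts that integer numerically, whereas copying the bytes recovers the floating-point value that was stored.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64 raw bits as a double: the value actually stored. */
static double f64_bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof(d));
    return d;
}

/*
 * Integer-to-floating-point conversion of the same bits.  This is what a
 * plain (double) cast of an integer-typed register does, and it is not the
 * register's floating-point value.
 */
static double f64_bits_converted(uint64_t bits)
{
    return (double)bits;
}
```

For the bit pattern 0x3ff0000000000000 the first function returns 1.0, while the second returns about 4.6e18.
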
>> +void loongarch_tcg_init(void)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < 32; i++)
>> +        cpu_gpr[i] = tcg_global_mem_new(cpu_env,
>> +                                        offsetof(CPULoongArchState,
>> +                                                 active_tc.gpr[i]),
>> +                                        regnames[i]);
> 
> Missing braces.
> Do not create a temp for the zero register.
> 
>> +    bcond = tcg_global_mem_new(cpu_env,
>> +                               offsetof(CPULoongArchState, bcond), "bcond");
>> +    btarget = tcg_global_mem_new(cpu_env,
>> +                                 offsetof(CPULoongArchState, btarget),
>> +                                 "btarget");
>> +    hflags = tcg_global_mem_new_i32(cpu_env,
>> +                                    offsetof(CPULoongArchState, hflags),
>> +                                    "hflags");
> 
> Drop these.
OK.

Thanks for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-23  0:46   ` Richard Henderson
@ 2021-07-26 11:56     ` Song Gao
  2021-07-26 15:53       ` Richard Henderson
  0 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-26 11:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 08:46 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +/* Fixed point arithmetic operation instruction translation */
>> +static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +    TCGv Rk = cpu_gpr[a->rk];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    if (a->rj != 0 && a->rk != 0) {
>> +        tcg_gen_add_tl(Rd, Rj, Rk);
>> +        tcg_gen_ext32s_tl(Rd, Rd);
>> +    } else if (a->rj == 0 && a->rk != 0) {
>> +        tcg_gen_mov_tl(Rd, Rk);
>> +    } else if (a->rj != 0 && a->rk == 0) {
>> +        tcg_gen_mov_tl(Rd, Rj);
>> +    } else {
>> +        tcg_gen_movi_tl(Rd, 0);
>> +    }
>> +
>> +    return true;
>> +}
> 
> Do not do all of this "if reg(n) zero" testing.
> 
> Use a common function to perform the gpr lookup, and a small callback function for the operation.  Often, the callback function already exists within include/tcg/tcg-op.h.
> 
> Please see my riscv cleanup patch set I referenced vs patch 6.

I am not sure whether the 'riscv cleanup' patches are the ones at:

   https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org

It seems that gpr_dst/gpr_src are the common functions that perform the gpr lookup. Is that right?


> 
>> +static bool trans_orn(DisasContext *ctx, arg_orn *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    TCGv t0 = tcg_temp_new();
>> +    gen_load_gpr(t0, a->rk);
>> +
>> +    tcg_gen_not_tl(t0, t0);
>> +    tcg_gen_or_tl(Rd, Rj, t0);
> 
> tcg_gen_orc_tl.
> 
OK.
>> +static bool trans_andn(DisasContext *ctx, arg_andn *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    TCGv t0 = tcg_temp_new();
>> +    gen_load_gpr(t0, a->rk);
>> +
>> +    tcg_gen_not_tl(t0, t0);
>> +    tcg_gen_and_tl(Rd, Rj, t0);
> 
> tcg_gen_andc_tl.
> 
OK.

>> +static bool trans_mul_d(DisasContext *ctx, arg_mul_d *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +
>> +    check_loongarch_64(ctx);
> 
> Architecture checks go first, before you've decided the operation is a nop.
> 
OK.

>> +static bool trans_mulh_d(DisasContext *ctx, arg_mulh_d *a)
>> +{
>> +    TCGv t0, t1, t2;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +    t2 = tcg_temp_new();
>> +
>> +    check_loongarch_64(ctx);
>> +    tcg_gen_muls2_i64(t2, Rd, t0, t1);
> 
> If you actually supported LA32, you'd notice this doesn't compile.  Are you planning to support LA32 in the future?
> 
No. 
>> +static bool trans_lu32i_d(DisasContext *ctx, arg_lu32i_d *a)
>> +{
>> +    TCGv_i64 t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +
>> +    tcg_gen_movi_tl(t0, a->si20);
>> +    tcg_gen_concat_tl_i64(t1, Rd, t0);
>> +    tcg_gen_mov_tl(Rd, t1);
> 
> Hmm.  Better as
> 
>   tcg_gen_deposit_tl(Rd, Rd, tcg_constant_tl(a->si20), 32, 32);
>
OK.
>> +static bool trans_lu52i_d(DisasContext *ctx, arg_lu52i_d *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new();
>> +    t1 = tcg_temp_new();
>> +
>> +    gen_load_gpr(t1, a->rj);
>> +
>> +    tcg_gen_movi_tl(t0, a->si12);
>> +    tcg_gen_shli_tl(t0, t0, 52);
>> +    tcg_gen_andi_tl(t1, t1, 0xfffffffffffffU);
>> +    tcg_gen_or_tl(Rd, t0, t1);
> 
> Definitely better as
> 
>   tcg_gen_deposit_tl(Rd, Rd, tcg_constant_tl(a->si12), 52, 12);
> 
OK.
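
The two deposit suggestions above can be modelled in plain C. This is a minimal sketch, not QEMU code: deposit64(), model_lu32i_d() and model_lu52i_d() are hypothetical names, and the sketch assumes that a deposit replaces LEN bits of a base value, starting at bit POS, with the low LEN bits of a field. For lu52i.d the low 52 bits come from rj, as in the shift/mask code of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace LEN bits of BASE, starting at bit POS, with the low LEN bits
 * of FIELD.  Hypothetical stand-in for a deposit operation.
 */
static uint64_t deposit64(uint64_t base, unsigned pos, unsigned len,
                          uint64_t field)
{
    assert(len > 0 && len <= 64 && pos <= 64 - len);
    uint64_t mask = (len == 64 ? ~0ULL : (1ULL << len) - 1) << pos;
    return (base & ~mask) | ((field << pos) & mask);
}

/* lu32i.d: keep the low 32 bits of rd, place sign-extended si20 above. */
static uint64_t model_lu32i_d(uint64_t rd, int32_t si20)
{
    return deposit64(rd, 32, 32, (uint64_t)(int64_t)si20);
}

/* lu52i.d: low 52 bits from rj, si12 in the top 12 bits. */
static uint64_t model_lu52i_d(uint64_t rj, int32_t si12)
{
    return deposit64(rj, 52, 12, (uint64_t)(int64_t)si12);
}
```

One deposit replaces the movi/shift/and/or sequence, and no temporaries need to be allocated or freed.
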
>> +static bool trans_addi_w(DisasContext *ctx, arg_addi_w *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +    target_ulong uimm = (target_long)(a->si12);
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    if (a->rj != 0) {
>> +        tcg_gen_addi_tl(Rd, Rj, uimm);
>> +        tcg_gen_ext32s_tl(Rd, Rd);
>> +    } else {
>> +        tcg_gen_movi_tl(Rd, uimm);
>> +    }
>> +
>> +    return true;
>> +}
> 
> Again, there should be a common function for all of the two-register-immediate operations.  The callback here is exactly the same as for trans_add_w.
> 
OK.
>> +static bool trans_xori(DisasContext *ctx, arg_xori *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +
>> +    target_ulong uimm = (uint16_t)(a->ui12);
> 
> You shouldn't need these sorts of casts.
> 
OK. 

Thank you for your kind help.

Thanks
Song Gao



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 08/22] target/loongarch: Add fixed point shift instruction translation
  2021-07-23  0:51   ` Richard Henderson
@ 2021-07-26 11:57     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-26 11:57 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 08:51 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +/* Fixed point shift operation instruction translation */
>> +static bool trans_sll_w(DisasContext *ctx, arg_sll_w *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new();
>> +    t1 = get_gpr(a->rj);
>> +
>> +    gen_load_gpr(t0, a->rk);
>> +
>> +    tcg_gen_andi_tl(t0, t0, 0x1f);
>> +    tcg_gen_shl_tl(t0, t1, t0);
>> +    tcg_gen_ext32s_tl(Rd, t0);
>> +
>> +    tcg_temp_free(t0);
>> +
>> +    return true;
>> +}
> 
> Again, you should be using common helper functions for this instead of replicating the same pattern 16 times.
> 

OK. 

Thanks
Song Gao



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-23  1:29   ` Richard Henderson
@ 2021-07-26 12:22     ` Song Gao
  2021-07-26 16:39       ` Richard Henderson
  0 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-26 12:22 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 09:29 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> This patch implements fixed point bit instruction translation.
>>
>> This includes:
>> - EXT.W.{B/H}
>> - CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
>> - BYTEPICK.{W/D}
>> - REVB.{2H/4H/2W/D}
>> - REVH.{2W/D}
>> - BITREV.{4B/8B}, BITREV.{W/D}
>> - BSTRINS.{W/D}, BSTRPICK.{W/D}
>> - MASKEQZ, MASKNEZ
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>   target/loongarch/helper.h     |  10 +
>>   target/loongarch/insns.decode |  45 +++
>>   target/loongarch/op_helper.c  | 119 ++++++++
>>   target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 839 insertions(+)
>>
>> diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
>> index 6c7e19b..bbbcc26 100644
>> --- a/target/loongarch/helper.h
>> +++ b/target/loongarch/helper.h
>> @@ -8,3 +8,13 @@
>>     DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
>>   DEF_HELPER_2(raise_exception, noreturn, env, i32)
>> +
>> +DEF_HELPER_2(cto_w, tl, env, tl)
>> +DEF_HELPER_2(ctz_w, tl, env, tl)
>> +DEF_HELPER_2(cto_d, tl, env, tl)
>> +DEF_HELPER_2(ctz_d, tl, env, tl)
> 
> The count leading and trailing zero operations are built into tcg.  Count leading and trailing one simply needs a NOT operation to convert it to zero.
> 

My understanding is this:

  cto -> NOT operation (tcg_gen_not_tl) -> ctz

Is that right?

>> +DEF_HELPER_2(bitrev_w, tl, env, tl)
>> +DEF_HELPER_2(bitrev_d, tl, env, tl)
> 
> These should use TCG_CALL_NO_RWG_SE.
> 
>> +target_ulong helper_bitrev_w(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    int32_t v = (int32_t)rj;
>> +    const int SIZE = 32;
>> +    uint8_t bytes[SIZE];
>> +
>> +    int i;
>> +    for (i = 0; i < SIZE; i++) {
>> +        bytes[i] = v & 0x1;
>> +        v = v >> 1;
>> +    }
>> +    /* v == 0 */
>> +    for (i = 0; i < SIZE; i++) {
>> +        v = v | ((uint32_t)bytes[i] << (SIZE - 1 - i));
>> +    }
>> +
>> +    return (target_ulong)(int32_t)v;
>> +}
> 
>   return (int32_t)revbit32(rj);
> 
> 
OK.
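
For reference, a 32-bit bit reversal needs no per-bit loop or scratch array. The sketch below is a self-contained model using the usual swap-by-mask steps; revbit32_model() and bitrev_w_model() are hypothetical names, and the sign extension of the 32-bit result mirrors the (int32_t) cast in the suggestion above.

```c
#include <stdint.h>

/* Reverse the 32 bits of x by swapping ever larger groups of bits. */
static uint32_t revbit32_model(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* bitrev.w model: reverse the low word, then sign-extend to 64 bits. */
static uint64_t bitrev_w_model(uint64_t rj)
{
    return (uint64_t)(int64_t)(int32_t)revbit32_model((uint32_t)rj);
}
```
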

>> +target_ulong helper_bitrev_d(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    uint64_t v = rj;
>> +    const int SIZE = 64;
>> +    uint8_t bytes[SIZE];
>> +
>> +    int i;
>> +    for (i = 0; i < SIZE; i++) {
>> +        bytes[i] = v & 0x1;
>> +        v = v >> 1;
>> +    }
>> +    /* v == 0 */
>> +    for (i = 0; i < SIZE; i++) {
>> +        v = v | ((uint64_t)bytes[i] << (SIZE - 1 - i));
>> +    }
>> +
>> +    return (target_ulong)v;
>> +}
> 
>   return revbit64(rj);
>
OK.
 
>> +static inline target_ulong bitswap(target_ulong v)
>> +{
>> +    v = ((v >> 1) & (target_ulong)0x5555555555555555ULL) |
>> +        ((v & (target_ulong)0x5555555555555555ULL) << 1);
>> +    v = ((v >> 2) & (target_ulong)0x3333333333333333ULL) |
>> +        ((v & (target_ulong)0x3333333333333333ULL) << 2);
>> +    v = ((v >> 4) & (target_ulong)0x0F0F0F0F0F0F0F0FULL) |
>> +        ((v & (target_ulong)0x0F0F0F0F0F0F0F0FULL) << 4);
>> +    return v;
>> +}
>> +
>> +target_ulong helper_loongarch_dbitswap(target_ulong rj)
>> +{
>> +    return bitswap(rj);
>> +}
>> +
>> +target_ulong helper_loongarch_bitswap(target_ulong rt)
>> +{
>> +    return (int32_t)bitswap(rt);
>> +}
> 
> I assume these are for the bitrev.4b and bitrev.8b insns?
> It would be better to name them correctly.
> 
> 
Yes.

>> +/* Fixed point bit operation instruction translation */
>> +static bool trans_ext_w_h(DisasContext *ctx, arg_ext_w_h *a)
>> +{
>> +    TCGv t0;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = get_gpr(a->rj);
>> +
>> +    tcg_gen_ext16s_tl(Rd, t0);
> 
> Again, you should have a common routine for handling these unary operations.
> 
OK. 

>> +static bool trans_clo_w(DisasContext *ctx, arg_clo_w *a)
>> +{
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    gen_load_gpr(Rd, a->rj);
>> +
>> +    tcg_gen_not_tl(Rd, Rd);
>> +    tcg_gen_ext32u_tl(Rd, Rd);
>> +    tcg_gen_clzi_tl(Rd, Rd, TARGET_LONG_BITS);
>> +    tcg_gen_subi_tl(Rd, Rd, TARGET_LONG_BITS - 32);
> 
> So, you're actually using the tcg builtins here, and the helper you created isn't used.
> 
Yes.
>> +static bool trans_cto_w(DisasContext *ctx, arg_cto_w *a)
>> +{
>> +    TCGv t0;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new();
>> +    gen_load_gpr(t0, a->rj);
>> +
>> +    gen_helper_cto_w(Rd, cpu_env, t0);
> 
> Here you should have used the tcg builtin.
> 
OK.

>> +static bool trans_ctz_w(DisasContext *ctx, arg_ctz_w *a)
>> +{
>> +    TCGv t0;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new();
>> +    gen_load_gpr(t0, a->rj);
>> +
>> +    gen_helper_ctz_w(Rd, cpu_env, t0);
> 
> Likewise.
> 
>> +static bool trans_revb_2w(DisasContext *ctx, arg_revb_2w *a)
>> +{
>> +    TCGv_i64 t0, t1, t2;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    t2 = get_gpr(a->rj);
>> +
>> +    gen_load_gpr(t0, a->rd);
>> +
>> +    tcg_gen_ext32u_i64(t1, t2);
>> +    tcg_gen_bswap32_i64(t0, t1);
>> +    tcg_gen_shri_i64(t1, t2, 32);
>> +    tcg_gen_bswap32_i64(t1, t1);
>> +    tcg_gen_concat32_i64(Rd, t0, t1);
> 
> tcg_gen_bswap64_i64(Rd, Rj)
> tcg_gen_rotri_i64(Rd, Rd, 32);
> 
OK.
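
The equivalence behind the two-op suggestion can be checked with a small model: byte-swapping each 32-bit half separately gives the same result as one 64-bit byte swap followed by a rotate by 32. This is a sketch in plain C with hypothetical function names, not the TCG ops themselves.

```c
#include <stdint.h>

static uint32_t bswap32_model(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint64_t bswap64_model(uint64_t x)
{
    return ((uint64_t)bswap32_model((uint32_t)x) << 32) |
           bswap32_model((uint32_t)(x >> 32));
}

/* revb.2w as in the patch: swap the bytes of each word separately. */
static uint64_t revb_2w_per_word(uint64_t x)
{
    uint64_t lo = bswap32_model((uint32_t)x);
    uint64_t hi = bswap32_model((uint32_t)(x >> 32));
    return (hi << 32) | lo;
}

/* revb.2w as suggested: 64-bit byte swap, then rotate by 32 bits. */
static uint64_t revb_2w_swap_rotate(uint64_t x)
{
    uint64_t s = bswap64_model(x);
    return (s >> 32) | (s << 32);
}
```
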
>> +static bool trans_bytepick_d(DisasContext *ctx, arg_bytepick_d *a)
>> +{
>> +    TCGv t0;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
>> +
>> +    t0 = tcg_temp_new();
>> +
>> +    check_loongarch_64(ctx);
>> +    if (a->sa3 == 0 || ((a->sa3) * 8) == 64) {
>> +        if (a->sa3 == 0) {
>> +            gen_load_gpr(t0, a->rk);
>> +        } else {
>> +            gen_load_gpr(t0, a->rj);
>> +        }
>> +            tcg_gen_mov_tl(Rd, t0);
>> +    } else {
>> +        TCGv t1 = tcg_temp_new();
>> +
>> +        gen_load_gpr(t0, a->rk);
>> +        gen_load_gpr(t1, a->rj);
>> +
>> +        tcg_gen_shli_tl(t0, t0, ((a->sa3) * 8));
>> +        tcg_gen_shri_tl(t1, t1, 64 - ((a->sa3) * 8));
>> +        tcg_gen_or_tl(Rd, t1, t0);
>> +
>> +        tcg_temp_free(t1);
>> +    }
> 
> tcg_gen_extract2_i64(Rd, Rk, Rj, a->sa3 * 8);
> 
OK
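
When switching to a single extract2, the operand order and offset are worth checking against the shift/or form in the patch. The sketch below is a plain C model that assumes extract2 yields the low 64 bits of the concatenation hi:lo shifted right by pos; extract2_model() and bytepick_d_patch() are hypothetical names. Under that assumption, the shifts in the patch correspond to lo = Rj, hi = Rk and pos = 64 - 8 * sa3.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of (hi:lo) >> pos, for pos in 0..64. */
static uint64_t extract2_model(uint64_t lo, uint64_t hi, unsigned pos)
{
    assert(pos <= 64);
    if (pos == 0) {
        return lo;
    }
    if (pos == 64) {
        return hi;
    }
    return (lo >> pos) | (hi << (64 - pos));
}

/* bytepick.d as spelled out with shifts in the patch (sa3 is 3 bits). */
static uint64_t bytepick_d_patch(uint64_t rj, uint64_t rk, unsigned sa3)
{
    assert(sa3 < 8);
    if (sa3 == 0) {
        return rk;
    }
    return (rk << (sa3 * 8)) | (rj >> (64 - sa3 * 8));
}
```

The sa3 == 0 case falls out of the pos == 64 branch, so no special case is needed in the translator as long as the underlying op accepts that offset.
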

Thank you for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 10/22] target/loongarch: Add fixed point load/store instruction translation
  2021-07-23  1:45   ` Richard Henderson
@ 2021-07-26 12:25     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-26 12:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 09:45 AM, Richard Henderson wrote:
>> +/* Fixed point load/store instruction translation */
>> +static bool trans_ld_b(DisasContext *ctx, arg_ld_b *a)
>> +{
>> +    TCGv t0;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    int mem_idx = ctx->mem_idx;
>> +
>> +    if (a->rd == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
> 
> A load into the zero register is not a nop.  It is a load with the result discarded.  One should still fault if the load is to an invalid address.
> 
> You should be using a common routine, passing in the MO_* operand.
> 
OK.
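
The point about loads into the zero register can be illustrated with a toy interpreter model. This is only a sketch: guest_mem, gpr and model_ld_b() are hypothetical, and a bounds check stands in for the address fault. The access is always performed; only the register write-back is skipped when rd is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE 16

static uint8_t guest_mem[MEM_SIZE];
static uint64_t gpr[32];

/* Model of ld.b: returns false when the access would fault. */
static bool model_ld_b(unsigned rd, uint64_t addr)
{
    assert(rd < 32);
    if (addr >= MEM_SIZE) {
        return false;               /* faults even when rd == 0 */
    }
    int64_t val = (int8_t)guest_mem[addr];
    if (rd != 0) {
        gpr[rd] = (uint64_t)val;    /* result is discarded for r0 */
    }
    return true;
}
```
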

>> +#define ASRTGT                                \
>> +    do {                                      \
>> +        TCGv t1 = get_gpr(a->rj);             \
>> +        TCGv t2 = get_gpr(a->rk);             \
>> +        gen_helper_asrtgt_d(cpu_env, t1, t2); \
>> +    } while (0)
>> +
>> +#define ASRTLE                                \
>> +    do {                                      \
>> +        TCGv t1 = get_gpr(a->rj);             \
>> +        TCGv t2 = get_gpr(a->rk);             \
>> +        gen_helper_asrtle_d(cpu_env, t1, t2); \
>> +    } while (0)
>> +
>> +#define DECL_ARG(name)   \
>> +    arg_ ## name arg = { \
>> +        .rd = a->rd,     \
>> +        .rj = a->rj,     \
>> +        .rk = a->rk,     \
>> +    };
>> +
>> +static bool trans_ldgt_b(DisasContext *ctx, arg_ldgt_b *a)
>> +{
>> +    ASRTGT;
>> +    DECL_ARG(ldx_b)
>> +    trans_ldx_b(ctx, &arg);
>> +    return true;
>> +}
> 
> Use of a common routine would avoid the macro ugliness.
OK.

Thank you for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 11/22] target/loongarch: Add fixed point atomic instruction translation
  2021-07-23  1:49   ` Richard Henderson
@ 2021-07-26 12:25     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-26 12:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 09:49 AM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +#define TRANS_AM_W(name, op)                                      \
>> +static bool trans_ ## name(DisasContext *ctx, arg_ ## name * a)   \
>> +{                                                                 \
>> +    TCGv addr, val, ret;                                          \
>> +    TCGv Rd = cpu_gpr[a->rd];                                     \
>> +    int mem_idx = ctx->mem_idx;                                   \
>> +                                                                  \
>> +    if (a->rd == 0) {                                             \
>> +        return true;                                              \
>> +    }                                                             \
>> +    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) { \
>> +        printf("%s: warning, register equal\n", __func__);        \
>> +        return false;                                             \
>> +    }                                                             \
>> +                                                                  \
>> +    addr = get_gpr(a->rj);                                        \
>> +    val = get_gpr(a->rk);                                         \
>> +    ret = tcg_temp_new();                                         \
>> +                                                                  \
>> +    tcg_gen_atomic_##op##_tl(ret, addr, val, mem_idx, MO_TESL |   \
>> +                            ctx->default_tcg_memop_mask);         \
>> +    tcg_gen_mov_tl(Rd, ret);                                      \
>> +                                                                  \
>> +    tcg_temp_free(ret);                                           \
>> +                                                                  \
>> +    return true;                                                  \
>> +}
> 
> No printf.  Use a common routine instead of macros.
> 
OK.

Thanks
Song Gao.
> 
> r~



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-23  5:12   ` Richard Henderson
@ 2021-07-26 12:57     ` Song Gao
  2021-07-26 16:42       ` Richard Henderson
  2021-08-04  7:40     ` Song Gao
  1 sibling, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-26 12:57 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi, Richard.

On 07/23/2021 01:12 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
>> +{
>> +    target_ulong r = 0;
>> +
>> +    switch (rj) {
>> +    case 0:
>> +        r = env->CSR_MCSR0 & 0xffffffff;
>> +        break;
>> +    case 1:
>> +        r = (env->CSR_MCSR0 & 0xffffffff00000000) >> 32;
>> +        break;
> 
> Why do you represent all of these as high and low portions of a 64-bit internal value, when the manual describes them as 32-bit values?
> 
This method can reduce variables on env.
>
>> +/* Fixed point extra instruction translation */
>> +static bool trans_crc_w_b_w(DisasContext *ctx, arg_crc_w_b_w *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv_i32 tsz = tcg_const_i32(1 << 1);
> 
> This size is wrong.  It should be 1, not 1 << 1 (2).
>
>> +static bool trans_crc_w_w_w(DisasContext *ctx, arg_crc_w_w_w *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv_i32 tsz = tcg_const_i32(1 << 4);
> 
> Because this size most certainly should not be 16...
>
>> +static bool trans_crc_w_d_w(DisasContext *ctx, arg_crc_w_d_w *a)
>> +{
>> +    TCGv t0, t1;
>> +    TCGv Rd = cpu_gpr[a->rd];
>> +    TCGv_i32 tsz = tcg_const_i32(1 << 8);
> 
> ... and this size should not be 256.  Both well larger than the 8 byte buffer that you've allocated.
> 

I'm not sure about that.
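
To make the size question concrete, here is a minimal model of a CRC helper that stores the source register into an 8-byte buffer and then processes sz bytes of it. It is a sketch only: crc32_bytes() is a generic reflected CRC-32 used as a stand-in (not necessarily the polynomial or conventions of the instruction), and crc_low_bytes() is a hypothetical name. With such a helper the size argument is a byte count, so anything above 8 would read past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic bitwise reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_bytes(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
        }
    }
    return ~crc;
}

/* CRC over the low sz bytes of val; sz is a byte count, at most 8. */
static uint32_t crc_low_bytes(uint32_t crc, uint64_t val, unsigned sz)
{
    uint8_t buf[8];

    assert(sz == 1 || sz == 2 || sz == 4 || sz == 8);
    for (unsigned i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(val >> (8 * i));   /* little-endian store */
    }
    return crc32_bytes(crc, buf, sz);
}
```

Passing 1 << 4 or 1 << 8 as sz would trip the assertion in this model, which is the overrun the review comment describes.
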

> Also, you need a helper so that you don't have 8 copies of this code.
> 
OK.
>> +static bool trans_asrtle_d(DisasContext *ctx, arg_asrtle_d * a)
>> +{
>> +    TCGv t0, t1;
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +
>> +    gen_helper_asrtle_d(cpu_env, t0, t1);
>> +
>> +    return true;
>> +}
>> +
>> +static bool trans_asrtgt_d(DisasContext *ctx, arg_asrtgt_d * a)
>> +{
>> +    TCGv t0, t1;
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +
>> +    gen_helper_asrtgt_d(cpu_env, t0, t1);
>> +
>> +    return true;
>> +}
> 
> I'm not sure why both of these instructions are in the ISA, since
> 
>   ASRTLE X,Y <-> ASRTGT Y,X
> 
> but we certainly don't need two different helpers.
> Just swap the arguments for one of them.
>
OK.
 
>> +static bool trans_rdtimel_w(DisasContext *ctx, arg_rdtimel_w *a)
>> +{
>> +    /* Nop */
>> +    return true;
>> +}
>> +
>> +static bool trans_rdtimeh_w(DisasContext *ctx, arg_rdtimeh_w *a)
>> +{
>> +    /* Nop */
>> +    return true;
>> +}
>> +
>> +static bool trans_rdtime_d(DisasContext *ctx, arg_rdtime_d *a)
>> +{
>> +    /* Nop */
>> +    return true;
>> +}
> 
> If you don't want to implement these right now, you should at least initialize the destination register to 0, or something.
> 
OK.
> 
> r~

Again, thank you for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 03/22] target/loongarch: Add core definition
  2021-07-26  8:47     ` Song Gao
@ 2021-07-26 15:32       ` Richard Henderson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-26 15:32 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 7/25/21 10:47 PM, Song Gao wrote:
> Hmm, but where should these be declared? The ARM target declares such things in internals.h; is that OK?

Yes.

It is preferable that only things that are used outside of target/arch/ go into cpu.h, and 
that everything that is private to target/arch/ go into some other local header (with 
internals.h being a good catch-all).


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 06/22] target/loongarch: Add main translation routines
  2021-07-26  9:39     ` Song Gao
@ 2021-07-26 15:35       ` Richard Henderson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-26 15:35 UTC (permalink / raw)
  To: Song Gao, qemu-devel
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan, laurent,
	maobibo, alistair.francis, pbonzini, alex.bennee

On 7/25/21 11:39 PM, Song Gao wrote:
>>> +void gen_base_offset_addr(TCGv addr, int base, int offset)
>>> +{
>>> +    if (base == 0) {
>>> +        tcg_gen_movi_tl(addr, offset);
>>> +    } else if (offset == 0) {
>>> +        gen_load_gpr(addr, base);
>>> +    } else {
>>> +        tcg_gen_movi_tl(addr, offset);
>>> +        gen_op_addr_add(addr, cpu_gpr[base], addr);
>>> +    }
>>> +}
>>
>> Using the interfaces I quote above from my riscv cleanup,
>> this can be tidied to
>>
>>      tcg_gen_addi_tl(addr, gpr_src(base), offset);
>>
> 
> The 'riscv cleanup' series at https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org/, right?

Yes.


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-26 11:56     ` Song Gao
@ 2021-07-26 15:53       ` Richard Henderson
  2021-07-27  1:51         ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-26 15:53 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 7/26/21 1:56 AM, Song Gao wrote:
> Hi, Richard.
> 
> On 07/23/2021 08:46 AM, Richard Henderson wrote:
>> On 7/20/21 11:53 PM, Song Gao wrote:
>>> +/* Fixed point arithmetic operation instruction translation */
>>> +static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
>>> +{
>>> +    TCGv Rd = cpu_gpr[a->rd];
>>> +    TCGv Rj = cpu_gpr[a->rj];
>>> +    TCGv Rk = cpu_gpr[a->rk];
>>> +
>>> +    if (a->rd == 0) {
>>> +        /* Nop */
>>> +        return true;
>>> +    }
>>> +
>>> +    if (a->rj != 0 && a->rk != 0) {
>>> +        tcg_gen_add_tl(Rd, Rj, Rk);
>>> +        tcg_gen_ext32s_tl(Rd, Rd);
>>> +    } else if (a->rj == 0 && a->rk != 0) {
>>> +        tcg_gen_mov_tl(Rd, Rk);
>>> +    } else if (a->rj != 0 && a->rk == 0) {
>>> +        tcg_gen_mov_tl(Rd, Rj);
>>> +    } else {
>>> +        tcg_gen_movi_tl(Rd, 0);
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>
>> Do not do all of this "if reg(n) zero" testing.
>>
>> Use a common function to perform the gpr lookup, and a small callback function for the operation.  Often, the callback function already exists within include/tcg/tcg-op.h.
>>
>> Please see my riscv cleanup patch set I referenced vs patch 6.
> 
> I am not sure whether the 'riscv cleanup' patches are the ones at:
>
>     https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org
>
> It seems that gpr_dst/gpr_src are the common functions that perform the gpr lookup. Is that right?

More than that.  The gen_arith() function, for example, performs all of the bookkeeping 
for a binary operation.

For example,

static bool gen_arith(DisasContext *ctx, arg_fmt_rdrjrk *a,
                      void (*func)(TCGv, TCGv, TCGv))
{
    TCGv dest = gpr_dst(ctx, a->rd);
    TCGv src1 = gpr_src(ctx, a->rj);
    TCGv src2 = gpr_src(ctx, a->rk);

    func(dest, src1, src2);
    return true;
}

#define TRANS(NAME, FUNC, ...) \
    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
    { return FUNC(ctx, a, __VA_ARGS__); }

static void gen_add_w(TCGv dest, TCGv src1, TCGv src2)
{
    tcg_gen_add_tl(dest, src1, src2);
    tcg_gen_ext32s_tl(dest, dest);
}

TRANS(add_w, gen_arith, gen_add_w)
TRANS(add_d, gen_arith, tcg_gen_add_tl)
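
The same structure can be tried outside QEMU with a toy interpreter. This is a sketch under stated assumptions, not TCG code: regs, reg_read(), reg_write(), model_arith() and the op_* callbacks are all hypothetical names. Register 0 reads as zero and ignores writes in one place, so the per-instruction code needs no "if reg(n) zero" tests.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*binop_fn)(uint64_t a, uint64_t b);

static uint64_t regs[32];

/* Register 0 always reads as zero. */
static uint64_t reg_read(unsigned r)
{
    assert(r < 32);
    return r == 0 ? 0 : regs[r];
}

/* Writes to register 0 are discarded. */
static void reg_write(unsigned r, uint64_t v)
{
    assert(r < 32);
    if (r != 0) {
        regs[r] = v;
    }
}

/* Common bookkeeping for every three-register operation. */
static void model_arith(unsigned rd, unsigned rj, unsigned rk, binop_fn fn)
{
    reg_write(rd, fn(reg_read(rj), reg_read(rk)));
}

/* Small callbacks: only the operation itself differs. */
static uint64_t op_add_d(uint64_t a, uint64_t b)
{
    return a + b;
}

static uint64_t op_add_w(uint64_t a, uint64_t b)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)(a + b);
}
```
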


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation
  2021-07-26 12:22     ` Song Gao
@ 2021-07-26 16:39       ` Richard Henderson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-07-26 16:39 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 7/26/21 2:22 AM, Song Gao wrote:
> Hi, Richard.
> 
> On 07/23/2021 09:29 AM, Richard Henderson wrote:
>> On 7/20/21 11:53 PM, Song Gao wrote:
>>> This patch implements fixed point bit instruction translation.
>>>
>>> This includes:
>>> - EXT.W.{B/H}
>>> - CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
>>> - BYTEPICK.{W/D}
>>> - REVB.{2H/4H/2W/D}
>>> - REVH.{2W/D}
>>> - BITREV.{4B/8B}, BITREV.{W/D}
>>> - BSTRINS.{W/D}, BSTRPICK.{W/D}
>>> - MASKEQZ, MASKNEZ
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>    target/loongarch/helper.h     |  10 +
>>>    target/loongarch/insns.decode |  45 +++
>>>    target/loongarch/op_helper.c  | 119 ++++++++
>>>    target/loongarch/trans.inc.c  | 665 ++++++++++++++++++++++++++++++++++++++++++
>>>    4 files changed, 839 insertions(+)
>>>
>>> diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
>>> index 6c7e19b..bbbcc26 100644
>>> --- a/target/loongarch/helper.h
>>> +++ b/target/loongarch/helper.h
>>> @@ -8,3 +8,13 @@
>>>      DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
>>>    DEF_HELPER_2(raise_exception, noreturn, env, i32)
>>> +
>>> +DEF_HELPER_2(cto_w, tl, env, tl)
>>> +DEF_HELPER_2(ctz_w, tl, env, tl)
>>> +DEF_HELPER_2(cto_d, tl, env, tl)
>>> +DEF_HELPER_2(ctz_d, tl, env, tl)
>>
>> The count leading and trailing zero operations are built into tcg.  Count leading and trailing one simply needs a NOT operation to convert it to zero.
>>
> 
> My understanding is this:
>
>    cto -> NOT operation (tcg_gen_not_tl) -> ctz
>
> Is that right?

Yes.
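
The identity is easy to check in a small model: counting trailing (or leading) ones of x is counting trailing (or leading) zeros of ~x. The function names below are hypothetical, and this sketch defines the zero-input result as 32; the clo_w translation in the patch already follows the same not-then-clz pattern.

```c
#include <stdint.h>

/* Count trailing zeros; defined here as 32 for a zero input. */
static unsigned ctz32_model(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Count leading zeros; defined here as 32 for a zero input. */
static unsigned clz32_model(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count-ones variants are the zero counts of the inverted input. */
static unsigned cto32_model(uint32_t x)
{
    return ctz32_model(~x);
}

static unsigned clo32_model(uint32_t x)
{
    return clz32_model(~x);
}
```
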


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-26 12:57     ` Song Gao
@ 2021-07-26 16:42       ` Richard Henderson
  2021-07-27  1:46         ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-26 16:42 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 7/26/21 2:57 AM, Song Gao wrote:
> 
> Hi, Richard.
> 
> On 07/23/2021 01:12 PM, Richard Henderson wrote:
>> On 7/20/21 11:53 PM, Song Gao wrote:
>>> +target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
>>> +{
>>> +    target_ulong r = 0;
>>> +
>>> +    switch (rj) {
>>> +    case 0:
>>> +        r = env->CSR_MCSR0 & 0xffffffff;
>>> +        break;
>>> +    case 1:
>>> +        r = (env->CSR_MCSR0 & 0xffffffff00000000) >> 32;
>>> +        break;
>>
>> Why do you represent all of these as high and low portions of a 64-bit internal value, when the manual describes them as 32-bit values?
>>
> This method can reduce variables on env.

The number of variables may increase, but the memory consumed -- which is the metric that 
is more important -- is still the same.

Also, it is much less confusing to match the description in the manual.
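
A small sketch of the two layouts, with hypothetical struct and field names: packing pairs of 32-bit configuration words into 64-bit fields takes the same number of bytes as an array of 32-bit words, but the array can be indexed directly by the word number used in the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words packed into each 64-bit field. */
struct packed_cfg {
    uint64_t mcsr[2];
};

/* One array element per 32-bit configuration word. */
struct word_cfg {
    uint32_t cpucfg[4];
};

/* Reading word idx from the packed layout needs a shift and a mask. */
static uint32_t read_packed(const struct packed_cfg *p, unsigned idx)
{
    assert(idx < 4);
    uint64_t v = p->mcsr[idx / 2];
    return (idx & 1) ? (uint32_t)(v >> 32) : (uint32_t)v;
}

/* Reading word idx from the array layout is a plain index. */
static uint32_t read_words(const struct word_cfg *w, unsigned idx)
{
    assert(idx < 4);
    return w->cpucfg[idx];
}
```
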


r~


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-26 16:42       ` Richard Henderson
@ 2021-07-27  1:46         ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  1:46 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/27/2021 12:42 AM, Richard Henderson wrote:
> On 7/26/21 2:57 AM, Song Gao wrote:
>>
>> Hi, Richard.
>>
>> On 07/23/2021 01:12 PM, Richard Henderson wrote:
>>> On 7/20/21 11:53 PM, Song Gao wrote:
>>>> +target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
>>>> +{
>>>> +    target_ulong r = 0;
>>>> +
>>>> +    switch (rj) {
>>>> +    case 0:
>>>> +        r = env->CSR_MCSR0 & 0xffffffff;
>>>> +        break;
>>>> +    case 1:
>>>> +        r = (env->CSR_MCSR0 & 0xffffffff00000000) >> 32;
>>>> +        break;
>>>
>>> Why do you represent all of these as high and low portions of a 64-bit internal value, when the manual describes them as 32-bit values?
>>>
>> This method can reduce variables on env.
> 
> The number of variables may increase, but the memory consumed -- which is the metric that is more important -- is still the same.
> 
> Also, it is much less confusing to match the description in the manual.
> 
OK.

Thanks
Song Gao.
> 
> r~



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation
  2021-07-26 15:53       ` Richard Henderson
@ 2021-07-27  1:51         ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  1:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/26/2021 11:53 PM, Richard Henderson wrote:
> On 7/26/21 1:56 AM, Song Gao wrote:
>> Hi, Richard.
>>
>> On 07/23/2021 08:46 AM, Richard Henderson wrote:
>>> On 7/20/21 11:53 PM, Song Gao wrote:
>>>> +/* Fixed point arithmetic operation instruction translation */
>>>> +static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
>>>> +{
>>>> +    TCGv Rd = cpu_gpr[a->rd];
>>>> +    TCGv Rj = cpu_gpr[a->rj];
>>>> +    TCGv Rk = cpu_gpr[a->rk];
>>>> +
>>>> +    if (a->rd == 0) {
>>>> +        /* Nop */
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    if (a->rj != 0 && a->rk != 0) {
>>>> +        tcg_gen_add_tl(Rd, Rj, Rk);
>>>> +        tcg_gen_ext32s_tl(Rd, Rd);
>>>> +    } else if (a->rj == 0 && a->rk != 0) {
>>>> +        tcg_gen_mov_tl(Rd, Rk);
>>>> +    } else if (a->rj != 0 && a->rk == 0) {
>>>> +        tcg_gen_mov_tl(Rd, Rj);
>>>> +    } else {
>>>> +        tcg_gen_movi_tl(Rd, 0);
>>>> +    }
>>>> +
>>>> +    return true;
>>>> +}
>>>
>>> Do not do all of this "if reg(n) zero" testing.
>>>
>>> Use a common function to perform the gpr lookup, and a small callback function for the operation.  Often, the callback function already exists within include/tcg/tcg-op.h.
>>>
>>> Please see my riscv cleanup patch set I referenced vs patch 6.
>>
>> I am not sure whether the 'riscv cleanup' patches are the ones at:
>>        https://patchew.org/QEMU/20210709042608.883256-1-richard.henderson@linaro.org
>>
>> It seems that gpr_dst/gpr_src are the common functions that perform the gpr lookup. Is that right?
> 
> More than that.  The gen_arith() function, for example, performs all of the bookkeeping for a binary operation.
> 
> For example,
> 
> static bool gen_arith(DisasContext *ctx, arg_fmt_rdrjrk *a,
>                       void (*func)(TCGv, TCGv, TCGv))
> {
>     TCGv dest = gpr_dst(ctx, a->rd);
>     TCGv src1 = gpr_src(ctx, a->rj);
>     TCGv src2 = gpr_src(ctx, a->rk);
> 
>     func(dest, src1, src2);
>     return true;
> }
> 
> #define TRANS(NAME, FUNC, ...) \
>     static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
>     { return FUNC(ctx, a, __VA_ARGS__); }
> 
> static void gen_add_w(TCGv dest, TCGv src1, TCGv src2)
> {
>     tcg_gen_add_tl(dest, src1, src2);
>     tcg_gen_ext32s_tl(dest, dest);
> }
> 
> TRANS(add_w, gen_arith, gen_add_w)
> TRANS(add_d, gen_arith, tcg_gen_add_tl)
> 
> 
OK

Again, thank you for your kind help.

Thanks
Song Gao.
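
For readers following the thread outside the QEMU tree, the pattern quoted above (common register lookup plus a small callback, instantiated by a TRANS-style macro) can be sketched in plain C. This is only an illustration under assumptions: `Args`, `gpr`, `scratch`, `op_add_w` and `op_add_d` are hypothetical stand-ins operating on plain integers, not QEMU's TCG API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the decoder argument set and register file. */
typedef struct { int rd, rj, rk; } Args;

static uint64_t gpr[32];   /* gpr[0] models the hard-wired zero register */
static uint64_t scratch;   /* sink for writes whose destination is r0 */

/* Reads of r0 always yield zero. */
static uint64_t gpr_src(int reg) { return reg == 0 ? 0 : gpr[reg]; }

/* Writes to r0 land in a scratch slot and are discarded. */
static uint64_t *gpr_dst(int reg) { return reg == 0 ? &scratch : &gpr[reg]; }

/* All bookkeeping for a binary operation lives in one place. */
static bool gen_arith(const Args *a, uint64_t (*func)(uint64_t, uint64_t))
{
    *gpr_dst(a->rd) = func(gpr_src(a->rj), gpr_src(a->rk));
    return true;
}

/* Same shape as the macro in the quoted review, minus the DisasContext. */
#define TRANS(NAME, FUNC, ...) \
    static bool trans_##NAME(const Args *a) \
    { return FUNC(a, __VA_ARGS__); }

/* 64-bit add. */
static uint64_t op_add_d(uint64_t x, uint64_t y) { return x + y; }

/* 32-bit add, result sign-extended to 64 bits (relies on the usual
 * two's-complement conversion done by gcc and clang). */
static uint64_t op_add_w(uint64_t x, uint64_t y)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)(x + y);
}

TRANS(add_w, gen_arith, op_add_w)
TRANS(add_d, gen_arith, op_add_d)
```

With this shape no per-instruction "is the register zero" test is needed: the zero-register cases fall out of gpr_src() and gpr_dst().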



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
  2021-07-23  5:44   ` Richard Henderson
@ 2021-07-27  7:17     ` Song Gao
  2021-07-27 16:12       ` Richard Henderson
  0 siblings, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-07-27  7:17 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 01:44 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp)
>> +{
>> +    fp = float64_sqrt(fp, &env->active_fpu.fp_status);
>> +    update_fcsr0(env, GETPC());
>> +    return fp;
>> +}
>> +
>> +uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp)
>> +{
>> +    fp = float32_sqrt(fp, &env->active_fpu.fp_status);
>> +    update_fcsr0(env, GETPC());
>> +    return fp;
>> +}
> 
> I believe you will find it easier to take and return uint64_t, even for 32-bit operations.  The manual says that the high bits may contain any value, so in my opinion you should not work hard to preserve the high bits, as you currently do with
> 
>> +    gen_load_fpr32(fp0, a->fj);
>> +    gen_load_fpr32(fp1, a->fk);
>> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
>> +    gen_store_fpr32(fp0, a->fd);
> 
> I think this should be as simple as
> 
>   gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
>                       cpu_fpu[a->fj], cpu_fpu[a->fk]);
>
> I also think that loongarch should learn from risc-v and change the architecture to "nan-box" single-precision results -- fill the high 32-bits with 1s.  This is an SNaN representation for double-precision and will immediately fail when incorrectly using a single-precision value as a double-precision input.
> 
> Thankfully the current architecture is backward compatible with nan-boxing.
>

With this method, trans_fadd_s becomes:

static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
{
    TCGv_i64 fp0, fp1;

    fp0 = tcg_temp_new_i64();
    fp1 = tcg_temp_new_i64();

    check_fpu_enabled(ctx);
    gen_load_fpr64(fp0, a->fj);
    gen_load_fpr64(fp1, a->fk);
    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);

    gen_check_nanbox_s(fp0, fp0); /* from riscv */

    gen_store_fpr64(fp0, a->fd);

    tcg_temp_free_i64(fp0);
    tcg_temp_free_i64(fp1);

    return true;
}

uint64_t helper_fp_add_s(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
{
    uint32_t fp2;

    fp2 = float32_add((uint32_t)fp, (uint32_t)fp1, &env->active_fpu.fp_status);
    update_fcsr0(env, GETPC());
    return (uint64_t)fp2;
}

Is this right?

 
>> +/* Floating point arithmetic operation instruction translation */
>> +static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
>> +{
>> +    TCGv_i32 fp0, fp1;
>> +
>> +    fp0 = tcg_temp_new_i32();
>> +    fp1 = tcg_temp_new_i32();
>> +
>> +    check_fpu_enabled(ctx);
>> +    gen_load_fpr32(fp0, a->fj);
>> +    gen_load_fpr32(fp1, a->fk);
>> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
>> +    gen_store_fpr32(fp0, a->fd);
>> +
>> +    tcg_temp_free_i32(fp0);
>> +    tcg_temp_free_i32(fp1);
>> +
>> +    return true;
>> +}
> 
> Again, you should use some helper functions to reduce the repetition.
>
OK.
>> +static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
>> +{
>> +    TCGv_i64 fp0, fp1, fp2, fp3;
>> +
>> +    fp0 = tcg_temp_new_i64();
>> +    fp1 = tcg_temp_new_i64();
>> +    fp2 = tcg_temp_new_i64();
>> +    fp3 = tcg_temp_new_i64();
>> +
>> +    check_fpu_enabled(ctx);
>> +    gen_load_fpr64(fp0, a->fj);
>> +    gen_load_fpr64(fp1, a->fk);
>> +    gen_load_fpr64(fp2, a->fa);
>> +    check_fpu_enabled(ctx);
> 
> Repeating check_fpu_enabled.
> 
OK.
>> +    gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2);
>> +    gen_store_fpr64(fp3, a->fd);
> 
> I think you might as well pass in the float_muladd_* constant to a single helper rather than having 4 different helpers.
> 
OK
> 
> r~

Again, thank you for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 14/22] target/loongarch: Add floating point comparison instruction translation
  2021-07-23  6:11   ` Richard Henderson
@ 2021-07-27  7:56     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  7:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi, Richard.

On 07/23/2021 02:11 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +void helper_movreg2cf_i32(CPULoongArchState *env, uint32_t cd, uint32_t src)
>> +{
>> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
>> +}
>> +
>> +void helper_movreg2cf_i64(CPULoongArchState *env, uint32_t cd, uint64_t src)
>> +{
>> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
>> +}
>> +
>> +/* fcmp.cond.s */
>> +uint32_t helper_fp_cmp_caf_s(CPULoongArchState *env, uint32_t fp,
>> +                             uint32_t fp1)
>> +{
>> +    uint64_t ret;
>> +    ret = (float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status), 0);
>> +    update_fcsr0(env, GETPC());
>> +    if (ret) {
>> +        return -1;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
> 
> I don't understand why you have split the compare from the store to cf?
> 
> I don't understand why you're returning -1 instead of 1, when the result is supposed to be a boolean.
> 
> Alternately, I don't understand why you want a helper function to perform a simple byte store operation.  You could easily store a byte with tcg_gen_st8_{i32,i64}.
>

Hmm, this part does seem quite bad.
 
>> +uint32_t helper_fp_cmp_cueq_s(CPULoongArchState *env, uint32_t fp,
>> +                              uint32_t fp1)
>> +{
>> +    uint64_t ret;
>> +    ret = float32_unordered_quiet(fp1, fp, &env->active_fpu.fp_status) ||
>> +          float32_eq_quiet(fp, fp1, &env->active_fpu.fp_status);
> 
> You're better off using
> 
>     FloatRelation cmp = float32_compare_quiet(fp0, fp1, status);
>     update_fcsr0(env, GETPC());
>     return cmp == float_relation_unordered ||
>            cmp == float_relation_equal;
> 
> Similarly with every other place you use two comparisons.
> 
> Indeed, one could conceivably condense everything into exactly four helper functions: two using float{32,64}_compare_quiet and two using float{32,64}_compare (signalling).  A 4th argument would be a bitmask of the different true conditions, exactly as listed in Table 9.
> 
> Since FloatRelation is in {-1, 0, 1, 2}, one could write
> 
>   return (mask >> (cmp + 1)) & 1;
>
This is a good idea!
 
Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 15/22] target/loongarch: Add floating point conversion instruction translation
  2021-07-23  6:16   ` Richard Henderson
@ 2021-07-27  7:57     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  7:57 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi Richard.

On 07/23/2021 02:16 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +uint64_t helper_fp_tintrm_l_d(CPULoongArchState *env, uint64_t src)
>> +{
>> +    uint64_t dest;
>> +
>> +    set_float_rounding_mode(float_round_down, &env->active_fpu.fp_status);
>> +    dest = float64_to_int64(src, &env->active_fpu.fp_status);
>> +    restore_rounding_mode(env);
> 
> Better off to save the current rounding mode with get_float_rounding_mode, and restore it afterward.
> 
> See 63d06e90e65d5f119039044e986a81007954a466.
> 
OK.
> 
> r~

Thanks
Song Gao.
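
The save-and-restore pattern suggested above can be sketched with a small mock of the FP status. Everything here (`DemoFpStatus`, `DemoRoundMode`, `demo_to_int64`, `demo_tintrm`) is hypothetical and only models the idea; it does not use QEMU's softfloat API, and range checks are omitted.

```c
#include <stdint.h>

/* Hypothetical two-mode status, enough to show the pattern. */
typedef enum { ROUND_TO_ZERO, ROUND_DOWN } DemoRoundMode;
typedef struct { DemoRoundMode mode; } DemoFpStatus;

/* Convert using whatever mode the status currently holds. */
static int64_t demo_to_int64(double src, const DemoFpStatus *st)
{
    int64_t t = (int64_t)src;        /* C truncates toward zero */
    if (st->mode == ROUND_DOWN && (double)t > src) {
        t -= 1;                      /* step down for negative fractions */
    }
    return t;
}

/* Round-down conversion that leaves the caller's mode untouched. */
static int64_t demo_tintrm(double src, DemoFpStatus *st)
{
    DemoRoundMode old = st->mode;    /* save the current mode */
    st->mode = ROUND_DOWN;
    int64_t dest = demo_to_int64(src, st);
    st->mode = old;                  /* restore exactly what was there */
    return dest;
}
```

The point of saving the old mode locally is that the restore does not need to re-derive the mode from the control register.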



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 16/22] target/loongarch: Add floating point move instruction translation
  2021-07-23  6:29   ` Richard Henderson
@ 2021-07-27  8:06     ` Song Gao
  2021-08-12  9:20     ` Song Gao
  1 sibling, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  8:06 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/23/2021 02:29 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> This patch implements floating point move instruction translation.
>>
>> This includes:
>> - FMOV.{S/D}
>> - FSEL
>> - MOVGR2FR.{W/D}, MOVGR2FRH.W
>> - MOVFR2GR.{S/D}, MOVFRH2GR.S
>> - MOVGR2FCSR, MOVFCSR2GR
>> - MOVFR2CF, MOVCF2FR
>> - MOVGR2CF, MOVCF2GR
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>   target/loongarch/fpu_helper.c |  80 +++++++++++++
>>   target/loongarch/helper.h     |   6 +
>>   target/loongarch/insns.decode |  41 +++++++
>>   target/loongarch/trans.inc.c  | 270 ++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 397 insertions(+)
>>
>> diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
>> index 162085a..7662715 100644
>> --- a/target/loongarch/fpu_helper.c
>> +++ b/target/loongarch/fpu_helper.c
>> @@ -379,6 +379,11 @@ uint64_t helper_fp_logb_d(CPULoongArchState *env, uint64_t fp)
>>       return fp1;
>>   }
>>   +void helper_movreg2cf(CPULoongArchState *env, uint32_t cd, target_ulong src)
>> +{
>> +    env->active_fpu.cf[cd & 0x7] = src & 0x1;
>> +}
> 
> tcg_gen_andi_tl + tcg_gen_st8_tl.
> 
OK.
>> +target_ulong helper_fsel(CPULoongArchState *env, target_ulong fj,
>> +                         target_ulong fk, uint32_t ca)
>> +{
>> +    if (env->active_fpu.cf[ca & 0x7]) {
>> +        return fk;
>> +    } else {
>> +        return fj;
>> +    }
>> +}
> 
> tcg_gen_movcond_i64.
> 
OK.
>> +void helper_movgr2fcsr(CPULoongArchState *env, target_ulong arg1,
>> +                       uint32_t fcsr)
>> +{
>> +    switch (fcsr) {
>> +    case 0:
>> +        env->active_fpu.fcsr0 = arg1;
>> +        break;
>> +    case 1:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M1) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M1);
>> +        break;
>> +    case 2:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M2) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M2);
>> +        break;
>> +    case 3:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M3) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M3);
>> +        break;
> 
> This is easily implemented inline, followed by a single helper call to re-load the rounding mode (if required by the mask).
> 
OK.
>> +    case 16:
>> +        env->active_fpu.vcsr16 = arg1;
>> +        break;
> 
> The documentation I have does not describe the vector stuff?
> 

Yes, it is described in Volume II, but for now I need to remove it.

>> +    default:
>> +        printf("%s: warning, fcsr '%d' not supported\n", __func__, fcsr);
>> +        assert(0);
>> +        break;
> 
> No printfs, no assert.  This should have been caught by
> 
>> +target_ulong helper_movcf2reg(CPULoongArchState *env, uint32_t cj)
>> +{
>> +    return (target_ulong)env->active_fpu.cf[cj & 0x7];
>> +}
> 
> tcg_gen_ld8u_tl.
>
OK.
> 
> r~

Again, thank you for your kind help.

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 17/22] target/loongarch: Add floating point load/store instruction translation
  2021-07-23  6:34   ` Richard Henderson
@ 2021-07-27  8:07     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  8:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi, Richard.

On 07/23/2021 02:34 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +static bool trans_fldx_d(DisasContext *ctx, arg_fldx_d *a)
>> +{
>> +    TCGv t0;
>> +    TCGv_i64 fp0;
>> +    TCGv Rj = cpu_gpr[a->rj];
>> +    TCGv Rk = cpu_gpr[a->rk];
>> +
>> +    t0 = tcg_temp_new();
>> +    fp0 = tcg_temp_new_i64();
>> +
>> +    if (a->rj == 0 && a->rk == 0) {
>> +        /* Nop */
>> +        return true;
>> +    }
> 
> This is not true.  This is simply a read from address 0 + 0 = 0.
> Similarly for all of the other indexed memory operations.
> 
> And again, you should be using helpers to reduce the replication here.
> 
OK.
> 
> r~

Thanks
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 18/22] target/loongarch: Add branch instruction translation
  2021-07-23  6:38   ` Richard Henderson
@ 2021-07-27  8:07     ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-27  8:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


Hi, Richard.

On 07/23/2021 02:38 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +/* Branch Instructions translation */
>> +static bool trans_beqz(DisasContext *ctx, arg_beqz *a)
>> +{
>> +    TCGv t0, t1;
>> +    int bcond_flag = 0;
>> +
>> +    t0 = tcg_temp_new();
>> +    t1 = tcg_const_i64(0);
>> +
>> +    if (a->rj != 0) {
>> +        gen_load_gpr(t0, a->rj);
>> +        bcond_flag = 1;
>> +    }
>> +
>> +    if (bcond_flag == 0) {
>> +        ctx->hflags |= LOONGARCH_HFLAG_B;
>> +    } else {
>> +        tcg_gen_setcond_tl(TCG_COND_EQ, bcond, t0, t1);
>> +        ctx->hflags |= LOONGARCH_HFLAG_BC;
>> +    }
>> +    ctx->btarget = ctx->base.pc_next + (a->offs21 << 2);
>> +
>> +    tcg_temp_free(t0);
>> +    tcg_temp_free(t1);
>> +
>> +    return true;
>> +}
> 
> Drop all of the branch delay slot stuff.
> Use a common routine and pass in the TCGCond.
>
OK.
> 
> r~

Thanks 
Song Gao.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
  2021-07-27  7:17     ` Song Gao
@ 2021-07-27 16:12       ` Richard Henderson
  2021-07-28  1:18         ` Song Gao
  0 siblings, 1 reply; 76+ messages in thread
From: Richard Henderson @ 2021-07-27 16:12 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 7/26/21 9:17 PM, Song Gao wrote:
>> I think this should be as simple as
>>
>>    gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
>>                        cpu_fpu[a->fj], cpu_fpu[a->fk]);
>>
>> I also think that loongarch should learn from risc-v and change the architecture to "nan-box" single-precision results -- fill the high 32-bits with 1s.  This is an SNaN representation for double-precision and will immediately fail when incorrectly using a single-precision value as a double-precision input.
>>
>> Thankfully the current architecture is backward compatible with nan-boxing.
>>
> 
> by this method,  the trans_fadd_s is
> 
> static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
> {
>      TCGv_i64 fp0, fp1;
> 
>      fp0 = tcg_temp_new_i64();
>      fp1 = tcg_temp_new_i64();
> 
>      check_fpu_enabled(ctx);
>      gen_load_fpr64(fp0, a->fj);
>      gen_load_fpr64(fp1, a->fk);
>      gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
> 
>      gen_check_nanbox_s(fp0, fp0); /* from riscv */
> 
>      gen_store_fpr64(fp0, a->fd);
> 
>      tcg_temp_free_i64(fp0);
>      tcg_temp_free_i64(fp1);
> 
>      return true;
> }

A few points here:

(1) You do not need gen_load_fpr64 and gen_store_fpr64 at all.
     These were from mips to deal with the varying fpu sizes.

(2) If we need to call a helper, then the helper should do as much of
     the work as possible.  Therefore the nanboxing should be
     done there.  See riscv/fpu_helper.c, and the use of
     nanbox_s within that file.

(3) Again, use a helper function:

static bool gen_binary_fp(DisasContext *ctx, arg_fmt_fdfjfk *a,
                           void (*func)(TCGv_i64, TCGv_env,
                                        TCGv_i64, TCGv_i64))
{
     if (check_fpu_enabled(ctx)) {
         func(cpu_fpr[a->fd], cpu_env,
              cpu_fpr[a->fj], cpu_fpr[a->fk]);
     }
     return true;
}

TRANS(fadd_s, gen_binary_fp, gen_helper_fp_add_s)
TRANS(fadd_d, gen_binary_fp, gen_helper_fp_add_d)

> uint64_t helper_fp_add_s(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
> {
>      uint32_t fp2;
> 
>      fp2 = float32_add((uint32_t)fp, (uint32_t)fp1, &env->active_fpu.fp_status);
>      update_fcsr0(env, GETPC());
>      return (uint64_t)fp2;
> }

with return nanbox_s(fp2);


r~
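
For reference, the nan-boxing discussed in this message can be sketched as standalone C. The functions below are stand-ins written for illustration (the name nanbox_s is borrowed from the riscv helper mentioned above, but its real signature may differ); `is_nanboxed_s` and `bits_to_double` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Box a 32-bit float bit pattern: fill the upper 32 bits with ones. */
static uint64_t nanbox_s(uint32_t f)
{
    return (uint64_t)f | 0xffffffff00000000ULL;
}

/* True when the upper 32 bits are all ones. */
static bool is_nanboxed_s(uint64_t v)
{
    return (v >> 32) == 0xffffffffu;
}

/* Reinterpret a 64-bit pattern as a double without aliasing problems. */
static double bits_to_double(uint64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}
```

A boxed single keeps its low 32 bits intact, and the same 64-bit pattern read as a double is a NaN, which is what makes misuse of a single as a double input visible.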


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
  2021-07-27 16:12       ` Richard Henderson
@ 2021-07-28  1:18         ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-07-28  1:18 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

Hi, Richard.

On 07/28/2021 12:12 AM, Richard Henderson wrote:
> On 7/26/21 9:17 PM, Song Gao wrote:
>>> I think this should be as simple as
>>>
>>>    gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
>>>                        cpu_fpu[a->fj], cpu_fpu[a->fk]);
>>>
>>> I also think that loongarch should learn from risc-v and change the architecture to "nan-box" single-precision results -- fill the high 32-bits with 1s.  This is an SNaN representation for double-precision and will immediately fail when incorrectly using a single-precision value as a double-precision input.
>>>
>>> Thankfully the current architecture is backward compatible with nan-boxing.
>>>
>>
>> by this method,  the trans_fadd_s is
>>
>> static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
>> {
>>      TCGv_i64 fp0, fp1;
>>
>>      fp0 = tcg_temp_new_i64();
>>      fp1 = tcg_temp_new_i64();
>>
>>      check_fpu_enabled(ctx);
>>      gen_load_fpr64(fp0, a->fj);
>>      gen_load_fpr64(fp1, a->fk);
>>      gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
>>
>>      gen_check_nanbox_s(fp0, fp0); /* from riscv */
>>
>>      gen_store_fpr64(fp0, a->fd);
>>
>>      tcg_temp_free_i64(fp0);
>>      tcg_temp_free_i64(fp1);
>>
>>      return true;
>> }
> 
> A few points here:
> 
> (1) You do not need gen_load_fpr64 and gen_store_fpr64 at all.
>     These were from mips to deal with the varying fpu sizes.
> 
> (2) If we need to call a helper, then the helper should do as much of
>     the work as possible.  Therefore the nanboxing should be
>     done there.  See riscv/fpu_helper.c, and the use of
>     nanbox_s within that file.
> 
> (3) Again, use a helper function:
> 
> static bool gen_binary_fp(DisasContext *ctx, arg_fmt_fdfjfk *a,
>                           void (*func)(TCGv_i64, TCGv_env,
>                                        TCGv_i64, TCGv_i64))
> {
>     if (check_fpu_enabled(ctx)) {
>         func(cpu_fpr[a->fd], cpu_env,
>              cpu_fpr[a->fj], cpu_fpr[a->fk]);
>     }
>     return true;
> }
> 
> TRANS(fadd_s, gen_binary_fp, gen_helper_fp_add_s)
> TRANS(fadd_d, gen_binary_fp, gen_helper_fp_add_d)
> 
>> uint64_t helper_fp_add_s(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
>> {
>>      uint32_t fp2;
>>
>>      fp2 = float32_add((uint32_t)fp, (uint32_t)fp1, &env->active_fpu.fp_status);
>>      update_fcsr0(env, GETPC());
>>      return (uint64_t)fp2;
>> }
> 
> with return nanbox_s(fp2);
>
OK.

Again, thank you for your kind help.

Thanks
Song Gao.
> 
> r~



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-07-23  5:12   ` Richard Henderson
  2021-07-26 12:57     ` Song Gao
@ 2021-08-04  7:40     ` Song Gao
  2021-08-04  7:51       ` Song Gao
  1 sibling, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-08-04  7:40 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee


On 07/23/2021 01:12 PM, Richard Henderson wrote:
>> +static bool trans_asrtle_d(DisasContext *ctx, arg_asrtle_d * a)
>> +{
>> +    TCGv t0, t1;
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +
>> +    gen_helper_asrtle_d(cpu_env, t0, t1);
>> +
>> +    return true;
>> +}
>> +
>> +static bool trans_asrtgt_d(DisasContext *ctx, arg_asrtgt_d * a)
>> +{
>> +    TCGv t0, t1;
>> +
>> +    t0 = get_gpr(a->rj);
>> +    t1 = get_gpr(a->rk);
>> +
>> +    gen_helper_asrtgt_d(cpu_env, t0, t1);
>> +
>> +    return true;
>> +}
> 
> I'm not sure why both of these instructions are in the ISA, since
> 
>   ASRTLE X,Y <-> ASRTGT Y,X
> 
> but we certainly don't need two different helpers.
> Just swap the arguments for one of them.

Hi, Richard.

I find 'ASRTLE X,Y <-> ASRTGT Y,X ' is not right,
 
    ASRTLE X, Y is X <= Y, raise a exception.
    ASRTGT Y, X is X < Y, raise a exception, lose X=Y.

Thanks
Song Gao



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation
  2021-08-04  7:40     ` Song Gao
@ 2021-08-04  7:51       ` Song Gao
  0 siblings, 0 replies; 76+ messages in thread
From: Song Gao @ 2021-08-04  7:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, alex.bennee, yangxiaojuan,
	maobibo, qemu-devel, alistair.francis, pbonzini, philmd, laurent


On 08/04/2021 03:40 PM, Song Gao wrote:
> 
> On 07/23/2021 01:12 PM, Richard Henderson wrote:
>>> +static bool trans_asrtle_d(DisasContext *ctx, arg_asrtle_d * a)
>>> +{
>>> +    TCGv t0, t1;
>>> +
>>> +    t0 = get_gpr(a->rj);
>>> +    t1 = get_gpr(a->rk);
>>> +
>>> +    gen_helper_asrtle_d(cpu_env, t0, t1);
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +static bool trans_asrtgt_d(DisasContext *ctx, arg_asrtgt_d * a)
>>> +{
>>> +    TCGv t0, t1;
>>> +
>>> +    t0 = get_gpr(a->rj);
>>> +    t1 = get_gpr(a->rk);
>>> +
>>> +    gen_helper_asrtgt_d(cpu_env, t0, t1);
>>> +
>>> +    return true;
>>> +}
>>
>> I'm not sure why both of these instructions are in the ISA, since
>>
>>   ASRTLE X,Y <-> ASRTGT Y,X
>>
>> but we certainly don't need two different helpers.
>> Just swap the arguments for one of them.
> 
> Hi, Richard.
> 
> I find 'ASRTLE X,Y <-> ASRTGT Y,X ' is not right,
>  
>     ASRTLE X, Y is X <= Y, raise a exception.
>     ASRTGT Y, X is X < Y, raise a exception, lose X=Y.
> 

Sorry, I got that wrong.

     ASRTLE X, Y raises an exception when X > Y.
     ASRTGT Y, X raises an exception when X >= Y, so it also traps on X = Y.
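
The difference at X = Y can be checked with a small standalone sketch. The two predicates below are hypothetical and only restate the trap conditions given in this message; the use of unsigned comparison is an assumption of the sketch, since the thread does not settle signedness.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASRTLE rj, rk asserts rj <= rk, so it traps when rj > rk. */
static bool asrtle_traps(uint64_t rj, uint64_t rk)
{
    return rj > rk;
}

/* ASRTGT rj, rk asserts rj > rk, so it traps when rj <= rk. */
static bool asrtgt_traps(uint64_t rj, uint64_t rk)
{
    return rj <= rk;
}
```

For X != Y, asrtle_traps(X, Y) and asrtgt_traps(Y, X) agree; for X == Y only the ASRTGT form traps, so swapping the arguments of a single helper is not enough.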

> Thanks
> Song Gao
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 16/22] target/loongarch: Add floating point move instruction translation
  2021-07-23  6:29   ` Richard Henderson
  2021-07-27  8:06     ` Song Gao
@ 2021-08-12  9:20     ` Song Gao
  2021-08-12 19:31       ` Richard Henderson
  1 sibling, 1 reply; 76+ messages in thread
From: Song Gao @ 2021-08-12  9:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee



On 07/23/2021 02:29 PM, Richard Henderson wrote:
> 
>> +void helper_movgr2fcsr(CPULoongArchState *env, target_ulong arg1,
>> +                       uint32_t fcsr)
>> +{
>> +    switch (fcsr) {
>> +    case 0:
>> +        env->active_fpu.fcsr0 = arg1;
>> +        break;
>> +    case 1:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M1) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M1);
>> +        break;
>> +    case 2:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M2) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M2);
>> +        break;
>> +    case 3:
>> +        env->active_fpu.fcsr0 = (arg1 & FCSR0_M3) |
>> +                                (env->active_fpu.fcsr0 & ~FCSR0_M3);
>> +        break;
> 
> This is easily implemented inline, followed by a single helper call to re-load the rounding mode (if required by the mask).

Hi, Richard,

Sorry to bother you. While revising this patch, I found that I did not fully understand your comment.
Could you explain it in more detail? Thank you very much.

Thanks
Song Gao.
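
One possible reading of the quoted suggestion, sketched in plain C rather than TCG: every case of the switch is the same masked update, so the mask can be selected by index and the update written once. The mask values below are placeholders for illustration only (the real FCSR0_M1..M3 definitions are not shown in this thread), `masked_write` and `write_fcsr` are hypothetical names, and the rounding-mode reload mentioned in the review is left out.

```c
#include <stdint.h>

/* Placeholder masks, NOT the real FCSR0_M1..M3 values. */
#define DEMO_M0 0xffffffffu   /* index 0 writes the whole register */
#define DEMO_M1 0x0000001fu
#define DEMO_M2 0x1f1f0000u
#define DEMO_M3 0x00000300u

static const uint32_t fcsr_mask[4] = { DEMO_M0, DEMO_M1, DEMO_M2, DEMO_M3 };

/* Replace only the bits selected by mask and keep the rest. */
static uint32_t masked_write(uint32_t old, uint32_t val, uint32_t mask)
{
    return (val & mask) | (old & ~mask);
}

/* idx is reduced to 0..3 here; a real decoder would reject other values. */
static uint32_t write_fcsr(uint32_t fcsr0, uint32_t val, unsigned idx)
{
    return masked_write(fcsr0, val, fcsr_mask[idx & 3]);
}
```

In a translator the mask is known at translation time, so the and/or sequence could be emitted inline, leaving a helper call only for the case where the written bits include the rounding mode.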



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 19/22] target/loongarch: Add disassembler
  2021-07-21  9:53 ` [PATCH v2 19/22] target/loongarch: Add disassembler Song Gao
  2021-07-23  6:40   ` Richard Henderson
@ 2021-08-12 10:33   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 76+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-08-12 10:33 UTC (permalink / raw)
  To: richard.henderson
  Cc: peter.maydell, thuth, chenhuacai, yangxiaojuan, laurent, maobibo,
	qemu-devel, alistair.francis, pbonzini, alex.bennee, Song Gao

Hi Richard,

On 7/21/21 11:53 AM, Song Gao wrote:
> This patch adds support for disassembling via option '-d in_asm'.
> 
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>  MAINTAINERS             |    1 +
>  disas/loongarch.c       | 2511 +++++++++++++++++++++++++++++++++++++++++++++++
>  disas/meson.build       |    1 +
>  include/disas/dis-asm.h |    2 +
>  meson.build             |    1 +
>  5 files changed, 2516 insertions(+)
>  create mode 100644 disas/loongarch.c

> +/* decode opcode */
> +static void decode_insn_opcode(la_decode *dec)
> +{
> +    uint32_t insn = dec->insn;
> +    uint16_t op = la_op_illegal;
> +    switch ((insn >> 26) & 0x3f) {
> +    case 0x0:
> +        switch ((insn >> 22) & 0xf) {
> +        case 0x0:
> +            switch ((insn >> 18) & 0xf) {
> +            case 0x0:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    switch ((insn >> 10) & 0x1f) {
> +                    case 0x4:
> +                        op = la_op_clo_w;
> +                        break;
> +                    case 0x5:
> +                        op = la_op_clz_w;
> +                        break;
> +                    case 0x6:
> +                        op = la_op_cto_w;
> +                        break;
> +                    case 0x7:
> +                        op = la_op_ctz_w;
> +                        break;
> +                    case 0x8:
> +                        op = la_op_clo_d;
> +                        break;
> +                    case 0x9:
> +                        op = la_op_clz_d;
> +                        break;
> +                    case 0xa:
> +                        op = la_op_cto_d;
> +                        break;
> +                    case 0xb:
> +                        op = la_op_ctz_d;
> +                        break;
> +                    case 0xc:
> +                        op = la_op_revb_2h;
> +                        break;
> +                    case 0xd:
> +                        op = la_op_revb_4h;
> +                        break;
> +                    case 0xe:
> +                        op = la_op_revb_2w;
> +                        break;
> +                    case 0xf:
> +                        op = la_op_revb_d;
> +                        break;
> +                    case 0x10:
> +                        op = la_op_revh_2w;
> +                        break;
> +                    case 0x11:
> +                        op = la_op_revh_d;
> +                        break;
> +                    case 0x12:
> +                        op = la_op_bitrev_4b;
> +                        break;
> +                    case 0x13:
> +                        op = la_op_bitrev_8b;
> +                        break;
> +                    case 0x14:
> +                        op = la_op_bitrev_w;
> +                        break;
> +                    case 0x15:
> +                        op = la_op_bitrev_d;
> +                        break;
> +                    case 0x16:
> +                        op = la_op_ext_w_h;
> +                        break;
> +                    case 0x17:
> +                        op = la_op_ext_w_b;
> +                        break;
> +                    case 0x18:
> +                        op = la_op_rdtimel_w;
> +                        break;
> +                    case 0x19:
> +                        op = la_op_rdtimeh_w;
> +                        break;
> +                    case 0x1a:
> +                        op = la_op_rdtime_d;
> +                        break;
> +                    case 0x1b:
> +                        op = la_op_cpucfg;
> +                        break;
> +                    }
> +                    break;
> +                case 0x2:
> +                    switch (insn & 0x0000001f) {
> +                    case 0x00000000:
> +                        op = la_op_asrtle_d;
> +                        break;
> +                    }
> +                    break;
> +                case 0x3:
> +                    switch (insn & 0x0000001f) {
> +                    case 0x00000000:
> +                        op = la_op_asrtgt_d;
> +                        break;
> +                    }
> +                    break;
> +                }
> +                break;
> +            case 0x1:
> +                switch ((insn >> 17) & 0x1) {
> +                case 0x0:
> +                    op = la_op_alsl_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_alsl_wu;
> +                    break;
> +                }
> +                break;
> +            case 0x2:
> +                switch ((insn >> 17) & 0x1) {
> +                case 0x0:
> +                    op = la_op_bytepick_w;
> +                    break;
> +                }
> +                break;
> +            case 0x3:
> +                op = la_op_bytepick_d;
> +                break;
> +            case 0x4:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_add_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_add_d;
> +                    break;
> +                case 0x2:
> +                    op = la_op_sub_w;
> +                    break;
> +                case 0x3:
> +                    op = la_op_sub_d;
> +                    break;
> +                case 0x4:
> +                    op = la_op_slt;
> +                    break;
> +                case 0x5:
> +                    op = la_op_sltu;
> +                    break;
> +                case 0x6:
> +                    op = la_op_maskeqz;
> +                    break;
> +                case 0x7:
> +                    op = la_op_masknez;
> +                    break;
> +                }
> +                break;
> +            case 0x5:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_nor;
> +                    break;
> +                case 0x1:
> +                    op = la_op_and;
> +                    break;
> +                case 0x2:
> +                    op = la_op_or;
> +                    break;
> +                case 0x3:
> +                    op = la_op_xor;
> +                    break;
> +                case 0x4:
> +                    op = la_op_orn;
> +                    break;
> +                case 0x5:
> +                    op = la_op_andn;
> +                    break;
> +                case 0x6:
> +                    op = la_op_sll_w;
> +                    break;
> +                case 0x7:
> +                    op = la_op_srl_w;
> +                    break;
> +                }
> +                break;
> +            case 0x6:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_sra_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_sll_d;
> +                    break;
> +                case 0x2:
> +                    op = la_op_srl_d;
> +                    break;
> +                case 0x3:
> +                    op = la_op_sra_d;
> +                    break;
> +                case 0x6:
> +                    op = la_op_rotr_w;
> +                    break;
> +                case 0x7:
> +                    op = la_op_rotr_d;
> +                    break;
> +                }
> +                break;
> +            case 0x7:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_mul_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_mulh_w;
> +                    break;
> +                case 0x2:
> +                    op = la_op_mulh_wu;
> +                    break;
> +                case 0x3:
> +                    op = la_op_mul_d;
> +                    break;
> +                case 0x4:
> +                    op = la_op_mulh_d;
> +                    break;
> +                case 0x5:
> +                    op = la_op_mulh_du;
> +                    break;
> +                case 0x6:
> +                    op = la_op_mulw_d_w;
> +                    break;
> +                case 0x7:
> +                    op = la_op_mulw_d_wu;
> +                    break;
> +                }
> +                break;
> +            case 0x8:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_div_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_mod_w;
> +                    break;
> +                case 0x2:
> +                    op = la_op_div_wu;
> +                    break;
> +                case 0x3:
> +                    op = la_op_mod_wu;
> +                    break;
> +                case 0x4:
> +                    op = la_op_div_d;
> +                    break;
> +                case 0x5:
> +                    op = la_op_mod_d;
> +                    break;
> +                case 0x6:
> +                    op = la_op_div_du;
> +                    break;
> +                case 0x7:
> +                    op = la_op_mod_du;
> +                    break;
> +                }
> +                break;
> +            case 0x9:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x0:
> +                    op = la_op_crc_w_b_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_crc_w_h_w;
> +                    break;
> +                case 0x2:
> +                    op = la_op_crc_w_w_w;
> +                    break;
> +                case 0x3:
> +                    op = la_op_crc_w_d_w;
> +                    break;
> +                case 0x4:
> +                    op = la_op_crcc_w_b_w;
> +                    break;
> +                case 0x5:
> +                    op = la_op_crcc_w_h_w;
> +                    break;
> +                case 0x6:
> +                    op = la_op_crcc_w_w_w;
> +                    break;
> +                case 0x7:
> +                    op = la_op_crcc_w_d_w;
> +                    break;
> +                }
> +                break;
> +            case 0xa:
> +                switch ((insn >> 15) & 0x7) {
> +                case 0x4:
> +                    op = la_op_break;
> +                    break;
> +                case 0x6:
> +                    op = la_op_syscall;
> +                    break;
> +                }
> +                break;
> +            case 0xb:
> +                switch ((insn >> 17) & 0x1) {
> +                case 0x0:
> +                    op = la_op_alsl_d;
> +                    break;
> +                }
> +                break;
> +            }
> +            break;
> +        case 0x1:
> +            switch ((insn >> 21) & 0x1) {
> +            case 0x0:
> +                switch ((insn >> 16) & 0x1f) {
> +                case 0x0:
> +                    switch ((insn >> 15) & 0x1) {
> +                    case 0x1:
> +                        op = la_op_slli_w;
> +                        break;
> +                    }
> +                    break;
> +                case 0x1:
> +                    op = la_op_slli_d;
> +                    break;
> +                case 0x4:
> +                    switch ((insn >> 15) & 0x1) {
> +                    case 0x1:
> +                        op = la_op_srli_w;
> +                        break;
> +                    }
> +                    break;
> +                case 0x5:
> +                    op = la_op_srli_d;
> +                    break;
> +                case 0x8:
> +                    switch ((insn >> 15) & 0x1) {
> +                    case 0x1:
> +                        op = la_op_srai_w;
> +                        break;
> +                    }
> +                    break;
> +                case 0x9:
> +                    op = la_op_srai_d;
> +                    break;
> +                case 0xc:
> +                    switch ((insn >> 15) & 0x1) {
> +                    case 0x1:
> +                        op = la_op_rotri_w;
> +                        break;
> +                    }
> +                    break;
> +                case 0xd:
> +                    op = la_op_rotri_d;
> +                    break;
> +                }
> +                break;
> +            case 0x1:
> +                switch ((insn >> 15) & 0x1) {
> +                case 0x0:
> +                    op = la_op_bstrins_w;
> +                    break;
> +                case 0x1:
> +                    op = la_op_bstrpick_w;
> +                    break;
> +                }
> +                break;
> +            }
> +            break;
> +        case 0x2:
> +            op = la_op_bstrins_d;
> +            break;
> +        case 0x3:
> +            op = la_op_bstrpick_d;
> +            break;
> +        case 0x4:
> +            switch ((insn >> 15) & 0x7f) {
> +            case 0x1:
> +                op = la_op_fadd_s;
> +                break;
> +            case 0x2:
> +                op = la_op_fadd_d;
> +                break;
> +            case 0x5:
> +                op = la_op_fsub_s;
> +                break;
> +            case 0x6:
> +                op = la_op_fsub_d;
> +                break;
> +            case 0x9:
> +                op = la_op_fmul_s;
> +                break;
> +            case 0xa:
> +                op = la_op_fmul_d;
> +                break;
> +            case 0xd:
> +                op = la_op_fdiv_s;
> +                break;
> +            case 0xe:
> +                op = la_op_fdiv_d;
> +                break;
> +            case 0x11:
> +                op = la_op_fmax_s;
> +                break;
> +            case 0x12:
> +                op = la_op_fmax_d;
> +                break;
> +            case 0x15:
> +                op = la_op_fmin_s;
> +                break;
> +            case 0x16:
> +                op = la_op_fmin_d;
> +                break;
> +            case 0x19:
> +                op = la_op_fmaxa_s;
> +                break;
> +            case 0x1a:
> +                op = la_op_fmaxa_d;
> +                break;
> +            case 0x1d:
> +                op = la_op_fmina_s;
> +                break;
> +            case 0x1e:
> +                op = la_op_fmina_d;
> +                break;
> +            case 0x21:
> +                op = la_op_fscaleb_s;
> +                break;
> +            case 0x22:
> +                op = la_op_fscaleb_d;
> +                break;
> +            case 0x25:
> +                op = la_op_fcopysign_s;
> +                break;
> +            case 0x26:
> +                op = la_op_fcopysign_d;
> +                break;
> +            case 0x28:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x1:
> +                    op = la_op_fabs_s;
> +                    break;
> +                case 0x2:
> +                    op = la_op_fabs_d;
> +                    break;
> +                case 0x5:
> +                    op = la_op_fneg_s;
> +                    break;
> +                case 0x6:
> +                    op = la_op_fneg_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_flogb_s;
> +                    break;
> +                case 0xa:
> +                    op = la_op_flogb_d;
> +                    break;
> +                case 0xd:
> +                    op = la_op_fclass_s;
> +                    break;
> +                case 0xe:
> +                    op = la_op_fclass_d;
> +                    break;
> +                case 0x11:
> +                    op = la_op_fsqrt_s;
> +                    break;
> +                case 0x12:
> +                    op = la_op_fsqrt_d;
> +                    break;
> +                case 0x15:
> +                    op = la_op_frecip_s;
> +                    break;
> +                case 0x16:
> +                    op = la_op_frecip_d;
> +                    break;
> +                case 0x19:
> +                    op = la_op_frsqrt_s;
> +                    break;
> +                case 0x1a:
> +                    op = la_op_frsqrt_d;
> +                    break;
> +                }
> +                break;
> +            case 0x29:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x5:
> +                    op = la_op_fmov_s;
> +                    break;
> +                case 0x6:
> +                    op = la_op_fmov_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_movgr2fr_w;
> +                    break;
> +                case 0xa:
> +                    op = la_op_movgr2fr_d;
> +                    break;
> +                case 0xb:
> +                    op = la_op_movgr2frh_w;
> +                    break;
> +                case 0xd:
> +                    op = la_op_movfr2gr_s;
> +                    break;
> +                case 0xe:
> +                    op = la_op_movfr2gr_d;
> +                    break;
> +                case 0xf:
> +                    op = la_op_movfrh2gr_s;
> +                    break;
> +                case 0x10:
> +                    op = la_op_movgr2fcsr;
> +                    break;
> +                case 0x12:
> +                    op = la_op_movfcsr2gr;
> +                    break;
> +                case 0x14:
> +                    switch ((insn >> 3) & 0x3) {
> +                    case 0x0:
> +                        op = la_op_movfr2cf;
> +                        break;
> +                    }
> +                    break;
> +                case 0x15:
> +                    switch ((insn >> 8) & 0x3) {
> +                    case 0x0:
> +                        op = la_op_movcf2fr;
> +                        break;
> +                    }
> +                    break;
> +                case 0x16:
> +                    switch ((insn >> 3) & 0x3) {
> +                    case 0x0:
> +                        op = la_op_movgr2cf;
> +                        break;
> +                    }
> +                    break;
> +                case 0x17:
> +                    switch ((insn >> 8) & 0x3) {
> +                    case 0x0:
> +                        op = la_op_movcf2gr;
> +                        break;
> +                    }
> +                    break;
> +                }
> +                break;
> +            case 0x32:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x6:
> +                    op = la_op_fcvt_s_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_fcvt_d_s;
> +                    break;
> +                }
> +                break;
> +            case 0x34:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x1:
> +                    op = la_op_ftintrm_w_s;
> +                    break;
> +                case 0x2:
> +                    op = la_op_ftintrm_w_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_ftintrm_l_s;
> +                    break;
> +                case 0xa:
> +                    op = la_op_ftintrm_l_d;
> +                    break;
> +                case 0x11:
> +                    op = la_op_ftintrp_w_s;
> +                    break;
> +                case 0x12:
> +                    op = la_op_ftintrp_w_d;
> +                    break;
> +                case 0x19:
> +                    op = la_op_ftintrp_l_s;
> +                    break;
> +                case 0x1a:
> +                    op = la_op_ftintrp_l_d;
> +                    break;
> +                }
> +                break;
> +            case 0x35:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x1:
> +                    op = la_op_ftintrz_w_s;
> +                    break;
> +                case 0x2:
> +                    op = la_op_ftintrz_w_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_ftintrz_l_s;
> +                    break;
> +                case 0xa:
> +                    op = la_op_ftintrz_l_d;
> +                    break;
> +                case 0x11:
> +                    op = la_op_ftintrne_w_s;
> +                    break;
> +                case 0x12:
> +                    op = la_op_ftintrne_w_d;
> +                    break;
> +                case 0x19:
> +                    op = la_op_ftintrne_l_s;
> +                    break;
> +                case 0x1a:
> +                    op = la_op_ftintrne_l_d;
> +                    break;
> +                }
> +                break;
> +            case 0x36:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x1:
> +                    op = la_op_ftint_w_s;
> +                    break;
> +                case 0x2:
> +                    op = la_op_ftint_w_d;
> +                    break;
> +                case 0x9:
> +                    op = la_op_ftint_l_s;
> +                    break;
> +                case 0xa:
> +                    op = la_op_ftint_l_d;
> +                    break;
> +                }
> +                break;
> +            case 0x3a:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x4:
> +                    op = la_op_ffint_s_w;
> +                    break;
> +                case 0x6:
> +                    op = la_op_ffint_s_l;
> +                    break;
> +                case 0x8:
> +                    op = la_op_ffint_d_w;
> +                    break;
> +                case 0xa:
> +                    op = la_op_ffint_d_l;
> +                    break;
> +                }
> +                break;
> +            case 0x3c:
> +                switch ((insn >> 10) & 0x1f) {
> +                case 0x11:
> +                    op = la_op_frint_s;
> +                    break;
> +                case 0x12:
> +                    op = la_op_frint_d;
> +                    break;
> +                }
> +                break;
> +            }
> +            break;
> +        case 0x8:
> +            op = la_op_slti;
> +            break;
> +        case 0x9:
> +            op = la_op_sltui;
> +            break;
> +        case 0xa:
> +            op = la_op_addi_w;
> +            break;
> +        case 0xb:
> +            op = la_op_addi_d;
> +            break;
> +        case 0xc:
> +            op = la_op_lu52i_d;
> +            break;
> +        case 0xd:
> +            op = la_op_andi;
> +            break;
> +        case 0xe:
> +            op = la_op_ori;
> +            break;
> +        case 0xf:
> +            op = la_op_xori;
> +            break;
> +        }
> +        break;
> +    case 0x2:
> +        switch ((insn >> 20) & 0x3f) {
> +        case 0x1:
> +            op = la_op_fmadd_s;
> +            break;
> +        case 0x2:
> +            op = la_op_fmadd_d;
> +            break;
> +        case 0x5:
> +            op = la_op_fmsub_s;
> +            break;
> +        case 0x6:
> +            op = la_op_fmsub_d;
> +            break;
> +        case 0x9:
> +            op = la_op_fnmadd_s;
> +            break;
> +        case 0xa:
> +            op = la_op_fnmadd_d;
> +            break;
> +        case 0xd:
> +            op = la_op_fnmsub_s;
> +            break;
> +        case 0xe:
> +            op = la_op_fnmsub_d;
> +            break;
> +        }
> +        break;
> +    case 0x3:
> +        switch ((insn >> 20) & 0x3f) {
> +        case 0x1:
> +            switch ((insn >> 3) & 0x3) {
> +            case 0x0:
> +                op = la_op_fcmp_cond_s;
> +                break;
> +            }
> +            break;
> +        case 0x2:
> +            switch ((insn >> 3) & 0x3) {
> +            case 0x0:
> +                op = la_op_fcmp_cond_d;
> +                break;
> +            }
> +            break;
> +        case 0x10:
> +            switch ((insn >> 18) & 0x3) {
> +            case 0x0:
> +                op = la_op_fsel;
> +                break;
> +            }
> +            break;
> +        }
> +        break;
> +    case 0x4:
> +        op = la_op_addu16i_d;
> +        break;
> +    case 0x5:
> +        switch ((insn >> 25) & 0x1) {
> +        case 0x0:
> +            op = la_op_lu12i_w;
> +            break;
> +        case 0x1:
> +            op = la_op_lu32i_d;
> +            break;
> +        }
> +        break;
> +    case 0x6:
> +        switch ((insn >> 25) & 0x1) {
> +        case 0x0:
> +            op = la_op_pcaddi;
> +            break;
> +        case 0x1:
> +            op = la_op_pcalau12i;
> +            break;
> +        }
> +        break;
> +    case 0x7:
> +        switch ((insn >> 25) & 0x1) {
> +        case 0x0:
> +            op = la_op_pcaddu12i;
> +            break;
> +        case 0x1:
> +            op = la_op_pcaddu18i;
> +            break;
> +        }
> +        break;
> +    case 0x8:
> +        switch ((insn >> 24) & 0x3) {
> +        case 0x0:
> +            op = la_op_ll_w;
> +            break;
> +        case 0x1:
> +            op = la_op_sc_w;
> +            break;
> +        case 0x2:
> +            op = la_op_ll_d;
> +            break;
> +        case 0x3:
> +            op = la_op_sc_d;
> +            break;
> +        }
> +        break;
> +    case 0x9:
> +        switch ((insn >> 24) & 0x3) {
> +        case 0x0:
> +            op = la_op_ldptr_w;
> +            break;
> +        case 0x1:
> +            op = la_op_stptr_w;
> +            break;
> +        case 0x2:
> +            op = la_op_ldptr_d;
> +            break;
> +        case 0x3:
> +            op = la_op_stptr_d;
> +            break;
> +        }
> +        break;
> +    case 0xa:
> +        switch ((insn >> 22) & 0xf) {
> +        case 0x0:
> +            op = la_op_ld_b;
> +            break;
> +        case 0x1:
> +            op = la_op_ld_h;
> +            break;
> +        case 0x2:
> +            op = la_op_ld_w;
> +            break;
> +        case 0x3:
> +            op = la_op_ld_d;
> +            break;
> +        case 0x4:
> +            op = la_op_st_b;
> +            break;
> +        case 0x5:
> +            op = la_op_st_h;
> +            break;
> +        case 0x6:
> +            op = la_op_st_w;
> +            break;
> +        case 0x7:
> +            op = la_op_st_d;
> +            break;
> +        case 0x8:
> +            op = la_op_ld_bu;
> +            break;
> +        case 0x9:
> +            op = la_op_ld_hu;
> +            break;
> +        case 0xa:
> +            op = la_op_ld_wu;
> +            break;
> +        case 0xb:
> +            op = la_op_preld;
> +            break;
> +        case 0xc:
> +            op = la_op_fld_s;
> +            break;
> +        case 0xd:
> +            op = la_op_fst_s;
> +            break;
> +        case 0xe:
> +            op = la_op_fld_d;
> +            break;
> +        case 0xf:
> +            op = la_op_fst_d;
> +            break;
> +        }
> +        break;
> +    case 0xe:
> +        switch ((insn >> 15) & 0x7ff) {
> +        case 0x0:
> +            op = la_op_ldx_b;
> +            break;
> +        case 0x8:
> +            op = la_op_ldx_h;
> +            break;
> +        case 0x10:
> +            op = la_op_ldx_w;
> +            break;
> +        case 0x18:
> +            op = la_op_ldx_d;
> +            break;
> +        case 0x20:
> +            op = la_op_stx_b;
> +            break;
> +        case 0x28:
> +            op = la_op_stx_h;
> +            break;
> +        case 0x30:
> +            op = la_op_stx_w;
> +            break;
> +        case 0x38:
> +            op = la_op_stx_d;
> +            break;
> +        case 0x40:
> +            op = la_op_ldx_bu;
> +            break;
> +        case 0x48:
> +            op = la_op_ldx_hu;
> +            break;
> +        case 0x50:
> +            op = la_op_ldx_wu;
> +            break;
> +        case 0x60:
> +            op = la_op_fldx_s;
> +            break;
> +        case 0x68:
> +            op = la_op_fldx_d;
> +            break;
> +        case 0x70:
> +            op = la_op_fstx_s;
> +            break;
> +        case 0x78:
> +            op = la_op_fstx_d;
> +            break;
> +        case 0xc0:
> +            op = la_op_amswap_w;
> +            break;
> +        case 0xc1:
> +            op = la_op_amswap_d;
> +            break;
> +        case 0xc2:
> +            op = la_op_amadd_w;
> +            break;
> +        case 0xc3:
> +            op = la_op_amadd_d;
> +            break;
> +        case 0xc4:
> +            op = la_op_amand_w;
> +            break;
> +        case 0xc5:
> +            op = la_op_amand_d;
> +            break;
> +        case 0xc6:
> +            op = la_op_amor_w;
> +            break;
> +        case 0xc7:
> +            op = la_op_amor_d;
> +            break;
> +        case 0xc8:
> +            op = la_op_amxor_w;
> +            break;
> +        case 0xc9:
> +            op = la_op_amxor_d;
> +            break;
> +        case 0xca:
> +            op = la_op_ammax_w;
> +            break;
> +        case 0xcb:
> +            op = la_op_ammax_d;
> +            break;
> +        case 0xcc:
> +            op = la_op_ammin_w;
> +            break;
> +        case 0xcd:
> +            op = la_op_ammin_d;
> +            break;
> +        case 0xce:
> +            op = la_op_ammax_wu;
> +            break;
> +        case 0xcf:
> +            op = la_op_ammax_du;
> +            break;
> +        case 0xd0:
> +            op = la_op_ammin_wu;
> +            break;
> +        case 0xd1:
> +            op = la_op_ammin_du;
> +            break;
> +        case 0xd2:
> +            op = la_op_amswap_db_w;
> +            break;
> +        case 0xd3:
> +            op = la_op_amswap_db_d;
> +            break;
> +        case 0xd4:
> +            op = la_op_amadd_db_w;
> +            break;
> +        case 0xd5:
> +            op = la_op_amadd_db_d;
> +            break;
> +        case 0xd6:
> +            op = la_op_amand_db_w;
> +            break;
> +        case 0xd7:
> +            op = la_op_amand_db_d;
> +            break;
> +        case 0xd8:
> +            op = la_op_amor_db_w;
> +            break;
> +        case 0xd9:
> +            op = la_op_amor_db_d;
> +            break;
> +        case 0xda:
> +            op = la_op_amxor_db_w;
> +            break;
> +        case 0xdb:
> +            op = la_op_amxor_db_d;
> +            break;
> +        case 0xdc:
> +            op = la_op_ammax_db_w;
> +            break;
> +        case 0xdd:
> +            op = la_op_ammax_db_d;
> +            break;
> +        case 0xde:
> +            op = la_op_ammin_db_w;
> +            break;
> +        case 0xdf:
> +            op = la_op_ammin_db_d;
> +            break;
> +        case 0xe0:
> +            op = la_op_ammax_db_wu;
> +            break;
> +        case 0xe1:
> +            op = la_op_ammax_db_du;
> +            break;
> +        case 0xe2:
> +            op = la_op_ammin_db_wu;
> +            break;
> +        case 0xe3:
> +            op = la_op_ammin_db_du;
> +            break;
> +        case 0xe4:
> +            op = la_op_dbar;
> +            break;
> +        case 0xe5:
> +            op = la_op_ibar;
> +            break;
> +        case 0xe8:
> +            op = la_op_fldgt_s;
> +            break;
> +        case 0xe9:
> +            op = la_op_fldgt_d;
> +            break;
> +        case 0xea:
> +            op = la_op_fldle_s;
> +            break;
> +        case 0xeb:
> +            op = la_op_fldle_d;
> +            break;
> +        case 0xec:
> +            op = la_op_fstgt_s;
> +            break;
> +        case 0xed:
> +            op = la_op_fstgt_d;
> +            break;
> +        case 0xee:
> +            op = la_op_fstle_s;
> +            break;
> +        case 0xef:
> +            op = la_op_fstle_d;
> +            break;
> +        case 0xf0:
> +            op = la_op_ldgt_b;
> +            break;
> +        case 0xf1:
> +            op = la_op_ldgt_h;
> +            break;
> +        case 0xf2:
> +            op = la_op_ldgt_w;
> +            break;
> +        case 0xf3:
> +            op = la_op_ldgt_d;
> +            break;
> +        case 0xf4:
> +            op = la_op_ldle_b;
> +            break;
> +        case 0xf5:
> +            op = la_op_ldle_h;
> +            break;
> +        case 0xf6:
> +            op = la_op_ldle_w;
> +            break;
> +        case 0xf7:
> +            op = la_op_ldle_d;
> +            break;
> +        case 0xf8:
> +            op = la_op_stgt_b;
> +            break;
> +        case 0xf9:
> +            op = la_op_stgt_h;
> +            break;
> +        case 0xfa:
> +            op = la_op_stgt_w;
> +            break;
> +        case 0xfb:
> +            op = la_op_stgt_d;
> +            break;
> +        case 0xfc:
> +            op = la_op_stle_b;
> +            break;
> +        case 0xfd:
> +            op = la_op_stle_h;
> +            break;
> +        case 0xfe:
> +            op = la_op_stle_w;
> +            break;
> +        case 0xff:
> +            op = la_op_stle_d;
> +            break;
> +        }
> +        break;
> +    case 0x10:
> +        op = la_op_beqz;
> +        break;
> +    case 0x11:
> +        op = la_op_bnez;
> +        break;
> +    case 0x12:
> +        switch ((insn >> 8) & 0x3) {
> +        case 0x0:
> +            op = la_op_bceqz;
> +            break;
> +        case 0x1:
> +            op = la_op_bcnez;
> +            break;
> +        }
> +        break;
> +    case 0x13:
> +        op = la_op_jirl;
> +        break;
> +    case 0x14:
> +        op = la_op_b;
> +        break;
> +    case 0x15:
> +        op = la_op_bl;
> +        break;
> +    case 0x16:
> +        op = la_op_beq;
> +        break;
> +    case 0x17:
> +        op = la_op_bne;
> +        break;
> +    case 0x18:
> +        op = la_op_blt;
> +        break;
> +    case 0x19:
> +        op = la_op_bge;
> +        break;
> +    case 0x1a:
> +        op = la_op_bltu;
> +        break;
> +    case 0x1b:
> +        op = la_op_bgeu;
> +        break;
> +    default:
> +        op = la_op_illegal;
> +        break;
> +    }
> +    dec->op = op;
> +}
> +
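
For anyone following the nested switches above: each level only tests one
bit field of the instruction word. A minimal sketch of that idea, assuming
the outermost switch (not visible in this hunk) keys on bits 31:26.
field() and is_amswap_w() are hypothetical names used for illustration,
they are not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract `len` bits (len < 32) starting at bit `lo`. */
static uint32_t field(uint32_t insn, unsigned lo, unsigned len)
{
    return (insn >> lo) & ((UINT32_C(1) << len) - 1);
}

/*
 * Mirrors the quoted path "case 0xe:" followed by
 * "switch ((insn >> 15) & 0x7ff)" with "case 0xc0:".
 * Assumption: the major opcode is the 6-bit field at bits 31:26.
 */
static bool is_amswap_w(uint32_t insn)
{
    return field(insn, 26, 6) == 0xe && field(insn, 15, 11) == 0xc0;
}
```

Written this way, every case label in the switch corresponds to one
(position, width, value) triple, which makes it easier to check the labels
against the encoding tables.
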
> +/* operand extractors */
> +#define IM_5  5
> +#define IM_8  8
> +#define IM_12 12
> +#define IM_14 14
> +#define IM_15 15
> +#define IM_16 16
> +#define IM_20 20
> +#define IM_21 21
> +#define IM_26 26
> +
> +static uint32_t operand_r1(uint32_t insn)
> +{
> +    return insn & 0x1f;
> +}
> +
> +static uint32_t operand_r2(uint32_t insn)
> +{
> +    return (insn >> 5) & 0x1f;
> +}
> +
> +static uint32_t operand_r3(uint32_t insn)
> +{
> +    return (insn >> 10) & 0x1f;
> +}
> +
> +static uint32_t operand_r4(uint32_t insn)
> +{
> +    return (insn >> 15) & 0x1f;
> +}
> +
> +static uint32_t operand_u6(uint32_t insn)
> +{
> +    return (insn >> 10) & 0x3f;
> +}
> +
> +static uint32_t operand_bw1(uint32_t insn)
> +{
> +    return (insn >> 10) & 0x1f;
> +}
> +
> +static uint32_t operand_bw2(uint32_t insn)
> +{
> +    return (insn >> 16) & 0x1f;
> +}
> +
> +static uint32_t operand_bd1(uint32_t insn)
> +{
> +    return (insn >> 10) & 0x3f;
> +}
> +
> +static uint32_t operand_bd2(uint32_t insn)
> +{
> +    return (insn >> 16) & 0x3f;
> +}
> +
> +static uint32_t operand_sa2(uint32_t insn)
> +{
> +    return (insn >> 15) & 0x3;
> +}
> +
> +static uint32_t operand_sa3(uint32_t insn)
> +{
> +    return (insn >> 15) & 0x7;
> +}
> +
> +static int32_t operand_im20(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)((insn >> 5) & 0xfffff);
> +    return imm >= (1 << 19) ? imm - (1 << 20) : imm;
> +}
> +
> +static int32_t operand_im16(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)((insn >> 10) & 0xffff);
> +    return imm >= (1 << 15) ? imm - (1 << 16) : imm;
> +}
> +
> +static int32_t operand_im14(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)((insn >> 10) & 0x3fff);
> +    return imm >= (1 << 13) ? imm - (1 << 14) : imm;
> +}
> +
> +static int32_t operand_im12(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)((insn >> 10) & 0xfff);
> +    return imm >= (1 << 11) ? imm - (1 << 12) : imm;
> +}
> +
> +static int32_t operand_im8(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)((insn >> 10) & 0xff);
> +    return imm >= (1 << 7) ? imm - (1 << 8) : imm;
> +}
> +
> +static uint32_t operand_sd(uint32_t insn)
> +{
> +    return insn & 0x3;
> +}
> +
> +static uint32_t operand_sj(uint32_t insn)
> +{
> +    return (insn >> 5) & 0x3;
> +}
> +
> +static uint32_t operand_cd(uint32_t insn)
> +{
> +    return insn & 0x7;
> +}
> +
> +static uint32_t operand_cj(uint32_t insn)
> +{
> +    return (insn >> 5) & 0x7;
> +}
> +
> +static uint32_t operand_code(uint32_t insn)
> +{
> +    return insn & 0x7fff;
> +}
> +
> +static int32_t operand_whint(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)(insn & 0x7fff);
> +    return imm >= (1 << 14) ? imm - (1 << 15) : imm;
> +}
> +
> +static int32_t operand_invop(uint32_t insn)
> +{
> +    int32_t imm = (int32_t)(insn & 0x1f);
> +    return imm >= (1 << 4) ? imm - (1 << 5) : imm;
> +}
> +
> +static int32_t operand_ofs21(uint32_t insn)
> +{
> +    int32_t imm = (((int32_t)insn & 0x1f) << 16) |
> +        ((insn >> 10) & 0xffff);
> +    return imm >= (1 << 20) ? imm - (1 << 21) : imm;
> +}
> +
> +static int32_t operand_ofs26(uint32_t insn)
> +{
> +    int32_t imm = (((int32_t)insn & 0x3ff) << 16) |
> +        ((insn >> 10) & 0xffff);
> +    return imm >= (1 << 25) ? imm - (1 << 26) : imm;
> +}
> +
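
The sign-extending extractors above all repeat the same arithmetic. A field
with only its top bit set is the most negative value, so the comparison
against 1 << (bits - 1) has to include equality. One way to keep that in a
single place is a small helper. The sketch below is only an illustration:
sign_extend(), im20_sketch() and ofs21_sketch() are hypothetical names, not
part of the patch, and the field positions are copied from operand_im20()
and operand_ofs21():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: sign-extend the low
 * `bits` bits of `value`. Valid for 1 <= bits <= 31.
 */
static int32_t sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t sign = UINT32_C(1) << (bits - 1);
    uint32_t mask = (sign << 1) - 1;
    /* XOR flips the sign bit; subtracting it again gives the signed value. */
    return (int32_t)((value & mask) ^ sign) - (int32_t)sign;
}

/* Same field position as operand_im20(): 20 bits starting at bit 5. */
static int32_t im20_sketch(uint32_t insn)
{
    return sign_extend(insn >> 5, 20);
}

/* Same split layout as operand_ofs21(): high 5 bits at 4:0, low 16 at 25:10. */
static int32_t ofs21_sketch(uint32_t insn)
{
    uint32_t raw = ((insn & 0x1f) << 16) | ((insn >> 10) & 0xffff);
    return sign_extend(raw, 21);
}
```

Inside the QEMU tree, sextract32() from include/qemu/bitops.h already does
this job and is probably the more natural choice.
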
> +static uint32_t operand_fcond(uint32_t insn)
> +{
> +    return (insn >> 15) & 0x1f;
> +}
> +
> +static uint32_t operand_sel(uint32_t insn)
> +{
> +    return (insn >> 15) & 0x7;
> +}
> +
> +/* decode operands */
> +static void decode_insn_operands(la_decode *dec)
> +{
> +    uint32_t insn = dec->insn;
> +    dec->codec = opcode_data[dec->op].codec;
> +    switch (dec->codec) {
> +    case la_codec_illegal:
> +    case la_codec_empty:
> +        break;
> +    case la_codec_2r:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        break;
> +    case la_codec_2r_u5:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        break;
> +    case la_codec_2r_u6:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_u6(insn);
> +        break;
> +    case la_codec_2r_2bw:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_bw1(insn);
> +        dec->r4 = operand_bw2(insn);
> +        break;
> +    case la_codec_2r_2bd:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_bd1(insn);
> +        dec->r4 = operand_bd2(insn);
> +        break;
> +    case la_codec_3r:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        break;
> +    case la_codec_3r_rd0:
> +        dec->r1 = 0;
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        break;
> +    case la_codec_3r_sa2:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        dec->r4 = operand_sa2(insn);
> +        break;
> +    case la_codec_3r_sa3:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        dec->r4 = operand_sa3(insn);
> +        break;
> +    case la_codec_4r:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        dec->r4 = operand_r4(insn);
> +        break;
> +    case la_codec_r_im20:
> +        dec->r1 = operand_r1(insn);
> +        dec->imm = operand_im20(insn);
> +        dec->bit = IM_20;
> +        break;
> +    case la_codec_2r_im16:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->imm = operand_im16(insn);
> +        dec->bit = IM_16;
> +        break;
> +    case la_codec_2r_im14:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->imm = operand_im14(insn);
> +        dec->bit = IM_14;
> +        break;
> +    case la_codec_r_im14:
> +        dec->r1 = operand_r1(insn);
> +        dec->imm = operand_im14(insn);
> +        dec->bit = IM_14;
> +        break;
> +    case la_codec_im5_r_im12:
> +        dec->imm2 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->imm = operand_im12(insn);
> +        dec->bit = IM_12;
> +        break;
> +    case la_codec_2r_im12:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->imm = operand_im12(insn);
> +        dec->bit = IM_12;
> +        break;
> +    case la_codec_2r_im8:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->imm = operand_im8(insn);
> +        dec->bit = IM_8;
> +        break;
> +    case la_codec_r_sd:
> +        dec->r1 = operand_sd(insn);
> +        dec->r2 = operand_r2(insn);
> +        break;
> +    case la_codec_r_sj:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_sj(insn);
> +        break;
> +    case la_codec_r_cd:
> +        dec->r1 = operand_cd(insn);
> +        dec->r2 = operand_r2(insn);
> +        break;
> +    case la_codec_r_cj:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_cj(insn);
> +        break;
> +    case la_codec_r_seq:
> +        dec->r1 = 0;
> +        dec->r2 = operand_r1(insn);
> +        dec->imm = operand_im8(insn);
> +        dec->bit = IM_8;
> +        break;
> +    case la_codec_code:
> +        dec->code = operand_code(insn);
> +        break;
> +    case la_codec_whint:
> +        dec->imm = operand_whint(insn);
> +        dec->bit = IM_15;
> +        break;
> +    case la_codec_invtlb:
> +        dec->imm = operand_invop(insn);
> +        dec->bit = IM_5;
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        break;
> +    case la_codec_r_ofs21:
> +        dec->imm = operand_ofs21(insn);
> +        dec->bit = IM_21;
> +        dec->r2 = operand_r2(insn);
> +        break;
> +    case la_codec_cj_ofs21:
> +        dec->imm = operand_ofs21(insn);
> +        dec->bit = IM_21;
> +        dec->r2 = operand_cj(insn);
> +        break;
> +    case la_codec_ofs26:
> +        dec->imm = operand_ofs26(insn);
> +        dec->bit = IM_26;
> +        break;
> +    case la_codec_cond:
> +        dec->r1 = operand_cd(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        dec->r4 = operand_fcond(insn);
> +        break;
> +    case la_codec_sel:
> +        dec->r1 = operand_r1(insn);
> +        dec->r2 = operand_r2(insn);
> +        dec->r3 = operand_r3(insn);
> +        dec->r4 = operand_sel(insn);
> +        break;
> +    }
> +}

Am I right that these 1500 lines could eventually be generated by
the decodetree.py script parsing target/loongarch/insns.decode?
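
For illustration only: the sign-extending extractors quoted above all follow one pattern, so a single helper can express them. The sketch below is standalone C under stated assumptions; `sext_field`, `demo_im14` and `demo_im12` are hypothetical names, not part of the patch (QEMU's own `sextract32()` from include/qemu/bitops.h serves the same purpose). Note that the quoted code compares with `>`, which leaves the encoding with only the sign bit set (for example 0x2000 in a 14-bit field) positive; the sketch treats it as the most negative value.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): extract `len` bits starting at
 * bit `start` of `insn` and sign-extend the result.
 * Valid for 1 <= len <= 31.
 */
static int32_t sext_field(uint32_t insn, unsigned start, unsigned len)
{
    uint32_t field = (insn >> start) & ((1u << len) - 1);
    uint32_t sign = 1u << (len - 1);

    /* Flip the sign bit, then subtract it: maps [2^(len-1), 2^len) to negatives. */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Illustrative equivalents of the quoted operand_im14/operand_im12. */
static int32_t demo_im14(uint32_t insn)
{
    return sext_field(insn, 10, 14);
}

static int32_t demo_im12(uint32_t insn)
{
    return sext_field(insn, 10, 12);
}
```

With decodetree the same information would live in the field definitions of the .decode file, and the extractors would be generated rather than written by hand.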



* Re: [PATCH v2 16/22] target/loongarch: Add floating point move instruction translation
  2021-08-12  9:20     ` Song Gao
@ 2021-08-12 19:31       ` Richard Henderson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Henderson @ 2021-08-12 19:31 UTC (permalink / raw)
  To: Song Gao
  Cc: peter.maydell, thuth, chenhuacai, philmd, yangxiaojuan,
	qemu-devel, maobibo, laurent, alistair.francis, pbonzini,
	alex.bennee

On 8/11/21 11:20 PM, Song Gao wrote:
>> This is easily implemented inline, followed by a single helper call to re-load the rounding mode (if required by the mask).
> 
> Hi, Richard,
>
> Sorry to bother you, When I was revising this patch, I found that I didn't seem to
> understand your opinion. Could you explain it in detail?  thank you very much.
---%<

static const uint32_t fcsr_mask[4] = {
     UINT32_MAX, FCSR0_M1, FCSR0_M2, FCSR0_M3
};

bool trans_movgr2fcsr(DisasContext *s, arg_movgr2fcsr *a)
{
     uint32_t mask = fcsr_mask[a->fcsr];

     if (mask == UINT32_MAX) {
         tcg_gen_extrl_i64_i32(fpu_fcsr0, get_gpr(a->rj));
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();

         tcg_gen_extrl_i64_i32(tmp, get_gpr(a->rj));
         tcg_gen_andi_i32(tmp, tmp, mask);
         tcg_gen_andi_i32(fpu_fcsr0, fpu_fcsr0, ~mask);
         tcg_gen_or_i32(fpu_fcsr0, fpu_fcsr0, tmp);
         tcg_temp_free_i32(tmp);

         /*
          * Install the new rounding mode to fpu_status, if changed.
          * Note that FCSR3 is exactly the rounding mode field.
          */
         if (mask != FCSR0_M3) {
             return true;
         }
     }
     gen_helper_set_rounding_mode(cpu_env, fpu_fcsr0);
     return true;
}

bool trans_movfcsr2gr(DisasContext *s, arg_movfcsr2gr *a)
{
     TCGv_i32 tmp = tcg_temp_new_i32();

     tcg_gen_andi_i32(tmp, fpu_fcsr0, fcsr_mask[a->fcsr]);
     tcg_gen_ext_i32_i64(dest_gpr(a->rd), tmp);
     tcg_temp_free_i32(tmp);
     return true;
}

---%<

DEF_HELPER_FLAGS_2(set_rounding_mode, TCG_CALL_NO_RWG, void, env, i32)

---%<

void HELPER(set_rounding_mode)(CPULoongArchState *env, uint32_t fcsr)
{
     set_float_rounding_mode(ieee_rm[(fcsr >> FCSR0_RM) & 0x3],
                             &env->fp_status);
}
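
The masking in the sketch above can be checked outside of TCG with a plain C model. This is only an illustration under stated assumptions: the `DEMO_M*` values are placeholder masks chosen for the example (the real `FCSR0_M1`..`FCSR0_M3` definitions come from the target headers and are not shown in this thread), and the `demo_*` functions are hypothetical names.

```c
#include <stdint.h>

/* Placeholder masks, assumed for illustration only. */
#define DEMO_M1 0x0000001fu   /* assumed: exception enables */
#define DEMO_M2 0x1f1f0000u   /* assumed: cause and flags */
#define DEMO_M3 0x00000300u   /* assumed: rounding mode */

static const uint32_t demo_mask[4] = {
    UINT32_MAX, DEMO_M1, DEMO_M2, DEMO_M3
};

/* Model of movgr2fcsr: only the bits selected by the view are replaced. */
static uint32_t demo_write_fcsr(uint32_t old, uint32_t val, unsigned idx)
{
    uint32_t mask = demo_mask[idx & 3];

    return (old & ~mask) | (val & mask);
}

/* Model of movfcsr2gr: a read returns only the bits of the selected view. */
static uint32_t demo_read_fcsr(uint32_t fcsr0, unsigned idx)
{
    return fcsr0 & demo_mask[idx & 3];
}

/* The rounding mode needs re-installing only for the full view and view 3. */
static int demo_needs_rm_reload(unsigned idx)
{
    idx &= 3;
    return idx == 0 || idx == 3;
}
```

With the full mask the write degenerates to a plain copy of the new value, which is why the translator sketch special-cases `UINT32_MAX` and skips the and/or sequence.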


r~




Thread overview: 76+ messages
2021-07-21  9:52 [PATCH v2 00/22] Add LoongArch linux-user emulation support Song Gao
2021-07-21  9:52 ` [PATCH v2 01/22] target/loongarch: Add README Song Gao
2021-07-21  9:52 ` [PATCH v2 02/22] target/loongarch: Add CSR registers definition Song Gao
2021-07-21  9:52 ` [PATCH v2 03/22] target/loongarch: Add core definition Song Gao
2021-07-22 22:43   ` Richard Henderson
2021-07-26  8:47     ` Song Gao
2021-07-26 15:32       ` Richard Henderson
2021-07-21  9:53 ` [PATCH v2 04/22] target/loongarch: Add interrupt handling support Song Gao
2021-07-22 22:47   ` Richard Henderson
2021-07-26  9:23     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 05/22] target/loongarch: Add memory management support Song Gao
2021-07-22 22:48   ` Richard Henderson
2021-07-26  9:25     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 06/22] target/loongarch: Add main translation routines Song Gao
2021-07-22 23:50   ` Richard Henderson
2021-07-26  9:39     ` Song Gao
2021-07-26 15:35       ` Richard Henderson
2021-07-21  9:53 ` [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation Song Gao
2021-07-21 17:38   ` Philippe Mathieu-Daudé
2021-07-21 17:49     ` Philippe Mathieu-Daudé
2021-07-22  7:41       ` Song Gao
2021-07-23  0:46   ` Richard Henderson
2021-07-26 11:56     ` Song Gao
2021-07-26 15:53       ` Richard Henderson
2021-07-27  1:51         ` Song Gao
2021-07-21  9:53 ` [PATCH v2 08/22] target/loongarch: Add fixed point shift " Song Gao
2021-07-23  0:51   ` Richard Henderson
2021-07-26 11:57     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 09/22] target/loongarch: Add fixed point bit " Song Gao
2021-07-21 17:46   ` Philippe Mathieu-Daudé
2021-07-22  8:17     ` Song Gao
2021-07-23  1:29   ` Richard Henderson
2021-07-26 12:22     ` Song Gao
2021-07-26 16:39       ` Richard Henderson
2021-07-21  9:53 ` [PATCH v2 10/22] target/loongarch: Add fixed point load/store " Song Gao
2021-07-23  1:45   ` Richard Henderson
2021-07-26 12:25     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 11/22] target/loongarch: Add fixed point atomic " Song Gao
2021-07-23  1:49   ` Richard Henderson
2021-07-26 12:25     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 12/22] target/loongarch: Add fixed point extra " Song Gao
2021-07-23  5:12   ` Richard Henderson
2021-07-26 12:57     ` Song Gao
2021-07-26 16:42       ` Richard Henderson
2021-07-27  1:46         ` Song Gao
2021-08-04  7:40     ` Song Gao
2021-08-04  7:51       ` Song Gao
2021-07-21  9:53 ` [PATCH v2 13/22] target/loongarch: Add floating point arithmetic " Song Gao
2021-07-23  5:44   ` Richard Henderson
2021-07-27  7:17     ` Song Gao
2021-07-27 16:12       ` Richard Henderson
2021-07-28  1:18         ` Song Gao
2021-07-21  9:53 ` [PATCH v2 14/22] target/loongarch: Add floating point comparison " Song Gao
2021-07-23  6:11   ` Richard Henderson
2021-07-27  7:56     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 15/22] target/loongarch: Add floating point conversion " Song Gao
2021-07-23  6:16   ` Richard Henderson
2021-07-27  7:57     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 16/22] target/loongarch: Add floating point move " Song Gao
2021-07-23  6:29   ` Richard Henderson
2021-07-27  8:06     ` Song Gao
2021-08-12  9:20     ` Song Gao
2021-08-12 19:31       ` Richard Henderson
2021-07-21  9:53 ` [PATCH v2 17/22] target/loongarch: Add floating point load/store " Song Gao
2021-07-23  6:34   ` Richard Henderson
2021-07-27  8:07     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 18/22] target/loongarch: Add branch " Song Gao
2021-07-23  6:38   ` Richard Henderson
2021-07-27  8:07     ` Song Gao
2021-07-21  9:53 ` [PATCH v2 19/22] target/loongarch: Add disassembler Song Gao
2021-07-23  6:40   ` Richard Henderson
2021-08-12 10:33   ` Philippe Mathieu-Daudé
2021-07-21  9:53 ` [PATCH v2 20/22] LoongArch Linux User Emulation Song Gao
2021-07-21  9:53 ` [PATCH v2 21/22] configs: Add loongarch linux-user config Song Gao
2021-07-23  6:43   ` Richard Henderson
2021-07-21  9:53 ` [PATCH v2 22/22] target/loongarch: Add target build suport Song Gao
