* [PATCH 00/30] 64-bit LoongArch port of QEMU TCG
@ 2021-09-20  8:04 WANG Xuerui
  2021-09-20  8:04 ` [PATCH 01/30] elf: Add machine type value for LoongArch WANG Xuerui
                   ` (29 more replies)
  0 siblings, 30 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Hi all,

This is a port of QEMU TCG to the brand-new CPU architecture LoongArch,
introduced by Loongson with their 3A5000 chips. The whole test suite
passes except for test-crypto-tlssession, which times out; that test
runs fine when relatively few targets are enabled, so the timeout is
probably just a matter of low performance (4C4T, 2.5GHz). I also
boot-tested x86_64
(Debian and Gentoo installation CDs) and aarch64 (Debian netboot
installer), and ran riscv64 linux-user emulation with a chroot;
everything seems fine so far.

## About the series

Only the LP64 ABI is supported, as this is the only one fully
implemented and supported by Loongson. 32-bit support was incomplete from
the outset and has been removed from the latest upstream submissions, so
you can't even configure for it.

The architecture's documentation is already translated into English;
it can be browsed at https://loongson.github.io/LoongArch-Documentation/.

In this series I made use of generated instruction encodings and
emitters from https://github.com/loongson-community/loongarch-opcodes
(a community project that, I must admit, I started myself), as the
LoongArch encoding is highly irregular even for a fixed 32-bit ISA, and
I want to minimize the maintenance burden for future collaboration. This
is something not seen in any of the other TCG ports out there, so I'd
like to see if this is acceptable practice (and also maybe bikeshed the
file name).
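
To illustrate the shape of such generated emitters, here is a minimal
sketch; it is not taken verbatim from the series. The `sketch_` names
are hypothetical, the opcode value is the one listed for `add.d` in
patch 04, and the slot positions (d at bit 0, j at bit 5, k at bit 10)
follow the `encode_djk_slots` helper from the same patch.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode value as listed for add.d in patch 04 of this series. */
#define SKETCH_OPC_ADD_D 0x00108000u

/*
 * Hypothetical helper: OR three operand fields into the d (bit 0),
 * j (bit 5) and k (bit 10) slots of an instruction word.
 */
static inline uint32_t sketch_encode_djk_slots(uint32_t opc, uint32_t d,
                                               uint32_t j, uint32_t k)
{
    return opc | d | (j << 5) | (k << 10);
}

/*
 * Hypothetical helper: encode `add.d d, j, k` from three register
 * numbers, masking each one to the 5-bit register field.
 */
static inline uint32_t sketch_encode_add_d(uint32_t d, uint32_t j,
                                           uint32_t k)
{
    assert(d < 32 && j < 32 && k < 32);
    return sketch_encode_djk_slots(SKETCH_OPC_ADD_D,
                                   d & 0x1f, j & 0x1f, k & 0x1f);
}
```

With register numbers 4, 5 and 6, `sketch_encode_add_d(4, 5, 6)` gives
0x001098a4. Keeping the slot packing separate from the per-format
masking is what lets a generator emit one small wrapper per instruction.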

Also, I'm not very familiar with the DWARF spec, so the debug frame
information may not be supplied correctly; beware (though all my tests
run fine).

This series touches some of the same files as Song Gao's previous
submission of LoongArch *target* support, which is a bit unfortunate;
one of us will have to rebase after either series gets in. Actual
conflict should only happen on build system bits and include/elf.h,
though, as we're working on entirely different areas.

## How to build and test this

Upstream support for LoongArch is largely WIP for now, which means you
must apply a lot of patches if you want to even cross-build for this arch.
The main sources I used are as follows:

* binutils: https://github.com/xen0n/binutils-gdb/tree/for-gentoo-2.37-v2
  based on https://github.com/loongson/binutils-gdb/tree/loongarch/upstream_v6_a1d65b3
* gcc: https://github.com/xen0n/gcc/tree/for-gentoo-gcc-12-v2
  based on https://github.com/loongson/gcc/tree/loongarch_upstream
* glibc: https://github.com/xen0n/glibc/tree/for-gentoo-glibc-2.34
  based on https://github.com/loongson/glibc/tree/loongarch_2_34_for_upstream
* Linux: https://github.com/xen0n/linux/tree/loongarch-playground
  based on https://github.com/loongson/linux/tree/loongarch-next
* Gentoo overlay: https://github.com/xen0n/loongson-overlay

I have made ready-to-use Gentoo stage3 tarballs, but they're served via
a CDN from my personal cloud account, and I'd rather not post the link
publicly and watch my bills skyrocket; you can reach me off-list to get
the links if you're interested.

As for hardware availability, the boards can already be bought in
China on Taobao, and I think some people at Loongson might be able to
arrange for testing environments, if testing on real hardware other than
mine is required before merging; they have their in-house Debian spin-off
from the early days of this architecture. Their kernel is
ABI-incompatible with the version being upstreamed and used by me, but
QEMU should work there regardless.

Lastly, I'm new to QEMU development and this is my first patch series
here; apologies if I get anything wrong, and any help or suggestion is
certainly appreciated!

WANG Xuerui (30):
  elf: Add machine type value for LoongArch
  MAINTAINERS: Add tcg/loongarch entry with myself as maintainer
  tcg/loongarch: Add the tcg-target.h file
  tcg/loongarch: Add generated instruction opcodes and encoding helpers
  tcg/loongarch: Add register names, allocation order and input/output
    sets
  tcg/loongarch: Define the operand constraints
  tcg/loongarch: Implement necessary relocation operations
  tcg/loongarch: Implement the memory barrier op
  tcg/loongarch: Implement tcg_out_mov and tcg_out_movi
  tcg/loongarch: Implement goto_ptr
  tcg/loongarch: Implement sign-/zero-extension ops
  tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops
  tcg/loongarch: Implement deposit/extract ops
  tcg/loongarch: Implement bswap32_i32/bswap64_i64
  tcg/loongarch: Implement clz/ctz ops
  tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops
  tcg/loongarch: Implement neg/add/sub ops
  tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops
  tcg/loongarch: Implement br/brcond ops
  tcg/loongarch: Implement setcond ops
  tcg/loongarch: Implement tcg_out_call
  tcg/loongarch: Implement simple load/store ops
  tcg/loongarch: Add softmmu load/store helpers, implement
    qemu_ld/qemu_st ops
  tcg/loongarch: Implement tcg_target_qemu_prologue
  tcg/loongarch: Implement exit_tb/goto_tb
  tcg/loongarch: Implement tcg_target_init
  tcg/loongarch: Register the JIT
  configure, meson.build: Mark support for 64-bit LoongArch hosts
  linux-user: Add host dependency for 64-bit LoongArch
  accel/tcg/user-exec: Implement CPU-specific signal handler for
    LoongArch hosts

 MAINTAINERS                           |    5 +
 accel/tcg/user-exec.c                 |   83 ++
 configure                             |    4 +-
 include/elf.h                         |    2 +
 linux-user/host/loongarch64/hostdep.h |   11 +
 meson.build                           |    4 +-
 tcg/loongarch/tcg-insn-defs.c.inc     | 1080 +++++++++++++++++
 tcg/loongarch/tcg-target-con-set.h    |   30 +
 tcg/loongarch/tcg-target-con-str.h    |   26 +
 tcg/loongarch/tcg-target.c.inc        | 1561 +++++++++++++++++++++++++
 tcg/loongarch/tcg-target.h            |  183 +++
 11 files changed, 2987 insertions(+), 2 deletions(-)
 create mode 100644 linux-user/host/loongarch64/hostdep.h
 create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc
 create mode 100644 tcg/loongarch/tcg-target-con-set.h
 create mode 100644 tcg/loongarch/tcg-target-con-str.h
 create mode 100644 tcg/loongarch/tcg-target.c.inc
 create mode 100644 tcg/loongarch/tcg-target.h

-- 
2.33.0




* [PATCH 01/30] elf: Add machine type value for LoongArch
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20  8:04 ` [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer WANG Xuerui
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 include/elf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/elf.h b/include/elf.h
index 811bf4a1cb..3a4bcb646a 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -182,6 +182,8 @@ typedef struct mips_elf_abiflags_v0 {
 
 #define EM_NANOMIPS     249     /* Wave Computing nanoMIPS */
 
+#define EM_LOONGARCH    258     /* LoongArch */
+
 /*
  * This is an interim value that we will use until the committee comes
  * up with a final number.
-- 
2.33.0




* [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
  2021-09-20  8:04 ` [PATCH 01/30] elf: Add machine type value for LoongArch WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:50   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file WANG Xuerui
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

I wrote the initial code, so I should maintain it of course.

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 MAINTAINERS | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c20634d63..0e9942cc00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3109,6 +3109,11 @@ S: Maintained
 F: tcg/i386/
 F: disas/i386.c
 
+LoongArch TCG target
+M: WANG Xuerui <git@xen0n.name>
+S: Maintained
+F: tcg/loongarch/
+
 MIPS TCG target
 M: Philippe Mathieu-Daudé <f4bug@amsat.org>
 R: Aurelien Jarno <aurelien@aurel32.net>
-- 
2.33.0




* [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
  2021-09-20  8:04 ` [PATCH 01/30] elf: Add machine type value for LoongArch WANG Xuerui
  2021-09-20  8:04 ` [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:23   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers WANG Xuerui
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.h | 183 +++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)
 create mode 100644 tcg/loongarch/tcg-target.h
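
A note on the register enum added below: its enumerators appear to be
listed in hardware register order, so that an enumerator's integer value
is the register number placed in an instruction's 5-bit register field.
The following is a minimal sketch of that assumption; the `SKETCH_` and
`sketch_` names are hypothetical, and only the ordering is copied from
the `TCGReg` enum in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the TCGReg enumerator order in this patch. */
typedef enum {
    SKETCH_REG_ZERO, SKETCH_REG_RA, SKETCH_REG_TP, SKETCH_REG_SP,
    SKETCH_REG_A0, SKETCH_REG_A1, SKETCH_REG_A2, SKETCH_REG_A3,
    SKETCH_REG_A4, SKETCH_REG_A5, SKETCH_REG_A6, SKETCH_REG_A7,
    SKETCH_REG_T0, SKETCH_REG_T1, SKETCH_REG_T2, SKETCH_REG_T3,
    SKETCH_REG_T4, SKETCH_REG_T5, SKETCH_REG_T6, SKETCH_REG_T7,
    SKETCH_REG_T8, SKETCH_REG_RESERVED, SKETCH_REG_S9,
    SKETCH_REG_S0, SKETCH_REG_S1, SKETCH_REG_S2, SKETCH_REG_S3,
    SKETCH_REG_S4, SKETCH_REG_S5, SKETCH_REG_S6, SKETCH_REG_S7,
    SKETCH_REG_S8,
    SKETCH_NB_REGS /* enumerator count, expected to be 32 */
} SketchReg;

/* Return the 5-bit field value used to name register r in an encoding. */
static inline uint32_t sketch_reg_field(SketchReg r)
{
    assert((unsigned)r < SKETCH_NB_REGS);
    return (uint32_t)r & 0x1f;
}
```

Under this ordering, `SKETCH_REG_A0` maps to 4 and `SKETCH_REG_S8` to
31, which matches the 32 registers declared by `TCG_TARGET_NB_REGS`.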

diff --git a/tcg/loongarch/tcg-target.h b/tcg/loongarch/tcg-target.h
new file mode 100644
index 0000000000..b5e70e01b5
--- /dev/null
+++ b/tcg/loongarch/tcg-target.h
@@ -0,0 +1,183 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
+ *
+ * Based on tcg/riscv/tcg-target.h
+ *
+ * Copyright (c) 2018 SiFive, Inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef LOONGARCH_TCG_TARGET_H
+#define LOONGARCH_TCG_TARGET_H
+
+/*
+ * Loongson removed the (incomplete) 32-bit support from kernel and toolchain
+ * for the initial upstreaming of this architecture, so don't bother and just
+ * support the LP64 ABI for now.
+ */
+#if defined(__loongarch64)
+# define TCG_TARGET_REG_BITS 64
+#else
+# error unsupported LoongArch bitness
+#endif
+
+#define TCG_TARGET_INSN_UNIT_SIZE 4
+#define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
+#define TCG_TARGET_NB_REGS 32
+#define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
+
+typedef enum {
+    TCG_REG_ZERO,
+    TCG_REG_RA,
+    TCG_REG_TP,
+    TCG_REG_SP,
+    TCG_REG_A0,
+    TCG_REG_A1,
+    TCG_REG_A2,
+    TCG_REG_A3,
+    TCG_REG_A4,
+    TCG_REG_A5,
+    TCG_REG_A6,
+    TCG_REG_A7,
+    TCG_REG_T0,
+    TCG_REG_T1,
+    TCG_REG_T2,
+    TCG_REG_T3,
+    TCG_REG_T4,
+    TCG_REG_T5,
+    TCG_REG_T6,
+    TCG_REG_T7,
+    TCG_REG_T8,
+    TCG_REG_RESERVED,
+    TCG_REG_S9,
+    TCG_REG_S0,
+    TCG_REG_S1,
+    TCG_REG_S2,
+    TCG_REG_S3,
+    TCG_REG_S4,
+    TCG_REG_S5,
+    TCG_REG_S6,
+    TCG_REG_S7,
+    TCG_REG_S8,
+
+    /* aliases */
+    TCG_AREG0          = TCG_REG_S0,
+    TCG_GUEST_BASE_REG = TCG_REG_S1,
+    TCG_REG_TMP0       = TCG_REG_T8,
+    TCG_REG_TMP1       = TCG_REG_T7,
+    TCG_REG_TMP2       = TCG_REG_T6,
+} TCGReg;
+
+/* used for function call generation */
+#define TCG_REG_CALL_STACK              TCG_REG_SP
+#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_CALL_ALIGN_ARGS      1
+#define TCG_TARGET_CALL_STACK_OFFSET    0
+
+/* optional instructions */
+#define TCG_TARGET_HAS_movcond_i32      0
+#define TCG_TARGET_HAS_div_i32          1
+#define TCG_TARGET_HAS_rem_i32          1
+#define TCG_TARGET_HAS_div2_i32         0
+#define TCG_TARGET_HAS_rot_i32          1
+#define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract2_i32     0
+#define TCG_TARGET_HAS_add2_i32         0
+#define TCG_TARGET_HAS_sub2_i32         0
+#define TCG_TARGET_HAS_mulu2_i32        0
+#define TCG_TARGET_HAS_muls2_i32        0
+#define TCG_TARGET_HAS_muluh_i32        1
+#define TCG_TARGET_HAS_mulsh_i32        1
+#define TCG_TARGET_HAS_ext8s_i32        1
+#define TCG_TARGET_HAS_ext16s_i32       1
+#define TCG_TARGET_HAS_ext8u_i32        1
+#define TCG_TARGET_HAS_ext16u_i32       1
+#define TCG_TARGET_HAS_bswap16_i32      0
+#define TCG_TARGET_HAS_bswap32_i32      1
+#define TCG_TARGET_HAS_not_i32          1
+#define TCG_TARGET_HAS_neg_i32          1
+#define TCG_TARGET_HAS_andc_i32         1
+#define TCG_TARGET_HAS_orc_i32          1
+#define TCG_TARGET_HAS_eqv_i32          0
+#define TCG_TARGET_HAS_nand_i32         0
+#define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
+#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_direct_jump      0
+#define TCG_TARGET_HAS_brcond2          0
+#define TCG_TARGET_HAS_setcond2         0
+#define TCG_TARGET_HAS_qemu_st8_i32     0
+
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_movcond_i64      0
+#define TCG_TARGET_HAS_div_i64          1
+#define TCG_TARGET_HAS_rem_i64          1
+#define TCG_TARGET_HAS_div2_i64         0
+#define TCG_TARGET_HAS_rot_i64          1
+#define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      1
+#define TCG_TARGET_HAS_sextract_i64     0
+#define TCG_TARGET_HAS_extract2_i64     0
+#define TCG_TARGET_HAS_extrl_i64_i32    1
+#define TCG_TARGET_HAS_extrh_i64_i32    1
+#define TCG_TARGET_HAS_ext8s_i64        1
+#define TCG_TARGET_HAS_ext16s_i64       1
+#define TCG_TARGET_HAS_ext32s_i64       1
+#define TCG_TARGET_HAS_ext8u_i64        1
+#define TCG_TARGET_HAS_ext16u_i64       1
+#define TCG_TARGET_HAS_ext32u_i64       1
+#define TCG_TARGET_HAS_bswap16_i64      0
+#define TCG_TARGET_HAS_bswap32_i64      0
+#define TCG_TARGET_HAS_bswap64_i64      1
+#define TCG_TARGET_HAS_not_i64          1
+#define TCG_TARGET_HAS_neg_i64          1
+#define TCG_TARGET_HAS_andc_i64         1
+#define TCG_TARGET_HAS_orc_i64          1
+#define TCG_TARGET_HAS_eqv_i64          0
+#define TCG_TARGET_HAS_nand_i64         0
+#define TCG_TARGET_HAS_nor_i64          1
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
+#define TCG_TARGET_HAS_ctpop_i64        0
+#define TCG_TARGET_HAS_add2_i64         0
+#define TCG_TARGET_HAS_sub2_i64         0
+#define TCG_TARGET_HAS_mulu2_i64        0
+#define TCG_TARGET_HAS_muls2_i64        0
+#define TCG_TARGET_HAS_muluh_i64        1
+#define TCG_TARGET_HAS_mulsh_i64        1
+#endif
+
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
+
+#define TCG_TARGET_DEFAULT_MO (0)
+
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
+#define TCG_TARGET_HAS_MEMORY_BSWAP 0
+
+#endif /* LOONGARCH_TCG_TARGET_H */
-- 
2.33.0




* [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (2 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 15:55   ` Richard Henderson
  2021-09-21  9:58   ` Philippe Mathieu-Daudé
  2021-09-20  8:04 ` [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets WANG Xuerui
                   ` (25 subsequent siblings)
  29 siblings, 2 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-insn-defs.c.inc | 1080 +++++++++++++++++++++++++++++
 1 file changed, 1080 insertions(+)
 create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc
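
One encoding in this file is less obvious than the rest: the 26-bit
immediate handled by `encode_sd10k16_insn` is split, with its upper 10
bits going into the d slot (bit 0) and its lower 16 bits into the k
slot (bit 10). The following is a minimal sketch of that split, assuming
the caller passes an immediate that already fits in 26 signed bits; the
`sketch_` names are hypothetical, and the opcode value is the one listed
for `b` below.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode value as listed for the `b` instruction in this patch. */
#define SKETCH_OPC_B 0x50000000u

/*
 * Hypothetical helper: split a signed 26-bit immediate into the
 * d slot (upper 10 bits, placed at bit 0) and the k slot (lower
 * 16 bits, placed at bit 10), then OR both into the opcode.
 */
static inline uint32_t sketch_encode_sd10k16(uint32_t opc, int32_t imm26)
{
    uint32_t imm;

    /* Assumption: the caller has already range-checked the value. */
    assert(imm26 >= -(1 << 25) && imm26 < (1 << 25));

    imm = (uint32_t)imm26 & 0x3ffffff;
    return opc | ((imm >> 16) & 0x3ff) | ((imm & 0xffff) << 10);
}
```

For an immediate of 1 this gives 0x50000400, and for -1 it gives
0x53ffffff, since every bit of both slots is set.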

diff --git a/tcg/loongarch/tcg-insn-defs.c.inc b/tcg/loongarch/tcg-insn-defs.c.inc
new file mode 100644
index 0000000000..413f7ffc12
--- /dev/null
+++ b/tcg/loongarch/tcg-insn-defs.c.inc
@@ -0,0 +1,1080 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * LoongArch instruction formats, opcodes, and encoders for TCG use.
+ *
+ * Code generated by genqemutcgdefs from
+ * https://github.com/loongson-community/loongarch-opcodes,
+ * from commit bb5234081663faaefb6b921a7848b18e19519890.
+ * DO NOT EDIT.
+ */
+
+typedef enum {
+    OPC_CLZ_W = 0x00001400,
+    OPC_CTZ_W = 0x00001c00,
+    OPC_CLZ_D = 0x00002400,
+    OPC_CTZ_D = 0x00002c00,
+    OPC_REVB_2H = 0x00003000,
+    OPC_REVB_D = 0x00003c00,
+    OPC_SEXT_H = 0x00005800,
+    OPC_SEXT_B = 0x00005c00,
+    OPC_ADD_W = 0x00100000,
+    OPC_ADD_D = 0x00108000,
+    OPC_SUB_W = 0x00110000,
+    OPC_SUB_D = 0x00118000,
+    OPC_SLT = 0x00120000,
+    OPC_SLTU = 0x00128000,
+    OPC_MASKEQZ = 0x00130000,
+    OPC_MASKNEZ = 0x00138000,
+    OPC_NOR = 0x00140000,
+    OPC_AND = 0x00148000,
+    OPC_OR = 0x00150000,
+    OPC_XOR = 0x00158000,
+    OPC_ORN = 0x00160000,
+    OPC_ANDN = 0x00168000,
+    OPC_SLL_W = 0x00170000,
+    OPC_SRL_W = 0x00178000,
+    OPC_SRA_W = 0x00180000,
+    OPC_SLL_D = 0x00188000,
+    OPC_SRL_D = 0x00190000,
+    OPC_SRA_D = 0x00198000,
+    OPC_ROTR_W = 0x001b0000,
+    OPC_ROTR_D = 0x001b8000,
+    OPC_MUL_W = 0x001c0000,
+    OPC_MULH_W = 0x001c8000,
+    OPC_MULH_WU = 0x001d0000,
+    OPC_MUL_D = 0x001d8000,
+    OPC_MULH_D = 0x001e0000,
+    OPC_MULH_DU = 0x001e8000,
+    OPC_DIV_W = 0x00200000,
+    OPC_MOD_W = 0x00208000,
+    OPC_DIV_WU = 0x00210000,
+    OPC_MOD_WU = 0x00218000,
+    OPC_DIV_D = 0x00220000,
+    OPC_MOD_D = 0x00228000,
+    OPC_DIV_DU = 0x00230000,
+    OPC_MOD_DU = 0x00238000,
+    OPC_SLLI_W = 0x00408000,
+    OPC_SLLI_D = 0x00410000,
+    OPC_SRLI_W = 0x00448000,
+    OPC_SRLI_D = 0x00450000,
+    OPC_SRAI_W = 0x00488000,
+    OPC_SRAI_D = 0x00490000,
+    OPC_ROTRI_W = 0x004c8000,
+    OPC_ROTRI_D = 0x004d0000,
+    OPC_BSTRINS_W = 0x00600000,
+    OPC_BSTRPICK_W = 0x00608000,
+    OPC_BSTRINS_D = 0x00800000,
+    OPC_BSTRPICK_D = 0x00c00000,
+    OPC_SLTI = 0x02000000,
+    OPC_SLTUI = 0x02400000,
+    OPC_ADDI_W = 0x02800000,
+    OPC_ADDI_D = 0x02c00000,
+    OPC_CU52I_D = 0x03000000,
+    OPC_ANDI = 0x03400000,
+    OPC_ORI = 0x03800000,
+    OPC_XORI = 0x03c00000,
+    OPC_LU12I_W = 0x14000000,
+    OPC_CU32I_D = 0x16000000,
+    OPC_PCADDU12I = 0x1c000000,
+    OPC_LD_B = 0x28000000,
+    OPC_LD_H = 0x28400000,
+    OPC_LD_W = 0x28800000,
+    OPC_LD_D = 0x28c00000,
+    OPC_ST_B = 0x29000000,
+    OPC_ST_H = 0x29400000,
+    OPC_ST_W = 0x29800000,
+    OPC_ST_D = 0x29c00000,
+    OPC_LD_BU = 0x2a000000,
+    OPC_LD_HU = 0x2a400000,
+    OPC_LD_WU = 0x2a800000,
+    OPC_DBAR = 0x38720000,
+    OPC_JIRL = 0x4c000000,
+    OPC_B = 0x50000000,
+    OPC_BL = 0x54000000,
+    OPC_BEQ = 0x58000000,
+    OPC_BNE = 0x5c000000,
+    OPC_BGT = 0x60000000,
+    OPC_BLE = 0x64000000,
+    OPC_BGTU = 0x68000000,
+    OPC_BLEU = 0x6c000000,
+} LoongArchInsn;
+
+static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)
+    __attribute__((unused));
+
+static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)
+{
+    return opc | d;
+}
+
+static int32_t encode_dj_slots(LoongArchInsn opc, uint32_t d, uint32_t j)
+    __attribute__((unused));
+
+static int32_t encode_dj_slots(LoongArchInsn opc, uint32_t d, uint32_t j)
+{
+    return opc | d | j << 5;
+}
+
+static int32_t encode_djk_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
+                                uint32_t k) __attribute__((unused));
+
+static int32_t encode_djk_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
+                                uint32_t k)
+{
+    return opc | d | j << 5 | k << 10;
+}
+
+static int32_t encode_djkm_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
+                                 uint32_t k, uint32_t m)
+    __attribute__((unused));
+
+static int32_t encode_djkm_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
+                                 uint32_t k, uint32_t m)
+{
+    return opc | d | j << 5 | k << 10 | m << 16;
+}
+
+static int32_t encode_dk_slots(LoongArchInsn opc, uint32_t d, uint32_t k)
+    __attribute__((unused));
+
+static int32_t encode_dk_slots(LoongArchInsn opc, uint32_t d, uint32_t k)
+{
+    return opc | d | k << 10;
+}
+
+static int32_t encode_dj_insn(LoongArchInsn opc, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static int32_t encode_dj_insn(LoongArchInsn opc, TCGReg d, TCGReg j)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    return encode_dj_slots(opc, d, j);
+}
+
+static int32_t encode_djk_insn(LoongArchInsn opc, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static int32_t encode_djk_insn(LoongArchInsn opc, TCGReg d, TCGReg j, TCGReg k)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    k &= 0x1f;
+    return encode_djk_slots(opc, d, j, k);
+}
+
+static int32_t encode_djsk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  int32_t sk12) __attribute__((unused));
+
+static int32_t encode_djsk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  int32_t sk12)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    sk12 &= 0xfff;
+    return encode_djk_slots(opc, d, j, sk12);
+}
+
+static int32_t encode_djsk16_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  int32_t sk16) __attribute__((unused));
+
+static int32_t encode_djsk16_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  int32_t sk16)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    sk16 &= 0xffff;
+    return encode_djk_slots(opc, d, j, sk16);
+}
+
+static int32_t encode_djuk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  uint32_t uk12) __attribute__((unused));
+
+static int32_t encode_djuk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                  uint32_t uk12)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    uk12 &= 0xfff;
+    return encode_djk_slots(opc, d, j, uk12);
+}
+
+static int32_t encode_djuk5_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                 uint32_t uk5) __attribute__((unused));
+
+static int32_t encode_djuk5_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                 uint32_t uk5)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    uk5 &= 0x1f;
+    return encode_djk_slots(opc, d, j, uk5);
+}
+
+static int32_t encode_djuk5um5_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                    uint32_t uk5, uint32_t um5)
+    __attribute__((unused));
+
+static int32_t encode_djuk5um5_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                    uint32_t uk5, uint32_t um5)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    uk5 &= 0x1f;
+    um5 &= 0x1f;
+    return encode_djkm_slots(opc, d, j, uk5, um5);
+}
+
+static int32_t encode_djuk6_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                 uint32_t uk6) __attribute__((unused));
+
+static int32_t encode_djuk6_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                 uint32_t uk6)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    uk6 &= 0x3f;
+    return encode_djk_slots(opc, d, j, uk6);
+}
+
+static int32_t encode_djuk6um6_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                    uint32_t uk6, uint32_t um6)
+    __attribute__((unused));
+
+static int32_t encode_djuk6um6_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
+                                    uint32_t uk6, uint32_t um6)
+{
+    d &= 0x1f;
+    j &= 0x1f;
+    uk6 &= 0x3f;
+    um6 &= 0x3f;
+    return encode_djkm_slots(opc, d, j, uk6, um6);
+}
+
+static int32_t encode_dsj20_insn(LoongArchInsn opc, TCGReg d, int32_t sj20)
+    __attribute__((unused));
+
+static int32_t encode_dsj20_insn(LoongArchInsn opc, TCGReg d, int32_t sj20)
+{
+    d &= 0x1f;
+    sj20 &= 0xfffff;
+    return encode_dj_slots(opc, d, sj20);
+}
+
+static int32_t encode_sd10k16_insn(LoongArchInsn opc, int32_t sd10k16)
+    __attribute__((unused));
+
+static int32_t encode_sd10k16_insn(LoongArchInsn opc, int32_t sd10k16)
+{
+    sd10k16 &= 0x3ffffff;
+    return encode_dk_slots(opc, (sd10k16 >> 16) & 0x3ff, sd10k16 & 0xffff);
+}
+
+static int32_t encode_ud15_insn(LoongArchInsn opc, uint32_t ud15)
+    __attribute__((unused));
+
+static int32_t encode_ud15_insn(LoongArchInsn opc, uint32_t ud15)
+{
+    ud15 &= 0x7fff;
+    return encode_d_slot(opc, ud15);
+}
+
+/* Emits the `clz.w d, j` instruction. */
+static void tcg_out_opc_clz_w(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_clz_w(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_CLZ_W, d, j));
+}
+
+/* Emits the `ctz.w d, j` instruction. */
+static void tcg_out_opc_ctz_w(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_ctz_w(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_CTZ_W, d, j));
+}
+
+/* Emits the `clz.d d, j` instruction. */
+static void tcg_out_opc_clz_d(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_clz_d(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_CLZ_D, d, j));
+}
+
+/* Emits the `ctz.d d, j` instruction. */
+static void tcg_out_opc_ctz_d(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_ctz_d(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_CTZ_D, d, j));
+}
+
+/* Emits the `revb.2h d, j` instruction. */
+static void tcg_out_opc_revb_2h(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_revb_2h(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_REVB_2H, d, j));
+}
+
+/* Emits the `revb.d d, j` instruction. */
+static void tcg_out_opc_revb_d(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_revb_d(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_REVB_D, d, j));
+}
+
+/* Emits the `sext.h d, j` instruction. */
+static void tcg_out_opc_sext_h(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_sext_h(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_SEXT_H, d, j));
+}
+
+/* Emits the `sext.b d, j` instruction. */
+static void tcg_out_opc_sext_b(TCGContext *s, TCGReg d, TCGReg j)
+    __attribute__((unused));
+
+static void tcg_out_opc_sext_b(TCGContext *s, TCGReg d, TCGReg j)
+{
+    tcg_out32(s, encode_dj_insn(OPC_SEXT_B, d, j));
+}
+
+/* Emits the `add.w d, j, k` instruction. */
+static void tcg_out_opc_add_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_add_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ADD_W, d, j, k));
+}
+
+/* Emits the `add.d d, j, k` instruction. */
+static void tcg_out_opc_add_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_add_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ADD_D, d, j, k));
+}
+
+/* Emits the `sub.w d, j, k` instruction. */
+static void tcg_out_opc_sub_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sub_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SUB_W, d, j, k));
+}
+
+/* Emits the `sub.d d, j, k` instruction. */
+static void tcg_out_opc_sub_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sub_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SUB_D, d, j, k));
+}
+
+/* Emits the `slt d, j, k` instruction. */
+static void tcg_out_opc_slt(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_slt(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SLT, d, j, k));
+}
+
+/* Emits the `sltu d, j, k` instruction. */
+static void tcg_out_opc_sltu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sltu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SLTU, d, j, k));
+}
+
+/* Emits the `maskeqz d, j, k` instruction. */
+static void tcg_out_opc_maskeqz(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_maskeqz(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MASKEQZ, d, j, k));
+}
+
+/* Emits the `masknez d, j, k` instruction. */
+static void tcg_out_opc_masknez(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_masknez(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MASKNEZ, d, j, k));
+}
+
+/* Emits the `nor d, j, k` instruction. */
+static void tcg_out_opc_nor(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_nor(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_NOR, d, j, k));
+}
+
+/* Emits the `and d, j, k` instruction. */
+static void tcg_out_opc_and(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_and(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_AND, d, j, k));
+}
+
+/* Emits the `or d, j, k` instruction. */
+static void tcg_out_opc_or(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_or(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_OR, d, j, k));
+}
+
+/* Emits the `xor d, j, k` instruction. */
+static void tcg_out_opc_xor(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_xor(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_XOR, d, j, k));
+}
+
+/* Emits the `orn d, j, k` instruction. */
+static void tcg_out_opc_orn(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_orn(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ORN, d, j, k));
+}
+
+/* Emits the `andn d, j, k` instruction. */
+static void tcg_out_opc_andn(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_andn(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ANDN, d, j, k));
+}
+
+/* Emits the `sll.w d, j, k` instruction. */
+static void tcg_out_opc_sll_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sll_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SLL_W, d, j, k));
+}
+
+/* Emits the `srl.w d, j, k` instruction. */
+static void tcg_out_opc_srl_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_srl_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SRL_W, d, j, k));
+}
+
+/* Emits the `sra.w d, j, k` instruction. */
+static void tcg_out_opc_sra_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sra_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SRA_W, d, j, k));
+}
+
+/* Emits the `sll.d d, j, k` instruction. */
+static void tcg_out_opc_sll_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sll_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SLL_D, d, j, k));
+}
+
+/* Emits the `srl.d d, j, k` instruction. */
+static void tcg_out_opc_srl_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_srl_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SRL_D, d, j, k));
+}
+
+/* Emits the `sra.d d, j, k` instruction. */
+static void tcg_out_opc_sra_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_sra_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_SRA_D, d, j, k));
+}
+
+/* Emits the `rotr.w d, j, k` instruction. */
+static void tcg_out_opc_rotr_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_rotr_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ROTR_W, d, j, k));
+}
+
+/* Emits the `rotr.d d, j, k` instruction. */
+static void tcg_out_opc_rotr_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_rotr_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_ROTR_D, d, j, k));
+}
+
+/* Emits the `mul.w d, j, k` instruction. */
+static void tcg_out_opc_mul_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mul_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MUL_W, d, j, k));
+}
+
+/* Emits the `mulh.w d, j, k` instruction. */
+static void tcg_out_opc_mulh_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mulh_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MULH_W, d, j, k));
+}
+
+/* Emits the `mulh.wu d, j, k` instruction. */
+static void tcg_out_opc_mulh_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mulh_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MULH_WU, d, j, k));
+}
+
+/* Emits the `mul.d d, j, k` instruction. */
+static void tcg_out_opc_mul_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mul_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MUL_D, d, j, k));
+}
+
+/* Emits the `mulh.d d, j, k` instruction. */
+static void tcg_out_opc_mulh_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mulh_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MULH_D, d, j, k));
+}
+
+/* Emits the `mulh.du d, j, k` instruction. */
+static void tcg_out_opc_mulh_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mulh_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MULH_DU, d, j, k));
+}
+
+/* Emits the `div.w d, j, k` instruction. */
+static void tcg_out_opc_div_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_div_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_DIV_W, d, j, k));
+}
+
+/* Emits the `mod.w d, j, k` instruction. */
+static void tcg_out_opc_mod_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mod_w(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MOD_W, d, j, k));
+}
+
+/* Emits the `div.wu d, j, k` instruction. */
+static void tcg_out_opc_div_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_div_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_DIV_WU, d, j, k));
+}
+
+/* Emits the `mod.wu d, j, k` instruction. */
+static void tcg_out_opc_mod_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mod_wu(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MOD_WU, d, j, k));
+}
+
+/* Emits the `div.d d, j, k` instruction. */
+static void tcg_out_opc_div_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_div_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_DIV_D, d, j, k));
+}
+
+/* Emits the `mod.d d, j, k` instruction. */
+static void tcg_out_opc_mod_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mod_d(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MOD_D, d, j, k));
+}
+
+/* Emits the `div.du d, j, k` instruction. */
+static void tcg_out_opc_div_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_div_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_DIV_DU, d, j, k));
+}
+
+/* Emits the `mod.du d, j, k` instruction. */
+static void tcg_out_opc_mod_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+    __attribute__((unused));
+
+static void tcg_out_opc_mod_du(TCGContext *s, TCGReg d, TCGReg j, TCGReg k)
+{
+    tcg_out32(s, encode_djk_insn(OPC_MOD_DU, d, j, k));
+}
+
+/* Emits the `slli.w d, j, uk5` instruction. */
+static void tcg_out_opc_slli_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+    __attribute__((unused));
+
+static void tcg_out_opc_slli_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+{
+    tcg_out32(s, encode_djuk5_insn(OPC_SLLI_W, d, j, uk5));
+}
+
+/* Emits the `slli.d d, j, uk6` instruction. */
+static void tcg_out_opc_slli_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+    __attribute__((unused));
+
+static void tcg_out_opc_slli_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+{
+    tcg_out32(s, encode_djuk6_insn(OPC_SLLI_D, d, j, uk6));
+}
+
+/* Emits the `srli.w d, j, uk5` instruction. */
+static void tcg_out_opc_srli_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+    __attribute__((unused));
+
+static void tcg_out_opc_srli_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+{
+    tcg_out32(s, encode_djuk5_insn(OPC_SRLI_W, d, j, uk5));
+}
+
+/* Emits the `srli.d d, j, uk6` instruction. */
+static void tcg_out_opc_srli_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+    __attribute__((unused));
+
+static void tcg_out_opc_srli_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+{
+    tcg_out32(s, encode_djuk6_insn(OPC_SRLI_D, d, j, uk6));
+}
+
+/* Emits the `srai.w d, j, uk5` instruction. */
+static void tcg_out_opc_srai_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+    __attribute__((unused));
+
+static void tcg_out_opc_srai_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+{
+    tcg_out32(s, encode_djuk5_insn(OPC_SRAI_W, d, j, uk5));
+}
+
+/* Emits the `srai.d d, j, uk6` instruction. */
+static void tcg_out_opc_srai_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+    __attribute__((unused));
+
+static void tcg_out_opc_srai_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+{
+    tcg_out32(s, encode_djuk6_insn(OPC_SRAI_D, d, j, uk6));
+}
+
+/* Emits the `rotri.w d, j, uk5` instruction. */
+static void tcg_out_opc_rotri_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+    __attribute__((unused));
+
+static void tcg_out_opc_rotri_w(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk5)
+{
+    tcg_out32(s, encode_djuk5_insn(OPC_ROTRI_W, d, j, uk5));
+}
+
+/* Emits the `rotri.d d, j, uk6` instruction. */
+static void tcg_out_opc_rotri_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+    __attribute__((unused));
+
+static void tcg_out_opc_rotri_d(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk6)
+{
+    tcg_out32(s, encode_djuk6_insn(OPC_ROTRI_D, d, j, uk6));
+}
+
+/* Emits the `bstrins.w d, j, uk5, um5` instruction. */
+static void tcg_out_opc_bstrins_w(TCGContext *s, TCGReg d, TCGReg j,
+                                  uint32_t uk5, uint32_t um5)
+    __attribute__((unused));
+
+static void tcg_out_opc_bstrins_w(TCGContext *s, TCGReg d, TCGReg j,
+                                  uint32_t uk5, uint32_t um5)
+{
+    tcg_out32(s, encode_djuk5um5_insn(OPC_BSTRINS_W, d, j, uk5, um5));
+}
+
+/* Emits the `bstrpick.w d, j, uk5, um5` instruction. */
+static void tcg_out_opc_bstrpick_w(TCGContext *s, TCGReg d, TCGReg j,
+                                   uint32_t uk5, uint32_t um5)
+    __attribute__((unused));
+
+static void tcg_out_opc_bstrpick_w(TCGContext *s, TCGReg d, TCGReg j,
+                                   uint32_t uk5, uint32_t um5)
+{
+    tcg_out32(s, encode_djuk5um5_insn(OPC_BSTRPICK_W, d, j, uk5, um5));
+}
+
+/* Emits the `bstrins.d d, j, uk6, um6` instruction. */
+static void tcg_out_opc_bstrins_d(TCGContext *s, TCGReg d, TCGReg j,
+                                  uint32_t uk6, uint32_t um6)
+    __attribute__((unused));
+
+static void tcg_out_opc_bstrins_d(TCGContext *s, TCGReg d, TCGReg j,
+                                  uint32_t uk6, uint32_t um6)
+{
+    tcg_out32(s, encode_djuk6um6_insn(OPC_BSTRINS_D, d, j, uk6, um6));
+}
+
+/* Emits the `bstrpick.d d, j, uk6, um6` instruction. */
+static void tcg_out_opc_bstrpick_d(TCGContext *s, TCGReg d, TCGReg j,
+                                   uint32_t uk6, uint32_t um6)
+    __attribute__((unused));
+
+static void tcg_out_opc_bstrpick_d(TCGContext *s, TCGReg d, TCGReg j,
+                                   uint32_t uk6, uint32_t um6)
+{
+    tcg_out32(s, encode_djuk6um6_insn(OPC_BSTRPICK_D, d, j, uk6, um6));
+}
+
+/* Emits the `slti d, j, sk12` instruction. */
+static void tcg_out_opc_slti(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_slti(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_SLTI, d, j, sk12));
+}
+
+/* Emits the `sltui d, j, sk12` instruction. */
+static void tcg_out_opc_sltui(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_sltui(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_SLTUI, d, j, sk12));
+}
+
+/* Emits the `addi.w d, j, sk12` instruction. */
+static void tcg_out_opc_addi_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_addi_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ADDI_W, d, j, sk12));
+}
+
+/* Emits the `addi.d d, j, sk12` instruction. */
+static void tcg_out_opc_addi_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_addi_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ADDI_D, d, j, sk12));
+}
+
+/* Emits the `cu52i.d d, j, sk12` instruction. */
+static void tcg_out_opc_cu52i_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_cu52i_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_CU52I_D, d, j, sk12));
+}
+
+/* Emits the `andi d, j, uk12` instruction. */
+static void tcg_out_opc_andi(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_andi(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+{
+    tcg_out32(s, encode_djuk12_insn(OPC_ANDI, d, j, uk12));
+}
+
+/* Emits the `ori d, j, uk12` instruction. */
+static void tcg_out_opc_ori(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ori(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+{
+    tcg_out32(s, encode_djuk12_insn(OPC_ORI, d, j, uk12));
+}
+
+/* Emits the `xori d, j, uk12` instruction. */
+static void tcg_out_opc_xori(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_xori(TCGContext *s, TCGReg d, TCGReg j, uint32_t uk12)
+{
+    tcg_out32(s, encode_djuk12_insn(OPC_XORI, d, j, uk12));
+}
+
+/* Emits the `lu12i.w d, sj20` instruction. */
+static void tcg_out_opc_lu12i_w(TCGContext *s, TCGReg d, int32_t sj20)
+    __attribute__((unused));
+
+static void tcg_out_opc_lu12i_w(TCGContext *s, TCGReg d, int32_t sj20)
+{
+    tcg_out32(s, encode_dsj20_insn(OPC_LU12I_W, d, sj20));
+}
+
+/* Emits the `cu32i.d d, sj20` instruction. */
+static void tcg_out_opc_cu32i_d(TCGContext *s, TCGReg d, int32_t sj20)
+    __attribute__((unused));
+
+static void tcg_out_opc_cu32i_d(TCGContext *s, TCGReg d, int32_t sj20)
+{
+    tcg_out32(s, encode_dsj20_insn(OPC_CU32I_D, d, sj20));
+}
+
+/* Emits the `pcaddu12i d, sj20` instruction. */
+static void tcg_out_opc_pcaddu12i(TCGContext *s, TCGReg d, int32_t sj20)
+    __attribute__((unused));
+
+static void tcg_out_opc_pcaddu12i(TCGContext *s, TCGReg d, int32_t sj20)
+{
+    tcg_out32(s, encode_dsj20_insn(OPC_PCADDU12I, d, sj20));
+}
+
+/* Emits the `ld.b d, j, sk12` instruction. */
+static void tcg_out_opc_ld_b(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_b(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_B, d, j, sk12));
+}
+
+/* Emits the `ld.h d, j, sk12` instruction. */
+static void tcg_out_opc_ld_h(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_h(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_H, d, j, sk12));
+}
+
+/* Emits the `ld.w d, j, sk12` instruction. */
+static void tcg_out_opc_ld_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_W, d, j, sk12));
+}
+
+/* Emits the `ld.d d, j, sk12` instruction. */
+static void tcg_out_opc_ld_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_D, d, j, sk12));
+}
+
+/* Emits the `st.b d, j, sk12` instruction. */
+static void tcg_out_opc_st_b(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_st_b(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ST_B, d, j, sk12));
+}
+
+/* Emits the `st.h d, j, sk12` instruction. */
+static void tcg_out_opc_st_h(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_st_h(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ST_H, d, j, sk12));
+}
+
+/* Emits the `st.w d, j, sk12` instruction. */
+static void tcg_out_opc_st_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_st_w(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ST_W, d, j, sk12));
+}
+
+/* Emits the `st.d d, j, sk12` instruction. */
+static void tcg_out_opc_st_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_st_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_ST_D, d, j, sk12));
+}
+
+/* Emits the `ld.bu d, j, sk12` instruction. */
+static void tcg_out_opc_ld_bu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_bu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_BU, d, j, sk12));
+}
+
+/* Emits the `ld.hu d, j, sk12` instruction. */
+static void tcg_out_opc_ld_hu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_hu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_HU, d, j, sk12));
+}
+
+/* Emits the `ld.wu d, j, sk12` instruction. */
+static void tcg_out_opc_ld_wu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+    __attribute__((unused));
+
+static void tcg_out_opc_ld_wu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk12)
+{
+    tcg_out32(s, encode_djsk12_insn(OPC_LD_WU, d, j, sk12));
+}
+
+/* Emits the `dbar ud15` instruction. */
+static void tcg_out_opc_dbar(TCGContext *s, uint32_t ud15)
+    __attribute__((unused));
+
+static void tcg_out_opc_dbar(TCGContext *s, uint32_t ud15)
+{
+    tcg_out32(s, encode_ud15_insn(OPC_DBAR, ud15));
+}
+
+/* Emits the `jirl d, j, sk16` instruction. */
+static void tcg_out_opc_jirl(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_jirl(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_JIRL, d, j, sk16));
+}
+
+/* Emits the `b sd10k16` instruction. */
+static void tcg_out_opc_b(TCGContext *s, int32_t sd10k16)
+    __attribute__((unused));
+
+static void tcg_out_opc_b(TCGContext *s, int32_t sd10k16)
+{
+    tcg_out32(s, encode_sd10k16_insn(OPC_B, sd10k16));
+}
+
+/* Emits the `bl sd10k16` instruction. */
+static void tcg_out_opc_bl(TCGContext *s, int32_t sd10k16)
+    __attribute__((unused));
+
+static void tcg_out_opc_bl(TCGContext *s, int32_t sd10k16)
+{
+    tcg_out32(s, encode_sd10k16_insn(OPC_BL, sd10k16));
+}
+
+/* Emits the `beq d, j, sk16` instruction. */
+static void tcg_out_opc_beq(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_beq(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BEQ, d, j, sk16));
+}
+
+/* Emits the `bne d, j, sk16` instruction. */
+static void tcg_out_opc_bne(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_bne(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BNE, d, j, sk16));
+}
+
+/* Emits the `bgt d, j, sk16` instruction. */
+static void tcg_out_opc_bgt(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_bgt(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BGT, d, j, sk16));
+}
+
+/* Emits the `ble d, j, sk16` instruction. */
+static void tcg_out_opc_ble(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_ble(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BLE, d, j, sk16));
+}
+
+/* Emits the `bgtu d, j, sk16` instruction. */
+static void tcg_out_opc_bgtu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_bgtu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BGTU, d, j, sk16));
+}
+
+/* Emits the `bleu d, j, sk16` instruction. */
+static void tcg_out_opc_bleu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+    __attribute__((unused));
+
+static void tcg_out_opc_bleu(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_BLEU, d, j, sk16));
+}
-- 
2.33.0
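
A note for readers of the generated emitters above: every tcg_out_opc_*
helper forwards to an encoder for its operand format and then emits the
resulting 32-bit word. The sketch below illustrates what a three-register
(DJK) encoder might look like; it is not the generated code. The names
sketch_encode_djk and SKETCH_OPC_OR are hypothetical, the opcode value is an
assumption that should be checked against the generated opcode table, and
the field layout (d in bits 0-4, j in bits 5-9, k in bits 10-14) is assumed
from the three-register format in the LoongArch manual.

```c
#include <stdint.h>

/* Assumed value of OPC_OR; verify against the generated opcode table. */
#define SKETCH_OPC_OR 0x00150000u

/*
 * Hypothetical stand-in for encode_djk_insn(): place three 5-bit register
 * numbers below the opcode bits. d occupies bits 0-4, j bits 5-9 and
 * k bits 10-14.
 */
static uint32_t sketch_encode_djk(uint32_t opc, uint32_t d, uint32_t j,
                                  uint32_t k)
{
    return opc | (k & 0x1f) << 10 | (j & 0x1f) << 5 | (d & 0x1f);
}
```

Under those assumptions, `or a0, a1, zero` (register numbers 4, 5 and 0)
would be sketch_encode_djk(SKETCH_OPC_OR, 4, 5, 0), i.e. 0x001500a4.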




* [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (3 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 15:57   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 06/30] tcg/loongarch: Define the operand constraints WANG Xuerui
                   ` (24 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 118 +++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 tcg/loongarch/tcg-target.c.inc

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
new file mode 100644
index 0000000000..f8c71bbaf4
--- /dev/null
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -0,0 +1,118 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
+ *
+ * Based on tcg/riscv/tcg-target.c.inc
+ *
+ * Copyright (c) 2018 SiFive, Inc
+ * Copyright (c) 2008-2009 Arnaud Patard <arnaud.patard@rtp-net.org>
+ * Copyright (c) 2009 Aurelien Jarno <aurelien@aurel32.net>
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifdef CONFIG_DEBUG_TCG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "zero",
+    "ra",
+    "tp",
+    "sp",
+    "a0",
+    "a1",
+    "a2",
+    "a3",
+    "a4",
+    "a5",
+    "a6",
+    "a7",
+    "t0",
+    "t1",
+    "t2",
+    "t3",
+    "t4",
+    "t5",
+    "t6",
+    "t7",
+    "t8",
+    "r21", /* reserved in the LP64 ABI, hence no ABI name */
+    "s9",
+    "s0",
+    "s1",
+    "s2",
+    "s3",
+    "s4",
+    "s5",
+    "s6",
+    "s7",
+    "s8"
+};
+#endif
+
+static const int tcg_target_reg_alloc_order[] = {
+    /* Registers preserved across calls */
+    /* TCG_REG_S0 reserved for TCG_AREG0 */
+    TCG_REG_S1,
+    TCG_REG_S2,
+    TCG_REG_S3,
+    TCG_REG_S4,
+    TCG_REG_S5,
+    TCG_REG_S6,
+    TCG_REG_S7,
+    TCG_REG_S8,
+    TCG_REG_S9,
+
+    /* Registers (potentially) clobbered across calls */
+    TCG_REG_T0,
+    TCG_REG_T1,
+    TCG_REG_T2,
+    TCG_REG_T3,
+    TCG_REG_T4,
+    TCG_REG_T5,
+    TCG_REG_T6,
+    TCG_REG_T7,
+    TCG_REG_T8,
+
+    /* Argument registers */
+    TCG_REG_A0,
+    TCG_REG_A1,
+    TCG_REG_A2,
+    TCG_REG_A3,
+    TCG_REG_A4,
+    TCG_REG_A5,
+    TCG_REG_A6,
+    TCG_REG_A7,
+};
+
+static const int tcg_target_call_iarg_regs[] = {
+    TCG_REG_A0,
+    TCG_REG_A1,
+    TCG_REG_A2,
+    TCG_REG_A3,
+    TCG_REG_A4,
+    TCG_REG_A5,
+    TCG_REG_A6,
+    TCG_REG_A7,
+};
+
+static const int tcg_target_call_oarg_regs[] = {
+    TCG_REG_A0,
+    TCG_REG_A1,
+};
-- 
2.33.0




* [PATCH 06/30] tcg/loongarch: Define the operand constraints
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (4 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:28   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations WANG Xuerui
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-str.h | 26 ++++++++++++++++
 tcg/loongarch/tcg-target.c.inc     | 48 ++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 tcg/loongarch/tcg-target-con-str.h

diff --git a/tcg/loongarch/tcg-target-con-str.h b/tcg/loongarch/tcg-target-con-str.h
new file mode 100644
index 0000000000..30b42d83a4
--- /dev/null
+++ b/tcg/loongarch/tcg-target-con-str.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define LoongArch target-specific operand constraints.
+ *
+ * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
+ *
+ * Based on tcg/riscv/tcg-target-con-str.h
+ *
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * Define constraint letters for register sets:
+ * REGS(letter, register_mask)
+ */
+REGS('r', ALL_GENERAL_REGS)
+REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
+
+/*
+ * Define constraint letters for constants:
+ * CONST(letter, TCG_CT_CONST_* bit set)
+ */
+CONST('I', TCG_CT_CONST_S12)
+CONST('N', TCG_CT_CONST_N12)
+CONST('U', TCG_CT_CONST_U12)
+CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index f8c71bbaf4..594b434b47 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -116,3 +116,51 @@ static const int tcg_target_call_oarg_regs[] = {
     TCG_REG_A0,
     TCG_REG_A1,
 };
+
+#define TCG_CT_CONST_ZERO  0x100
+#define TCG_CT_CONST_S12   0x200
+#define TCG_CT_CONST_N12   0x400
+#define TCG_CT_CONST_U12   0x800
+
+#define ALL_GENERAL_REGS      MAKE_64BIT_MASK(0, 32)
+/*
+ * For softmmu, we need to avoid conflicts with the first 5
+ * argument registers to call the helper.  Some of these are
+ * also used for the tlb lookup.
+ */
+#ifdef CONFIG_SOFTMMU
+#define SOFTMMU_RESERVE_REGS  MAKE_64BIT_MASK(TCG_REG_A0, 5)
+#else
+#define SOFTMMU_RESERVE_REGS  0
+#endif
+
+
+static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
+{
+    if (TCG_TARGET_REG_BITS == 32) {
+        return sextract32(val, pos, len);
+    } else {
+        return sextract64(val, pos, len);
+    }
+}
+
+/* test if a constant matches the constraint */
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+{
+    if (ct & TCG_CT_CONST) {
+        return 1;
+    }
+    if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+        return 1;
+    }
+    if ((ct & TCG_CT_CONST_S12) && val == sextreg(val, 0, 12)) {
+        return 1;
+    }
+    if ((ct & TCG_CT_CONST_N12) && -val == sextreg(-val, 0, 12)) {
+        return 1;
+    }
+    if ((ct & TCG_CT_CONST_U12) && val >= 0 && val <= 0xfff) {
+        return 1;
+    }
+    return 0;
+}
-- 
2.33.0
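
To make the three immediate constraints in this patch concrete, here is a
small standalone sketch of the range checks that tcg_target_const_match
performs for 'I' (S12), 'N' (N12) and 'U' (U12). It is an illustration under
assumptions, not the patch's code: sketch_sext12 is a hypothetical
replacement for sextreg(val, 0, 12), and the fits_* helper names are made up
for this note.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sextreg(val, 0, 12): sign-extend the low 12 bits. */
static int64_t sketch_sext12(int64_t val)
{
    uint64_t field = (uint64_t)val & 0xfff;
    uint64_t sign = 0x800;

    /* Flip the sign bit, then subtract it, to extend without shifts. */
    return (int64_t)((field ^ sign) - sign);
}

/* 'I': the value fits a signed 12-bit immediate, [-2048, 2047]. */
static bool fits_s12(int64_t val)
{
    return val == sketch_sext12(val);
}

/* 'N': the negated value fits a signed 12-bit immediate, [-2047, 2048]. */
static bool fits_n12(int64_t val)
{
    /* Negate in unsigned arithmetic so that INT64_MIN does not overflow. */
    int64_t neg = (int64_t)(0 - (uint64_t)val);

    return neg == sketch_sext12(neg);
}

/* 'U': the value fits an unsigned 12-bit immediate, [0, 4095]. */
static bool fits_u12(int64_t val)
{
    return val >= 0 && val <= 0xfff;
}
```

Note that the S12 and N12 ranges are mirror images: 2048 matches only 'N'
and -2048 matches only 'I'.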




* [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (5 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 06/30] tcg/loongarch: Define the operand constraints WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:36   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 08/30] tcg/loongarch: Implement the memory barrier op WANG Xuerui
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 84 ++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 594b434b47..8be34f8275 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -164,3 +164,87 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
     }
     return 0;
 }
+
+/*
+ * Relocations
+ */
+
+/*
+ * The relocation records defined in LoongArch ELF psABI v1.00 are far too
+ * complicated; a whole stack machine is needed to fill in the fields, and at
+ * the very least one SOP_PUSH and one SOP_POP (of the correct format) are
+ * needed.
+ *
+ * Hence, define our own simpler relocation types. The numbers are chosen so
+ * as not to collide with potential future additions to the true ELF
+ * relocation type enum.
+ */
+
+/* Field Sk16; suitable for conditional jumps */
+#define R_LOONGARCH_SK16    256
+/* Field Sd10k16; suitable for B and BL */
+#define R_LOONGARCH_SD10K16 257
+
+static bool reloc_sk16(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
+{
+    const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
+    intptr_t offset = (intptr_t)target - (intptr_t)src_rx;
+
+    tcg_debug_assert((offset & 3) == 0);
+    offset >>= 2;
+    if (offset == sextreg(offset, 0, 16)) {
+        *src_rw |= (offset << 10) & 0x3fffc00;
+        return true;
+    }
+
+    return false;
+}
+
+static bool reloc_sd10k16(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
+{
+    const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
+    intptr_t offset = (intptr_t)target - (intptr_t)src_rx;
+
+    tcg_debug_assert((offset & 3) == 0);
+    offset >>= 2;
+    if (offset == sextreg(offset, 0, 26)) {
+        *src_rw |= (offset >> 16) & 0x3ff; /* slot d10 */
+        *src_rw |= ((offset & 0xffff) << 10) & 0x3fffc00; /* slot k16 */
+        return true;
+    }
+
+    return false;
+}
+
+static bool reloc_call(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
+{
+    const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
+    intptr_t offset = (intptr_t)target - (intptr_t)src_rx;
+    int32_t lo = sextreg(offset, 0, 12);
+    int32_t hi = offset - lo;
+
+    tcg_debug_assert((offset & 3) == 0);
+    if (offset == hi + lo) {
+        hi >>= 12;
+        src_rw[0] |= (hi << 5) & 0x1ffffe0; /* pcaddu12i's Sj20 imm */
+        lo >>= 2;
+        src_rw[1] |= (lo << 10) & 0x3fffc00; /* jirl's Sk16 imm */
+        return true;
+    }
+
+    return false;
+}
+
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
+                        intptr_t value, intptr_t addend)
+{
+    tcg_debug_assert(addend == 0);
+    switch (type) {
+    case R_LOONGARCH_SK16:
+        return reloc_sk16(code_ptr, (tcg_insn_unit *)value);
+    case R_LOONGARCH_SD10K16:
+        return reloc_sd10k16(code_ptr, (tcg_insn_unit *)value);
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.33.0
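
As an illustration of the two custom relocation types in this patch, the
standalone sketch below patches a word offset into the Sk16 slot
(bits 10-25) and into the split Sd10k16 slot (low 16 bits in bits 10-25,
high 10 bits in bits 0-9). It is a sketch under assumptions rather than the
patch's code: the sketch_patch_* names are hypothetical, the functions take
a byte offset instead of a pair of pointers, and a misaligned offset is
reported by returning false where the patch uses a debug assertion. Like the
patch, it ORs into the instruction and so assumes the immediate slots start
out as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Patch a signed 16-bit word offset into bits 10-25 (the Sk16 slot). */
static bool sketch_patch_sk16(uint32_t *insn, int64_t byte_offset)
{
    int64_t words;

    if (byte_offset & 3) {
        return false;               /* not a multiple of 4 bytes */
    }
    words = byte_offset / 4;
    if (words < -32768 || words > 32767) {
        return false;               /* does not fit in 16 signed bits */
    }
    *insn |= ((uint32_t)words & 0xffff) << 10;
    return true;
}

/* Patch a signed 26-bit word offset into the Sd10k16 slots (for B and BL). */
static bool sketch_patch_sd10k16(uint32_t *insn, int64_t byte_offset)
{
    int64_t words;
    uint32_t w;

    if (byte_offset & 3) {
        return false;               /* not a multiple of 4 bytes */
    }
    words = byte_offset / 4;
    if (words < -(1 << 25) || words > (1 << 25) - 1) {
        return false;               /* does not fit in 26 signed bits */
    }
    w = (uint32_t)words;
    *insn |= (w >> 16) & 0x3ff;     /* slot d10: offset bits 16-25 */
    *insn |= (w & 0xffff) << 10;    /* slot k16: offset bits 0-15 */
    return true;
}
```

With a 4-byte instruction unit, Sk16 therefore reaches about +/-128 KiB and
Sd10k16 about +/-128 MiB from the patched instruction.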




* [PATCH 08/30] tcg/loongarch: Implement the memory barrier op
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (6 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20  8:04 ` [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi WANG Xuerui
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 8be34f8275..71564e3246 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -248,3 +248,35 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
         g_assert_not_reached();
     }
 }
+
+#include "tcg-insn-defs.c.inc"
+
+/*
+ * TCG intrinsics
+ */
+
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    /* Baseline LoongArch only has the full barrier, unfortunately.  */
+    tcg_out_opc_dbar(s, 0);
+}
+
+/*
+ * Entry-points
+ */
+
+static void tcg_out_op(TCGContext *s, TCGOpcode opc,
+                       const TCGArg args[TCG_MAX_OP_ARGS],
+                       const int const_args[TCG_MAX_OP_ARGS])
+{
+    TCGArg a0 = args[0];
+
+    switch (opc) {
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.33.0




* [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (7 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 08/30] tcg/loongarch: Implement the memory barrier op WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:47   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 10/30] tcg/loongarch: Implement goto_ptr WANG Xuerui
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
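Note for reviewers: the sketch below models the immediate-loading sequence
used by tcg_out_movi in plain C, to make the segment arithmetic easier to
check. It is only an illustration under stated assumptions, not code from
this patch: the helper names (sext, model_movi, check_movi) are hypothetical,
and the instruction semantics are my reading of the manual (lu12i.w
sign-extends imm20 << 12, ori ORs in a zero-extended imm12, cu32i.d replaces
bits 63..32 with a sign-extended imm20, cu52i.d replaces bits 63..52).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `len` bits of v (0 < len < 64), two's complement. */
static uint64_t sext(uint64_t v, unsigned len)
{
    uint64_t sign = 1ull << (len - 1);
    v &= (sign << 1) - 1;
    return (v ^ sign) - sign;
}

/* Value left in rd after the emitted sequence (hypothetical model). */
static uint64_t model_movi(uint64_t val)
{
    uint64_t rd;

    if (sext(val, 12) == val) {
        return sext(val, 12);           /* addi.w rd, zero, simm12 */
    }
    if (val >= 0x800 && val <= 0xfff) {
        return val;                     /* ori rd, zero, uimm12 */
    }

    rd = sext(val & 0xfffff000ull, 32); /* lu12i.w: sign-extended imm20 << 12 */
    rd |= val & 0xfff;                  /* ori: zero-extended low 12 bits */
    if (sext(val, 32) == val) {
        return rd;
    }

    /* cu32i.d: bits 63..32 = sign-extended imm20, bits 31..0 kept */
    rd = (sext(val >> 32, 20) << 32) | (rd & 0xffffffffull);
    if (sext(val, 52) == val) {
        return rd;
    }

    /* cu52i.d: bits 63..52 = imm12, bits 51..0 kept */
    rd = ((val >> 52) << 52) | (rd & ((1ull << 52) - 1));
    return rd;
}

/* The sequence is correct for v if the model reproduces v exactly. */
static void check_movi(uint64_t v)
{
    assert(model_movi(v) == v);
}
```

Calling check_movi on boundary values such as 0x7ff, 0x800, 0x80000000 and
0xfff0000000000000 exercises each early-return path in turn.
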
 tcg/loongarch/tcg-target.c.inc | 73 ++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 71564e3246..60783d7ddc 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -261,6 +261,77 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out_opc_dbar(s, 0);
 }
 
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+{
+    if (ret == arg) {
+        return true;
+    }
+    switch (type) {
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        /*
+         * Conventional register-register move used in LoongArch is
+         * `or dst, src, zero`.
+         */
+        tcg_out_opc_or(s, ret, arg, TCG_REG_ZERO);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return true;
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
+                         tcg_target_long val)
+{
+    tcg_target_long low, upper, higher, top;
+
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        val = (int32_t)val;
+    }
+
+    /* Single-instruction cases.  */
+    low = sextreg(val, 0, 12);
+    if (low == val) {
+        /* val fits in simm12: addi.w rd, zero, val */
+        tcg_out_opc_addi_w(s, rd, TCG_REG_ZERO, val);
+        return;
+    }
+    if (0x800 <= val && val <= 0xfff) {
+        /* val fits in uimm12: ori rd, zero, val */
+        tcg_out_opc_ori(s, rd, TCG_REG_ZERO, val);
+        return;
+    }
+
+    /* Chop upper bits into 3 immediate-field-sized segments respectively.  */
+    upper = (val >> 12) & 0xfffff;
+    higher = (val >> 32) & 0xfffff;
+    top = val >> 52;
+
+    tcg_out_opc_lu12i_w(s, rd, upper);
+    if (low != 0) {
+        tcg_out_opc_ori(s, rd, rd, low & 0xfff);
+    }
+
+    if (sextreg(val, 0, 32) == val) {
+        /*
+         * Fits in 32 bits, upper bits are already properly sign-extended by
+         * lu12i.w.
+         */
+        return;
+    }
+    tcg_out_opc_cu32i_d(s, rd, higher);
+
+    if (sextreg(val, 0, 52) == val) {
+        /*
+         * Fits in 52 bits, upper bits are already properly sign-extended by
+         * cu32i.d.
+         */
+        return;
+    }
+    tcg_out_opc_cu52i_d(s, rd, rd, top);
+}
+
 /*
  * Entry-points
  */
@@ -276,6 +347,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_mb(s, a0);
         break;
 
+    case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_mov_i64:
     default:
         g_assert_not_reached();
     }
-- 
2.33.0




* [PATCH 10/30] tcg/loongarch: Implement goto_ptr
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (8 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:49   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops WANG Xuerui
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-set.h | 17 +++++++++++++++++
 tcg/loongarch/tcg-target.c.inc     | 15 +++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 tcg/loongarch/tcg-target-con-set.h

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
new file mode 100644
index 0000000000..5cc4407367
--- /dev/null
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define LoongArch target-specific constraint sets.
+ *
+ * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
+ *
+ * Based on tcg/riscv/tcg-target-con-set.h
+ *
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * C_On_Im(...) defines a constraint set with <n> outputs and <m> inputs.
+ * Each operand should be a sequence of constraint letters as defined by
+ * tcg-target-con-str.h; the constraint combination is inclusive or.
+ */
+C_O0_I1(r)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 60783d7ddc..9d78146fb9 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -347,9 +347,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_mb(s, a0);
         break;
 
+    case INDEX_op_goto_ptr:
+        tcg_out_opc_jirl(s, TCG_REG_ZERO, a0, 0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
         g_assert_not_reached();
     }
 }
+
+static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
+{
+    switch (op) {
+    case INDEX_op_goto_ptr:
+        return C_O0_I1(r);
+
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.33.0




* [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (9 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 10/30] tcg/loongarch: Implement goto_ptr WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:50   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops WANG Xuerui
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 82 ++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 5cc4407367..7e459490ea 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -15,3 +15,4 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
+C_O1_I1(r, r)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 9d78146fb9..0ee389fdaa 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -332,6 +332,36 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     tcg_out_opc_cu52i_d(s, rd, rd, top);
 }
 
+static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_andi(s, ret, arg, 0xff);
+}
+
+static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_bstrpick_w(s, ret, arg, 0, 15);
+}
+
+static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_bstrpick_d(s, ret, arg, 0, 31);
+}
+
+static void tcg_out_ext8s(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_sext_b(s, ret, arg);
+}
+
+static void tcg_out_ext16s(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_sext_h(s, ret, arg);
+}
+
+static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_out_opc_addi_w(s, ret, arg, 0);
+}
+
 /*
  * Entry-points
  */
@@ -341,6 +371,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const int const_args[TCG_MAX_OP_ARGS])
 {
     TCGArg a0 = args[0];
+    TCGArg a1 = args[1];
 
     switch (opc) {
     case INDEX_op_mb:
@@ -351,6 +382,41 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_jirl(s, TCG_REG_ZERO, a0, 0);
         break;
 
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
+        tcg_out_ext8s(s, a0, a1);
+        break;
+
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext8u_i64:
+        tcg_out_ext8u(s, a0, a1);
+        break;
+
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
+        tcg_out_ext16s(s, a0, a1);
+        break;
+
+    case INDEX_op_ext16u_i32:
+    case INDEX_op_ext16u_i64:
+        tcg_out_ext16u(s, a0, a1);
+        break;
+
+    case INDEX_op_ext32u_i64:
+    case INDEX_op_extu_i32_i64:
+        tcg_out_ext32u(s, a0, a1);
+        break;
+
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_extrl_i64_i32:
+    case INDEX_op_ext_i32_i64:
+        tcg_out_ext32s(s, a0, a1);
+        break;
+
+    case INDEX_op_extrh_i64_i32:
+        tcg_out_opc_srai_d(s, a0, a1, 32);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -364,6 +430,22 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_goto_ptr:
         return C_O0_I1(r);
 
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext16u_i32:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext32u_i64:
+    case INDEX_op_extu_i32_i64:
+    case INDEX_op_extrl_i64_i32:
+    case INDEX_op_extrh_i64_i32:
+    case INDEX_op_ext_i32_i64:
+        return C_O1_I1(r, r);
+
     default:
         g_assert_not_reached();
     }
-- 
2.33.0




* [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (10 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:54   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops WANG Xuerui
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-set.h |  2 +
 tcg/loongarch/tcg-target.c.inc     | 69 ++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 7e459490ea..385f503552 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -16,3 +16,5 @@
  */
 C_O0_I1(r)
 C_O1_I1(r, r)
+C_O1_I2(r, r, r)
+C_O1_I2(r, r, rU)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 0ee389fdaa..e364b6c1da 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -372,6 +372,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 {
     TCGArg a0 = args[0];
     TCGArg a1 = args[1];
+    TCGArg a2 = args[2];
+    int c2 = const_args[2];
 
     switch (opc) {
     case INDEX_op_mb:
@@ -417,6 +419,53 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_srai_d(s, a0, a1, 32);
         break;
 
+    case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
+        tcg_out_opc_nor(s, a0, a1, TCG_REG_ZERO);
+        break;
+
+    case INDEX_op_nor_i32:
+    case INDEX_op_nor_i64:
+        tcg_out_opc_nor(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_andc_i32:
+    case INDEX_op_andc_i64:
+        tcg_out_opc_andn(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_orc_i32:
+    case INDEX_op_orc_i64:
+        tcg_out_opc_orn(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_and_i32:
+    case INDEX_op_and_i64:
+        if (c2) {
+            tcg_out_opc_andi(s, a0, a1, a2);
+        } else {
+            tcg_out_opc_and(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_or_i32:
+    case INDEX_op_or_i64:
+        if (c2) {
+            tcg_out_opc_ori(s, a0, a1, a2);
+        } else {
+            tcg_out_opc_or(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
+        if (c2) {
+            tcg_out_opc_xori(s, a0, a1, a2);
+        } else {
+            tcg_out_opc_xor(s, a0, a1, a2);
+        }
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -444,8 +493,28 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_extrl_i64_i32:
     case INDEX_op_extrh_i64_i32:
     case INDEX_op_ext_i32_i64:
+    case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
         return C_O1_I1(r, r);
 
+    case INDEX_op_nor_i32:
+    case INDEX_op_andc_i32:
+    case INDEX_op_orc_i32:
+    case INDEX_op_nor_i64:
+    case INDEX_op_andc_i64:
+    case INDEX_op_orc_i64:
+        /* LoongArch insns for these ops don't have reg-imm forms */
+        return C_O1_I2(r, r, r);
+
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
+        /* LoongArch reg-imm bitops have their imms ZERO-extended */
+        return C_O1_I2(r, r, rU);
+
     default:
         g_assert_not_reached();
     }
-- 
2.33.0




* [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (11 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 14:55   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64 WANG Xuerui
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
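Note for reviewers: a small C model of how the TCG (pos, len) pair maps onto
the (lsb, msb) operands of bstrpick/bstrins may help when checking the
operand order. This is a sketch only; the function names are hypothetical,
and the instruction semantics (bstrpick zero-extends rj[msb:lsb], bstrins
writes rj[msb-lsb:0] into rd[msb:lsb] and keeps the other bits of rd) are my
reading of the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `len` bits set (1 <= len <= 64). */
static uint64_t field_mask(unsigned len)
{
    return len >= 64 ? ~0ull : (1ull << len) - 1;
}

/* bstrpick.d: zero-extended bit field rj[msb:lsb]. */
static uint64_t bstrpick_d(uint64_t rj, unsigned lsb, unsigned msb)
{
    return (rj >> lsb) & field_mask(msb - lsb + 1);
}

/* bstrins.d: rd with bits [msb:lsb] replaced by the low bits of rj. */
static uint64_t bstrins_d(uint64_t rd, uint64_t rj, unsigned lsb, unsigned msb)
{
    uint64_t m = field_mask(msb - lsb + 1) << lsb;
    return (rd & ~m) | ((rj << lsb) & m);
}

/* TCG extract: msb = pos + len - 1. */
static uint64_t extract64_model(uint64_t v, unsigned pos, unsigned len)
{
    return bstrpick_d(v, pos, pos + len - 1);
}

/* TCG deposit: the destination doubles as the first input. */
static uint64_t deposit64_model(uint64_t base, uint64_t field,
                                unsigned pos, unsigned len)
{
    return bstrins_d(base, field, pos, pos + len - 1);
}

/* Round trip: depositing then extracting the same field returns it. */
static void check_field(uint64_t base, uint64_t field,
                        unsigned pos, unsigned len)
{
    uint64_t r = deposit64_model(base, field, pos, len);
    assert(extract64_model(r, pos, len) == (field & field_mask(len)));
}
```

The in-place nature of bstrins is why the deposit ops need the "0"
constraint on their first input.
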
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 385f503552..b0751c4bb0 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -18,3 +18,4 @@ C_O0_I1(r)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, rU)
+C_O1_I2(r, 0, rZ)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index e364b6c1da..e5356bdaf8 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -466,6 +466,20 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_extract_i32:
+        tcg_out_opc_bstrpick_w(s, a0, a1, a2, a2 + args[3] - 1);
+        break;
+    case INDEX_op_extract_i64:
+        tcg_out_opc_bstrpick_d(s, a0, a1, a2, a2 + args[3] - 1);
+        break;
+
+    case INDEX_op_deposit_i32:
+        tcg_out_opc_bstrins_w(s, a0, a2, args[3], args[3] + args[4] - 1);
+        break;
+    case INDEX_op_deposit_i64:
+        tcg_out_opc_bstrins_d(s, a0, a2, args[3], args[3] + args[4] - 1);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -495,6 +509,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ext_i32_i64:
     case INDEX_op_not_i32:
     case INDEX_op_not_i64:
+    case INDEX_op_extract_i32:
+    case INDEX_op_extract_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_nor_i32:
@@ -515,6 +531,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         /* LoongArch reg-imm bitops have their imms ZERO-extended */
         return C_O1_I2(r, r, rU);
 
+    case INDEX_op_deposit_i32:
+    case INDEX_op_deposit_i64:
+        /* Must deposit into the same register as input */
+        return C_O1_I2(r, 0, rZ);
+
     default:
         g_assert_not_reached();
     }
-- 
2.33.0




* [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (12 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 15:11   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops WANG Xuerui
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
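Note for reviewers: the two-instruction bswap32 can be checked with a plain C
model. This is a sketch under assumptions: the function names are
hypothetical, only the low 32 bits are modelled (the sign extension into the
upper half is ignored), and revb.2h is taken to swap the two bytes inside
each 16-bit halfword.

```c
#include <assert.h>
#include <stdint.h>

/* revb.2h on the low 32 bits: swap the bytes within each halfword. */
static uint32_t revb_2h(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* rotri.w: rotate a 32-bit value right by n bits. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* revb.2h followed by rotri.w 16 yields a full 32-bit byte swap. */
static uint32_t bswap32_model(uint32_t x)
{
    return rotr32(revb_2h(x), 16);
}

/* Swapping twice must give back the original value. */
static void check_bswap32(uint32_t x)
{
    assert(bswap32_model(bswap32_model(x)) == x);
}
```

For example, 0x11223344 becomes 0x22114433 after the halfword-local swap and
0x44332211 after the 16-bit rotate.
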
 tcg/loongarch/tcg-target.c.inc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index e5356bdaf8..d617b833e5 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -480,6 +480,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_bstrins_d(s, a0, a2, args[3], args[3] + args[4] - 1);
         break;
 
+    case INDEX_op_bswap32_i32:
+        tcg_out_opc_revb_2h(s, a0, a1);
+        tcg_out_opc_rotri_w(s, a0, a0, 16);
+        break;
+    case INDEX_op_bswap64_i64:
+        tcg_out_opc_revb_d(s, a0, a1);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -511,6 +519,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_not_i64:
     case INDEX_op_extract_i32:
     case INDEX_op_extract_i64:
+    case INDEX_op_bswap32_i32:
+    case INDEX_op_bswap64_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_nor_i32:
-- 
2.33.0




* [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (13 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64 WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:10   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops WANG Xuerui
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
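Note for reviewers: the branch-free select in tcg_out_clzctz can be modelled
in C as below. This is an illustrative sketch, not part of the patch: the
function names are hypothetical, and I am assuming maskeqz gives 0 when rk is
zero (else rj) and masknez gives 0 when rk is non-zero (else rj), both
looking at the full 64-bit rk.

```c
#include <assert.h>
#include <stdint.h>

/* maskeqz rd, rj, rk: rd = (rk == 0) ? 0 : rj */
static uint64_t maskeqz(uint64_t rj, uint64_t rk)
{
    return rk == 0 ? 0 : rj;
}

/* masknez rd, rj, rk: rd = (rk != 0) ? 0 : rj */
static uint64_t masknez(uint64_t rj, uint64_t rk)
{
    return rk != 0 ? 0 : rj;
}

/* Count leading zeros of a 64-bit value; returns 64 for zero input. */
static uint64_t clz_d(uint64_t x)
{
    uint64_t n = 0;
    while (n < 64 && !(x & (1ull << (63 - n)))) {
        n++;
    }
    return n;
}

/* TCG clz semantics: a0 = a1 ? clz(a1) : a2 */
static uint64_t clz64_model(uint64_t a1, uint64_t a2)
{
    uint64_t tmp = clz_d(a1);   /* clz.d tmp, a1 */
    uint64_t a0;

    tmp = maskeqz(tmp, a1);     /* keep the count only if a1 != 0 */
    a0 = masknez(a2, a1);       /* keep the fallback only if a1 == 0 */
    return tmp | a0;            /* exactly one side is non-masked */
}

/* The select must agree with the plain conditional expression. */
static void check_clz(uint64_t a1, uint64_t a2)
{
    assert(clz64_model(a1, a2) == (a1 ? clz_d(a1) : a2));
}
```

Because exactly one of the two masked values survives, the final or acts as
a select without needing a branch.
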
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 31 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index b0751c4bb0..417c97549a 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -18,4 +18,5 @@ C_O0_I1(r)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, rU)
+C_O1_I2(r, r, rZ)
 C_O1_I2(r, 0, rZ)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index d617b833e5..e817964a7e 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -362,6 +362,17 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg)
     tcg_out_opc_addi_w(s, ret, arg, 0);
 }
 
+static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
+                           TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    /* all clz/ctz insns belong to DJ-format */
+    tcg_out32(s, encode_dj_insn(opc, TCG_REG_TMP0, a1));
+    /* a0 = a1 ? REG_TMP0 : a2 */
+    tcg_out_opc_maskeqz(s, TCG_REG_TMP0, TCG_REG_TMP0, a1);
+    tcg_out_opc_masknez(s, a0, a2, a1);
+    tcg_out_opc_or(s, a0, TCG_REG_TMP0, a0);
+}
+
 /*
  * Entry-points
  */
@@ -488,6 +499,20 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_revb_d(s, a0, a1);
         break;
 
+    case INDEX_op_clz_i32:
+        tcg_out_clzctz(s, OPC_CLZ_W, a0, a1, a2);
+        break;
+    case INDEX_op_clz_i64:
+        tcg_out_clzctz(s, OPC_CLZ_D, a0, a1, a2);
+        break;
+
+    case INDEX_op_ctz_i32:
+        tcg_out_clzctz(s, OPC_CTZ_W, a0, a1, a2);
+        break;
+    case INDEX_op_ctz_i64:
+        tcg_out_clzctz(s, OPC_CTZ_D, a0, a1, a2);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -541,6 +566,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         /* LoongArch reg-imm bitops have their imms ZERO-extended */
         return C_O1_I2(r, r, rU);
 
+    case INDEX_op_clz_i32:
+    case INDEX_op_clz_i64:
+    case INDEX_op_ctz_i32:
+    case INDEX_op_ctz_i64:
+        return C_O1_I2(r, r, rZ);
+
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
         /* Must deposit into the same register as input */
-- 
2.33.0




* [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (14 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:13   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops WANG Xuerui
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
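Note for reviewers: rotl is lowered to rotr here, which relies on a left
rotate by n being the same as a right rotate by (width - n) modulo the width.
The C sketch below checks that identity for the 32-bit case. The names are
hypothetical, and it assumes that the register form of the rotate only
consumes the low 5 bits of the count (6 bits for the 64-bit form), so a plain
negation of the count is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right; only the low 5 bits of the count are used. */
static uint32_t rotr32(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Reference rotate left, written directly. */
static uint32_t rotl32_ref(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate left expressed as rotate right by the negated count. */
static uint32_t rotl32_via_rotr(uint32_t x, uint32_t n)
{
    return rotr32(x, 0u - n);
}

/* Compare both forms for every count in 0..31. */
static void check_rotl(uint32_t x)
{
    for (uint32_t n = 0; n < 32; n++) {
        assert(rotl32_via_rotr(x, n) == rotl32_ref(x, n));
    }
}
```

Since (32 - n) and (-n) agree in their low 5 bits, adding the width after
negating does not change the result of the rotate.
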
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 91 ++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 417c97549a..8630d1ee6e 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -17,6 +17,7 @@
 C_O0_I1(r)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
+C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rU)
 C_O1_I2(r, r, rZ)
 C_O1_I2(r, 0, rZ)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index e817964a7e..acbd0e65ef 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -513,6 +513,85 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_clzctz(s, OPC_CTZ_D, a0, a1, a2);
         break;
 
+    case INDEX_op_shl_i32:
+        if (c2) {
+            tcg_out_opc_slli_w(s, a0, a1, a2 & 0x1f);
+        } else {
+            tcg_out_opc_sll_w(s, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_shl_i64:
+        if (c2) {
+            tcg_out_opc_slli_d(s, a0, a1, a2 & 0x3f);
+        } else {
+            tcg_out_opc_sll_d(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_shr_i32:
+        if (c2) {
+            tcg_out_opc_srli_w(s, a0, a1, a2 & 0x1f);
+        } else {
+            tcg_out_opc_srl_w(s, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_shr_i64:
+        if (c2) {
+            tcg_out_opc_srli_d(s, a0, a1, a2 & 0x3f);
+        } else {
+            tcg_out_opc_srl_d(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_sar_i32:
+        if (c2) {
+            tcg_out_opc_srai_w(s, a0, a1, a2 & 0x1f);
+        } else {
+            tcg_out_opc_sra_w(s, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_sar_i64:
+        if (c2) {
+            tcg_out_opc_srai_d(s, a0, a1, a2 & 0x3f);
+        } else {
+            tcg_out_opc_sra_d(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_rotl_i32:
+        /* transform into rotr_i32; rotr.w uses only the low 5 bits of rk */
+        if (c2) {
+            a2 = 32 - a2;
+        } else {
+            tcg_out_opc_sub_w(s, TCG_REG_TMP0, TCG_REG_ZERO, a2);
+            a2 = TCG_REG_TMP0; /* do not clobber the input register */
+        }
+        /* fallthrough */
+    case INDEX_op_rotr_i32:
+        if (c2) {
+            tcg_out_opc_rotri_w(s, a0, a1, a2 & 0x1f);
+        } else {
+            tcg_out_opc_rotr_w(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_rotl_i64:
+        /* transform into rotr_i64; rotr.d uses only the low 6 bits of rk */
+        if (c2) {
+            a2 = 64 - a2;
+        } else {
+            tcg_out_opc_sub_w(s, TCG_REG_TMP0, TCG_REG_ZERO, a2);
+            a2 = TCG_REG_TMP0; /* do not clobber the input register */
+        }
+        /* fallthrough */
+    case INDEX_op_rotr_i64:
+        if (c2) {
+            tcg_out_opc_rotri_d(s, a0, a1, a2 & 0x3f);
+        } else {
+            tcg_out_opc_rotr_d(s, a0, a1, a2);
+        }
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -557,6 +636,18 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         /* LoongArch insns for these ops don't have reg-imm forms */
         return C_O1_I2(r, r, r);
 
+    case INDEX_op_shl_i32:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i32:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i32:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i32:
+    case INDEX_op_rotr_i64:
+        return C_O1_I2(r, r, ri);
+
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
-- 
2.33.0




* [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (15 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:16   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops WANG Xuerui
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-set.h |  2 ++
 tcg/loongarch/tcg-target.c.inc     | 47 ++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 8630d1ee6e..58b5c487e2 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -18,6 +18,8 @@ C_O0_I1(r)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
+C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rU)
 C_O1_I2(r, r, rZ)
 C_O1_I2(r, 0, rZ)
+C_O1_I2(r, rZ, rN)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index acbd0e65ef..e5518c0102 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -592,6 +592,43 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_neg_i32:
+        tcg_out_opc_sub_w(s, a0, TCG_REG_ZERO, a1);
+        break;
+    case INDEX_op_neg_i64:
+        tcg_out_opc_sub_d(s, a0, TCG_REG_ZERO, a1);
+        break;
+
+    case INDEX_op_add_i32:
+        if (c2) {
+            tcg_out_opc_addi_w(s, a0, a1, a2);
+        } else {
+            tcg_out_opc_add_w(s, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_add_i64:
+        if (c2) {
+            tcg_out_opc_addi_d(s, a0, a1, a2);
+        } else {
+            tcg_out_opc_add_d(s, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_sub_i32:
+        if (c2) {
+            tcg_out_opc_addi_w(s, a0, a1, -a2);
+        } else {
+            tcg_out_opc_sub_w(s, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_sub_i64:
+        if (c2) {
+            tcg_out_opc_addi_d(s, a0, a1, -a2);
+        } else {
+            tcg_out_opc_sub_d(s, a0, a1, a2);
+        }
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -625,6 +662,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_extract_i64:
     case INDEX_op_bswap32_i32:
     case INDEX_op_bswap64_i64:
+    case INDEX_op_neg_i32:
+    case INDEX_op_neg_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_nor_i32:
@@ -648,6 +687,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rotr_i64:
         return C_O1_I2(r, r, ri);
 
+    case INDEX_op_add_i32:
+    case INDEX_op_add_i64:
+        return C_O1_I2(r, r, rI);
+
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
@@ -668,6 +711,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         /* Must deposit into the same register as input */
         return C_O1_I2(r, 0, rZ);
 
+    case INDEX_op_sub_i32:
+    case INDEX_op_sub_i64:
+        return C_O1_I2(r, rZ, rN);
+
     default:
         g_assert_not_reached();
     }
-- 
2.33.0




* [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (16 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:16   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 19/30] tcg/loongarch: Implement br/brcond ops WANG Xuerui
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 65 ++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 58b5c487e2..57b2846d82 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -23,3 +23,4 @@ C_O1_I2(r, r, rU)
 C_O1_I2(r, r, rZ)
 C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, rN)
+C_O1_I2(r, rZ, rZ)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index e5518c0102..eaa155ad68 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -629,6 +629,55 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mul_i32:
+        tcg_out_opc_mul_w(s, a0, a1, a2);
+        break;
+    case INDEX_op_mul_i64:
+        tcg_out_opc_mul_d(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_mulsh_i32:
+        tcg_out_opc_mulh_w(s, a0, a1, a2);
+        break;
+    case INDEX_op_mulsh_i64:
+        tcg_out_opc_mulh_d(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_muluh_i32:
+        tcg_out_opc_mulh_wu(s, a0, a1, a2);
+        break;
+    case INDEX_op_muluh_i64:
+        tcg_out_opc_mulh_du(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_div_i32:
+        tcg_out_opc_div_w(s, a0, a1, a2);
+        break;
+    case INDEX_op_div_i64:
+        tcg_out_opc_div_d(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_divu_i32:
+        tcg_out_opc_div_wu(s, a0, a1, a2);
+        break;
+    case INDEX_op_divu_i64:
+        tcg_out_opc_div_du(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_rem_i32:
+        tcg_out_opc_mod_w(s, a0, a1, a2);
+        break;
+    case INDEX_op_rem_i64:
+        tcg_out_opc_mod_d(s, a0, a1, a2);
+        break;
+
+    case INDEX_op_remu_i32:
+        tcg_out_opc_mod_wu(s, a0, a1, a2);
+        break;
+    case INDEX_op_remu_i64:
+        tcg_out_opc_mod_du(s, a0, a1, a2);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -715,6 +764,22 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub_i64:
         return C_O1_I2(r, rZ, rN);
 
+    case INDEX_op_mul_i32:
+    case INDEX_op_mul_i64:
+    case INDEX_op_mulsh_i32:
+    case INDEX_op_mulsh_i64:
+    case INDEX_op_muluh_i32:
+    case INDEX_op_muluh_i64:
+    case INDEX_op_div_i32:
+    case INDEX_op_div_i64:
+    case INDEX_op_divu_i32:
+    case INDEX_op_divu_i64:
+    case INDEX_op_rem_i32:
+    case INDEX_op_rem_i64:
+    case INDEX_op_remu_i32:
+    case INDEX_op_remu_i64:
+        return C_O1_I2(r, rZ, rZ);
+
     default:
         g_assert_not_reached();
     }
-- 
2.33.0



* [PATCH 19/30] tcg/loongarch: Implement br/brcond ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (17 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:20   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 20/30] tcg/loongarch: Implement setcond ops WANG Xuerui
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
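Reviewer note (below the cut, not part of the commit message): the brcond
table covers the ordered conditions with two base predicates plus an operand
swap. The sketch below models only that logical identity for the signed
conditions. It assumes two hypothetical predicates, "greater than" and "less
or equal", and says nothing about the operand order of the real instruction
encodings. All names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the signed TCG conditions. */
enum m_cond { M_LT, M_GE, M_LE, M_GT, M_COND_COUNT };

/* Hypothetical base predicates available to the model. */
enum m_op { M_OP_GT, M_OP_LE };

static const struct {
    enum m_op op;
    bool swap;
} m_brcond[M_COND_COUNT] = {
    [M_LT] = { M_OP_GT, true  },   /* a <  b  <=>  b >  a */
    [M_GE] = { M_OP_LE, true  },   /* a >= b  <=>  b <= a */
    [M_LE] = { M_OP_LE, false },
    [M_GT] = { M_OP_GT, false },
};

static bool m_branch_taken(enum m_cond cond, int64_t a, int64_t b)
{
    if (m_brcond[cond].swap) {
        int64_t t = a;
        a = b;
        b = t;
    }
    return m_brcond[cond].op == M_OP_GT ? a > b : a <= b;
}

static void check_brcond_model(void)
{
    const int64_t v[] = { -2, 0, 5 };
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            assert(m_branch_taken(M_LT, v[i], v[j]) == (v[i] <  v[j]));
            assert(m_branch_taken(M_GE, v[i], v[j]) == (v[i] >= v[j]));
            assert(m_branch_taken(M_LE, v[i], v[j]) == (v[i] <= v[j]));
            assert(m_branch_taken(M_GT, v[i], v[j]) == (v[i] >  v[j]));
        }
    }
}
```

The unsigned conditions follow the same pattern with unsigned predicates.
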
 tcg/loongarch/tcg-target-con-set.h |  1 +
 tcg/loongarch/tcg-target.c.inc     | 52 ++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index 57b2846d82..bcbf0780ff 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -15,6 +15,7 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
+C_O0_I2(rZ, rZ)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index eaa155ad68..a533a5619d 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -373,6 +373,44 @@ static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
     tcg_out_opc_or(s, a0, TCG_REG_TMP0, a0);
 }
 
+/*
+ * Branch helpers
+ */
+
+static const struct {
+    LoongArchInsn op;
+    bool swap;
+} tcg_brcond_to_loongarch[] = {
+    [TCG_COND_EQ] =  { OPC_BEQ,  false },
+    [TCG_COND_NE] =  { OPC_BNE,  false },
+    [TCG_COND_LT] =  { OPC_BGT,  true  },
+    [TCG_COND_GE] =  { OPC_BLE,  true  },
+    [TCG_COND_LE] =  { OPC_BLE,  false },
+    [TCG_COND_GT] =  { OPC_BGT,  false },
+    [TCG_COND_LTU] = { OPC_BGTU, true  },
+    [TCG_COND_GEU] = { OPC_BLEU, true  },
+    [TCG_COND_LEU] = { OPC_BLEU, false },
+    [TCG_COND_GTU] = { OPC_BGTU, false }
+};
+
+static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
+                           TCGReg arg2, TCGLabel *l)
+{
+    LoongArchInsn op = tcg_brcond_to_loongarch[cond].op;
+
+    tcg_debug_assert(op != 0);
+
+    if (tcg_brcond_to_loongarch[cond].swap) {
+        TCGReg t = arg1;
+        arg1 = arg2;
+        arg2 = t;
+    }
+
+    /* all conditional branch insns belong to DJSk16-format */
+    tcg_out_reloc(s, s->code_ptr, R_LOONGARCH_SK16, l, 0);
+    tcg_out32(s, encode_djsk16_insn(op, arg1, arg2, 0));
+}
+
 /*
  * Entry-points
  */
@@ -395,6 +433,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_jirl(s, TCG_REG_ZERO, a0, 0);
         break;
 
+    case INDEX_op_br:
+        tcg_out_reloc(s, s->code_ptr, R_LOONGARCH_SD10K16, arg_label(a0), 0);
+        tcg_out_opc_b(s, 0);
+        break;
+
+    case INDEX_op_brcond_i32:
+    case INDEX_op_brcond_i64:
+        tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
+        break;
+
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
         tcg_out_ext8s(s, a0, a1);
@@ -691,6 +739,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_goto_ptr:
         return C_O0_I1(r);
 
+    case INDEX_op_brcond_i32:
+    case INDEX_op_brcond_i64:
+        return C_O0_I2(rZ, rZ);
+
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8u_i32:
-- 
2.33.0



* [PATCH 20/30] tcg/loongarch: Implement setcond ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (18 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 19/30] tcg/loongarch: Implement br/brcond ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:24   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 21/30] tcg/loongarch: Implement tcg_out_call WANG Xuerui
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
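Reviewer note (below the cut, not part of the commit message): each setcond
case is lowered to one or two instructions. The sketch below is a minimal C
model of four of those sequences on 64-bit values, checked against plain C
comparisons. The `m_*` helpers are hypothetical models of slt/sltu semantics,
not QEMU or LoongArch APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of slt/sltu: 1 if rj < rk, else 0. */
static uint64_t m_slt(uint64_t rj, uint64_t rk)
{
    return (int64_t)rj < (int64_t)rk;
}

static uint64_t m_sltu(uint64_t rj, uint64_t rk)
{
    return rj < rk;
}

/* EQ: sub.d + sltui 1; the difference is below 1 only when it is 0. */
static uint64_t m_setcond_eq(uint64_t a, uint64_t b)
{
    return m_sltu(a - b, 1);
}

/* NE: sub.d + sltu against the zero register. */
static uint64_t m_setcond_ne(uint64_t a, uint64_t b)
{
    return m_sltu(0, a - b);
}

/* GE: slt + xori 1, i.e. the negation of a < b. */
static uint64_t m_setcond_ge(uint64_t a, uint64_t b)
{
    return m_slt(a, b) ^ 1;
}

/* LEU: sltu with swapped operands + xori 1, i.e. the negation of b < a. */
static uint64_t m_setcond_leu(uint64_t a, uint64_t b)
{
    return m_sltu(b, a) ^ 1;
}

static void check_setcond_model(void)
{
    const uint64_t v[] = { 0, 1, 7, UINT64_MAX, (uint64_t)INT64_MIN };
    for (int i = 0; i < 5; i++) {
        for (int j = 0; j < 5; j++) {
            uint64_t a = v[i], b = v[j];
            assert(m_setcond_eq(a, b) == (a == b));
            assert(m_setcond_ne(a, b) == (a != b));
            assert(m_setcond_ge(a, b) == ((int64_t)a >= (int64_t)b));
            assert(m_setcond_leu(a, b) == (a <= b));
        }
    }
}
```

The remaining cases (LT, GT, LE, LTU, GTU, GEU) differ only in operand order
and in whether the final xori is present.
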
 tcg/loongarch/tcg-target.c.inc | 53 ++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index a533a5619d..fb0143474a 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -373,6 +373,52 @@ static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
     tcg_out_opc_or(s, a0, TCG_REG_TMP0, a0);
 }
 
+static void tcg_out_setcond(TCGContext *s, TCGCond cond, TCGReg ret,
+                            TCGReg arg1, TCGReg arg2)
+{
+    switch (cond) {
+    case TCG_COND_EQ:
+        tcg_out_opc_sub_d(s, ret, arg1, arg2);
+        tcg_out_opc_sltui(s, ret, ret, 1);
+        break;
+    case TCG_COND_NE:
+        tcg_out_opc_sub_d(s, ret, arg1, arg2);
+        tcg_out_opc_sltu(s, ret, TCG_REG_ZERO, ret);
+        break;
+    case TCG_COND_LT:
+        tcg_out_opc_slt(s, ret, arg1, arg2);
+        break;
+    case TCG_COND_GE:
+        tcg_out_opc_slt(s, ret, arg1, arg2);
+        tcg_out_opc_xori(s, ret, ret, 1);
+        break;
+    case TCG_COND_LE:
+        tcg_out_opc_slt(s, ret, arg2, arg1);
+        tcg_out_opc_xori(s, ret, ret, 1);
+        break;
+    case TCG_COND_GT:
+        tcg_out_opc_slt(s, ret, arg2, arg1);
+        break;
+    case TCG_COND_LTU:
+        tcg_out_opc_sltu(s, ret, arg1, arg2);
+        break;
+    case TCG_COND_GEU:
+        tcg_out_opc_sltu(s, ret, arg1, arg2);
+        tcg_out_opc_xori(s, ret, ret, 1);
+        break;
+    case TCG_COND_LEU:
+        tcg_out_opc_sltu(s, ret, arg2, arg1);
+        tcg_out_opc_xori(s, ret, ret, 1);
+        break;
+    case TCG_COND_GTU:
+        tcg_out_opc_sltu(s, ret, arg2, arg1);
+        break;
+    default:
+        g_assert_not_reached();
+        break;
+    }
+}
+
 /*
  * Branch helpers
  */
@@ -726,6 +772,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_mod_du(s, a0, a1, a2);
         break;
 
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
+        tcg_out_setcond(s, args[3], a0, a1, a2);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     default:
@@ -830,6 +881,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rem_i64:
     case INDEX_op_remu_i32:
     case INDEX_op_remu_i64:
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
         return C_O1_I2(r, rZ, rZ);
 
     default:
-- 
2.33.0



* [PATCH 21/30] tcg/loongarch: Implement tcg_out_call
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (19 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 20/30] tcg/loongarch: Implement setcond ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:31   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 22/30] tcg/loongarch: Implement simple load/store ops WANG Xuerui
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
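Reviewer note (below the cut, not part of the commit message):
tcg_out_call_int picks one of three sequences depending on how far away the
target is. The sketch below models only that range decision. `fits_signed`
and `m_classify_call` are hypothetical names for this illustration, and the
model takes the PC-relative byte offset directly instead of computing it from
a code pointer.

```c
#include <assert.h>
#include <stdint.h>

enum m_call_kind {
    M_CALL_SHORT,   /* single b/bl */
    M_CALL_PCREL,   /* pcaddu12i + jirl */
    M_CALL_FAR      /* materialise the address, then jirl */
};

/* Hypothetical helper: is v representable as a signed `bits`-bit value? */
static int fits_signed(int64_t v, unsigned bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

static enum m_call_kind m_classify_call(int64_t offset)
{
    /* LoongArch instructions are 4 bytes wide. */
    assert((offset & 3) == 0);

    if (fits_signed(offset, 28)) {
        return M_CALL_SHORT;
    }
    if (fits_signed(offset, 32)) {
        return M_CALL_PCREL;
    }
    return M_CALL_FAR;
}

static void check_call_model(void)
{
    assert(m_classify_call(0) == M_CALL_SHORT);
    assert(m_classify_call(((int64_t)1 << 27) - 4) == M_CALL_SHORT);
    assert(m_classify_call((int64_t)1 << 27) == M_CALL_PCREL);
    assert(m_classify_call(-((int64_t)1 << 27)) == M_CALL_SHORT);
    assert(m_classify_call((int64_t)1 << 31) == M_CALL_FAR);
}
```

The boundaries are asymmetric by one instruction because a signed field
holds one more negative value than positive.
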
 tcg/loongarch/tcg-target.c.inc | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index fb0143474a..01c6002fdb 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -457,6 +457,42 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
     tcg_out32(s, encode_djsk16_insn(op, arg1, arg2, 0));
 }
 
+static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool tail)
+{
+    TCGReg link = tail ? TCG_REG_ZERO : TCG_REG_RA;
+    ptrdiff_t offset = tcg_pcrel_diff(s, arg);
+    int ret;
+
+    tcg_debug_assert((offset & 3) == 0);
+    if (offset == sextreg(offset, 0, 28)) {
+        /* short jump: +/- 128MiB */
+        if (tail) {
+            tcg_out_opc_b(s, offset >> 2);
+        } else {
+            tcg_out_opc_bl(s, offset >> 2);
+        }
+    } else if (TCG_TARGET_REG_BITS == 32 || offset == (int32_t)offset) {
+        /* long jump: +/- 2GiB */
+        tcg_out_opc_pcaddu12i(s, TCG_REG_TMP0, 0);
+        tcg_out_opc_jirl(s, link, TCG_REG_TMP0, 0);
+        ret = reloc_call(s->code_ptr - 2, arg);
+        tcg_debug_assert(ret == true);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        /* far jump: 64-bit */
+        tcg_target_long imm = sextreg((tcg_target_long)arg, 0, 12);
+        tcg_target_long base = (tcg_target_long)arg - imm;
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, base);
+        tcg_out_opc_jirl(s, link, TCG_REG_TMP0, imm >> 2);
+    } else {
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
+{
+    tcg_out_call_int(s, arg, false);
+}
+
 /*
  * Entry-points
  */
@@ -779,6 +815,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
+    case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
     }
-- 
2.33.0



* [PATCH 22/30] tcg/loongarch: Implement simple load/store ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (20 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 21/30] tcg/loongarch: Implement tcg_out_call WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 16:35   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops WANG Xuerui
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
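Reviewer note (below the cut, not part of the commit message): when an offset
does not fit the 12-bit signed immediate of the load/store instructions,
tcg_out_ldst splits it into a low part that fits and a remainder that is
added to the base. The sketch below is a minimal C model of that split.
`sext12` and `m_split_offset` are hypothetical names standing in for the
sextreg-based arithmetic in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 12 bits of v. */
static int64_t sext12(int64_t v)
{
    return (int64_t)((((uint64_t)v) & 0xfff) ^ 0x800) - 0x800;
}

struct m_split12 {
    int64_t hi;   /* goes into the temporary base register */
    int64_t lo;   /* goes into the instruction's immediate field */
};

static struct m_split12 m_split_offset(int64_t offset)
{
    struct m_split12 r;
    r.lo = sext12(offset);
    r.hi = offset - r.lo;
    return r;
}

static void check_split_model(void)
{
    const int64_t v[] = { 0, 0x7ff, 0x800, 0x1800, 0x12345, -1, -4097 };
    for (int i = 0; i < 7; i++) {
        struct m_split12 r = m_split_offset(v[i]);
        assert(r.hi + r.lo == v[i]);             /* nothing is lost */
        assert(r.lo >= -2048 && r.lo <= 2047);   /* fits the immediate */
        assert(r.hi % 4096 == 0);                /* low 12 bits are clear */
    }
}
```

Because the low part is signed, an offset such as 0x1800 rounds the high
part up to 0x2000 and uses -2048 as the immediate.
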
 tcg/loongarch/tcg-target-con-set.h |   1 +
 tcg/loongarch/tcg-target.c.inc     | 131 +++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index bcbf0780ff..cdbfe9cd8d 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -15,6 +15,7 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
+C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
 C_O1_I1(r, r)
 C_O1_I2(r, r, r)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 01c6002fdb..3947a2d9fa 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -493,6 +493,73 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
     tcg_out_call_int(s, arg, false);
 }
 
+/*
+ * Load/store helpers
+ */
+
+static void tcg_out_ldst(TCGContext *s, LoongArchInsn opc, TCGReg data,
+                         TCGReg addr, intptr_t offset)
+{
+    intptr_t imm12 = sextreg(offset, 0, 12);
+
+    if (offset != imm12) {
+        intptr_t diff = offset - (uintptr_t)s->code_ptr;
+
+        if (addr == TCG_REG_ZERO && diff == (int32_t)diff) {
+            imm12 = sextreg(diff, 0, 12);
+            tcg_out_opc_pcaddu12i(s, TCG_REG_TMP2, (diff - imm12) >> 12);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP2, offset - imm12);
+            if (addr != TCG_REG_ZERO) {
+                tcg_out_opc_add_d(s, TCG_REG_TMP2, TCG_REG_TMP2, addr);
+            }
+        }
+        addr = TCG_REG_TMP2;
+    }
+
+    switch (opc) {
+    case OPC_LD_B:
+    case OPC_LD_BU:
+    case OPC_LD_H:
+    case OPC_LD_HU:
+    case OPC_LD_W:
+    case OPC_LD_WU:
+    case OPC_LD_D:
+    case OPC_ST_B:
+    case OPC_ST_H:
+    case OPC_ST_W:
+    case OPC_ST_D:
+        tcg_out32(s, encode_djsk12_insn(opc, data, addr, imm12));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
+                       TCGReg arg1, intptr_t arg2)
+{
+    bool is_32bit = (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I32);
+    tcg_out_ldst(s, is_32bit ? OPC_LD_W : OPC_LD_D, arg, arg1, arg2);
+}
+
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
+                       TCGReg arg1, intptr_t arg2)
+{
+    bool is_32bit = (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I32);
+    tcg_out_ldst(s, is_32bit ? OPC_ST_W : OPC_ST_D, arg, arg1, arg2);
+}
+
+static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+                        TCGReg base, intptr_t ofs)
+{
+    if (val == 0) {
+        tcg_out_st(s, type, TCG_REG_ZERO, base, ofs);
+        return true;
+    }
+    return false;
+}
+
 /*
  * Entry-points
  */
@@ -813,6 +880,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_setcond(s, args[3], a0, a1, a2);
         break;
 
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld8s_i64:
+        tcg_out_ldst(s, OPC_LD_B, a0, a1, a2);
+        break;
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
+        tcg_out_ldst(s, OPC_LD_BU, a0, a1, a2);
+        break;
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64:
+        tcg_out_ldst(s, OPC_LD_H, a0, a1, a2);
+        break;
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
+        tcg_out_ldst(s, OPC_LD_HU, a0, a1, a2);
+        break;
+    case INDEX_op_ld_i32:
+    case INDEX_op_ld32s_i64:
+        tcg_out_ldst(s, OPC_LD_W, a0, a1, a2);
+        break;
+    case INDEX_op_ld32u_i64:
+        tcg_out_ldst(s, OPC_LD_WU, a0, a1, a2);
+        break;
+    case INDEX_op_ld_i64:
+        tcg_out_ldst(s, OPC_LD_D, a0, a1, a2);
+        break;
+
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
+        tcg_out_ldst(s, OPC_ST_B, a0, a1, a2);
+        break;
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
+        tcg_out_ldst(s, OPC_ST_H, a0, a1, a2);
+        break;
+    case INDEX_op_st_i32:
+    case INDEX_op_st32_i64:
+        tcg_out_ldst(s, OPC_ST_W, a0, a1, a2);
+        break;
+    case INDEX_op_st_i64:
+        tcg_out_ldst(s, OPC_ST_D, a0, a1, a2);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -827,6 +937,15 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_goto_ptr:
         return C_O0_I1(r);
 
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
+    case INDEX_op_st32_i64:
+    case INDEX_op_st_i32:
+    case INDEX_op_st_i64:
+        return C_O0_I2(rZ, r);
+
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
         return C_O0_I2(rZ, rZ);
@@ -853,6 +972,18 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_bswap64_i64:
     case INDEX_op_neg_i32:
     case INDEX_op_neg_i64:
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld8s_i64:
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64:
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
+    case INDEX_op_ld32s_i64:
+    case INDEX_op_ld32u_i64:
+    case INDEX_op_ld_i32:
+    case INDEX_op_ld_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_nor_i32:
-- 
2.33.0



* [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (21 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 22/30] tcg/loongarch: Implement simple load/store ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:10   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue WANG Xuerui
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
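Reviewer note (below the cut, not part of the commit message): the fast path
in tcg_out_tlb_load derives a table offset from the guest address and then
compares a masked address against the stored tag. The sketch below models
those two computations. The page size, entry size and table size are
assumptions chosen for illustration, and every `m_*` / `M_*` name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define M_PAGE_BITS   12   /* assumed guest page size: 4 KiB */
#define M_ENTRY_BITS  5    /* assumed log2 of the TLB entry size */
#define M_PAGE_MASK   (~(((uint64_t)1 << M_PAGE_BITS) - 1))

/* Byte offset of the entry within the table (shift right, then mask). */
static uint64_t m_tlb_entry_offset(uint64_t addr, uint64_t mask)
{
    return (addr >> (M_PAGE_BITS - M_ENTRY_BITS)) & mask;
}

/* Value compared against the tag: page bits plus low alignment bits. */
static uint64_t m_tlb_compare_value(uint64_t addr, unsigned a_bits)
{
    uint64_t compare_mask = M_PAGE_MASK | (((uint64_t)1 << a_bits) - 1);
    return addr & compare_mask;
}

static void check_tlb_model(void)
{
    /* Assumed: 256 entries, mask pre-shifted by the entry size. */
    const uint64_t mask = (uint64_t)255 << M_ENTRY_BITS;
    const uint64_t tag = 0x12345000;

    /* Page number 0x12345 selects entry 0x45, at byte offset 0x45 << 5. */
    assert(m_tlb_entry_offset(0x12345678, mask) == 0x8a0);

    /* An aligned 4-byte access matches the page tag: fast path. */
    assert(m_tlb_compare_value(0x12345678, 2) == tag);

    /* A misaligned access keeps its low bits set: slow path. */
    assert(m_tlb_compare_value(0x1234567a, 2) != tag);
}
```

Keeping the alignment bits in the compare mask lets a single branch reject
both TLB misses and misaligned accesses.
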
 tcg/loongarch/tcg-target-con-set.h |   2 +
 tcg/loongarch/tcg-target.c.inc     | 344 +++++++++++++++++++++++++++++
 2 files changed, 346 insertions(+)

diff --git a/tcg/loongarch/tcg-target-con-set.h b/tcg/loongarch/tcg-target-con-set.h
index cdbfe9cd8d..b990da6b26 100644
--- a/tcg/loongarch/tcg-target-con-set.h
+++ b/tcg/loongarch/tcg-target-con-set.h
@@ -17,7 +17,9 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
+C_O0_I2(LZ, L)
 C_O1_I1(r, r)
+C_O1_I1(r, L)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 3947a2d9fa..0b6f16bde0 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -560,6 +560,329 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
     return false;
 }
 
+/*
+ * Load/store helpers for SoftMMU, and qemu_ld/st implementations
+ */
+
+#if defined(CONFIG_SOFTMMU)
+#include "../tcg-ldst.c.inc"
+
+/*
+ * helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
+ *                                     TCGMemOpIdx oi, uintptr_t ra)
+ */
+static void * const qemu_ld_helpers[8] = {
+    [MO_UB] = helper_ret_ldub_mmu,
+    [MO_SB] = helper_ret_ldsb_mmu,
+    [MO_UW] = helper_le_lduw_mmu,
+    [MO_SW] = helper_le_ldsw_mmu,
+    [MO_UL] = helper_le_ldul_mmu,
+#if TCG_TARGET_REG_BITS == 64
+    [MO_SL] = helper_le_ldsl_mmu,
+#endif
+    [MO_Q]  = helper_le_ldq_mmu,
+};
+
+/*
+ * helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
+ *                                     uintxx_t val, TCGMemOpIdx oi,
+ *                                     uintptr_t ra)
+ */
+static void * const qemu_st_helpers[4] = {
+    [MO_8]  = helper_ret_stb_mmu,
+    [MO_16] = helper_le_stw_mmu,
+    [MO_32] = helper_le_stl_mmu,
+    [MO_64] = helper_le_stq_mmu,
+};
+
+/* We don't support oversize guests */
+QEMU_BUILD_BUG_ON(TCG_TARGET_REG_BITS < TARGET_LONG_BITS);
+
+/* We expect to use a 12-bit negative offset from ENV.  */
+QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
+QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 11));
+
+static void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
+{
+    tcg_out_opc_b(s, 0);
+    bool ok = reloc_sd10k16(s->code_ptr - 1, target);
+    tcg_debug_assert(ok);
+}
+
+static void tcg_out_tlb_load(TCGContext *s, TCGReg addrl, TCGMemOpIdx oi,
+                             tcg_insn_unit **label_ptr, bool is_load)
+{
+    MemOp opc = get_memop(oi);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
+    tcg_target_long compare_mask;
+    int mem_index = get_mmuidx(oi);
+    int fast_ofs = TLB_MASK_TABLE_OFS(mem_index);
+    int mask_ofs = fast_ofs + offsetof(CPUTLBDescFast, mask);
+    int table_ofs = fast_ofs + offsetof(CPUTLBDescFast, table);
+    TCGReg mask_base = TCG_AREG0, table_base = TCG_AREG0;
+
+    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP0, mask_base, mask_ofs);
+    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, table_base, table_ofs);
+
+    tcg_out_opc_srli_d(s, TCG_REG_TMP2, addrl,
+                    TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    tcg_out_opc_and(s, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP0);
+    tcg_out_opc_add_d(s, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP1);
+
+    /* Load the tlb comparator and the addend.  */
+    tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP0, TCG_REG_TMP2,
+               is_load ? offsetof(CPUTLBEntry, addr_read)
+               : offsetof(CPUTLBEntry, addr_write));
+    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP2, TCG_REG_TMP2,
+               offsetof(CPUTLBEntry, addend));
+
+    /* We don't support unaligned accesses.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+    /* Clear the non-page, non-alignment bits from the address.  */
+    compare_mask = (tcg_target_long)TARGET_PAGE_MASK | ((1 << a_bits) - 1);
+    if (compare_mask == (compare_mask & 0xfff)) { /* andi zero-extends ui12 */
+        tcg_out_opc_andi(s, TCG_REG_TMP1, addrl, compare_mask);
+    } else {
+        tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_TMP1, compare_mask);
+        tcg_out_opc_and(s, TCG_REG_TMP1, TCG_REG_TMP1, addrl);
+    }
+
+    /* Compare masked address with the TLB entry.  */
+    label_ptr[0] = s->code_ptr;
+    tcg_out_opc_bne(s, TCG_REG_TMP0, TCG_REG_TMP1, 0);
+
+    /* TLB Hit - translate address using addend.  */
+    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+        tcg_out_ext32u(s, TCG_REG_TMP0, addrl);
+        addrl = TCG_REG_TMP0;
+    }
+    tcg_out_opc_add_d(s, TCG_REG_TMP0, TCG_REG_TMP2, addrl);
+}
+
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOpIdx oi,
+                                TCGType ext,
+                                TCGReg datalo, TCGReg addrlo,
+                                void *raddr, tcg_insn_unit **label_ptr)
+{
+    TCGLabelQemuLdst *label = new_ldst_label(s);
+
+    label->is_ld = is_ld;
+    label->oi = oi;
+    label->type = ext;
+    label->datalo_reg = datalo;
+    label->datahi_reg = 0;
+    label->addrlo_reg = addrlo;
+    label->addrhi_reg = 0;
+    label->raddr = tcg_splitwx_to_rx(raddr);
+    label->label_ptr[0] = label_ptr[0];
+}
+
+static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    TCGMemOpIdx oi = l->oi;
+    MemOp opc = get_memop(oi);
+    TCGReg a0 = tcg_target_call_iarg_regs[0];
+    TCGReg a1 = tcg_target_call_iarg_regs[1];
+    TCGReg a2 = tcg_target_call_iarg_regs[2];
+    TCGReg a3 = tcg_target_call_iarg_regs[3];
+
+    /* We don't support oversize guests */
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
+        g_assert_not_reached();
+    }
+
+    /* resolve label address */
+    if (!reloc_sk16(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
+        return false;
+    }
+
+    /* call load helper */
+    tcg_out_mov(s, TCG_TYPE_PTR, a0, TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, a1, l->addrlo_reg);
+    tcg_out_movi(s, TCG_TYPE_PTR, a2, oi);
+    tcg_out_movi(s, TCG_TYPE_PTR, a3, (tcg_target_long)l->raddr);
+
+    tcg_out_call(s, qemu_ld_helpers[opc & MO_SSIZE]);
+    tcg_out_mov(s, (opc & MO_SIZE) == MO_64, l->datalo_reg, a0);
+
+    tcg_out_goto(s, l->raddr);
+    return true;
+}
+
+static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    TCGMemOpIdx oi = l->oi;
+    MemOp opc = get_memop(oi);
+    MemOp s_bits = opc & MO_SIZE;
+    TCGReg a0 = tcg_target_call_iarg_regs[0];
+    TCGReg a1 = tcg_target_call_iarg_regs[1];
+    TCGReg a2 = tcg_target_call_iarg_regs[2];
+    TCGReg a3 = tcg_target_call_iarg_regs[3];
+    TCGReg a4 = tcg_target_call_iarg_regs[4];
+
+    /* We don't support oversize guests */
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
+        g_assert_not_reached();
+    }
+
+    /* resolve label address */
+    if (!reloc_sk16(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
+        return false;
+    }
+
+    /* call store helper */
+    tcg_out_mov(s, TCG_TYPE_PTR, a0, TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, a1, l->addrlo_reg);
+    tcg_out_mov(s, TCG_TYPE_PTR, a2, l->datalo_reg);
+    switch (s_bits) {
+    case MO_8:
+        tcg_out_ext8u(s, a2, a2);
+        break;
+    case MO_16:
+        tcg_out_ext16u(s, a2, a2);
+        break;
+    default:
+        break;
+    }
+    tcg_out_movi(s, TCG_TYPE_PTR, a3, oi);
+    tcg_out_movi(s, TCG_TYPE_PTR, a4, (tcg_target_long)l->raddr);
+
+    tcg_out_call(s, qemu_st_helpers[opc & MO_SIZE]);
+
+    tcg_out_goto(s, l->raddr);
+    return true;
+}
+#endif /* CONFIG_SOFTMMU */
+
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg base,
+                                   MemOp opc, bool is_64)
+{
+    /* Byte swapping is left to middle-end expansion.  */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
+
+    switch (opc & (MO_SSIZE)) {
+    case MO_UB:
+        tcg_out_opc_ld_bu(s, lo, base, 0);
+        break;
+    case MO_SB:
+        tcg_out_opc_ld_b(s, lo, base, 0);
+        break;
+    case MO_UW:
+        tcg_out_opc_ld_hu(s, lo, base, 0);
+        break;
+    case MO_SW:
+        tcg_out_opc_ld_h(s, lo, base, 0);
+        break;
+    case MO_UL:
+        if (TCG_TARGET_REG_BITS == 64 && is_64) {
+            tcg_out_opc_ld_wu(s, lo, base, 0);
+            break;
+        }
+        /* fallthrough */
+    case MO_SL:
+        tcg_out_opc_ld_w(s, lo, base, 0);
+        break;
+    case MO_Q:
+        tcg_out_opc_ld_d(s, lo, base, 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
+{
+    TCGReg addr_regl;
+    TCGReg data_regl;
+    TCGMemOpIdx oi;
+    MemOp opc;
+#if defined(CONFIG_SOFTMMU)
+    tcg_insn_unit *label_ptr[1];
+#endif
+    TCGReg base = TCG_REG_TMP0;
+
+    data_regl = *args++;
+    addr_regl = *args++;
+    oi = *args++;
+    opc = get_memop(oi);
+
+#if defined(CONFIG_SOFTMMU)
+    tcg_out_tlb_load(s, addr_regl, oi, label_ptr, 1);
+    tcg_out_qemu_ld_direct(s, data_regl, base, opc, is_64);
+    add_qemu_ldst_label(s, 1, oi,
+                        (is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                        data_regl, addr_regl,
+                        s->code_ptr, label_ptr);
+#else
+    if (guest_base == 0) {
+        tcg_out_opc_add_d(s, base, addr_regl, TCG_REG_ZERO);
+    } else {
+        tcg_out_opc_add_d(s, base, TCG_GUEST_BASE_REG, addr_regl);
+    }
+    tcg_out_qemu_ld_direct(s, data_regl, base, opc, is_64);
+#endif
+}
+
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo,
+                                   TCGReg base, MemOp opc)
+{
+    /* Byte swapping is left to middle-end expansion.  */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
+
+    switch (opc & (MO_SSIZE)) {
+    case MO_8:
+        tcg_out_opc_st_b(s, lo, base, 0);
+        break;
+    case MO_16:
+        tcg_out_opc_st_h(s, lo, base, 0);
+        break;
+    case MO_32:
+        tcg_out_opc_st_w(s, lo, base, 0);
+        break;
+    case MO_64:
+        tcg_out_opc_st_d(s, lo, base, 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
+{
+    TCGReg addr_regl;
+    TCGReg data_regl;
+    TCGMemOpIdx oi;
+    MemOp opc;
+#if defined(CONFIG_SOFTMMU)
+    tcg_insn_unit *label_ptr[1];
+#endif
+    TCGReg base = TCG_REG_TMP0;
+
+    data_regl = *args++;
+    addr_regl = *args++;
+    oi = *args++;
+    opc = get_memop(oi);
+
+#if defined(CONFIG_SOFTMMU)
+    tcg_out_tlb_load(s, addr_regl, oi, label_ptr, 0);
+    tcg_out_qemu_st_direct(s, data_regl, base, opc);
+    add_qemu_ldst_label(s, 0, oi,
+                        (is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                        data_regl, addr_regl,
+                        s->code_ptr, label_ptr);
+#else
+    if (guest_base == 0) {
+        tcg_out_opc_add_d(s, base, addr_regl, TCG_REG_ZERO);
+    } else {
+        tcg_out_opc_add_d(s, base, TCG_GUEST_BASE_REG, addr_regl);
+    }
+    tcg_out_qemu_st_direct(s, data_regl, base, opc);
+#endif
+}
+
 /*
  * Entry-points
  */
@@ -923,6 +1246,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ldst(s, OPC_ST_D, a0, a1, a2);
         break;
 
+    case INDEX_op_qemu_ld_i32:
+        tcg_out_qemu_ld(s, args, false);
+        break;
+    case INDEX_op_qemu_ld_i64:
+        tcg_out_qemu_ld(s, args, true);
+        break;
+    case INDEX_op_qemu_st_i32:
+        tcg_out_qemu_st(s, args, false);
+        break;
+    case INDEX_op_qemu_st_i64:
+        tcg_out_qemu_st(s, args, true);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -950,6 +1286,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_brcond_i64:
         return C_O0_I2(rZ, rZ);
 
+    case INDEX_op_qemu_st_i32:
+    case INDEX_op_qemu_st_i64:
+        return C_O0_I2(LZ, L);
+
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8u_i32:
@@ -986,6 +1326,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ld_i64:
         return C_O1_I1(r, r);
 
+    case INDEX_op_qemu_ld_i32:
+    case INDEX_op_qemu_ld_i64:
+        return C_O1_I1(r, L);
+
     case INDEX_op_nor_i32:
     case INDEX_op_andc_i32:
     case INDEX_op_orc_i32:
-- 
2.33.0



* [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (22 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:15   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb WANG Xuerui
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 66 ++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 0b6f16bde0..10df007087 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -887,6 +887,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
  * Entry-points
  */
 
+static const tcg_insn_unit *tb_ret_addr;
+
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
                        const int const_args[TCG_MAX_OP_ARGS])
@@ -1401,3 +1403,67 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         g_assert_not_reached();
     }
 }
+
+static const int tcg_target_callee_save_regs[] = {
+    TCG_REG_S0,     /* used for the global env (TCG_AREG0) */
+    TCG_REG_S1,
+    TCG_REG_S2,
+    TCG_REG_S3,
+    TCG_REG_S4,
+    TCG_REG_S5,
+    TCG_REG_S6,
+    TCG_REG_S7,
+    TCG_REG_S8,
+    TCG_REG_S9,
+    TCG_REG_RA,     /* should be last for ABI compliance */
+};
+
+/* Stack frame parameters.  */
+#define REG_SIZE   (TCG_TARGET_REG_BITS / 8)
+#define SAVE_SIZE  ((int)ARRAY_SIZE(tcg_target_callee_save_regs) * REG_SIZE)
+#define TEMP_SIZE  (CPU_TEMP_BUF_NLONGS * (int)sizeof(long))
+#define FRAME_SIZE ((TCG_STATIC_CALL_ARGS_SIZE + TEMP_SIZE + SAVE_SIZE \
+                     + TCG_TARGET_STACK_ALIGN - 1) \
+                    & -TCG_TARGET_STACK_ALIGN)
+#define SAVE_OFS   (TCG_STATIC_CALL_ARGS_SIZE + TEMP_SIZE)
+
+/* We're expecting to be able to use an immediate for frame allocation.  */
+QEMU_BUILD_BUG_ON(FRAME_SIZE > 0x7ff);
+
+/* Generate global QEMU prologue and epilogue code */
+static void tcg_target_qemu_prologue(TCGContext *s)
+{
+    int i;
+
+    tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE, TEMP_SIZE);
+
+    /* TB prologue */
+    tcg_out_opc_addi_d(s, TCG_REG_SP, TCG_REG_SP, -FRAME_SIZE);
+    for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
+        tcg_out_st(s, TCG_TYPE_REG, tcg_target_callee_save_regs[i],
+                   TCG_REG_SP, SAVE_OFS + i * REG_SIZE);
+    }
+
+#if !defined(CONFIG_SOFTMMU)
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+    tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+#endif
+
+    /* Call generated code */
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
+    tcg_out_opc_jirl(s, TCG_REG_ZERO, tcg_target_call_iarg_regs[1], 0);
+
+    /* Return path for goto_ptr. Set return value to 0 */
+    tcg_code_gen_epilogue = tcg_splitwx_to_rx(s->code_ptr);
+    tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_A0, TCG_REG_ZERO);
+
+    /* TB epilogue */
+    tb_ret_addr = tcg_splitwx_to_rx(s->code_ptr);
+    for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
+        tcg_out_ld(s, TCG_TYPE_REG, tcg_target_callee_save_regs[i],
+                   TCG_REG_SP, SAVE_OFS + i * REG_SIZE);
+    }
+
+    tcg_out_opc_addi_d(s, TCG_REG_SP, TCG_REG_SP, FRAME_SIZE);
+    tcg_out_opc_jirl(s, TCG_REG_ZERO, TCG_REG_RA, 0);
+}
-- 
2.33.0
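
The frame layout defined by the macros in this patch can be checked with a small standalone calculation. This is only a sketch: the three constants marked "assumed" are illustrative stand-ins for TCG_STATIC_CALL_ARGS_SIZE, CPU_TEMP_BUF_NLONGS and TCG_TARGET_STACK_ALIGN, and frame_size()/save_offset() are hypothetical helper names, not part of the patch.

```c
#include <assert.h>

enum {
    REG_SIZE              = 8,   /* 64-bit registers (LP64) */
    NUM_SAVED             = 11,  /* s0..s9 plus ra, as listed in the patch */
    STATIC_CALL_ARGS_SIZE = 128, /* assumed value, for illustration only */
    TEMP_BUF_NLONGS       = 128, /* assumed value, for illustration only */
    STACK_ALIGN           = 16,  /* assumed value, for illustration only */
};

/* Mirrors the FRAME_SIZE macro: sum the three areas, round up to alignment. */
static int frame_size(void)
{
    int temp_size = TEMP_BUF_NLONGS * 8;      /* sizeof(long) == 8 on LP64 */
    int save_size = NUM_SAVED * REG_SIZE;
    int raw = STATIC_CALL_ARGS_SIZE + temp_size + save_size;

    assert((STACK_ALIGN & (STACK_ALIGN - 1)) == 0); /* power of two */
    return (raw + STACK_ALIGN - 1) & -STACK_ALIGN;
}

/* Mirrors SAVE_OFS + i * REG_SIZE: sp-relative slot of the i-th saved reg. */
static int save_offset(int i)
{
    assert(i >= 0 && i < NUM_SAVED);
    return STATIC_CALL_ARGS_SIZE + TEMP_BUF_NLONGS * 8 + i * REG_SIZE;
}
```

Under these assumed constants the frame stays well inside the 0x7ff limit enforced by the QEMU_BUILD_BUG_ON, so a single addi.d can allocate it.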



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (23 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:16   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 26/30] tcg/loongarch: Implement tcg_target_init WANG Xuerui
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 10df007087..585bf8dba0 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -899,6 +899,25 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     int c2 = const_args[2];
 
     switch (opc) {
+    case INDEX_op_exit_tb:
+        /* Reuse the zeroing that exists for goto_ptr.  */
+        if (a0 == 0) {
+            tcg_out_call_int(s, tcg_code_gen_epilogue, true);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A0, a0);
+            tcg_out_call_int(s, tb_ret_addr, true);
+        }
+        break;
+
+    case INDEX_op_goto_tb:
+        assert(s->tb_jmp_insn_offset == 0);
+        /* indirect jump method */
+        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP0, TCG_REG_ZERO,
+                   (uintptr_t)(s->tb_jmp_target_addr + a0));
+        tcg_out_opc_jirl(s, TCG_REG_ZERO, TCG_REG_TMP0, 0);
+        set_jmp_reset_offset(s, a0);
+        break;
+
     case INDEX_op_mb:
         tcg_out_mb(s, a0);
         break;
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 26/30] tcg/loongarch: Implement tcg_target_init
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (24 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:19   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 27/30] tcg/loongarch: Register the JIT WANG Xuerui
                   ` (3 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 585bf8dba0..107682e1fa 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -1486,3 +1486,32 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_opc_addi_d(s, TCG_REG_SP, TCG_REG_SP, FRAME_SIZE);
     tcg_out_opc_jirl(s, TCG_REG_ZERO, TCG_REG_RA, 0);
 }
+
+static void tcg_target_init(TCGContext *s)
+{
+    tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+    }
+
+    tcg_target_call_clobber_regs = -1u;
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S1);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S2);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S3);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S4);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S5);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S6);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S7);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
+
+    s->reserved_regs = 0;
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_ZERO);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP2);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_RESERVED);
+}
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 27/30] tcg/loongarch: Register the JIT
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (25 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 26/30] tcg/loongarch: Implement tcg_target_init WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:21   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts WANG Xuerui
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 tcg/loongarch/tcg-target.c.inc | 44 ++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tcg/loongarch/tcg-target.c.inc b/tcg/loongarch/tcg-target.c.inc
index 107682e1fa..59adc92d26 100644
--- a/tcg/loongarch/tcg-target.c.inc
+++ b/tcg/loongarch/tcg-target.c.inc
@@ -1515,3 +1515,47 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_RESERVED);
 }
+
+typedef struct {
+    DebugFrameHeader h;
+    uint8_t fde_def_cfa[4];
+    uint8_t fde_reg_ofs[ARRAY_SIZE(tcg_target_callee_save_regs) * 2];
+} DebugFrame;
+
+#define ELF_HOST_MACHINE EM_LOONGARCH
+
+static const DebugFrame debug_frame = {
+    .h.cie.len = sizeof(DebugFrameCIE) - 4, /* length after .len member */
+    .h.cie.id = -1,
+    .h.cie.version = 1,
+    .h.cie.code_align = 1,
+    .h.cie.data_align = -(TCG_TARGET_REG_BITS / 8) & 0x7f, /* sleb128 */
+    .h.cie.return_column = TCG_REG_RA,
+
+    /* Total FDE size does not include the "len" member.  */
+    .h.fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, h.fde.cie_offset),
+
+    .fde_def_cfa = {
+        12, TCG_REG_SP,                 /* DW_CFA_def_cfa sp, ...  */
+        (FRAME_SIZE & 0x7f) | 0x80,     /* ... uleb128 FRAME_SIZE */
+        (FRAME_SIZE >> 7)
+    },
+    .fde_reg_ofs = {
+        0x80 + 23, 11,                  /* DW_CFA_offset, s0,  -88 */
+        0x80 + 24, 10,                  /* DW_CFA_offset, s1,  -80 */
+        0x80 + 25, 9,                   /* DW_CFA_offset, s2,  -72 */
+        0x80 + 26, 8,                   /* DW_CFA_offset, s3,  -64 */
+        0x80 + 27, 7,                   /* DW_CFA_offset, s4,  -56 */
+        0x80 + 28, 6,                   /* DW_CFA_offset, s5,  -48 */
+        0x80 + 29, 5,                   /* DW_CFA_offset, s6,  -40 */
+        0x80 + 30, 4,                   /* DW_CFA_offset, s7,  -32 */
+        0x80 + 31, 3,                   /* DW_CFA_offset, s8,  -24 */
+        0x80 + 22, 2,                   /* DW_CFA_offset, s9,  -16 */
+        0x80 + 1 , 1,                   /* DW_CFA_offset, ra,  -8 */
+    }
+};
+
+void tcg_register_jit(const void *buf, size_t buf_size)
+{
+    tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
+}
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (26 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 27/30] tcg/loongarch: Register the JIT WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:23   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch WANG Xuerui
  2021-09-20  8:04 ` [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts WANG Xuerui
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 configure   | 4 +++-
 meson.build | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 1043ccce4f..f1bc85e71b 100755
--- a/configure
+++ b/configure
@@ -659,6 +659,8 @@ elif check_define __arm__ ; then
   cpu="arm"
 elif check_define __aarch64__ ; then
   cpu="aarch64"
+elif check_define __loongarch64 ; then
+  cpu="loongarch64"
 else
   cpu=$(uname -m)
 fi
@@ -667,7 +669,7 @@ ARCH=
 # Normalise host CPU name and set ARCH.
 # Note that this case should only have supported host CPUs, not guests.
 case "$cpu" in
-  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64)
+  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64|loongarch64)
   ;;
   ppc64le)
     ARCH="ppc64"
diff --git a/meson.build b/meson.build
index 2711cbb789..fb3befead5 100644
--- a/meson.build
+++ b/meson.build
@@ -57,7 +57,7 @@ python = import('python').find_installation()
 
 supported_oses = ['windows', 'freebsd', 'netbsd', 'openbsd', 'darwin', 'sunos', 'linux']
 supported_cpus = ['ppc', 'ppc64', 's390x', 'riscv32', 'riscv64', 'x86', 'x86_64',
-  'arm', 'aarch64', 'mips', 'mips64', 'sparc', 'sparc64']
+  'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc', 'sparc64']
 
 cpu = host_machine.cpu_family()
 targetos = host_machine.system()
@@ -269,6 +269,8 @@ if not get_option('tcg').disabled()
     tcg_arch = 's390'
   elif config_host['ARCH'] in ['x86_64', 'x32']
     tcg_arch = 'i386'
+  elif config_host['ARCH'] == 'loongarch64'
+    tcg_arch = 'loongarch'
   elif config_host['ARCH'] == 'ppc64'
     tcg_arch = 'ppc'
   elif config_host['ARCH'] in ['riscv32', 'riscv64']
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (27 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:26   ` Richard Henderson
  2021-09-20  8:04 ` [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts WANG Xuerui
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Currently nothing special is needed for LoongArch hosts to work, so only
leave a placeholder there.

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 linux-user/host/loongarch64/hostdep.h | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 linux-user/host/loongarch64/hostdep.h

diff --git a/linux-user/host/loongarch64/hostdep.h b/linux-user/host/loongarch64/hostdep.h
new file mode 100644
index 0000000000..4e55695155
--- /dev/null
+++ b/linux-user/host/loongarch64/hostdep.h
@@ -0,0 +1,11 @@
+/*
+ * hostdep.h : things which are dependent on the host architecture
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef LOONGARCH64_HOSTDEP_H
+#define LOONGARCH64_HOSTDEP_H
+
+#endif
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts
  2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
                   ` (28 preceding siblings ...)
  2021-09-20  8:04 ` [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch WANG Xuerui
@ 2021-09-20  8:04 ` WANG Xuerui
  2021-09-20 17:31   ` Richard Henderson
  29 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20  8:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: WANG Xuerui

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
 accel/tcg/user-exec.c | 83 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 8fed542622..0f85062e61 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -878,6 +878,89 @@ int cpu_signal_handler(int host_signum, void *pinfo,
     return handle_cpu_signal(pc, info, is_write, &uc->uc_sigmask);
 }
 
+#elif defined(__loongarch__)
+
+/*
+ * This logic is bitness-agnostic, so the generic __loongarch__ guard is used
+ * instead of explicit ones like __loongarch64.
+ */
+
+int cpu_signal_handler(int host_signum, void *pinfo,
+                       void *puc)
+{
+    siginfo_t *info = pinfo;
+    ucontext_t *uc = puc;
+    greg_t pc = uc->uc_mcontext.__pc;
+    uint32_t insn = *(uint32_t *)pc;
+    int is_write = 0;
+
+    /* Detect store by reading the instruction at the program counter.  */
+    switch ((insn >> 26) & 0b111111) {
+    case 0b001000: /* {ll,sc}.[wd] */
+        switch ((insn >> 24) & 0b11) {
+        case 0b01: /* sc.w */
+        case 0b11: /* sc.d */
+            is_write = 1;
+            break;
+        }
+        break;
+    case 0b001001: /* {ld,st}ox4.[wd] ({ld,st}ptr.[wd]) */
+        switch ((insn >> 24) & 0b11) {
+        case 0b01: /* stox4.w (stptr.w) */
+        case 0b11: /* stox4.d (stptr.d) */
+            is_write = 1;
+            break;
+        }
+        break;
+    case 0b001010: /* {ld,st}.* family */
+        switch ((insn >> 22) & 0b1111) {
+        case 0b0100: /* st.b */
+        case 0b0101: /* st.h */
+        case 0b0110: /* st.w */
+        case 0b0111: /* st.d */
+        case 0b1101: /* fst.s */
+        case 0b1111: /* fst.d */
+            is_write = 1;
+            break;
+        }
+        break;
+    case 0b001110: { /* indexed, atomic, bounds-checking memory operations */
+        uint32_t sel = (insn >> 15) & 0b11111111111;
+        switch (sel) {
+        case 0b00000100000: /* stx.b */
+        case 0b00000101000: /* stx.h */
+        case 0b00000110000: /* stx.w */
+        case 0b00000111000: /* stx.d */
+        case 0b00001110000: /* fstx.s */
+        case 0b00001111000: /* fstx.d */
+        case 0b00011101100: /* fstgt.s */
+        case 0b00011101101: /* fstgt.d */
+        case 0b00011101110: /* fstle.s */
+        case 0b00011101111: /* fstle.d */
+        case 0b00011111000: /* stgt.b */
+        case 0b00011111001: /* stgt.h */
+        case 0b00011111010: /* stgt.w */
+        case 0b00011111011: /* stgt.d */
+        case 0b00011111100: /* stle.b */
+        case 0b00011111101: /* stle.h */
+        case 0b00011111110: /* stle.w */
+        case 0b00011111111: /* stle.d */
+            is_write = 1;
+            break;
+        default:
+            /* test for am* instruction range */
+            if (0b00011000000 <= sel && sel <= 0b00011100011) {
+                is_write = 1;
+            }
+            break;
+        }
+        break;
+    }
+    }
+
+    return handle_cpu_signal(pc, info, is_write, &uc->uc_sigmask);
+}
+
 #else
 
 #error host CPU specific signal handler needed
-- 
2.33.0
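
The store detection in this patch works by matching opcode fields of the faulting instruction word. A reduced sketch of the idea, covering only the {ld,st}.* family, is shown here; is_simple_store() is a hypothetical name, and hexadecimal constants are used in place of the patch's binary literals purely for portability.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `insn` is one of st.b/st.h/st.w/st.d/fst.s/fst.d.
 * Bits 31..26 select the {ld,st}.* family (0b001010); bits 25..22 select
 * the individual operation inside it, as in the patch above.
 */
static bool is_simple_store(uint32_t insn)
{
    if (((insn >> 26) & 0x3f) != 0x0a) {
        return false;
    }
    switch ((insn >> 22) & 0xf) {
    case 0x4: /* st.b */
    case 0x5: /* st.h */
    case 0x6: /* st.w */
    case 0x7: /* st.d */
    case 0xd: /* fst.s */
    case 0xf: /* fst.d */
        return true;
    default:
        return false;
    }
}
```

Register and immediate fields sit below bit 22, so they do not affect the classification.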



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file
  2021-09-20  8:04 ` [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file WANG Xuerui
@ 2021-09-20 14:23   ` Richard Henderson
  2021-09-20 16:20     ` WANG Xuerui
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:23 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target.h | 183 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 183 insertions(+)
>   create mode 100644 tcg/loongarch/tcg-target.h
> 
> diff --git a/tcg/loongarch/tcg-target.h b/tcg/loongarch/tcg-target.h
> new file mode 100644
> index 0000000000..b5e70e01b5
> --- /dev/null
> +++ b/tcg/loongarch/tcg-target.h
> @@ -0,0 +1,183 @@
> +/*
> + * Tiny Code Generator for QEMU
> + *
> + * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
> + *
> + * Based on tcg/riscv/tcg-target.h
> + *
> + * Copyright (c) 2018 SiFive, Inc

You may have copied too much from the riscv port?  :-)

> +/*
> + * Loongson removed the (incomplete) 32-bit support from kernel and toolchain
> + * for the initial upstreaming of this architecture, so don't bother and just
> + * support the LP64 ABI for now.
> + */
> +#if defined(__loongarch64)
> +# define TCG_TARGET_REG_BITS 64
> +#else
> +# error unsupported LoongArch bitness

s/bitness/register size/


> +#define TCG_TARGET_TLB_DISPLACEMENT_BITS 20

Hmm.  I was about to say this is more copying from riscv, and should be X, but now I see 
that this is no longer used.  You can omit it now; I'll remove the other instances myself.

> +/* optional instructions */
> +#define TCG_TARGET_HAS_movcond_i32      0
> +#define TCG_TARGET_HAS_div_i32          1
> +#define TCG_TARGET_HAS_rem_i32          1
> +#define TCG_TARGET_HAS_div2_i32         0
> +#define TCG_TARGET_HAS_rot_i32          1
> +#define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      1
> +#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_extract2_i32     0
> +#define TCG_TARGET_HAS_add2_i32         0
> +#define TCG_TARGET_HAS_sub2_i32         0
> +#define TCG_TARGET_HAS_mulu2_i32        0
> +#define TCG_TARGET_HAS_muls2_i32        0
> +#define TCG_TARGET_HAS_muluh_i32        1
> +#define TCG_TARGET_HAS_mulsh_i32        1
> +#define TCG_TARGET_HAS_ext8s_i32        1
> +#define TCG_TARGET_HAS_ext16s_i32       1
> +#define TCG_TARGET_HAS_ext8u_i32        1
> +#define TCG_TARGET_HAS_ext16u_i32       1
> +#define TCG_TARGET_HAS_bswap16_i32      0
> +#define TCG_TARGET_HAS_bswap32_i32      1
> +#define TCG_TARGET_HAS_not_i32          1
> +#define TCG_TARGET_HAS_neg_i32          1
> +#define TCG_TARGET_HAS_andc_i32         1
> +#define TCG_TARGET_HAS_orc_i32          1
> +#define TCG_TARGET_HAS_eqv_i32          0
> +#define TCG_TARGET_HAS_nand_i32         0
> +#define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          1
> +#define TCG_TARGET_HAS_ctz_i32          1
> +#define TCG_TARGET_HAS_ctpop_i32        0
> +#define TCG_TARGET_HAS_direct_jump      0
> +#define TCG_TARGET_HAS_brcond2          0
> +#define TCG_TARGET_HAS_setcond2         0
> +#define TCG_TARGET_HAS_qemu_st8_i32     0
> +
> +#if TCG_TARGET_REG_BITS == 64

You don't need this conditional, since you've asserted it at the top (and unlike riscv, 
have no plans to add 32-bit support at some future point).


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/30] tcg/loongarch: Define the operand constraints
  2021-09-20  8:04 ` [PATCH 06/30] tcg/loongarch: Define the operand constraints WANG Xuerui
@ 2021-09-20 14:28   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:28 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
> +{
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        return sextract32(val, pos, len);
> +    } else {
> +        return sextract64(val, pos, len);
> +    }
> +}

You don't need this conditional.  Just use sextract64 directly.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations
  2021-09-20  8:04 ` [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations WANG Xuerui
@ 2021-09-20 14:36   ` Richard Henderson
  2021-09-20 17:15     ` WANG Xuerui
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:36 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static bool reloc_call(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
> +{
> +    const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
> +    intptr_t offset = (intptr_t)target - (intptr_t)src_rx;
> +    int32_t lo = sextreg(offset, 0, 12);
> +    int32_t hi = offset - lo;
> +
> +    tcg_debug_assert((offset & 2) == 0);
> +    if (offset == hi + lo) {
> +        hi >>= 12;
> +        src_rw[0] |= (hi << 5) & 0x1ffffe0; /* pcaddu12i's Sj20 imm */
> +        lo >>= 2;
> +        src_rw[1] |= (lo << 10) & 0x3fffc00; /* jirl's Sk16 imm */
> +        return true;
> +    }
> +
> +    return false;
> +}

This doesn't seem to belong as a "reloc".
Certainly it doesn't seem like something that can simply be allowed to fail.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi
  2021-09-20  8:04 ` [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi WANG Xuerui
@ 2021-09-20 14:47   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:47 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
> +                         tcg_target_long val)
> +{
> +    tcg_target_long low, upper, higher, top;
> +
> +    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
> +        val = (int32_t)val;
> +    }
> +
> +    /* Single-instruction cases.  */
> +    low = sextreg(val, 0, 12);
> +    if (low == val) {
> +        /* val fits in simm12: addi.w rd, zero, val */
> +        tcg_out_opc_addi_w(s, rd, TCG_REG_ZERO, val);
> +        return;
> +    }
> +    if (0x800 <= val && val <= 0xfff) {
> +        /* val fits in uimm12: ori rd, zero, val */
> +        tcg_out_opc_ori(s, rd, TCG_REG_ZERO, val);
> +        return;
> +    }
> +
> +    /* Chop upper bits into 3 immediate-field-sized segments respectively.  */
> +    upper = (val >> 12) & 0xfffff;
> +    higher = (val >> 32) & 0xfffff;
> +    top = val >> 52;
> +
> +    tcg_out_opc_lu12i_w(s, rd, upper);
> +    if (low != 0) {
> +        tcg_out_opc_ori(s, rd, rd, low);
> +    }
> +
> +    if (sextreg(val, 0, 32) == val) {
> +        /*
> +         * Fits in 32-bits, upper bits are already properly sign-extended by
> +         * lu12i.w.
> +         */
> +        return;
> +    }
> +    tcg_out_opc_cu32i_d(s, rd, higher);
> +
> +    if (sextreg(val, 0, 52) == val) {
> +        /*
> +         * Fits in 52-bits, upper bits are already properly sign-extended by
> +         * cu32i.d.
> +         */
> +        return;
> +    }
> +    tcg_out_opc_cu52i_d(s, rd, rd, top);

Looks ok.

You'll want to check for small to medium pc-relative addresses.  Almost every TB will load 
the address of TB+C (0 <= C <= 3) at the end, and the TB structure immediately precedes 
the code.  Because of the odd values, you'll sometimes need two instructions. But that 
will still be less than the 3-4 for a 52/64-bit address constant.
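
To make the instruction counts concrete, here is a small standalone model of the quoted sequence that only counts how many instructions it would emit for a given constant. It is a sketch under the assumption that the quoted logic is followed exactly; movi_insn_count() and sext_low() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of val (1 <= bits <= 63). */
static int64_t sext_low(int64_t val, int bits)
{
    assert(bits >= 1 && bits <= 63);

    int64_t sign = (int64_t)1 << (bits - 1);
    int64_t mask = sign + (sign - 1);       /* `bits` one-bits */

    return ((val & mask) ^ sign) - sign;
}

/* Count the instructions the quoted tcg_out_movi logic would emit. */
static int movi_insn_count(int64_t val)
{
    int64_t low = sext_low(val, 12);
    int n;

    if (low == val) {
        return 1;                           /* addi.w */
    }
    if (0x800 <= val && val <= 0xfff) {
        return 1;                           /* ori */
    }
    n = 1;                                  /* lu12i.w */
    if (low != 0) {
        n++;                                /* ori */
    }
    if (sext_low(val, 32) == val) {
        return n;
    }
    n++;                                    /* cu32i.d */
    if (sext_low(val, 52) == val) {
        return n;
    }
    return n + 1;                           /* cu52i.d */
}
```

A typical host code address needs more than 32 bits, so it lands in the 3 to 4 instruction range, which is the cost a pc-relative form would avoid.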


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/30] tcg/loongarch: Implement goto_ptr
  2021-09-20  8:04 ` [PATCH 10/30] tcg/loongarch: Implement goto_ptr WANG Xuerui
@ 2021-09-20 14:49   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:49 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

> + * Define LoongArch target-specific constraint sets.
> + *
> + * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
> + *
> + * Based on tcg/riscv/tcg-target-con-set.h
> + *
> + * Copyright (c) 2021 Linaro

Too much cut and paste.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops
  2021-09-20  8:04 ` [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops WANG Xuerui
@ 2021-09-20 14:50   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:50 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |  1 +
>   tcg/loongarch/tcg-target.c.inc     | 82 ++++++++++++++++++++++++++++++
>   2 files changed, 83 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer
  2021-09-20  8:04 ` [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer WANG Xuerui
@ 2021-09-20 14:50   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:50 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> I wrote the initial code, so I should maintain it of course.
> 
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   MAINTAINERS | 5 +++++
>   1 file changed, 5 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops
  2021-09-20  8:04 ` [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops WANG Xuerui
@ 2021-09-20 14:54   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:54 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    case INDEX_op_andc_i32:
> +    case INDEX_op_andc_i64:
> +        tcg_out_opc_andn(s, a0, a1, a2);
> +        break;

You may want to add the constant case here, implemented with andi, with the constant 
inverted, similarly to the negation of the N constraint.  We do not (but probably should) 
canonicalize andc/orc/eqv constants to and/or/xor during optimization...

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops
  2021-09-20  8:04 ` [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops WANG Xuerui
@ 2021-09-20 14:55   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 14:55 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |  1 +
>   tcg/loongarch/tcg-target.c.inc     | 21 +++++++++++++++++++++
>   2 files changed, 22 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64
  2021-09-20  8:04 ` [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64 WANG Xuerui
@ 2021-09-20 15:11   ` Richard Henderson
  2021-09-20 18:20     ` Richard Henderson
  2021-09-21  6:37     ` WANG Xuerui
  0 siblings, 2 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 15:11 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    case INDEX_op_bswap32_i32:
> +        tcg_out_opc_revb_2h(s, a0, a1);
> +        tcg_out_opc_rotri_w(s, a0, a0, 16);
> +        break;
> +    case INDEX_op_bswap64_i64:
> +        tcg_out_opc_revb_d(s, a0, a1);
> +        break;

You're missing INDEX_op_bswap32_i64, which in addition has a third argument consisting of 
TCG_BSWAP_* bits.

I would have expected revb_2w to be the preferred implementation of bswap32.  I would 
expect something like


     case INDEX_op_bswap32_i32:
         /* All 32-bit values are computed sign-extended in the register. */
         a2 = TCG_BSWAP_OS;
         /* fall through */
     case INDEX_op_bswap32_i64:
         tcg_out_opc_revb_2w(s, a0, a1);
         if (a2 & TCG_BSWAP_OS) {
             tcg_out_ext32s(s, a0, a0);
         } else if (a2 & TCG_BSWAP_OZ) {
             tcg_out_ext32u(s, a0, a0);
         }
         break;
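
The intended register-level effect of that sequence can be modelled in plain C. This is only a sketch: the BSWAP_OZ/BSWAP_OS values below are local stand-ins chosen for the example (assumptions, not taken from the QEMU headers), and revb_2w() and bswap32_model() are hypothetical names.

```c
#include <stdint.h>

/* Local stand-ins for the output-extension flags; values are assumptions. */
#define BSWAP_OZ 2   /* zero-extend the 32-bit result */
#define BSWAP_OS 4   /* sign-extend the 32-bit result */

/* Reverse the four bytes of one 32-bit word. */
static uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0xff00u) | ((w << 8) & 0xff0000u)
           | (w << 24);
}

/* Model of revb.2w: byte-reverse each 32-bit half of a 64-bit register. */
static uint64_t revb_2w(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)(x >> 32)) << 32)
           | swap32((uint32_t)x);
}

/* Model of the suggested bswap32 lowering with optional extension. */
static uint64_t bswap32_model(uint64_t src, int flags)
{
    uint64_t r = revb_2w(src);

    if (flags & BSWAP_OS) {
        uint64_t lo = (uint32_t)r;
        r = (lo ^ 0x80000000u) - 0x80000000u;   /* sign-extend bit 31 */
    } else if (flags & BSWAP_OZ) {
        r = (uint32_t)r;                        /* clear the high half */
    }
    return r;
}
```

With neither flag set, the high half of the result is whatever revb.2w leaves there, which matches the suggested code emitting no extension in that case.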


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers
  2021-09-20  8:04 ` [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers WANG Xuerui
@ 2021-09-20 15:55   ` Richard Henderson
  2021-09-20 16:24     ` WANG Xuerui
  2021-09-21  9:58   ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 15:55 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-insn-defs.c.inc | 1080 +++++++++++++++++++++++++++++
>   1 file changed, 1080 insertions(+)
>   create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc
> 
> diff --git a/tcg/loongarch/tcg-insn-defs.c.inc b/tcg/loongarch/tcg-insn-defs.c.inc
> new file mode 100644
> index 0000000000..413f7ffc12
> --- /dev/null
> +++ b/tcg/loongarch/tcg-insn-defs.c.inc
> @@ -0,0 +1,1080 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * LoongArch instruction formats, opcodes, and encoders for TCG use.
> + *
> + * Code generated by genqemutcgdefs from
> + * https://github.com/loongson-community/loongarch-opcodes,
> + * from commit bb5234081663faaefb6b921a7848b18e19519890.
> + * DO NOT EDIT.
> + */
> +

Acked-by: Richard Henderson <richard.henderson@linaro.org>


> +static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)
> +    __attribute__((unused));
> +
> +static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)

Just an FYI: you can add the attribute directly to the function definition like so

static int32_t __attribute__((unused))
encode_d_slot(LoongArchInsn opc, uint32_t d)
{
    ...
}


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets
  2021-09-20  8:04 ` [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets WANG Xuerui
@ 2021-09-20 15:57   ` Richard Henderson
  2021-09-20 16:27     ` WANG Xuerui
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 15:57 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    /* Argument registers */
> +    TCG_REG_A0,
> +    TCG_REG_A1,
> +    TCG_REG_A2,
> +    TCG_REG_A3,
> +    TCG_REG_A4,
> +    TCG_REG_A5,
> +    TCG_REG_A6,
> +    TCG_REG_A7,
> +};

Generally I'd place the argument registers in reverse usage order.  It means that we'll 
try to use A7 before A0, which may work in our favor if the called function has fewer than 
8 arguments.

But otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops
  2021-09-20  8:04 ` [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops WANG Xuerui
@ 2021-09-20 16:10   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:10 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
> +                           TCGReg a0, TCGReg a1, TCGReg a2)
> +{
> +    /* all clz/ctz insns belong to DJ-format */
> +    tcg_out32(s, encode_dj_insn(opc, TCG_REG_TMP0, a1));
> +    /* a0 = a1 ? REG_TMP0 : a2 */
> +    tcg_out_opc_maskeqz(s, TCG_REG_TMP0, TCG_REG_TMP0, a1);
> +    tcg_out_opc_masknez(s, a0, a2, a1);
> +    tcg_out_opc_or(s, a0, TCG_REG_TMP0, a0);
> +}

From Song Gao's translation, I believe that ctz(0) == 32 or 64, depending on the 
operation width.  This is in fact the most common result, so it's worth specializing.  See 
tcg/i386/tcg-target.c.inc, tcg_out_clz, have_lzcnt.
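
A rough standalone C model of the specialization, for illustration. It assumes (as stated above, unverified here) that the native 32-bit count instruction yields 32 for a zero input; the `model_*` names are hypothetical and the instruction counts simply mirror the four-instruction sequence in the quoted patch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the native 32-bit clz, ASSUMING a zero input yields 32. */
static uint32_t model_clz_w(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* TCG semantics for clz: the result is a2 when the input is zero. */
static uint32_t model_tcg_clz32(uint32_t a1, uint32_t a2)
{
    return a1 ? model_clz_w(a1) : a2;
}

/* Host instructions needed: 1 if the fallback is the constant 32 (the
 * native result already matches), otherwise the 4-insn select sequence. */
static int model_clz32_insn_count(bool a2_is_const, uint32_t a2)
{
    return (a2_is_const && a2 == 32) ? 1 : 4;
}
```

When the fallback value equals the operation width, `model_tcg_clz32` and `model_clz_w` agree for every input, so the select can be dropped.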

But what's here looks ok.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops
  2021-09-20  8:04 ` [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops WANG Xuerui
@ 2021-09-20 16:13   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:13 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    case INDEX_op_rotl_i32:
> +        /* transform into equivalent rotr_i32 */
> +        if (c2) {
> +            a2 = 32 - a2;
> +        } else {
> +            tcg_out_opc_sub_w(s, a2, TCG_REG_ZERO, a2);
> +            tcg_out_opc_addi_w(s, a2, a2, 32);

You can't modify a2 here; need to use TCG_REG_TMP0.
You don't need the addi, because the negation is sufficient, mod 32.

Likewise for INDEX_op_rotl_i64.
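
To illustrate why the negation alone is enough, here is a small standalone C model (not QEMU code; the `model_*` names are hypothetical). It assumes the 32-bit rotate instruction only looks at the low 5 bits of the count, and it keeps the negated count in a separate temporary, as TCG_REG_TMP0 would in the backend.

```c
#include <stdint.h>

/* Rotate right; only the low 5 bits of the count are used. */
static uint32_t model_rotr32(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Reference rotate left, for comparison. */
static uint32_t model_rotl32(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rotl expressed as rotr by the negated count; the input count n is left
 * untouched and the negation lives in a temporary. */
static uint32_t model_rotl32_via_rotr(uint32_t x, uint32_t n)
{
    uint32_t tmp = 0u - n;          /* plays the role of TCG_REG_TMP0 */
    return model_rotr32(x, tmp);    /* (-n) mod 32 == (32 - n) mod 32 */
}
```

The 64-bit case works the same way with a mask of 63.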


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops
  2021-09-20  8:04 ` [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops WANG Xuerui
@ 2021-09-20 16:16   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:16 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |  2 ++
>   tcg/loongarch/tcg-target.c.inc     | 47 ++++++++++++++++++++++++++++++
>   2 files changed, 49 insertions(+)

You shouldn't have needed to implement neg, since tcg should have figured out that it 
could use subtract from zero.  But anyway,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops
  2021-09-20  8:04 ` [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops WANG Xuerui
@ 2021-09-20 16:16   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:16 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |  1 +
>   tcg/loongarch/tcg-target.c.inc     | 65 ++++++++++++++++++++++++++++++
>   2 files changed, 66 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 19/30] tcg/loongarch: Implement br/brcond ops
  2021-09-20  8:04 ` [PATCH 19/30] tcg/loongarch: Implement br/brcond ops WANG Xuerui
@ 2021-09-20 16:20   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:20 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |  1 +
>   tcg/loongarch/tcg-target.c.inc     | 52 ++++++++++++++++++++++++++++++
>   2 files changed, 53 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file
  2021-09-20 14:23   ` Richard Henderson
@ 2021-09-20 16:20     ` WANG Xuerui
  2021-09-20 16:25       ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20 16:20 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/20/21 22:23, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>> ---
>>   tcg/loongarch/tcg-target.h | 183 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 183 insertions(+)
>>   create mode 100644 tcg/loongarch/tcg-target.h
>>
>> diff --git a/tcg/loongarch/tcg-target.h b/tcg/loongarch/tcg-target.h
>> new file mode 100644
>> index 0000000000..b5e70e01b5
>> --- /dev/null
>> +++ b/tcg/loongarch/tcg-target.h
>> @@ -0,0 +1,183 @@
>> +/*
>> + * Tiny Code Generator for QEMU
>> + *
>> + * Copyright (c) 2021 WANG Xuerui <git@xen0n.name>
>> + *
>> + * Based on tcg/riscv/tcg-target.h
>> + *
>> + * Copyright (c) 2018 SiFive, Inc
>
> You may have copied too much from the riscv port?  :-)

First of all, thanks for the *extremely* quick review!

As for the copying, I admit that I thought the riscv port generally was 
doing things the recent and preferred way, so most of the logic is only 
lightly touched. However, LoongArch is substantially similar to riscv 
too, so many of the traits expressed here would be the same regardless.

But in such a case of outstanding similarity, should I just drop my 
"copyright" line? I'm actually okay with dropping if that's the best 
thing to do.

>
>> +/*
>> + * Loongson removed the (incomplete) 32-bit support from kernel and toolchain
>> + * for the initial upstreaming of this architecture, so don't bother and just
>> + * support the LP64 ABI for now.
>> + */
>> +#if defined(__loongarch64)
>> +# define TCG_TARGET_REG_BITS 64
>> +#else
>> +# error unsupported LoongArch bitness
>
> s/bitness/register size/
Sure; will fix in v2.
>
>
>> +#define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
>
> Hmm.  I was about to say this is more copying from riscv, and should 
> be X, but now I see that this is no longer used.  You can omit it now; 
> I'll remove the other instances myself.
Thanks for the explanation; I've only been digging into QEMU internals for 2 weeks and 
that's something I haven't read about yet! I'll try to remove irrelevant 
parts like this in v2.
>
>> +/* optional instructions */
>> +#define TCG_TARGET_HAS_movcond_i32      0
>> +#define TCG_TARGET_HAS_div_i32          1
>> +#define TCG_TARGET_HAS_rem_i32          1
>> +#define TCG_TARGET_HAS_div2_i32         0
>> +#define TCG_TARGET_HAS_rot_i32          1
>> +#define TCG_TARGET_HAS_deposit_i32      1
>> +#define TCG_TARGET_HAS_extract_i32      1
>> +#define TCG_TARGET_HAS_sextract_i32     0
>> +#define TCG_TARGET_HAS_extract2_i32     0
>> +#define TCG_TARGET_HAS_add2_i32         0
>> +#define TCG_TARGET_HAS_sub2_i32         0
>> +#define TCG_TARGET_HAS_mulu2_i32        0
>> +#define TCG_TARGET_HAS_muls2_i32        0
>> +#define TCG_TARGET_HAS_muluh_i32        1
>> +#define TCG_TARGET_HAS_mulsh_i32        1
>> +#define TCG_TARGET_HAS_ext8s_i32        1
>> +#define TCG_TARGET_HAS_ext16s_i32       1
>> +#define TCG_TARGET_HAS_ext8u_i32        1
>> +#define TCG_TARGET_HAS_ext16u_i32       1
>> +#define TCG_TARGET_HAS_bswap16_i32      0
>> +#define TCG_TARGET_HAS_bswap32_i32      1
>> +#define TCG_TARGET_HAS_not_i32          1
>> +#define TCG_TARGET_HAS_neg_i32          1
>> +#define TCG_TARGET_HAS_andc_i32         1
>> +#define TCG_TARGET_HAS_orc_i32          1
>> +#define TCG_TARGET_HAS_eqv_i32          0
>> +#define TCG_TARGET_HAS_nand_i32         0
>> +#define TCG_TARGET_HAS_nor_i32          1
>> +#define TCG_TARGET_HAS_clz_i32          1
>> +#define TCG_TARGET_HAS_ctz_i32          1
>> +#define TCG_TARGET_HAS_ctpop_i32        0
>> +#define TCG_TARGET_HAS_direct_jump      0
>> +#define TCG_TARGET_HAS_brcond2          0
>> +#define TCG_TARGET_HAS_setcond2         0
>> +#define TCG_TARGET_HAS_qemu_st8_i32     0
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>
> You don't need this conditional, since you've asserted it at the top 
> (and unlike riscv, have no plans to add 32-bit support at some 
> future point).
OK, will remove all such conditionals in v2 too.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 20/30] tcg/loongarch: Implement setcond ops
  2021-09-20  8:04 ` [PATCH 20/30] tcg/loongarch: Implement setcond ops WANG Xuerui
@ 2021-09-20 16:24   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:24 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    switch (cond) {
> +    case TCG_COND_EQ:
> +        tcg_out_opc_sub_d(s, ret, arg1, arg2);
> +        tcg_out_opc_sltui(s, ret, ret, 1);
> +        break;
> +    case TCG_COND_NE:
> +        tcg_out_opc_sub_d(s, ret, arg1, arg2);
> +        tcg_out_opc_sltu(s, ret, TCG_REG_ZERO, ret);
> +        break;

You accept zero as input; you'll want to skip the subtract in that case.
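
As a minimal sketch of the point (standalone C, hypothetical `model_*` names, not the backend code): `sltui ret, x, 1` computes `x < 1` unsigned, i.e. `x == 0`, and `sltu ret, zero, x` computes `0 < x`, i.e. `x != 0`. When the second operand is already zero, the comparison can be applied to arg1 directly and the subtract adds nothing.

```c
#include <stdint.h>

/* General case: subtract, then compare the difference against zero. */
static uint64_t model_setcond_eq(uint64_t arg1, uint64_t arg2)
{
    uint64_t diff = arg1 - arg2;    /* sub.d */
    return diff < 1;                /* sltui ret, diff, 1 */
}

static uint64_t model_setcond_ne(uint64_t arg1, uint64_t arg2)
{
    uint64_t diff = arg1 - arg2;    /* sub.d */
    return 0 < diff;                /* sltu ret, zero, diff */
}

/* arg2 known to be zero: one instruction, no subtract. */
static uint64_t model_setcond_eq_zero(uint64_t arg1)
{
    return arg1 < 1;                /* sltui ret, arg1, 1 */
}

static uint64_t model_setcond_ne_zero(uint64_t arg1)
{
    return 0 < arg1;                /* sltu ret, zero, arg1 */
}
```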

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers
  2021-09-20 15:55   ` Richard Henderson
@ 2021-09-20 16:24     ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20 16:24 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/20/21 23:55, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>> ---
>>   tcg/loongarch/tcg-insn-defs.c.inc | 1080 +++++++++++++++++++++++++++++
>>   1 file changed, 1080 insertions(+)
>>   create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc
>>
>> diff --git a/tcg/loongarch/tcg-insn-defs.c.inc b/tcg/loongarch/tcg-insn-defs.c.inc
>> new file mode 100644
>> index 0000000000..413f7ffc12
>> --- /dev/null
>> +++ b/tcg/loongarch/tcg-insn-defs.c.inc
>> @@ -0,0 +1,1080 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * LoongArch instruction formats, opcodes, and encoders for TCG use.
>> + *
>> + * Code generated by genqemutcgdefs from
>> + * https://github.com/loongson-community/loongarch-opcodes,
>> + * from commit bb5234081663faaefb6b921a7848b18e19519890.
>> + * DO NOT EDIT.
>> + */
>> +
>
> Acked-by: Richard Henderson <richard.henderson@linaro.org>
>
>
>> +static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)
>> +    __attribute__((unused));
>> +
>> +static int32_t encode_d_slot(LoongArchInsn opc, uint32_t d)
>
> Just an FYI: you can add the attribute directly to the function 
> definition like so
>
> static int32_t __attribute__((unused))
> encode_d_slot(LoongArchInsn opc, uint32_t d)
> {
>    ...
> }
>
Fine; I always struggle to remember the correct placement of attributes! 
I'll try to adjust that in the loongarch-opcodes repo. If I can arrive at 
something that doesn't need prototypes and builds cleanly, I'll replace 
the code here and shave off maybe 100 lines (because currently we use 88 
insns).
>
> r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file
  2021-09-20 16:20     ` WANG Xuerui
@ 2021-09-20 16:25       ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:25 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 9:20 AM, WANG Xuerui wrote:
>>> + * Copyright (c) 2018 SiFive, Inc
>>
>> You may have copied too much from the riscv port?  :-)
> 
> First of all, thanks for the *extremely* quick review!
> 
> As for the copying, I admit that I thought the riscv port generally was doing things the 
> recent and preferred way, so most of the logic is only lightly touched. However, 
> LoongArch is substantially similar to riscv too, so many of the traits expressed here 
> would be the same regardless.
> 
> But in such a case of outstanding similarity, should I just drop my "copyright" line? I'm 
> actually okay with dropping if that's the best thing to do.

Yes, your own copyright is the correct thing in this case.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets
  2021-09-20 15:57   ` Richard Henderson
@ 2021-09-20 16:27     ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20 16:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/20/21 23:57, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> +    /* Argument registers */
>> +    TCG_REG_A0,
>> +    TCG_REG_A1,
>> +    TCG_REG_A2,
>> +    TCG_REG_A3,
>> +    TCG_REG_A4,
>> +    TCG_REG_A5,
>> +    TCG_REG_A6,
>> +    TCG_REG_A7,
>> +};
>
> Generally I'd place the argument registers in reverse usage order.  It 
> means that we'll try to use A7 before A0, which may work in our favor 
> if the called function has fewer than 8 arguments.
Hmm, is that a trick already employed by other TCG ports? I'll check the 
code and adjust if that proves beneficial indeed.
>
> But otherwise,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 21/30] tcg/loongarch: Implement tcg_out_call
  2021-09-20  8:04 ` [PATCH 21/30] tcg/loongarch: Implement tcg_out_call WANG Xuerui
@ 2021-09-20 16:31   ` Richard Henderson
  2021-09-20 16:35     ` Richard Henderson
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:31 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +    } else if (TCG_TARGET_REG_BITS == 32 || offset == (int32_t)offset) {
> +        /* long jump: +/- 2GiB */
> +        tcg_out_opc_pcaddu12i(s, TCG_REG_TMP0, 0);
> +        tcg_out_opc_jirl(s, link, TCG_REG_TMP0, 0);
> +        ret = reloc_call(s->code_ptr - 2, arg);
> +        tcg_debug_assert(ret == true);

Just inline reloc_call here, so that you can provide the correct offsets to the pcadd and 
jirl instructions directly.  The assert will vanish, because you've already done the range 
check with "offset == (int32_t)offset".


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 21/30] tcg/loongarch: Implement tcg_out_call
  2021-09-20 16:31   ` Richard Henderson
@ 2021-09-20 16:35     ` Richard Henderson
  2021-09-21  6:42       ` WANG Xuerui
  0 siblings, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:35 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 9:31 AM, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> +    } else if (TCG_TARGET_REG_BITS == 32 || offset == (int32_t)offset) {
>> +        /* long jump: +/- 2GiB */
>> +        tcg_out_opc_pcaddu12i(s, TCG_REG_TMP0, 0);
>> +        tcg_out_opc_jirl(s, link, TCG_REG_TMP0, 0);
>> +        ret = reloc_call(s->code_ptr - 2, arg);
>> +        tcg_debug_assert(ret == true);
> 
> Just inline reloc_call here, so that you can provide the correct offsets to the pcadd and 
> jirl instructions directly.  The assert will vanish, because you've already done the range 
> check with "offset == (int32_t)offset".

Actually, don't you want offset == sextract64(offset, 0, 34), and use pcaddu18i? 
Depending on the memory map of qemu, those extra bits could make the difference in 
directly reaching the main executable.
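
For illustration, the two range checks can be compared in a standalone C sketch. `model_sextract64` is a local stand-in written for this example (not the QEMU helper itself), and it relies on arithmetic right shift of negative values, which gcc and clang provide.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sign-extending bit-field extract; valid for 1 <= length <= 64 - start. */
static int64_t model_sextract64(uint64_t value, int start, int length)
{
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/* The check used in the quoted patch: offset fits in 32 signed bits. */
static bool model_fits_int32(int64_t offset)
{
    return offset == (int32_t)offset;
}

/* The suggested check: offset fits in 34 signed bits. */
static bool model_fits_signed34(int64_t offset)
{
    return offset == model_sextract64((uint64_t)offset, 0, 34);
}
```

An offset of exactly 2 GiB, for example, fails the 32-bit check but passes the 34-bit one.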


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 22/30] tcg/loongarch: Implement simple load/store ops
  2021-09-20  8:04 ` [PATCH 22/30] tcg/loongarch: Implement simple load/store ops WANG Xuerui
@ 2021-09-20 16:35   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 16:35 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target-con-set.h |   1 +
>   tcg/loongarch/tcg-target.c.inc     | 131 +++++++++++++++++++++++++++++
>   2 files changed, 132 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops
  2021-09-20  8:04 ` [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops WANG Xuerui
@ 2021-09-20 17:10   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:10 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
> +{
> +    tcg_out_opc_b(s, 0);
> +    bool ok = reloc_sd10k16(s->code_ptr - 1, target);
> +    tcg_debug_assert(ok);
> +}

Hmm.  This is an existing bug in tcg/riscv/.  We should have no asserts on relocations 
being in range.  We should always be able to tell our caller that the relocation failed, 
and we'll try again with a smaller TB.

In this case, return the result of reloc_sd10k16 and ...

> +
> +    tcg_out_goto(s, l->raddr);
> +    return true;
> +}

... return the result of tcg_out_goto.
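
A minimal standalone sketch of the "propagate, don't assert" shape, using hypothetical `model_*` functions rather than the real TCG types. The range check assumes a 26-bit signed offset counted in 4-byte instruction units for the unconditional branch, does no actual encoding, and treats a misaligned offset as a failure purely for the sake of the model.

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for reloc_sd10k16: report whether the branch can reach. */
static bool model_reloc_b26(intptr_t src, intptr_t target)
{
    intptr_t offset = target - src;
    intptr_t words;

    if (offset & 3) {
        return false;               /* model choice: reject misalignment */
    }
    words = offset / 4;
    return words >= -(1 << 25) && words < (1 << 25);
}

/* tcg_out_goto analogue: hand the relocation result to the caller. */
static bool model_out_goto(intptr_t src, intptr_t target)
{
    return model_reloc_b26(src, target);
}

/* Slow-path tail analogue: return the goto result instead of 'true', so
 * the caller can retry with a smaller translation block on failure. */
static bool model_slow_path_tail(intptr_t src, intptr_t raddr)
{
    return model_out_goto(src, raddr);
}
```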


> +static void tcg_out_tlb_load(TCGContext *s, TCGReg addrl, TCGMemOpIdx oi,
> +                             tcg_insn_unit **label_ptr, bool is_load)
> +{
> +    MemOp opc = get_memop(oi);
> +    unsigned s_bits = opc & MO_SIZE;
> +    unsigned a_bits = get_alignment_bits(opc);
> +    tcg_target_long compare_mask;
> +    int mem_index = get_mmuidx(oi);
> +    int fast_ofs = TLB_MASK_TABLE_OFS(mem_index);
> +    int mask_ofs = fast_ofs + offsetof(CPUTLBDescFast, mask);
> +    int table_ofs = fast_ofs + offsetof(CPUTLBDescFast, table);
> +    TCGReg mask_base = TCG_AREG0, table_base = TCG_AREG0;
> +
> +    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP0, mask_base, mask_ofs);
> +    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, table_base, table_ofs);

I think we can eliminate the mask_base and table_base variables now.  This dates from the 
TCG_TARGET_TLB_DISPLACEMENT_BITS thing, where we would need to compute an intermediate 
offset, and adjust these base registers.

> +    /* Clear the non-page, non-alignment bits from the address.  */
> +    compare_mask = (tcg_target_long)TARGET_PAGE_MASK | ((1 << a_bits) - 1);
> +    if (compare_mask == sextreg(compare_mask, 0, 12)) {
> +        tcg_out_opc_andi(s, TCG_REG_TMP1, addrl, compare_mask);

LoongArch uses an unsigned mask for andi, not signed.  The immediate case will never match 
for LoongArch.
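
The difference can be seen in a short standalone C model (hypothetical `model_*` names; the page and alignment sizes in the usage note are arbitrary examples). The compare mask always has its high bits set, so it is negative as a signed value and can never fit a zero-extended 12-bit immediate, even in cases where it would fit a sign-extended one.

```c
#include <stdint.h>
#include <stdbool.h>

/* compare_mask as built in the quoted code: page mask | alignment bits. */
static int64_t model_compare_mask(int page_bits, int a_bits)
{
    int64_t page_mask = -((int64_t)1 << page_bits);
    return page_mask | (((int64_t)1 << a_bits) - 1);
}

/* Fits a sign-extended 12-bit immediate. */
static bool model_fits_simm12(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Fits a zero-extended 12-bit immediate, as andi is described to take. */
static bool model_fits_uimm12(int64_t v)
{
    return v >= 0 && v <= 0xfff;
}
```

For instance, with 10 page bits and 2 alignment bits the mask is -1021: it passes the signed check but not the unsigned one.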

> +static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
> +{
> +    TCGMemOpIdx oi = l->oi;
> +    MemOp opc = get_memop(oi);
> +    TCGReg a0 = tcg_target_call_iarg_regs[0];
> +    TCGReg a1 = tcg_target_call_iarg_regs[1];
> +    TCGReg a2 = tcg_target_call_iarg_regs[2];
> +    TCGReg a3 = tcg_target_call_iarg_regs[3];

Drop these, since you've already named TCG_REG_A0 etc.

> +
> +    /* We don't support oversize guests */
> +    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
> +        g_assert_not_reached();
> +    }

This is redundant with TCG_TARGET_REG_BITS == 64.

> +    tcg_out_call(s, qemu_ld_helpers[opc & MO_SSIZE]);
> +    tcg_out_mov(s, (opc & MO_SIZE) == MO_64, l->datalo_reg, a0);

Because you have single-insn sign-extend instructions, it's better to always call the 
unsigned load function, and then sign-extend here.  See the aarch64 version.
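
Sketched in standalone C (the `MODEL_MO_*` size codes and function name are hypothetical, chosen only for this example): the helper returns a zero-extended value, and the caller applies a single sign-extension chosen by access size.

```c
#include <stdint.h>

/* Hypothetical log2 size codes for this model only. */
enum { MODEL_MO_8 = 0, MODEL_MO_16 = 1, MODEL_MO_32 = 2, MODEL_MO_64 = 3 };

/* Sign-extend the zero-extended helper result according to access size;
 * each narrow case corresponds to one host sign-extend instruction. */
static int64_t model_sign_extend_result(uint64_t zext_value, int size_log2)
{
    switch (size_log2) {
    case MODEL_MO_8:
        return (int8_t)zext_value;
    case MODEL_MO_16:
        return (int16_t)zext_value;
    case MODEL_MO_32:
        return (int32_t)zext_value;
    default:
        return (int64_t)zext_value;
    }
}
```

This way only the unsigned helper variants are ever called from the slow path, and the signedness is handled inline.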

> +    tcg_out_mov(s, TCG_TYPE_PTR, a2, l->datalo_reg);
> +    switch (s_bits) {
> +    case MO_8:
> +        tcg_out_ext8u(s, a2, a2);
> +        break;
> +    case MO_16:
> +        tcg_out_ext16u(s, a2, a2);
> +        break;
> +    default:
> +        break;
> +    }

Do you have a pointer to the LoongArch ABI?  Do 32-bit values need to be sign- or 
zero-extended in the call arguments?

Anyway, merge the move and extend.

> +    tcg_out_movi(s, TCG_TYPE_PTR, a4, (tcg_target_long)l->raddr);

Oh, just FYI, this is another case where movi wants to handle pc-relative addresses.

> +    if (guest_base == 0) {
> +        tcg_out_opc_add_d(s, base, addr_regl, TCG_REG_ZERO);

Don't add zero.  RISCV bug that's been fixed recently.

> +static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo,
> +                                   TCGReg base, MemOp opc)
> +{
> +    /* Byte swapping is left to middle-end expansion.  */
> +    tcg_debug_assert((opc & MO_BSWAP) == 0);
> +
> +    switch (opc & (MO_SSIZE)) {

MO_SIZE here.

> +static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)

Shouldn't need is_64 argument for store.  It's only relevant to load so that TCG_TYPE_I32 
values are always sign-extended in the host register.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue
  2021-09-20  8:04 ` [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue WANG Xuerui
@ 2021-09-20 17:15   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:15 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +#if !defined(CONFIG_SOFTMMU)
> +    tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
> +    tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
> +#endif

Should test for guest_base == 0.  This is common for a 64-bit guest.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations
  2021-09-20 14:36   ` Richard Henderson
@ 2021-09-20 17:15     ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-20 17:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/20/21 22:36, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> +static bool reloc_call(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
>> +{
>> +    const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
>> +    intptr_t offset = (intptr_t)target - (intptr_t)src_rx;
>> +    int32_t lo = sextreg(offset, 0, 12);
>> +    int32_t hi = offset - lo;
>> +
>> +    tcg_debug_assert((offset & 2) == 0);
>> +    if (offset == hi + lo) {
>> +        hi >>= 12;
>> +        src_rw[0] |= (hi << 5) & 0x1ffffe0; /* pcaddu12i's Sj20 imm */
>> +        lo >>= 2;
>> +        src_rw[1] |= (lo << 10) & 0x3fffc00; /* jirl's Sk16 imm */
>> +        return true;
>> +    }
>> +
>> +    return false;
>> +}
>
> This doesn't seem to belong as a "reloc".
> Certainly it doesn't seem like something that can simply be allowed to 
> fail.
>
Yes, you're right on this; on a closer look at the riscv port, they 
actually reused this logic once (the riscv port puts large constants in a 
pool, hence the need for some PC-relative hackery). For LoongArch the only 
usage of this code is for generating calls, so I'll just merge this into the 
commit doing tcg_out_call, and inline it if the resulting code is still 
readable.

And it's 1 a.m. here in China, so I'll be processing the other review 
comments after getting some sleep. (Today is in the middle of the 3-day 
Mid-Autumn Festival holiday here, and that's why I can work on this 
hobby project like it's $DAY_JOB!) I'll hopefully send the v2 in the 
afternoon local time (tomorrow in your timezone).

>
> r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb
  2021-09-20  8:04 ` [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb WANG Xuerui
@ 2021-09-20 17:16   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:16 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target.c.inc | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 26/30] tcg/loongarch: Implement tcg_target_init
  2021-09-20  8:04 ` [PATCH 26/30] tcg/loongarch: Implement tcg_target_init WANG Xuerui
@ 2021-09-20 17:19   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:19 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> +static void tcg_target_init(TCGContext *s)
> +{
> +    tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
> +    }
> +
> +    tcg_target_call_clobber_regs = -1u;

In all 3 places, use your ALL_GENERAL_REGS constant.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 27/30] tcg/loongarch: Register the JIT
  2021-09-20  8:04 ` [PATCH 27/30] tcg/loongarch: Register the JIT WANG Xuerui
@ 2021-09-20 17:21   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:21 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-target.c.inc | 44 ++++++++++++++++++++++++++++++++++
>   1 file changed, 44 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-20  8:04 ` [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts WANG Xuerui
@ 2021-09-20 17:23   ` Richard Henderson
  2021-09-21  6:02     ` WANG Xuerui
  2021-09-21 14:42     ` Peter Maydell
  0 siblings, 2 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:23 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   configure   | 4 +++-
>   meson.build | 4 +++-
>   2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/configure b/configure
> index 1043ccce4f..f1bc85e71b 100755
> --- a/configure
> +++ b/configure
> @@ -659,6 +659,8 @@ elif check_define __arm__ ; then
>     cpu="arm"
>   elif check_define __aarch64__ ; then
>     cpu="aarch64"
> +elif check_define __loongarch64 ; then
> +  cpu="loongarch64"
>   else
>     cpu=$(uname -m)
>   fi
> @@ -667,7 +669,7 @@ ARCH=
>   # Normalise host CPU name and set ARCH.
>   # Note that this case should only have supported host CPUs, not guests.
>   case "$cpu" in
> -  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64)
> +  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64|loongarch64)
>     ;;
>     ppc64le)
>       ARCH="ppc64"
> diff --git a/meson.build b/meson.build
> index 2711cbb789..fb3befead5 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -57,7 +57,7 @@ python = import('python').find_installation()
>   
>   supported_oses = ['windows', 'freebsd', 'netbsd', 'openbsd', 'darwin', 'sunos', 'linux']
>   supported_cpus = ['ppc', 'ppc64', 's390x', 'riscv32', 'riscv64', 'x86', 'x86_64',
> -  'arm', 'aarch64', 'mips', 'mips64', 'sparc', 'sparc64']
> +  'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc', 'sparc64']
>   
>   cpu = host_machine.cpu_family()
>   targetos = host_machine.system()
> @@ -269,6 +269,8 @@ if not get_option('tcg').disabled()
>       tcg_arch = 's390'
>     elif config_host['ARCH'] in ['x86_64', 'x32']
>       tcg_arch = 'i386'
> +  elif config_host['ARCH'] == 'loongarch64'
> +    tcg_arch = 'loongarch'

Be consistent with loongarch or loongarch64 everywhere.

If there's no loongarch32, and never will be, then there's probably no point in keeping 
the '64' suffix.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch
  2021-09-20  8:04 ` [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch WANG Xuerui
@ 2021-09-20 17:26   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:26 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Currently nothing special is needed for LoongArch hosts to work, so only
> leave a placeholder there.
> 
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   linux-user/host/loongarch64/hostdep.h | 11 +++++++++++
>   1 file changed, 11 insertions(+)
>   create mode 100644 linux-user/host/loongarch64/hostdep.h
> 
> diff --git a/linux-user/host/loongarch64/hostdep.h b/linux-user/host/loongarch64/hostdep.h
> new file mode 100644
> index 0000000000..4e55695155
> --- /dev/null
> +++ b/linux-user/host/loongarch64/hostdep.h
> @@ -0,0 +1,11 @@
> +/*
> + * hostdep.h : things which are dependent on the host architecture
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef LOONGARCH64_HOSTDEP_H
> +#define LOONGARCH64_HOSTDEP_H
> +
> +#endif
> 

This is not true.  You'll need to write safe-syscall.inc.S for loongarch.  Currently we 
have a fallback, in linux-user/safe-syscall.h, but this is clearly marked as a bug.  I 
plan to drop this fallback very shortly, as all supported hosts have now filled this in.


r~



* Re: [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts
  2021-09-20  8:04 ` [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts WANG Xuerui
@ 2021-09-20 17:31   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 17:31 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 1:04 AM, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   accel/tcg/user-exec.c | 83 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 83 insertions(+)
> 
> diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
> index 8fed542622..0f85062e61 100644
> --- a/accel/tcg/user-exec.c
> +++ b/accel/tcg/user-exec.c
> @@ -878,6 +878,89 @@ int cpu_signal_handler(int host_signum, void *pinfo,
>       return handle_cpu_signal(pc, info, is_write, &uc->uc_sigmask);
>   }
>   
> +#elif defined(__loongarch__)
> +
> +/*
> + * This logic is bitness-agnostic, so the generic __loongarch__ guard is used
> + * instead of explicit ones like __loongarch64.
> + */
> +
> +int cpu_signal_handler(int host_signum, void *pinfo,
> +                       void *puc)

Looks ok, as far as it goes.  Similar comments about loongarch64 vs loongarch32 vs loongarch.

Also have a look at

https://lore.kernel.org/qemu-devel/20210918184527.408540-1-richard.henderson@linaro.org/


r~



* Re: [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64
  2021-09-20 15:11   ` Richard Henderson
@ 2021-09-20 18:20     ` Richard Henderson
  2021-09-21  6:37     ` WANG Xuerui
  1 sibling, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-20 18:20 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 8:11 AM, Richard Henderson wrote:
>          } else if (a2 & TCG_BSWAP_OZ) {
>              tcg_out_ext32u(s, a0, a0);
>          }

Actually,

   if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ)

If the input is zero-extended, the output of revb_2w will also be zero-extended already.
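
As a hedged illustration of that condition (not code from the patch): the sketch below models revb.2w on a 64-bit register as a byte reversal of each 32-bit word, and checks when an explicit zero-extension is still needed. The flag values and the names model_revb_2w and needs_ext32u are stand-ins invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the TCG_BSWAP_* flag bits (values assumed). */
enum { BSWAP_IZ = 1, BSWAP_OZ = 2, BSWAP_OS = 4 };

/* Plain 32-bit byte swap. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Assumed model of revb.2w: each 32-bit word is byte-reversed in place. */
static uint64_t model_revb_2w(uint64_t r)
{
    return ((uint64_t)bswap32((uint32_t)(r >> 32)) << 32) |
           bswap32((uint32_t)r);
}

/*
 * Zero-extension is only needed when the output must be zero-extended
 * and the input is not already known to be zero-extended.
 */
static bool needs_ext32u(unsigned flags)
{
    return (flags & (BSWAP_IZ | BSWAP_OZ)) == BSWAP_OZ;
}
```

With a zero-extended input the high word is 0, and the byte swap of 0 is still 0, which is why the IZ case needs no extra instruction.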


r~



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-20 17:23   ` Richard Henderson
@ 2021-09-21  6:02     ` WANG Xuerui
  2021-09-21  6:59       ` Philippe Mathieu-Daudé
  2021-09-21 13:30       ` Richard Henderson
  2021-09-21 14:42     ` Peter Maydell
  1 sibling, 2 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21  6:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/21/21 01:23, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>> ---
>>   configure   | 4 +++-
>>   meson.build | 4 +++-
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 1043ccce4f..f1bc85e71b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -659,6 +659,8 @@ elif check_define __arm__ ; then
>>     cpu="arm"
>>   elif check_define __aarch64__ ; then
>>     cpu="aarch64"
>> +elif check_define __loongarch64 ; then
>> +  cpu="loongarch64"
>>   else
>>     cpu=$(uname -m)
>>   fi
>> @@ -667,7 +669,7 @@ ARCH=
>>   # Normalise host CPU name and set ARCH.
>>   # Note that this case should only have supported host CPUs, not 
>> guests.
>>   case "$cpu" in
>> -  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64)
>> +  ppc|ppc64|s390x|sparc64|x32|riscv32|riscv64|loongarch64)
>>     ;;
>>     ppc64le)
>>       ARCH="ppc64"
>> diff --git a/meson.build b/meson.build
>> index 2711cbb789..fb3befead5 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -57,7 +57,7 @@ python = import('python').find_installation()
>>     supported_oses = ['windows', 'freebsd', 'netbsd', 'openbsd', 
>> 'darwin', 'sunos', 'linux']
>>   supported_cpus = ['ppc', 'ppc64', 's390x', 'riscv32', 'riscv64', 
>> 'x86', 'x86_64',
>> -  'arm', 'aarch64', 'mips', 'mips64', 'sparc', 'sparc64']
>> +  'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc', 
>> 'sparc64']
>>     cpu = host_machine.cpu_family()
>>   targetos = host_machine.system()
>> @@ -269,6 +269,8 @@ if not get_option('tcg').disabled()
>>       tcg_arch = 's390'
>>     elif config_host['ARCH'] in ['x86_64', 'x32']
>>       tcg_arch = 'i386'
>> +  elif config_host['ARCH'] == 'loongarch64'
>> +    tcg_arch = 'loongarch'
>
> Be consistent with loongarch or loongarch64 everywhere.
>
> If there's no loongarch32, and never will be, then there's probably no 
> point in keeping the '64' suffix.

The loongarch32 tuple will most certainly come into existence some time 
in the future, but probably bare-metal-only and without a Linux port 
AFAIK. That's a point the Loongson people and I didn't communicate well, 
apologies for that. (While we're at it, the reserved "loongarchx32" 
which is x32/n32-like, most likely will never exist.)

So should I drop the explicit probing for __loongarch64, instead just 
probe for __loongarch__ and later #error out the non-__loongarch64 cases 
individually?
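
To make the question concrete, here is a rough sketch of what "probe for __loongarch__, then #error out the non-__loongarch64 cases" could look like in a C source file. Only the two predefined macros come from the discussion above; the helper host_arch_name is hypothetical and exists just to give the guard something to protect.

```c
#include <stddef.h>

/* Reject LoongArch hosts that are not the 64-bit variant. */
#if defined(__loongarch__) && !defined(__loongarch64)
#error "only 64-bit LoongArch hosts are supported here"
#endif

/* Hypothetical helper: reports which branch the compiler took. */
static const char *host_arch_name(void)
{
#if defined(__loongarch64)
    return "loongarch64";
#else
    return "not LoongArch";
#endif
}
```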

>
>
> r~



* Re: [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64
  2021-09-20 15:11   ` Richard Henderson
  2021-09-20 18:20     ` Richard Henderson
@ 2021-09-21  6:37     ` WANG Xuerui
  1 sibling, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21  6:37 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/20/21 23:11, Richard Henderson wrote:
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>> +    case INDEX_op_bswap32_i32:
>> +        tcg_out_opc_revb_2h(s, a0, a1);
>> +        tcg_out_opc_rotri_w(s, a0, a0, 16);
>> +        break;
>> +    case INDEX_op_bswap64_i64:
>> +        tcg_out_opc_revb_d(s, a0, a1);
>> +        break;
>
> You're missing INDEX_op_bswap32_i64, which in addition has a third 
> argument consisting of TCG_BSWAP_* bits.
>
> I would have expected revb_2w to be the preferred implementation of 
> bswap32.  I would expect something like
>
>
>     case INDEX_op_bswap32_i32:
>         /* All 32-bit values are computed sign-extended in the 
> register. */
>         a2 = TCG_BSWAP_OS;
>         /* fall through */
>     case INDEX_op_bswap32_i64:
>         tcg_out_opc_revb_2w(s, a0, a1);
>         if (a2 & TCG_BSWAP_OS) {
>             tcg_out_ext32s(s, a0, a0);
>         } else if (a2 & TCG_BSWAP_OZ) {
>             tcg_out_ext32u(s, a0, a0);
>         }
>         break;
>
You're right, as long as we support only 64-bit hosts. While I was 
writing that code I hadn't decided whether to remove support for 32-bit 
hosts, so I didn't make use of 64-bit instructions for the 32-bit ops. 
I'll fix this in v2.
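
For reference, a small model of why the two sequences agree for bswap32_i32. This is only a sketch under my reading of the instruction semantics (revb.2h swaps bytes within each halfword of the low word, rotri.w rotates the low word, both sign-extend the 32-bit result; revb.2w swaps bytes within each 32-bit word); the model_* and bswap32_via_* names are made up for the example.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value into a 64-bit register image. */
static uint64_t sext32(uint32_t x)
{
    return (uint64_t)(int64_t)(int32_t)x;
}

/* Assumed revb.2h: swap the two bytes of each halfword in the low word. */
static uint64_t model_revb_2h(uint64_t r)
{
    uint32_t w = (uint32_t)r;
    return sext32(((w & 0x00ff00ffu) << 8) | ((w >> 8) & 0x00ff00ffu));
}

/* Assumed rotri.w: rotate the low word right by n bits. */
static uint64_t model_rotri_w(uint64_t r, unsigned n)
{
    uint32_t w = (uint32_t)r;
    n &= 31;
    return sext32(n ? (w >> n) | (w << (32 - n)) : w);
}

/* Assumed revb.2w: byte-reverse each 32-bit word independently. */
static uint64_t model_revb_2w(uint64_t r)
{
    uint64_t s = ((r & 0x00ff00ff00ff00ffull) << 8) |
                 ((r >> 8) & 0x00ff00ff00ff00ffull);
    return ((s & 0x0000ffff0000ffffull) << 16) |
           ((s >> 16) & 0x0000ffff0000ffffull);
}

/* v1 sequence: revb.2h followed by rotri.w 16. */
static uint64_t bswap32_via_revb_2h(uint64_t a1)
{
    return model_rotri_w(model_revb_2h(a1), 16);
}

/* Suggested sequence: revb.2w followed by a 32-bit sign-extension. */
static uint64_t bswap32_via_revb_2w(uint64_t a1)
{
    return sext32((uint32_t)model_revb_2w(a1));
}
```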
>
> r~



* Re: [PATCH 21/30] tcg/loongarch: Implement tcg_out_call
  2021-09-20 16:35     ` Richard Henderson
@ 2021-09-21  6:42       ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21  6:42 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/21/21 00:35, Richard Henderson wrote:
> On 9/20/21 9:31 AM, Richard Henderson wrote:
>> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>>> +    } else if (TCG_TARGET_REG_BITS == 32 || offset == 
>>> (int32_t)offset) {
>>> +        /* long jump: +/- 2GiB */
>>> +        tcg_out_opc_pcaddu12i(s, TCG_REG_TMP0, 0);
>>> +        tcg_out_opc_jirl(s, link, TCG_REG_TMP0, 0);
>>> +        ret = reloc_call(s->code_ptr - 2, arg);
>>> +        tcg_debug_assert(ret == true);
>>
>> Just inline reloc_call here, so that you can provide the correct 
>> offsets to the pcadd and jirl instructions directly.  The assert will 
>> vanish, because you've already done the range check with "offset == 
>> (int32_t)offset".
>
> Actually, don't you want offset == sextract64(offset, 0, 34), and use 
> pcaddu18i? Depending on the memory map of qemu, those extra bits could 
> make the difference in directly reaching the main executable.
>
Whoa, silly me: I hadn't realized what pcaddu18i was actually meant for 
until I read this. The low 2 bits of the offset are always clear, so 18 
is exactly the amount of shift needed when paired with jirl!

I'll of course rework this to use pcaddu18i+jirl instead.
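
A rough sketch of how the offset could be split between the two instructions, assuming (per my reading of the ISA manual) that pcaddu18i adds a sign-extended 20-bit immediate shifted left by 18 to the PC, and that jirl adds a sign-extended 16-bit immediate shifted left by 2. The type and function names are invented for illustration and are not from the series.

```c
#include <stdint.h>

/* Immediates for the pcaddu18i + jirl pair (hypothetical container). */
typedef struct {
    int64_t hi20;   /* immediate for pcaddu18i */
    int64_t lo16;   /* immediate for jirl */
} CallImm;

/* Sign-extend the low 18 bits of v. */
static int64_t sext18(int64_t v)
{
    uint64_t u = (uint64_t)v & 0x3ffffu;
    return (int64_t)(u ^ 0x20000u) - 0x20000;
}

/* Split a 4-byte-aligned PC-relative offset into the two immediates. */
static CallImm split_call_offset(int64_t offset)
{
    CallImm r;
    int64_t lo = sext18(offset);   /* part covered by jirl */
    int64_t hi = offset - lo;      /* exact multiple of 1 << 18 */

    r.hi20 = hi / (1 << 18);
    r.lo16 = lo / 4;
    return r;
}

/* Recombine, mirroring what the hardware would compute. */
static int64_t join_call_offset(CallImm r)
{
    return r.hi20 * (1 << 18) + r.lo16 * 4;
}
```

Taking the low part as a signed value matters: when bit 17 of the offset is set, the jirl immediate is negative and the pcaddu18i immediate is one larger to compensate.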

>
> r~



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21  6:02     ` WANG Xuerui
@ 2021-09-21  6:59       ` Philippe Mathieu-Daudé
  2021-09-21  7:24         ` WANG Xuerui
  2021-09-21 13:30       ` Richard Henderson
  1 sibling, 1 reply; 80+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-09-21  6:59 UTC (permalink / raw)
  To: WANG Xuerui, Richard Henderson, qemu-devel

On 9/21/21 08:02, WANG Xuerui wrote:
> On 9/21/21 01:23, Richard Henderson wrote:
>> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>>> ---
>>>   configure   | 4 +++-
>>>   meson.build | 4 +++-
>>>   2 files changed, 6 insertions(+), 2 deletions(-)

>> If there's no loongarch32, and never will be, then there's probably no 
>> point in keeping the '64' suffix.
> 
> The loongarch32 tuple will most certainly come into existence some time 
> in the future, but probably bare-metal-only and without a Linux port 
> AFAIK. That's a point the Loongson people and I didn't communicate well, 
> apologies for that. (While we're at it, the reserved "loongarchx32" 
> which is x32/n32-like, most likely will never exist.)

Are you trying to beat MIPS at their ABI complexity? /s



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21  6:59       ` Philippe Mathieu-Daudé
@ 2021-09-21  7:24         ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21  7:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Richard Henderson, qemu-devel

Hi Philippe,

On 9/21/21 14:59, Philippe Mathieu-Daudé wrote:
> On 9/21/21 08:02, WANG Xuerui wrote:
>> On 9/21/21 01:23, Richard Henderson wrote:
>>> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>>>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>>>> ---
>>>>   configure   | 4 +++-
>>>>   meson.build | 4 +++-
>>>>   2 files changed, 6 insertions(+), 2 deletions(-)
>
>>> If there's no loongarch32, and never will be, then there's probably 
>>> no point in keeping the '64' suffix.
>>
>> The loongarch32 tuple will most certainly come into existence some 
>> time in the future, but probably bare-metal-only and without a Linux 
>> port AFAIK. That's a point the Loongson people and I didn't 
>> communicate well, apologies for that. (While we're at it, the 
>> reserved "loongarchx32" which is x32/n32-like, most likely will never 
>> exist.)
>
> Are you trying to beat MIPS at their ABI complexity? /s

Hah, I'm not a Loongson employee, so maybe I'm not in the best position 
to answer this ;-)

But from an outsider's perspective, the Loongson people obviously 
reserved things upfront like a multi-millionaire, then suddenly realized 
they only have ~500 people on board, and even fewer developers; so they 
did the Right Thing(TM), if belatedly, and dropped x32 altogether to 
focus their energy on bare-metal use cases for their 32-bit-only chips.

Plus, LoongArch is strictly little-endian, and only one baseline ISA 
revision is published so far, so IMO it can never beat MIPS in terms of 
combinatorial ABI possibilities. Maybe RISC-V has a chance? ;-)




* Re: [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers
  2021-09-20  8:04 ` [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers WANG Xuerui
  2021-09-20 15:55   ` Richard Henderson
@ 2021-09-21  9:58   ` Philippe Mathieu-Daudé
  2021-09-21 11:40     ` WANG Xuerui
  1 sibling, 1 reply; 80+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-09-21  9:58 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel; +Cc: Richard Henderson

On 9/20/21 10:04, WANG Xuerui wrote:
> Signed-off-by: WANG Xuerui <git@xen0n.name>
> ---
>   tcg/loongarch/tcg-insn-defs.c.inc | 1080 +++++++++++++++++++++++++++++
>   1 file changed, 1080 insertions(+)
>   create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc

> +static int32_t encode_dj_slots(LoongArchInsn opc, uint32_t d, uint32_t j)
> +{
Can we move the range check to the callee and avoid masking the values
in the caller?

        tcg_debug_assert(d < 0x20);
        tcg_debug_assert(j < 0x20);

> +    return opc | d | j << 5;
> +}
> +
> +static int32_t encode_djk_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
> +                                uint32_t k) __attribute__((unused));
> +
> +static int32_t encode_djk_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
> +                                uint32_t k)
> +{

        tcg_debug_assert(d < 0x20);
        tcg_debug_assert(j < 0x20);

> +    return opc | d | j << 5 | k << 10;
> +}
> +
> +static int32_t encode_djkm_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
> +                                 uint32_t k, uint32_t m)
> +    __attribute__((unused));
> +
> +static int32_t encode_djkm_slots(LoongArchInsn opc, uint32_t d, uint32_t j,
> +                                 uint32_t k, uint32_t m)
> +{
> +    return opc | d | j << 5 | k << 10 | m << 16;
> +}
> +
> +static int32_t encode_dk_slots(LoongArchInsn opc, uint32_t d, uint32_t k)
> +    __attribute__((unused));
> +
> +static int32_t encode_dk_slots(LoongArchInsn opc, uint32_t d, uint32_t k)
> +{
> +    return opc | d | k << 10;
> +}
> +
> +static int32_t encode_dj_insn(LoongArchInsn opc, TCGReg d, TCGReg j)
> +    __attribute__((unused));
> +
> +static int32_t encode_dj_insn(LoongArchInsn opc, TCGReg d, TCGReg j)
> +{
> +    d &= 0x1f;
> +    j &= 0x1f;
> +    return encode_dj_slots(opc, d, j);
> +}
> +
> +static int32_t encode_djk_insn(LoongArchInsn opc, TCGReg d, TCGReg j, TCGReg k)
> +    __attribute__((unused));
> +
> +static int32_t encode_djk_insn(LoongArchInsn opc, TCGReg d, TCGReg j, TCGReg k)
> +{
> +    d &= 0x1f;
> +    j &= 0x1f;
^ moved to encode_djk_slots()

> +    k &= 0x1f;

        tcg_debug_assert(k < 0x20);

> +    return encode_djk_slots(opc, d, j, k);
> +}
> +
> +static int32_t encode_djsk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
> +                                  int32_t sk12) __attribute__((unused));
> +
> +static int32_t encode_djsk12_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
> +                                  int32_t sk12)
> +{
> +    d &= 0x1f;
> +    j &= 0x1f;

^ moved to encode_djk_slots()

> +    sk12 &= 0xfff;

        tcg_debug_assert(sk12 < 0x1000);

> +    return encode_djk_slots(opc, d, j, sk12);
> +}
> +
> +static int32_t encode_djsk16_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
> +                                  int32_t sk16) __attribute__((unused));
> +
> +static int32_t encode_djsk16_insn(LoongArchInsn opc, TCGReg d, TCGReg j,
> +                                  int32_t sk16)
> +{
> +    d &= 0x1f;
> +    j &= 0x1f;

^ moved to encode_djk_slots()

> +    sk16 &= 0xffff;

        tcg_debug_assert(sk16 < 0x10000);

> +    return encode_djk_slots(opc, d, j, sk16);
> +}

etc...



* Re: [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers
  2021-09-21  9:58   ` Philippe Mathieu-Daudé
@ 2021-09-21 11:40     ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21 11:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: Richard Henderson

Hi Philippe,

On 9/21/21 17:58, Philippe Mathieu-Daudé wrote:
> On 9/20/21 10:04, WANG Xuerui wrote:
>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>> ---
>>   tcg/loongarch/tcg-insn-defs.c.inc | 1080 +++++++++++++++++++++++++++++
>>   1 file changed, 1080 insertions(+)
>>   create mode 100644 tcg/loongarch/tcg-insn-defs.c.inc
>
>> +static int32_t encode_dj_slots(LoongArchInsn opc, uint32_t d, 
>> uint32_t j)
>> +{
> Can we move the range check to the callee and avoid masking the values
> in the caller?
>
>        tcg_debug_assert(d < 0x20);
>        tcg_debug_assert(j < 0x20);

Making use of tcg_debug_assert would be rather nice, but different 
instructions can have differently sized fields starting at the same 
offset. Take the "bstrpick.w" and "bstrpick.d" instructions: they belong 
to the DJUk5Um5 and DJUk6Um6 formats respectively, and the "Uk5" and 
"Uk6" fields both start at bit 10 but have different value ranges. So 
the range checks necessarily live in the encoders for the individual 
formats.
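
To illustrate, a minimal sketch of per-format checks layered on a shared slot encoder. Plain assert() stands in for tcg_debug_assert, the encoder names are simplified from the generated ones, and the opcode is passed in by the caller rather than taken from the real opcode table.

```c
#include <assert.h>
#include <stdint.h>

/* Shared slot packer: cannot know how wide the k and m fields are. */
static uint32_t encode_djkm_slots(uint32_t opc, uint32_t d, uint32_t j,
                                  uint32_t k, uint32_t m)
{
    return opc | d | j << 5 | k << 10 | m << 16;
}

/* DJUk5Um5 format (e.g. bstrpick.w): both bit positions are 5 bits wide. */
static uint32_t encode_djuk5um5(uint32_t opc, uint32_t d, uint32_t j,
                                uint32_t uk5, uint32_t um5)
{
    assert(d < 0x20 && j < 0x20);
    assert(uk5 < 0x20 && um5 < 0x20);
    return encode_djkm_slots(opc, d, j, uk5, um5);
}

/* DJUk6Um6 format (e.g. bstrpick.d): same offsets, but 6 bits wide. */
static uint32_t encode_djuk6um6(uint32_t opc, uint32_t d, uint32_t j,
                                uint32_t uk6, uint32_t um6)
{
    assert(d < 0x20 && j < 0x20);
    assert(uk6 < 0x40 && um6 < 0x40);
    return encode_djkm_slots(opc, d, j, uk6, um6);
}
```

A value such as 63 is valid for Uk6 but out of range for Uk5, so a single assert in the slot packer could not serve both formats.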




* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21  6:02     ` WANG Xuerui
  2021-09-21  6:59       ` Philippe Mathieu-Daudé
@ 2021-09-21 13:30       ` Richard Henderson
  2021-09-21 14:07         ` WANG Xuerui
  1 sibling, 1 reply; 80+ messages in thread
From: Richard Henderson @ 2021-09-21 13:30 UTC (permalink / raw)
  To: WANG Xuerui, qemu-devel

On 9/20/21 11:02 PM, WANG Xuerui wrote:
> The loongarch32 tuple will most certainly come into existence some time in the future, but 
> probably bare-metal-only and without a Linux port AFAIK.

Ok, I'll bear that in mind when considering target/loongarch/.

> So should I drop the explicit probing for __loongarch64, instead just probe for 
> __loongarch__ and later #error out the non-__loongarch64 cases individually?

I'm ok with checking the __loongarch64 define, but I think ARCH=loongarch is sufficient. 
That name will apply to linux-user/host/$ARCH/ and tcg/$ARCH/.


r~



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21 13:30       ` Richard Henderson
@ 2021-09-21 14:07         ` WANG Xuerui
  2021-09-21 14:10           ` WANG Xuerui
  0 siblings, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21 14:07 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 9/21/21 21:30, Richard Henderson wrote:
> On 9/20/21 11:02 PM, WANG Xuerui wrote:
>
>> So should I drop the explicit probing for __loongarch64, instead just 
>> probe for __loongarch__ and later #error out the non-__loongarch64 
>> cases individually?
>
> I'm ok with checking the __loongarch64 define, but I think 
> ARCH=loongarch is sufficient. That name will apply to 
> linux-user/host/$ARCH/ and tcg/$ARCH/.
>
I just dug deeper into this while waiting for compilations; indeed the 
cpu variable must be "loongarch64" but ARCH could be just "loongarch". 
The $cpu is shoved directly into the meson cross file as CPU family 
name, for which only "loongarch64" is valid [1]. I'll keep probing for 
__loongarch64 but just transform the ARCH value.

[1]: https://mesonbuild.com/Reference-tables.html#cpu-families





* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21 14:07         ` WANG Xuerui
@ 2021-09-21 14:10           ` WANG Xuerui
  0 siblings, 0 replies; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21 14:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 9/21/21 22:07, WANG Xuerui wrote:

> Hi Richard,
>
> On 9/21/21 21:30, Richard Henderson wrote:
>> On 9/20/21 11:02 PM, WANG Xuerui wrote:
>>
>>> So should I drop the explicit probing for __loongarch64, instead 
>>> just probe for __loongarch__ and later #error out the 
>>> non-__loongarch64 cases individually?
>>
>> I'm ok with checking the __loongarch64 define, but I think 
>> ARCH=loongarch is sufficient. That name will apply to 
>> linux-user/host/$ARCH/ and tcg/$ARCH/.
>>
> I just dug deeper into this while waiting for compilations; indeed the 
> cpu variable must be "loongarch64" but ARCH could be just "loongarch". 
> The $cpu is shoved directly into the meson cross file as CPU family 
> name, for which only "loongarch64" is valid [1]. I'll keep probing for 
> __loongarch64 but just transform the ARCH value.
>
Ah wait, it seems the value actually used is $ARCH... But some changes 
around the $cpu/$ARCH handling are necessary anyway. Sorry for the noise!
> [1]: https://mesonbuild.com/Reference-tables.html#cpu-families
>
>
>



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-20 17:23   ` Richard Henderson
  2021-09-21  6:02     ` WANG Xuerui
@ 2021-09-21 14:42     ` Peter Maydell
  2021-09-21 15:59       ` Richard Henderson
  2021-09-21 16:09       ` WANG Xuerui
  1 sibling, 2 replies; 80+ messages in thread
From: Peter Maydell @ 2021-09-21 14:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: WANG Xuerui, QEMU Developers

On Mon, 20 Sept 2021 at 18:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 9/20/21 1:04 AM, WANG Xuerui wrote:
> > Signed-off-by: WANG Xuerui <git@xen0n.name>

> Be consistent with loongarch or loongarch64 everywhere.
>
> If there's no loongarch32, and never will be, then there's probably no point in keeping
> the '64' suffix.

What does Linux 'uname -m' call the architecture, and what is the
name in the gcc triplet? Generally I think we should prefer to follow
those precedents (which hopefully don't point in different directions)
rather than making up our own architecture names.

thanks
-- PMM



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21 14:42     ` Peter Maydell
@ 2021-09-21 15:59       ` Richard Henderson
  2021-09-21 16:09       ` WANG Xuerui
  1 sibling, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-21 15:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: WANG Xuerui, QEMU Developers

On 9/21/21 7:42 AM, Peter Maydell wrote:
> On Mon, 20 Sept 2021 at 18:25, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>>> Signed-off-by: WANG Xuerui <git@xen0n.name>
> 
>> Be consistent with loongarch or loongarch64 everywhere.
>>
>> If there's no loongarch32, and never will be, then there's probably no point in keeping
>> the '64' suffix.
> 
> What does Linux 'uname -m' call the architecture, and what is the
> name in the gcc triplet?

The kernel will report

arch/loongarch/Makefile:UTS_MACHINE := loongarch64

and it appears that the toolchain is using loongarch64 as well.

So, Xuerui, I think there's your answer...


r~



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21 14:42     ` Peter Maydell
  2021-09-21 15:59       ` Richard Henderson
@ 2021-09-21 16:09       ` WANG Xuerui
  2021-09-21 17:26         ` Richard Henderson
  1 sibling, 1 reply; 80+ messages in thread
From: WANG Xuerui @ 2021-09-21 16:09 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson; +Cc: QEMU Developers

Hi Peter,

On 9/21/21 22:42, Peter Maydell wrote:
> On Mon, 20 Sept 2021 at 18:25, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> On 9/20/21 1:04 AM, WANG Xuerui wrote:
>>> Signed-off-by: WANG Xuerui <git@xen0n.name>
>> Be consistent with loongarch or loongarch64 everywhere.
>>
>> If there's no loongarch32, and never will be, then there's probably no point in keeping
>> the '64' suffix.
> What does Linux 'uname -m' call the architecture, and what is the
> name in the gcc triplet? Generally I think we should prefer to follow
> those precedents (which hopefully don't point in different directions)
> rather than making up our own architecture names.

uname -m says "loongarch64", the GNU triple arch name is also 
"loongarch64". I'd say it's similar to the situation of RISC-V or MIPS; 
except that a Linux port to the 32-bit variant of LoongArch might not 
happen, precluding a QEMU port.

I think cpu=loongarch64 but ARCH=loongarch should be okay; at least it's 
better than, say, the Go language or Gentoo, where this architecture is 
named "loong64" and "loong"; or the binutils internals where it's "larch".

>
> thanks
> -- PMM



* Re: [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts
  2021-09-21 16:09       ` WANG Xuerui
@ 2021-09-21 17:26         ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2021-09-21 17:26 UTC (permalink / raw)
  To: WANG Xuerui, Peter Maydell; +Cc: QEMU Developers

On 9/21/21 9:09 AM, WANG Xuerui wrote:
> I think cpu=loongarch64 but ARCH=loongarch should be okay...

Make it easier on yourself and keep them the same.


r~



end of thread, other threads:[~2021-09-21 17:29 UTC | newest]

Thread overview: 80+ messages
2021-09-20  8:04 [PATCH 00/30] 64-bit LoongArch port of QEMU TCG WANG Xuerui
2021-09-20  8:04 ` [PATCH 01/30] elf: Add machine type value for LoongArch WANG Xuerui
2021-09-20  8:04 ` [PATCH 02/30] MAINTAINERS: Add tcg/loongarch entry with myself as maintainer WANG Xuerui
2021-09-20 14:50   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 03/30] tcg/loongarch: Add the tcg-target.h file WANG Xuerui
2021-09-20 14:23   ` Richard Henderson
2021-09-20 16:20     ` WANG Xuerui
2021-09-20 16:25       ` Richard Henderson
2021-09-20  8:04 ` [PATCH 04/30] tcg/loongarch: Add generated instruction opcodes and encoding helpers WANG Xuerui
2021-09-20 15:55   ` Richard Henderson
2021-09-20 16:24     ` WANG Xuerui
2021-09-21  9:58   ` Philippe Mathieu-Daudé
2021-09-21 11:40     ` WANG Xuerui
2021-09-20  8:04 ` [PATCH 05/30] tcg/loongarch: Add register names, allocation order and input/output sets WANG Xuerui
2021-09-20 15:57   ` Richard Henderson
2021-09-20 16:27     ` WANG Xuerui
2021-09-20  8:04 ` [PATCH 06/30] tcg/loongarch: Define the operand constraints WANG Xuerui
2021-09-20 14:28   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 07/30] tcg/loongarch: Implement necessary relocation operations WANG Xuerui
2021-09-20 14:36   ` Richard Henderson
2021-09-20 17:15     ` WANG Xuerui
2021-09-20  8:04 ` [PATCH 08/30] tcg/loongarch: Implement the memory barrier op WANG Xuerui
2021-09-20  8:04 ` [PATCH 09/30] tcg/loongarch: Implement tcg_out_mov and tcg_out_movi WANG Xuerui
2021-09-20 14:47   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 10/30] tcg/loongarch: Implement goto_ptr WANG Xuerui
2021-09-20 14:49   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 11/30] tcg/loongarch: Implement sign-/zero-extension ops WANG Xuerui
2021-09-20 14:50   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 12/30] tcg/loongarch: Implement not/and/or/xor/nor/andc/orc ops WANG Xuerui
2021-09-20 14:54   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 13/30] tcg/loongarch: Implement deposit/extract ops WANG Xuerui
2021-09-20 14:55   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 14/30] tcg/loongarch: Implement bswap32_i32/bswap64_i64 WANG Xuerui
2021-09-20 15:11   ` Richard Henderson
2021-09-20 18:20     ` Richard Henderson
2021-09-21  6:37     ` WANG Xuerui
2021-09-20  8:04 ` [PATCH 15/30] tcg/loongarch: Implement clz/ctz ops WANG Xuerui
2021-09-20 16:10   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 16/30] tcg/loongarch: Implement shl/shr/sar/rotl/rotr ops WANG Xuerui
2021-09-20 16:13   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 17/30] tcg/loongarch: Implement neg/add/sub ops WANG Xuerui
2021-09-20 16:16   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 18/30] tcg/loongarch: Implement mul/mulsh/muluh/div/divu/rem/remu ops WANG Xuerui
2021-09-20 16:16   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 19/30] tcg/loongarch: Implement br/brcond ops WANG Xuerui
2021-09-20 16:20   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 20/30] tcg/loongarch: Implement setcond ops WANG Xuerui
2021-09-20 16:24   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 21/30] tcg/loongarch: Implement tcg_out_call WANG Xuerui
2021-09-20 16:31   ` Richard Henderson
2021-09-20 16:35     ` Richard Henderson
2021-09-21  6:42       ` WANG Xuerui
2021-09-20  8:04 ` [PATCH 22/30] tcg/loongarch: Implement simple load/store ops WANG Xuerui
2021-09-20 16:35   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 23/30] tcg/loongarch: Add softmmu load/store helpers, implement qemu_ld/qemu_st ops WANG Xuerui
2021-09-20 17:10   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 24/30] tcg/loongarch: Implement tcg_target_qemu_prologue WANG Xuerui
2021-09-20 17:15   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 25/30] tcg/loongarch: Implement exit_tb/goto_tb WANG Xuerui
2021-09-20 17:16   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 26/30] tcg/loongarch: Implement tcg_target_init WANG Xuerui
2021-09-20 17:19   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 27/30] tcg/loongarch: Register the JIT WANG Xuerui
2021-09-20 17:21   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 28/30] configure, meson.build: Mark support for 64-bit LoongArch hosts WANG Xuerui
2021-09-20 17:23   ` Richard Henderson
2021-09-21  6:02     ` WANG Xuerui
2021-09-21  6:59       ` Philippe Mathieu-Daudé
2021-09-21  7:24         ` WANG Xuerui
2021-09-21 13:30       ` Richard Henderson
2021-09-21 14:07         ` WANG Xuerui
2021-09-21 14:10           ` WANG Xuerui
2021-09-21 14:42     ` Peter Maydell
2021-09-21 15:59       ` Richard Henderson
2021-09-21 16:09       ` WANG Xuerui
2021-09-21 17:26         ` Richard Henderson
2021-09-20  8:04 ` [PATCH 29/30] linux-user: Add host dependency for 64-bit LoongArch WANG Xuerui
2021-09-20 17:26   ` Richard Henderson
2021-09-20  8:04 ` [PATCH 30/30] accel/tcg/user-exec: Implement CPU-specific signal handler for LoongArch hosts WANG Xuerui
2021-09-20 17:31   ` Richard Henderson
