qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements
@ 2016-02-09 10:39 Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 01/15] tcg-mips: Add mips64 opcodes Richard Henderson
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC
  To: qemu-devel; +Cc: james.hogan, aurelien

Some of this patch set is two years old, though I never got
around to posting it.  I've lightly tested it under QEMU
with a 5Kc CPU and a Debian n64 runtime.  I've not been
able to test any of the R6 changes beyond compiling them.

So to some extent this is an RFC.  Hopefully someone out
there has an R6 install and can test this.


r~


Richard Henderson (15):
  tcg-mips: Add mips64 opcodes
  tcg-mips: Support 64-bit opcodes
  tcg-mips: Adjust move functions for mips64
  tcg-mips: Adjust load/store functions for mips64
  tcg-mips: Adjust prologue for mips64
  tcg-mips: Add tcg unwind info
  tcg-mips: Adjust qemu_ld/st for mips64
  tcg-mips: Adjust calling conventions for mips64
  tcg-mips: Fix exit_tb for mips64
  tcg-mips: Move bswap code to subroutines
  tcg-mips: Use mips64r6 instructions in tcg_out_movi
  tcg-mips: Use mips64r6 instructions in tcg_out_ldst
  tcg-mips: Use mips64r6 instructions in constant addition
  tcg-mips: Use mipsr6 instructions in branches
  tcg-mips: Use mipsr6 instructions in calls

 include/elf.h         |    4 +
 tcg/mips/tcg-target.c | 1560 +++++++++++++++++++++++++++++++++++++++----------
 tcg/mips/tcg-target.h |   62 +-
 3 files changed, 1326 insertions(+), 300 deletions(-)

-- 
2.5.0


* [Qemu-devel] [PATCH 01/15] tcg-mips: Add mips64 opcodes
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes Richard Henderson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC
  To: qemu-devel; +Cc: james.hogan, aurelien

Since the MIPS manual tables are in octal, reorganize all of the
opcodes into that format for clarity.  Note that the 64-bit opcodes
are as yet unused.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
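Not part of the patch: a minimal sketch of why the octal spelling lines
up with the manual.  The major opcode and the SPECIAL function field are
each six bits wide, so each is exactly two octal digits, and the value in
the sa slot (bits 10..6) shows up as the next digits.  The helper names
below are hypothetical and do not exist in tcg-target.c; the constants
are the old hex and new octal spellings from the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Major opcode: bits 31..26, two octal digits. */
static inline uint32_t mips_major(uint32_t insn)
{
    return insn >> 26;
}

/* Function field: bits 5..0, two octal digits. */
static inline uint32_t mips_funct(uint32_t insn)
{
    return insn & 077;
}

/* Shift-amount slot: bits 10..6. */
static inline uint32_t mips_sa(uint32_t insn)
{
    return (insn >> 6) & 037;
}

/* Check that the octal spellings equal the hex ones they replace. */
static void check_octal_matches_hex(void)
{
    assert((043u << 26) == (0x23u << 26));            /* OPC_LW */
    assert((006u | 0100u) == ((0x01u << 6) | 0x06u)); /* OPC_ROTRV */
    assert(02040u == 0x420u);                         /* OPC_SEB */
    assert(03040u == 0x620u);                         /* OPC_SEH */
}
```

For example, mips_funct(006 | 0100) yields 006 and mips_sa() of the same
value yields 1, which is how the ROTRV row reads in the manual's table.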
 tcg/mips/tcg-target.c | 191 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 117 insertions(+), 74 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 2dc4998..68cd896 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -257,80 +257,117 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 /* instruction opcodes */
 typedef enum {
-    OPC_J        = 0x02 << 26,
-    OPC_JAL      = 0x03 << 26,
-    OPC_BEQ      = 0x04 << 26,
-    OPC_BNE      = 0x05 << 26,
-    OPC_BLEZ     = 0x06 << 26,
-    OPC_BGTZ     = 0x07 << 26,
-    OPC_ADDIU    = 0x09 << 26,
-    OPC_SLTI     = 0x0A << 26,
-    OPC_SLTIU    = 0x0B << 26,
-    OPC_ANDI     = 0x0C << 26,
-    OPC_ORI      = 0x0D << 26,
-    OPC_XORI     = 0x0E << 26,
-    OPC_LUI      = 0x0F << 26,
-    OPC_LB       = 0x20 << 26,
-    OPC_LH       = 0x21 << 26,
-    OPC_LW       = 0x23 << 26,
-    OPC_LBU      = 0x24 << 26,
-    OPC_LHU      = 0x25 << 26,
-    OPC_LWU      = 0x27 << 26,
-    OPC_SB       = 0x28 << 26,
-    OPC_SH       = 0x29 << 26,
-    OPC_SW       = 0x2B << 26,
-
-    OPC_SPECIAL  = 0x00 << 26,
-    OPC_SLL      = OPC_SPECIAL | 0x00,
-    OPC_SRL      = OPC_SPECIAL | 0x02,
-    OPC_ROTR     = OPC_SPECIAL | (0x01 << 21) | 0x02,
-    OPC_SRA      = OPC_SPECIAL | 0x03,
-    OPC_SLLV     = OPC_SPECIAL | 0x04,
-    OPC_SRLV     = OPC_SPECIAL | 0x06,
-    OPC_ROTRV    = OPC_SPECIAL | (0x01 <<  6) | 0x06,
-    OPC_SRAV     = OPC_SPECIAL | 0x07,
-    OPC_JR_R5    = OPC_SPECIAL | 0x08,
-    OPC_JALR     = OPC_SPECIAL | 0x09,
-    OPC_MOVZ     = OPC_SPECIAL | 0x0A,
-    OPC_MOVN     = OPC_SPECIAL | 0x0B,
-    OPC_MFHI     = OPC_SPECIAL | 0x10,
-    OPC_MFLO     = OPC_SPECIAL | 0x12,
-    OPC_MULT     = OPC_SPECIAL | 0x18,
-    OPC_MUL_R6   = OPC_SPECIAL | (0x02 <<  6) | 0x18,
-    OPC_MUH      = OPC_SPECIAL | (0x03 <<  6) | 0x18,
-    OPC_MULTU    = OPC_SPECIAL | 0x19,
-    OPC_MULU     = OPC_SPECIAL | (0x02 <<  6) | 0x19,
-    OPC_MUHU     = OPC_SPECIAL | (0x03 <<  6) | 0x19,
-    OPC_DIV      = OPC_SPECIAL | 0x1A,
-    OPC_DIV_R6   = OPC_SPECIAL | (0x02 <<  6) | 0x1A,
-    OPC_MOD      = OPC_SPECIAL | (0x03 <<  6) | 0x1A,
-    OPC_DIVU     = OPC_SPECIAL | 0x1B,
-    OPC_DIVU_R6  = OPC_SPECIAL | (0x02 <<  6) | 0x1B,
-    OPC_MODU     = OPC_SPECIAL | (0x03 <<  6) | 0x1B,
-    OPC_ADDU     = OPC_SPECIAL | 0x21,
-    OPC_SUBU     = OPC_SPECIAL | 0x23,
-    OPC_AND      = OPC_SPECIAL | 0x24,
-    OPC_OR       = OPC_SPECIAL | 0x25,
-    OPC_XOR      = OPC_SPECIAL | 0x26,
-    OPC_NOR      = OPC_SPECIAL | 0x27,
-    OPC_SLT      = OPC_SPECIAL | 0x2A,
-    OPC_SLTU     = OPC_SPECIAL | 0x2B,
-    OPC_SELEQZ   = OPC_SPECIAL | 0x35,
-    OPC_SELNEZ   = OPC_SPECIAL | 0x37,
-
-    OPC_REGIMM   = 0x01 << 26,
-    OPC_BLTZ     = OPC_REGIMM | (0x00 << 16),
-    OPC_BGEZ     = OPC_REGIMM | (0x01 << 16),
-
-    OPC_SPECIAL2 = 0x1c << 26,
-    OPC_MUL_R5   = OPC_SPECIAL2 | 0x002,
-
-    OPC_SPECIAL3 = 0x1f << 26,
-    OPC_EXT      = OPC_SPECIAL3 | 0x000,
-    OPC_INS      = OPC_SPECIAL3 | 0x004,
-    OPC_WSBH     = OPC_SPECIAL3 | 0x0a0,
-    OPC_SEB      = OPC_SPECIAL3 | 0x420,
-    OPC_SEH      = OPC_SPECIAL3 | 0x620,
+    OPC_J        = 002 << 26,
+    OPC_JAL      = 003 << 26,
+    OPC_BEQ      = 004 << 26,
+    OPC_BNE      = 005 << 26,
+    OPC_BLEZ     = 006 << 26,
+    OPC_BGTZ     = 007 << 26,
+    OPC_ADDIU    = 011 << 26,
+    OPC_SLTI     = 012 << 26,
+    OPC_SLTIU    = 013 << 26,
+    OPC_ANDI     = 014 << 26,
+    OPC_ORI      = 015 << 26,
+    OPC_XORI     = 016 << 26,
+    OPC_LUI      = 017 << 26,
+    OPC_DADDIU   = 031 << 26,
+    OPC_LB       = 040 << 26,
+    OPC_LH       = 041 << 26,
+    OPC_LW       = 043 << 26,
+    OPC_LBU      = 044 << 26,
+    OPC_LHU      = 045 << 26,
+    OPC_LWU      = 047 << 26,
+    OPC_SB       = 050 << 26,
+    OPC_SH       = 051 << 26,
+    OPC_SW       = 053 << 26,
+    OPC_LD       = 067 << 26,
+    OPC_SD       = 077 << 26,
+
+    OPC_SPECIAL  = 000 << 26,
+    OPC_SLL      = OPC_SPECIAL | 000,
+    OPC_SRL      = OPC_SPECIAL | 002,
+    OPC_ROTR     = OPC_SPECIAL | 002 | (1 << 21),
+    OPC_SRA      = OPC_SPECIAL | 003,
+    OPC_SLLV     = OPC_SPECIAL | 004,
+    OPC_SRLV     = OPC_SPECIAL | 006,
+    OPC_ROTRV    = OPC_SPECIAL | 006 | 0100,
+    OPC_SRAV     = OPC_SPECIAL | 007,
+    OPC_JR_R5    = OPC_SPECIAL | 010,
+    OPC_JALR     = OPC_SPECIAL | 011,
+    OPC_MOVZ     = OPC_SPECIAL | 012,
+    OPC_MOVN     = OPC_SPECIAL | 013,
+    OPC_MFHI     = OPC_SPECIAL | 020,
+    OPC_MFLO     = OPC_SPECIAL | 022,
+    OPC_DSLLV    = OPC_SPECIAL | 024,
+    OPC_DSRLV    = OPC_SPECIAL | 026,
+    OPC_DROTRV   = OPC_SPECIAL | 026 | 0100,
+    OPC_DSRAV    = OPC_SPECIAL | 027,
+    OPC_MULT     = OPC_SPECIAL | 030,
+    OPC_MUL_R6   = OPC_SPECIAL | 030 | 0200,
+    OPC_MUH      = OPC_SPECIAL | 030 | 0300,
+    OPC_MULTU    = OPC_SPECIAL | 031,
+    OPC_MULU     = OPC_SPECIAL | 031 | 0200,
+    OPC_MUHU     = OPC_SPECIAL | 031 | 0300,
+    OPC_DIV      = OPC_SPECIAL | 032,
+    OPC_DIV_R6   = OPC_SPECIAL | 032 | 0200,
+    OPC_MOD      = OPC_SPECIAL | 032 | 0300,
+    OPC_DIVU     = OPC_SPECIAL | 033,
+    OPC_DIVU_R6  = OPC_SPECIAL | 033 | 0200,
+    OPC_MODU     = OPC_SPECIAL | 033 | 0300,
+    OPC_DMULT    = OPC_SPECIAL | 034,
+    OPC_DMUL     = OPC_SPECIAL | 034 | 0200,
+    OPC_DMUH     = OPC_SPECIAL | 034 | 0300,
+    OPC_DMULTU   = OPC_SPECIAL | 035,
+    OPC_DMULU    = OPC_SPECIAL | 035 | 0200,
+    OPC_DMUHU    = OPC_SPECIAL | 035 | 0300,
+    OPC_DDIV     = OPC_SPECIAL | 036,
+    OPC_DDIV_R6  = OPC_SPECIAL | 036 | 0200,
+    OPC_DMOD     = OPC_SPECIAL | 036 | 0300,
+    OPC_DDIVU    = OPC_SPECIAL | 037,
+    OPC_DDIVU_R6 = OPC_SPECIAL | 037 | 0200,
+    OPC_DMODU_R6 = OPC_SPECIAL | 037 | 0300,
+    OPC_ADDU     = OPC_SPECIAL | 041,
+    OPC_SUBU     = OPC_SPECIAL | 043,
+    OPC_AND      = OPC_SPECIAL | 044,
+    OPC_OR       = OPC_SPECIAL | 045,
+    OPC_XOR      = OPC_SPECIAL | 046,
+    OPC_NOR      = OPC_SPECIAL | 047,
+    OPC_SLT      = OPC_SPECIAL | 052,
+    OPC_SLTU     = OPC_SPECIAL | 053,
+    OPC_DADDU    = OPC_SPECIAL | 055,
+    OPC_DSUBU    = OPC_SPECIAL | 057,
+    OPC_SELEQZ   = OPC_SPECIAL | 065,
+    OPC_SELNEZ   = OPC_SPECIAL | 067,
+    OPC_DSLL     = OPC_SPECIAL | 070,
+    OPC_DSRL     = OPC_SPECIAL | 072,
+    OPC_DROTR    = OPC_SPECIAL | 072 | (1 << 21),
+    OPC_DSRA     = OPC_SPECIAL | 073,
+    OPC_DSLL32   = OPC_SPECIAL | 074,
+    OPC_DSRL32   = OPC_SPECIAL | 076,
+    OPC_DROTR32  = OPC_SPECIAL | 076 | (1 << 21),
+    OPC_DSRA32   = OPC_SPECIAL | 077,
+
+    OPC_REGIMM   = 001 << 26,
+    OPC_BLTZ     = OPC_REGIMM | (000 << 16),
+    OPC_BGEZ     = OPC_REGIMM | (001 << 16),
+
+    OPC_SPECIAL2 = 034 << 26,
+    OPC_MUL_R5   = OPC_SPECIAL2 | 002,
+
+    OPC_SPECIAL3 = 037 << 26,
+    OPC_EXT      = OPC_SPECIAL3 | 000,
+    OPC_DEXTM    = OPC_SPECIAL3 | 001,
+    OPC_DEXTU    = OPC_SPECIAL3 | 002,
+    OPC_DEXT     = OPC_SPECIAL3 | 003,
+    OPC_INS      = OPC_SPECIAL3 | 004,
+    OPC_DINSM    = OPC_SPECIAL3 | 005,
+    OPC_DINSU    = OPC_SPECIAL3 | 006,
+    OPC_DINS     = OPC_SPECIAL3 | 007,
+    OPC_WSBH     = OPC_SPECIAL3 | 00240,
+    OPC_DSBH     = OPC_SPECIAL3 | 00244,
+    OPC_DSHD     = OPC_SPECIAL3 | 00544,
+    OPC_SEB      = OPC_SPECIAL3 | 02040,
+    OPC_SEH      = OPC_SPECIAL3 | 03040,
 
     /* MIPS r6 doesn't have JR, JALR should be used instead */
     OPC_JR       = use_mips32r6_instructions ? OPC_JALR : OPC_JR_R5,
@@ -340,6 +377,12 @@ typedef enum {
      * backwards-compatible at the assembly level.
      */
     OPC_MUL      = use_mips32r6_instructions ? OPC_MUL_R6 : OPC_MUL_R5,
+
+    /* Aliases for convenience.  */
+    ALIAS_PADD     = sizeof(void *) == 4 ? OPC_ADDU : OPC_DADDU,
+    ALIAS_PADDI    = sizeof(void *) == 4 ? OPC_ADDIU : OPC_DADDIU,
+    ALIAS_TSRL     = TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
+                     ? OPC_SRL : OPC_DSRL,
 } MIPSInsn;
 
 /*
-- 
2.5.0


* [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 01/15] tcg-mips: Add mips64 opcodes Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 15:24   ` James Hogan
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 03/15] tcg-mips: Adjust move functions for mips64 Richard Henderson
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC
  To: qemu-devel; +Cc: james.hogan, aurelien

Bulk patch adding 64-bit opcodes into tcg_out_op.  Note that
mips64 is as yet neither complete nor enabled.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
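Not part of the patch: a minimal sketch of how a 64-bit shift count is
split between the plain and "32" opcode forms, mirroring what
tcg_out_opc_sa64 does in the diff below.  The sa field holds only five
bits, so bit 5 of the count selects the opcode and the low five bits go
into the field.  encode_shift64 is a hypothetical stand-in that returns
the instruction word rather than emitting it through a TCGContext; the
two opcode values are taken from the table in patch 01.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings as listed in the opcode table of patch 01. */
enum {
    OPC_DSLL   = 070,
    OPC_DSLL32 = 074,
};

/* Hypothetical stand-in for tcg_out_opc_sa64; returns the encoded word. */
static uint32_t encode_shift64(uint32_t opc1, uint32_t opc2,
                               unsigned rd, unsigned rt, unsigned sa)
{
    uint32_t inst = (sa & 32) ? opc2 : opc1;  /* bit 5 picks the "32" form */

    inst |= (uint32_t)(rt & 0x1F) << 16;
    inst |= (uint32_t)(rd & 0x1F) << 11;
    inst |= (uint32_t)(sa & 0x1F) << 6;       /* low five bits fill the sa slot */
    return inst;
}

/* A shift by 40 is emitted as DSLL32 with sa = 8. */
static void check_shift_encoding(void)
{
    uint32_t w = encode_shift64(OPC_DSLL, OPC_DSLL32, 2, 3, 40);

    assert((w & 077) == OPC_DSLL32);
    assert(((w >> 6) & 037) == 8);
}
```

The same split explains why rotl_i64 by a constant can pass 64 - a2
straight through: for a2 == 0 the count is 64, bit 5 is clear, and the
low five bits are zero, so the result is a rotate by zero.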
 tcg/mips/tcg-target.c | 372 ++++++++++++++++++++++++++++++++++++++++++++++++--
 tcg/mips/tcg-target.h |  43 ++++++
 2 files changed, 403 insertions(+), 12 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 68cd896..e56dbc6 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -325,7 +325,7 @@ typedef enum {
     OPC_DMOD     = OPC_SPECIAL | 036 | 0300,
     OPC_DDIVU    = OPC_SPECIAL | 037,
     OPC_DDIVU_R6 = OPC_SPECIAL | 037 | 0200,
-    OPC_DMODU_R6 = OPC_SPECIAL | 037 | 0300,
+    OPC_DMODU    = OPC_SPECIAL | 037 | 0300,
     OPC_ADDU     = OPC_SPECIAL | 041,
     OPC_SUBU     = OPC_SPECIAL | 043,
     OPC_AND      = OPC_SPECIAL | 044,
@@ -431,6 +431,21 @@ static inline void tcg_out_opc_bf(TCGContext *s, MIPSInsn opc, TCGReg rt,
     tcg_out32(s, inst);
 }
 
+static inline void tcg_out_opc_bf64(TCGContext *s, MIPSInsn opc, MIPSInsn opm,
+                                    MIPSInsn oph, TCGReg rt, TCGReg rs,
+                                    int msb, int lsb)
+{
+    if (lsb >= 32) {
+        opc = oph;
+        msb -= 32;
+        lsb -= 32;
+    } else if (msb >= 32) {
+        opc = opm;
+        msb -= 32;
+    }
+    tcg_out_opc_bf(s, opc, rt, rs, msb, lsb);
+}
+
 /*
  * Type branch
  */
@@ -461,6 +476,18 @@ static inline void tcg_out_opc_sa(TCGContext *s, MIPSInsn opc,
 
 }
 
+static void tcg_out_opc_sa64(TCGContext *s, MIPSInsn opc1, MIPSInsn opc2,
+                             TCGReg rd, TCGReg rt, TCGArg sa)
+{
+    int32_t inst;
+
+    inst = (sa & 32 ? opc2 : opc1);
+    inst |= (rt & 0x1F) << 16;
+    inst |= (rd & 0x1F) << 11;
+    inst |= (sa & 0x1F) <<  6;
+    tcg_out32(s, inst);
+}
+
 /*
  * Type jump.
  * Returns true if the branch was in range and the insn was emitted.
@@ -489,6 +516,21 @@ static inline void tcg_out_nop(TCGContext *s)
     tcg_out32(s, 0);
 }
 
+static inline void tcg_out_dsll(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
+{
+    tcg_out_opc_sa64(s, OPC_DSLL, OPC_DSLL32, rd, rt, sa);
+}
+
+static inline void tcg_out_dsrl(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
+{
+    tcg_out_opc_sa64(s, OPC_DSRL, OPC_DSRL32, rd, rt, sa);
+}
+
+static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
+{
+    tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
+}
+
 static inline void tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
@@ -574,6 +616,80 @@ static inline void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg)
     }
 }
 
+static inline void tcg_out_bswap32u(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    if (use_mips32r2_instructions) {
+        tcg_out_opc_reg(s, OPC_DSBH, ret, 0, arg);
+        tcg_out_opc_reg(s, OPC_DSHD, ret, 0, ret);
+        tcg_out_dsrl(s, ret, ret, 32);
+    } else {
+        /* ret and arg must be different and can't be register at */
+        if (ret == arg || ret == TCG_TMP0 || arg == TCG_TMP0) {
+            tcg_abort();
+        }
+
+        tcg_out_dsll(s, ret, arg, 24);
+
+        tcg_out_dsrl(s, TCG_TMP0, arg, 24);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, arg, 0xff00);
+        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 8);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+
+        tcg_out_dsrl(s, TCG_TMP0, arg, 8);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0xff00);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+    }
+}
+
+static void tcg_out_bswap64(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    if (use_mips32r2_instructions) {
+        tcg_out_opc_reg(s, OPC_DSBH, ret, 0, arg);
+        tcg_out_opc_reg(s, OPC_DSHD, ret, 0, ret);
+    } else {
+        /* ret and arg must be different and can't be either tmp reg.  */
+        if (ret == arg || ret == TCG_TMP0 || arg == TCG_TMP0
+            || ret == TCG_TMP1 || arg == TCG_TMP1) {
+            tcg_abort();
+        }
+
+        /* ??? Consider just making this a subroutine.  */
+
+        /* A... ...H -> H... ...A */
+        tcg_out_dsll(s, ret, arg, 56);
+        tcg_out_dsrl(s, TCG_TMP0, arg, 56);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+
+        /* .B.. ..G. -> .G.. ..B. */
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, arg, 0xff00);
+        tcg_out_dsrl(s, TCG_TMP1, arg, 40);
+        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
+
+        /* ..CD .... -> .... DC.. */
+        tcg_out_dsrl(s, TCG_TMP0, arg, 32);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP0, 0xff00);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
+        tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 8);
+        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 24);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+
+        /* .... EF.. -> ..FE .... */
+        tcg_out_dsrl(s, TCG_TMP0, arg, 16);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP0, 0xff00);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
+        tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 24);
+        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
+        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+    }
+}
+
 static inline void tcg_out_ext8s(TCGContext *s, TCGReg ret, TCGReg arg)
 {
     if (use_mips32r2_instructions) {
@@ -1461,28 +1577,45 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
         i1 = OPC_LBU;
         goto do_ldst;
     case INDEX_op_ld8s_i32:
+    case INDEX_op_ld8s_i64:
         i1 = OPC_LB;
         goto do_ldst;
     case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
         i1 = OPC_LHU;
         goto do_ldst;
     case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64:
         i1 = OPC_LH;
         goto do_ldst;
     case INDEX_op_ld_i32:
+    case INDEX_op_ld32s_i64:
         i1 = OPC_LW;
         goto do_ldst;
+    case INDEX_op_ld32u_i64:
+        i1 = OPC_LWU;
+        goto do_ldst;
+    case INDEX_op_ld_i64:
+        i1 = OPC_LD;
+        goto do_ldst;
     case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
         i1 = OPC_SB;
         goto do_ldst;
     case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
         i1 = OPC_SH;
         goto do_ldst;
     case INDEX_op_st_i32:
+    case INDEX_op_st32_i64:
         i1 = OPC_SW;
+        goto do_ldst;
+    case INDEX_op_st_i64:
+        i1 = OPC_SD;
     do_ldst:
         tcg_out_ldst(s, i1, a0, a1, a2);
         break;
@@ -1490,10 +1623,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_add_i32:
         i1 = OPC_ADDU, i2 = OPC_ADDIU;
         goto do_binary;
+    case INDEX_op_add_i64:
+        i1 = OPC_DADDU, i2 = OPC_DADDIU;
+        goto do_binary;
     case INDEX_op_or_i32:
+    case INDEX_op_or_i64:
         i1 = OPC_OR, i2 = OPC_ORI;
         goto do_binary;
     case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
         i1 = OPC_XOR, i2 = OPC_XORI;
     do_binary:
         if (c2) {
@@ -1505,12 +1643,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_sub_i32:
+        i1 = OPC_SUBU, i2 = OPC_ADDIU;
+        goto do_subtract;
+    case INDEX_op_sub_i64:
+        i1 = OPC_DSUBU, i2 = OPC_DADDIU;
+    do_subtract:
         if (c2) {
-            tcg_out_opc_imm(s, OPC_ADDIU, a0, a1, -a2);
+            tcg_out_opc_imm(s, i2, a0, a1, -a2);
             break;
         }
-        i1 = OPC_SUBU;
-        goto do_binary;
+        goto do_binaryv;
     case INDEX_op_and_i32:
         if (c2 && a2 != (uint16_t)a2) {
             int msb = ctz32(~a2) - 1;
@@ -1521,7 +1663,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         i1 = OPC_AND, i2 = OPC_ANDI;
         goto do_binary;
+    case INDEX_op_and_i64:
+        if (c2 && a2 != (uint16_t)a2) {
+            int msb = ctz64(~a2) - 1;
+            assert(use_mips32r2_instructions);
+            assert(is_p2m1(a2));
+            tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU, a0, a1, msb, 0);
+            break;
+        }
+        i1 = OPC_AND, i2 = OPC_ANDI;
+        goto do_binary;
     case INDEX_op_nor_i32:
+    case INDEX_op_nor_i64:
         i1 = OPC_NOR;
         goto do_binaryv;
 
@@ -1573,6 +1726,55 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             break;
         }
         i1 = OPC_DIVU, i2 = OPC_MFHI;
+        goto do_hilo1;
+    case INDEX_op_mul_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DMUL, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DMULT, i2 = OPC_MFLO;
+        goto do_hilo1;
+    case INDEX_op_mulsh_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DMUH, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DMULT, i2 = OPC_MFHI;
+        goto do_hilo1;
+    case INDEX_op_muluh_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DMUHU, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DMULTU, i2 = OPC_MFHI;
+        goto do_hilo1;
+    case INDEX_op_div_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DDIV_R6, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DDIV, i2 = OPC_MFLO;
+        goto do_hilo1;
+    case INDEX_op_divu_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DDIVU_R6, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DDIVU, i2 = OPC_MFLO;
+        goto do_hilo1;
+    case INDEX_op_rem_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DMOD, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DDIV, i2 = OPC_MFHI;
+        goto do_hilo1;
+    case INDEX_op_remu_i64:
+        if (use_mips32r6_instructions) {
+            tcg_out_opc_reg(s, OPC_DMODU, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DDIVU, i2 = OPC_MFHI;
     do_hilo1:
         tcg_out_opc_reg(s, i1, 0, a1, a2);
         tcg_out_opc_reg(s, i2, a0, 0, 0);
@@ -1583,6 +1785,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         goto do_hilo2;
     case INDEX_op_mulu2_i32:
         i1 = OPC_MULTU;
+        goto do_hilo2;
+    case INDEX_op_muls2_i64:
+        i1 = OPC_DMULT;
+        goto do_hilo2;
+    case INDEX_op_mulu2_i64:
+        i1 = OPC_DMULTU;
     do_hilo2:
         tcg_out_opc_reg(s, i1, 0, a2, args[3]);
         tcg_out_opc_reg(s, OPC_MFLO, a0, 0, 0);
@@ -1590,20 +1798,51 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
         i1 = OPC_NOR;
         goto do_unary;
     case INDEX_op_bswap16_i32:
+    case INDEX_op_bswap16_i64:
         i1 = OPC_WSBH;
         goto do_unary;
     case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
         i1 = OPC_SEB;
         goto do_unary;
     case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
         i1 = OPC_SEH;
     do_unary:
         tcg_out_opc_reg(s, i1, a0, TCG_REG_ZERO, a1);
         break;
 
+    case INDEX_op_bswap32_i32:
+        tcg_out_bswap32(s, a0, a1);
+        break;
+    case INDEX_op_bswap32_i64:
+        tcg_out_bswap32u(s, a0, a1);
+        break;
+    case INDEX_op_bswap64_i64:
+        tcg_out_bswap64(s, a0, a1);
+        break;
+    case INDEX_op_extrh_i64_i32:
+        tcg_out_dsra(s, a0, a1, 32);
+        break;
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extrl_i64_i32:
+        tcg_out_opc_sa(s, OPC_SLL, a0, a1, 0);
+        break;
+    case INDEX_op_ext32u_i64:
+    case INDEX_op_extu_i32_i64:
+        if (use_mips32r2_instructions) {
+            tcg_out_opc_bf(s, OPC_DEXT, a0, a1, 31, 0);
+        } else {
+            tcg_out_dsll(s, a0, a1, 32);
+            tcg_out_dsrl(s, a0, a0, 32);
+        }
+        break;
+
     case INDEX_op_sar_i32:
         i1 = OPC_SRAV, i2 = OPC_SRA;
         goto do_shift;
@@ -1618,9 +1857,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     do_shift:
         if (c2) {
             tcg_out_opc_sa(s, i2, a0, a1, a2);
-        } else {
-            tcg_out_opc_reg(s, i1, a0, a2, a1);
+            break;
         }
+    do_shiftv:
+        tcg_out_opc_reg(s, i1, a0, a2, a1);
         break;
     case INDEX_op_rotl_i32:
         if (c2) {
@@ -1630,17 +1870,53 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_opc_reg(s, OPC_ROTRV, a0, TCG_TMP0, a1);
         }
         break;
-
-    case INDEX_op_bswap32_i32:
-        tcg_out_opc_reg(s, OPC_WSBH, a0, 0, a1);
-        tcg_out_opc_sa(s, OPC_ROTR, a0, a0, 16);
+    case INDEX_op_sar_i64:
+        if (c2) {
+            tcg_out_dsra(s, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DSRAV;
+        goto do_shiftv;
+    case INDEX_op_shl_i64:
+        if (c2) {
+            tcg_out_dsll(s, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DSLLV;
+        goto do_shiftv;
+    case INDEX_op_shr_i64:
+        if (c2) {
+            tcg_out_dsrl(s, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DSRLV;
+        goto do_shiftv;
+    case INDEX_op_rotr_i64:
+        if (c2) {
+            tcg_out_opc_sa64(s, OPC_DROTR, OPC_DROTR32, a0, a1, a2);
+            break;
+        }
+        i1 = OPC_DROTRV;
+        goto do_shiftv;
+    case INDEX_op_rotl_i64:
+        if (c2) {
+            tcg_out_opc_sa64(s, OPC_DROTR, OPC_DROTR32, a0, a1, 64 - a2);
+        } else {
+            tcg_out_opc_reg(s, OPC_DSUBU, TCG_TMP0, TCG_REG_ZERO, a2);
+            tcg_out_opc_reg(s, OPC_DROTRV, a0, TCG_TMP0, a1);
+        }
         break;
 
     case INDEX_op_deposit_i32:
         tcg_out_opc_bf(s, OPC_INS, a0, a2, args[3] + args[4] - 1, args[3]);
         break;
+    case INDEX_op_deposit_i64:
+        tcg_out_opc_bf64(s, OPC_DINS, OPC_DINSM, OPC_DINSU, a0, a2,
+                         args[3] + args[4] - 1, args[3]);
+        break;
 
     case INDEX_op_brcond_i32:
+    case INDEX_op_brcond_i64:
         tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
         break;
     case INDEX_op_brcond2_i32:
@@ -1648,10 +1924,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_movcond_i32:
+    case INDEX_op_movcond_i64:
         tcg_out_movcond(s, args[5], a0, a1, a2, args[3], args[4]);
         break;
 
     case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
         tcg_out_setcond(s, args[3], a0, a1, a2);
         break;
     case INDEX_op_setcond2_i32:
@@ -1681,7 +1959,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -1743,13 +2023,81 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_movcond_i32, { "r", "rZ", "rZ", "rZ", "0" } },
 #endif
     { INDEX_op_setcond_i32, { "r", "rZ", "rZ" } },
-    { INDEX_op_setcond2_i32, { "r", "rZ", "rZ", "rZ", "rZ" } },
 
+#if TCG_TARGET_REG_BITS == 32
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rN", "rN" } },
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rN", "rN" } },
+    { INDEX_op_setcond2_i32, { "r", "rZ", "rZ", "rZ", "rZ" } },
     { INDEX_op_brcond2_i32, { "rZ", "rZ", "rZ", "rZ" } },
+#endif
 
-#if TARGET_LONG_BITS == 32
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_ld8u_i64, { "r", "r" } },
+    { INDEX_op_ld8s_i64, { "r", "r" } },
+    { INDEX_op_ld16u_i64, { "r", "r" } },
+    { INDEX_op_ld16s_i64, { "r", "r" } },
+    { INDEX_op_ld32s_i64, { "r", "r" } },
+    { INDEX_op_ld32u_i64, { "r", "r" } },
+    { INDEX_op_ld_i64, { "r", "r" } },
+    { INDEX_op_st8_i64, { "rZ", "r" } },
+    { INDEX_op_st16_i64, { "rZ", "r" } },
+    { INDEX_op_st32_i64, { "rZ", "r" } },
+    { INDEX_op_st_i64, { "rZ", "r" } },
+
+    { INDEX_op_add_i64, { "r", "rZ", "rJ" } },
+    { INDEX_op_mul_i64, { "r", "rZ", "rZ" } },
+#if !use_mips32r6_instructions
+    { INDEX_op_muls2_i64, { "r", "r", "rZ", "rZ" } },
+    { INDEX_op_mulu2_i64, { "r", "r", "rZ", "rZ" } },
+#endif
+    { INDEX_op_mulsh_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_muluh_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_div_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_divu_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_rem_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_remu_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_sub_i64, { "r", "rZ", "rN" } },
+
+    { INDEX_op_and_i64, { "r", "rZ", "rIK" } },
+    { INDEX_op_nor_i64, { "r", "rZ", "rZ" } },
+    { INDEX_op_not_i64, { "r", "rZ" } },
+    { INDEX_op_or_i64, { "r", "rZ", "rI" } },
+    { INDEX_op_xor_i64, { "r", "rZ", "rI" } },
+
+    { INDEX_op_shl_i64, { "r", "rZ", "ri" } },
+    { INDEX_op_shr_i64, { "r", "rZ", "ri" } },
+    { INDEX_op_sar_i64, { "r", "rZ", "ri" } },
+    { INDEX_op_rotr_i64, { "r", "rZ", "ri" } },
+    { INDEX_op_rotl_i64, { "r", "rZ", "ri" } },
+
+    { INDEX_op_bswap16_i64, { "r", "r" } },
+    { INDEX_op_bswap32_i64, { "r", "r" } },
+    { INDEX_op_bswap64_i64, { "r", "r" } },
+
+    { INDEX_op_ext8s_i64, { "r", "rZ" } },
+    { INDEX_op_ext16s_i64, { "r", "rZ" } },
+    { INDEX_op_ext32s_i64, { "r", "rZ" } },
+    { INDEX_op_ext32u_i64, { "r", "rZ" } },
+    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
+    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
+    { INDEX_op_extrl_i64_i32, { "r", "rZ" } },
+    { INDEX_op_extrh_i64_i32, { "r", "rZ" } },
+
+    { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+
+    { INDEX_op_brcond_i64, { "rZ", "rZ" } },
+#if use_mips32r6_instructions
+    { INDEX_op_movcond_i64, { "r", "rZ", "rZ", "rZ", "rZ" } },
+#else
+    { INDEX_op_movcond_i64, { "r", "rZ", "rZ", "rZ", "0" } },
+#endif
+    { INDEX_op_setcond_i64, { "r", "rZ", "rZ" } },
+
+    { INDEX_op_qemu_ld_i32, { "L", "lZ" } },
+    { INDEX_op_qemu_st_i32, { "SZ", "SZ" } },
+    { INDEX_op_qemu_ld_i64, { "L", "lZ" } },
+    { INDEX_op_qemu_st_i64, { "SZ", "SZ" } },
+#elif TARGET_LONG_BITS == 32
     { INDEX_op_qemu_ld_i32, { "L", "lZ" } },
     { INDEX_op_qemu_st_i32, { "SZ", "SZ" } },
     { INDEX_op_qemu_ld_i64, { "L", "L", "lZ" } },
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index b1cda37..3de58ae 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -26,6 +26,7 @@
 #ifndef TCG_TARGET_MIPS 
 #define TCG_TARGET_MIPS 1
 
+#define TCG_TARGET_REG_BITS 32
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 #define TCG_TARGET_NB_REGS 32
@@ -117,6 +118,29 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
 
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_add2_i32         0
+#define TCG_TARGET_HAS_sub2_i32         0
+#define TCG_TARGET_HAS_extrl_i64_i32    1
+#define TCG_TARGET_HAS_extrh_i64_i32    1
+#define TCG_TARGET_HAS_div_i64          1
+#define TCG_TARGET_HAS_rem_i64          1
+#define TCG_TARGET_HAS_not_i64          1
+#define TCG_TARGET_HAS_nor_i64          1
+#define TCG_TARGET_HAS_andc_i64         0
+#define TCG_TARGET_HAS_orc_i64          0
+#define TCG_TARGET_HAS_eqv_i64          0
+#define TCG_TARGET_HAS_nand_i64         0
+#define TCG_TARGET_HAS_add2_i64         0
+#define TCG_TARGET_HAS_sub2_i64         0
+#define TCG_TARGET_HAS_mulu2_i64        1
+#define TCG_TARGET_HAS_muls2_i64        1
+#define TCG_TARGET_HAS_muluh_i64        1
+#define TCG_TARGET_HAS_mulsh_i64        1
+#define TCG_TARGET_HAS_ext32s_i64       1
+#define TCG_TARGET_HAS_ext32u_i64       1
+#endif
+
 /* optional instructions detected at runtime */
 #define TCG_TARGET_HAS_movcond_i32      use_movnz_instructions
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
@@ -126,11 +150,30 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
 
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_movcond_i64      use_movnz_instructions
+#define TCG_TARGET_HAS_bswap16_i64      use_mips32r2_instructions
+#define TCG_TARGET_HAS_bswap32_i64      use_mips32r2_instructions
+#define TCG_TARGET_HAS_bswap64_i64      use_mips32r2_instructions
+#define TCG_TARGET_HAS_deposit_i64      use_mips32r2_instructions
+#define TCG_TARGET_HAS_ext8s_i64        use_mips32r2_instructions
+#define TCG_TARGET_HAS_ext16s_i64       use_mips32r2_instructions
+#define TCG_TARGET_HAS_rot_i64          use_mips32r2_instructions
+#endif
+
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
 #define TCG_TARGET_HAS_ext8u_i32        0 /* andi rt, rs, 0xff   */
 #define TCG_TARGET_HAS_ext16u_i32       0 /* andi rt, rs, 0xffff */
 
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_neg_i64          0 /* sub  rd, zero, rt   */
+#define TCG_TARGET_HAS_ext8u_i64        0 /* andi rt, rs, 0xff   */
+#define TCG_TARGET_HAS_ext16u_i64       0 /* andi rt, rs, 0xffff */
+#endif
+
+#define TCG_TARGET_HAS_new_ldst         1
+
 #ifdef __OpenBSD__
 #include <machine/sysarch.h>
 #else
-- 
2.5.0


* [Qemu-devel] [PATCH 03/15] tcg-mips: Adjust move functions for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 01/15] tcg-mips: Add mips64 opcodes Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 04/15] tcg-mips: Adjust load/store " Richard Henderson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index e56dbc6..43210c5 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -536,23 +536,39 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
 {
     /* Simple reg-reg move, optimising out the 'do nothing' case */
     if (ret != arg) {
-        tcg_out_opc_reg(s, OPC_ADDU, ret, arg, TCG_REG_ZERO);
+        tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
     }
 }
 
-static inline void tcg_out_movi(TCGContext *s, TCGType type,
-                                TCGReg reg, tcg_target_long arg)
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long arg)
 {
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        arg = (int32_t)arg;
+    }
     if (arg == (int16_t)arg) {
-        tcg_out_opc_imm(s, OPC_ADDIU, reg, TCG_REG_ZERO, arg);
-    } else if (arg == (uint16_t)arg) {
-        tcg_out_opc_imm(s, OPC_ORI, reg, TCG_REG_ZERO, arg);
+        tcg_out_opc_imm(s, OPC_ADDIU, ret, TCG_REG_ZERO, arg);
+        return;
+    }
+    if (arg == (uint16_t)arg) {
+        tcg_out_opc_imm(s, OPC_ORI, ret, TCG_REG_ZERO, arg);
+        return;
+    }
+    if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
+        tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
     } else {
-        tcg_out_opc_imm(s, OPC_LUI, reg, TCG_REG_ZERO, arg >> 16);
-        if (arg & 0xffff) {
-            tcg_out_opc_imm(s, OPC_ORI, reg, reg, arg & 0xffff);
+        tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
+        if (arg & 0xffff0000ull) {
+            tcg_out_dsll(s, ret, ret, 16);
+            tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg >> 16);
+            tcg_out_dsll(s, ret, ret, 16);
+        } else {
+            tcg_out_dsll(s, ret, ret, 32);
         }
     }
+    if (arg & 0xffff) {
+        tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg & 0xffff);
+    }
 }
 
 static inline void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg)
-- 
2.5.0
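
The constant-loading sequence chosen by tcg_out_movi in the patch above can be checked with a small host-side model. The sketch below is not QEMU code: `MoviResult` and `model_movi` are hypothetical names, only the 64-bit host path is modelled, and the instruction counts simply follow the branches visible in the diff. It also assumes the usual two's-complement narrowing and arithmetic right shift that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not part of QEMU: the value left in the
   destination register and the number of instructions emitted. */
typedef struct {
    uint64_t value;
    int insns;
} MoviResult;

/* Models tcg_out_movi for a 64-bit host only (assumption). */
MoviResult model_movi(int64_t arg, int is_i32)
{
    MoviResult r;

    if (is_i32) {
        arg = (int32_t)arg;            /* keep the sign-extended low 32 bits */
    }
    if (arg == (int16_t)arg || arg == (uint16_t)arg) {
        r.value = (uint64_t)arg;       /* single ADDIU or ORI from $zero */
        r.insns = 1;
        return r;
    }
    if (arg == (int32_t)arg) {
        /* LUI: imm16 << 16, sign-extended to 64 bits */
        r.value = (uint64_t)(int64_t)(int32_t)((uint32_t)(arg >> 16) << 16);
        r.insns = 1;
    } else {
        r = model_movi(arg >> 32, 1);  /* build the upper 32 bits first */
        if (arg & 0xffff0000ull) {
            /* DSLL 16, ORI bits 31..16, DSLL 16 */
            r.value = (r.value << 16) | (((uint64_t)arg >> 16) & 0xffff);
            r.value <<= 16;
            r.insns += 3;
        } else {
            r.value <<= 32;            /* DSLL 32 */
            r.insns += 1;
        }
    }
    if (arg & 0xffff) {
        r.value |= (uint64_t)arg & 0xffff;   /* final ORI of bits 15..0 */
        r.insns += 1;
    }
    assert(r.value == (uint64_t)arg);  /* the sequence must rebuild arg */
    return r;
}
```

In this model a full 64-bit constant such as 0x123456789abcdef0 takes six instructions, while a value like 1 << 32 needs only two (ADDIU followed by DSLL 32).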


* [Qemu-devel] [PATCH 04/15] tcg-mips: Adjust load/store functions for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 03/15] tcg-mips: Adjust move functions for mips64 Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 05/15] tcg-mips: Adjust prologue " Richard Henderson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 43210c5..8f90360 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -733,7 +733,7 @@ static void tcg_out_ldst(TCGContext *s, MIPSInsn opc, TCGReg data,
     if (ofs != lo) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, ofs - lo);
         if (addr != TCG_REG_ZERO) {
-            tcg_out_opc_reg(s, OPC_ADDU, TCG_TMP0, TCG_TMP0, addr);
+            tcg_out_opc_reg(s, ALIAS_PADD, TCG_TMP0, TCG_TMP0, addr);
         }
         addr = TCG_TMP0;
     }
@@ -743,13 +743,21 @@ static void tcg_out_ldst(TCGContext *s, MIPSInsn opc, TCGReg data,
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, OPC_LW, arg, arg1, arg2);
+    MIPSInsn opc = OPC_LD;
+    if (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I32) {
+        opc = OPC_LW;
+    }
+    tcg_out_ldst(s, opc, arg, arg1, arg2);
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, OPC_SW, arg, arg1, arg2);
+    MIPSInsn opc = OPC_SD;
+    if (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I32) {
+        opc = OPC_SW;
+    }
+    tcg_out_ldst(s, opc, arg, arg1, arg2);
 }
 
 static inline void tcg_out_addi(TCGContext *s, TCGReg reg, TCGArg val)
-- 
2.5.0
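
As a rough illustration of the load/store helpers touched above, the sketch below models two decisions: how an out-of-range offset is split into a part that goes through a temporary register and a 16-bit immediate, and how the access width is picked. The names `OffsetSplit`, `split_offset`, `ModelOp` and `pick_load` are hypothetical. The assumption that `lo` is the sign-extended low 16 bits of `ofs` is inferred from the `ofs != lo` and `ofs - lo` context lines, since its definition is not shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types, not part of QEMU. */
typedef struct {
    int64_t hi;   /* part loaded into a temporary via tcg_out_movi */
    int16_t lo;   /* part encoded in the instruction's immediate field */
} OffsetSplit;

typedef enum { MODEL_LW, MODEL_LD } ModelOp;

/* Assumption: lo is the sign-extended low 16 bits of the offset. */
OffsetSplit split_offset(int64_t ofs)
{
    OffsetSplit r;
    r.lo = (int16_t)ofs;
    r.hi = ofs - r.lo;              /* zero when the offset fits in 16 bits */
    assert(r.hi + r.lo == ofs);
    assert((r.hi & 0xffff) == 0);   /* hi is always a multiple of 0x10000 */
    return r;
}

/* Mirrors the width selection in tcg_out_ld: a 32-bit host or a
   32-bit TCG type uses LW, everything else uses LD. */
ModelOp pick_load(int reg_bits, int type_is_i32)
{
    return (reg_bits == 32 || type_is_i32) ? MODEL_LW : MODEL_LD;
}
```

Because `lo` is signed, an offset such as 0x12348000 splits into hi = 0x12350000 and lo = -0x8000, which is why the high part is computed as `ofs - lo` rather than by masking.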


* [Qemu-devel] [PATCH 05/15] tcg-mips: Adjust prologue for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 04/15] tcg-mips: Adjust load/store " Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 06/15] tcg-mips: Add tcg unwind info Richard Henderson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 57 ++++++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 8f90360..89083fb 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -760,16 +760,6 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
     tcg_out_ldst(s, opc, arg, arg1, arg2);
 }
 
-static inline void tcg_out_addi(TCGContext *s, TCGReg reg, TCGArg val)
-{
-    if (val == (int16_t)val) {
-        tcg_out_opc_imm(s, OPC_ADDIU, reg, reg, val);
-    } else {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, val);
-        tcg_out_opc_reg(s, OPC_ADDU, reg, reg, TCG_TMP0);
-    }
-}
-
 static void tcg_out_addsub2(TCGContext *s, TCGReg rl, TCGReg rh, TCGReg al,
                             TCGReg ah, TCGArg bl, TCGArg bh, bool cbl,
                             bool cbh, bool is_sub)
@@ -2227,41 +2217,48 @@ static void tcg_target_detect_isa(void)
     sigaction(SIGILL, &sa_old, NULL);
 }
 
+/* Stack frame parameters.  */
+#define REG_SIZE   (TCG_TARGET_REG_BITS / 8)
+#define SAVE_SIZE  ((int)ARRAY_SIZE(tcg_target_callee_save_regs) * REG_SIZE)
+#define TEMP_SIZE  (CPU_TEMP_BUF_NLONGS * (int)sizeof(long))
+
+#define FRAME_SIZE ((TCG_STATIC_CALL_ARGS_SIZE + TEMP_SIZE + SAVE_SIZE \
+                     + TCG_TARGET_STACK_ALIGN - 1) \
+                    & -TCG_TARGET_STACK_ALIGN)
+#define SAVE_OFS   (TCG_STATIC_CALL_ARGS_SIZE + TEMP_SIZE)
+
+/* We're expecting to be able to use an immediate for frame allocation.  */
+QEMU_BUILD_BUG_ON(FRAME_SIZE > 0x7fff);
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    int i, frame_size;
-
-    /* reserve some stack space, also for TCG temps. */
-    frame_size = ARRAY_SIZE(tcg_target_callee_save_regs) * 4
-                 + TCG_STATIC_CALL_ARGS_SIZE
-                 + CPU_TEMP_BUF_NLONGS * sizeof(long);
-    frame_size = (frame_size + TCG_TARGET_STACK_ALIGN - 1) &
-                 ~(TCG_TARGET_STACK_ALIGN - 1);
-    tcg_set_frame(s, TCG_REG_SP, ARRAY_SIZE(tcg_target_callee_save_regs) * 4
-                  + TCG_STATIC_CALL_ARGS_SIZE,
-                  CPU_TEMP_BUF_NLONGS * sizeof(long));
+    int i;
+
+    tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE, TEMP_SIZE);
 
     /* TB prologue */
-    tcg_out_addi(s, TCG_REG_SP, -frame_size);
-    for(i = 0 ; i < ARRAY_SIZE(tcg_target_callee_save_regs) ; i++) {
-        tcg_out_st(s, TCG_TYPE_I32, tcg_target_callee_save_regs[i],
-                   TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE + i * 4);
+    tcg_out_opc_imm(s, ALIAS_PADDI, TCG_REG_SP, TCG_REG_SP, -FRAME_SIZE);
+    for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
+        tcg_out_st(s, TCG_TYPE_REG, tcg_target_callee_save_regs[i],
+                   TCG_REG_SP, SAVE_OFS + i * REG_SIZE);
     }
 
     /* Call generated code */
     tcg_out_opc_reg(s, OPC_JR, 0, tcg_target_call_iarg_regs[1], 0);
+    /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
-    tb_ret_addr = s->code_ptr;
 
     /* TB epilogue */
-    for(i = 0 ; i < ARRAY_SIZE(tcg_target_callee_save_regs) ; i++) {
-        tcg_out_ld(s, TCG_TYPE_I32, tcg_target_callee_save_regs[i],
-                   TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE + i * 4);
+    tb_ret_addr = s->code_ptr;
+    for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
+        tcg_out_ld(s, TCG_TYPE_REG, tcg_target_callee_save_regs[i],
+                   TCG_REG_SP, SAVE_OFS + i * REG_SIZE);
     }
 
     tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_RA, 0);
-    tcg_out_addi(s, TCG_REG_SP, frame_size);
+    /* delay slot */
+    tcg_out_opc_imm(s, ALIAS_PADDI, TCG_REG_SP, TCG_REG_SP, FRAME_SIZE);
 }
 
 static void tcg_target_init(TCGContext *s)
-- 
2.5.0
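
The frame layout introduced by the FRAME_SIZE and SAVE_OFS macros above can be worked through numerically. The following is a minimal sketch with hypothetical function names. The concrete sizes used in the usage note (128 bytes of call arguments, 1024 bytes of temporaries, ten 8-byte callee-saved registers, 16-byte alignment) are illustrative assumptions, not values taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper: total frame size, rounded up to a multiple of
   align.  As in the macro, "& -align" only works for powers of two. */
int frame_size(int call_args, int temp, int save, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (call_args + temp + save + align - 1) & -align;
}

/* Hypothetical helper: stack offset of the i-th callee-saved register.
   The save area sits above the call-argument and temp-buffer areas. */
int save_offset(int call_args, int temp, int i, int reg_size)
{
    return call_args + temp + i * reg_size;
}
```

With the assumed sizes, the three areas add up to 1232 bytes, which is already a multiple of 16, and the last of the ten save slots starts at offset 1224. Both stay well below the 0x7fff limit checked by QEMU_BUILD_BUG_ON, so the stack adjustment fits in a single immediate.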


* [Qemu-devel] [PATCH 06/15] tcg-mips: Add tcg unwind info
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 05/15] tcg-mips: Adjust prologue " Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64 Richard Henderson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 89083fb..e986437 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -2302,3 +2302,47 @@ void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
     *ptr = deposit32(*ptr, 0, 26, addr >> 2);
     flush_icache_range(jmp_addr, jmp_addr + 4);
 }
+
+typedef struct {
+    DebugFrameHeader h;
+    uint8_t fde_def_cfa[4];
+    uint8_t fde_reg_ofs[ARRAY_SIZE(tcg_target_callee_save_regs) * 2];
+} DebugFrame;
+
+#define ELF_HOST_MACHINE EM_MIPS
+/* GDB doesn't appear to require proper setting of ELF_HOST_FLAGS,
+   which is good because they're really quite complicated for MIPS.  */
+
+static const DebugFrame debug_frame = {
+    .h.cie.len = sizeof(DebugFrameCIE)-4, /* length after .len member */
+    .h.cie.id = -1,
+    .h.cie.version = 1,
+    .h.cie.code_align = 1,
+    .h.cie.data_align = -(TCG_TARGET_REG_BITS / 8) & 0x7f, /* sleb128 */
+    .h.cie.return_column = TCG_REG_RA,
+
+    /* Total FDE size does not include the "len" member.  */
+    .h.fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, h.fde.cie_offset),
+
+    .fde_def_cfa = {
+        12, TCG_REG_SP,                 /* DW_CFA_def_cfa sp, ... */
+        (FRAME_SIZE & 0x7f) | 0x80,     /* ... uleb128 FRAME_SIZE */
+        (FRAME_SIZE >> 7)
+    },
+    .fde_reg_ofs = {
+        0x80 + 16, 9,                   /* DW_CFA_offset, s0, -72 */
+        0x80 + 17, 8,                   /* DW_CFA_offset, s1, -64 */
+        0x80 + 18, 7,                   /* DW_CFA_offset, s2, -56 */
+        0x80 + 19, 6,                   /* DW_CFA_offset, s3, -48 */
+        0x80 + 20, 5,                   /* DW_CFA_offset, s4, -40 */
+        0x80 + 21, 4,                   /* DW_CFA_offset, s5, -32 */
+        0x80 + 22, 3,                   /* DW_CFA_offset, s6, -24 */
+        0x80 + 30, 2,                   /* DW_CFA_offset, s8, -16 */
+        0x80 + 31, 1,                   /* DW_CFA_offset, ra,  -8 */
+    }
+};
+
+void tcg_register_jit(void *buf, size_t buf_size)
+{
+    tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
+}
-- 
2.5.0
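
The hand-encoded bytes in debug_frame above rely on DWARF's LEB128 encodings: FRAME_SIZE is written as a two-byte unsigned LEB128, and the data alignment factor as a one-byte signed LEB128. The sketch below shows general-purpose encoders so the byte values can be checked. `encode_uleb128` and `encode_sleb128` are hypothetical helpers, not QEMU functions, and the signed variant assumes arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned LEB128, 7 bits per byte, low bits
   first, with the high bit set on every byte except the last. */
size_t encode_uleb128(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;           /* more bytes follow */
        }
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Hypothetical helper: signed LEB128.  Encoding stops once the
   remaining value is pure sign extension of bit 6 of the last byte. */
size_t encode_sleb128(int64_t value, uint8_t *out)
{
    size_t n = 0;
    int more = 1;
    while (more) {
        uint8_t byte = value & 0x7f;
        value >>= 7;                /* assumes arithmetic shift */
        if ((value == 0 && !(byte & 0x40)) ||
            (value == -1 && (byte & 0x40))) {
            more = 0;
        } else {
            byte |= 0x80;
        }
        out[n++] = byte;
    }
    assert(n > 0);
    return n;
}
```

For example, an assumed frame size of 1232 encodes as 0xd0 0x09, matching the `(FRAME_SIZE & 0x7f) | 0x80, FRAME_SIZE >> 7` pattern, which covers sizes below 1 << 14. A data alignment of -8 encodes as the single byte 0x78, matching `-(TCG_TARGET_REG_BITS / 8) & 0x7f` on a 64-bit host. With that alignment, a DW_CFA_offset operand of 1 places the register at CFA - 8.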


* [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 06/15] tcg-mips: Add tcg unwind info Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-10 16:34   ` James Hogan
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 08/15] tcg-mips: Adjust calling conventions " Richard Henderson
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

At the same time, use extract in the tlb_load for mips32r2.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 239 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 163 insertions(+), 76 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index e986437..242db14 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -33,8 +33,14 @@
 # define MIPS_BE  0
 #endif
 
-#define LO_OFF    (MIPS_BE * 4)
-#define HI_OFF    (4 - LO_OFF)
+#if TCG_TARGET_REG_BITS == 32
+# define LO_OFF  (MIPS_BE * 4)
+# define HI_OFF  (4 - LO_OFF)
+#else
+extern int link_error(void);
+# define LO_OFF  link_error()
+# define HI_OFF  link_error()
+#endif
 
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
@@ -188,7 +194,7 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_set(ct->u.regs, 0xffffffff);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_A0);
 #if defined(CONFIG_SOFTMMU)
-        if (TARGET_LONG_BITS == 64) {
+        if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
             tcg_regset_reset_reg(ct->u.regs, TCG_REG_A2);
         }
 #endif
@@ -198,11 +204,11 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_set(ct->u.regs, 0xffffffff);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_A0);
 #if defined(CONFIG_SOFTMMU)
-        if (TARGET_LONG_BITS == 32) {
-            tcg_regset_reset_reg(ct->u.regs, TCG_REG_A1);
-        } else {
+        if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
             tcg_regset_reset_reg(ct->u.regs, TCG_REG_A2);
             tcg_regset_reset_reg(ct->u.regs, TCG_REG_A3);
+        } else {
+            tcg_regset_reset_reg(ct->u.regs, TCG_REG_A1);
         }
 #endif
         break;
@@ -726,6 +732,16 @@ static inline void tcg_out_ext16s(TCGContext *s, TCGReg ret, TCGReg arg)
     }
 }
 
+static inline void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    if (use_mips32r2_instructions) {
+        tcg_out_opc_bf(s, OPC_DEXT, ret, arg, 31, 0);
+    } else {
+        tcg_out_dsll(s, ret, arg, 32);
+        tcg_out_dsrl(s, ret, ret, 32);
+    }
+}
+
 static void tcg_out_ldst(TCGContext *s, MIPSInsn opc, TCGReg data,
                          TCGReg addr, intptr_t ofs)
 {
@@ -1124,6 +1140,10 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BESW] = helper_be_ldsw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+#if TCG_TARGET_REG_BITS == 64
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
+#endif
 };
 
 static void * const qemu_st_helpers[16] = {
@@ -1151,6 +1171,9 @@ static int tcg_out_call_iarg_reg(TCGContext *s, int i, TCGReg arg)
     if (i < ARRAY_SIZE(tcg_target_call_iarg_regs)) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[i], arg);
     } else {
+        /* For N32 and N64, the initial offset is different.  But there
+           we also have 8 argument registers so we don't run out here.  */
+        assert(TCG_TARGET_REG_BITS == 32);
         tcg_out_st(s, TCG_TYPE_REG, arg, TCG_REG_SP, 4 * i);
     }
     return i + 1;
@@ -1192,6 +1215,7 @@ static int tcg_out_call_iarg_imm(TCGContext *s, int i, TCGArg arg)
 
 static int tcg_out_call_iarg_reg2(TCGContext *s, int i, TCGReg al, TCGReg ah)
 {
+    assert(TCG_TARGET_REG_BITS == 32);
     i = (i + 1) & ~1;
     i = tcg_out_call_iarg_reg(s, i, (MIPS_BE ? ah : al));
     i = tcg_out_call_iarg_reg(s, i, (MIPS_BE ? al : ah));
@@ -1205,6 +1229,7 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
                              tcg_insn_unit *label_ptr[2], bool is_load)
 {
     TCGMemOp s_bits = get_memop(oi) & MO_SIZE;
+    target_ulong mask = TARGET_PAGE_MASK | ((1 << s_bits) - 1);
     int mem_index = get_mmuidx(oi);
     int cmp_off
         = (is_load
@@ -1212,11 +1237,24 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
            : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
 
-    tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addrl,
-                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-    tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
-                    (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
-    tcg_out_opc_reg(s, OPC_ADDU, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
+    if (use_mips32r2_instructions) {
+        if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
+            tcg_out_opc_bf(s, OPC_EXT, TCG_REG_A0, addrl,
+                           TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
+                           CPU_TLB_ENTRY_BITS);
+        } else {
+            tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU,
+                             TCG_REG_A0, addrl,
+                             TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
+                             CPU_TLB_ENTRY_BITS);
+        }
+    } else {
+        tcg_out_opc_sa(s, ALIAS_TSRL, TCG_REG_A0, addrl,
+                       TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
+                        (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+    }
+    tcg_out_opc_reg(s, ALIAS_PADD, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
 
     /* Compensate for very large offsets.  */
     if (add_off >= 0x8000) {
@@ -1226,43 +1264,48 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
         QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
                                    tlb_table[NB_MMU_MODES - 1][1])
                           > 0x7ff0 + 0x7fff);
-        tcg_out_opc_imm(s, OPC_ADDIU, TCG_REG_A0, TCG_REG_A0, 0x7ff0);
+        tcg_out_opc_imm(s, ALIAS_PADDI, TCG_REG_A0, TCG_REG_A0, 0x7ff0);
         cmp_off -= 0x7ff0;
         add_off -= 0x7ff0;
     }
 
-    /* Load the (low half) tlb comparator.  */
-    tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, TCG_REG_A0,
-                    cmp_off + (TARGET_LONG_BITS == 64 ? LO_OFF : 0));
-
-    /* Mask the page bits, keeping the alignment bits to compare against.
-       In between on 32-bit targets, load the tlb addend for the fast path.  */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1,
-                 TARGET_PAGE_MASK | ((1 << s_bits) - 1));
-    if (TARGET_LONG_BITS == 32) {
-        tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, TCG_REG_A0, add_off);
+    /* Load the (low half) tlb comparator.  Mask the page bits, keeping the
+       alignment bits to compare against.  */
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
+        tcg_out_ld(s, TCG_TYPE_I32, TCG_TMP0, TCG_REG_A0, cmp_off + LO_OFF);
+        tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, mask);
+    } else {
+        tcg_out_ld(s, TCG_TYPE_TL, TCG_TMP0, TCG_REG_A0, cmp_off);
+        tcg_out_movi(s, TCG_TYPE_TL, TCG_TMP1, mask);
+        /* No second compare is required here;
+           load the tlb addend for the fast path.  */
+        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_A0, TCG_REG_A0, add_off);
     }
     tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
 
+    /* Zero extend a 32-bit guest address for a 64-bit host.  */
+    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+        tcg_out_ext32u(s, base, addrl);
+        addrl = base;
+    }
+
     label_ptr[0] = s->code_ptr;
     tcg_out_opc_br(s, OPC_BNE, TCG_TMP1, TCG_TMP0);
 
     /* Load and test the high half tlb comparator.  */
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         /* delay slot */
-        tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, TCG_REG_A0, cmp_off + HI_OFF);
+        tcg_out_ld(s, TCG_TYPE_I32, TCG_TMP0, TCG_REG_A0, cmp_off + HI_OFF);
 
-        /* Load the tlb addend for the fast path. We can't do it earlier with
-           64-bit targets or we'll clobber a0 before reading the high half tlb
-           comparator.  */
-        tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, TCG_REG_A0, add_off);
+        /* Load the tlb addend for the fast path.  */
+        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_A0, TCG_REG_A0, add_off);
 
         label_ptr[1] = s->code_ptr;
         tcg_out_opc_br(s, OPC_BNE, addrh, TCG_TMP0);
     }
 
     /* delay slot */
-    tcg_out_opc_reg(s, OPC_ADDU, base, TCG_REG_A0, addrl);
+    tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_REG_A0, addrl);
 }
 
 static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOpIdx oi,
@@ -1280,7 +1323,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOpIdx oi,
     label->addrhi_reg = addrhi;
     label->raddr = raddr;
     label->label_ptr[0] = label_ptr[0];
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         label->label_ptr[1] = label_ptr[1];
     }
 }
@@ -1294,12 +1337,12 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* resolve label address */
     reloc_pc16(l->label_ptr[0], s->code_ptr);
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         reloc_pc16(l->label_ptr[1], s->code_ptr);
     }
 
     i = 1;
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         i = tcg_out_call_iarg_reg2(s, i, l->addrlo_reg, l->addrhi_reg);
     } else {
         i = tcg_out_call_iarg_reg(s, i, l->addrlo_reg);
@@ -1311,7 +1354,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
 
     v0 = l->datalo_reg;
-    if ((opc & MO_SIZE) == MO_64) {
+    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
         /* We eliminated V0 from the possible output registers, so it
            cannot be clobbered here.  So we must move V1 first.  */
         if (MIPS_BE) {
@@ -1337,12 +1380,12 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* resolve label address */
     reloc_pc16(l->label_ptr[0], s->code_ptr);
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         reloc_pc16(l->label_ptr[1], s->code_ptr);
     }
 
     i = 1;
-    if (TARGET_LONG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         i = tcg_out_call_iarg_reg2(s, i, l->addrlo_reg, l->addrhi_reg);
     } else {
         i = tcg_out_call_iarg_reg(s, i, l->addrlo_reg);
@@ -1354,14 +1397,15 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     case MO_16:
         i = tcg_out_call_iarg_reg16(s, i, l->datalo_reg);
         break;
-    case MO_32:
-        i = tcg_out_call_iarg_reg(s, i, l->datalo_reg);
-        break;
     case MO_64:
-        i = tcg_out_call_iarg_reg2(s, i, l->datalo_reg, l->datahi_reg);
-        break;
+        if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
+            i = tcg_out_call_iarg_reg2(s, i, l->datalo_reg, l->datahi_reg);
+            break;
+        }
+        /* FALLTHRU */
     default:
-        tcg_abort();
+        i = tcg_out_call_iarg_reg(s, i, l->datalo_reg);
+        break;
     }
     i = tcg_out_call_iarg_imm(s, i, oi);
 
@@ -1376,7 +1420,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 #endif
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
-                                   TCGReg base, TCGMemOp opc)
+                                   TCGReg base, TCGMemOp opc, bool is_64)
 {
     switch (opc & (MO_SSIZE | MO_BSWAP)) {
     case MO_UB:
@@ -1385,6 +1429,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     case MO_SB:
         tcg_out_opc_imm(s, OPC_LB, datalo, base, 0);
         break;
+
     case MO_UW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
         tcg_out_bswap16(s, datalo, TCG_TMP1);
@@ -1392,6 +1437,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     case MO_UW:
         tcg_out_opc_imm(s, OPC_LHU, datalo, base, 0);
         break;
+
     case MO_SW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
         tcg_out_bswap16s(s, datalo, TCG_TMP1);
@@ -1399,22 +1445,47 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     case MO_SW:
         tcg_out_opc_imm(s, OPC_LH, datalo, base, 0);
         break;
+
     case MO_UL | MO_BSWAP:
+        if (TCG_TARGET_REG_BITS == 64 && is_64) {
+            tcg_out_opc_imm(s, OPC_LWU, TCG_TMP1, base, 0);
+            tcg_out_bswap32u(s, datalo, TCG_TMP1);
+            break;
+        }
+        /* FALLTHRU */
+    case MO_SL | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, 0);
         tcg_out_bswap32(s, datalo, TCG_TMP1);
         break;
+
     case MO_UL:
+        if (TCG_TARGET_REG_BITS == 64 && is_64) {
+            tcg_out_opc_imm(s, OPC_LWU, datalo, base, 0);
+            break;
+        }
+        /* FALLTHRU */
+    case MO_SL:
         tcg_out_opc_imm(s, OPC_LW, datalo, base, 0);
         break;
+
     case MO_Q | MO_BSWAP:
-        tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, HI_OFF);
-        tcg_out_bswap32(s, datalo, TCG_TMP1);
-        tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, LO_OFF);
-        tcg_out_bswap32(s, datahi, TCG_TMP1);
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, HI_OFF);
+            tcg_out_bswap32(s, datalo, TCG_TMP1);
+            tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, LO_OFF);
+            tcg_out_bswap32(s, datahi, TCG_TMP1);
+        } else {
+            tcg_out_opc_imm(s, OPC_LD, TCG_REG_V0, base, 0);
+            tcg_out_bswap64(s, datalo, TCG_REG_V0);
+        }
         break;
     case MO_Q:
-        tcg_out_opc_imm(s, OPC_LW, datalo, base, LO_OFF);
-        tcg_out_opc_imm(s, OPC_LW, datahi, base, HI_OFF);
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_opc_imm(s, OPC_LW, datalo, base, LO_OFF);
+            tcg_out_opc_imm(s, OPC_LW, datahi, base, HI_OFF);
+        } else {
+            tcg_out_opc_imm(s, OPC_LD, datalo, base, 0);
+        }
         break;
     default:
         tcg_abort();
@@ -1435,33 +1506,41 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
     TCGReg base = TCG_REG_V0;
 
     data_regl = *args++;
-    data_regh = (is_64 ? *args++ : 0);
+    data_regh = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
     addr_regl = *args++;
-    addr_regh = (TARGET_LONG_BITS == 64 ? *args++ : 0);
+    addr_regh = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
     oi = *args++;
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
     tcg_out_tlb_load(s, base, addr_regl, addr_regh, oi, label_ptr, 1);
-    tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc);
+    tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, is_64);
     add_qemu_ldst_label(s, 1, oi, data_regl, data_regh, addr_regl, addr_regh,
                         s->code_ptr, label_ptr);
 #else
+    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+        tcg_out_ext32u(s, base, addr_regl);
+        addr_regl = base;
+    }
     if (guest_base == 0 && data_regl != addr_regl) {
         base = addr_regl;
     } else if (guest_base == (int16_t)guest_base) {
-        tcg_out_opc_imm(s, OPC_ADDIU, base, addr_regl, guest_base);
+        tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base);
     } else {
         tcg_out_movi(s, TCG_TYPE_PTR, base, guest_base);
-        tcg_out_opc_reg(s, OPC_ADDU, base, base, addr_regl);
+        tcg_out_opc_reg(s, ALIAS_PADD, base, base, addr_regl);
     }
-    tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc);
+    tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, is_64);
 #endif
 }
 
 static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, TCGMemOp opc)
 {
+    if ((datalo | datahi) == 0) {
+        opc &= ~MO_BSWAP;
+    }
+
     switch (opc & (MO_SIZE | MO_BSWAP)) {
     case MO_8:
         tcg_out_opc_imm(s, OPC_SB, datalo, base, 0);
@@ -1485,14 +1564,25 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         break;
 
     case MO_64 | MO_BSWAP:
-        tcg_out_bswap32(s, TCG_TMP1, datalo);
-        tcg_out_opc_imm(s, OPC_SW, TCG_TMP1, base, HI_OFF);
-        tcg_out_bswap32(s, TCG_TMP1, datahi);
-        tcg_out_opc_imm(s, OPC_SW, TCG_TMP1, base, LO_OFF);
-        break;
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_bswap32(s, TCG_TMP1, datalo);
+            datalo = TCG_TMP1;
+            tcg_out_opc_imm(s, OPC_SW, datalo, base, HI_OFF);
+            tcg_out_bswap32(s, TCG_TMP1, datahi);
+            datahi = TCG_TMP1;
+            tcg_out_opc_imm(s, OPC_SW, datahi, base, LO_OFF);
+            break;
+        }
+        tcg_out_bswap64(s, TCG_REG_A1, datalo);
+        datalo = TCG_REG_A1;
+        /* FALLTHRU */
     case MO_64:
-        tcg_out_opc_imm(s, OPC_SW, datalo, base, LO_OFF);
-        tcg_out_opc_imm(s, OPC_SW, datahi, base, HI_OFF);
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_opc_imm(s, OPC_SW, datalo, base, LO_OFF);
+            tcg_out_opc_imm(s, OPC_SW, datahi, base, HI_OFF);
+        } else {
+            tcg_out_opc_imm(s, OPC_SD, datalo, base, 0);
+        }
         break;
 
     default:
@@ -1511,9 +1601,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 
     data_regl = *args++;
-    data_regh = (is_64 ? *args++ : 0);
+    data_regh = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
     addr_regl = *args++;
-    addr_regh = (TARGET_LONG_BITS == 64 ? *args++ : 0);
+    addr_regh = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
     oi = *args++;
     opc = get_memop(oi);
 
@@ -1526,16 +1616,18 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
     add_qemu_ldst_label(s, 0, oi, data_regl, data_regh, addr_regl, addr_regh,
                         s->code_ptr, label_ptr);
 #else
+    base = TCG_REG_A0;
+    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+        tcg_out_ext32u(s, base, addr_regl);
+        addr_regl = base;
+    }
     if (guest_base == 0) {
         base = addr_regl;
+    } else if (guest_base == (int16_t)guest_base) {
+        tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base);
     } else {
-        base = TCG_REG_A0;
-        if (guest_base == (int16_t)guest_base) {
-            tcg_out_opc_imm(s, OPC_ADDIU, base, addr_regl, guest_base);
-        } else {
-            tcg_out_movi(s, TCG_TYPE_PTR, base, guest_base);
-            tcg_out_opc_reg(s, OPC_ADDU, base, base, addr_regl);
-        }
+        tcg_out_movi(s, TCG_TYPE_PTR, base, guest_base);
+        tcg_out_opc_reg(s, ALIAS_PADD, base, base, addr_regl);
     }
     tcg_out_qemu_st_direct(s, data_regl, data_regh, base, opc);
 #endif
@@ -1849,12 +1941,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_ext32u_i64:
     case INDEX_op_extu_i32_i64:
-        if (use_mips32r2_instructions) {
-            tcg_out_opc_bf(s, OPC_DEXT, a0, a1, 31, 0);
-        } else {
-            tcg_out_dsll(s, a0, a1, 32);
-            tcg_out_dsrl(s, a0, a0, 32);
-        }
+        tcg_out_ext32u(s, a0, a1);
         break;
 
     case INDEX_op_sar_i32:
-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
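
For readers following the !CONFIG_SOFTMMU path in the tcg_out_qemu_st
hunk above, the address computation it emits can be modelled in plain
host-side C.  This is only a sketch: ext32u_model, fits_simm16 and
host_addr_model are hypothetical names invented for illustration and are
not part of QEMU, and the guest_is_32bit flag stands in for the
compile-time test TCG_TARGET_REG_BITS > TARGET_LONG_BITS.

```c
#include <stdint.h>

/* Hypothetical model of tcg_out_ext32u: the dsll 32 / dsrl 32 pair
   (or DEXT on r2) leaves the low 32 bits zero-extended.  */
static uint64_t ext32u_model(uint64_t x)
{
    return (x << 32) >> 32;
}

/* Models the guest_base == (int16_t)guest_base test: does the value
   fit the signed 16-bit immediate of a single add-immediate insn?  */
static int fits_simm16(int64_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Host address used by the store.  A 32-bit guest address held in a
   64-bit host register is zero-extended before guest_base is added.  */
static uint64_t host_addr_model(uint64_t addr_reg, uint64_t guest_base,
                                int guest_is_32bit)
{
    if (guest_is_32bit) {
        addr_reg = ext32u_model(addr_reg);
    }
    return addr_reg + guest_base;
}
```

With guest_is_32bit set, garbage in the upper half of the address
register does not leak into the host address; fits_simm16 decides
between the one-insn and the movi-plus-add forms in the hunk.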

* [Qemu-devel] [PATCH 08/15] tcg-mips: Adjust calling conventions for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64 Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 09/15] tcg-mips: Fix exit_tb " Richard Henderson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 21 +++++++++++++++------
 tcg/mips/tcg-target.h | 19 +++++++++++++++----
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 242db14..b5982de 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -96,10 +96,6 @@ static const TCGReg tcg_target_reg_alloc_order[] = {
     TCG_REG_S8,
 
     /* Call clobbered registers.  */
-    TCG_REG_T0,
-    TCG_REG_T1,
-    TCG_REG_T2,
-    TCG_REG_T3,
     TCG_REG_T4,
     TCG_REG_T5,
     TCG_REG_T6,
@@ -110,17 +106,27 @@ static const TCGReg tcg_target_reg_alloc_order[] = {
     TCG_REG_V0,
 
     /* Argument registers, opposite order of allocation.  */
+    TCG_REG_T3,
+    TCG_REG_T2,
+    TCG_REG_T1,
+    TCG_REG_T0,
     TCG_REG_A3,
     TCG_REG_A2,
     TCG_REG_A1,
     TCG_REG_A0,
 };
 
-static const TCGReg tcg_target_call_iarg_regs[4] = {
+static const TCGReg tcg_target_call_iarg_regs[] = {
     TCG_REG_A0,
     TCG_REG_A1,
     TCG_REG_A2,
-    TCG_REG_A3
+    TCG_REG_A3,
+#if _MIPS_SIM == _ABIN32 || _MIPS_SIM == _ABI64
+    TCG_REG_T0,
+    TCG_REG_T1,
+    TCG_REG_T2,
+    TCG_REG_T3,
+#endif
 };
 
 static const TCGReg tcg_target_call_oarg_regs[2] = {
@@ -2352,6 +2358,9 @@ static void tcg_target_init(TCGContext *s)
 {
     tcg_target_detect_isa();
     tcg_regset_set(tcg_target_available_regs[TCG_TYPE_I32], 0xffffffff);
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_regset_set(tcg_target_available_regs[TCG_TYPE_I64], 0xffffffff);
+    }
     tcg_regset_set(tcg_target_call_clobber_regs,
                    (1 << TCG_REG_V0) |
                    (1 << TCG_REG_V1) |
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 3de58ae..0dab62b 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -26,7 +26,14 @@
 #ifndef TCG_TARGET_MIPS 
 #define TCG_TARGET_MIPS 1
 
-#define TCG_TARGET_REG_BITS 32
+#if _MIPS_SIM == _ABIO32
+# define TCG_TARGET_REG_BITS 32
+#elif _MIPS_SIM == _ABIN32 || _MIPS_SIM == _ABI64
+# define TCG_TARGET_REG_BITS 64
+#else
+# error "Unknown ABI"
+#endif
+
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 #define TCG_TARGET_NB_REGS 32
@@ -70,9 +77,13 @@ typedef enum {
 } TCGReg;
 
 /* used for function call generation */
-#define TCG_TARGET_STACK_ALIGN 8
-#define TCG_TARGET_CALL_STACK_OFFSET 16
-#define TCG_TARGET_CALL_ALIGN_ARGS 1
+#define TCG_TARGET_STACK_ALIGN        16
+#if _MIPS_SIM == _ABIO32
+# define TCG_TARGET_CALL_STACK_OFFSET 16
+#else
+# define TCG_TARGET_CALL_STACK_OFFSET 0
+#endif
+#define TCG_TARGET_CALL_ALIGN_ARGS    1
 
 /* MOVN/MOVZ instructions detection */
 #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 1)) || \
-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
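
The ABI-dependent values selected by the #if blocks in the patch above
can be summarised as a small table.  The sketch below is a host-side
model only: MipsAbi, AbiParams and abi_params are hypothetical names,
and the enum stands in for the compile-time _MIPS_SIM test, which is
only defined by MIPS compilers.  The numbers are the ones the diff
sets.

```c
/* Hypothetical stand-in for the compile-time _MIPS_SIM value.  */
typedef enum { ABI_O32, ABI_N32, ABI_N64 } MipsAbi;

typedef struct {
    int reg_bits;           /* TCG_TARGET_REG_BITS */
    int num_iarg_regs;      /* entries in tcg_target_call_iarg_regs */
    int call_stack_offset;  /* TCG_TARGET_CALL_STACK_OFFSET */
    int stack_align;        /* TCG_TARGET_STACK_ALIGN */
} AbiParams;

static AbiParams abi_params(MipsAbi abi)
{
    AbiParams p;

    p.stack_align = 16;              /* same for every ABI after the patch */
    if (abi == ABI_O32) {
        p.reg_bits = 32;
        p.num_iarg_regs = 4;         /* A0..A3 */
        p.call_stack_offset = 16;
    } else {
        p.reg_bits = 64;             /* N32 and N64 both use 64-bit regs */
        p.num_iarg_regs = 8;         /* A0..A3 plus T0..T3 */
        p.call_stack_offset = 0;
    }
    return p;
}
```

Note that N32 is grouped with N64 here even though its pointers are
32 bits wide, because the diff keys TCG_TARGET_REG_BITS on register
width rather than pointer width.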

* [Qemu-devel] [PATCH 09/15] tcg-mips: Fix exit_tb for mips64
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 08/15] tcg-mips: Adjust calling conventions " Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 10/15] tcg-mips: Move bswap code to subroutines Richard Henderson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index b5982de..b8c5d90 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -1656,6 +1656,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         {
             TCGReg b0 = TCG_REG_ZERO;
 
+            a0 = (intptr_t)a0;
             if (a0 & ~0xffff) {
                 tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_V0, a0 & ~0xffff);
                 b0 = TCG_REG_V0;
-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 10/15] tcg-mips: Move bswap code to subroutines
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 09/15] tcg-mips: Fix exit_tb " Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi Richard Henderson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Without the mips32r2 / mips64r2 byte-swap instructions, the inline
32-bit and 64-bit bswap sequences are quite large.  Move them to
subroutines in the prologue block to minimize code bloat.
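
As an illustration of why the inline form is bulky, the shift-and-mask
sequence used by the 32-bit subroutine can be modelled in host-side C,
one statement per emitted instruction.  This is a sketch, not QEMU
code: bswap32_model is a hypothetical name, and the variables v0, t0
and t1 merely mirror TCG_REG_V0, TCG_TMP0 and TCG_TMP1.

```c
#include <stdint.h>

/* Hypothetical C model of the pre-r2 bswap32 sequence; a = abcd.  */
static uint32_t bswap32_model(uint32_t a)
{
    uint32_t v0, t0, t1;

    v0 = a << 24;       /* v0 = d000 */
    t1 = a >> 24;       /* t1 = 000a */
    t0 = a & 0xff00;    /* t0 = 00c0 */
    v0 |= t1;           /* v0 = d00a */
    t1 = a >> 8;        /* t1 = 0abc */
    t0 <<= 8;           /* t0 = 0c00 */
    t1 &= 0xff00;       /* t1 = 00b0 */
    v0 |= t0;           /* v0 = dc0a */
    v0 |= t1;           /* v0 = dcba */
    return v0;
}
```

Nine ALU operations per swap is the cost being moved out of line; a
call site then needs only the jump and its delay slot.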

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 389 ++++++++++++++++++++++++++++++++++----------------
 tcg/mips/tcg-target.h |   6 +-
 2 files changed, 271 insertions(+), 124 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index b8c5d90..97f9251 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -135,6 +135,9 @@ static const TCGReg tcg_target_call_oarg_regs[2] = {
 };
 
 static tcg_insn_unit *tb_ret_addr;
+static tcg_insn_unit *bswap32s_addr;
+static tcg_insn_unit *bswap32u_addr;
+static tcg_insn_unit *bswap64_addr;
 
 static inline uint32_t reloc_pc16_val(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
@@ -187,6 +190,7 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     ct_str = *pct_str;
     switch(ct_str[0]) {
     case 'r':
+    do_default:
         ct->ct |= TCG_CT_REG;
         tcg_regset_set(ct->u.regs, 0xffffffff);
         break;
@@ -208,6 +212,7 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'S': /* qemu_st constraint */
         ct->ct |= TCG_CT_REG;
         tcg_regset_set(ct->u.regs, 0xffffffff);
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_V0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_A0);
 #if defined(CONFIG_SOFTMMU)
         if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
@@ -218,6 +223,22 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         }
 #endif
         break;
+    case 'v': /* bswap output constraint */
+        if (use_mips32r2_instructions) {
+            goto do_default;
+        }
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_clear(ct->u.regs);
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_V0);
+        break;
+    case 'a': /* bswap input constraint */
+        if (use_mips32r2_instructions) {
+            goto do_default;
+        }
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_clear(ct->u.regs);
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_A0);
+        break;
     case 'I':
         ct->ct |= TCG_CT_CONST_U16;
         break;
@@ -618,29 +639,23 @@ static inline void tcg_out_bswap16s(TCGContext *s, TCGReg ret, TCGReg arg)
     }
 }
 
+static void tcg_out_bswap_subr(TCGContext *s, tcg_insn_unit *sub)
+{
+    if (!tcg_out_opc_jmp(s, OPC_JAL, sub)) {
+        tcg_abort();
+    }
+}
+
 static inline void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg)
 {
     if (use_mips32r2_instructions) {
         tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
         tcg_out_opc_sa(s, OPC_ROTR, ret, ret, 16);
     } else {
-        /* ret and arg must be different and can't be register at */
-        if (ret == arg || ret == TCG_TMP0 || arg == TCG_TMP0) {
-            tcg_abort();
-        }
-
-        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 24);
-
-        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 24);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, arg, 0xff00);
-        tcg_out_opc_sa(s, OPC_SLL, TCG_TMP0, TCG_TMP0, 8);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0xff00);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+        assert(ret == TCG_REG_V0);
+        tcg_out_bswap_subr(s, bswap32s_addr);
+        /* delay slot */
+        tcg_out_opc_reg(s, OPC_OR, TCG_REG_A0, arg, TCG_REG_ZERO);
     }
 }
 
@@ -648,26 +663,13 @@ static inline void tcg_out_bswap32u(TCGContext *s, TCGReg ret, TCGReg arg)
 {
     if (use_mips32r2_instructions) {
         tcg_out_opc_reg(s, OPC_DSBH, ret, 0, arg);
-        tcg_out_opc_reg(s, OPC_DSHD, ret, 0, arg);
+        tcg_out_opc_reg(s, OPC_DSHD, ret, 0, ret);
         tcg_out_dsrl(s, ret, ret, 32);
     } else {
-        /* ret and arg must be different and can't be register at */
-        if (ret == arg || ret == TCG_TMP0 || arg == TCG_TMP0) {
-            tcg_abort();
-        }
-
-        tcg_out_dsll(s, ret, arg, 24);
-
-        tcg_out_dsrl(s, TCG_TMP0, arg, 24);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, arg, 0xff00);
-        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 8);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        tcg_out_dsrl(s, TCG_TMP0, arg, 8);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0xff00);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+        assert(ret == TCG_REG_V0);
+        tcg_out_bswap_subr(s, bswap32u_addr);
+        /* delay slot */
+        tcg_out_opc_reg(s, OPC_OR, TCG_REG_A0, arg, TCG_REG_ZERO);
     }
 }
 
@@ -677,44 +679,10 @@ static void tcg_out_bswap64(TCGContext *s, TCGReg ret, TCGReg arg)
         tcg_out_opc_reg(s, OPC_DSBH, ret, 0, arg);
         tcg_out_opc_reg(s, OPC_DSHD, ret, 0, arg);
     } else {
-        /* ret and arg must be different and can't be either tmp reg.  */
-        if (ret == arg || ret == TCG_TMP0 || arg == TCG_TMP0
-            || ret == TCG_TMP1 || arg == TCG_TMP1) {
-            tcg_abort();
-        }
-
-        /* ??? Consider just making this a subroutine.  */
-
-        /* A... ...H -> H... ...A */
-        tcg_out_dsll(s, ret, arg, 56);
-        tcg_out_dsrl(s, TCG_TMP0, arg, 56);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        /* .B.. ..G. -> .G.. ..B. */
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, arg, 0xff00);
-        tcg_out_dsrl(s, TCG_TMP1, arg, 40);
-        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
-
-        /* ..CD .... -> .... DC.. */
-        tcg_out_dsrl(s, TCG_TMP0, arg, 32);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP0, 0xff00);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
-        tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 8);
-        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 24);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
-
-        /* .... EF.. -> ..FE .... */
-        tcg_out_dsrl(s, TCG_TMP0, arg, 16);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP0, 0xff00);
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
-        tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 24);
-        tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP1);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+        assert(ret == TCG_REG_V0);
+        tcg_out_bswap_subr(s, bswap64_addr);
+        /* delay slot */
+        tcg_out_opc_reg(s, OPC_OR, TCG_REG_A0, arg, TCG_REG_ZERO);
     }
 }
 
@@ -1425,72 +1393,111 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 }
 #endif
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
                                    TCGReg base, TCGMemOp opc, bool is_64)
 {
+    bool hi_first = MIPS_BE ? hi != base : lo == base;
+
     switch (opc & (MO_SSIZE | MO_BSWAP)) {
     case MO_UB:
-        tcg_out_opc_imm(s, OPC_LBU, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_LBU, lo, base, 0);
         break;
     case MO_SB:
-        tcg_out_opc_imm(s, OPC_LB, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_LB, lo, base, 0);
         break;
 
     case MO_UW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-        tcg_out_bswap16(s, datalo, TCG_TMP1);
+        tcg_out_bswap16(s, lo, TCG_TMP1);
         break;
     case MO_UW:
-        tcg_out_opc_imm(s, OPC_LHU, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_LHU, lo, base, 0);
         break;
 
     case MO_SW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-        tcg_out_bswap16s(s, datalo, TCG_TMP1);
+        tcg_out_bswap16s(s, lo, TCG_TMP1);
         break;
     case MO_SW:
-        tcg_out_opc_imm(s, OPC_LH, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_LH, lo, base, 0);
         break;
 
     case MO_UL | MO_BSWAP:
         if (TCG_TARGET_REG_BITS == 64 && is_64) {
-            tcg_out_opc_imm(s, OPC_LWU, TCG_TMP1, base, 0);
-            tcg_out_bswap32u(s, datalo, TCG_TMP1);
+            if (use_mips32r2_instructions) {
+                tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
+                tcg_out_bswap32u(s, lo, lo);
+            } else {
+                tcg_out_bswap_subr(s, bswap32u_addr);
+                /* delay slot */
+                tcg_out_opc_imm(s, OPC_LWU, TCG_REG_A0, base, 0);
+                tcg_out_mov(s, TCG_TYPE_I64, lo, TCG_REG_V0);
+            }
             break;
         }
         /* FALLTHRU */
     case MO_SL | MO_BSWAP:
-        tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, 0);
-        tcg_out_bswap32(s, datalo, TCG_TMP1);
+        if (use_mips32r2_instructions) {
+            tcg_out_opc_imm(s, OPC_LW, lo, base, 0);
+            tcg_out_bswap32(s, lo, lo);
+        } else {
+            tcg_out_bswap_subr(s, bswap32s_addr);
+            /* delay slot */
+            tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, base, 0);
+            tcg_out_mov(s, TCG_TYPE_I32, lo, TCG_REG_V0);
+        }
         break;
 
     case MO_UL:
         if (TCG_TARGET_REG_BITS == 64 && is_64) {
-            tcg_out_opc_imm(s, OPC_LWU, datalo, base, 0);
+            tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
             break;
         }
         /* FALLTHRU */
     case MO_SL:
-        tcg_out_opc_imm(s, OPC_LW, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_LW, lo, base, 0);
         break;
 
     case MO_Q | MO_BSWAP:
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, HI_OFF);
-            tcg_out_bswap32(s, datalo, TCG_TMP1);
-            tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, LO_OFF);
-            tcg_out_bswap32(s, datahi, TCG_TMP1);
+        if (TCG_TARGET_REG_BITS == 64 && use_mips32r2_instructions) {
+            tcg_out_opc_imm(s, OPC_LD, lo, base, 0);
+            tcg_out_bswap64(s, lo, lo);
+        } else if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_bswap_subr(s, bswap64_addr);
+            /* delay slot */
+            tcg_out_opc_imm(s, OPC_LD, TCG_REG_A0, base, 0);
+            tcg_out_mov(s, TCG_TYPE_I64, lo, TCG_REG_V0);
+        } else if (use_mips32r2_instructions) {
+            tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, base, 0);
+            tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, 4);
+            tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP0, 0, TCG_TMP0);
+            tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP1, 0, TCG_TMP1);
+            tcg_out_opc_sa(s, OPC_ROTR, MIPS_BE ? lo : hi, TCG_TMP0, 16);
+            tcg_out_opc_sa(s, OPC_ROTR, MIPS_BE ? hi : lo, TCG_TMP1, 16);
         } else {
-            tcg_out_opc_imm(s, OPC_LD, TCG_REG_V0, base, 0);
-            tcg_out_bswap64(s, datalo, TCG_REG_V0);
+            tcg_out_bswap_subr(s, bswap32s_addr);
+            /* delay slot */
+            tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, base,
+                            hi_first ? LO_OFF : HI_OFF);
+            tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_A2, TCG_REG_V0);
+
+            tcg_out_bswap_subr(s, bswap32s_addr);
+            /* delay slot */
+            tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, base,
+                            hi_first ? HI_OFF : LO_OFF);
+            tcg_out_mov(s, TCG_TYPE_I32, hi_first ? lo : hi, TCG_REG_V0);
+            tcg_out_mov(s, TCG_TYPE_I32, hi_first ? hi : lo, TCG_REG_A2);
         }
         break;
     case MO_Q:
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_opc_imm(s, OPC_LW, datalo, base, LO_OFF);
-            tcg_out_opc_imm(s, OPC_LW, datahi, base, HI_OFF);
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_opc_imm(s, OPC_LD, lo, base, 0);
+        } else if (hi_first) {
+            tcg_out_opc_imm(s, OPC_LW, hi, base, HI_OFF);
+            tcg_out_opc_imm(s, OPC_LW, lo, base, LO_OFF);
         } else {
-            tcg_out_opc_imm(s, OPC_LD, datalo, base, 0);
+            tcg_out_opc_imm(s, OPC_LW, lo, base, LO_OFF);
+            tcg_out_opc_imm(s, OPC_LW, hi, base, HI_OFF);
         }
         break;
     default:
@@ -1540,54 +1547,62 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
                                    TCGReg base, TCGMemOp opc)
 {
-    if ((datalo | datahi) == 0) {
+    /* Don't clutter the code below with checks to avoid bswapping ZERO.  */
+    if ((lo | hi) == 0) {
         opc &= ~MO_BSWAP;
     }
 
     switch (opc & (MO_SIZE | MO_BSWAP)) {
     case MO_8:
-        tcg_out_opc_imm(s, OPC_SB, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_SB, lo, base, 0);
         break;
 
     case MO_16 | MO_BSWAP:
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, datalo, 0xffff);
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, lo, 0xffff);
         tcg_out_bswap16(s, TCG_TMP1, TCG_TMP1);
-        datalo = TCG_TMP1;
+        lo = TCG_TMP1;
         /* FALLTHRU */
     case MO_16:
-        tcg_out_opc_imm(s, OPC_SH, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_SH, lo, base, 0);
         break;
 
     case MO_32 | MO_BSWAP:
-        tcg_out_bswap32(s, TCG_TMP1, datalo);
-        datalo = TCG_TMP1;
+        tcg_out_bswap32(s, TCG_REG_V0, lo);
+        lo = TCG_REG_V0;
         /* FALLTHRU */
     case MO_32:
-        tcg_out_opc_imm(s, OPC_SW, datalo, base, 0);
+        tcg_out_opc_imm(s, OPC_SW, lo, base, 0);
         break;
 
     case MO_64 | MO_BSWAP:
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_bswap32(s, TCG_TMP1, datalo);
-            datalo = TCG_TMP1;
-            tcg_out_opc_imm(s, OPC_SW, datalo, base, HI_OFF);
-            tcg_out_bswap32(s, TCG_TMP1, datahi);
-            datahi = TCG_TMP1;
-            tcg_out_opc_imm(s, OPC_SW, datahi, base, LO_OFF);
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_bswap64(s, TCG_REG_V0, lo);
+            lo = TCG_REG_V0;
+        } else if (use_mips32r2_instructions) {
+            tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP0, 0, MIPS_BE ? lo : hi);
+            tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP1, 0, MIPS_BE ? hi : lo);
+            tcg_out_opc_sa(s, OPC_ROTR, TCG_TMP0, TCG_TMP0, 16);
+            tcg_out_opc_sa(s, OPC_ROTR, TCG_TMP1, TCG_TMP1, 16);
+            tcg_out_opc_imm(s, OPC_SW, TCG_TMP0, base, 0);
+            tcg_out_opc_imm(s, OPC_SW, TCG_TMP1, base, 4);
+            break;
+        } else {
+            tcg_out_bswap32(s, TCG_REG_V0, lo);
+            tcg_out_opc_imm(s, OPC_SW, TCG_REG_V0, base, HI_OFF);
+            tcg_out_bswap32(s, TCG_REG_V0, hi);
+            tcg_out_opc_imm(s, OPC_SW, TCG_REG_V0, base, LO_OFF);
             break;
         }
-        tcg_out_bswap64(s, TCG_REG_A1, datalo);
-        datalo = TCG_REG_A1;
         /* FALLTHRU */
     case MO_64:
         if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_opc_imm(s, OPC_SW, datalo, base, LO_OFF);
-            tcg_out_opc_imm(s, OPC_SW, datahi, base, HI_OFF);
+            tcg_out_opc_imm(s, OPC_SW, MIPS_BE ? hi : lo, base, 0);
+            tcg_out_opc_imm(s, OPC_SW, MIPS_BE ? lo : hi, base, 4);
         } else {
-            tcg_out_opc_imm(s, OPC_SD, datalo, base, 0);
+            tcg_out_opc_imm(s, OPC_SD, lo, base, 0);
         }
         break;
 
@@ -2117,7 +2132,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_rotl_i32, { "r", "rZ", "ri" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
-    { INDEX_op_bswap32_i32, { "r", "r" } },
+    { INDEX_op_bswap32_i32, { "v", "a" } },
 
     { INDEX_op_ext8s_i32, { "r", "rZ" } },
     { INDEX_op_ext16s_i32, { "r", "rZ" } },
@@ -2179,8 +2194,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_rotl_i64, { "r", "rZ", "ri" } },
 
     { INDEX_op_bswap16_i64, { "r", "r" } },
-    { INDEX_op_bswap32_i64, { "r", "r" } },
-    { INDEX_op_bswap64_i64, { "r", "r" } },
+    { INDEX_op_bswap32_i64, { "v", "a" } },
+    { INDEX_op_bswap64_i64, { "v", "a" } },
 
     { INDEX_op_ext8s_i64, { "r", "rZ" } },
     { INDEX_op_ext16s_i64, { "r", "rZ" } },
@@ -2324,6 +2339,16 @@ static void tcg_target_detect_isa(void)
 /* We're expecting to be able to use an immediate for frame allocation.  */
 QEMU_BUILD_BUG_ON(FRAME_SIZE > 0x7fff);
 
+static tcg_insn_unit *align_code_ptr(TCGContext *s)
+{
+    uintptr_t p = (uintptr_t)s->code_ptr;
+    if (p & 15) {
+        p = (p + 15) & -16;
+        s->code_ptr = (void *)p;
+    }
+    return s->code_ptr;
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
@@ -2353,6 +2378,128 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_RA, 0);
     /* delay slot */
     tcg_out_opc_imm(s, ALIAS_PADDI, TCG_REG_SP, TCG_REG_SP, FRAME_SIZE);
+
+    if (use_mips32r2_instructions) {
+        return;
+    }
+
+    /* Bswap subroutines: Input in TCG_REG_A0, output in TCG_REG_V0;
+       clobbers TCG_TMP1, TCG_TMP0.  */
+
+    bswap32s_addr = align_code_ptr(s);
+
+    /*
+     * bswap32s -- signed 32-bit swap.  a0 = abcd.
+     */
+    /* v0 = (ssss)d000 */
+    tcg_out_opc_sa(s, OPC_SLL, TCG_REG_V0, TCG_REG_A0, 24);
+    /* t1 = 000a */
+    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP1, TCG_REG_A0, 24);
+    /* t0 = 00c0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_REG_A0, 0xff00);
+    /* v0 = d00a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+    /* t1 = 0abc */
+    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP1, TCG_REG_A0, 8);
+    /* t0 = 0c00 */
+    tcg_out_opc_sa(s, OPC_SLL, TCG_TMP0, TCG_TMP0, 8);
+    /* t1 = 00b0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
+    /* v0 = dc0a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP0);
+    tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_RA, 0);
+    /* v0 = dcba -- delay slot */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        return;
+    }
+
+    bswap32u_addr = align_code_ptr(s);
+
+    /*
+     * bswap32u -- unsigned 32-bit swap.  a0 = ....abcd.
+     */
+    /* t1 = (0000)000d */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_REG_A0, 0xff);
+    /* v0 = 000a */
+    tcg_out_opc_sa(s, OPC_SRL, TCG_REG_V0, TCG_REG_A0, 24);
+    /* t1 = (0000)d000 */
+    tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 24);
+    /* t0 = 00c0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_REG_A0, 0xff00);
+    /* v0 = d00a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+    /* t1 = 0abc */
+    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP1, TCG_REG_A0, 8);
+    /* t0 = 0c00 */
+    tcg_out_opc_sa(s, OPC_SLL, TCG_TMP0, TCG_TMP0, 8);
+    /* t1 = 00b0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
+    /* v0 = dc0a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP0);
+    tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_RA, 0);
+    /* v0 = dcba -- delay slot */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+
+    bswap64_addr = align_code_ptr(s);
+
+    /*
+     * bswap64 -- 64-bit swap.  a0 = abcdefgh
+     */
+    /* v0 = h0000000 */
+    tcg_out_dsll(s, TCG_REG_V0, TCG_REG_A0, 56);
+    /* t1 = 0000000a */
+    tcg_out_dsrl(s, TCG_TMP1, TCG_REG_A0, 56);
+
+    /* t0 = 000000g0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_REG_A0, 0xff00);
+    /* v0 = h000000a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+    /* t1 = 00000abc */
+    tcg_out_dsrl(s, TCG_TMP1, TCG_REG_A0, 40);
+    /* t0 = 0g000000 */
+    tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
+    /* t1 = 000000b0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
+
+    /* v0 = hg00000a */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP0);
+    /* t0 = 0000abcd */
+    tcg_out_dsrl(s, TCG_TMP0, TCG_REG_A0, 32);
+    /* v0 = hg0000ba */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+
+    /* t1 = 000000c0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP0, 0xff00);
+    /* t0 = 0000000d */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
+    /* t1 = 00000c00 */
+    tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 8);
+    /* t0 = 0000d000 */
+    tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 24);
+
+    /* v0 = hg000cba */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
+    /* t1 = 00abcdef */
+    tcg_out_dsrl(s, TCG_TMP1, TCG_REG_A0, 16);
+    /* v0 = hg00dcba */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP0);
+
+    /* t0 = 0000000f */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP1, 0x00ff);
+    /* t1 = 000000e0 */
+    tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, TCG_TMP1, 0xff00);
+    /* t0 = 00f00000 */
+    tcg_out_dsll(s, TCG_TMP0, TCG_TMP0, 40);
+    /* t1 = 000e0000 */
+    tcg_out_dsll(s, TCG_TMP1, TCG_TMP1, 24);
+
+    /* v0 = hgf0dcba */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP0);
+    tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_RA, 0);
+    /* v0 = hgfedcba -- delay slot */
+    tcg_out_opc_reg(s, OPC_OR, TCG_REG_V0, TCG_REG_V0, TCG_TMP1);
 }
 
 static void tcg_target_init(TCGContext *s)
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 0dab62b..374d803 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -128,6 +128,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_muls2_i32        (!use_mips32r6_instructions)
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
+#define TCG_TARGET_HAS_bswap32_i32      1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
@@ -150,12 +151,13 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_mulsh_i64        1
 #define TCG_TARGET_HAS_ext32s_i64       1
 #define TCG_TARGET_HAS_ext32u_i64       1
+#define TCG_TARGET_HAS_bswap32_i64      1
+#define TCG_TARGET_HAS_bswap64_i64      1
 #endif
 
 /* optional instructions detected at runtime */
 #define TCG_TARGET_HAS_movcond_i32      use_movnz_instructions
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
-#define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
@@ -164,8 +166,6 @@ extern bool use_mips32r2_instructions;
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_movcond_i64      use_movnz_instructions
 #define TCG_TARGET_HAS_bswap16_i64      use_mips32r2_instructions
-#define TCG_TARGET_HAS_bswap32_i64      use_mips32r2_instructions
-#define TCG_TARGET_HAS_bswap64_i64      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i64      use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext8s_i64        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i64       use_mips32r2_instructions
-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
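
The bswap64 subroutine in the patch above interleaves its shifts, masks
and ORs, which makes the byte-diagram comments hard to check by eye.
The sketch below replays the same operations in the same order in
host-side C.  bswap64_model is a hypothetical name used only for this
illustration; v0, t0 and t1 stand for TCG_REG_V0, TCG_TMP0 and
TCG_TMP1.

```c
#include <stdint.h>

/* Hypothetical C model of the pre-r2 bswap64 sequence; a = abcdefgh.  */
static uint64_t bswap64_model(uint64_t a)
{
    uint64_t v0, t0, t1;

    v0 = a << 56;        /* v0 = h0000000 */
    t1 = a >> 56;        /* t1 = 0000000a */
    t0 = a & 0xff00;     /* t0 = 000000g0 */
    v0 |= t1;            /* v0 = h000000a */
    t1 = a >> 40;        /* t1 = 00000abc */
    t0 <<= 40;           /* t0 = 0g000000 */
    t1 &= 0xff00;        /* t1 = 000000b0 */
    v0 |= t0;            /* v0 = hg00000a */
    t0 = a >> 32;        /* t0 = 0000abcd */
    v0 |= t1;            /* v0 = hg0000ba */
    t1 = t0 & 0xff00;    /* t1 = 000000c0 */
    t0 &= 0x00ff;        /* t0 = 0000000d */
    t1 <<= 8;            /* t1 = 00000c00 */
    t0 <<= 24;           /* t0 = 0000d000 */
    v0 |= t1;            /* v0 = hg000cba */
    t1 = a >> 16;        /* t1 = 00abcdef */
    v0 |= t0;            /* v0 = hg00dcba */
    t0 = t1 & 0x00ff;    /* t0 = 0000000f */
    t1 &= 0xff00;        /* t1 = 000000e0 */
    t0 <<= 40;           /* t0 = 00f00000 */
    t1 <<= 24;           /* t1 = 000e0000 */
    v0 |= t0;            /* v0 = hgf0dcba */
    v0 |= t1;            /* v0 = hgfedcba */
    return v0;
}
```

Only two scratch values are live at any point, which matches the
"clobbers TCG_TMP1, TCG_TMP0" note on the subroutines.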

* [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 10/15] tcg-mips: Move bswap code to subroutines Richard Henderson
@ 2016-02-09 10:39 ` Richard Henderson
  2016-02-09 16:50   ` James Hogan
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 12/15] tcg-mips: Use mips64r6 instructions in tcg_out_ldst Richard Henderson
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

The DAHI and DATI instructions can eliminate two insns
from the pre-r6 path.
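
To illustrate the idea, here is a host-side C model of building a
64-bit constant from a sign-extended 32-bit value followed by DAHI and
DATI.  It is a sketch under stated assumptions, not the logic of
tcg_out_movi in this patch: all function names are hypothetical, and
the instruction semantics (DAHI adds its sign-extended 16-bit immediate
shifted left by 32, DATI by 48) are my reading of the MIPS64 Release 6
manual.

```c
#include <stdint.h>

/* Sign-extend the low 16 / 32 bits of x to 64 bits.  */
static uint64_t sext16(uint64_t x)
{
    x &= 0xffff;
    return (x ^ 0x8000) - 0x8000;
}

static uint64_t sext32(uint64_t x)
{
    x &= 0xffffffffu;
    return (x ^ 0x80000000u) - 0x80000000u;
}

/* Assumed semantics: rs += sign_extend(imm) << 32 (DAHI) or << 48 (DATI).  */
static uint64_t dahi_model(uint64_t rs, uint16_t imm)
{
    return rs + (sext16(imm) << 32);
}

static uint64_t dati_model(uint64_t rs, uint16_t imm)
{
    return rs + (sext16(imm) << 48);
}

/* Hypothetical constant builder: low 32 bits first (as LUI + ORI would
   leave them, sign-extended), then fix up bits 32..47 and 48..63.  Each
   immediate is taken from the remaining difference, which absorbs the
   carry caused by the earlier sign extensions.  */
static uint64_t movi_r6_model(uint64_t arg)
{
    uint64_t r = sext32(arg);

    r = dahi_model(r, (uint16_t)((arg - r) >> 32));
    r = dati_model(r, (uint16_t)((arg - r) >> 48));
    return r;
}
```

In this model the two add-immediate steps stand where a shift-and-OR
chain would otherwise be needed to place the upper 32 bits.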

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 97f9251..f7f4331 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -303,7 +303,9 @@ typedef enum {
     OPC_ORI      = 015 << 26,
     OPC_XORI     = 016 << 26,
     OPC_LUI      = 017 << 26,
+    OPC_AUI      = OPC_LUI,
     OPC_DADDIU   = 031 << 26,
+    OPC_DAUI     = 035 << 26,
     OPC_LB       = 040 << 26,
     OPC_LH       = 041 << 26,
     OPC_LW       = 043 << 26,
@@ -383,6 +385,8 @@ typedef enum {
     OPC_REGIMM   = 001 << 26,
     OPC_BLTZ     = OPC_REGIMM | (000 << 16),
     OPC_BGEZ     = OPC_REGIMM | (001 << 16),
+    OPC_DAHI     = OPC_REGIMM | (006 << 16),
+    OPC_DATI     = OPC_REGIMM | (036 << 16),
 
     OPC_SPECIAL2 = 034 << 26,
     OPC_MUL_R5   = OPC_SPECIAL2 | 002,
@@ -402,6 +406,10 @@ typedef enum {
     OPC_SEB      = OPC_SPECIAL3 | 02040,
     OPC_SEH      = OPC_SPECIAL3 | 03040,
 
+    OPC_PCREL    = 073 << 26,
+    OPC_ADDIUPC  = OPC_PCREL | (0 << 19),
+    OPC_ALUIPC   = OPC_PCREL | (3 << 19) | (7 << 16),
+
     /* MIPS r6 doesn't have JR, JALR should be used instead */
     OPC_JR       = use_mips32r6_instructions ? OPC_JALR : OPC_JR_R5,
 
@@ -448,6 +456,17 @@ static inline void tcg_out_opc_imm(TCGContext *s, MIPSInsn opc,
     tcg_out32(s, inst);
 }
 
+static inline void tcg_out_opc_pc19(TCGContext *s, MIPSInsn opc,
+                                    TCGReg rs, TCGArg imm)
+{
+    int32_t inst;
+
+    inst = opc;
+    inst |= (rs & 0x1F) << 21;
+    inst |= (imm & 0x7ffff);
+    tcg_out32(s, inst);
+}
+
 /*
  * Type bitfield
  */
@@ -589,6 +608,50 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
     if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
         tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
+    } else if (use_mips32r6_instructions) {
+        tcg_target_long disp = arg - (intptr_t)s->code_ptr;
+        if (disp == sextract32(disp, 2, 19) * 4) {
+            tcg_out_opc_pc19(s, OPC_ADDIUPC, ret, disp >> 2);
+            return;
+        } else if ((disp & ~(tcg_target_long)0xffff)
+                   == sextract32(disp, 16, 16) * 0x10000) {
+            tcg_out_opc_imm(s, OPC_ALUIPC, ret, 0, disp >> 16);
+        } else {
+            TCGReg in = TCG_REG_ZERO;
+            tcg_target_long tmp = (int16_t)arg;
+
+            /* The R6 manual recommends construction of immediates in
+               order of low to high (ADDI, AUI, DAHI, DATI) in order
+               to simplify hardware recognizing these sequences.  */
+
+            if (tmp) {
+                tcg_out_opc_imm(s, OPC_ADDIU, ret, in, tmp);
+                in = ret;
+            }
+            arg = (arg - tmp) >> 16;
+            tmp = (int16_t)arg;
+
+            /* Note that DAHI and DATI only have one register operand,
+               and thus we must put a zero low part in place.  Also
+               note that we already eliminated simple 32-bit constants
+               so we know this must happen.  */
+            if (tmp || in != ret) {
+                tcg_out_opc_imm(s, OPC_AUI, ret, in, tmp);
+            }
+            arg = (arg - tmp) >> 16;
+            tmp = (int16_t)arg;
+
+            if (tmp) {
+                tcg_out_opc_imm(s, OPC_DAHI, ret, 0, tmp);
+            }
+            arg = (arg - tmp) >> 16;
+            tcg_debug_assert(arg == (int16_t)arg);
+
+            if (arg) {
+                tcg_out_opc_imm(s, OPC_DATI, ret, 0, arg);
+            }
+            return;
+        }
     } else {
         tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
         if (arg & 0xffff0000ull) {
-- 
2.5.0
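
A minimal sketch of the low-to-high immediate split that the R6 path of
tcg_out_movi in the patch above relies on, written as standalone C rather
than TCG backend code.  The helper names split_imm64, join_imm64 and
nonzero_parts are hypothetical and do not exist in QEMU.  The sketch uses
unsigned arithmetic to avoid signed overflow, which is a choice made for
this example and not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit constant into four sign-adjusted 16-bit chunks,
   ordered low to high.  Chunk i is the immediate that would be fed
   to ADDIU, AUI, DAHI and DATI respectively.  */
static void split_imm64(int64_t value, int16_t part[4])
{
    uint64_t u = (uint64_t)value;

    for (int i = 0; i < 4; i++) {
        /* Take the low 16 bits as a signed quantity.  */
        part[i] = (int16_t)(uint16_t)u;
        /* Remove that chunk; a negative chunk carries one into the
           next position, which the subtraction accounts for.  */
        u = (u - (uint64_t)(int64_t)part[i]) >> 16;
    }
}

/* Model the four additions: each instruction adds its sign-extended
   immediate shifted into position, modulo 2^64.  */
static int64_t join_imm64(const int16_t part[4])
{
    uint64_t u = 0;

    for (int i = 0; i < 4; i++) {
        u += (uint64_t)(int64_t)part[i] << (16 * i);
    }
    return (int64_t)u;
}

/* Count the chunks that are not zero; a zero chunk corresponds to an
   instruction that may not need to be emitted.  */
static int nonzero_parts(const int16_t part[4])
{
    int n = 0;

    for (int i = 0; i < 4; i++) {
        n += part[i] != 0;
    }
    return n;
}
```

Because every chunk is sign-extended by the instruction that consumes it,
a chunk with its top bit set borrows from the next one up; that is why
the split subtracts the signed chunk before shifting instead of simply
masking 16 bits at a time.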

* [Qemu-devel] [PATCH 12/15] tcg-mips: Use mips64r6 instructions in tcg_out_ldst
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi Richard Henderson
@ 2016-02-09 10:40 ` Richard Henderson
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 13/15] tcg-mips: Use mips64r6 instructions in constant addition Richard Henderson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

The DAUI, DAHI, and DATI insns can be used to eliminate
one extra instruction in these cases.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index f7f4331..bda31c2 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -422,6 +422,7 @@ typedef enum {
     /* Aliases for convenience.  */
     ALIAS_PADD     = sizeof(void *) == 4 ? OPC_ADDU : OPC_DADDU,
     ALIAS_PADDI    = sizeof(void *) == 4 ? OPC_ADDIU : OPC_DADDIU,
+    ALIAS_PAUI     = sizeof(void *) == 4 ? OPC_AUI : OPC_DAUI,
     ALIAS_TSRL     = TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
                      ? OPC_SRL : OPC_DSRL,
 } MIPSInsn;
@@ -779,9 +780,48 @@ static inline void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg)
     }
 }
 
+static void tcg_out_r6_ofs(TCGContext *s, MIPSInsn opl, MIPSInsn oph,
+                           TCGReg reg0, TCGReg reg1, tcg_target_long ofs)
+{
+    TCGReg scratch = TCG_TMP0;
+    int16_t lo = ofs;
+    int32_t hi = ofs - lo;
+
+    ofs = ofs - hi - lo;
+    if (oph == OPC_DAUI && ofs != 0) {
+        tcg_target_long tmp;
+
+        /* Bits are set in the high 32-bit half.  Thus we require the
+           use of DAHI and/or DATI.  The R6 manual recommends addition
+           of immediates in order of mid to high (DAUI, DAHI, DATI, OPL)
+           in order to simplify hardware recognizing these sequences.  */
+
+        tcg_out_opc_imm(s, OPC_DAUI, scratch, reg1, hi >> 16);
+
+        tmp = ofs >> 16 >> 16;
+        if (tmp & 0xffff) {
+            tcg_out_opc_imm(s, OPC_DAHI, scratch, 0, tmp);
+        }
+        tmp = (tmp - (int16_t)tmp) >> 16;
+        if (tmp) {
+            tcg_out_opc_imm(s, OPC_DATI, scratch, 0, tmp);
+        }
+        reg1 = scratch;
+    } else if (hi != 0) {
+        tcg_out_opc_imm(s, oph, scratch, reg1, hi >> 16);
+        reg1 = scratch;
+    }
+    tcg_out_opc_imm(s, opc, reg0, reg1, lo);
+}
+
 static void tcg_out_ldst(TCGContext *s, MIPSInsn opc, TCGReg data,
                          TCGReg addr, intptr_t ofs)
 {
+    if (use_mips32r6_instructions) {
+        tcg_out_r6_ofs(s, opc, ALIAS_PAUI, data, addr, ofs);
+        return;
+    }
+
     int16_t lo = ofs;
     if (ofs != lo) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, ofs - lo);
-- 
2.5.0

* [Qemu-devel] [PATCH 13/15] tcg-mips: Use mips64r6 instructions in constant addition
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 12/15] tcg-mips: Use mips64r6 instructions in tcg_out_ldst Richard Henderson
@ 2016-02-09 10:40 ` Richard Henderson
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches Richard Henderson
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls Richard Henderson
  14 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 59 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 10 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index bda31c2..e0972ba 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -242,12 +242,26 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'I':
         ct->ct |= TCG_CT_CONST_U16;
         break;
+    case 'A':
+        /* mips r6 can add any constant without needing a temporary.  */
+        if (use_mips32r6_instructions) {
+            ct->ct |= TCG_CT_CONST;
+            break;
+        }
+        /* fallthru */
     case 'J':
         ct->ct |= TCG_CT_CONST_S16;
         break;
     case 'K':
         ct->ct |= TCG_CT_CONST_P2M1;
         break;
+    case 's':
+        /* mips r6 can subtract any constant without needing a temporary.  */
+        if (use_mips32r6_instructions) {
+            ct->ct |= TCG_CT_CONST;
+            break;
+        }
+        /* fallthru */
     case 'N':
         ct->ct |= TCG_CT_CONST_N16;
         break;
@@ -781,9 +795,10 @@ static inline void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg)
 }
 
 static void tcg_out_r6_ofs(TCGContext *s, MIPSInsn opl, MIPSInsn oph,
-                           TCGReg reg0, TCGReg reg1, tcg_target_long ofs)
+                           TCGReg reg0, TCGReg reg1,
+                           tcg_target_long ofs, bool is_mem)
 {
-    TCGReg scratch = TCG_TMP0;
+    TCGReg scratch = is_mem ? TCG_TMP0 : reg0;
     int16_t lo = ofs;
     int32_t hi = ofs - lo;
 
@@ -794,9 +809,15 @@ static void tcg_out_r6_ofs(TCGContext *s, MIPSInsn opl, MIPSInsn oph,
         /* Bits are set in the high 32-bit half.  Thus we require the
            use of DAHI and/or DATI.  The R6 manual recommends addition
            of immediates in order of mid to high (DAUI, DAHI, DATI, OPL)
-           in order to simplify hardware recognizing these sequences.  */
+           in order to simplify hardware recognizing these sequences.
+           Ignore this wrt DADDIU if it will save one instruction.  */
 
-        tcg_out_opc_imm(s, OPC_DAUI, scratch, reg1, hi >> 16);
+        if (hi == 0 && lo != 0 && !is_mem) {
+            tcg_out_opc_imm(s, OPC_DADDIU, scratch, reg1, lo);
+            lo = 0;
+        } else if (hi != 0 || reg1 != scratch) {
+            tcg_out_opc_imm(s, OPC_DAUI, scratch, reg1, hi >> 16);
+        }
 
         tmp = ofs >> 16 >> 16;
         if (tmp & 0xffff) {
@@ -811,14 +832,16 @@ static void tcg_out_r6_ofs(TCGContext *s, MIPSInsn opl, MIPSInsn oph,
         tcg_out_opc_imm(s, oph, scratch, reg1, hi >> 16);
         reg1 = scratch;
     }
-    tcg_out_opc_imm(s, opc, reg0, reg1, lo);
+    if (is_mem || lo != 0 || reg0 != reg1) {
+        tcg_out_opc_imm(s, opl, reg0, reg1, lo);
+    }
 }
 
 static void tcg_out_ldst(TCGContext *s, MIPSInsn opc, TCGReg data,
                          TCGReg addr, intptr_t ofs)
 {
     if (use_mips32r6_instructions) {
-        tcg_out_r6_ofs(s, opc, ALIAS_PAUI, data, addr, ofs);
+        tcg_out_r6_ofs(s, opc, ALIAS_PAUI, data, addr, ofs, true);
         return;
     }
 
@@ -1852,9 +1875,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_add_i32:
+        if (use_mips32r6_instructions && c2) {
+            tcg_out_r6_ofs(s, OPC_ADDIU, OPC_AUI, a0, a1, (int32_t)a2, false);
+            break;
+        }
         i1 = OPC_ADDU, i2 = OPC_ADDIU;
         goto do_binary;
     case INDEX_op_add_i64:
+        if (use_mips32r6_instructions && c2) {
+            tcg_out_r6_ofs(s, OPC_DADDIU, OPC_DAUI, a0, a1, a2, false);
+            break;
+        }
         i1 = OPC_DADDU, i2 = OPC_DADDIU;
         goto do_binary;
     case INDEX_op_or_i32:
@@ -1874,9 +1905,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_sub_i32:
+        if (use_mips32r6_instructions && c2) {
+            tcg_out_r6_ofs(s, OPC_ADDIU, OPC_AUI, a0, a1, (int32_t)-a2, false);
+            break;
+        }
         i1 = OPC_SUBU, i2 = OPC_ADDIU;
         goto do_subtract;
     case INDEX_op_sub_i64:
+        if (use_mips32r6_instructions && c2) {
+            tcg_out_r6_ofs(s, OPC_DADDIU, OPC_DAUI, a0, a1, -a2, false);
+            break;
+        }
         i1 = OPC_DSUBU, i2 = OPC_DADDIU;
     do_subtract:
         if (c2) {
@@ -2208,7 +2247,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_st16_i32, { "rZ", "r" } },
     { INDEX_op_st_i32, { "rZ", "r" } },
 
-    { INDEX_op_add_i32, { "r", "rZ", "rJ" } },
+    { INDEX_op_add_i32, { "r", "rZ", "rA" } },
     { INDEX_op_mul_i32, { "r", "rZ", "rZ" } },
 #if !use_mips32r6_instructions
     { INDEX_op_muls2_i32, { "r", "r", "rZ", "rZ" } },
@@ -2220,7 +2259,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_divu_i32, { "r", "rZ", "rZ" } },
     { INDEX_op_rem_i32, { "r", "rZ", "rZ" } },
     { INDEX_op_remu_i32, { "r", "rZ", "rZ" } },
-    { INDEX_op_sub_i32, { "r", "rZ", "rN" } },
+    { INDEX_op_sub_i32, { "r", "rZ", "rs" } },
 
     { INDEX_op_and_i32, { "r", "rZ", "rIK" } },
     { INDEX_op_nor_i32, { "r", "rZ", "rZ" } },
@@ -2270,7 +2309,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_st32_i64, { "rZ", "r" } },
     { INDEX_op_st_i64, { "rZ", "r" } },
 
-    { INDEX_op_add_i64, { "r", "rZ", "rJ" } },
+    { INDEX_op_add_i64, { "r", "rZ", "rA" } },
     { INDEX_op_mul_i64, { "r", "rZ", "rZ" } },
 #if !use_mips32r6_instructions
     { INDEX_op_muls2_i64, { "r", "r", "rZ", "rZ" } },
@@ -2282,7 +2321,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_divu_i64, { "r", "rZ", "rZ" } },
     { INDEX_op_rem_i64, { "r", "rZ", "rZ" } },
     { INDEX_op_remu_i64, { "r", "rZ", "rZ" } },
-    { INDEX_op_sub_i64, { "r", "rZ", "rN" } },
+    { INDEX_op_sub_i64, { "r", "rZ", "rs" } },
 
     { INDEX_op_and_i64, { "r", "rZ", "rIK" } },
     { INDEX_op_nor_i64, { "r", "rZ", "rZ" } },
-- 
2.5.0
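
A minimal sketch of why a 32-bit constant addition needs at most two
instructions on R6 (one AUI for the high half, one ADDIU for the low
half), and why subtraction can reuse the same path with a negated
constant, as the add_i32 and sub_i32 cases in the patch above do.  This
is standalone C, not backend code; AddParts, split_add32, apply_add32
and sub_const32 are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t hi;   /* immediate for the AUI-style step (added << 16) */
    int16_t lo;   /* immediate for the ADDIU-style step */
} AddParts;

/* Split a 32-bit addend into two sign-extended 16-bit immediates.  */
static AddParts split_add32(int32_t c)
{
    AddParts p;
    uint32_t u = (uint32_t)c;

    p.lo = (int16_t)(uint16_t)u;
    /* Removing the signed low half leaves a multiple of 0x10000.  */
    p.hi = (int16_t)(uint16_t)((u - (uint32_t)(int32_t)p.lo) >> 16);
    return p;
}

/* Model the two additions, modulo 2^32.  */
static uint32_t apply_add32(uint32_t reg, AddParts p)
{
    reg += (uint32_t)(int32_t)p.hi << 16;   /* models AUI */
    reg += (uint32_t)(int32_t)p.lo;         /* models ADDIU */
    return reg;
}

/* Subtraction as addition of the negated constant.  The negation is
   done on the unsigned value so that INT32_MIN does not overflow.  */
static uint32_t sub_const32(uint32_t reg, int32_t c)
{
    return apply_add32(reg, split_add32((int32_t)(0u - (uint32_t)c)));
}
```

When either half is zero the corresponding step adds nothing, which is
the case where an emitter could drop that instruction.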

* [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 13/15] tcg-mips: Use mips64r6 instructions in constant addition Richard Henderson
@ 2016-02-09 10:40 ` Richard Henderson
  2016-02-09 16:22   ` James Hogan
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls Richard Henderson
  14 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Using compact branches, when possible, avoids a delay slot nop.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/elf.h         |   4 +
 tcg/mips/tcg-target.c | 216 +++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 172 insertions(+), 48 deletions(-)

diff --git a/include/elf.h b/include/elf.h
index 1098d21..6e52ba0 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -352,6 +352,10 @@ typedef struct {
 #define R_MIPS_CALLHI16		30
 #define R_MIPS_CALLLO16		31
 /*
+ * Incomplete list of MIPS R6 relocation types.
+ */
+#define R_MIPS_PC26_S2          61
+/*
  * This range is reserved for vendor specific relocations.
  */
 #define R_MIPS_LOVENDOR		100
diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index e0972ba..06e15d4 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -152,6 +152,19 @@ static inline void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
     *pc = deposit32(*pc, 0, 16, reloc_pc16_val(pc, target));
 }
 
+static inline uint32_t reloc_pc26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
+{
+    /* Let the compiler perform the right-shift as part of the arithmetic.  */
+    ptrdiff_t disp = target - (pc + 1);
+    assert(disp == sextract32(disp, 0, 26));
+    return disp & 0x1ffffff;
+}
+
+static inline void reloc_pc26(tcg_insn_unit *pc, tcg_insn_unit *target)
+{
+    *pc = deposit32(*pc, 0, 26, reloc_pc16_val(pc, target));
+}
+
 static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
     assert((((uintptr_t)pc ^ (uintptr_t)target) & 0xf0000000) == 0);
@@ -166,9 +179,17 @@ static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    assert(type == R_MIPS_PC16);
     assert(addend == 0);
-    reloc_pc16(code_ptr, (tcg_insn_unit *)value);
+    switch (type) {
+    case R_MIPS_PC16:
+        reloc_pc16(code_ptr, (tcg_insn_unit *)value);
+        break;
+    case R_MIPS_PC26_S2:
+        reloc_pc26(code_ptr, (tcg_insn_unit *)value);
+        break;
+    default:
+        tcg_abort();
+    }
 }
 
 #define TCG_CT_CONST_ZERO 0x100
@@ -309,7 +330,10 @@ typedef enum {
     OPC_BEQ      = 004 << 26,
     OPC_BNE      = 005 << 26,
     OPC_BLEZ     = 006 << 26,
+    OPC_BGEUC    = OPC_BLEZ,    /* R6: rs != 0, rt != 0, rs != rt */
     OPC_BGTZ     = 007 << 26,
+    OPC_BLTUC    = OPC_BGTZ,    /* R6: rs != 0, rt != 0, rs != rt */
+    OPC_BEQC     = 010 << 26,   /* R6: rs > rt */
     OPC_ADDIU    = 011 << 26,
     OPC_SLTI     = 012 << 26,
     OPC_SLTIU    = 013 << 26,
@@ -318,6 +342,9 @@ typedef enum {
     OPC_XORI     = 016 << 26,
     OPC_LUI      = 017 << 26,
     OPC_AUI      = OPC_LUI,
+    OPC_BGEC     = 026 << 26,
+    OPC_BLTC     = 027 << 26,
+    OPC_BNEC     = 030 << 26,   /* R6: rs > rt */
     OPC_DADDIU   = 031 << 26,
     OPC_DAUI     = 035 << 26,
     OPC_LB       = 040 << 26,
@@ -329,6 +356,7 @@ typedef enum {
     OPC_SB       = 050 << 26,
     OPC_SH       = 051 << 26,
     OPC_SW       = 053 << 26,
+    OPC_BC       = 062 << 26,
     OPC_LD       = 067 << 26,
     OPC_SD       = 077 << 26,
 
@@ -527,6 +555,17 @@ static inline void tcg_out_opc_br(TCGContext *s, MIPSInsn opc,
     tcg_out_opc_imm(s, opc, rt, rs, offset);
 }
 
+static void tcg_out_opc_br_pc16(TCGContext *s, MIPSInsn opc,
+                                TCGReg arg1, TCGReg arg2, TCGLabel *l)
+{
+    tcg_out_opc_br(s, opc, arg1, arg2);
+    if (l->has_value) {
+        reloc_pc16(s->code_ptr - 1, l->u.value_ptr);
+    } else {
+        tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC16, l, 0);
+    }
+}
+
 /*
  * Type sa
  */
@@ -1002,59 +1041,129 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
         [TCG_COND_GE] = OPC_BGEZ,
     };
 
-    MIPSInsn s_opc = OPC_SLTU;
-    MIPSInsn b_opc;
-    int cmp_map;
+    MIPSInsn b_opc = 0;
+    bool compact = false;
+    int cmp_map, t;
+
+    /* We shouldn't expect to have arg1 == arg2, as the TCG optimizer
+       should have eliminated all such.  However, the R6 encodings do
+       not allow this situation, so e.g. if the optimizer is disabled
+       we must fall back to normal compares.  */
+    if (use_mips32r6_instructions && arg1 != arg2) {
+        switch (cond) {
+        case TCG_COND_EQ:
+        case TCG_COND_NE:
+            if (arg1 < arg2) {
+                t = arg1, arg1 = arg2, arg2 = t;
+            }
+            b_opc = cond == TCG_COND_EQ ? OPC_BEQC : OPC_BNEC;
+            compact = true;
+            break;
 
-    switch (cond) {
-    case TCG_COND_EQ:
-        b_opc = OPC_BEQ;
-        break;
-    case TCG_COND_NE:
-        b_opc = OPC_BNE;
-        break;
+        case TCG_COND_LE:
+        case TCG_COND_GT:
+            if (arg1 == TCG_REG_ZERO) {
+                break;
+            }
+            /* Swap arguments to turn LE to GE or GT to LT.
+               This also produces BLEZC/BGTZC when arg2 = 0.  */
+            t = arg1, arg1 = arg2, arg2 = t;
+            b_opc = cond == TCG_COND_LE ? OPC_BGEC : OPC_BLTC;
+            compact = true;
+            break;
 
-    case TCG_COND_LT:
-    case TCG_COND_GT:
-    case TCG_COND_LE:
-    case TCG_COND_GE:
-        if (arg2 == 0) {
-            b_opc = b_zero[cond];
-            arg2 = arg1;
-            arg1 = 0;
+        case TCG_COND_GE:
+        case TCG_COND_LT:
+            if (arg1 == TCG_REG_ZERO) {
+                break;
+            }
+            /* The encoding of BGEZC/BLTZC requires rs = rt.  */
+            if (arg2 == TCG_REG_ZERO) {
+                arg2 = arg1;
+            }
+            b_opc = cond == TCG_COND_GE ? OPC_BGEC : OPC_BLTC;
+            compact = true;
             break;
-        }
-        s_opc = OPC_SLT;
-        /* FALLTHRU */
 
-    case TCG_COND_LTU:
-    case TCG_COND_GTU:
-    case TCG_COND_LEU:
-    case TCG_COND_GEU:
-        cmp_map = mips_cmp_map[cond];
-        if (cmp_map & MIPS_CMP_SWAP) {
-            TCGReg t = arg1;
-            arg1 = arg2;
-            arg2 = t;
+        case TCG_COND_LEU:
+            /* Swap arguments to turn LE to GE.  */
+            t = arg1, arg1 = arg2, arg2 = t;
+            /* FALLTHRU */
+        case TCG_COND_GEU:
+            b_opc = OPC_BGEUC;
+            compact = true;
+            break;
+
+        case TCG_COND_GTU:
+            /* Swap arguments to turn GT to LT.  */
+            t = arg1, arg1 = arg2, arg2 = t;
+            /* FALLTHRU */
+        case TCG_COND_LTU:
+            b_opc = OPC_BLTUC;
+            compact = true;
+            break;
+
+        default:
+            tcg_abort();
+            break;
         }
-        tcg_out_opc_reg(s, s_opc, TCG_TMP0, arg1, arg2);
-        b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQ : OPC_BNE);
-        arg1 = TCG_TMP0;
-        arg2 = TCG_REG_ZERO;
-        break;
+    }
 
-    default:
-        tcg_abort();
-        break;
+    if (b_opc == 0) {
+        MIPSInsn s_opc = OPC_SLTU;
+
+        switch (cond) {
+        case TCG_COND_EQ:
+            b_opc = OPC_BEQ;
+            break;
+        case TCG_COND_NE:
+            b_opc = OPC_BNE;
+            break;
+
+        case TCG_COND_LT:
+        case TCG_COND_GT:
+        case TCG_COND_LE:
+        case TCG_COND_GE:
+            if (arg2 == 0) {
+                b_opc = b_zero[cond];
+                arg2 = arg1;
+                arg1 = 0;
+                break;
+            }
+            s_opc = OPC_SLT;
+            /* FALLTHRU */
+
+        case TCG_COND_LTU:
+        case TCG_COND_GTU:
+        case TCG_COND_LEU:
+        case TCG_COND_GEU:
+            cmp_map = mips_cmp_map[cond];
+            if (cmp_map & MIPS_CMP_SWAP) {
+                TCGReg t = arg1;
+                arg1 = arg2;
+                arg2 = t;
+            }
+            tcg_out_opc_reg(s, s_opc, TCG_TMP0, arg1, arg2);
+            if (use_mips32r6_instructions) {
+                b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQC : OPC_BNEC);
+                compact = true;
+            } else {
+                b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQ : OPC_BNE);
+            }
+            arg1 = TCG_TMP0;
+            arg2 = TCG_REG_ZERO;
+            break;
+
+        default:
+            tcg_abort();
+            break;
+        }
     }
 
-    tcg_out_opc_br(s, b_opc, arg1, arg2);
-    if (l->has_value) {
-        reloc_pc16(s->code_ptr - 1, l->u.value_ptr);
-    } else {
-        tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC16, l, 0);
+    tcg_out_opc_br_pc16(s, b_opc, arg1, arg2, l);
+    if (!compact) {
+        tcg_out_nop(s);
     }
-    tcg_out_nop(s);
 }
 
 static TCGReg tcg_out_reduce_eq2(TCGContext *s, TCGReg tmp0, TCGReg tmp1,
@@ -1826,8 +1935,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         s->tb_next_offset[a0] = tcg_current_code_size(s);
         break;
     case INDEX_op_br:
-        tcg_out_brcond(s, TCG_COND_EQ, TCG_REG_ZERO, TCG_REG_ZERO,
-                       arg_label(a0));
+        {
+            TCGLabel *l = arg_label(a0);
+            if (use_mips32r6_instructions) {
+                tcg_out32(s, OPC_BC);
+                if (l->has_value) {
+                    reloc_pc26(s->code_ptr - 1, l->u.value_ptr);
+                } else {
+                    tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC26_S2, l, 0);
+                }
+            } else {
+                tcg_out_opc_br_pc16(s, OPC_BEQ, TCG_REG_ZERO, TCG_REG_ZERO, l);
+            }
+        }
         break;
 
     case INDEX_op_ld8u_i32:
-- 
2.5.0
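
A minimal sketch of how a 26-bit PC-relative field for a compact branch
such as BC might be computed and patched into an instruction word.  It is
standalone C; fits_signed, pc26_field and patch_pc26 are hypothetical
names, not QEMU functions.  The displacement is counted in instruction
words from the instruction following the branch.  Note that the sketch
masks all 26 bits (0x3ffffff); that is an assumption of this example and
differs from the 0x1ffffff mask in the patch as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does a displacement fit in a signed field of the given width?  */
static bool fits_signed(ptrdiff_t disp, unsigned bits)
{
    ptrdiff_t lim = (ptrdiff_t)1 << (bits - 1);

    return disp >= -lim && disp < lim;
}

/* Compute the 26-bit offset field for a branch located at pc.
   Pointer subtraction on uint32_t already yields a word count, so
   no explicit shift by 2 is needed.  */
static uint32_t pc26_field(const uint32_t *pc, const uint32_t *target)
{
    ptrdiff_t disp = target - (pc + 1);

    assert(fits_signed(disp, 26));
    return (uint32_t)disp & 0x3ffffffu;
}

/* Replace the low 26 bits of insn with the computed field, keeping
   the major opcode in the top 6 bits.  */
static uint32_t patch_pc26(uint32_t insn, const uint32_t *pc,
                           const uint32_t *target)
{
    return (insn & ~0x3ffffffu) | pc26_field(pc, target);
}
```

A branch to the very next instruction encodes as zero, and a backward
branch encodes as a two's complement value confined to the low 26 bits.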

* [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls
  2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches Richard Henderson
@ 2016-02-09 10:40 ` Richard Henderson
  2016-02-10 12:49   ` James Hogan
  14 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 10:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: james.hogan, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 06e15d4..1b876af 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -357,7 +357,9 @@ typedef enum {
     OPC_SH       = 051 << 26,
     OPC_SW       = 053 << 26,
     OPC_BC       = 062 << 26,
+    OPC_JIC      = 066 << 26,
     OPC_LD       = 067 << 26,
+    OPC_JIALC    = 076 << 26,
     OPC_SD       = 077 << 26,
 
     OPC_SPECIAL  = 000 << 26,
@@ -1313,28 +1315,30 @@ static void tcg_out_movcond(TCGContext *s, TCGCond cond, TCGReg ret,
     }
 }
 
-static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
+static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg,
+                             bool tail, bool delay)
 {
     /* Note that the ABI requires the called function's address to be
        loaded into T9, even if a direct branch is in range.  */
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T9, (uintptr_t)arg);
 
     /* But do try a direct branch, allowing the cpu better insn prefetch.  */
-    if (tail) {
-        if (!tcg_out_opc_jmp(s, OPC_J, arg)) {
-            tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_T9, 0);
-        }
+    if (tcg_out_opc_jmp(s, tail ? OPC_J : OPC_JAL, arg)) {
+        if (!delay) {
+            tcg_out_nop(s);
+        }
+    } else if (use_mips32r6_instructions && !delay) {
+        tcg_out_opc_reg(s, tail ? OPC_JIC : OPC_JIALC, 0, TCG_REG_T9, 0);
+    } else if (tail) {
+        tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_T9, 0);
     } else {
-        if (!tcg_out_opc_jmp(s, OPC_JAL, arg)) {
-            tcg_out_opc_reg(s, OPC_JALR, TCG_REG_RA, TCG_REG_T9, 0);
-        }
+        tcg_out_opc_reg(s, OPC_JALR, TCG_REG_RA, TCG_REG_T9, 0);
     }
 }
 
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
 {
-    tcg_out_call_int(s, arg, false);
-    tcg_out_nop(s);
+    tcg_out_call_int(s, arg, false, false);
 }
 
 #if defined(CONFIG_SOFTMMU)
@@ -1558,7 +1562,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     }
     i = tcg_out_call_iarg_imm(s, i, oi);
     i = tcg_out_call_iarg_imm(s, i, (intptr_t)l->raddr);
-    tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)], false);
+    tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)],
+                     false, true);
     /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
 
@@ -1622,7 +1627,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
        computation to take place in the return address register.  */
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RA, (intptr_t)l->raddr);
     i = tcg_out_call_iarg_reg(s, i, TCG_REG_RA);
-    tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)], true);
+    tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)],
+                     true, true);
     /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
 }
-- 
2.5.0

* Re: [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes Richard Henderson
@ 2016-02-09 15:24   ` James Hogan
  2016-02-09 17:16     ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: James Hogan @ 2016-02-09 15:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien

Hi Richard,

Awesome, thanks for looking at these patches again :-)

On Tue, Feb 09, 2016 at 09:39:50PM +1100, Richard Henderson wrote:
> +#if !use_mips32r6_instructions
> +    { INDEX_op_muls2_i64, { "r", "r", "rZ", "rZ" } },
> +    { INDEX_op_mulu2_i64, { "r", "r", "rZ", "rZ" } },
> +#endif

this...

> +#define TCG_TARGET_HAS_mulu2_i64        1
> +#define TCG_TARGET_HAS_muls2_i64        1

and this are inconsistent for r6:

Missing op definition for mulu2_i64 
Missing op definition for muls2_i64 
/work/mips/qemu/main/tcg/tcg.c:1253: tcg fatal error

It gets further (to the point of seg faulting - looking into it) with
this fixup:

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 374d80374021..fa9cd4ab296a 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -145,8 +145,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
-#define TCG_TARGET_HAS_mulu2_i64        1
-#define TCG_TARGET_HAS_muls2_i64        1
+#define TCG_TARGET_HAS_mulu2_i64        (!use_mips32r6_instructions)
+#define TCG_TARGET_HAS_muls2_i64        (!use_mips32r6_instructions)
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 #define TCG_TARGET_HAS_ext32s_i64       1

Cheers
James

* Re: [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches Richard Henderson
@ 2016-02-09 16:22   ` James Hogan
  2016-02-09 17:13     ` Richard Henderson
  2016-02-10  0:20     ` James Hogan
  0 siblings, 2 replies; 29+ messages in thread
From: James Hogan @ 2016-02-09 16:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Leon Alrae, qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 09:40:02PM +1100, Richard Henderson wrote:
> Using compact branches, when possible, avoids a delay slot nop.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/elf.h         |   4 +
>  tcg/mips/tcg-target.c | 216 +++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 172 insertions(+), 48 deletions(-)
> 
> diff --git a/include/elf.h b/include/elf.h
> index 1098d21..6e52ba0 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -352,6 +352,10 @@ typedef struct {
>  #define R_MIPS_CALLHI16		30
>  #define R_MIPS_CALLLO16		31
>  /*
> + * Incomplete list of MIPS R6 relocation types.
> + */
> +#define R_MIPS_PC26_S2          61
> +/*
>   * This range is reserved for vendor specific relocations.
>   */
>  #define R_MIPS_LOVENDOR		100
> diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
> index e0972ba..06e15d4 100644
> --- a/tcg/mips/tcg-target.c
> +++ b/tcg/mips/tcg-target.c
> @@ -152,6 +152,19 @@ static inline void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
>      *pc = deposit32(*pc, 0, 16, reloc_pc16_val(pc, target));
>  }
>  
> +static inline uint32_t reloc_pc26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
> +{
> +    /* Let the compiler perform the right-shift as part of the arithmetic.  */
> +    ptrdiff_t disp = target - (pc + 1);
> +    assert(disp == sextract32(disp, 0, 26));
> +    return disp & 0x1ffffff;
> +}
> +
> +static inline void reloc_pc26(tcg_insn_unit *pc, tcg_insn_unit *target)
> +{
> +    *pc = deposit32(*pc, 0, 26, reloc_pc16_val(pc, target));
> +}
> +
>  static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
>  {
>      assert((((uintptr_t)pc ^ (uintptr_t)target) & 0xf0000000) == 0);
> @@ -166,9 +179,17 @@ static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
>  static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
> -    assert(type == R_MIPS_PC16);
>      assert(addend == 0);
> -    reloc_pc16(code_ptr, (tcg_insn_unit *)value);
> +    switch (type) {
> +    case R_MIPS_PC16:
> +        reloc_pc16(code_ptr, (tcg_insn_unit *)value);
> +	break;
> +    case R_MIPS_PC26_S2:
> +        reloc_pc26(code_ptr, (tcg_insn_unit *)value);
> +	break;
> +    default:
> +        tcg_abort();
> +    }
>  }
>  
>  #define TCG_CT_CONST_ZERO 0x100
> @@ -309,7 +330,10 @@ typedef enum {
>      OPC_BEQ      = 004 << 26,
>      OPC_BNE      = 005 << 26,
>      OPC_BLEZ     = 006 << 26,
> +    OPC_BGEUC    = OPC_BLEZ,    /* R6: rs != 0, rt != 0, rs != rt */
>      OPC_BGTZ     = 007 << 26,
> +    OPC_BLTUC    = OPC_BGTZ,    /* R6: rs != 0, rt != 0, rs != rt */
> +    OPC_BEQC     = 010 << 26,   /* R6: rs > rt */
>      OPC_ADDIU    = 011 << 26,
>      OPC_SLTI     = 012 << 26,
>      OPC_SLTIU    = 013 << 26,
> @@ -318,6 +342,9 @@ typedef enum {
>      OPC_XORI     = 016 << 26,
>      OPC_LUI      = 017 << 26,
>      OPC_AUI      = OPC_LUI,
> +    OPC_BGEC     = 026 << 26,
> +    OPC_BLTC     = 027 << 26,
> +    OPC_BNEC     = 030 << 26,   /* R6: rs > rt */
>      OPC_DADDIU   = 031 << 26,
>      OPC_DAUI     = 035 << 26,
>      OPC_LB       = 040 << 26,
> @@ -329,6 +356,7 @@ typedef enum {
>      OPC_SB       = 050 << 26,
>      OPC_SH       = 051 << 26,
>      OPC_SW       = 053 << 26,
> +    OPC_BC       = 062 << 26,
>      OPC_LD       = 067 << 26,
>      OPC_SD       = 077 << 26,
>  
> @@ -527,6 +555,17 @@ static inline void tcg_out_opc_br(TCGContext *s, MIPSInsn opc,
>      tcg_out_opc_imm(s, opc, rt, rs, offset);
>  }
>  
> +static void tcg_out_opc_br_pc16(TCGContext *s, MIPSInsn opc,
> +                                TCGReg arg1, TCGReg arg2, TCGLabel *l)
> +{
> +    tcg_out_opc_br(s, opc, arg1, arg2);
> +    if (l->has_value) {
> +        reloc_pc16(s->code_ptr - 1, l->u.value_ptr);
> +    } else {
> +        tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC16, l, 0);
> +    }
> +}
> +
>  /*
>   * Type sa
>   */
> @@ -1002,59 +1041,129 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
>          [TCG_COND_GE] = OPC_BGEZ,
>      };
>  
> -    MIPSInsn s_opc = OPC_SLTU;
> -    MIPSInsn b_opc;
> -    int cmp_map;
> +    MIPSInsn b_opc = 0;
> +    bool compact = false;
> +    int cmp_map, t;
> +
> +    /* We shouldn't expect to have arg1 == arg2, as the TCG optimizer
> +       should have eliminated all such.  However, the R6 encodings do
> +       not allow this situation, so e.g. if the optimizer is disabled
> +       we must fall back to normal compares.  */
> +    if (use_mips32r6_instructions && arg1 != arg2) {
> +        switch (cond) {
> +        case TCG_COND_EQ:
> +        case TCG_COND_NE:
> +            if (arg1 < arg2) {
> +                t = arg1, arg1 = arg2, arg2 = t;
> +            }
> +            b_opc = cond == TCG_COND_EQ ? OPC_BEQC : OPC_BNEC;
> +            compact = true;
> +            break;
>  
> -    switch (cond) {
> -    case TCG_COND_EQ:
> -        b_opc = OPC_BEQ;
> -        break;
> -    case TCG_COND_NE:
> -        b_opc = OPC_BNE;
> -        break;
> +        case TCG_COND_LE:
> +        case TCG_COND_GT:
> +            if (arg1 == TCG_REG_ZERO) {
> +                break;
> +            }
> +            /* Swap arguments to turn LE to GE or GT to LT.
> +               This also produces BLEZC/BGTZC when arg2 = 0.  */
> +            t = arg1, arg1 = arg2, arg2 = t;
> +            b_opc = cond == TCG_COND_LE ? OPC_BGEC : OPC_BLTC;
> +            compact = true;
> +            break;
>  
> -    case TCG_COND_LT:
> -    case TCG_COND_GT:
> -    case TCG_COND_LE:
> -    case TCG_COND_GE:
> -        if (arg2 == 0) {
> -            b_opc = b_zero[cond];
> -            arg2 = arg1;
> -            arg1 = 0;
> +        case TCG_COND_GE:
> +        case TCG_COND_LT:
> +            if (arg1 == TCG_REG_ZERO) {
> +                break;
> +            }
> +            /* The encoding of BGEZC/BLTZC requires rs = rt.  */
> +            if (arg2 == TCG_REG_ZERO) {
> +                arg2 = arg1;
> +            }
> +            b_opc = cond == TCG_COND_GE ? OPC_BGEC : OPC_BLTC;
> +            compact = true;
>              break;
> -        }
> -        s_opc = OPC_SLT;
> -        /* FALLTHRU */
>  
> -    case TCG_COND_LTU:
> -    case TCG_COND_GTU:
> -    case TCG_COND_LEU:
> -    case TCG_COND_GEU:
> -        cmp_map = mips_cmp_map[cond];
> -        if (cmp_map & MIPS_CMP_SWAP) {
> -            TCGReg t = arg1;
> -            arg1 = arg2;
> -            arg2 = t;
> +        case TCG_COND_LEU:
> +            /* Swap arguments to turn LE to GE.  */
> +            t = arg1, arg1 = arg2, arg2 = t;
> +            /* FALLTHRU */
> +        case TCG_COND_GEU:
> +            b_opc = OPC_BGEUC;
> +            compact = true;
> +            break;
> +
> +        case TCG_COND_GTU:
> +            /* Swap arguments to turn GT to LT.  */
> +            t = arg1, arg1 = arg2, arg2 = t;
> +            /* FALLTHRU */
> +        case TCG_COND_LTU:
> +            b_opc = OPC_BLTUC;
> +            compact = true;
> +            break;
> +
> +        default:
> +            tcg_abort();
> +            break;
>          }
> -        tcg_out_opc_reg(s, s_opc, TCG_TMP0, arg1, arg2);
> -        b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQ : OPC_BNE);
> -        arg1 = TCG_TMP0;
> -        arg2 = TCG_REG_ZERO;
> -        break;
> +    }
>  
> -    default:
> -        tcg_abort();
> -        break;
> +    if (b_opc == 0) {
> +        MIPSInsn s_opc = OPC_SLTU;
> +
> +        switch (cond) {
> +        case TCG_COND_EQ:
> +            b_opc = OPC_BEQ;
> +            break;
> +        case TCG_COND_NE:
> +            b_opc = OPC_BNE;
> +            break;
> +
> +        case TCG_COND_LT:
> +        case TCG_COND_GT:
> +        case TCG_COND_LE:
> +        case TCG_COND_GE:
> +            if (arg2 == 0) {
> +                b_opc = b_zero[cond];
> +                arg2 = arg1;
> +                arg1 = 0;
> +                break;
> +            }
> +            s_opc = OPC_SLT;
> +            /* FALLTHRU */
> +
> +        case TCG_COND_LTU:
> +        case TCG_COND_GTU:
> +        case TCG_COND_LEU:
> +        case TCG_COND_GEU:
> +            cmp_map = mips_cmp_map[cond];
> +            if (cmp_map & MIPS_CMP_SWAP) {
> +                TCGReg t = arg1;
> +                arg1 = arg2;
> +                arg2 = t;
> +            }
> +            tcg_out_opc_reg(s, s_opc, TCG_TMP0, arg1, arg2);
> +            if (use_mips32r6_instructions) {
> +                b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQC : OPC_BNEC);
> +                compact = true;
> +            } else {
> +                b_opc = (cmp_map & MIPS_CMP_INV ? OPC_BEQ : OPC_BNE);
> +            }
> +            arg1 = TCG_TMP0;
> +            arg2 = TCG_REG_ZERO;
> +            break;
> +
> +        default:
> +            tcg_abort();
> +            break;
> +        }
>      }
>  
> -    tcg_out_opc_br(s, b_opc, arg1, arg2);
> -    if (l->has_value) {
> -        reloc_pc16(s->code_ptr - 1, l->u.value_ptr);
> -    } else {
> -        tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC16, l, 0);
> +    tcg_out_opc_br_pc16(s, b_opc, arg1, arg2, l);
> +    if (!compact) {
> +        tcg_out_nop(s);

Unfortunately this isn't quite right. As far as I understand them,
conditional compact branches have a forbidden slot after them, which
isn't permitted to contain a control transfer instruction (CTI).
Executing a conditional compact branch with a CTI in the forbidden slot
is required to signal a reserved instruction exception (giving the user
process a SIGILL), but only if the branch is not taken.

E.g.:

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0xfff1c32e00 (LWP 204)]
0x000000fff30b0068 in code_gen_buffer ()
(gdb) disas/r
Dump of assembler code for function code_gen_buffer:
   0x000000fff30b0064 <+0>:     f8 ff 11 8e     lw      s1,-8(s0)
=> 0x000000fff30b0068 <+4>:     08 00 11 60     bnezalc s1,0xfff30b008c <code_gen_buffer+40>
   0x000000fff30b006c <+8>:     1d c0 c2 08     j       0xfff30b0074 <code_gen_buffer+16>
   0x000000fff30b0070 <+12>:    00 00 00 00     nop

(gdb) set *0x000000fff30b006c=0
(gdb) disas/r
Dump of assembler code for function code_gen_buffer:
   0x000000fff30b0064 <+0>:     f8 ff 11 8e     lw      s1,-8(s0)
=> 0x000000fff30b0068 <+4>:     08 00 11 60     bnezalc s1,0xfff30b008c <code_gen_buffer+40>
   0x000000fff30b006c <+8>:     00 00 00 00     nop
   0x000000fff30b0070 <+12>:    00 00 00 00     nop

(gdb) stepi
0x000000fff30b0070 in code_gen_buffer ()
(gdb) disas/r
Dump of assembler code for function code_gen_buffer:
   0x000000fff30b0064 <+0>:     f8 ff 11 8e     lw      s1,-8(s0)
   0x000000fff30b0068 <+4>:     08 00 11 60     bnezalc s1,0xfff30b008c <code_gen_buffer+40>
   0x000000fff30b006c <+8>:     00 00 00 00     nop
=> 0x000000fff30b0070 <+12>:    00 00 00 00     nop

So to be correct + efficient, it should only put the nop in if the next
generated instruction is a CTI. I imagine that would be a bit messy /
fragile, but maybe doable? I haven't looked too deeply.

Cheers
James


>      }
> -    tcg_out_nop(s);
>  }
>  
>  static TCGReg tcg_out_reduce_eq2(TCGContext *s, TCGReg tmp0, TCGReg tmp1,
> @@ -1826,8 +1935,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          s->tb_next_offset[a0] = tcg_current_code_size(s);
>          break;
>      case INDEX_op_br:
> -        tcg_out_brcond(s, TCG_COND_EQ, TCG_REG_ZERO, TCG_REG_ZERO,
> -                       arg_label(a0));
> +        {
> +            TCGLabel *l = arg_label(a0);
> +            if (use_mips32r6_instructions) {
> +                tcg_out32(s, OPC_BC);
> +                if (l->has_value) {
> +                    reloc_pc26(s->code_ptr - 1, l->u.value_ptr);
> +                } else {
> +                    tcg_out_reloc(s, s->code_ptr - 1, R_MIPS_PC26_S2, l, 0);
> +                }
> +            } else {
> +                tcg_out_opc_br_pc16(s, OPC_BEQ, TCG_REG_ZERO, TCG_REG_ZERO, l);
> +            }
> +        }
>          break;
>  
>      case INDEX_op_ld8u_i32:
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi Richard Henderson
@ 2016-02-09 16:50   ` James Hogan
  2016-02-09 17:20     ` Richard Henderson
                       ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: James Hogan @ 2016-02-09 16:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 09:39:59PM +1100, Richard Henderson wrote:
> The DAHI and DATI instructions can eliminate two insns
> off the pre-r6 path.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/mips/tcg-target.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
> index 97f9251..f7f4331 100644
> --- a/tcg/mips/tcg-target.c
> +++ b/tcg/mips/tcg-target.c
> @@ -303,7 +303,9 @@ typedef enum {
>      OPC_ORI      = 015 << 26,
>      OPC_XORI     = 016 << 26,
>      OPC_LUI      = 017 << 26,
> +    OPC_AUI      = OPC_LUI,
>      OPC_DADDIU   = 031 << 26,
> +    OPC_DAUI     = 035 << 26,
>      OPC_LB       = 040 << 26,
>      OPC_LH       = 041 << 26,
>      OPC_LW       = 043 << 26,
> @@ -383,6 +385,8 @@ typedef enum {
>      OPC_REGIMM   = 001 << 26,
>      OPC_BLTZ     = OPC_REGIMM | (000 << 16),
>      OPC_BGEZ     = OPC_REGIMM | (001 << 16),
> +    OPC_DAHI     = OPC_REGIMM | (006 << 16),
> +    OPC_DATI     = OPC_REGIMM | (036 << 16),
>  
>      OPC_SPECIAL2 = 034 << 26,
>      OPC_MUL_R5   = OPC_SPECIAL2 | 002,
> @@ -402,6 +406,10 @@ typedef enum {
>      OPC_SEB      = OPC_SPECIAL3 | 02040,
>      OPC_SEH      = OPC_SPECIAL3 | 03040,
>  
> +    OPC_PCREL    = 073 << 26,
> +    OPC_ADDIUPC  = OPC_PCREL | (0 << 19),
> +    OPC_ALUIPC   = OPC_PCREL | (3 << 19) | (7 << 16),
> +
>      /* MIPS r6 doesn't have JR, JALR should be used instead */
>      OPC_JR       = use_mips32r6_instructions ? OPC_JALR : OPC_JR_R5,
>  
> @@ -448,6 +456,17 @@ static inline void tcg_out_opc_imm(TCGContext *s, MIPSInsn opc,
>      tcg_out32(s, inst);
>  }
>  
> +static inline void tcg_out_opc_pc19(TCGContext *s, MIPSInsn opc,
> +                                    TCGReg rs, TCGArg imm)
> +{
> +    int32_t inst;
> +
> +    inst = opc;
> +    inst |= (rs & 0x1F) << 21;
> +    inst |= (imm & 0x7ffff);
> +    tcg_out32(s, inst);
> +}
> +
>  /*
>   * Type bitfield
>   */
> @@ -589,6 +608,50 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
>      }
>      if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
>          tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
> +    } else if (use_mips32r6_instructions) {
> +        tcg_target_long disp = arg - (intptr_t)s->code_ptr;
> +        if (disp == sextract32(disp, 2, 19) * 4) {
> +            tcg_out_opc_pc19(s, OPC_ADDIUPC, ret, disp >> 2);
> +            return;
> +        } else if ((disp & ~(tcg_target_long)0xffff)
> +                   == sextract32(disp, 16, 16) * 0x10000) {
> +            tcg_out_opc_imm(s, OPC_ALUIPC, ret, 0, disp >> 16);

I think ret and 0 are the wrong way around here. You're putting 0 in rs
(the destination register), which causes a seg fault.

OUT: [size=56] 
0xfff30b0064:  lw       s1,-8(s0) 
0xfff30b0068:  bnezalc  zero,s1,0xfff30b0090 
0xfff30b006c:  nop 
0xfff30b0070:  j        0xfff0000000 
0xfff30b0074:  nop 
0xfff30b0078:  lui      s1,0xbfc0 
0xfff30b007c:  ori      s1,s1,0x580 
0xfff30b0080:  sd       s1,256(s0) 
0xfff30b0084:  aluipc   zero,0xfeb7 
0xfff30b0088:  j        0xfff30b0034 
0xfff30b008c:  ori      v0,v0,0x4010 
0xfff30b0090:  aluipc   zero,0xfeb7 
0xfff30b0094:  j        0xfff30b0034 
0xfff30b0098:  ori      v0,v0,0x4013

Cheers
James



> +        } else {
> +            TCGReg in = TCG_REG_ZERO;
> +            tcg_target_long tmp = (int16_t)arg;
> +
> +            /* The R6 manual recommends construction of immediates in
> +               order of low to high (ADDI, AUI, DAHI, DATI) in order
> +               to simplify hardware recognizing these sequences.  */
> +
> +            if (tmp) {
> +                tcg_out_opc_imm(s, OPC_ADDIU, ret, in, tmp);
> +                in = ret;
> +            }
> +            arg = (arg - tmp) >> 16;
> +            tmp = (int16_t)arg;
> +
> +            /* Note that DAHI and DATI only have one register operand,
> +               and are thus we must put a zero low part in place.  Also
> +               note that we already eliminated simple 32-bit constants
> +               so we know this must happen.  */
> +            if (tmp || in != ret) {
> +                tcg_out_opc_imm(s, OPC_AUI, ret, in, tmp);
> +            }
> +            arg = (arg - tmp) >> 16;
> +            tmp = (int16_t)arg;
> +
> +            if (tmp) {
> +                tcg_out_opc_imm(s, OPC_DAHI, ret, 0, tmp);
> +            }
> +            arg = (arg - tmp) >> 16;
> +            tcg_debug_assert(arg == (int16_t)arg);
> +
> +            if (arg) {
> +                tcg_out_opc_imm(s, OPC_DATI, ret, 0, arg);
> +            }
> +            return;
> +        }
>      } else {
>          tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
>          if (arg & 0xffff0000ull) {
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches
  2016-02-09 16:22   ` James Hogan
@ 2016-02-09 17:13     ` Richard Henderson
  2016-02-09 18:46       ` Maciej W. Rozycki
  2016-02-10  0:20     ` James Hogan
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 17:13 UTC (permalink / raw)
  To: James Hogan; +Cc: Leon Alrae, qemu-devel, aurelien

On 02/10/2016 03:22 AM, James Hogan wrote:
> So to be correct + efficient, it should only put the nop in if the next
> generated instruction is a CTI. I imagine that would be a bit messy /
> fragile, but maybe doable? I haven't looked too deeply.

Ouch, I didn't notice this about these insns.

I suppose this might be rare enough that it's still worth thinking about.  Off 
the top of my head I can't think of any way to save extra state, but perhaps 
just looking back at the previous insn's major opcode is enough when emitting 
any forbidden insn.

For the moment, let's just drop this patch (and probably the one for calls too, 
for the same reason?)


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes
  2016-02-09 15:24   ` James Hogan
@ 2016-02-09 17:16     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 17:16 UTC (permalink / raw)
  To: James Hogan; +Cc: qemu-devel, aurelien

On 02/10/2016 02:24 AM, James Hogan wrote:
> Missing op definition for mulu2_i64
> Missing op definition for muls2_i64
> /work/mips/qemu/main/tcg/tcg.c:1253: tcg fatal error
>
> It gets further (to the point of seg faulting - looking into it) with
> this fixup:
>
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 374d80374021..fa9cd4ab296a 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -145,8 +145,8 @@ extern bool use_mips32r2_instructions;
>   #define TCG_TARGET_HAS_nand_i64         0
>   #define TCG_TARGET_HAS_add2_i64         0
>   #define TCG_TARGET_HAS_sub2_i64         0
> -#define TCG_TARGET_HAS_mulu2_i64        1
> -#define TCG_TARGET_HAS_muls2_i64        1
> +#define TCG_TARGET_HAS_mulu2_i64        (!use_mips32r6_instructions)
> +#define TCG_TARGET_HAS_muls2_i64        (!use_mips32r6_instructions)

Oops, yep.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi
  2016-02-09 16:50   ` James Hogan
@ 2016-02-09 17:20     ` Richard Henderson
  2016-02-09 17:25     ` Richard Henderson
  2016-02-10  0:32     ` James Hogan
  2 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 17:20 UTC (permalink / raw)
  To: James Hogan; +Cc: qemu-devel, aurelien

On 02/10/2016 03:50 AM, James Hogan wrote:
>> +        } else if ((disp & ~(tcg_target_long)0xffff)
>> +                   == sextract32(disp, 16, 16) * 0x10000) {
>> +            tcg_out_opc_imm(s, OPC_ALUIPC, ret, 0, disp >> 16);
>
> I think ret and 0 are the wrong way around here. You're putting 0 in rs
> (the destination register), which causes a seg fault.

Yep, thanks.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi
  2016-02-09 16:50   ` James Hogan
  2016-02-09 17:20     ` Richard Henderson
@ 2016-02-09 17:25     ` Richard Henderson
  2016-02-10  0:32     ` James Hogan
  2 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-09 17:25 UTC (permalink / raw)
  To: James Hogan; +Cc: qemu-devel, aurelien

On 02/10/2016 03:50 AM, James Hogan wrote:
> I think ret and 0 are the wrong way around here. You're putting 0 in rs
> (the destination register), which causes a seg fault.
>
> OUT: [size=56]
> 0xfff30b0064:  lw       s1,-8(s0)
> 0xfff30b0068:  bnezalc  zero,s1,0xfff30b0090
> 0xfff30b006c:  nop
> 0xfff30b0070:  j        0xfff0000000
> 0xfff30b0074:  nop
> 0xfff30b0078:  lui      s1,0xbfc0
> 0xfff30b007c:  ori      s1,s1,0x580
> 0xfff30b0080:  sd       s1,256(s0)
> 0xfff30b0084:  aluipc   zero,0xfeb7
> 0xfff30b0088:  j        0xfff30b0034
> 0xfff30b008c:  ori      v0,v0,0x4010
> 0xfff30b0090:  aluipc   zero,0xfeb7
> 0xfff30b0094:  j        0xfff30b0034
> 0xfff30b0098:  ori      v0,v0,0x4013
>
> Cheers
> James
>
>
>
>> +        } else {
>> +            TCGReg in = TCG_REG_ZERO;
>> +            tcg_target_long tmp = (int16_t)arg;
>> +
>> +            /* The R6 manual recommends construction of immediates in
>> +               order of low to high (ADDI, AUI, DAHI, DATI) in order
>> +               to simplify hardware recognizing these sequences.  */
>> +
>> +            if (tmp) {
>> +                tcg_out_opc_imm(s, OPC_ADDIU, ret, in, tmp);
>> +                in = ret;
>> +            }
>> +            arg = (arg - tmp) >> 16;
>> +            tmp = (int16_t)arg;
>> +
>> +            /* Note that DAHI and DATI only have one register operand,
>> +               and are thus we must put a zero low part in place.  Also
>> +               note that we already eliminated simple 32-bit constants
>> +               so we know this must happen.  */
>> +            if (tmp || in != ret) {
>> +                tcg_out_opc_imm(s, OPC_AUI, ret, in, tmp);
>> +            }
>> +            arg = (arg - tmp) >> 16;
>> +            tmp = (int16_t)arg;
>> +
>> +            if (tmp) {
>> +                tcg_out_opc_imm(s, OPC_DAHI, ret, 0, tmp);
>> +            }
>> +            arg = (arg - tmp) >> 16;
>> +            tcg_debug_assert(arg == (int16_t)arg);
>> +
>> +            if (arg) {
>> +                tcg_out_opc_imm(s, OPC_DATI, ret, 0, arg);

Same mistake here for DAHI/DATI.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches
  2016-02-09 17:13     ` Richard Henderson
@ 2016-02-09 18:46       ` Maciej W. Rozycki
  0 siblings, 0 replies; 29+ messages in thread
From: Maciej W. Rozycki @ 2016-02-09 18:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: James Hogan, Leon Alrae, qemu-devel, Aurelien Jarno

On Tue, 9 Feb 2016, Richard Henderson wrote:

> > So to be correct + efficient, it should only put the nop in if the next
> > generated instruction is a CTI. I imagine that would be a bit messy /
> > fragile, but maybe doable? I haven't looked too deeply.
> 
> Ouch, I didn't notice this about these insns.
> 
> I suppose this might be rare enough that it's still worth thinking about.  Off
> the top of my head I can't think of any way to save extra state, but perhaps
> just looking back at the previous insn's major opcode is enough when emitting
> any forbidden insn.

 FWIW I think this is a reasonable approach, applying to the regular MIPS 
ISA where the size of the instruction word is fixed so you can look at the 
preceding instruction in a reproducible manner.  And in the microMIPSr6 
ISA there are no forbidden slots, so no issue there in the first place.
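
A minimal sketch of that look-back idea, for illustration only. The helper
names (is_cond_compact_branch, emit_cti) are hypothetical and not part of
the patch; the opcode list covers only the major opcodes named in the
patch's enum (006, 007, 010, 026, 027, 030) and assumes an R6-only code
stream, where for majors 006/007 a zero rt field means the classic
BLEZ/BGTZ with a delay slot. The caller is assumed to know that the
instruction it is about to emit is a CTI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does this R6 instruction word look like a
   conditional compact branch, judging by its major opcode alone?  */
static bool is_cond_compact_branch(uint32_t insn)
{
    uint32_t major = insn >> 26;
    uint32_t rt = (insn >> 16) & 0x1f;

    switch (major) {
    case 006:   /* BLEZ when rt == 0, compact forms otherwise */
    case 007:   /* BGTZ when rt == 0, compact forms otherwise */
        return rt != 0;
    case 010:   /* BEQC and friends */
    case 026:   /* BGEC and friends */
    case 027:   /* BLTC and friends */
    case 030:   /* BNEC and friends */
        return true;
    default:
        return false;
    }
}

/* Hypothetical emitter: append a CTI to buf[0..n), padding with a nop
   first if the CTI would otherwise land in a forbidden slot.
   Returns the new instruction count.  */
static size_t emit_cti(uint32_t *buf, size_t n, uint32_t cti)
{
    if (n > 0 && is_cond_compact_branch(buf[n - 1])) {
        buf[n++] = 0;   /* nop fills the forbidden slot */
    }
    buf[n++] = cti;
    return n;
}
```

With the words from the gdb dump earlier in the thread (0x60110008 followed
by the J at 0x08c2c01d), emit_cti would insert one nop between the two;
after the LW (0x8e11fff8) it would insert none.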

  Maciej

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches
  2016-02-09 16:22   ` James Hogan
  2016-02-09 17:13     ` Richard Henderson
@ 2016-02-10  0:20     ` James Hogan
  1 sibling, 0 replies; 29+ messages in thread
From: James Hogan @ 2016-02-10  0:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Leon Alrae, qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 04:22:34PM +0000, James Hogan wrote:
> (gdb) disas/r
> Dump of assembler code for function code_gen_buffer:
>    0x000000fff30b0064 <+0>:     f8 ff 11 8e     lw      s1,-8(s0)
> => 0x000000fff30b0068 <+4>:     08 00 11 60     bnezalc s1,0xfff30b008c <code_gen_buffer+40>

Note also that this seems to be the wrong encoding anyway. It is
encoding the "and-link" variation which overwrites $31 with PC+4.

Cheers
James


>    0x000000fff30b006c <+8>:     1d c0 c2 08     j       0xfff30b0074 <code_gen_buffer+16>
>    0x000000fff30b0070 <+12>:    00 00 00 00     nop
> 
> (gdb) set *0x000000fff30b006c=0
> (gdb) disas/r
> Dump of assembler code for function code_gen_buffer:
>    0x000000fff30b0064 <+0>:     f8 ff 11 8e     lw      s1,-8(s0)
> => 0x000000fff30b0068 <+4>:     08 00 11 60     bnezalc s1,0xfff30b008c <code_gen_buffer+40>
>    0x000000fff30b006c <+8>:     00 00 00 00     nop
>    0x000000fff30b0070 <+12>:    00 00 00 00     nop

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi
  2016-02-09 16:50   ` James Hogan
  2016-02-09 17:20     ` Richard Henderson
  2016-02-09 17:25     ` Richard Henderson
@ 2016-02-10  0:32     ` James Hogan
  2 siblings, 0 replies; 29+ messages in thread
From: James Hogan @ 2016-02-10  0:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 04:50:52PM +0000, James Hogan wrote:
> > @@ -589,6 +608,50 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
> >      }
> >      if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
> >          tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
> > +    } else if (use_mips32r6_instructions) {
> > +        tcg_target_long disp = arg - (intptr_t)s->code_ptr;
> > +        if (disp == sextract32(disp, 2, 19) * 4) {
> > +            tcg_out_opc_pc19(s, OPC_ADDIUPC, ret, disp >> 2);
> > +            return;
> > +        } else if ((disp & ~(tcg_target_long)0xffff)
> > +                   == sextract32(disp, 16, 16) * 0x10000) {
> > +            tcg_out_opc_imm(s, OPC_ALUIPC, ret, 0, disp >> 16);
> 
> I think ret and 0 are the wrong way around here. You're putting 0 in rs
> (the destination register), which causes a seg fault.
> 
> OUT: [size=56] 
> 0xfff30b0064:  lw       s1,-8(s0) 
> 0xfff30b0068:  bnezalc  zero,s1,0xfff30b0090 
> 0xfff30b006c:  nop 
> 0xfff30b0070:  j        0xfff0000000 
> 0xfff30b0074:  nop 
> 0xfff30b0078:  lui      s1,0xbfc0 
> 0xfff30b007c:  ori      s1,s1,0x580 
> 0xfff30b0080:  sd       s1,256(s0) 
> 0xfff30b0084:  aluipc   zero,0xfeb7 
> 0xfff30b0088:  j        0xfff30b0034 
> 0xfff30b008c:  ori      v0,v0,0x4010 
> 0xfff30b0090:  aluipc   zero,0xfeb7 
> 0xfff30b0094:  j        0xfff30b0034 
> 0xfff30b0098:  ori      v0,v0,0x4013

Actually, still not quite right.

ALUIPC does
dest <- ~0xffff & (PC + sign_extend(imm16<<16))

which is effectively
dest <- (PC & ~0xffff) + sign_extend(imm16<<16)

so disp should be the difference between arg and (code_ptr & ~0xffff),
i.e. something like this I think:

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 8205ea4e159f..9a5d31478797 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -666,12 +666,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
     } else if (use_mips32r6_instructions) {
         tcg_target_long disp = arg - (intptr_t)s->code_ptr;
+        tcg_target_long disphi = arg - ((intptr_t)s->code_ptr & ~(tcg_target_long)0xffff);
         if (disp == sextract32(disp, 2, 19) * 4) {
             tcg_out_opc_pc19(s, OPC_ADDIUPC, ret, disp >> 2);
             return;
-        } else if ((disp & ~(tcg_target_long)0xffff)
-                   == sextract32(disp, 16, 16) * 0x10000) {
-            tcg_out_opc_imm(s, OPC_ALUIPC, 0, ret, disp >> 16);
+        } else if ((disphi & ~(tcg_target_long)0xffff)
+                   == sextract32(disphi, 16, 16) * 0x10000) {
+            tcg_out_opc_imm(s, OPC_ALUIPC, 0, ret, disphi >> 16);
         } else {
             TCGReg in = TCG_REG_ZERO;
             tcg_target_long tmp = (int16_t)arg;

Otherwise, in this case it's trying to load the immediate 0xfff1c30000
relative to 0xfff30b0084, and calculates a disp of 0xfeb7ff7c, which is
truncated to 0xfeb7. The result is then:
0xfff30b0000 + (int)0xfeb70000 = 0xfff1c20000
which is off by 64KiB.

With the above change we get:
disphi = 0xfeb80000
and the result is then:
0xfff30b0000 + (int)0xfeb80000 = 0xfff1c30000
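
The arithmetic above can be checked with a small standalone sketch. The
function names are hypothetical and only model the ALUIPC semantics as
described in this mail; they are not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ALUIPC as described above:
   dest = ~0xffff & (PC + sign_extend(imm16 << 16)).  */
static uint64_t aluipc_result(uint64_t pc, uint16_t imm16)
{
    uint64_t off = (uint64_t)imm16 << 16;
    if (imm16 & 0x8000) {
        off |= ~(uint64_t)0xffffffff;   /* sign-extend from bit 31 */
    }
    return (pc + off) & ~(uint64_t)0xffff;
}

/* Immediate as the original patch computes it: from arg - code_ptr.  */
static uint16_t aluipc_imm_naive(uint64_t pc, uint64_t target)
{
    return (uint16_t)((target - pc) >> 16);
}

/* Immediate with the fix: from arg - (code_ptr & ~0xffff).  */
static uint16_t aluipc_imm_fixed(uint64_t pc, uint64_t target)
{
    return (uint16_t)((target - (pc & ~(uint64_t)0xffff)) >> 16);
}
```

For pc = 0xfff30b0084 and target = 0xfff1c30000 the naive immediate is
0xfeb7 and lands on 0xfff1c20000, while the fixed immediate is 0xfeb8 and
lands on the intended 0xfff1c30000.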

Cheers
James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls
  2016-02-09 10:40 ` [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls Richard Henderson
@ 2016-02-10 12:49   ` James Hogan
  0 siblings, 0 replies; 29+ messages in thread
From: James Hogan @ 2016-02-10 12:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 09:40:03PM +1100, Richard Henderson wrote:
> @@ -1313,28 +1315,30 @@ static void tcg_out_movcond(TCGContext *s, TCGCond cond, TCGReg ret,
>      }
>  }
>  
> -static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
> +static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg,
> +                             bool tail, bool delay)
>  {
>      /* Note that the ABI requires the called function's address to be
>         loaded into T9, even if a direct branch is in range.  */
>      tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T9, (uintptr_t)arg);
>  
>      /* But do try a direct branch, allowing the cpu better insn prefetch.  */
> -    if (tail) {
> -        if (!tcg_out_opc_jmp(s, OPC_J, arg)) {
> -            tcg_out_opc_reg(s, OPC_JR, 0, TCG_REG_T9, 0);
> -        }
> +    if (tcg_out_opc_jmp(s, tail ? OPC_J : OPC_JAL, arg)) {
> +        if (!delay) {
> +            tcg_out_nop(s);
> +        }
> +    } else if (use_mips32r6_instructions && !delay) {
> +        tcg_out_opc_reg(s, tail ? OPC_JIC : OPC_JIALC, 0, TCG_REG_T9, 0);

This needs to be "...JIALC, 0, 0, TCG_REG_T9);" to get t9 into rt.
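
To illustrate the field placement, here is a standalone sketch. It assumes
tcg_out_opc_reg takes its operands in the order (rd, rs, rt) and packs them
at bits 11, 21 and 16 respectively, and that JIALC uses major opcode 076
with rs = 0 and the target register in rt; encode_reg, SKETCH_OPC_JIALC and
SKETCH_REG_T9 are hypothetical names, not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_OPC_JIALC  (0x3eu << 26)   /* major opcode 076 (assumed) */
#define SKETCH_REG_T9     25u

/* Pack register fields in the assumed (rd, rs, rt) argument order.  */
static uint32_t encode_reg(uint32_t opc, uint32_t rd, uint32_t rs,
                           uint32_t rt)
{
    uint32_t inst = opc;
    inst |= (rs & 0x1f) << 21;
    inst |= (rt & 0x1f) << 16;
    inst |= (rd & 0x1f) << 11;
    return inst;
}
```

Under those assumptions, encode_reg(SKETCH_OPC_JIALC, 0, 0, SKETCH_REG_T9)
yields 0xf8190000 with t9 in the rt field, whereas passing t9 as the third
argument yields 0xfb200000 with t9 in rs.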

Cheers
James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64
  2016-02-09 10:39 ` [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64 Richard Henderson
@ 2016-02-10 16:34   ` James Hogan
  2016-02-10 17:35     ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: James Hogan @ 2016-02-10 16:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien

Hi Richard,

On Tue, Feb 09, 2016 at 09:39:55PM +1100, Richard Henderson wrote:
> @@ -1212,11 +1237,24 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
>             : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
>      int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
>  
> -    tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addrl,
> -                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
> -    tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
> -                    (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
> -    tcg_out_opc_reg(s, OPC_ADDU, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
> +    if (use_mips32r2_instructions) {
> +        if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
> +            tcg_out_opc_bf(s, OPC_EXT, TCG_REG_A0, addrl,
> +                           TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
> +                           CPU_TLB_ENTRY_BITS);
> +        } else {
> +            tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU,
> +                             TCG_REG_A0, addrl,
> +                             TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
> +                             CPU_TLB_ENTRY_BITS);
> +        }

The ext/dext here will end up with bits below bit CPU_TLB_ENTRY_BITS
set, which will result in load of addend from slightly offset address,
so things go badly wrong. You still need to either ANDI off the low bits
or trim them off with the ext/dext and shift it left again.

So I don't think there's any benefit to the use of these instructions
unless CPU_TLB_BITS + CPU_TLB_ENTRY_BITS exceeds the 16 bits available
in the ANDI immediate field for the non r2 case.

Cheers
James

> +    } else {
> +        tcg_out_opc_sa(s, ALIAS_TSRL, TCG_REG_A0, addrl,
> +                       TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
> +        tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
> +                        (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
> +    }
> +    tcg_out_opc_reg(s, ALIAS_PADD, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
>  
>      /* Compensate for very large offsets.  */
>      if (add_off >= 0x8000) {


* Re: [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64
  2016-02-10 16:34   ` James Hogan
@ 2016-02-10 17:35     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2016-02-10 17:35 UTC (permalink / raw)
  To: James Hogan; +Cc: qemu-devel, aurelien

On 02/11/2016 03:34 AM, James Hogan wrote:
> Hi Richard,
>
> On Tue, Feb 09, 2016 at 09:39:55PM +1100, Richard Henderson wrote:
>> @@ -1212,11 +1237,24 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
>>              : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
>>       int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
>>
>> -    tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addrl,
>> -                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
>> -    tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0,
>> -                    (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
>> -    tcg_out_opc_reg(s, OPC_ADDU, TCG_REG_A0, TCG_REG_A0, TCG_AREG0);
>> +    if (use_mips32r2_instructions) {
>> +        if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
>> +            tcg_out_opc_bf(s, OPC_EXT, TCG_REG_A0, addrl,
>> +                           TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
>> +                           CPU_TLB_ENTRY_BITS);
>> +        } else {
>> +            tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU,
>> +                             TCG_REG_A0, addrl,
>> +                             TARGET_PAGE_BITS + CPU_TLB_ENTRY_BITS - 1,
>> +                             CPU_TLB_ENTRY_BITS);
>> +        }
>
> The ext/dext here will leave bits below bit CPU_TLB_ENTRY_BITS set, so
> the addend is loaded from a slightly offset address and things go badly
> wrong. You still need to either ANDI off the low bits, or trim them off
> with the ext/dext and shift the result left again.
>
> So I don't think there's any benefit to using these instructions unless
> the (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS mask exceeds the 16 bits
> available in the ANDI immediate field used in the non-r2 case.

Hmm.  I thought I'd deleted this code back out.  I must have messed up
copying trees between machines and overwritten this.


r~

Thread overview: 29+ messages
2016-02-09 10:39 [Qemu-devel] [PATCH 00/15] tcg mips64 and mipsr6 improvements Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 01/15] tcg-mips: Add mips64 opcodes Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 02/15] tcg-mips: Support 64-bit opcodes Richard Henderson
2016-02-09 15:24   ` James Hogan
2016-02-09 17:16     ` Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 03/15] tcg-mips: Adjust move functions for mips64 Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 04/15] tcg-mips: Adjust load/store " Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 05/15] tcg-mips: Adjust prologue " Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 06/15] tcg-mips: Add tcg unwind info Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 07/15] tcg-mips: Adjust qemu_ld/st for mips64 Richard Henderson
2016-02-10 16:34   ` James Hogan
2016-02-10 17:35     ` Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 08/15] tcg-mips: Adjust calling conventions " Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 09/15] tcg-mips: Fix exit_tb " Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 10/15] tcg-mips: Move bswap code to subroutines Richard Henderson
2016-02-09 10:39 ` [Qemu-devel] [PATCH 11/15] tcg-mips: Use mips64r6 instructions in tcg_out_movi Richard Henderson
2016-02-09 16:50   ` James Hogan
2016-02-09 17:20     ` Richard Henderson
2016-02-09 17:25     ` Richard Henderson
2016-02-10  0:32     ` James Hogan
2016-02-09 10:40 ` [Qemu-devel] [PATCH 12/15] tcg-mips: Use mips64r6 instructions in tcg_out_ldst Richard Henderson
2016-02-09 10:40 ` [Qemu-devel] [PATCH 13/15] tcg-mips: Use mips64r6 instructions in constant addition Richard Henderson
2016-02-09 10:40 ` [Qemu-devel] [PATCH 14/15] tcg-mips: Use mipsr6 instructions in branches Richard Henderson
2016-02-09 16:22   ` James Hogan
2016-02-09 17:13     ` Richard Henderson
2016-02-09 18:46       ` Maciej W. Rozycki
2016-02-10  0:20     ` James Hogan
2016-02-09 10:40 ` [Qemu-devel] [PATCH 15/15] tcg-mips: Use mipsr6 instructions in calls Richard Henderson
2016-02-10 12:49   ` James Hogan
