* [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups
@ 2018-11-23 14:45 Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0 Richard Henderson
                   ` (38 more replies)
  0 siblings, 39 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC
  To: qemu-devel; +Cc: Alistair.Francis

This includes everything queued so far -- softmmu out-of-line
patches, bswap cleanups, and (new) eliminating all scratch
registers from x86 user-only memops.

This tree is now at

  https://github.com/rth7680/qemu.git tcg-next-for-4.0

for future tcg/riscv/ rebasing.


r~


Richard Henderson (37):
  tcg/i386: Always use %ebp for TCG_AREG0
  tcg/i386: Move TCG_REG_CALL_STACK from define to enum
  tcg: Return success from patch_reloc
  tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/i386: Add constraints for r8 and r9
  tcg/i386: Return a base register from tcg_out_tlb_load
  tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  tcg/i386: Force qemu_ld/st arguments into fixed registers
  tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/aarch64: Add constraints for x0, x1, x2
  tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  tcg/aarch64: Parameterize the temp for tcg_out_goto_long
  tcg/aarch64: Use B not BL for tcg_out_goto_long
  tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/arm: Parameterize the temps for tcg_out_tlb_read
  tcg/arm: Add constraints for R0-R5
  tcg/arm: Reduce the number of temps for tcg_out_tlb_read
  tcg/arm: Force qemu_ld/st arguments into fixed registers
  tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/ppc: Parameterize the temps for tcg_out_tlb_read
  tcg/ppc: Split out tcg_out_call_int
  tcg/ppc: Add constraints for R7-R8
  tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool
  tcg/ppc: Force qemu_ld/st arguments into fixed registers
  tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg: Clean up generic bswap32
  tcg: Clean up generic bswap64
  tcg/optimize: Optimize bswap
  tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false
  tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only
  tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
  tcg/i386: Restrict user-only qemu_st_i32 values to q-regs
  tcg/i386: Add setup_guest_base_seg for FreeBSD
  tcg/i386: Require segment syscalls to succeed
  tcg/i386: Remove L constraint

 tcg/aarch64/tcg-target.h     |   3 +-
 tcg/arm/tcg-target.h         |   7 +-
 tcg/i386/tcg-target.h        |  19 +-
 tcg/mips/tcg-target.h        |   1 +
 tcg/ppc/tcg-target.h         |   3 +-
 tcg/s390/tcg-target.h        |   1 +
 tcg/sparc/tcg-target.h       |   1 +
 tcg/tcg.h                    |   5 +
 tcg/tci/tcg-target.h         |   2 +
 accel/tcg/translate-all.c    |  15 +-
 tcg/aarch64/tcg-target.inc.c | 369 ++++++++--------
 tcg/arm/tcg-target.inc.c     | 643 +++++++++++++--------------
 tcg/i386/tcg-target.inc.c    | 821 ++++++++++++++++++++---------------
 tcg/mips/tcg-target.inc.c    |  29 +-
 tcg/optimize.c               |  12 +
 tcg/ppc/tcg-target.inc.c     | 562 +++++++++++++-----------
 tcg/s390/tcg-target.inc.c    |  37 +-
 tcg/sparc/tcg-target.inc.c   |  13 +-
 tcg/tcg-ldst-ool.inc.c       |  95 ++++
 tcg/tcg-op.c                 | 215 ++++++---
 tcg/tcg-pool.inc.c           |   5 +-
 tcg/tcg.c                    |  36 +-
 tcg/tci/tcg-target.inc.c     |   3 +-
 23 files changed, 1628 insertions(+), 1269 deletions(-)
 create mode 100644 tcg/tcg-ldst-ool.inc.c

-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 12:52   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum Richard Henderson
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC
  To: qemu-devel; +Cc: Alistair.Francis

For x86_64, using %ebp instead of %r14 can result in smaller code
when manipulating TCG_TYPE_I32, as we can omit a REX prefix.
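
The size argument can be illustrated with a minimal sketch of the
encoding rule. This is not code from the patch: needs_rex() and
mov_load_disp8_len() are hypothetical helpers, and the rule is
simplified (byte-register special cases and SIB bytes are not modeled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hardware register numbers as used in x86-64 instruction encoding. */
enum { REG_EBP = 5, REG_R14 = 14 };

/*
 * Simplified rule (an assumption of this sketch): a REX prefix is needed
 * when the operation is 64-bit (REX.W) or when any register involved has
 * a number of 8 or above (REX.R / REX.B).  Byte-register special cases
 * such as %sil and %dil are ignored here.
 */
static bool needs_rex(bool is64, int reg, int base)
{
    assert(reg >= 0 && reg < 16 && base >= 0 && base < 16);
    return is64 || reg >= 8 || base >= 8;
}

/*
 * Length in bytes of "mov disp8(base), reg": optional REX, opcode 0x8b,
 * ModRM, and an 8-bit displacement.  Bases %rsp and %r12 would also need
 * a SIB byte, which this sketch does not model.
 */
static size_t mov_load_disp8_len(bool is64, int reg, int base)
{
    assert((base & 7) != 4);
    return (needs_rex(is64, reg, base) ? 1 : 0) + 3;
}
```

Under these assumptions, a 32-bit load relative to %ebp takes three
bytes, while the same load relative to %r14 takes four.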

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 9fdf37f23c..7488c3d869 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -84,6 +84,8 @@ typedef enum {
     TCG_REG_RBP = TCG_REG_EBP,
     TCG_REG_RSI = TCG_REG_ESI,
     TCG_REG_RDI = TCG_REG_EDI,
+
+    TCG_AREG0 = TCG_REG_EBP,
 } TCGReg;
 
 /* used for function call generation */
@@ -194,12 +196,6 @@ extern bool have_avx2;
 #define TCG_TARGET_extract_i64_valid(ofs, len) \
     (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
 
-#if TCG_TARGET_REG_BITS == 64
-# define TCG_AREG0 TCG_REG_R14
-#else
-# define TCG_AREG0 TCG_REG_EBP
-#endif
-
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 12:52   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc Richard Henderson
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC
  To: qemu-devel; +Cc: Alistair.Francis

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7488c3d869..2441658865 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -86,10 +86,10 @@ typedef enum {
     TCG_REG_RDI = TCG_REG_EDI,
 
     TCG_AREG0 = TCG_REG_EBP,
+    TCG_REG_CALL_STACK = TCG_REG_ESP
 } TCGReg;
 
 /* used for function call generation */
-#define TCG_REG_CALL_STACK TCG_REG_ESP 
 #define TCG_TARGET_STACK_ALIGN 16
 #if defined(_WIN64)
 #define TCG_TARGET_CALL_STACK_OFFSET 32
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0 Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 14:47   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC
  To: qemu-devel; +Cc: Alistair.Francis

This moves the assert for success from inside patch_reloc to its
callers: patch_reloc now returns false when a value does not fit
the relocation, instead of asserting or aborting.  This touches
all tcg backends.
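
The pattern can be sketched in isolation as follows. This is a
simplified stand-in, not the backend code: reloc_field26(),
reloc_field26_force() and sign_extend64() are hypothetical names, the
offset is passed in directly rather than computed from two code
pointers, and plain assert() stands in for tcg_debug_assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 'bits' bits of 'value' (1 <= bits <= 63). */
static int64_t sign_extend64(int64_t value, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t low = (uint64_t)value & ((sign << 1) - 1);
    return (int64_t)((low ^ sign) - sign);
}

/*
 * Patch a signed 26-bit field in the low bits of an instruction.
 * Return false and leave the instruction untouched if the offset
 * does not fit, so that the caller decides how to react.
 */
static bool reloc_field26(uint32_t *insn, int64_t offset)
{
    const uint32_t mask = (UINT32_C(1) << 26) - 1;

    if (offset != sign_extend64(offset, 26)) {
        return false;
    }
    *insn = (*insn & ~mask) | ((uint32_t)offset & mask);
    return true;
}

/* A caller that cannot recover asserts on the result, outside the patcher. */
static void reloc_field26_force(uint32_t *insn, int64_t offset)
{
    bool ok = reloc_field26(insn, offset);
    assert(ok);
    (void)ok;
}
```

Returning the result lets a caller such as the out-of-line thunk code
later in this series retry with a new target instead of aborting.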

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 44 ++++++++++++++-------------------
 tcg/arm/tcg-target.inc.c     | 26 +++++++++-----------
 tcg/i386/tcg-target.inc.c    | 17 +++++++------
 tcg/mips/tcg-target.inc.c    | 29 +++++++++-------------
 tcg/ppc/tcg-target.inc.c     | 47 ++++++++++++++++++++++--------------
 tcg/s390/tcg-target.inc.c    | 37 +++++++++++++++++++---------
 tcg/sparc/tcg-target.inc.c   | 13 ++++++----
 tcg/tcg-pool.inc.c           |  5 +++-
 tcg/tcg.c                    |  8 +++---
 tcg/tci/tcg-target.inc.c     |  3 ++-
 10 files changed, 125 insertions(+), 104 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 083592a4d7..30091f6a69 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -78,48 +78,40 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - code_ptr;
-    tcg_debug_assert(offset == sextract64(offset, 0, 26));
-    /* read instruction, mask away previous PC_REL26 parameter contents,
-       set the proper offset, then write back the instruction. */
-    *code_ptr = deposit32(*code_ptr, 0, 26, offset);
+    if (offset == sextract64(offset, 0, 26)) {
+        /* read instruction, mask away previous PC_REL26 parameter contents,
+           set the proper offset, then write back the instruction. */
+        *code_ptr = deposit32(*code_ptr, 0, 26, offset);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc26_atomic(tcg_insn_unit *code_ptr,
-                                     tcg_insn_unit *target)
+static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - code_ptr;
-    tcg_insn_unit insn;
-    tcg_debug_assert(offset == sextract64(offset, 0, 26));
-    /* read instruction, mask away previous PC_REL26 parameter contents,
-       set the proper offset, then write back the instruction. */
-    insn = atomic_read(code_ptr);
-    atomic_set(code_ptr, deposit32(insn, 0, 26, offset));
+    if (offset == sextract64(offset, 0, 19)) {
+        *code_ptr = deposit32(*code_ptr, 5, 19, offset);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
-{
-    ptrdiff_t offset = target - code_ptr;
-    tcg_debug_assert(offset == sextract64(offset, 0, 19));
-    *code_ptr = deposit32(*code_ptr, 5, 19, offset);
-}
-
-static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                                intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(addend == 0);
     switch (type) {
     case R_AARCH64_JUMP26:
     case R_AARCH64_CALL26:
-        reloc_pc26(code_ptr, (tcg_insn_unit *)value);
-        break;
+        return reloc_pc26(code_ptr, (tcg_insn_unit *)value);
     case R_AARCH64_CONDBR19:
-        reloc_pc19(code_ptr, (tcg_insn_unit *)value);
-        break;
+        return reloc_pc19(code_ptr, (tcg_insn_unit *)value);
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index e1fbf465cb..80d174ef44 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -187,27 +187,23 @@ static const uint8_t tcg_cond_to_arm_cond[] = {
     [TCG_COND_GTU] = COND_HI,
 };
 
-static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-    *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+    if (offset == sextract32(offset, 0, 24)) {
+        *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
-{
-    ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-    tcg_insn_unit insn = atomic_read(code_ptr);
-    tcg_debug_assert(offset == sextract32(offset, 0, 24));
-    atomic_set(code_ptr, deposit32(insn, 0, 24, offset));
-}
-
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(addend == 0);
 
     if (type == R_ARM_PC24) {
-        reloc_pc24(code_ptr, (tcg_insn_unit *)value);
+        return reloc_pc24(code_ptr, (tcg_insn_unit *)value);
     } else if (type == R_ARM_PC13) {
         intptr_t diff = value - (uintptr_t)(code_ptr + 2);
         tcg_insn_unit insn = *code_ptr;
@@ -218,10 +214,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
             if (!u) {
                 diff = -diff;
             }
-        } else {
+        } else if (diff >= 0x1000 && diff < 0x100000) {
             int rd = extract32(insn, 12, 4);
             int rt = rd == TCG_REG_PC ? TCG_REG_TMP : rd;
-            assert(diff >= 0x1000 && diff < 0x100000);
             /* add rt, pc, #high */
             *code_ptr++ = ((insn & 0xf0000000) | (1 << 25) | ARITH_ADD
                            | (TCG_REG_PC << 16) | (rt << 12)
@@ -230,10 +225,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
             insn = deposit32(insn, 12, 4, rt);
             diff &= 0xfff;
             u = 1;
+        } else {
+            return false;
         }
         insn = deposit32(insn, 23, 1, u);
         insn = deposit32(insn, 0, 12, diff);
         *code_ptr = insn;
+        return true;
     } else {
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 436195894b..4f66a0c5ae 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -167,29 +167,32 @@ static bool have_lzcnt;
 
 static tcg_insn_unit *tb_ret_addr;
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     value += addend;
-    switch(type) {
+
+    switch (type) {
     case R_386_PC32:
         value -= (uintptr_t)code_ptr;
         if (value != (int32_t)value) {
-            tcg_abort();
+            return false;
         }
         /* FALLTHRU */
     case R_386_32:
         tcg_patch32(code_ptr, value);
-        break;
+        return true;
+
     case R_386_PC8:
         value -= (uintptr_t)code_ptr;
         if (value != (int8_t)value) {
-            tcg_abort();
+            return false;
         }
         tcg_patch8(code_ptr, value);
-        break;
+        return true;
+
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index cff525373b..e59c66b607 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -144,36 +144,29 @@ static tcg_insn_unit *bswap32_addr;
 static tcg_insn_unit *bswap32u_addr;
 static tcg_insn_unit *bswap64_addr;
 
-static inline uint32_t reloc_pc16_val(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc16_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
     /* Let the compiler perform the right-shift as part of the arithmetic.  */
     ptrdiff_t disp = target - (pc + 1);
-    tcg_debug_assert(disp == (int16_t)disp);
-    return disp & 0xffff;
+    if (disp == (int16_t)disp) {
+        *pc = deposit32(*pc, 0, 16, disp);
+        return true;
+    } else {
+        return false;
+    }
 }
 
-static inline void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
+static void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-    *pc = deposit32(*pc, 0, 16, reloc_pc16_val(pc, target));
+    tcg_debug_assert(reloc_pc16_cond(pc, target));
 }
 
-static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    tcg_debug_assert((((uintptr_t)pc ^ (uintptr_t)target) & 0xf0000000) == 0);
-    return ((uintptr_t)target >> 2) & 0x3ffffff;
-}
-
-static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    *pc = deposit32(*pc, 0, 26, reloc_26_val(pc, target));
-}
-
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(type == R_MIPS_PC16);
     tcg_debug_assert(addend == 0);
-    reloc_pc16(code_ptr, (tcg_insn_unit *)value);
+    return reloc_pc16_cond(code_ptr, (tcg_insn_unit *)value);
 }
 
 #define TCG_CT_CONST_ZERO 0x100
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c2f729ee8f..656a9ff603 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -186,16 +186,14 @@ static inline bool in_range_b(tcg_target_long target)
     return target == sextract64(target, 0, 26);
 }
 
-static uint32_t reloc_pc24_val(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc24_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
     ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
-    tcg_debug_assert(in_range_b(disp));
-    return disp & 0x3fffffc;
-}
-
-static void reloc_pc24(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    *pc = (*pc & ~0x3fffffc) | reloc_pc24_val(pc, target);
+    if (in_range_b(disp)) {
+        *pc = (*pc & ~0x3fffffc) | (disp & 0x3fffffc);
+        return true;
+    }
+    return false;
 }
 
 static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
@@ -205,10 +203,22 @@ static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
     return disp & 0xfffc;
 }
 
+static bool reloc_pc14_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
+{
+    ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
+    if (disp == (int16_t) disp) {
+        *pc = (*pc & ~0xfffc) | (disp & 0xfffc);
+        return true;
+    }
+    return false;
+}
+
+#ifdef CONFIG_SOFTMMU
 static void reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-    *pc = (*pc & ~0xfffc) | reloc_pc14_val(pc, target);
+    tcg_debug_assert(reloc_pc14_cond(pc, target));
 }
+#endif
 
 static inline void tcg_out_b_noaddr(TCGContext *s, int insn)
 {
@@ -525,7 +535,7 @@ static const uint32_t tcg_to_isel[] = {
     [TCG_COND_GTU] = ISEL | BC_(7, CR_GT),
 };
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_insn_unit *target;
@@ -536,11 +546,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 
     switch (type) {
     case R_PPC_REL14:
-        reloc_pc14(code_ptr, target);
-        break;
+        return reloc_pc14_cond(code_ptr, target);
     case R_PPC_REL24:
-        reloc_pc24(code_ptr, target);
-        break;
+        return reloc_pc24_cond(code_ptr, target);
     case R_PPC_ADDR16:
         /* We are abusing this relocation type.  This points to a pair
            of insns, addis + load.  If the displacement is small, we
@@ -552,11 +560,14 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
         } else {
             int16_t lo = value;
             int hi = value - lo;
-            assert(hi + lo == value);
-            code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
-            code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
+            if (hi + lo == value) {
+                code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
+                code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
+            } else {
+                return false;
+            }
         }
-        break;
+        return true;
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 17c435ade5..a8d72dd630 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -366,7 +366,7 @@ static void * const qemu_st_helpers[16] = {
 static tcg_insn_unit *tb_ret_addr;
 uint64_t s390_facilities;
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     intptr_t pcrel2;
@@ -377,22 +377,35 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 
     switch (type) {
     case R_390_PC16DBL:
-        assert(pcrel2 == (int16_t)pcrel2);
-        tcg_patch16(code_ptr, pcrel2);
+        if (pcrel2 == (int16_t)pcrel2) {
+            tcg_patch16(code_ptr, pcrel2);
+            return true;
+        }
         break;
     case R_390_PC32DBL:
-        assert(pcrel2 == (int32_t)pcrel2);
-        tcg_patch32(code_ptr, pcrel2);
+        if (pcrel2 == (int32_t)pcrel2) {
+            tcg_patch32(code_ptr, pcrel2);
+            return true;
+        }
         break;
     case R_390_20:
-        assert(value == sextract64(value, 0, 20));
-        old = *(uint32_t *)code_ptr & 0xf00000ff;
-        old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
-        tcg_patch32(code_ptr, old);
+        if (value == sextract64(value, 0, 20)) {
+            old = *(uint32_t *)code_ptr & 0xf00000ff;
+            old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
+            tcg_patch32(code_ptr, old);
+            return true;
+        }
         break;
     default:
         g_assert_not_reached();
     }
+    return false;
+}
+
+static void patch_reloc_force(tcg_insn_unit *code_ptr, int type,
+                              intptr_t value, intptr_t addend)
+{
+    tcg_debug_assert(patch_reloc(code_ptr, type, value, addend));
 }
 
 /* parse target specific constraints */
@@ -1618,7 +1631,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
+                      (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
@@ -1639,7 +1653,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
+                      (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 04bdc3df5e..111f3312d3 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -291,32 +291,34 @@ static inline int check_fit_i32(int32_t val, unsigned int bits)
 # define check_fit_ptr  check_fit_i32
 #endif
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     uint32_t insn = *code_ptr;
     intptr_t pcrel;
+    bool ret;
 
     value += addend;
     pcrel = tcg_ptr_byte_diff((tcg_insn_unit *)value, code_ptr);
 
     switch (type) {
     case R_SPARC_WDISP16:
-        assert(check_fit_ptr(pcrel >> 2, 16));
+        ret = check_fit_ptr(pcrel >> 2, 16);
         insn &= ~INSN_OFF16(-1);
         insn |= INSN_OFF16(pcrel);
         break;
     case R_SPARC_WDISP19:
-        assert(check_fit_ptr(pcrel >> 2, 19));
+        ret = check_fit_ptr(pcrel >> 2, 19);
         insn &= ~INSN_OFF19(-1);
         insn |= INSN_OFF19(pcrel);
         break;
     case R_SPARC_13:
         /* Note that we're abusing this reloc type for our own needs.  */
+        ret = true;
         if (!check_fit_ptr(value, 13)) {
             int adj = (value > 0 ? 0xff8 : -0x1000);
             value -= adj;
-            assert(check_fit_ptr(value, 13));
+            ret = check_fit_ptr(value, 13);
             *code_ptr++ = (ARITH_ADD | INSN_RD(TCG_REG_T2)
                            | INSN_RS1(TCG_REG_TB) | INSN_IMM13(adj));
             insn ^= INSN_RS1(TCG_REG_TB) ^ INSN_RS1(TCG_REG_T2);
@@ -328,12 +330,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
         /* Note that we're abusing this reloc type for our own needs.  */
         code_ptr[0] = deposit32(code_ptr[0], 0, 22, value >> 10);
         code_ptr[1] = deposit32(code_ptr[1], 0, 10, value);
-        return;
+        return value == (intptr_t)(uint32_t)value;
     default:
         g_assert_not_reached();
     }
 
     *code_ptr = insn;
+    return ret;
 }
 
 /* parse target specific constraints */
diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c
index 7af5513ff3..ab8f6df8b0 100644
--- a/tcg/tcg-pool.inc.c
+++ b/tcg/tcg-pool.inc.c
@@ -140,6 +140,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
 
     for (; p != NULL; p = p->next) {
         size_t size = sizeof(tcg_target_ulong) * p->nlong;
+        bool ok;
+
         if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) {
             if (unlikely(a > s->code_gen_highwater)) {
                 return false;
@@ -148,7 +150,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
             a += size;
             l = p;
         }
-        patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
+        ok = patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
+        tcg_debug_assert(ok);
     }
 
     s->code_ptr = a;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e85133ef05..54f1272187 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -66,7 +66,7 @@
 static void tcg_target_init(TCGContext *s);
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode);
 static void tcg_target_qemu_prologue(TCGContext *s);
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend);
 
 /* The CIE and FDE header definitions will be common to all hosts.  */
@@ -268,7 +268,8 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
         /* FIXME: This may break relocations on RISC targets that
            modify instruction fields in place.  The caller may not have 
            written the initial value.  */
-        patch_reloc(code_ptr, type, l->u.value, addend);
+        bool ok = patch_reloc(code_ptr, type, l->u.value, addend);
+        tcg_debug_assert(ok);
     } else {
         /* add a new relocation entry */
         r = tcg_malloc(sizeof(TCGRelocation));
@@ -288,7 +289,8 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
     tcg_debug_assert(!l->has_value);
 
     for (r = l->u.first_reloc; r != NULL; r = r->next) {
-        patch_reloc(r->ptr, r->type, value, r->addend);
+        bool ok = patch_reloc(r->ptr, r->type, value, r->addend);
+        tcg_debug_assert(ok);
     }
 
     l->has_value = 1;
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 62ed097254..0015a98485 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -369,7 +369,7 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     /* tcg_out_reloc always uses the same type, addend. */
@@ -381,6 +381,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
     } else {
         tcg_patch64(code_ptr, value);
     }
+    return true;
 }
 
 /* Parse target specific constraints. */
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (2 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-26  0:31   ` Emilio G. Cota
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9 Richard Henderson
                   ` (34 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC
  To: qemu-devel; +Cc: Alistair.Francis

This variant of tcg-ldst.inc.c allows the entire thunk to be
moved out-of-line, with caching across TBs within a region.
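
Cached thunks are looked up by a 32-bit key, which the patch below
documents as "oi : is_64 : is_ld" and unpacks with extract32. A minimal
sketch of that layout follows. The packing helper is an assumption (the
excerpt shows only the unpacking side), and all function names here are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack (oi, is_64, is_ld) into a 32-bit key: is_ld in bit 0, is_64 in
 * bit 1, oi in bits 2..31.  This is the inverse of the extraction done
 * in tcg_out_ldst_ool_finalize; oi must fit in 30 bits.
 */
static uint32_t ldst_ool_key(uint32_t oi, bool is_64, bool is_ld)
{
    assert(oi < (UINT32_C(1) << 30));
    return (oi << 2) | ((uint32_t)is_64 << 1) | (uint32_t)is_ld;
}

/* Unpack the individual fields again. */
static bool key_is_ld(uint32_t key)
{
    return key & 1;
}

static bool key_is_64(uint32_t key)
{
    return (key >> 1) & 1;
}

static uint32_t key_oi(uint32_t key)
{
    return key >> 2;
}
```

Because the key depends only on the memory operation and not on the
translation block, two TBs that perform the same kind of access can
share one thunk as long as it stays within relocation range.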

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.h                 |  5 +++
 accel/tcg/translate-all.c | 15 +++++--
 tcg/tcg-ldst-ool.inc.c    | 95 +++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                 | 28 ++++++++++++
 4 files changed, 140 insertions(+), 3 deletions(-)
 create mode 100644 tcg/tcg-ldst-ool.inc.c

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f4efbaa680..73737dc671 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -706,6 +706,11 @@ struct TCGContext {
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     QSIMPLEQ_HEAD(ldst_labels, TCGLabelQemuLdst) ldst_labels;
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    QSIMPLEQ_HEAD(ldst_labels, TCGLabelQemuLdstOol) ldst_ool_labels;
+    GHashTable *ldst_ool_thunks;
+    size_t ldst_ool_generation;
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     struct TCGLabelPoolData *pool_labels;
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 639f0b2728..dd9332b24c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1678,6 +1678,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size;
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    size_t ldst_ool_generation = tcg_ctx->ldst_ool_generation;
+#endif
 #ifdef CONFIG_PROFILER
     TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti;
@@ -1831,10 +1834,16 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     existing_tb = tb_link_page(tb, phys_pc, phys_page2);
     /* if the TB already exists, discard what we just translated */
     if (unlikely(existing_tb != tb)) {
-        uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
+        bool discard = true;
 
-        orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
-        atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+        discard = ldst_ool_generation == tcg_ctx->ldst_ool_generation;
+#endif
+        if (discard) {
+            uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
+            orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
+            atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
+        }
         return existing_tb;
     }
     tcg_tb_insert(tb);
diff --git a/tcg/tcg-ldst-ool.inc.c b/tcg/tcg-ldst-ool.inc.c
new file mode 100644
index 0000000000..70b8789797
--- /dev/null
+++ b/tcg/tcg-ldst-ool.inc.c
@@ -0,0 +1,95 @@
+/*
+ * TCG Backend Data: load-store optimization only.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+typedef struct TCGLabelQemuLdstOol {
+    QSIMPLEQ_ENTRY(TCGLabelQemuLdstOol) next;
+    tcg_insn_unit *label;   /* label pointer to be updated */
+    int reloc;              /* relocation type from label_ptr */
+    intptr_t addend;        /* relocation addend from label_ptr */
+    uint32_t key;           /* oi : is_64 : is_ld */
+} TCGLabelQemuLdstOol;
+
+
+/*
+ * Generate TB finalization at the end of block
+ */
+
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is64, TCGMemOpIdx oi);
+
+static bool tcg_out_ldst_ool_finalize(TCGContext *s)
+{
+    TCGLabelQemuLdstOol *lb;
+
+    /* qemu_ld/st slow paths */
+    QSIMPLEQ_FOREACH(lb, &s->ldst_ool_labels, next) {
+        gpointer dest, key = (gpointer)(uintptr_t)lb->key;
+        TCGMemOpIdx oi;
+        bool is_ld, is_64, ok;
+
+        /* If we have generated the thunk, and it's still in range, all ok.  */
+        dest = g_hash_table_lookup(s->ldst_ool_thunks, key);
+        if (dest &&
+            patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend)) {
+            continue;
+        }
+
+        /* Generate a new thunk.  */
+        is_ld = extract32(lb->key, 0, 1);
+        is_64 = extract32(lb->key, 1, 1);
+        oi = extract32(lb->key, 2, 30);
+        dest = tcg_out_qemu_ldst_ool(s, is_ld, is_64, oi);
+
+        /* Test for (pending) buffer overflow.  The assumption is that any
+           one thunk beginning below the high water mark cannot overrun
+           the buffer completely.  Thus we can test for overflow after
+           generating code without having to check during generation.  */
+        if (unlikely((void *)s->code_ptr > s->code_gen_highwater)) {
+            return false;
+        }
+
+        /* Remember the thunk for next time.  */
+        g_hash_table_replace(s->ldst_ool_thunks, key, dest);
+        s->ldst_ool_generation++;
+
+        /* The new thunk must be in range.  */
+        ok = patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend);
+        tcg_debug_assert(ok);
+    }
+    return true;
+}
+
+/*
+ * Allocate a new TCGLabelQemuLdstOol entry.
+ */
+
+static void add_ldst_ool_label(TCGContext *s, bool is_ld, bool is_64,
+                               TCGMemOpIdx oi, int reloc, intptr_t addend)
+{
+    TCGLabelQemuLdstOol *lb = tcg_malloc(sizeof(*lb));
+
+    QSIMPLEQ_INSERT_TAIL(&s->ldst_ool_labels, lb, next);
+    lb->label = s->code_ptr;
+    lb->reloc = reloc;
+    lb->addend = addend;
+    lb->key = is_ld | (is_64 << 1) | (oi << 2);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 54f1272187..17c193791f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -521,6 +521,13 @@ static void tcg_region_assign(TCGContext *s, size_t curr_region)
     s->code_gen_ptr = start;
     s->code_gen_buffer_size = end - start;
     s->code_gen_highwater = end - TCG_HIGHWATER;
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* No thunks yet generated this region.  Even if they were in range,
+       this is also the most convenient place to clear the table after a
+       full tb_flush.  */
+    g_hash_table_remove_all(s->ldst_ool_thunks);
+#endif
 }
 
 static bool tcg_region_alloc__locked(TCGContext *s)
@@ -756,6 +763,14 @@ void tcg_register_thread(void)
     err = tcg_region_initial_alloc__locked(tcg_ctx);
     g_assert(!err);
     qemu_mutex_unlock(&region.lock);
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* If n == 0, keep the hash table we allocated in tcg_context_init.  */
+    if (n) {
+        /* Both key and value are raw pointers.  */
+        s->ldst_ool_thunks = g_hash_table_new(NULL, NULL);
+    }
+#endif
 }
 #endif /* !CONFIG_USER_ONLY */
 
@@ -964,6 +979,11 @@ void tcg_context_init(TCGContext *s)
     tcg_debug_assert(!tcg_regset_test_reg(s->reserved_regs, TCG_AREG0));
     ts = tcg_global_reg_new_internal(s, TCG_TYPE_PTR, TCG_AREG0, "env");
     cpu_env = temp_tcgv_ptr(ts);
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* Both key and value are raw pointers.  */
+    s->ldst_ool_thunks = g_hash_table_new(NULL, NULL);
+#endif
 }
 
 /*
@@ -3540,6 +3560,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     QSIMPLEQ_INIT(&s->ldst_labels);
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    QSIMPLEQ_INIT(&s->ldst_ool_labels);
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     s->pool_labels = NULL;
 #endif
@@ -3620,6 +3643,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         return -1;
     }
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    if (!tcg_out_ldst_ool_finalize(s)) {
+        return -1;
+    }
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     if (!tcg_out_pool_finalize(s)) {
         return -1;
-- 
2.17.2
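
For readers following the key layout in add_ldst_ool_label and its
decoding in tcg_out_ldst_ool_finalize, here is a minimal standalone
sketch of the same packing.  The sketch_* names are hypothetical and
not part of the patch; sketch_extract32 only imitates the behaviour of
QEMU's extract32 for this purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for extract32(): 'len' bits starting at bit 'pos'. */
static uint32_t sketch_extract32(uint32_t value, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (value >> pos) & (~0u >> (32 - len));
}

/*
 * Pack in the layout used by the patch:
 * bit 0 = is_ld, bit 1 = is_64, bits 2..31 = oi.
 */
static uint32_t sketch_pack_key(bool is_ld, bool is_64, uint32_t oi)
{
    /* oi must fit in the 30 bits that remain above the two flags. */
    assert(oi < (1u << 30));
    return (uint32_t)is_ld | ((uint32_t)is_64 << 1) | (oi << 2);
}
```

Because the whole key fits in 32 bits, it can be cast to a pointer and
used directly as the hash table key, which is why the patch creates the
table with g_hash_table_new(NULL, NULL).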


* [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (3 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 15:00   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

These are function call arguments for x86_64 we will need soon.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 8 ++++++++
 1 file changed, 8 insertions(+)
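
As background on where r8 and r9 sit in the host calling conventions,
the following sketch lists the integer argument registers of the
System V and Win64 x86_64 ABIs and looks up a register's slot.  The
tables and the sketch_arg_slot helper are illustrative only and do not
correspond to any QEMU data structure.

```c
#include <stddef.h>
#include <string.h>

/* Integer argument registers, in order, for the two x86_64 ABIs. */
static const char *const sysv_iargs[]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const win64_iargs[] = { "rcx", "rdx", "r8", "r9" };

/* Return the zero-based argument slot of 'reg', or -1 if it is not one. */
static int sketch_arg_slot(const char *const *regs, size_t n, const char *reg)
{
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(regs[i], reg) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Under Win64, r8 and r9 are the third and fourth arguments; under
System V they are the fifth and sixth.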

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 4f66a0c5ae..8aef66e430 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -233,6 +233,14 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI);
         break;
+    case 'E': /* "Eight", r8 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R8);
+        break;
+    case 'N': /* "Nine", r9 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R9);
+        break;
     case 'q':
         /* A register that can be used as a byte operand.  */
         ct->ct |= TCG_CT_REG;
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (4 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 16:34   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
                   ` (32 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

We will shortly be asking the hot path not to assume TCG_REG_L1
for the host base address.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 56 ++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 8aef66e430..3234a8d8bf 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1614,9 +1614,9 @@ static void * const qemu_st_helpers[16] = {
 
    First argument register is clobbered.  */
 
-static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                                    int mem_index, TCGMemOp opc,
-                                    tcg_insn_unit **label_ptr, int which)
+static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
+                               int mem_index, TCGMemOp opc,
+                               tcg_insn_unit **label_ptr, int which)
 {
     const TCGReg r0 = TCG_REG_L0;
     const TCGReg r1 = TCG_REG_L1;
@@ -1696,6 +1696,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     /* add addend(r0), r1 */
     tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
                          offsetof(CPUTLBEntry, addend) - which);
+
+    return r1;
 }
 
 /*
@@ -2001,10 +2003,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#endif
 
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
@@ -2014,17 +2012,21 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
+    {
+        int mem_index = get_mmuidx(oi);
+        tcg_insn_unit *label_ptr[2];
+        TCGReg base;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                     label_ptr, offsetof(CPUTLBEntry, addr_read));
+        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+                                label_ptr, offsetof(CPUTLBEntry, addr_read));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
+        /* TLB Hit.  */
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
 
-    /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+        /* Record the current context of a load into ldst label */
+        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+                            s->code_ptr, label_ptr);
+    }
 #else
     {
         int32_t offset = guest_base;
@@ -2141,10 +2143,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#endif
 
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
@@ -2154,17 +2152,21 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
+    {
+        int mem_index = get_mmuidx(oi);
+        tcg_insn_unit *label_ptr[2];
+        TCGReg base;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                     label_ptr, offsetof(CPUTLBEntry, addr_write));
+        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+                                label_ptr, offsetof(CPUTLBEntry, addr_write));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+        /* TLB Hit.  */
+        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
 
-    /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+        /* Record the current context of a store into ldst label */
+        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
+                            s->code_ptr, label_ptr);
+    }
 #else
     {
         int32_t offset = guest_base;
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (5 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-29 17:13   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

We will shortly be forcing qemu_ld/st arguments into registers
that match the function call abi of the host, which means that
the temps must be elsewhere.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)
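
A rough way to see the effect on a 64-bit host: the sketch below
compares the temporaries against the second and third integer argument
registers, which the following patch in this series uses for the guest
address and the store value.  The enum, struct, and helper are
hypothetical; only the register assignments are taken from the diff and
from the standard System V and Win64 argument orders.

```c
#include <stdbool.h>

typedef enum { RAX, RCX, RDX, RSI, RDI, R8, R9, R10 } SketchReg;

typedef struct {
    SketchReg l0, l1;       /* TCG_REG_L0, TCG_REG_L1 */
    SketchReg addr, stval;  /* call_iarg_regs[1], call_iarg_regs[2] */
} SketchLayout;

/* Before: L0/L1 were the first two argument registers (System V shown). */
static const SketchLayout sketch_sysv_old  = { RDI, RSI, RSI, RDX };
/* After: RAX plus RDI (System V) or R10 (Win64). */
static const SketchLayout sketch_sysv_new  = { RAX, RDI, RSI, RDX };
static const SketchLayout sketch_win64_new = { RAX, R10, RDX, R8 };

/* True if either temporary coincides with an operand register. */
static bool sketch_temps_clash(const SketchLayout *l)
{
    return l->l0 == l->addr || l->l0 == l->stval
        || l->l1 == l->addr || l->l1 == l->stval;
}
```

Note that RDI is still the first System V argument register; the sketch
only checks the two operand slots, on the assumption that the first
slot is filled in later rather than by the register allocator.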

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3234a8d8bf..07df4b2b12 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -121,12 +121,16 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
 
-/* Registers used with L constraint, which are the first argument
-   registers on x86_64, and two random call clobbered registers on
-   i386. */
+/* Registers used with L constraint, which are two random
+ * call clobbered registers.  These should be free.
+ */
 #if TCG_TARGET_REG_BITS == 64
-# define TCG_REG_L0 tcg_target_call_iarg_regs[0]
-# define TCG_REG_L1 tcg_target_call_iarg_regs[1]
+# define TCG_REG_L0   TCG_REG_RAX
+# ifdef _WIN64
+#  define TCG_REG_L1  TCG_REG_R10
+# else
+#  define TCG_REG_L1  TCG_REG_RDI
+# endif
 #else
 # define TCG_REG_L0 TCG_REG_EAX
 # define TCG_REG_L1 TCG_REG_EDX
@@ -1628,6 +1632,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     unsigned a_mask = (1 << a_bits) - 1;
     unsigned s_mask = (1 << s_bits) - 1;
     target_ulong tlb_mask;
+    TCGReg base;
 
     if (TCG_TARGET_REG_BITS == 64) {
         if (TARGET_LONG_BITS == 64) {
@@ -1674,7 +1679,12 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
        before the fastpath ADDQ below.  For 64-bit guest and x32 host, MOVQ
        copies the entire guest address for the slow path, while truncation
        for the 32-bit host happens with the fastpath ADDL below.  */
-    tcg_out_mov(s, ttype, r1, addrlo);
+    if (TCG_TARGET_REG_BITS == 64) {
+        base = tcg_target_call_iarg_regs[1];
+    } else {
+        base = r1;
+    }
+    tcg_out_mov(s, ttype, base, addrlo);
 
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -1693,11 +1703,11 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     /* TLB Hit.  */
 
-    /* add addend(r0), r1 */
-    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
+    /* add addend(r0), base */
+    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, base, r0,
                          offsetof(CPUTLBEntry, addend) - which);
 
-    return r1;
+    return base;
 }
 
 /*
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (6 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-30 16:16   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (30 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This is an incremental step toward moving the qemu_ld/st
code sequence out of line.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 203 +++++++++++++++++++++++++++++++-------
 1 file changed, 169 insertions(+), 34 deletions(-)
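
The new tcg_target_op_def cases size the constraint arrays with loops
of the form "i * TCG_TARGET_REG_BITS < N".  A small standalone sketch
of that counting, with hypothetical sketch_* names and the bit widths
passed in as plain parameters, may make the resulting operand counts
easier to check:

```c
/* Number of host registers needed to hold a value of 'value_bits' bits. */
static int sketch_nregs(int host_bits, int value_bits)
{
    int i;

    /* Same loop shape as the patch: one iteration per host register. */
    for (i = 0; i * host_bits < value_bits; ++i) {
        continue;
    }
    return i;
}

/* Register operands of a qemu_ld/st op: data registers, then address. */
static int sketch_ldst_operands(int host_bits, int guest_addr_bits,
                                int data_bits)
{
    return sketch_nregs(host_bits, data_bits)
         + sketch_nregs(host_bits, guest_addr_bits);
}
```

For example, a 64-bit access with a 64-bit guest address on a 32-bit
host needs four register operands, which matches the four-entry
constraint lists that this patch removes.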

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 07df4b2b12..50e5dc31b3 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -171,6 +171,80 @@ static bool have_lzcnt;
 
 static tcg_insn_unit *tb_ret_addr;
 
+typedef enum {
+    ARG_ADDR,
+    ARG_STVAL,
+    ARG_LDVAL,
+} QemuMemArgType;
+
+#ifdef CONFIG_SOFTMMU
+/*
+ * Constraint to choose a particular register.  This is used for softmmu
+ * loads and stores.  Registers with no assignment get an empty string.
+ */
+static const char * const one_reg_constraint[TCG_TARGET_NB_REGS] = {
+    [TCG_REG_EAX] = "a",
+    [TCG_REG_EBX] = "b",
+    [TCG_REG_ECX] = "c",
+    [TCG_REG_EDX] = "d",
+    [TCG_REG_ESI] = "S",
+    [TCG_REG_EDI] = "D",
+#if TCG_TARGET_REG_BITS == 64
+    [TCG_REG_R8]  = "E",
+    [TCG_REG_R9]  = "N",
+#endif
+};
+
+/*
+ * Calling convention for the softmmu load and store thunks.
+ *
+ * For 64-bit, we mostly use the host calling convention, therefore the
+ * real first argument is reserved for the ENV parameter that is passed
+ * on to the slow path helpers.
+ *
+ * For 32-bit, the host calling convention is stack based; we invent a
+ * private convention that uses 4 of the 6 available host registers.
+ * We reserve EAX and EDX as temporaries for use by the thunk, we require
+ * INDEX_op_qemu_st_i32 to have a 'q' register from which to store, and
+ * further complicate this last by wanting a call-clobbered register for
+ * that store.  The 'q' requirement allows MO_8 stores at all; the
+ * call-clobbered part allows bswap to operate in-place, clobbering the input.
+ */
+static TCGReg softmmu_arg(QemuMemArgType type, bool is_64, int hi)
+{
+    switch (type) {
+    case ARG_ADDR:
+        tcg_debug_assert(!hi || TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+        if (TCG_TARGET_REG_BITS == 64) {
+            return tcg_target_call_iarg_regs[1];
+        } else {
+            return hi ? TCG_REG_EDI : TCG_REG_ESI;
+        }
+    case ARG_STVAL:
+        tcg_debug_assert(!hi || (TCG_TARGET_REG_BITS == 32 && is_64));
+        if (TCG_TARGET_REG_BITS == 64) {
+            return tcg_target_call_iarg_regs[2];
+        } else {
+            return hi ? TCG_REG_EBX : TCG_REG_ECX;
+        }
+    case ARG_LDVAL:
+        tcg_debug_assert(!hi || (TCG_TARGET_REG_BITS == 32 && is_64));
+        return tcg_target_call_oarg_regs[hi];
+    }
+    g_assert_not_reached();
+}
+
+static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
+{
+    return one_reg_constraint[softmmu_arg(type, is_64, hi)];
+}
+#else
+static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
+{
+    return "L";
+}
+#endif /* CONFIG_SOFTMMU */
+
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1680,11 +1754,15 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
        copies the entire guest address for the slow path, while truncation
        for the 32-bit host happens with the fastpath ADDL below.  */
     if (TCG_TARGET_REG_BITS == 64) {
-        base = tcg_target_call_iarg_regs[1];
+        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
+        if (TARGET_LONG_BITS == 32) {
+            tcg_out_ext32u(s, addrlo, addrlo);
+        }
+        base = addrlo;
     } else {
         base = r1;
+        tcg_out_mov(s, ttype, base, addrlo);
     }
-    tcg_out_mov(s, ttype, base, addrlo);
 
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -2009,16 +2087,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
    common. */
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg datalo, datahi, addrlo;
-    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo, addrlo;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
     TCGMemOp opc;
+    int i = -1;
 
-    datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
-    oi = *args++;
+    datalo = args[++i];
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        datahi = args[++i];
+    }
+    addrlo = args[++i];
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = args[++i];
+    }
+    oi = args[++i];
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
@@ -2027,6 +2111,15 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
         tcg_insn_unit *label_ptr[2];
         TCGReg base;
 
+        tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
+        if (TCG_TARGET_REG_BITS == 32 && is64) {
+            tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
+        }
+        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
+        }
+
         base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                                 label_ptr, offsetof(CPUTLBEntry, addr_read));
 
@@ -2149,16 +2242,22 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg datalo, datahi, addrlo;
-    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo, addrlo;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
     TCGMemOp opc;
+    int i = -1;
 
-    datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
-    oi = *args++;
+    datalo = args[++i];
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        datahi = args[++i];
+    }
+    addrlo = args[++i];
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = args[++i];
+    }
+    oi = args[++i];
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
@@ -2167,6 +2266,15 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
         tcg_insn_unit *label_ptr[2];
         TCGReg base;
 
+        tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
+        if (TCG_TARGET_REG_BITS == 32 && is64) {
+            tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
+        }
+        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
+        }
+
         base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                                 label_ptr, offsetof(CPUTLBEntry, addr_write));
 
@@ -2836,15 +2944,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
     static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
     static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
-    static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
-    static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
-    static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
-    static const TCGTargetOpDef r_r_L = { .args_ct_str = { "r", "r", "L" } };
-    static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
-    static const TCGTargetOpDef r_r_L_L
-        = { .args_ct_str = { "r", "r", "L", "L" } };
-    static const TCGTargetOpDef L_L_L_L
-        = { .args_ct_str = { "L", "L", "L", "L" } };
     static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } };
     static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } };
     static const TCGTargetOpDef x_x_x_x
@@ -3026,17 +3125,53 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         }
 
     case INDEX_op_qemu_ld_i32:
-        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_L : &r_L_L;
-    case INDEX_op_qemu_st_i32:
-        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L : &L_L_L;
+        {
+            static TCGTargetOpDef ld32;
+            int i;
+
+            ld32.args_ct_str[0] = constrain_memop_arg(ARG_LDVAL, 0, 0);
+            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
+                ld32.args_ct_str[i + 1] = constrain_memop_arg(ARG_ADDR, 0, i);
+            }
+            return &ld32;
+        }
     case INDEX_op_qemu_ld_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &r_L
-                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_r_L
-                : &r_r_L_L);
+        {
+            static TCGTargetOpDef ld64;
+            int i, j = 0;
+
+            for (i = 0; i * TCG_TARGET_REG_BITS < 64; ++i) {
+                ld64.args_ct_str[j++] = constrain_memop_arg(ARG_LDVAL, 1, i);
+            }
+            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
+                ld64.args_ct_str[j++] = constrain_memop_arg(ARG_ADDR, 0, i);
+            }
+            return &ld64;
+        }
+    case INDEX_op_qemu_st_i32:
+        {
+            static TCGTargetOpDef st32;
+            int i;
+
+            st32.args_ct_str[0] = constrain_memop_arg(ARG_STVAL, 0, 0);
+            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
+                st32.args_ct_str[i + 1] = constrain_memop_arg(ARG_ADDR, 0, i);
+            }
+            return &st32;
+        }
     case INDEX_op_qemu_st_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &L_L
-                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L_L
-                : &L_L_L_L);
+        {
+            static TCGTargetOpDef st64;
+            int i, j = 0;
+
+            for (i = 0; i * TCG_TARGET_REG_BITS < 64; ++i) {
+                st64.args_ct_str[j++] = constrain_memop_arg(ARG_STVAL, 1, i);
+            }
+            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
+                st64.args_ct_str[j++] = constrain_memop_arg(ARG_ADDR, 0, i);
+            }
+            return &st64;
+        }
 
     case INDEX_op_brcond2_i32:
         {
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (7 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-30 17:22   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
                   ` (29 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Move the entire memory operation out of line.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |   2 +-
 tcg/i386/tcg-target.inc.c | 391 ++++++++++++++++----------------------
 2 files changed, 162 insertions(+), 231 deletions(-)
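
One visible consequence in the diff is that tcg_out_tlb_load switches
from OPC_JCC_long with a 4-byte displacement to OPC_JCC_short with a
1-byte displacement.  The sketch below shows the range check that a
1-byte relative branch implies on x86, where the displacement is
counted from the byte following the displacement field.  The helper
name and the use of plain integer offsets are assumptions for
illustration; how the patch itself resolves label_ptr is outside this
excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 'field' is the offset of the 1-byte displacement, 'target' the offset
 * of the branch destination.  On success, store the displacement and
 * return true; return false if the target is out of rel8 range.
 */
static bool sketch_rel8(intptr_t field, intptr_t target, int8_t *disp)
{
    /* The branch is relative to the end of the instruction. */
    intptr_t d = target - (field + 1);

    if (d < INT8_MIN || d > INT8_MAX) {
        return false;
    }
    *disp = (int8_t)d;
    return true;
}
```

The short form only reaches -128..+127 bytes, so it is usable here on
the assumption that the miss path lands close to the compare.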

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 2441658865..1b2d4e1b0d 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -220,7 +220,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 50e5dc31b3..5c68cbd43d 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1643,7 +1643,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
 }
 
 #if defined(CONFIG_SOFTMMU)
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1656,6 +1656,14 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+
+    [MO_SB]   = helper_ret_ldsb_mmu,
+    [MO_LESW] = helper_le_ldsw_mmu,
+    [MO_BESW] = helper_be_ldsw_mmu,
+#if TCG_TARGET_REG_BITS == 64
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
+#endif
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
@@ -1765,18 +1773,18 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     /* jne slow_path */
-    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+    tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
     label_ptr[0] = s->code_ptr;
-    s->code_ptr += 4;
+    s->code_ptr += 1;
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r0), addrhi */
         tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4);
 
         /* jne slow_path */
-        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+        tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
         label_ptr[1] = s->code_ptr;
-        s->code_ptr += 4;
+        s->code_ptr += 1;
     }
 
     /* TLB Hit.  */
@@ -1788,181 +1796,6 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     return base;
 }
 
-/*
- * Record the context of a call to the out of line helper code for the slow path
- * for a load or store, so that we can later generate the correct helper code
- */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo, TCGReg datahi,
-                                TCGReg addrlo, TCGReg addrhi,
-                                tcg_insn_unit *raddr,
-                                tcg_insn_unit **label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->datalo_reg = datalo;
-    label->datahi_reg = datahi;
-    label->addrlo_reg = addrlo;
-    label->addrhi_reg = addrhi;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr[0];
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        label->label_ptr[1] = label_ptr[1];
-    }
-}
-
-/*
- * Generate code for the slow path for a load at the end of block
- */
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOpIdx oi = l->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGReg data_reg;
-    tcg_insn_unit **label_ptr = &l->label_ptr[0];
-
-    /* resolve label address */
-    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
-        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
-                     (uintptr_t)l->raddr);
-    }
-
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-
-    data_reg = l->datalo_reg;
-    switch (opc & MO_SSIZE) {
-    case MO_SB:
-        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case MO_SW:
-        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-#if TCG_TARGET_REG_BITS == 64
-    case MO_SL:
-        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
-        break;
-#endif
-    case MO_UB:
-    case MO_UW:
-        /* Note that the helpers have zero-extended to tcg_target_long.  */
-    case MO_UL:
-        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-        break;
-    case MO_Q:
-        if (TCG_TARGET_REG_BITS == 64) {
-            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-
-    /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_jmp(s, l->raddr);
-}
-
-/*
- * Generate code for the slow path for a store at the end of block
- */
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOpIdx oi = l->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp s_bits = opc & MO_SIZE;
-    tcg_insn_unit **label_ptr = &l->label_ptr[0];
-    TCGReg retaddr;
-
-    /* resolve label address */
-    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (s_bits == MO_64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        retaddr = TCG_REG_EAX;
-        tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-        tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                    tcg_target_call_iarg_regs[2], l->datalo_reg);
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
-
-        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
-            retaddr = tcg_target_call_iarg_regs[4];
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-        } else {
-            retaddr = TCG_REG_RAX;
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-            tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP,
-                       TCG_TARGET_CALL_STACK_OFFSET);
-        }
-    }
-
-    /* "Tail call" to the helper, with the return address back inline.  */
-    tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-}
 #elif defined(__x86_64__) && defined(__linux__)
 # include <asm/prctl.h>
 # include <sys/prctl.h>
@@ -2091,7 +1924,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datahi __attribute__((unused)) = -1;
     TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
-    TCGMemOp opc;
     int i = -1;
 
     datalo = args[++i];
@@ -2103,35 +1935,25 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
         addrhi = args[++i];
     }
     oi = args[++i];
-    opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    {
-        int mem_index = get_mmuidx(oi);
-        tcg_insn_unit *label_ptr[2];
-        TCGReg base;
-
-        tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
-        if (TCG_TARGET_REG_BITS == 32 && is64) {
-            tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
-        }
-        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
-        }
-
-        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                                label_ptr, offsetof(CPUTLBEntry, addr_read));
-
-        /* TLB Hit.  */
-        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
-
-        /* Record the current context of a load into ldst label */
-        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                            s->code_ptr, label_ptr);
+    /* Assert that we've set up the constraints properly.  */
+    tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
     }
+    tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
+    }
+
+    /* Call to thunk.  */
+    tcg_out8(s, OPC_CALL_Jz);
+    add_ldst_ool_label(s, true, is64, oi, R_386_PC32, -4);
+    s->code_ptr += 4;
 #else
     {
+        TCGMemOp opc = get_memop(oi);
         int32_t offset = guest_base;
         TCGReg base = addrlo;
         int index = -1;
@@ -2246,7 +2068,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datahi __attribute__((unused)) = -1;
     TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
-    TCGMemOp opc;
     int i = -1;
 
     datalo = args[++i];
@@ -2258,35 +2079,25 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
         addrhi = args[++i];
     }
     oi = args[++i];
-    opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    {
-        int mem_index = get_mmuidx(oi);
-        tcg_insn_unit *label_ptr[2];
-        TCGReg base;
-
-        tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
-        if (TCG_TARGET_REG_BITS == 32 && is64) {
-            tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
-        }
-        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
-        }
-
-        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                                label_ptr, offsetof(CPUTLBEntry, addr_write));
-
-        /* TLB Hit.  */
-        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
-
-        /* Record the current context of a store into ldst label */
-        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                            s->code_ptr, label_ptr);
+    /* Assert that we've set up the constraints properly.  */
+    tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
     }
+    tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
+    }
+
+    /* Call to thunk.  */
+    tcg_out8(s, OPC_CALL_Jz);
+    add_ldst_ool_label(s, false, is64, oi, R_386_PC32, -4);
+    s->code_ptr += 4;
 #else
     {
+        TCGMemOp opc = get_memop(oi);
         int32_t offset = guest_base;
         TCGReg base = addrlo;
         int seg = 0;
@@ -2321,6 +2132,126 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 }
 
+#if defined(CONFIG_SOFTMMU)
+/*
+ * Generate code for an out-of-line thunk performing a load or store.
+ */
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    TCGMemOp opc = get_memop(oi);
+    int mem_index = get_mmuidx(oi);
+    tcg_insn_unit *label_ptr[2], *thunk;
+    TCGReg datalo, addrlo, base;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
+    int i;
+
+    /* Since we're amortizing the cost, align the thunk.  */
+    thunk = QEMU_ALIGN_PTR_UP(s->code_ptr, 16);
+    if (thunk != s->code_ptr) {
+        memset(s->code_ptr, 0x90, thunk - s->code_ptr);
+        s->code_ptr = thunk;
+    }
+
+    /* Discover where the inputs are held.  */
+    addrlo = softmmu_arg(ARG_ADDR, 0, 0);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = softmmu_arg(ARG_ADDR, 0, 1);
+    }
+    datalo = softmmu_arg(is_ld ? ARG_LDVAL : ARG_STVAL, is_64, 0);
+    if (TCG_TARGET_REG_BITS == 32 && is_64) {
+        datahi = softmmu_arg(is_ld ? ARG_LDVAL : ARG_STVAL, is_64, 1);
+    }
+
+    base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc, label_ptr,
+                            is_ld ? offsetof(CPUTLBEntry, addr_read)
+                            : offsetof(CPUTLBEntry, addr_write));
+
+    /* TLB Hit.  */
+    if (is_ld) {
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
+    } else {
+        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
+    }
+    tcg_out_opc(s, OPC_RET, 0, 0, 0);
+
+    /* TLB Miss.  */
+
+    /* resolve label address */
+    tcg_patch8(label_ptr[0], s->code_ptr - label_ptr[0] - 1);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_patch8(label_ptr[1], s->code_ptr - label_ptr[1] - 1);
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* Copy the return address into a temporary.  */
+        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, 0);
+        i = 4;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, i);
+        i += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, addrlo, TCG_REG_ESP, i);
+        i += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, addrhi, TCG_REG_ESP, i);
+            i += 4;
+        }
+
+        if (!is_ld) {
+            tcg_out_st(s, TCG_TYPE_I32, datalo, TCG_REG_ESP, i);
+            i += 4;
+
+            if (is_64) {
+                tcg_out_st(s, TCG_TYPE_I32, datahi, TCG_REG_ESP, i);
+                i += 4;
+            }
+        }
+
+        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, i);
+        i += 4;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, i);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+
+        /* The address and data values have been placed by constraints.  */
+        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
+        if (is_ld) {
+            i = 2;
+        } else {
+            tcg_debug_assert(datalo == tcg_target_call_iarg_regs[2]);
+            i = 3;
+        }
+
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[i++], oi);
+
+        /* Copy the return address from the stack to the retaddr argument.
+         * WIN64 runs out of argument registers for stores.
+         */
+        if (i < (int)ARRAY_SIZE(tcg_target_call_iarg_regs)) {
+            tcg_out_ld(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[i],
+                       TCG_REG_ESP, 0);
+        } else {
+            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP, 0);
+            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP,
+                       TCG_TARGET_CALL_STACK_OFFSET + 8);
+        }
+    }
+
+    /* Tail call to the helper.  */
+    if (is_ld) {
+        tcg_out_jmp(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
+    } else {
+        tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    }
+
+    return thunk;
+}
+#endif
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
-- 
2.17.2
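
Editorial sketch, not part of the patch: the inline side of the patch above emits a CALL opcode, records a label with R_386_PC32 and an addend of -4, and skips four bytes; the thunk is later aligned to 16 bytes with NOP padding. The stand-alone model below shows that arithmetic under stated assumptions. Emitter, emit_call_placeholder, patch_pc32 and align_thunk are hypothetical names for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the code buffer cursor (s->code_ptr in TCG). */
typedef struct {
    uint8_t *ptr;
} Emitter;

/* Emit "call rel32" with an empty displacement; return the field address. */
static uint8_t *emit_call_placeholder(Emitter *e)
{
    uint8_t *field;

    *e->ptr++ = 0xE8;           /* x86 CALL rel32 opcode */
    field = e->ptr;
    memset(field, 0, 4);
    e->ptr += 4;                /* skip the displacement, to be patched later */
    return field;
}

/*
 * Resolve a PC-relative 32-bit field: value = target - field + addend.
 * With addend == -4 the result is relative to the end of the field,
 * which is the address of the next instruction.
 */
static void patch_pc32(uint8_t *field, const uint8_t *target, ptrdiff_t addend)
{
    ptrdiff_t disp = (target - field) + addend;
    int32_t disp32 = (int32_t)disp;

    assert(disp == disp32);     /* the displacement must fit in 32 bits */
    memcpy(field, &disp32, sizeof(disp32));
}

/* Round the cursor up to a 16-byte boundary, padding with NOP (0x90). */
static uint8_t *align_thunk(Emitter *e)
{
    uintptr_t cur = (uintptr_t)e->ptr;
    size_t pad = (size_t)(-cur & 15);

    memset(e->ptr, 0x90, pad);
    e->ptr += pad;
    return e->ptr;
}
```

In this model a call emitted at the start of a 16-byte-aligned buffer, followed immediately by an aligned thunk, ends up with a displacement equal to the number of padding bytes between the call and the thunk.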

* [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (8 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-30 17:25   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
                   ` (28 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

These are function call arguments that we will need soon.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 30091f6a69..148de0b7f2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -125,6 +125,18 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type)
 {
     switch (*ct_str++) {
+    case 'a': /* x0 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X0);
+        break;
+    case 'b': /* x1 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X1);
+        break;
+    case 'c': /* x2 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X2);
+        break;
     case 'r': /* general registers */
         ct->ct |= TCG_CT_REG;
         ct->u.regs |= 0xffffffffu;
-- 
2.17.2
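
Editorial sketch, not part of the patch: the new 'a', 'b' and 'c' letters each add one fixed register to the set an operand may use, while 'r' adds the whole general register file. A simplified model of that mapping follows; the Constraint type, the REG_* numbering and parse_constraint are assumptions made for illustration and omit the TCG_CT_REG flag handling of the real parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register numbers: x0..x2 are simply 0..2 in this model. */
enum { REG_X0 = 0, REG_X1 = 1, REG_X2 = 2 };

/* Hypothetical, simplified constraint: just a bitmask of allowed registers. */
typedef struct {
    uint64_t regs;
} Constraint;

/* Parse one constraint letter; returns false for a letter this model lacks. */
static bool parse_constraint(Constraint *ct, char letter)
{
    switch (letter) {
    case 'a':                   /* x0 only */
        ct->regs |= UINT64_C(1) << REG_X0;
        return true;
    case 'b':                   /* x1 only */
        ct->regs |= UINT64_C(1) << REG_X1;
        return true;
    case 'c':                   /* x2 only */
        ct->regs |= UINT64_C(1) << REG_X2;
        return true;
    case 'r':                   /* any of the 32 general registers */
        ct->regs |= 0xffffffffu;
        return true;
    default:
        return false;
    }
}
```

An operand constrained with a single-register letter leaves the register allocator exactly one choice, which is how later patches in the series pin qemu_ld/st operands to the helper's argument registers.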

* [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (9 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-30 17:50   ` Alex Bennée
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 12/37] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

When moving the qemu_ld/st arguments to the right place for
a function call, we'll need to move the temps out of the way.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 74 +++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 148de0b7f2..c0ba9a6d50 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1467,13 +1467,15 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
     label->label_ptr[0] = label_ptr;
 }
 
-/* Load and compare a TLB entry, emitting the conditional jump to the
-   slow path for the failure case, which will be patched later when finalizing
-   the slow path. Generated code returns the host addend in X1,
-   clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
-                             tcg_insn_unit **label_ptr, int mem_index,
-                             bool is_read)
+/*
+ * Load and compare a TLB entry, emitting the conditional jump to the
+ * slow path on failure.  Returns the register for the host addend.
+ * Clobbers t0, t1, t2, t3.
+ */
+static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
+                               tcg_insn_unit **label_ptr, int mem_index,
+                               bool is_read, TCGReg t0, TCGReg t1,
+                               TCGReg t2, TCGReg t3)
 {
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
@@ -1491,55 +1493,56 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
     if (a_bits >= s_bits) {
         x3 = addr_reg;
     } else {
+        x3 = t3;
         tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, addr_reg, s_mask - a_mask);
-        x3 = TCG_REG_X3;
+                     x3, addr_reg, s_mask - a_mask);
     }
     tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 
-    /* Extract the TLB index from the address into X0.
-       X0<CPU_TLB_BITS:0> =
+    /* Extract the TLB index from the address into T0.
+       T0<CPU_TLB_BITS:0> =
        addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
-    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, TCG_REG_X0, addr_reg,
+    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, t0, addr_reg,
                  TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
 
-    /* Store the page mask part of the address into X3.  */
+    /* Store the page mask part of the address into T3.  */
     tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, x3, tlb_mask);
+                     t3, x3, tlb_mask);
 
-    /* Add any "high bits" from the tlb offset to the env address into X2,
+    /* Add any "high bits" from the tlb offset to the env address into T2,
        to take advantage of the LSL12 form of the ADDI instruction.
-       X2 = env + (tlb_offset & 0xfff000) */
+       T2 = env + (tlb_offset & 0xfff000) */
     if (tlb_offset & 0xfff000) {
-        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_X2, base,
+        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, t2, base,
                      tlb_offset & 0xfff000);
-        base = TCG_REG_X2;
+        base = t2;
     }
 
-    /* Merge the tlb index contribution into X2.
-       X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64, TCG_REG_X2, base,
-                 TCG_REG_X0, CPU_TLB_ENTRY_BITS);
+    /* Merge the tlb index contribution into T2.
+       T2 = T2 + (T0 << CPU_TLB_ENTRY_BITS) */
+    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64,
+                 t2, base, t0, CPU_TLB_ENTRY_BITS);
 
-    /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
-       X0 = load [X2 + (tlb_offset & 0x000fff)] */
+    /* Merge "low bits" from tlb offset, load the tlb comparator into T0.
+       T0 = load [T2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
-                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff,
-                 TARGET_LONG_BITS == 32 ? 2 : 3);
+                 t0, t2, tlb_offset & 0xfff, TARGET_LONG_BITS == 32 ? 2 : 3);
 
     /* Load the tlb addend. Do that early to avoid stalling.
-       X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
-    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
+       T1 = load [T2 + (tlb_offset & 0xfff) + offsetof(addend)] */
+    tcg_out_ldst(s, I3312_LDRX, t1, t2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)), 3);
 
     /* Perform the address comparison. */
-    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
+    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), t0, t3, 0);
 
     /* If not equal, we jump to the slow path. */
     *label_ptr = s->code_ptr;
     tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
+
+    return t1;
 }
 
 #endif /* CONFIG_SOFTMMU */
@@ -1644,10 +1647,12 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
     tcg_insn_unit *label_ptr;
+    TCGReg base;
 
-    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1);
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1,
+                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
     tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
-                           TCG_REG_X1, otype, addr_reg);
+                           base, otype, addr_reg);
     add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
                         s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
@@ -1669,10 +1674,11 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
     tcg_insn_unit *label_ptr;
+    TCGReg base;
 
-    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0);
-    tcg_out_qemu_st_direct(s, memop, data_reg,
-                           TCG_REG_X1, otype, addr_reg);
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0,
+                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
+    tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
     add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
                         data_reg, addr_reg, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
-- 
2.17.2
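
Editorial sketch, not part of the patch: independent of which temporaries are used, the emitted sequence compares a masked form of the guest address against the TLB comparator. The host-side model below mirrors the arithmetic visible in the diff (the s_mask - a_mask adjustment and tlb_mask = TARGET_PAGE_MASK | a_mask). PAGE_BITS is an assumed value, and tlb_compare_value and tlb_hit are hypothetical names rather than QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12            /* assumption: 4 KiB guest pages */
#define PAGE_MASK (~((UINT64_C(1) << PAGE_BITS) - 1))

/*
 * Value compared against the TLB comparator.
 * a_bits: log2 of the required alignment; s_bits: log2 of the access size.
 */
static uint64_t tlb_compare_value(uint64_t addr, unsigned a_bits,
                                  unsigned s_bits)
{
    uint64_t a_mask = (UINT64_C(1) << a_bits) - 1;
    uint64_t s_mask = (UINT64_C(1) << s_bits) - 1;
    uint64_t x = addr;

    if (a_bits < s_bits) {
        /* Step towards the last byte so a page-crossing access misses. */
        x += s_mask - a_mask;
    }
    /* Keep the page number plus any low bits that must be zero. */
    return x & (PAGE_MASK | a_mask);
}

/* A hit requires the masked address to equal the page-aligned comparator. */
static bool tlb_hit(uint64_t comparator, uint64_t addr,
                    unsigned a_bits, unsigned s_bits)
{
    return tlb_compare_value(addr, a_bits, s_bits) == comparator;
}
```

In this model both a misaligned access and an access that straddles a page boundary fail the comparison, so a single conditional branch sends either case to the slow path.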

* [Qemu-devel] [PATCH for-4.0 v2 12/37] tcg/aarch64: Parameterize the temp for tcg_out_goto_long
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (10 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 13/37] tcg/aarch64: Use B not BL " Richard Henderson
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

We cannot use TCG_REG_LR (aka TCG_REG_TMP) for tail calls.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c0ba9a6d50..ea5fe33fca 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1134,14 +1134,15 @@ static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
     tcg_out_insn(s, 3206, B, offset);
 }
 
-static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target,
+                                     TCGReg scratch)
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
         tcg_out_insn(s, 3206, BL, offset);
     } else {
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
-        tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
+        tcg_out_movi(s, TCG_TYPE_I64, scratch, (intptr_t)target);
+        tcg_out_insn(s, 3207, BR, scratch);
     }
 }
 
@@ -1716,10 +1717,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_exit_tb:
         /* Reuse the zeroing that exists for goto_ptr.  */
         if (a0 == 0) {
-            tcg_out_goto_long(s, s->code_gen_epilogue);
+            tcg_out_goto_long(s, s->code_gen_epilogue, TCG_REG_TMP);
         } else {
             tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
-            tcg_out_goto_long(s, tb_ret_addr);
+            tcg_out_goto_long(s, tb_ret_addr, TCG_REG_TMP);
         }
         break;
 
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 13/37] tcg/aarch64: Use B not BL for tcg_out_goto_long
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (11 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 12/37] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 14/37] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This was a typo copying from tcg_out_call, apparently.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index ea5fe33fca..403f5caf14 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1139,7 +1139,7 @@ static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target,
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
-        tcg_out_insn(s, 3206, BL, offset);
+        tcg_out_insn(s, 3206, B, offset);
     } else {
         tcg_out_movi(s, TCG_TYPE_I64, scratch, (intptr_t)target);
         tcg_out_insn(s, 3207, BR, scratch);
-- 
2.17.2
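
Editorial sketch, not part of the patch: B and BL share the same 26-bit immediate form and differ only in the top opcode bit, with BL additionally writing the return address to x30. That is why a plain jump, and a tail call in particular, wants B. The model below shows the range check and the two encodings; sextract64_model, fits_imm26 and encode_branch are hypothetical names, and the offset is counted in 4-byte instruction units as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INSN_B  0x14000000u     /* branch; link register untouched */
#define INSN_BL 0x94000000u     /* branch with link; also writes x30 */

/*
 * Sign-extract 'length' bits of 'value' starting at bit 'start'.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but provided by mainstream compilers.
 */
static int64_t sextract64_model(uint64_t value, int start, int length)
{
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/* True if the offset survives truncation to a signed 26-bit field. */
static bool fits_imm26(int64_t insn_offset)
{
    return insn_offset == sextract64_model((uint64_t)insn_offset, 0, 26);
}

/* Encode B or BL with an offset given in instruction units. */
static uint32_t encode_branch(uint32_t opcode, int64_t insn_offset)
{
    assert(fits_imm26(insn_offset));
    return opcode | ((uint32_t)insn_offset & 0x03ffffffu);
}
```

Offsets outside the signed 26-bit range cannot be encoded directly, which corresponds to the movi plus BR fallback through the scratch register in tcg_out_goto_long.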

* [Qemu-devel] [PATCH for-4.0 v2 14/37] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (12 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 13/37] tcg/aarch64: Use B not BL " Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 15/37] tcg/arm: Parameterize the temps for tcg_out_tlb_read Richard Henderson
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |   2 +-
 tcg/aarch64/tcg-target.inc.c | 191 +++++++++++++++++------------------
 2 files changed, 93 insertions(+), 100 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9aea1d1771..d1bd77c41d 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -146,7 +146,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 403f5caf14..8edea527f7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -145,18 +145,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         ct->u.regs |= 0xffffffff00000000ull;
         break;
-    case 'l': /* qemu_ld / qemu_st address, data_reg */
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffffffffu;
-#ifdef CONFIG_SOFTMMU
-        /* x0 and x1 will be overwritten when reading the tlb entry,
-           and x2, and x3 for helper args, better to avoid using them. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X1);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X2);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X3);
-#endif
-        break;
     case 'A': /* Valid for arithmetic immediate (positive or negative).  */
         ct->ct |= TCG_CT_CONST_AIMM;
         break;
@@ -1378,7 +1366,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 }
 
 #ifdef CONFIG_SOFTMMU
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1391,6 +1379,12 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+
+    [MO_SB]   = helper_ret_ldsb_mmu,
+    [MO_LESW] = helper_le_ldsw_mmu,
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESW] = helper_be_ldsw_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
@@ -1407,67 +1401,6 @@ static void * const qemu_st_helpers[16] = {
     [MO_BEQ]  = helper_be_stq_mmu,
 };
 
-static inline void tcg_out_adr(TCGContext *s, TCGReg rd, void *target)
-{
-    ptrdiff_t offset = tcg_pcrel_diff(s, target);
-    tcg_debug_assert(offset == sextract64(offset, 0, 21));
-    tcg_out_insn(s, 3406, ADR, rd, offset);
-}
-
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp size = opc & MO_SIZE;
-
-    reloc_pc19(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, oi);
-    tcg_out_adr(s, TCG_REG_X3, lb->raddr);
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-    if (opc & MO_SIGN) {
-        tcg_out_sxt(s, lb->type, size, lb->datalo_reg, TCG_REG_X0);
-    } else {
-        tcg_out_mov(s, size == MO_64, lb->datalo_reg, TCG_REG_X0);
-    }
-
-    tcg_out_goto(s, lb->raddr);
-}
-
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp size = opc & MO_SIZE;
-
-    reloc_pc19(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, oi);
-    tcg_out_adr(s, TCG_REG_X4, lb->raddr);
-    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-    tcg_out_goto(s, lb->raddr);
-}
-
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGType ext, TCGReg data_reg, TCGReg addr_reg,
-                                tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->type = ext;
-    label->datalo_reg = data_reg;
-    label->addrlo_reg = addr_reg;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr;
-}
-
 /*
  * Load and compare a TLB entry, emitting the conditional jump to the
  * slow path on failure.  Returns the register for the host addend.
@@ -1644,19 +1577,22 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
                             TCGMemOpIdx oi, TCGType ext)
 {
     TCGMemOp memop = get_memop(oi);
-    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
-#ifdef CONFIG_SOFTMMU
-    unsigned mem_index = get_mmuidx(oi);
-    tcg_insn_unit *label_ptr;
-    TCGReg base;
 
-    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1,
-                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
-    tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
-                           base, otype, addr_reg);
-    add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
-                        s->code_ptr, label_ptr);
+#ifdef CONFIG_SOFTMMU
+    /* Ignore the requested "ext".  We get the same correct result from
+     * a 16-bit sign-extended to 64-bit as we do sign-extended to 32-bit,
+     * and we create fewer out-of-line thunks.
+     */
+    bool is_64 = (memop & MO_SIGN) || ((memop & MO_SIZE) == MO_64);
+
+    tcg_debug_assert(data_reg == TCG_REG_X0);
+    tcg_debug_assert(addr_reg == TCG_REG_X1);
+
+    add_ldst_ool_label(s, true, is_64, oi, R_AARCH64_JUMP26, 0);
+    tcg_out_insn(s, 3206, BL, 0);
 #else /* !CONFIG_SOFTMMU */
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+
     if (USE_GUEST_BASE) {
         tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
                                TCG_REG_GUEST_BASE, otype, addr_reg);
@@ -1671,18 +1607,18 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
                             TCGMemOpIdx oi)
 {
     TCGMemOp memop = get_memop(oi);
-    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
-#ifdef CONFIG_SOFTMMU
-    unsigned mem_index = get_mmuidx(oi);
-    tcg_insn_unit *label_ptr;
-    TCGReg base;
 
-    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0,
-                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
-    tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
-    add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
-                        data_reg, addr_reg, s->code_ptr, label_ptr);
+#ifdef CONFIG_SOFTMMU
+    bool is_64 = (memop & MO_SIZE) == MO_64;
+
+    tcg_debug_assert(addr_reg == TCG_REG_X1);
+    tcg_debug_assert(data_reg == TCG_REG_X2);
+
+    add_ldst_ool_label(s, false, is_64, oi, R_AARCH64_JUMP26, 0);
+    tcg_out_insn(s, 3206, BL, 0);
 #else /* !CONFIG_SOFTMMU */
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+
     if (USE_GUEST_BASE) {
         tcg_out_qemu_st_direct(s, memop, data_reg,
                                TCG_REG_GUEST_BASE, otype, addr_reg);
@@ -1693,6 +1629,52 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #endif /* CONFIG_SOFTMMU */
 }
 
+#ifdef CONFIG_SOFTMMU
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+    const TCGMemOp memop = get_memop(oi);
+    const unsigned mem_index = get_mmuidx(oi);
+    const TCGReg addr_reg = TCG_REG_X1;
+    const TCGReg data_reg = is_ld ? TCG_REG_X0 : TCG_REG_X2;
+    tcg_insn_unit * const thunk = s->code_ptr;
+    tcg_insn_unit *label;
+    TCGReg base, arg;
+
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label, mem_index, is_ld,
+                            TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7);
+
+    /* TLB Hit */
+    if (is_ld) {
+        tcg_out_qemu_ld_direct(s, memop, is_64, data_reg,
+                               base, otype, addr_reg);
+    } else {
+        tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
+    }
+    tcg_out_insn(s, 3207, RET, TCG_REG_LR);
+
+    /* TLB Miss */
+    reloc_pc19(label, s->code_ptr);
+
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
+    /* addr_reg and data_reg are already in place.  */
+    arg = is_ld ? TCG_REG_X2 : TCG_REG_X3;
+    tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
+    tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_REG_LR);
+
+    if (is_ld) {
+        tcg_out_goto_long(s, qemu_ld_helpers[memop & (MO_BSWAP | MO_SSIZE)],
+                          TCG_REG_X7);
+    } else {
+        tcg_out_goto_long(s, qemu_st_helpers[memop & (MO_BSWAP | MO_SIZE)],
+                          TCG_REG_X7);
+    }
+
+    return thunk;
+}
+#endif
+
 static tcg_insn_unit *tb_ret_addr;
 
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -2262,10 +2244,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef w_w = { .args_ct_str = { "w", "w" } };
     static const TCGTargetOpDef w_r = { .args_ct_str = { "w", "r" } };
     static const TCGTargetOpDef w_wr = { .args_ct_str = { "w", "wr" } };
-    static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } };
     static const TCGTargetOpDef r_rA = { .args_ct_str = { "r", "rA" } };
     static const TCGTargetOpDef rZ_r = { .args_ct_str = { "rZ", "r" } };
-    static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
+#ifdef CONFIG_SOFTMMU
+    static const TCGTargetOpDef a_b = { .args_ct_str = { "a", "b" } };
+    static const TCGTargetOpDef c_b = { .args_ct_str = { "c", "b" } };
+#endif
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
     static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
     static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
@@ -2397,10 +2381,19 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_ld_i64:
-        return &r_l;
+#ifdef CONFIG_SOFTMMU
+        return &a_b;
+#else
+        return &r_r;
+#endif
+
     case INDEX_op_qemu_st_i32:
     case INDEX_op_qemu_st_i64:
-        return &lZ_l;
+#ifdef CONFIG_SOFTMMU
+        return &c_b;
+#else
+        return &r_r;
+#endif
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 15/37] tcg/arm: Parameterize the temps for tcg_out_tlb_read
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

When moving the qemu_ld/st arguments to the right place for
a function call, we'll need to move the temps out of the way.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 89 +++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 43 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 80d174ef44..414c91c9ea 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1245,11 +1245,14 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 /* We're expecting to use an 8-bit immediate and to mask.  */
 QEMU_BUILD_BUG_ON(CPU_TLB_BITS > 8);
 
-/* Load and compare a TLB entry, leaving the flags set.  Returns the register
-   containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
-
+/*
+ *Load and compare a TLB entry, leaving the flags set.  Returns the register
+ * containing the addend of the tlb entry.  Clobbers t0, t1, t2, t3.
+ * T0 and T1 must be consecutive for LDRD.
+ */
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               TCGMemOp opc, int mem_index, bool is_load)
+                               TCGMemOp opc, int mem_index, bool is_load,
+                               TCGReg t0, TCGReg t1, TCGReg t2, TCGReg t3)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1262,36 +1265,37 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     unsigned a_bits = get_alignment_bits(opc);
 
     /* V7 generates the following:
-     *   ubfx   r0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
-     *   add    r2, env, #high
-     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    r0, [r2, #cmp]
-     *   ldr    r2, [r2, #add]
-     *   movw   tmp, #page_align_mask
-     *   bic    tmp, addrlo, tmp
-     *   cmp    r0, tmp
+     *   ubfx   t0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
+     *   add    t2, env, #high
+     *   add    t2, t2, r0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t2, [t2, #add]
+     *   movw   t3, #page_align_mask
+     *   bic    t3, addrlo, t3
+     *   cmp    t0, t3
      *
      * Otherwise we generate:
-     *   shr    tmp, addrlo, #TARGET_PAGE_BITS
-     *   add    r2, env, #high
-     *   and    r0, tmp, #(CPU_TLB_SIZE - 1)
-     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    r0, [r2, #cmp]
-     *   ldr    r2, [r2, #add]
+     *   shr    t3, addrlo, #TARGET_PAGE_BITS
+     *   add    t2, env, #high
+     *   and    t0, t3, #(CPU_TLB_SIZE - 1)
+     *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t2, [t2, #add]
      *   tst    addrlo, #s_mask
-     *   cmpeq  r0, tmp, lsl #TARGET_PAGE_BITS
+     *   cmpeq  t0, t3, lsl #TARGET_PAGE_BITS
      */
     if (use_armv7_instructions) {
-        tcg_out_extract(s, COND_AL, TCG_REG_R0, addrlo,
+        tcg_out_extract(s, COND_AL, t0, addrlo,
                         TARGET_PAGE_BITS, CPU_TLB_BITS);
     } else {
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t3,
                         0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     }
 
     /* Add portions of the offset until the memory access is in range.
      * If we plan on using ldrd, reduce to an 8-bit offset; otherwise
-     * we can use a 12-bit offset.  */
+     * we can use a 12-bit offset.
+     */
     if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
         mask_off = 0xff;
     } else {
@@ -1301,34 +1305,33 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int shift = ctz32(cmp_off & ~mask_off) & ~1;
         int rot = ((32 - shift) << 7) & 0xf00;
         int addend = cmp_off & (0xff << shift);
-        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
+        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, t2, base,
                         rot | ((cmp_off >> shift) & 0xff));
-        base = TCG_REG_R2;
+        base = t2;
         add_off -= addend;
         cmp_off -= addend;
     }
 
     if (!use_armv7_instructions) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                        TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
+        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t3, CPU_TLB_SIZE - 1);
     }
-    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
-                    TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
+    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, t2, base, t0,
+                    SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
     /* Load the tlb comparator.  Use ldrd if needed and available,
        but due to how the pointer needs setting up, ldm isn't useful.
        Base arm5 doesn't have ldrd, but armv5te does.  */
     if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        tcg_out_ldrd_8(s, COND_AL, TCG_REG_R0, TCG_REG_R2, cmp_off);
+        tcg_out_ldrd_8(s, COND_AL, t0, t2, cmp_off);
     } else {
-        tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R2, cmp_off);
+        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
         if (TARGET_LONG_BITS == 64) {
-            tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2, cmp_off + 4);
+            tcg_out_ld32_12(s, COND_AL, t1, t2, cmp_off + 4);
         }
     }
 
     /* Load the tlb addend.  */
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R2, TCG_REG_R2, add_off);
+    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
 
     /* Check alignment.  We don't support inline unaligned acceses,
        but we can easily support overalignment checks.  */
@@ -1341,29 +1344,27 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int rot = encode_imm(mask);
 
         if (rot >= 0) { 
-            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, TCG_REG_TMP, addrlo,
+            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t3, addrlo,
                             rotl(mask, rot) | (rot << 7));
         } else {
-            tcg_out_movi32(s, COND_AL, TCG_REG_TMP, mask);
-            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, TCG_REG_TMP,
-                            addrlo, TCG_REG_TMP, 0);
+            tcg_out_movi32(s, COND_AL, t3, mask);
+            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t3, addrlo, t3, 0);
         }
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R0, TCG_REG_TMP, 0);
+        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t3, 0);
     } else {
         if (a_bits) {
             tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo,
                             (1 << a_bits) - 1);
         }
         tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP,
-                        0, TCG_REG_R0, TCG_REG_TMP,
-                        SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                        0, t0, t3, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     }
 
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, TCG_REG_R1, addrhi, 0);
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t1, addrhi, 0);
     }
 
-    return TCG_REG_R2;
+    return t2;
 }
 
 /* Record the context of a call to the out of line helper code for the slow
@@ -1629,7 +1630,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1760,7 +1762,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 16/37] tcg/arm: Add constraints for R0-R5
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

These are function call arguments that we will need soon.
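
As a rough illustration of the mapping (a standalone sketch, not the
backend code): each letter 'a' through 'f' selects exactly one of
r0 through r5.  The name constraint_mask and the plain uint32_t mask
are assumptions made for this example; the real code sets a bit in
ct->u.regs via tcg_regset_set_reg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's register numbering. */
enum { REG_R0 = 0 };

/*
 * Return a one-bit register mask for constraint letters 'a'..'f'
 * (r0..r5), or 0 for any other letter.
 */
uint32_t constraint_mask(char c)
{
    if (c >= 'a' && c <= 'f') {
        return UINT32_C(1) << (REG_R0 + (c - 'a'));
    }
    return 0;
}

/* Small usage check: 'a' pins r0, 'f' pins r5, 'I' is not a register. */
void constraint_mask_selfcheck(void)
{
    assert(constraint_mask('a') == 0x01);
    assert(constraint_mask('f') == 0x20);
    assert(constraint_mask('I') == 0);
}
```

With this scheme an op definition such as { "a", "b" } would pin its
first operand to r0 and its second to r1.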

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 414c91c9ea..4339c472e8 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -246,7 +246,12 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type)
 {
-    switch (*ct_str++) {
+    char c = *ct_str++;
+    switch (c) {
+    case 'a' ... 'f': /* r0 - r5 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R0 + (c - 'a'));
+        break;
     case 'I':
         ct->ct |= TCG_CT_CONST_ARM;
         break;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 17/37] tcg/arm: Reduce the number of temps for tcg_out_tlb_read
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

When moving the qemu_ld/st thunk out of line, we no longer have LR for
use as a temporary.  In the worst case we must make do with 3 temps,
when dealing with a 64-bit guest address.  This in turn implies that we
cannot use LDRD anymore, as there are not enough temps.
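
To sketch why three temps are enough, here is a plain C model of the
64-bit comparison done as two dependent 32-bit compares, with t0 reused
for the high half.  This is only an illustration under assumptions:
struct tlb_cmp, tlb_hit_64 and clear_mask are names invented for the
example, and the alignment test is folded into clear_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one TLB comparator, split into 32-bit halves. */
struct tlb_cmp {
    uint32_t lo;
    uint32_t hi;
};

/*
 * Model of the 64-bit guest address check using only two scratch
 * values (t0, t1).  The third temp in the patch holds the entry
 * pointer and later the addend; it is not modeled here.
 */
bool tlb_hit_64(struct tlb_cmp ent, uint32_t addrlo, uint32_t addrhi,
                uint32_t clear_mask)
{
    uint32_t t0, t1;
    bool eq;

    t0 = ent.lo;                 /* ldr   t0, [t2, #cmplo]        */
    t1 = addrlo & ~clear_mask;   /* bic   t1, addrlo, mask        */
    eq = (t0 == t1);             /* cmp   t0, t1                  */
    t0 = ent.hi;                 /* ldr   t0, [t2, #cmphi]        */
    if (eq) {
        eq = (t0 == addrhi);     /* cmpeq t0, addrhi              */
    }
    return eq;
}
```

The second compare only executes when the first one matched, which is
what the conditional cmpeq expresses in the generated code.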

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 97 ++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 44 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 4339c472e8..2deeb1f5d1 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1251,13 +1251,12 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 QEMU_BUILD_BUG_ON(CPU_TLB_BITS > 8);
 
 /*
- *Load and compare a TLB entry, leaving the flags set.  Returns the register
- * containing the addend of the tlb entry.  Clobbers t0, t1, t2, t3.
- * T0 and T1 must be consecutive for LDRD.
+ * Load and compare a TLB entry, leaving the flags set.  Returns the register
+ * containing the addend of the tlb entry.  Clobbers t0, t1, t2.
  */
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
                                TCGMemOp opc, int mem_index, bool is_load,
-                               TCGReg t0, TCGReg t1, TCGReg t2, TCGReg t3)
+                               TCGReg t0, TCGReg t1, TCGReg t2)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1265,49 +1264,64 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
          ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
          : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
-    int mask_off;
     unsigned s_bits = opc & MO_SIZE;
     unsigned a_bits = get_alignment_bits(opc);
 
     /* V7 generates the following:
      *   ubfx   t0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
      *   add    t2, env, #high
-     *   add    t2, t2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]
      *   ldr    t2, [t2, #add]
-     *   movw   t3, #page_align_mask
-     *   bic    t3, addrlo, t3
-     *   cmp    t0, t3
+     *   movw   t1, #page_align_mask
+     *   bic    t1, addrlo, t1
+     *   cmp    t0, t1
+     *
+     *   ubfx   t0, addrlo, #TPB, #CTB   -- 64-bit address
+     *   add    t2, env, #high
+     *   add    t2, t2, t0, lsl #CTEB
+     *   ldr    t0, [t2, #cmplo]
+     *   movw   t1, #page_align_mask
+     *   bic    t1, addrlo, t1
+     *   cmp    t0, t1
+     *   ldr    t0, [t2, #cmphi]
+     *   ldr    t2, [t2, #add]
+     *   cmpeq  t0, addrhi
      *
      * Otherwise we generate:
      *   shr    t3, addrlo, #TARGET_PAGE_BITS
      *   add    t2, env, #high
      *   and    t0, t3, #(CPU_TLB_SIZE - 1)
      *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t0, [t2, #cmp]
      *   ldr    t2, [t2, #add]
      *   tst    addrlo, #s_mask
      *   cmpeq  t0, t3, lsl #TARGET_PAGE_BITS
+     *
+     *   shr    t1, addrlo, #TPB         -- 64-bit address
+     *   add    t2, env, #high
+     *   and    t0, t1, #CTS-1
+     *   add    t2, t2, t0, lsl #CTEB
+     *   ldr    t0, [t2, #cmplo]
+     *   tst    addrlo, #s_mask
+     *   cmpeq  t0, t1, lsl #TPB
+     *   ldr    t0, [t2, #cmphi]
+     *   ldr    t2, [t2, #add]
+     *   cmpeq  t0, addrhi
      */
     if (use_armv7_instructions) {
         tcg_out_extract(s, COND_AL, t0, addrlo,
                         TARGET_PAGE_BITS, CPU_TLB_BITS);
     } else {
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t3,
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t1,
                         0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     }
 
     /* Add portions of the offset until the memory access is in range.
-     * If we plan on using ldrd, reduce to an 8-bit offset; otherwise
-     * we can use a 12-bit offset.
+     * We are not using ldrd, so we can use a 12-bit offset.
      */
-    if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        mask_off = 0xff;
-    } else {
-        mask_off = 0xfff;
-    }
-    while (cmp_off > mask_off) {
-        int shift = ctz32(cmp_off & ~mask_off) & ~1;
+    while (cmp_off > 0xfff) {
+        int shift = ctz32(cmp_off & ~0xfff) & ~1;
         int rot = ((32 - shift) << 7) & 0xf00;
         int addend = cmp_off & (0xff << shift);
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, t2, base,
@@ -1318,25 +1332,13 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     if (!use_armv7_instructions) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t3, CPU_TLB_SIZE - 1);
+        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t1, CPU_TLB_SIZE - 1);
     }
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, t2, base, t0,
                     SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
-    /* Load the tlb comparator.  Use ldrd if needed and available,
-       but due to how the pointer needs setting up, ldm isn't useful.
-       Base arm5 doesn't have ldrd, but armv5te does.  */
-    if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        tcg_out_ldrd_8(s, COND_AL, t0, t2, cmp_off);
-    } else {
-        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_ld32_12(s, COND_AL, t1, t2, cmp_off + 4);
-        }
-    }
-
-    /* Load the tlb addend.  */
-    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
+    /* Load the tlb comparator (low part).  */
+    tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
 
     /* Check alignment.  We don't support inline unaligned acceses,
        but we can easily support overalignment checks.  */
@@ -1349,24 +1351,31 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int rot = encode_imm(mask);
 
         if (rot >= 0) { 
-            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t3, addrlo,
+            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t1, addrlo,
                             rotl(mask, rot) | (rot << 7));
         } else {
-            tcg_out_movi32(s, COND_AL, t3, mask);
-            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t3, addrlo, t3, 0);
+            tcg_out_movi32(s, COND_AL, t1, mask);
+            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t1, addrlo, t1, 0);
         }
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t3, 0);
+        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t1, 0);
     } else {
         if (a_bits) {
             tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo,
                             (1 << a_bits) - 1);
         }
         tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP,
-                        0, t0, t3, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                        0, t0, t1, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     }
 
+    /* Load the tlb comparator (high part).  */
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t1, addrhi, 0);
+        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off + 4);
+    }
+    /* Load the tlb addend.  */
+    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
+
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t0, addrhi, 0);
     }
 
     return t2;
@@ -1636,7 +1645,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1768,7 +1777,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 18/37] tcg/arm: Force qemu_ld/st arguments into fixed registers
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This is an incremental step toward moving the qemu_ld/st
code sequence out of line.
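
With the operands pinned to r0-r3, the temps are whatever is left over
in that range.  A minimal standalone sketch of the selection for the
load case follows; pick_t1_for_load and ctz32_model are names made up
for this example, and addrhi < 0 is used here to mean a 32-bit guest
address.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-trailing-zeros; the caller guarantees v != 0. */
int ctz32_model(uint32_t v)
{
    int n = 0;

    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Choose the second temp for a load.  r0 is always the first temp,
 * so it must not hold an address; t1 is the lowest remaining free
 * register among r1-r3.
 */
int pick_t1_for_load(int addrlo, int addrhi)
{
    uint32_t avail = 0xf;               /* r0-r3 */

    avail &= ~(UINT32_C(1) << addrlo);
    if (addrhi >= 0) {
        avail &= ~(UINT32_C(1) << addrhi);
    }
    assert(avail & 1);                  /* r0 is free, used as t0 */
    avail &= ~UINT32_C(1);
    assert(avail != 0);                 /* something left for t1 */
    return ctz32_model(avail);
}
```

For a 32-bit address in r1 this picks r2; for a 64-bit address in
r2:r3 it picks r1.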

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 113 +++++++++++++++++++++++++--------------
 1 file changed, 73 insertions(+), 40 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 2deeb1f5d1..6b89ac7983 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -270,37 +270,13 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->u.regs = 0xffff;
         break;
 
-    /* qemu_ld address */
-    case 'l':
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffff;
-#ifdef CONFIG_SOFTMMU
-        /* r0-r2,lr will be overwritten when reading the tlb entry,
-           so don't use these. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R14);
-#endif
-        break;
-
     /* qemu_st address & data */
     case 's':
         ct->ct |= TCG_CT_REG;
         ct->u.regs = 0xffff;
-        /* r0-r2 will be overwritten when reading the tlb entry (softmmu only)
-           and r0-r1 doing the byte swapping, so don't use these. */
+        /* r0-r1 doing the byte swapping, so don't use these */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#if defined(CONFIG_SOFTMMU)
-        /* Avoid clashes with registers being used for helper args */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
-#if TARGET_LONG_BITS == 64
-        /* Avoid clashes with registers being used for helper args */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
-#endif
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R14);
-#endif
         break;
 
     default:
@@ -1630,8 +1606,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOpIdx oi;
     TCGMemOp opc;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGReg addend;
+    int mem_index, avail;
+    TCGReg addend, t0, t1;
     tcg_insn_unit *label_ptr;
 #endif
 
@@ -1644,8 +1620,20 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
+
+    avail = 0xf;
+    avail &= ~(1 << addrlo);
+    if (TARGET_LONG_BITS == 64) {
+        avail &= ~(1 << addrhi);
+    }
+    tcg_debug_assert(avail & 1);
+    t0 = TCG_REG_R0;
+    avail &= ~1;
+    tcg_debug_assert(avail != 0);
+    t1 = ctz32(avail);
+
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
+                              t0, t1, TCG_REG_TMP);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1762,8 +1750,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOpIdx oi;
     TCGMemOp opc;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGReg addend;
+    int mem_index, avail;
+    TCGReg addend, t0, t1;
     tcg_insn_unit *label_ptr;
 #endif
 
@@ -1776,8 +1764,24 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
+
+    avail = 0xf;
+    avail &= ~(1 << addrlo);
+    avail &= ~(1 << datalo);
+    if (TARGET_LONG_BITS == 64) {
+        avail &= ~(1 << addrhi);
+    }
+    if (is64) {
+        avail &= ~(1 << datahi);
+    }
+    tcg_debug_assert(avail & 1);
+    t0 = TCG_REG_R0;
+    avail &= ~1;
+    tcg_debug_assert(avail != 0);
+    t1 = ctz32(avail);
+
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
+                              t0, t1, TCG_REG_TMP);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
@@ -2118,11 +2122,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
     static const TCGTargetOpDef s_s = { .args_ct_str = { "s", "s" } };
-    static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } };
+    static const TCGTargetOpDef a_b = { .args_ct_str = { "a", "b" } };
+    static const TCGTargetOpDef c_b = { .args_ct_str = { "c", "b" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
-    static const TCGTargetOpDef r_r_l = { .args_ct_str = { "r", "r", "l" } };
-    static const TCGTargetOpDef r_l_l = { .args_ct_str = { "r", "l", "l" } };
     static const TCGTargetOpDef s_s_s = { .args_ct_str = { "s", "s", "s" } };
+    static const TCGTargetOpDef a_c_d = { .args_ct_str = { "a", "c", "d" } };
+    static const TCGTargetOpDef a_b_b = { .args_ct_str = { "a", "b", "b" } };
+    static const TCGTargetOpDef e_c_d = { .args_ct_str = { "e", "c", "d" } };
+    static const TCGTargetOpDef e_f_b = { .args_ct_str = { "e", "f", "b" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rI = { .args_ct_str = { "r", "r", "rI" } };
     static const TCGTargetOpDef r_r_rIN
@@ -2131,10 +2138,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rIK" } };
     static const TCGTargetOpDef r_r_r_r
         = { .args_ct_str = { "r", "r", "r", "r" } };
-    static const TCGTargetOpDef r_r_l_l
-        = { .args_ct_str = { "r", "r", "l", "l" } };
     static const TCGTargetOpDef s_s_s_s
         = { .args_ct_str = { "s", "s", "s", "s" } };
+    static const TCGTargetOpDef a_b_c_d
+        = { .args_ct_str = { "a", "b", "c", "d" } };
+    static const TCGTargetOpDef e_f_c_d
+        = { .args_ct_str = { "e", "f", "c", "d" } };
     static const TCGTargetOpDef br
         = { .args_ct_str = { "r", "rIN" } };
     static const TCGTargetOpDef dep
@@ -2215,13 +2224,37 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &setc2;
 
     case INDEX_op_qemu_ld_i32:
-        return TARGET_LONG_BITS == 32 ? &r_l : &r_l_l;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &r_r : &r_r_r;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &a_b;     /* temps available r0, r2, r3, r12 */
+        } else {
+            return &a_c_d;   /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_ld_i64:
-        return TARGET_LONG_BITS == 32 ? &r_r_l : &r_r_l_l;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &r_r_r : &r_r_r_r;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &a_b_b;   /* temps available r0, r2, r3, r12 */
+        } else {
+            return &a_b_c_d; /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_st_i32:
-        return TARGET_LONG_BITS == 32 ? &s_s : &s_s_s;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &s_s : &s_s_s;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &c_b;     /* temps available r0, r3, r12 */
+        } else {
+            return &e_c_d;   /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_st_i64:
-        return TARGET_LONG_BITS == 32 ? &s_s_s : &s_s_s_s;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &s_s_s : &s_s_s_s;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &e_f_b;   /* temps available r0, r2, r3, r12 */
+        } else {
+            return &e_f_c_d; /* temps available r0, r1, r12 */
+        }
 
     default:
         return NULL;
-- 
2.17.2
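
For readers following the temp selection in tcg_out_qemu_st above, here
is a minimal standalone sketch of the same bitmask idea. It is not part
of the patch: registers are modelled as plain indices (r0..r5),
lowest_set_bit() is a hypothetical stand-in for QEMU's ctz32(), and -1
is an assumed marker for "no high half".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ctz32(): index of the lowest set bit. */
int lowest_set_bit(uint32_t v)
{
    int n = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Pick the second temp for a store; the first temp is always r0.
 * Pass -1 for addrhi or datahi when that half does not exist. */
int pick_store_temp(int addrlo, int addrhi, int datalo, int datahi)
{
    uint32_t avail = 0xf;               /* candidates are r0..r3 */

    avail &= ~(1u << addrlo);
    avail &= ~(1u << datalo);
    if (addrhi >= 0) {
        avail &= ~(1u << addrhi);
    }
    if (datahi >= 0) {
        avail &= ~(1u << datahi);
    }
    assert(avail & 1);                  /* r0 must be free, it is t0 */
    avail &= ~1u;
    assert(avail != 0);                 /* something must remain for t1 */
    return lowest_set_bit(avail);
}
```

With the "c_b" constraint (data in r2, address in r1) this yields r3,
which agrees with the "temps available r0, r3, r12" comment in the
constraint table.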

* [Qemu-devel] [PATCH for-4.0 v2 19/37] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (17 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 18/37] tcg/arm: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 20/37] tcg/ppc: Parameterize the temps for tcg_out_tlb_read Richard Henderson
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
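The register placement that tcg_out_qemu_ldst_ool relies on can be
summarised by a small standalone sketch. This is only an illustration,
not code from the patch: struct ldst_regs and place_ldst_args() are
hypothetical names, registers are plain indices, and -1 stands for
"unused".

```c
#include <assert.h>
#include <stdbool.h>

struct ldst_regs {
    int addrlo, addrhi, datalo, datahi;
};

/* Mirror the placement described in the thunk's leading comment. */
struct ldst_regs place_ldst_args(bool is_ld, bool is_64, int target_long_bits)
{
    struct ldst_regs r;

    assert(target_long_bits == 32 || target_long_bits == 64);
    if (target_long_bits == 64) {
        r.addrlo = 2;                   /* aligned pair r2:r3 */
        r.addrhi = 3;
    } else {
        r.addrlo = 1;
        r.addrhi = -1;
    }
    if (is_ld) {
        r.datalo = 0;                   /* loads return into r0(:r1) */
    } else if (target_long_bits == 64 || is_64) {
        r.datalo = 4;                   /* no room left in r0..r3 */
    } else {
        r.datalo = 2;                   /* 32-bit store, 32-bit address */
    }
    r.datahi = is_64 ? r.datalo + 1 : -1;
    return r;
}
```

For example, a 64-bit store with a 32-bit guest address places the
address in r1 and the data in r4:r5, which matches the "e_f_b"
constraint from the previous patch.
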
 tcg/arm/tcg-target.h     |   2 +-
 tcg/arm/tcg-target.inc.c | 314 ++++++++++++++++-----------------------
 2 files changed, 125 insertions(+), 191 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 94b3578c55..02981abdcc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -141,7 +141,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6b89ac7983..5a15f6a546 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1133,7 +1133,7 @@ static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg *args,
 }
 
 #ifdef CONFIG_SOFTMMU
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1356,128 +1356,6 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     return t2;
 }
-
-/* Record the context of a call to the out of line helper code for the slow
-   path for a load or store, so that we can later generate the correct
-   helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo, TCGReg datahi, TCGReg addrlo,
-                                TCGReg addrhi, tcg_insn_unit *raddr,
-                                tcg_insn_unit *label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->datalo_reg = datalo;
-    label->datahi_reg = datahi;
-    label->addrlo_reg = addrlo;
-    label->addrhi_reg = addrhi;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr;
-}
-
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGReg argreg, datalo, datahi;
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    void *func;
-
-    reloc_pc24(lb->label_ptr[0], s->code_ptr);
-
-    argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
-    }
-    argreg = tcg_out_arg_imm32(s, argreg, oi);
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
-
-    /* For armv6 we can use the canonical unsigned helpers and minimize
-       icache usage.  For pre-armv6, use the signed helpers since we do
-       not have a single insn sign-extend.  */
-    if (use_armv6_instructions) {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)];
-    } else {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
-        if (opc & MO_SIGN) {
-            opc = MO_UL;
-        }
-    }
-    tcg_out_call(s, func);
-
-    datalo = lb->datalo_reg;
-    datahi = lb->datahi_reg;
-    switch (opc & MO_SSIZE) {
-    case MO_SB:
-        tcg_out_ext8s(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    case MO_SW:
-        tcg_out_ext16s(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    default:
-        tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    case MO_Q:
-        if (datalo != TCG_REG_R1) {
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-        } else if (datahi != TCG_REG_R0) {
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-        } else {
-            tcg_out_mov_reg(s, COND_AL, TCG_REG_TMP, TCG_REG_R0);
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_TMP);
-        }
-        break;
-    }
-
-    tcg_out_goto(s, COND_AL, lb->raddr);
-}
-
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGReg argreg, datalo, datahi;
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-
-    reloc_pc24(lb->label_ptr[0], s->code_ptr);
-
-    argreg = TCG_REG_R0;
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
-    }
-
-    datalo = lb->datalo_reg;
-    datahi = lb->datahi_reg;
-    switch (opc & MO_SIZE) {
-    case MO_8:
-        argreg = tcg_out_arg_reg8(s, argreg, datalo);
-        break;
-    case MO_16:
-        argreg = tcg_out_arg_reg16(s, argreg, datalo);
-        break;
-    case MO_32:
-    default:
-        argreg = tcg_out_arg_reg32(s, argreg, datalo);
-        break;
-    case MO_64:
-        argreg = tcg_out_arg_reg64(s, argreg, datalo, datahi);
-        break;
-    }
-
-    argreg = tcg_out_arg_imm32(s, argreg, oi);
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
-
-    /* Tail-call to the helper, which will return to the fast path.  */
-    tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-}
 #endif /* SOFTMMU */
 
 static inline void tcg_out_qemu_ld_index(TCGContext *s, TCGMemOp opc,
@@ -1602,14 +1480,12 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc,
 
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
+    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#ifdef CONFIG_SOFTMMU
-    int mem_index, avail;
-    TCGReg addend, t0, t1;
-    tcg_insn_unit *label_ptr;
-#endif
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
@@ -1619,32 +1495,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
-
-    avail = 0xf;
-    avail &= ~(1 << addrlo);
-    if (TARGET_LONG_BITS == 64) {
-        avail &= ~(1 << addrhi);
-    }
-    tcg_debug_assert(avail & 1);
-    t0 = TCG_REG_R0;
-    avail &= ~1;
-    tcg_debug_assert(avail != 0);
-    t1 = ctz32(avail);
-
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              t0, t1, TCG_REG_TMP);
-
-    /* This a conditional BL only to load a pointer within this opcode into LR
-       for the slow path.  We will not be using the value for a tail call.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
-
-    tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
-
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
-#else /* !CONFIG_SOFTMMU */
+    add_ldst_ool_label(s, true, is64, oi, R_ARM_PC24, 0);
+    tcg_out_bl_noaddr(s, COND_AL);
+#else
     if (guest_base) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
         tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, TCG_REG_TMP);
@@ -1746,14 +1599,12 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc,
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
+    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#ifdef CONFIG_SOFTMMU
-    int mem_index, avail;
-    TCGReg addend, t0, t1;
-    tcg_insn_unit *label_ptr;
-#endif
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
@@ -1763,35 +1614,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
-
-    avail = 0xf;
-    avail &= ~(1 << addrlo);
-    avail &= ~(1 << datalo);
-    if (TARGET_LONG_BITS == 64) {
-        avail &= ~(1 << addrhi);
-    }
-    if (is64) {
-        avail &= ~(1 << datahi);
-    }
-    tcg_debug_assert(avail & 1);
-    t0 = TCG_REG_R0;
-    avail &= ~1;
-    tcg_debug_assert(avail != 0);
-    t1 = ctz32(avail);
-
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
-                              t0, t1, TCG_REG_TMP);
-
-    tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
-
-    /* The conditional call must come last, as we're going to return here.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
-
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
-#else /* !CONFIG_SOFTMMU */
+    add_ldst_ool_label(s, false, is64, oi, R_ARM_PC24, 0);
+    tcg_out_bl_noaddr(s, COND_AL);
+#else
     if (guest_base) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
         tcg_out_qemu_st_index(s, COND_AL, opc, datalo,
@@ -1802,6 +1627,115 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 }
 
+#ifdef CONFIG_SOFTMMU
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    TCGReg addrlo, addrhi, datalo, datahi, addend, argreg, t0, t1;
+    TCGMemOp opc = get_memop(oi);
+    int mem_index = get_mmuidx(oi);
+    tcg_insn_unit *thunk = s->code_ptr;
+    tcg_insn_unit *label;
+    uintptr_t func;
+    int avail;
+
+    /* Pick out where the arguments are located.  A 64-bit address is
+     * aligned in the register pair R2:R3.  Loads return into R0:R1.
+     * A 32-bit store with a 32-bit address has room at R2, but
+     * otherwise uses R4:R5.
+     */
+    if (TARGET_LONG_BITS == 64) {
+        addrlo = TCG_REG_R2, addrhi = TCG_REG_R3;
+    } else {
+        addrlo = TCG_REG_R1, addrhi = -1;
+    }
+    if (is_ld) {
+        datalo = TCG_REG_R0;
+    } else if (TARGET_LONG_BITS == 64 || is_64) {
+        datalo = TCG_REG_R4;
+    } else {
+        datalo = TCG_REG_R2;
+    }
+    datahi = (is_64 ? datalo + 1 : -1);
+
+    /* We need 3 call-clobbered temps.  One of them is always R12,
+     * one of them is always R0.  The third is somewhere in R[1-3].
+     */
+    avail = 0xf;
+    avail &= ~(1 << addrlo);
+    if (TARGET_LONG_BITS == 64) {
+        avail &= ~(1 << addrhi);
+    }
+    if (!is_ld) {
+        avail &= ~(1 << datalo);
+        if (is_64) {
+            avail &= ~(1 << datahi);
+        }
+    }
+    tcg_debug_assert(avail & 1);
+    t0 = TCG_REG_R0;
+    avail &= ~1;
+    tcg_debug_assert(avail != 0);
+    t1 = ctz32(avail);
+
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, is_ld,
+                              t0, t1, TCG_REG_TMP);
+
+    label = s->code_ptr;
+    tcg_out_b_noaddr(s, COND_NE);
+
+    /* TLB Hit.  */
+    if (is_ld) {
+        tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
+    } else {
+        tcg_out_qemu_st_index(s, COND_AL, opc, datalo, datahi, addrlo, addend);
+    }
+    tcg_out_bx(s, COND_AL, TCG_REG_R14);
+
+    /* TLB Miss.  */
+    reloc_pc24(label, s->code_ptr);
+
+    tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
+    /* addrlo and addrhi are in place -- see above */
+    argreg = addrlo + (TARGET_LONG_BITS / 32);
+    if (!is_ld) {
+        switch (opc & MO_SIZE) {
+        case MO_8:
+            argreg = tcg_out_arg_reg8(s, argreg, datalo);
+            break;
+        case MO_16:
+            argreg = tcg_out_arg_reg16(s, argreg, datalo);
+            break;
+        case MO_32:
+            argreg = tcg_out_arg_reg32(s, argreg, datalo);
+            break;
+        case MO_64:
+            argreg = tcg_out_arg_reg64(s, argreg, datalo, datahi);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+    argreg = tcg_out_arg_imm32(s, argreg, oi);
+    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
+
+    /* Tail call to the helper.  */
+    if (is_ld) {
+        func = (uintptr_t)qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
+    } else {
+        func = (uintptr_t)qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)];
+    }
+    if (use_armv7_instructions) {
+        tcg_out_movi32(s, COND_AL, TCG_REG_TMP, func);
+        tcg_out_bx(s, COND_AL, TCG_REG_TMP);
+    } else {
+        tcg_out_movi_pool(s, COND_AL, TCG_REG_PC, func);
+    }
+
+    return thunk;
+}
+#endif
+
 static tcg_insn_unit *tb_ret_addr;
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 20/37] tcg/ppc: Parameterize the temps for tcg_out_tlb_read
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (18 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 19/37] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 21/37] tcg/ppc: Split out tcg_out_call_int Richard Henderson
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

When moving the qemu_ld/st arguments to the right place for
a function call, we'll need to move the temps out of the way.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.inc.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 656a9ff603..6e656cd41e 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1516,12 +1516,14 @@ static void * const qemu_st_helpers[16] = {
 };
 
 /* Perform the TLB load and compare.  Places the result of the comparison
-   in CR7, loads the addend of the TLB into R3, and returns the register
-   containing the guest address (zero-extended into R4).  Clobbers R0 and R2. */
+   in CR7, loads the addend of the TLB into t0, and returns the register
+   containing the guest address (zero-extended into t1 when needed).
+   Clobbers t0, t1, TCG_REG_R0, TCG_REG_TMP1.  */
 
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
                                TCGReg addrlo, TCGReg addrhi,
-                               int mem_index, bool is_read)
+                               int mem_index, bool is_read,
+                               TCGReg t0, TCGReg t1)
 {
     int cmp_off
         = (is_read
@@ -1536,10 +1538,10 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     if (TCG_TARGET_REG_BITS == 64) {
         if (TARGET_LONG_BITS == 32) {
             /* Zero-extend the address into a place helpful for further use. */
-            tcg_out_ext32u(s, TCG_REG_R4, addrlo);
-            addrlo = TCG_REG_R4;
+            tcg_out_ext32u(s, t1, addrlo);
+            addrlo = t1;
         } else {
-            tcg_out_rld(s, RLDICL, TCG_REG_R3, addrlo,
+            tcg_out_rld(s, RLDICL, t0, addrlo,
                         64 - TARGET_PAGE_BITS, 64 - CPU_TLB_BITS);
         }
     }
@@ -1559,27 +1561,27 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
 
     /* Extraction and shifting, part 2.  */
     if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
-        tcg_out_rlw(s, RLWINM, TCG_REG_R3, addrlo,
+        tcg_out_rlw(s, RLWINM, t0, addrlo,
                     32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
                     32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS),
                     31 - CPU_TLB_ENTRY_BITS);
     } else {
-        tcg_out_shli64(s, TCG_REG_R3, TCG_REG_R3, CPU_TLB_ENTRY_BITS);
+        tcg_out_shli64(s, t0, t0, CPU_TLB_ENTRY_BITS);
     }
 
-    tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, base));
+    tcg_out32(s, ADD | TAB(t0, t0, base));
 
     /* Load the tlb comparator.  */
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-        tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_R4, TCG_REG_R3, cmp_off);
-        tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP1, TCG_REG_R3, cmp_off + 4);
+        tcg_out_ld(s, TCG_TYPE_I32, t1, t0, cmp_off);
+        tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP1, t0, cmp_off + 4);
     } else {
-        tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP1, TCG_REG_R3, cmp_off);
+        tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP1, t0, cmp_off);
     }
 
     /* Load the TLB addend for use on the fast path.  Do this asap
        to minimize any load use delay.  */
-    tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3, add_off);
+    tcg_out_ld(s, TCG_TYPE_PTR, t0, t0, add_off);
 
     /* Clear the non-page, non-alignment bits from the address */
     if (TCG_TARGET_REG_BITS == 32) {
@@ -1624,7 +1626,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
         tcg_out_cmp(s, TCG_COND_EQ, TCG_REG_R0, TCG_REG_TMP1,
                     0, 7, TCG_TYPE_I32);
-        tcg_out_cmp(s, TCG_COND_EQ, addrhi, TCG_REG_R4, 0, 6, TCG_TYPE_I32);
+        tcg_out_cmp(s, TCG_COND_EQ, addrhi, t1, 0, 6, TCG_TYPE_I32);
         tcg_out32(s, CRAND | BT(7, CR_EQ) | BA(6, CR_EQ) | BB(7, CR_EQ));
     } else {
         tcg_out_cmp(s, TCG_COND_EQ, TCG_REG_R0, TCG_REG_TMP1,
@@ -1778,13 +1780,14 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, true);
+    rbase = TCG_REG_R3;
+    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, true,
+                              rbase, TCG_REG_R4);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
     tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
 
-    rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
     rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
     if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
@@ -1853,13 +1856,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, false);
+    rbase = TCG_REG_R3;
+    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, false,
+                              rbase, TCG_REG_R4);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
     tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
 
-    rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
     rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
     if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 21/37] tcg/ppc: Split out tcg_out_call_int
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (19 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 20/37] tcg/ppc: Parameterize the temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 22/37] tcg/ppc: Add constraints for R7-R8 Richard Henderson
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Pass in an LK parameter, allowing us to create tail calls.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
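To show what the lk argument changes in the emitted instruction, here
is a minimal standalone sketch of the PowerPC I-form branch encoding.
encode_b(), B_OPCODE and LK_BIT are hypothetical names for this
example; the layout (primary opcode 18, 24-bit word displacement, LK in
the least significant bit) is the architectural one.

```c
#include <assert.h>
#include <stdint.h>

#define B_OPCODE 0x48000000u    /* primary opcode 18 in the top six bits */
#define LK_BIT   1u             /* set: save the return address in LR */

/* Encode a relative branch; lk selects "bl" (call) over "b" (jump). */
uint32_t encode_b(int32_t disp, uint32_t lk)
{
    assert((disp & 3) == 0);                        /* word aligned */
    assert(disp >= -0x2000000 && disp < 0x2000000); /* 26-bit range */
    assert(lk == 0 || lk == LK_BIT);
    return B_OPCODE | ((uint32_t)disp & 0x03fffffcu) | lk;
}
```

Passing 0 instead of LK_BIT yields a plain jump that leaves LR alone,
so the callee returns directly to the original caller, which is what a
tail call needs.
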
 tcg/ppc/tcg-target.inc.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 6e656cd41e..6377e3a829 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1408,7 +1408,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     }
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
+static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *target, int lk)
 {
 #ifdef _CALL_AIX
     /* Look through the descriptor.  If the branch is in range, and we
@@ -1419,7 +1419,7 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 
     if (in_range_b(diff) && toc == (uint32_t)toc) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP1, toc);
-        tcg_out_b(s, LK, tgt);
+        tcg_out_b(s, lk, tgt);
     } else {
         /* Fold the low bits of the constant into the addresses below.  */
         intptr_t arg = (intptr_t)target;
@@ -1434,7 +1434,7 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
         tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R0, TCG_REG_TMP1, ofs);
         tcg_out32(s, MTSPR | RA(TCG_REG_R0) | CTR);
         tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_REG_TMP1, ofs + SZP);
-        tcg_out32(s, BCCTR | BO_ALWAYS | LK);
+        tcg_out32(s, BCCTR | BO_ALWAYS | lk);
     }
 #elif defined(_CALL_ELF) && _CALL_ELF == 2
     intptr_t diff;
@@ -1448,16 +1448,21 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 
     diff = tcg_pcrel_diff(s, target);
     if (in_range_b(diff)) {
-        tcg_out_b(s, LK, target);
+        tcg_out_b(s, lk, target);
     } else {
         tcg_out32(s, MTSPR | RS(TCG_REG_R12) | CTR);
-        tcg_out32(s, BCCTR | BO_ALWAYS | LK);
+        tcg_out32(s, BCCTR | BO_ALWAYS | lk);
     }
 #else
-    tcg_out_b(s, LK, target);
+    tcg_out_b(s, lk, target);
 #endif
 }
 
+static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
+{
+    tcg_out_call_int(s, target, LK);
+}
+
 static const uint32_t qemu_ldx_opc[16] = {
     [MO_UB] = LBZX,
     [MO_UW] = LHZX,
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 22/37] tcg/ppc: Add constraints for R7-R8
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (20 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 21/37] tcg/ppc: Split out tcg_out_call_int Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 23/37] tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool Richard Henderson
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

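These are function call arguments that we will need soon.

While here, compute the register from the constraint letter itself;
after the post-increment, ct_str[0] refers to the following character.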

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
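The letter-to-register mapping can be checked in isolation with the
sketch below. Both function names are hypothetical; the second one
mimics the shape of the code being replaced, to show why indexing
through the already advanced pointer does not look at the letter that
was matched.

```c
#include <assert.h>

/* Post-patch shape: remember the letter, then derive the register.
 * 'A'..'F' map to r3..r8; anything else returns -1 in this sketch. */
int reg_from_letter(const char *ct_str)
{
    char c = *ct_str;

    switch (c) {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        return 3 + c - 'A';
    default:
        return -1;
    }
}

/* Pre-patch shape: the switch advances ct_str, so ct_str[0] is the
 * character after the matched letter. */
int reg_from_next_char(const char *ct_str)
{
    switch (*ct_str++) {
    case 'A': case 'B': case 'C': case 'D':
        return 3 + ct_str[0] - 'A';
    default:
        return -1;
    }
}
```

With the input "A", the first function returns 3 while the second one
computes its result from the terminating NUL.
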
 tcg/ppc/tcg-target.inc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 6377e3a829..484d90ead2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -236,10 +236,11 @@ static inline void tcg_out_bc_noaddr(TCGContext *s, int insn)
 static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type)
 {
-    switch (*ct_str++) {
-    case 'A': case 'B': case 'C': case 'D':
+    char c = *ct_str++;
+    switch (c) {
+    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
         ct->ct |= TCG_CT_REG;
-        tcg_regset_set_reg(ct->u.regs, 3 + ct_str[0] - 'A');
+        tcg_regset_set_reg(ct->u.regs, 3 + c - 'A');
         break;
     case 'r':
         ct->ct |= TCG_CT_REG;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 23/37] tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (21 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 22/37] tcg/ppc: Add constraints for R7-R8 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 24/37] tcg/ppc: Force qemu_ld/st arguments into fixed registers Richard Henderson
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Always define the macro, as 0 or 1, so that it can be used in
expressions rather than tested with #ifdef.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
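A minimal standalone sketch of what "arg |= TCG_TARGET_CALL_ALIGN_ARGS"
does, assuming argument registers start at r3 and are modelled as plain
integers. align_pair_reg() is a hypothetical name used only here.

```c
#include <assert.h>

/* Return the first register of a 64-bit argument pair.  With
 * align_args == 1 the pair must start at an odd register (r3, r5,
 * r7, r9); with align_args == 0 the register is used as is. */
int align_pair_reg(int arg, int align_args)
{
    assert(align_args == 0 || align_args == 1);
    return arg | align_args;
}
```

Because the macro is now always defined as 0 or 1, the same expression
serves both calling conventions: or-ing with 0 is a no-op, and or-ing
with 1 rounds an even register number up to the next odd one.
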
 tcg/ppc/tcg-target.inc.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 484d90ead2..f7c33f3b7f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -30,6 +30,8 @@
 #endif
 #ifdef _CALL_SYSV
 # define TCG_TARGET_CALL_ALIGN_ARGS   1
+#else
+# define TCG_TARGET_CALL_ALIGN_ARGS   0
 #endif
 
 /* For some memory operations, we need a scratch that isn't R0.  For the AIX
@@ -1675,9 +1677,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     lo = lb->addrlo_reg;
     hi = lb->addrhi_reg;
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-#ifdef TCG_TARGET_CALL_ALIGN_ARGS
-        arg |= 1;
-#endif
+        arg |= TCG_TARGET_CALL_ALIGN_ARGS;
         tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
         tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
     } else {
@@ -1720,9 +1720,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     lo = lb->addrlo_reg;
     hi = lb->addrhi_reg;
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-#ifdef TCG_TARGET_CALL_ALIGN_ARGS
-        arg |= 1;
-#endif
+        arg |= TCG_TARGET_CALL_ALIGN_ARGS;
         tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
         tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
     } else {
@@ -1736,9 +1734,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     if (TCG_TARGET_REG_BITS == 32) {
         switch (s_bits) {
         case MO_64:
-#ifdef TCG_TARGET_CALL_ALIGN_ARGS
-            arg |= 1;
-#endif
+            arg |= TCG_TARGET_CALL_ALIGN_ARGS;
             tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
             /* FALLTHRU */
         case MO_32:
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 24/37] tcg/ppc: Force qemu_ld/st arguments into fixed registers
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (22 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 23/37] tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 25/37] tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This is an incremental step toward moving the qemu_ld/st
code sequence out of line.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
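The helper softmmu_args_2 introduced below can be tried out with the
following standalone sketch. It is a simplified model, not the patch's
code: split_pair() is a hypothetical name, registers are plain
integers, and the host endianness and alignment flag are passed as
parameters instead of coming from HOST_WORDS_BIGENDIAN and
TCG_TARGET_CALL_ALIGN_ARGS.

```c
#include <assert.h>
#include <stdbool.h>

/* Assign a register pair for a 64-bit value on a 32-bit host and
 * return the next free argument register. */
int split_pair(int reg, bool is_be, int align_args, int *lo, int *hi)
{
    assert(align_args == 0 || align_args == 1);
    reg |= align_args;              /* start at an odd register if required */
    *(is_be ? hi : lo) = reg;       /* big-endian: high half comes first */
    *(is_be ? lo : hi) = reg + 1;
    return reg + 2;
}
```

Starting from r4 on a big-endian host with alignment enabled, the pair
lands in r5 (high) and r6 (low), and r7 is the next free argument
register.
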
 tcg/ppc/tcg-target.inc.c | 151 ++++++++++++++++++++++++++++-----------
 1 file changed, 111 insertions(+), 40 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index f7c33f3b7f..c706b2cf53 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -248,25 +248,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         ct->u.regs = 0xffffffff;
         break;
-    case 'L':                   /* qemu_ld constraint */
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffffffff;
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
-#ifdef CONFIG_SOFTMMU
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R4);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R5);
-#endif
-        break;
-    case 'S':                   /* qemu_st constraint */
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffffffff;
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
-#ifdef CONFIG_SOFTMMU
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R4);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R5);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R6);
-#endif
-        break;
     case 'I':
         ct->ct |= TCG_CT_CONST_S16;
         break;
@@ -1759,6 +1740,21 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
     tcg_out_b(s, 0, lb->raddr);
 }
+
+static TCGReg softmmu_args_2(TCGReg reg, TCGReg *lo, TCGReg *hi)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+    static bool is_be = true;
+#else
+    static bool is_be = false;
+#endif
+
+    assert(TCG_TARGET_REG_BITS == 32);
+    reg |= TCG_TARGET_CALL_ALIGN_ARGS;
+    *(is_be ? hi : lo) = reg;
+    *(is_be ? lo : hi) = reg + 1;
+    return reg + 2;
+}
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
@@ -1782,9 +1778,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    rbase = TCG_REG_R3;
+    rbase = TCG_REG_R9;
     addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, true,
-                              rbase, TCG_REG_R4);
+                              rbase, TCG_REG_R10);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
@@ -1858,9 +1854,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    rbase = TCG_REG_R3;
+    rbase = TCG_REG_R9;
     addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, false,
-                              rbase, TCG_REG_R4);
+                              rbase, TCG_REG_R10);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
@@ -2627,13 +2623,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
-    static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
-    static const TCGTargetOpDef S_S = { .args_ct_str = { "S", "S" } };
     static const TCGTargetOpDef r_ri = { .args_ct_str = { "r", "ri" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
-    static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
-    static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
-    static const TCGTargetOpDef S_S_S = { .args_ct_str = { "S", "S", "S" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rI = { .args_ct_str = { "r", "r", "rI" } };
     static const TCGTargetOpDef r_r_rT = { .args_ct_str = { "r", "r", "rT" } };
@@ -2644,10 +2635,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "rI", "rT" } };
     static const TCGTargetOpDef r_r_rZW
         = { .args_ct_str = { "r", "r", "rZW" } };
-    static const TCGTargetOpDef L_L_L_L
-        = { .args_ct_str = { "L", "L", "L", "L" } };
-    static const TCGTargetOpDef S_S_S_S
-        = { .args_ct_str = { "S", "S", "S", "S" } };
     static const TCGTargetOpDef movc
         = { .args_ct_str = { "r", "r", "ri", "rZ", "rZ" } };
     static const TCGTargetOpDef dep
@@ -2660,6 +2647,15 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "r", "r", "rI", "rZM" } };
     static const TCGTargetOpDef sub2
         = { .args_ct_str = { "r", "r", "rI", "rZM", "r", "r" } };
+#ifdef CONFIG_SOFTMMU
+    static const char * const arg_letter[] = {
+        NULL, NULL, NULL, "A", "B", "C", "D", "E", "F", NULL, NULL
+    };
+    TCGReg hi, lo, arg;
+#else
+    static const TCGTargetOpDef r_r_r_r
+        = { .args_ct_str = { "r", "r", "r", "r" } };
+#endif
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -2782,18 +2778,93 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub2_i32:
         return &sub2;
 
+#ifdef CONFIG_SOFTMMU
     case INDEX_op_qemu_ld_i32:
-        return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-                ? &r_L : &r_L_L);
+        {
+            static TCGTargetOpDef ld32;
+            ld32.args_ct_str[0] = arg_letter[tcg_target_call_oarg_regs[0]];
+            if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                ld32.args_ct_str[1] = arg_letter[tcg_target_call_iarg_regs[1]];
+            } else {
+                arg = tcg_target_call_iarg_regs[1];
+                arg = softmmu_args_2(arg, &lo, &hi);
+                ld32.args_ct_str[1] = arg_letter[lo];
+                ld32.args_ct_str[2] = arg_letter[hi];
+            }
+            return &ld32;
+        }
     case INDEX_op_qemu_st_i32:
-        return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-                ? &S_S : &S_S_S);
+        {
+            static TCGTargetOpDef st32;
+            arg = tcg_target_call_iarg_regs[1];
+            if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                st32.args_ct_str[0] = arg_letter[arg + 1];
+                st32.args_ct_str[1] = arg_letter[arg];
+            } else {
+                arg = softmmu_args_2(arg, &lo, &hi);
+                st32.args_ct_str[0] = arg_letter[arg];
+                st32.args_ct_str[1] = arg_letter[lo];
+                st32.args_ct_str[2] = arg_letter[hi];
+            }
+            return &st32;
+        }
     case INDEX_op_qemu_ld_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &r_L
-                : TARGET_LONG_BITS == 32 ? &L_L_L : &L_L_L_L);
+        {
+            static TCGTargetOpDef ld64;
+            if (TCG_TARGET_REG_BITS == 64) {
+                ld64.args_ct_str[0] = arg_letter[tcg_target_call_oarg_regs[0]];
+                ld64.args_ct_str[1] = arg_letter[tcg_target_call_iarg_regs[1]];
+            } else {
+                arg = tcg_target_call_oarg_regs[0];
+                arg = softmmu_args_2(arg, &lo, &hi);
+                ld64.args_ct_str[0] = arg_letter[lo];
+                ld64.args_ct_str[1] = arg_letter[hi];
+                arg = tcg_target_call_iarg_regs[1];
+                if (TARGET_LONG_BITS == 32) {
+                    ld64.args_ct_str[2] = arg_letter[arg];
+                } else {
+                    arg = softmmu_args_2(arg, &lo, &hi);
+                    ld64.args_ct_str[2] = arg_letter[lo];
+                    ld64.args_ct_str[3] = arg_letter[hi];
+                }
+            }
+            return &ld64;
+        }
     case INDEX_op_qemu_st_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &S_S
-                : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S);
+        {
+            static TCGTargetOpDef st64;
+            if (TCG_TARGET_REG_BITS == 64) {
+                st64.args_ct_str[1] = arg_letter[tcg_target_call_iarg_regs[1]];
+                st64.args_ct_str[0] = arg_letter[tcg_target_call_iarg_regs[2]];
+            } else {
+                arg = tcg_target_call_iarg_regs[1];
+                if (TARGET_LONG_BITS == 32) {
+                    st64.args_ct_str[2] = arg_letter[arg++];
+                } else {
+                    arg = softmmu_args_2(arg, &lo, &hi);
+                    st64.args_ct_str[2] = arg_letter[lo];
+                    st64.args_ct_str[3] = arg_letter[hi];
+                }
+                arg = softmmu_args_2(arg, &lo, &hi);
+                st64.args_ct_str[0] = arg_letter[lo];
+                st64.args_ct_str[1] = arg_letter[hi];
+            }
+            return &st64;
+        }
+#else
+    case INDEX_op_qemu_ld_i32:
+    case INDEX_op_qemu_st_i32:
+        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_r : &r_r_r;
+    case INDEX_op_qemu_ld_i64:
+    case INDEX_op_qemu_st_i64:
+        if (TCG_TARGET_REG_BITS == 64) {
+            return &r_r;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &r_r_r;
+        } else {
+            return &r_r_r_r;
+        }
+#endif
 
     default:
         return NULL;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 25/37] tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (23 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 24/37] tcg/ppc: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 26/37] tcg: Clean up generic bswap32 Richard Henderson
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.h     |   2 +-
 tcg/ppc/tcg-target.inc.c | 326 +++++++++++++++++----------------------
 2 files changed, 141 insertions(+), 187 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index be52ad1d2e..bbc49bb1be 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -130,7 +130,7 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_DEFAULT_MO (0)
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c706b2cf53..fed7f5fe6e 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1476,7 +1476,7 @@ static const uint32_t qemu_exts_opc[4] = {
 };
 
 #if defined (CONFIG_SOFTMMU)
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
  *                                 int mmu_idx, uintptr_t ra)
@@ -1489,6 +1489,14 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+
+    [MO_SB]   = helper_ret_ldsb_mmu,
+    [MO_LESW] = helper_le_ldsw_mmu,
+    [MO_BESW] = helper_be_ldsw_mmu,
+#if TCG_TARGET_REG_BITS == 64
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
+#endif
 };
 
 /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
@@ -1526,9 +1534,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     /* Extract the page index, shifted into place for tlb index.  */
     if (TCG_TARGET_REG_BITS == 64) {
         if (TARGET_LONG_BITS == 32) {
-            /* Zero-extend the address into a place helpful for further use. */
-            tcg_out_ext32u(s, t1, addrlo);
-            addrlo = t1;
+            /* Zero-extend the address now.  */
+            tcg_out_ext32u(s, addrlo, addrlo);
         } else {
             tcg_out_rld(s, RLDICL, t0, addrlo,
                         64 - TARGET_PAGE_BITS, 64 - CPU_TLB_BITS);
@@ -1625,122 +1632,6 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     return addrlo;
 }
 
-/* Record the context of a call to the out of line helper code for the slow
-   path for a load or store, so that we can later generate the correct
-   helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo_reg, TCGReg datahi_reg,
-                                TCGReg addrlo_reg, TCGReg addrhi_reg,
-                                tcg_insn_unit *raddr, tcg_insn_unit *lptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->datalo_reg = datalo_reg;
-    label->datahi_reg = datahi_reg;
-    label->addrlo_reg = addrlo_reg;
-    label->addrhi_reg = addrhi_reg;
-    label->raddr = raddr;
-    label->label_ptr[0] = lptr;
-}
-
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGReg hi, lo, arg = TCG_REG_R3;
-
-    reloc_pc14(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
-
-    lo = lb->addrlo_reg;
-    hi = lb->addrhi_reg;
-    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-        arg |= TCG_TARGET_CALL_ALIGN_ARGS;
-        tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-        tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-    } else {
-        /* If the address needed to be zero-extended, we'll have already
-           placed it in R4.  The only remaining case is 64-bit guest.  */
-        tcg_out_mov(s, TCG_TYPE_TL, arg++, lo);
-    }
-
-    tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
-    tcg_out32(s, MFSPR | RT(arg) | LR);
-
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-
-    lo = lb->datalo_reg;
-    hi = lb->datahi_reg;
-    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
-        tcg_out_mov(s, TCG_TYPE_I32, lo, TCG_REG_R4);
-        tcg_out_mov(s, TCG_TYPE_I32, hi, TCG_REG_R3);
-    } else if (opc & MO_SIGN) {
-        uint32_t insn = qemu_exts_opc[opc & MO_SIZE];
-        tcg_out32(s, insn | RA(lo) | RS(TCG_REG_R3));
-    } else {
-        tcg_out_mov(s, TCG_TYPE_REG, lo, TCG_REG_R3);
-    }
-
-    tcg_out_b(s, 0, lb->raddr);
-}
-
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp s_bits = opc & MO_SIZE;
-    TCGReg hi, lo, arg = TCG_REG_R3;
-
-    reloc_pc14(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
-
-    lo = lb->addrlo_reg;
-    hi = lb->addrhi_reg;
-    if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-        arg |= TCG_TARGET_CALL_ALIGN_ARGS;
-        tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-        tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-    } else {
-        /* If the address needed to be zero-extended, we'll have already
-           placed it in R4.  The only remaining case is 64-bit guest.  */
-        tcg_out_mov(s, TCG_TYPE_TL, arg++, lo);
-    }
-
-    lo = lb->datalo_reg;
-    hi = lb->datahi_reg;
-    if (TCG_TARGET_REG_BITS == 32) {
-        switch (s_bits) {
-        case MO_64:
-            arg |= TCG_TARGET_CALL_ALIGN_ARGS;
-            tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-            /* FALLTHRU */
-        case MO_32:
-            tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-            break;
-        default:
-            tcg_out_rlw(s, RLWINM, arg++, lo, 0, 32 - (8 << s_bits), 31);
-            break;
-        }
-    } else {
-        if (s_bits == MO_64) {
-            tcg_out_mov(s, TCG_TYPE_I64, arg++, lo);
-        } else {
-            tcg_out_rld(s, RLDICL, arg++, lo, 0, 64 - (8 << s_bits));
-        }
-    }
-
-    tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
-    tcg_out32(s, MFSPR | RT(arg) | LR);
-
-    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-
-    tcg_out_b(s, 0, lb->raddr);
-}
-
 static TCGReg softmmu_args_2(TCGReg reg, TCGReg *lo, TCGReg *hi)
 {
 #ifdef HOST_WORDS_BIGENDIAN
@@ -1757,44 +1648,10 @@ static TCGReg softmmu_args_2(TCGReg reg, TCGReg *lo, TCGReg *hi)
 }
 #endif /* SOFTMMU */
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addrlo, TCGReg rbase, TCGMemOp opc)
 {
-    TCGReg datalo, datahi, addrlo, rbase;
-    TCGReg addrhi __attribute__((unused));
-    TCGMemOpIdx oi;
-    TCGMemOp opc, s_bits;
-#ifdef CONFIG_SOFTMMU
-    int mem_index;
-    tcg_insn_unit *label_ptr;
-#endif
-
-    datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
-    oi = *args++;
-    opc = get_memop(oi);
-    s_bits = opc & MO_SIZE;
-
-#ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
-    rbase = TCG_REG_R9;
-    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, true,
-                              rbase, TCG_REG_R10);
-
-    /* Load a pointer into the current opcode w/conditional branch-link. */
-    label_ptr = s->code_ptr;
-    tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
-
-#else  /* !CONFIG_SOFTMMU */
-    rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
-    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-        tcg_out_ext32u(s, TCG_REG_TMP1, addrlo);
-        addrlo = TCG_REG_TMP1;
-    }
-#endif
-
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
         if (opc & MO_BSWAP) {
             tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, 4));
             tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo));
@@ -1811,7 +1668,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
             tcg_out32(s, LWZ | TAI(datalo, addrlo, 4));
         }
     } else {
-        uint32_t insn = qemu_ldx_opc[opc & (MO_BSWAP | MO_SSIZE)];
+        uint32_t insn = qemu_ldx_opc[opc & (MO_SSIZE | MO_BSWAP)];
         if (!HAVE_ISA_2_06 && insn == LDBRX) {
             tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, 4));
             tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo));
@@ -1822,55 +1679,45 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
         } else {
             insn = qemu_ldx_opc[opc & (MO_SIZE | MO_BSWAP)];
             tcg_out32(s, insn | TAB(datalo, rbase, addrlo));
-            insn = qemu_exts_opc[s_bits];
+            insn = qemu_exts_opc[opc & MO_SIZE];
             tcg_out32(s, insn | RA(datalo) | RS(datalo));
         }
     }
-
-#ifdef CONFIG_SOFTMMU
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
-#endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 {
-    TCGReg datalo, datahi, addrlo, rbase;
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
-    TCGMemOp opc, s_bits;
-#ifdef CONFIG_SOFTMMU
-    int mem_index;
-    tcg_insn_unit *label_ptr;
-#endif
 
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
     addrlo = *args++;
     addrhi = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
     oi = *args++;
-    opc = get_memop(oi);
-    s_bits = opc & MO_SIZE;
 
 #ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
-    rbase = TCG_REG_R9;
-    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, false,
-                              rbase, TCG_REG_R10);
+    add_ldst_ool_label(s, true, is_64, oi, R_PPC_REL24, 0);
+    tcg_out_b_noaddr(s, B | LK);
+#else
+    TCGMemOp opc = get_memop(oi);
+    TCGReg rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
 
-    /* Load a pointer into the current opcode w/conditional branch-link. */
-    label_ptr = s->code_ptr;
-    tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
-
-#else  /* !CONFIG_SOFTMMU */
-    rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
     if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
         tcg_out_ext32u(s, TCG_REG_TMP1, addrlo);
         addrlo = TCG_REG_TMP1;
     }
+    tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, rbase, opc);
 #endif
+}
 
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addrlo, TCGReg rbase, TCGMemOp opc)
+{
+    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
         if (opc & MO_BSWAP) {
             tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, 4));
             tcg_out32(s, STWBRX | SAB(datalo, rbase, addrlo));
@@ -1894,10 +1741,34 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
             tcg_out32(s, insn | SAB(datalo, rbase, addrlo));
         }
     }
+}
+
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
+{
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
+    TCGReg addrhi __attribute__((unused));
+    TCGMemOpIdx oi;
+
+    datalo = *args++;
+    datahi = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
+    addrlo = *args++;
+    addrhi = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
+    oi = *args++;
 
 #ifdef CONFIG_SOFTMMU
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+    add_ldst_ool_label(s, false, is_64, oi, R_PPC_REL24, 0);
+    tcg_out_b_noaddr(s, B | LK);
+#else
+    TCGMemOp opc = get_memop(oi);
+    TCGReg rbase = guest_base ? TCG_GUEST_BASE_REG : 0;
+
+    if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+        tcg_out_ext32u(s, TCG_REG_TMP1, addrlo);
+        addrlo = TCG_REG_TMP1;
+    }
+    tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, rbase, opc);
 #endif
 }
 
@@ -1909,6 +1780,89 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
     }
 }
 
+#ifdef CONFIG_SOFTMMU
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    TCGMemOp opc = get_memop(oi);
+    int mem_index = get_mmuidx(oi);
+    TCGReg addrlo, addrhi, datalo, datahi, rbase, nextarg;
+    tcg_insn_unit *thunk, *label;
+
+    /* Since we're amortizing the cost, align the thunk.  */
+    thunk = QEMU_ALIGN_PTR_UP(s->code_ptr, 16);
+    if (thunk != s->code_ptr) {
+        tcg_out_nop_fill(s->code_ptr, thunk - s->code_ptr);
+        s->code_ptr = thunk;
+    }
+
+    /* Discover where the inputs are held.  */
+    if (TCG_TARGET_REG_BITS == 64) {
+        addrhi = addrlo = tcg_target_call_iarg_regs[1];
+        if (is_ld) {
+            datahi = datalo = tcg_target_call_oarg_regs[0];
+            nextarg = addrlo + 1;
+        } else {
+            datahi = datalo = addrlo + 1;
+            nextarg = addrlo + 2;
+        }
+    } else {
+        nextarg = tcg_target_call_iarg_regs[1];
+        if (TARGET_LONG_BITS == 64) {
+            nextarg = softmmu_args_2(nextarg, &addrlo, &addrhi);
+        } else {
+            addrhi = addrlo = nextarg++;
+        }
+        if (is_ld) {
+            TCGReg arg = tcg_target_call_oarg_regs[0];
+            if (is_64) {
+                softmmu_args_2(arg, &datalo, &datahi);
+            } else {
+                datahi = datalo = arg;
+            }
+        } else {
+            if (is_64) {
+                nextarg = softmmu_args_2(nextarg, &datalo, &datahi);
+            } else {
+                datahi = datalo = nextarg++;
+            }
+        }
+    }
+
+    rbase = TCG_REG_R9;
+    tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index,
+                     is_ld, rbase, TCG_REG_R10);
+
+    label = s->code_ptr;
+    tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE);
+
+    /* TLB Hit */
+    if (is_ld) {
+        tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, rbase, opc);
+    } else {
+        tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, rbase, opc);
+    }
+    tcg_out32(s, BCLR | BO_ALWAYS);
+
+    /* TLB Miss */
+    reloc_pc14(label, s->code_ptr);
+
+    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+    /* The addrhi, addrlo, datahi, datalo registers are already in place.  */
+    tcg_out_movi(s, TCG_TYPE_I32, nextarg++, oi);
+    tcg_out32(s, MFSPR | RT(nextarg) | LR);
+
+    /* Tail call to the helper.  */
+    if (is_ld) {
+        tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)], 0);
+    } else {
+        tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)], 0);
+    }
+
+    return thunk;
+}
+#endif
+
 /* Parameters for function call generation, used in tcg.c.  */
 #define TCG_TARGET_STACK_ALIGN       16
 #define TCG_TARGET_EXTEND_ARGS       1
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 26/37] tcg: Clean up generic bswap32
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (24 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 25/37] tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 27/37] tcg: Clean up generic bswap64 Richard Henderson
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Based on the only current user, Sparc:

New code uses 1 constant that takes 2 insns to create, plus 8 insns
for the swap itself.  Old code used 2 constants that took 2 insns each
to create, plus 9.  The result is a new total of 10 vs an old total of 13.
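
For illustration, the same shift-and-mask sequence can be written as
plain C on a uint32_t.  This is only a sketch of the technique, not
the TCG code below; bswap32_sketch and bswap32_sketch_check are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the bytes of a 32-bit value using one mask constant. */
static uint32_t bswap32_sketch(uint32_t arg)
{
    const uint32_t mask = 0x00ff00ffu;
    uint32_t t0, t1, ret;

                                /* arg = abcd */
    t0 = arg >> 8;              /*  t0 = .abc */
    t1 = arg & mask;            /*  t1 = .b.d */
    t0 = t0 & mask;             /*  t0 = .a.c */
    t1 = t1 << 8;               /*  t1 = b.d. */
    ret = t0 | t1;              /* ret = badc */

    t0 = ret >> 16;             /*  t0 = ..ba */
    t1 = ret << 16;             /*  t1 = dc.. */
    return t0 | t1;             /* ret = dcba */
}

/* Self-check against a hand-computed value. */
static void bswap32_sketch_check(void)
{
    assert(bswap32_sketch(0x11223344u) == 0x44332211u);
}
```

The first half swaps the bytes within each 16-bit half, the second
half swaps the two halves; that is 8 operations plus the one mask.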

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 54 ++++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 7a8015c5a9..a956499e46 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1012,22 +1012,22 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
     if (TCG_TARGET_HAS_bswap32_i32) {
         tcg_gen_op2_i32(INDEX_op_bswap32_i32, ret, arg);
     } else {
-        TCGv_i32 t0, t1;
-        t0 = tcg_temp_new_i32();
-        t1 = tcg_temp_new_i32();
+        TCGv_i32 t0 = tcg_temp_new_i32();
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
 
-        tcg_gen_shli_i32(t0, arg, 24);
+                                        /* arg = abcd */
+        tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
+        tcg_gen_and_i32(t1, arg, t2);   /*  t1 = .b.d */
+        tcg_gen_and_i32(t0, t0, t2);    /*  t0 = .a.c */
+        tcg_temp_free_i32(t2);
+        tcg_gen_shli_i32(t1, t1, 8);    /*  t1 = b.d. */
+        tcg_gen_or_i32(ret, t0, t1);    /* ret = badc */
 
-        tcg_gen_andi_i32(t1, arg, 0x0000ff00);
-        tcg_gen_shli_i32(t1, t1, 8);
-        tcg_gen_or_i32(t0, t0, t1);
+        tcg_gen_shri_i32(t0, ret, 16);  /*  t0 = ..ba */
+        tcg_gen_shli_i32(t1, ret, 16);  /*  t1 = dc.. */
+        tcg_gen_or_i32(ret, t0, t1);    /* ret = dcba */
 
-        tcg_gen_shri_i32(t1, arg, 8);
-        tcg_gen_andi_i32(t1, t1, 0x0000ff00);
-        tcg_gen_or_i32(t0, t0, t1);
-
-        tcg_gen_shri_i32(t1, arg, 24);
-        tcg_gen_or_i32(ret, t0, t1);
         tcg_temp_free_i32(t0);
         tcg_temp_free_i32(t1);
     }
@@ -1638,23 +1638,23 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
     } else if (TCG_TARGET_HAS_bswap32_i64) {
         tcg_gen_op2_i64(INDEX_op_bswap32_i64, ret, arg);
     } else {
-        TCGv_i64 t0, t1;
-        t0 = tcg_temp_new_i64();
-        t1 = tcg_temp_new_i64();
+        TCGv_i64 t0 = tcg_temp_new_i64();
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
 
-        tcg_gen_shli_i64(t0, arg, 24);
-        tcg_gen_ext32u_i64(t0, t0);
+                                        /* arg = ....abcd */
+        tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
+        tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .....b.d */
+        tcg_gen_and_i64(t0, t0, t2);    /*  t0 = .....a.c */
+        tcg_temp_free_i64(t2);
+        tcg_gen_shli_i64(t1, t1, 8);    /*  t1 = ....b.d. */
+        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....badc */
 
-        tcg_gen_andi_i64(t1, arg, 0x0000ff00);
-        tcg_gen_shli_i64(t1, t1, 8);
-        tcg_gen_or_i64(t0, t0, t1);
+        tcg_gen_shli_i64(t1, ret, 48);  /*  t1 = dc...... */
+        tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ......ba */
+        tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
+        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....dcba */
 
-        tcg_gen_shri_i64(t1, arg, 8);
-        tcg_gen_andi_i64(t1, t1, 0x0000ff00);
-        tcg_gen_or_i64(t0, t0, t1);
-
-        tcg_gen_shri_i64(t1, arg, 24);
-        tcg_gen_or_i64(ret, t0, t1);
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
     }
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 27/37] tcg: Clean up generic bswap64
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (25 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 26/37] tcg: Clean up generic bswap32 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 28/37] tcg/optimize: Optimize bswap Richard Henderson
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Based on the only current user, Sparc:

New code uses 2 constants that take 2 insns each to load from the
constant pool, plus 13 insns for the swap itself.  Old code used 6
constants that took 1 or 2 insns to create, plus 21.  The result is a
new total of 17 vs an old total of 29.
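
For illustration, the three-stage sequence can be written as plain C
on a uint64_t.  This is only a sketch of the technique, not the TCG
code below; bswap64_sketch and bswap64_sketch_check are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the bytes of a 64-bit value in three stages: 8, 16, 32 bits. */
static uint64_t bswap64_sketch(uint64_t arg)
{
    uint64_t t0, t1, t2, ret;

                                /* arg = abcdefgh */
    t2 = 0x00ff00ff00ff00ffull;
    t0 = arg >> 8;              /*  t0 = .abcdefg */
    t1 = arg & t2;              /*  t1 = .b.d.f.h */
    t0 = t0 & t2;               /*  t0 = .a.c.e.g */
    t1 = t1 << 8;               /*  t1 = b.d.f.h. */
    ret = t0 | t1;              /* ret = badcfehg */

    t2 = 0x0000ffff0000ffffull;
    t0 = ret >> 16;             /*  t0 = ..badcfe */
    t1 = ret & t2;              /*  t1 = ..dc..hg */
    t0 = t0 & t2;               /*  t0 = ..ba..fe */
    t1 = t1 << 16;              /*  t1 = dc..hg.. */
    ret = t0 | t1;              /* ret = dcbahgfe */

    t0 = ret >> 32;             /*  t0 = ....dcba */
    t1 = ret << 32;             /*  t1 = hgfe.... */
    return t0 | t1;             /* ret = hgfedcba */
}

/* Self-check against a hand-computed value. */
static void bswap64_sketch_check(void)
{
    assert(bswap64_sketch(0x0102030405060708ull) == 0x0807060504030201ull);
}
```

Each stage doubles the width of the units being swapped, so only the
first two stages need a mask constant.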

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 43 ++++++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a956499e46..887b371a81 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1678,37 +1678,30 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
 
-        tcg_gen_shli_i64(t0, arg, 56);
+                                        /* arg = abcdefgh */
+        tcg_gen_movi_i64(t2, 0x00ff00ff00ff00ffull);
+        tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .abcdefg */
+        tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .b.d.f.h */
+        tcg_gen_and_i64(t0, t0, t2);    /*  t0 = .a.c.e.g */
+        tcg_gen_shli_i64(t1, t1, 8);    /*  t1 = b.d.f.h. */
+        tcg_gen_or_i64(ret, t0, t1);    /* ret = badcfehg */
 
-        tcg_gen_andi_i64(t1, arg, 0x0000ff00);
-        tcg_gen_shli_i64(t1, t1, 40);
-        tcg_gen_or_i64(t0, t0, t1);
+        tcg_gen_movi_i64(t2, 0x0000ffff0000ffffull);
+        tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ..badcfe */
+        tcg_gen_and_i64(t1, ret, t2);   /*  t1 = ..dc..hg */
+        tcg_gen_and_i64(t0, t0, t2);    /*  t0 = ..ba..fe */
+        tcg_gen_shli_i64(t1, t1, 16);   /*  t1 = dc..hg.. */
+        tcg_gen_or_i64(ret, t0, t1);    /* ret = dcbahgfe */
 
-        tcg_gen_andi_i64(t1, arg, 0x00ff0000);
-        tcg_gen_shli_i64(t1, t1, 24);
-        tcg_gen_or_i64(t0, t0, t1);
+        tcg_gen_shri_i64(t0, ret, 32);  /*  t0 = ....dcba */
+        tcg_gen_shli_i64(t1, ret, 32);  /*  t1 = hgfe.... */
+        tcg_gen_or_i64(ret, t0, t1);    /* ret = hgfedcba */
 
-        tcg_gen_andi_i64(t1, arg, 0xff000000);
-        tcg_gen_shli_i64(t1, t1, 8);
-        tcg_gen_or_i64(t0, t0, t1);
-
-        tcg_gen_shri_i64(t1, arg, 8);
-        tcg_gen_andi_i64(t1, t1, 0xff000000);
-        tcg_gen_or_i64(t0, t0, t1);
-
-        tcg_gen_shri_i64(t1, arg, 24);
-        tcg_gen_andi_i64(t1, t1, 0x00ff0000);
-        tcg_gen_or_i64(t0, t0, t1);
-
-        tcg_gen_shri_i64(t1, arg, 40);
-        tcg_gen_andi_i64(t1, t1, 0x0000ff00);
-        tcg_gen_or_i64(t0, t0, t1);
-
-        tcg_gen_shri_i64(t1, arg, 56);
-        tcg_gen_or_i64(ret, t0, t1);
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
+        tcg_temp_free_i64(t2);
     }
 }
 
-- 
2.17.2
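
For readers following the byte diagrams in the comments above, here is a
minimal sketch of the same three-step swap in plain C. It is an
illustration only, not the TCG code: the function name bswap64_by_halves
is hypothetical, and ordinary C operators stand in for the tcg_gen_* ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the mask-and-shift sequence of the patch. */
static uint64_t bswap64_by_halves(uint64_t arg)
{
    uint64_t mask, t0, t1, ret;

    /* Step 1: swap adjacent bytes.            arg = abcdefgh */
    mask = 0x00ff00ff00ff00ffull;
    t0 = (arg >> 8) & mask;                 /*  t0 = .a.c.e.g */
    t1 = (arg & mask) << 8;                 /*  t1 = b.d.f.h. */
    ret = t0 | t1;                          /* ret = badcfehg */

    /* Step 2: swap adjacent 16-bit units. */
    mask = 0x0000ffff0000ffffull;
    t0 = (ret >> 16) & mask;                /*  t0 = ..ba..fe */
    t1 = (ret & mask) << 16;                /*  t1 = dc..hg.. */
    ret = t0 | t1;                          /* ret = dcbahgfe */

    /* Step 3: swap the two 32-bit halves; no mask is needed. */
    return (ret >> 32) | (ret << 32);       /* ret = hgfedcba */
}
```

Each of the first two steps needs a single mask constant, and the last
step needs none.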


* [Qemu-devel] [PATCH for-4.0 v2 28/37] tcg/optimize: Optimize bswap
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (26 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 27/37] tcg: Clean up generic bswap64 Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 29/37] tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Constant folding never handled the bswap operations.
Folding them allows stores of immediate values to have
their bswap optimized away.
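
As a rough illustration of what the folding buys, the sketch below
models it in plain C. It is not the optimizer itself: the enum, the
fold_bswap function and the swap16/32/64 helpers are hypothetical
stand-ins for the TCG opcodes and for QEMU's bswap16/32/64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's bswap16/32/64 helpers. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t swap32(uint32_t x)
{
    return ((uint32_t)swap16((uint16_t)x) << 16) | swap16((uint16_t)(x >> 16));
}

static uint64_t swap64(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)x) << 32) | swap32((uint32_t)(x >> 32));
}

/* Hypothetical opcode set, standing in for the INDEX_op_bswap* cases. */
enum fold_op { FOLD_BSWAP16, FOLD_BSWAP32, FOLD_BSWAP64 };

/* Fold a byte swap whose input is already a known constant. */
static uint64_t fold_bswap(enum fold_op op, uint64_t x)
{
    switch (op) {
    case FOLD_BSWAP16:
        return swap16((uint16_t)x);
    case FOLD_BSWAP32:
        return swap32((uint32_t)x);
    case FOLD_BSWAP64:
        return swap64(x);
    default:
        assert(0);      /* not a bswap op */
        return x;
    }
}
```

With the input known, the swapped value can be stored as an immediate
and no swap instruction has to be emitted.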

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5dbe11c3c8..6b98ec13e6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -353,6 +353,15 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(ext16u):
         return (uint16_t)x;
 
+    CASE_OP_32_64(bswap16):
+        return bswap16(x);
+
+    CASE_OP_32_64(bswap32):
+        return bswap32(x);
+
+    case INDEX_op_bswap64_i64:
+        return bswap64(x);
+
     case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         return (int32_t)x;
@@ -1105,6 +1114,9 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16s):
         CASE_OP_32_64(ext16u):
         CASE_OP_32_64(ctpop):
+        CASE_OP_32_64(bswap16):
+        CASE_OP_32_64(bswap32):
+        case INDEX_op_bswap64_i64:
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
         case INDEX_op_ext_i32_i64:
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 29/37] tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (27 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 28/37] tcg/optimize: Optimize bswap Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 30/37] tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

For now, defined universally as true, since we previously required
backends to implement swapped memory operations.  Future patches
may now remove that support where it is onerous.
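
To make the generic fallback concrete, here is a minimal sketch in plain
C of a 16-bit byte-swapped load and store built from unswapped accesses.
Everything in it is an assumption for illustration: memory is modeled as
little-endian bytes, and the names ld16u_le, st16_le, bswap16_zext,
ld16s_be_fallback and st16_be_fallback are hypothetical, not TCG APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unswapped (little-endian) 16-bit accesses. */
static uint32_t ld16u_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static void st16_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
}

/* Like the bswap16 primitive, this requires zero-extended input. */
static uint32_t bswap16_zext(uint32_t x)
{
    assert((x >> 16) == 0);
    return ((x & 0xff) << 8) | (x >> 8);
}

/* Signed big-endian load: drop the sign, load, swap, then sign-extend. */
static int32_t ld16s_be_fallback(const uint8_t *p)
{
    uint32_t v = ld16u_le(p);               /* MO_SIGN cleared for the load */
    v = bswap16_zext(v);                    /* bswap16 */
    return (int32_t)(v ^ 0x8000u) - 0x8000; /* ext16s */
}

/* Big-endian store: zero-extend and swap into a temporary, then store. */
static void st16_be_fallback(uint8_t *p, uint32_t val)
{
    uint32_t swap = val & 0xffff;           /* ext16u */
    swap = bswap16_zext(swap);              /* bswap16 */
    st16_le(p, swap);
}
```

The sign extension has to come after the swap. A sign-extending load
would leave set bits above bit 15, which the swap primitive does not
accept.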

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h |   1 +
 tcg/arm/tcg-target.h     |   1 +
 tcg/i386/tcg-target.h    |   2 +
 tcg/mips/tcg-target.h    |   1 +
 tcg/ppc/tcg-target.h     |   1 +
 tcg/s390/tcg-target.h    |   1 +
 tcg/sparc/tcg-target.h   |   1 +
 tcg/tci/tcg-target.h     |   2 +
 tcg/tcg-op.c             | 118 ++++++++++++++++++++++++++++++++++++++-
 9 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index d1bd77c41d..0788f2eecb 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -137,6 +137,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 02981abdcc..7a4c55d66d 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -131,6 +131,7 @@ enum {
 };
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 1b2d4e1b0d..212ba554e9 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -219,6 +219,8 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define TCG_TARGET_HAS_MEMORY_BSWAP  1
+
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index a8222476f0..5cb8672470 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -203,6 +203,7 @@ extern bool use_mips32r2_instructions;
 #endif
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index bbc49bb1be..6f587010fb 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -128,6 +128,7 @@ void flush_icache_range(uintptr_t start, uintptr_t stop);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_OOL_LABELS
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 6f2b06a7d1..853ed6e7aa 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -135,6 +135,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_CALL_STACK_OFFSET	160
 
 #define TCG_TARGET_EXTEND_ARGS 1
+#define TCG_TARGET_HAS_MEMORY_BSWAP   1
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index d8339bf010..a0ed2a3342 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -164,6 +164,7 @@ extern bool use_vis3_instructions;
 #define TCG_AREG0 TCG_REG_I0
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 26140d78cb..086f34e69a 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -198,6 +198,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
    We prefer consistency across hosts on this.  */
 #define TCG_TARGET_DEFAULT_MO  (0)
 
+#define TCG_TARGET_HAS_MEMORY_BSWAP     1
+
 static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
                                             uintptr_t jmp_addr, uintptr_t addr)
 {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 887b371a81..1ad095cc35 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2694,25 +2694,78 @@ static void tcg_gen_req_mo(TCGBar type)
 
 void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
+    TCGMemOp orig_memop;
+
     tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
     memop = tcg_canonicalize_memop(memop, 0, 0);
     trace_guest_mem_before_tcg(tcg_ctx->cpu, cpu_env,
                                addr, trace_mem_get_info(memop, 0));
+
+    orig_memop = memop;
+    if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+        memop &= ~MO_BSWAP;
+        /* The bswap primitive requires zero-extended input.  */
+        if ((memop & MO_SSIZE) == MO_SW) {
+            memop &= ~MO_SIGN;
+        }
+    }
+
     gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
+
+    if ((orig_memop ^ memop) & MO_BSWAP) {
+        switch (orig_memop & MO_SIZE) {
+        case MO_16:
+            tcg_gen_bswap16_i32(val, val);
+            if (orig_memop & MO_SIGN) {
+                tcg_gen_ext16s_i32(val, val);
+            }
+            break;
+        case MO_32:
+            tcg_gen_bswap32_i32(val, val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
 }
 
 void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
+    TCGv_i32 swap = NULL;
+
     tcg_gen_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST);
     memop = tcg_canonicalize_memop(memop, 0, 1);
     trace_guest_mem_before_tcg(tcg_ctx->cpu, cpu_env,
                                addr, trace_mem_get_info(memop, 1));
+
+    if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+        swap = tcg_temp_new_i32();
+        switch (memop & MO_SIZE) {
+        case MO_16:
+            tcg_gen_ext16u_i32(swap, val);
+            tcg_gen_bswap16_i32(swap, swap);
+            break;
+        case MO_32:
+            tcg_gen_bswap32_i32(swap, val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        val = swap;
+        memop &= ~MO_BSWAP;
+    }
+
     gen_ldst_i32(INDEX_op_qemu_st_i32, val, addr, memop, idx);
+
+    if (swap) {
+        tcg_temp_free_i32(swap);
+    }
 }
 
 void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
-    tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
+    TCGMemOp orig_memop;
+
     if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
         tcg_gen_qemu_ld_i32(TCGV_LOW(val), addr, idx, memop);
         if (memop & MO_SIGN) {
@@ -2723,24 +2776,85 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
         return;
     }
 
+    tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
     memop = tcg_canonicalize_memop(memop, 1, 0);
     trace_guest_mem_before_tcg(tcg_ctx->cpu, cpu_env,
                                addr, trace_mem_get_info(memop, 0));
+
+    orig_memop = memop;
+    if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+        memop &= ~MO_BSWAP;
+        /* The bswap primitive requires zero-extended input.  */
+        if ((memop & MO_SIGN) && (memop & MO_SIZE) < MO_64) {
+            memop &= ~MO_SIGN;
+        }
+    }
+
     gen_ldst_i64(INDEX_op_qemu_ld_i64, val, addr, memop, idx);
+
+    if ((orig_memop ^ memop) & MO_BSWAP) {
+        switch (orig_memop & MO_SIZE) {
+        case MO_16:
+            tcg_gen_bswap16_i64(val, val);
+            if (orig_memop & MO_SIGN) {
+                tcg_gen_ext16s_i64(val, val);
+            }
+            break;
+        case MO_32:
+            tcg_gen_bswap32_i64(val, val);
+            if (orig_memop & MO_SIGN) {
+                tcg_gen_ext32s_i64(val, val);
+            }
+            break;
+        case MO_64:
+            tcg_gen_bswap64_i64(val, val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
 }
 
 void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
-    tcg_gen_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST);
+    TCGv_i64 swap = NULL;
+
     if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
         tcg_gen_qemu_st_i32(TCGV_LOW(val), addr, idx, memop);
         return;
     }
 
+    tcg_gen_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST);
     memop = tcg_canonicalize_memop(memop, 1, 1);
     trace_guest_mem_before_tcg(tcg_ctx->cpu, cpu_env,
                                addr, trace_mem_get_info(memop, 1));
+
+    if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+        swap = tcg_temp_new_i64();
+        switch (memop & MO_SIZE) {
+        case MO_16:
+            tcg_gen_ext16u_i64(swap, val);
+            tcg_gen_bswap16_i64(swap, swap);
+            break;
+        case MO_32:
+            tcg_gen_ext32u_i64(swap, val);
+            tcg_gen_bswap32_i64(swap, swap);
+            break;
+        case MO_64:
+            tcg_gen_bswap64_i64(swap, val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        val = swap;
+        memop &= ~MO_BSWAP;
+    }
+
     gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
+
+    if (swap) {
+        tcg_temp_free_i64(swap);
+    }
 }
 
 static void tcg_gen_ext_i32(TCGv_i32 ret, TCGv_i32 val, TCGMemOp opc)
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 30/37] tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (28 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 29/37] tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 31/37] tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false Richard Henderson
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Always true for softmmu and when movbe is available.  In the softmmu
case we always have call-clobbered scratch registers available, and
having the bswap in the softmmu thunk maximizes code sharing.

For user-only and without movbe, leave this to generic code.
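
The resulting split of responsibility can be summarized with a small
model in plain C. This is a sketch only: the two function names are
hypothetical, and the softmmu parameter stands in for the compile-time
CONFIG_SOFTMMU setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of TCG_TARGET_HAS_MEMORY_BSWAP on i386. */
static bool backend_handles_bswap(bool softmmu, bool have_movbe)
{
    return softmmu || have_movbe;
}

/* True when generic code must emit a separate bswap around the access. */
static bool generic_emits_bswap(bool softmmu, bool have_movbe,
                                bool memop_bswap)
{
    return memop_bswap && !backend_handles_bswap(softmmu, have_movbe);
}
```

Only the user-only build on a host without movbe reaches the generic
path, which is the case the new g_assert_not_reached() calls in the
backend guard against.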

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |   5 ++
 tcg/i386/tcg-target.inc.c | 122 ++++++++++++++++++++++++--------------
 2 files changed, 82 insertions(+), 45 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 212ba554e9..2d7cbb5dd6 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -101,6 +101,7 @@ extern bool have_bmi1;
 extern bool have_popcnt;
 extern bool have_avx1;
 extern bool have_avx2;
+extern bool have_movbe;
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32         1
@@ -219,7 +220,11 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_HAS_MEMORY_BSWAP  1
+#else
+#define TCG_TARGET_HAS_MEMORY_BSWAP  have_movbe
+#endif
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_OOL_LABELS
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5c68cbd43d..76235e90c9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -158,13 +158,12 @@ bool have_bmi1;
 bool have_popcnt;
 bool have_avx1;
 bool have_avx2;
+bool have_movbe;
 
 #ifdef CONFIG_CPUID_H
-static bool have_movbe;
 static bool have_bmi2;
 static bool have_lzcnt;
 #else
-# define have_movbe 0
 # define have_bmi2 0
 # define have_lzcnt 0
 #endif
@@ -1818,13 +1817,24 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, int index, intptr_t ofs,
                                    int seg, TCGMemOp memop)
 {
-    const TCGMemOp real_bswap = memop & MO_BSWAP;
-    TCGMemOp bswap = real_bswap;
+    bool use_bswap = memop & MO_BSWAP;
+    bool use_movbe = false;
     int movop = OPC_MOVL_GvEv;
 
-    if (have_movbe && real_bswap) {
-        bswap = 0;
-        movop = OPC_MOVBE_GyMy;
+    /*
+     * Do big-endian loads with movbe or softmmu.
+     * User-only without movbe will have its swapping done generically.
+     */
+    if (use_bswap) {
+        if (have_movbe) {
+            use_bswap = false;
+            use_movbe = true;
+            movop = OPC_MOVBE_GyMy;
+        } else {
+#ifndef CONFIG_SOFTMMU
+            g_assert_not_reached();
+#endif
+        }
     }
 
     switch (memop & MO_SSIZE) {
@@ -1837,40 +1847,52 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                  base, index, 0, ofs);
         break;
     case MO_UW:
-        tcg_out_modrm_sib_offset(s, OPC_MOVZWL + seg, datalo,
-                                 base, index, 0, ofs);
-        if (real_bswap) {
-            tcg_out_rolw_8(s, datalo);
-        }
-        break;
-    case MO_SW:
-        if (real_bswap) {
-            if (have_movbe) {
+        if (use_movbe) {
+            /* There is no extending movbe; only low 16-bits are modified.  */
+            if (datalo != base && datalo != index) {
+                /* XOR zeros the register and breaks dependency chains.  */
+                tgen_arithr(s, ARITH_XOR, datalo, datalo);
                 tcg_out_modrm_sib_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
                                          datalo, base, index, 0, ofs);
             } else {
-                tcg_out_modrm_sib_offset(s, OPC_MOVZWL + seg, datalo,
-                                         base, index, 0, ofs);
+                tcg_out_modrm_sib_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
+                                         datalo, base, index, 0, ofs);
+                tcg_out_ext16u(s, datalo, datalo);
+            }
+        } else {
+            tcg_out_modrm_sib_offset(s, OPC_MOVZWL + seg, datalo,
+                                     base, index, 0, ofs);
+            if (use_bswap) {
                 tcg_out_rolw_8(s, datalo);
             }
-            tcg_out_modrm(s, OPC_MOVSWL + P_REXW, datalo, datalo);
+        }
+        break;
+    case MO_SW:
+        if (use_movbe) {
+            tcg_out_modrm_sib_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
+                                     datalo, base, index, 0, ofs);
+            tcg_out_ext16s(s, datalo, datalo, P_REXW);
         } else {
             tcg_out_modrm_sib_offset(s, OPC_MOVSWL + P_REXW + seg,
                                      datalo, base, index, 0, ofs);
+            if (use_bswap) {
+                tcg_out_rolw_8(s, datalo);
+                tcg_out_ext16s(s, datalo, datalo, P_REXW);
+            }
         }
         break;
     case MO_UL:
         tcg_out_modrm_sib_offset(s, movop + seg, datalo, base, index, 0, ofs);
-        if (bswap) {
+        if (use_bswap) {
             tcg_out_bswap32(s, datalo);
         }
         break;
 #if TCG_TARGET_REG_BITS == 64
     case MO_SL:
-        if (real_bswap) {
+        if (use_bswap || use_movbe) {
             tcg_out_modrm_sib_offset(s, movop + seg, datalo,
                                      base, index, 0, ofs);
-            if (bswap) {
+            if (use_bswap) {
                 tcg_out_bswap32(s, datalo);
             }
             tcg_out_ext32s(s, datalo, datalo);
@@ -1884,12 +1906,12 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo,
                                      base, index, 0, ofs);
-            if (bswap) {
+            if (use_bswap) {
                 tcg_out_bswap64(s, datalo);
             }
         } else {
-            if (real_bswap) {
-                int t = datalo;
+            if (use_bswap || use_movbe) {
+                TCGReg t = datalo;
                 datalo = datahi;
                 datahi = t;
             }
@@ -1904,14 +1926,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                 tcg_out_modrm_sib_offset(s, movop + seg, datalo,
                                          base, index, 0, ofs);
             }
-            if (bswap) {
+            if (use_bswap) {
                 tcg_out_bswap32(s, datalo);
                 tcg_out_bswap32(s, datahi);
             }
         }
         break;
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
@@ -1991,24 +2013,34 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, intptr_t ofs, int seg,
                                    TCGMemOp memop)
 {
-    /* ??? Ideally we wouldn't need a scratch register.  For user-only,
-       we could perform the bswap twice to restore the original value
-       instead of moving to the scratch.  But as it is, the L constraint
-       means that TCG_REG_L0 is definitely free here.  */
     const TCGReg scratch = TCG_REG_L0;
-    const TCGMemOp real_bswap = memop & MO_BSWAP;
-    TCGMemOp bswap = real_bswap;
+    bool use_bswap = memop & MO_BSWAP;
+    bool use_movbe = false;
     int movop = OPC_MOVL_EvGv;
 
-    if (have_movbe && real_bswap) {
-        bswap = 0;
-        movop = OPC_MOVBE_MyGy;
+    /*
+     * Do big-endian stores with movbe or softmmu.
+     * User-only without movbe will have its swapping done generically.
+     */
+    if (use_bswap) {
+        if (have_movbe) {
+            use_bswap = false;
+            use_movbe = true;
+            movop = OPC_MOVBE_MyGy;
+        } else {
+#ifndef CONFIG_SOFTMMU
+            g_assert_not_reached();
+#endif
+        }
     }
 
     switch (memop & MO_SIZE) {
     case MO_8:
-        /* In 32-bit mode, 8-bit stores can only happen from [abcd]x.
-           Use the scratch register if necessary.  */
+        /*
+         * In 32-bit mode, 8-bit stores can only happen from [abcd]x.
+         * ??? Adjust constraints such that this is forced, then
+         * we won't need a scratch at all for user-only.
+         */
         if (TCG_TARGET_REG_BITS == 32 && datalo >= 4) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
             datalo = scratch;
@@ -2017,7 +2049,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                              datalo, base, ofs);
         break;
     case MO_16:
-        if (bswap) {
+        if (use_bswap) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
             tcg_out_rolw_8(s, scratch);
             datalo = scratch;
@@ -2025,7 +2057,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         tcg_out_modrm_offset(s, movop + P_DATA16 + seg, datalo, base, ofs);
         break;
     case MO_32:
-        if (bswap) {
+        if (use_bswap) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
             tcg_out_bswap32(s, scratch);
             datalo = scratch;
@@ -2034,13 +2066,13 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         break;
     case MO_64:
         if (TCG_TARGET_REG_BITS == 64) {
-            if (bswap) {
+            if (use_bswap) {
                 tcg_out_mov(s, TCG_TYPE_I64, scratch, datalo);
                 tcg_out_bswap64(s, scratch);
                 datalo = scratch;
             }
             tcg_out_modrm_offset(s, movop + P_REXW + seg, datalo, base, ofs);
-        } else if (bswap) {
+        } else if (use_bswap) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datahi);
             tcg_out_bswap32(s, scratch);
             tcg_out_modrm_offset(s, OPC_MOVL_EvGv + seg, scratch, base, ofs);
@@ -2048,8 +2080,8 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
             tcg_out_bswap32(s, scratch);
             tcg_out_modrm_offset(s, OPC_MOVL_EvGv + seg, scratch, base, ofs+4);
         } else {
-            if (real_bswap) {
-                int t = datalo;
+            if (use_movbe) {
+                TCGReg t = datalo;
                 datalo = datahi;
                 datahi = t;
             }
@@ -2058,7 +2090,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         }
         break;
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 31/37] tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (29 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 30/37] tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 32/37] tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only Richard Henderson
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This allows us to remove some code from the backend, allowing
the generic code to emit any extra bswaps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.inc.c | 51 +++++++-----------------------------
 2 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 0788f2eecb..7f55d50400 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -137,7 +137,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 
 #define TCG_TARGET_DEFAULT_MO (0)
-#define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#define TCG_TARGET_HAS_MEMORY_BSWAP     0
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 8edea527f7..34f9347cdf 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1485,8 +1485,6 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop, TCGType ext,
                                    TCGReg data_r, TCGReg addr_r,
                                    TCGType otype, TCGReg off_r)
 {
-    const TCGMemOp bswap = memop & MO_BSWAP;
-
     switch (memop & MO_SSIZE) {
     case MO_UB:
         tcg_out_ldst_r(s, I3312_LDRB, data_r, addr_r, otype, off_r);
@@ -1497,43 +1495,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop, TCGType ext,
         break;
     case MO_UW:
         tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev16(s, data_r, data_r);
-        }
         break;
     case MO_SW:
-        if (bswap) {
-            tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, otype, off_r);
-            tcg_out_rev16(s, data_r, data_r);
-            tcg_out_sxt(s, ext, MO_16, data_r, data_r);
-        } else {
-            tcg_out_ldst_r(s, (ext ? I3312_LDRSHX : I3312_LDRSHW),
-                           data_r, addr_r, otype, off_r);
-        }
+        tcg_out_ldst_r(s, (ext ? I3312_LDRSHX : I3312_LDRSHW),
+                       data_r, addr_r, otype, off_r);
         break;
     case MO_UL:
         tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev32(s, data_r, data_r);
-        }
         break;
     case MO_SL:
-        if (bswap) {
-            tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, otype, off_r);
-            tcg_out_rev32(s, data_r, data_r);
-            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
-        } else {
-            tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, otype, off_r);
-        }
+        tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, otype, off_r);
         break;
     case MO_Q:
         tcg_out_ldst_r(s, I3312_LDRX, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev64(s, data_r, data_r);
-        }
         break;
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
@@ -1541,35 +1518,21 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
                                    TCGReg data_r, TCGReg addr_r,
                                    TCGType otype, TCGReg off_r)
 {
-    const TCGMemOp bswap = memop & MO_BSWAP;
-
     switch (memop & MO_SIZE) {
     case MO_8:
         tcg_out_ldst_r(s, I3312_STRB, data_r, addr_r, otype, off_r);
         break;
     case MO_16:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev16(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRH, data_r, addr_r, otype, off_r);
         break;
     case MO_32:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev32(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRW, data_r, addr_r, otype, off_r);
         break;
     case MO_64:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev64(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRX, data_r, addr_r, otype, off_r);
         break;
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
@@ -1578,6 +1541,8 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 {
     TCGMemOp memop = get_memop(oi);
 
+    tcg_debug_assert(!(memop & MO_BSWAP));
+
 #ifdef CONFIG_SOFTMMU
     /* Ignore the requested "ext".  We get the same correct result from
      * a 16-bit sign-extended to 64-bit as we do sign-extended to 32-bit,
@@ -1608,6 +1573,8 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 {
     TCGMemOp memop = get_memop(oi);
 
+    tcg_debug_assert(!(memop & MO_BSWAP));
+
 #ifdef CONFIG_SOFTMMU
     bool is_64 = (memop & MO_SIZE) == MO_64;
 
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 v2 32/37] tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (30 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 31/37] tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 33/37] tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct Richard Henderson
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Letting the generic code emit any bswaps allows us to avoid reserving
an extra register for CONFIG_USER_ONLY.  For SOFTMMU, where we have
free call-clobbered registers anyway, leaving the bswap in the out-of-line
thunk maximizes code sharing.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.h     |   4 ++
 tcg/arm/tcg-target.inc.c | 129 +++++++++++++--------------------------
 2 files changed, 48 insertions(+), 85 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 7a4c55d66d..a05310a684 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -131,7 +131,11 @@ enum {
 };
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#else
+#define TCG_TARGET_HAS_MEMORY_BSWAP     0
+#endif
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 5a15f6a546..898701f105 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -270,15 +270,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->u.regs = 0xffff;
         break;
 
-    /* qemu_st address & data */
-    case 's':
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffff;
-        /* r0-r1 doing the byte swapping, so don't use these */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-        break;
-
     default:
         return NULL;
     }
@@ -1363,6 +1354,7 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, TCGMemOp opc,
                                          TCGReg addrlo, TCGReg addend)
 {
     TCGMemOp bswap = opc & MO_BSWAP;
+    assert(USING_SOFTMMU || !bswap);
 
     switch (opc & MO_SSIZE) {
     case MO_UB:
@@ -1386,7 +1378,6 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, TCGMemOp opc,
         }
         break;
     case MO_UL:
-    default:
         tcg_out_ld32_r(s, COND_AL, datalo, addrlo, addend);
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, datalo, datalo);
@@ -1416,6 +1407,8 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, TCGMemOp opc,
             }
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1424,6 +1417,7 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc,
                                           TCGReg addrlo)
 {
     TCGMemOp bswap = opc & MO_BSWAP;
+    assert(!USING_SOFTMMU && !bswap);
 
     switch (opc & MO_SSIZE) {
     case MO_UB:
@@ -1434,47 +1428,24 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc,
         break;
     case MO_UW:
         tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
-        if (bswap) {
-            tcg_out_bswap16(s, COND_AL, datalo, datalo);
-        }
         break;
     case MO_SW:
-        if (bswap) {
-            tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
-            tcg_out_bswap16s(s, COND_AL, datalo, datalo);
-        } else {
-            tcg_out_ld16s_8(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_ld16s_8(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_UL:
-    default:
         tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, datalo, datalo);
-        }
         break;
     case MO_Q:
-        {
-            TCGReg dl = (bswap ? datahi : datalo);
-            TCGReg dh = (bswap ? datalo : datahi);
-
-            /* Avoid ldrd for user-only emulation, to handle unaligned.  */
-            if (USING_SOFTMMU && use_armv6_instructions
-                && (dl & 1) == 0 && dh == dl + 1) {
-                tcg_out_ldrd_8(s, COND_AL, dl, addrlo, 0);
-            } else if (dl == addrlo) {
-                tcg_out_ld32_12(s, COND_AL, dh, addrlo, bswap ? 0 : 4);
-                tcg_out_ld32_12(s, COND_AL, dl, addrlo, bswap ? 4 : 0);
-            } else {
-                tcg_out_ld32_12(s, COND_AL, dl, addrlo, bswap ? 4 : 0);
-                tcg_out_ld32_12(s, COND_AL, dh, addrlo, bswap ? 0 : 4);
-            }
-            if (bswap) {
-                tcg_out_bswap32(s, COND_AL, dl, dl);
-                tcg_out_bswap32(s, COND_AL, dh, dh);
-            }
+        if (datalo == addrlo) {
+            tcg_out_ld32_12(s, COND_AL, datahi, addrlo, 4);
+            tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
+        } else {
+            tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
+            tcg_out_ld32_12(s, COND_AL, datahi, addrlo, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1485,19 +1456,18 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datalo __attribute__((unused));
     TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
-    TCGMemOp opc;
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     oi = *args++;
-    opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
     add_ldst_ool_label(s, true, is64, oi, R_ARM_PC24, 0);
     tcg_out_bl_noaddr(s, COND_AL);
 #else
+    TCGMemOp opc = get_memop(oi);
     if (guest_base) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
         tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, TCG_REG_TMP);
@@ -1512,6 +1482,7 @@ static inline void tcg_out_qemu_st_index(TCGContext *s, int cond, TCGMemOp opc,
                                          TCGReg addrlo, TCGReg addend)
 {
     TCGMemOp bswap = opc & MO_BSWAP;
+    assert(USING_SOFTMMU || !bswap);
 
     switch (opc & MO_SIZE) {
     case MO_8:
@@ -1526,7 +1497,6 @@ static inline void tcg_out_qemu_st_index(TCGContext *s, int cond, TCGMemOp opc,
         }
         break;
     case MO_32:
-    default:
         if (bswap) {
             tcg_out_bswap32(s, cond, TCG_REG_R0, datalo);
             tcg_out_st32_r(s, cond, TCG_REG_R0, addrlo, addend);
@@ -1535,20 +1505,32 @@ static inline void tcg_out_qemu_st_index(TCGContext *s, int cond, TCGMemOp opc,
         }
         break;
     case MO_64:
-        /* Avoid strd for user-only emulation, to handle unaligned.  */
         if (bswap) {
-            tcg_out_bswap32(s, cond, TCG_REG_R0, datahi);
-            tcg_out_st32_rwb(s, cond, TCG_REG_R0, addend, addrlo);
-            tcg_out_bswap32(s, cond, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, cond, TCG_REG_R0, addend, 4);
-        } else if (USING_SOFTMMU && use_armv6_instructions
-                   && (datalo & 1) == 0 && datahi == datalo + 1) {
+            /*
+             * Assert inputs are where I think, for the softmmu thunk.
+             * One pair of R0/R1 or R2/R3 will be free and call-clobbered,
+             * which allows the use of STRD below.  Note the bswaps also
+             * reverse the lo/hi registers to swap the two words.
+             */
+            tcg_debug_assert(addend == TCG_REG_TMP);
+            tcg_debug_assert(datalo == TCG_REG_R4);
+            tcg_debug_assert(datahi == TCG_REG_R5);
+            datalo = addrlo == TCG_REG_R1 ? TCG_REG_R2 : TCG_REG_R0;
+            datahi = datalo + 1;
+            tcg_out_bswap32(s, cond, datalo, TCG_REG_R5);
+            tcg_out_bswap32(s, cond, datahi, TCG_REG_R4);
+        }
+        /* Avoid strd for user-only emulation, to handle unaligned.  */
+        if (USING_SOFTMMU && use_armv6_instructions
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
             tcg_out_strd_r(s, cond, datalo, addrlo, addend);
         } else {
             tcg_out_st32_rwb(s, cond, datalo, addend, addrlo);
             tcg_out_st32_12(s, cond, datahi, addend, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1557,43 +1539,25 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc,
                                           TCGReg addrlo)
 {
     TCGMemOp bswap = opc & MO_BSWAP;
+    assert(!USING_SOFTMMU && !bswap);
 
     switch (opc & MO_SIZE) {
     case MO_8:
         tcg_out_st8_12(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_16:
-        if (bswap) {
-            tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, datalo);
-            tcg_out_st16_8(s, COND_AL, TCG_REG_R0, addrlo, 0);
-        } else {
-            tcg_out_st16_8(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_st16_8(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_32:
-    default:
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 0);
-        } else {
-            tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_64:
         /* Avoid strd for user-only emulation, to handle unaligned.  */
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datahi);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 0);
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 4);
-        } else if (USING_SOFTMMU && use_armv6_instructions
-                   && (datalo & 1) == 0 && datahi == datalo + 1) {
-            tcg_out_strd_8(s, COND_AL, datalo, addrlo, 0);
-        } else {
-            tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
-            tcg_out_st32_12(s, COND_AL, datahi, addrlo, 4);
-        }
+        tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
+        tcg_out_st32_12(s, COND_AL, datahi, addrlo, 4);
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1604,19 +1568,18 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datalo __attribute__((unused));
     TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
-    TCGMemOp opc;
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     oi = *args++;
-    opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
     add_ldst_ool_label(s, false, is64, oi, R_ARM_PC24, 0);
     tcg_out_bl_noaddr(s, COND_AL);
 #else
+    TCGMemOp opc = get_memop(oi);
     if (guest_base) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
         tcg_out_qemu_st_index(s, COND_AL, opc, datalo,
@@ -2055,11 +2018,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
-    static const TCGTargetOpDef s_s = { .args_ct_str = { "s", "s" } };
     static const TCGTargetOpDef a_b = { .args_ct_str = { "a", "b" } };
     static const TCGTargetOpDef c_b = { .args_ct_str = { "c", "b" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
-    static const TCGTargetOpDef s_s_s = { .args_ct_str = { "s", "s", "s" } };
     static const TCGTargetOpDef a_c_d = { .args_ct_str = { "a", "c", "d" } };
     static const TCGTargetOpDef a_b_b = { .args_ct_str = { "a", "b", "b" } };
     static const TCGTargetOpDef e_c_d = { .args_ct_str = { "e", "c", "d" } };
@@ -2072,8 +2033,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rIK" } };
     static const TCGTargetOpDef r_r_r_r
         = { .args_ct_str = { "r", "r", "r", "r" } };
-    static const TCGTargetOpDef s_s_s_s
-        = { .args_ct_str = { "s", "s", "s", "s" } };
     static const TCGTargetOpDef a_b_c_d
         = { .args_ct_str = { "a", "b", "c", "d" } };
     static const TCGTargetOpDef e_f_c_d
@@ -2175,7 +2134,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         }
     case INDEX_op_qemu_st_i32:
         if (!USING_SOFTMMU) {
-            return TARGET_LONG_BITS == 32 ? &s_s : &s_s_s;
+            return TARGET_LONG_BITS == 32 ? &r_r : &r_r_r;
         } else if (TARGET_LONG_BITS == 32) {
             return &c_b;     /* temps available r0, r3, r12 */
         } else {
@@ -2183,7 +2142,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         }
     case INDEX_op_qemu_st_i64:
         if (!USING_SOFTMMU) {
-            return TARGET_LONG_BITS == 32 ? &s_s_s : &s_s_s_s;
+            return TARGET_LONG_BITS == 32 ? &r_r_r : &r_r_r_r;
         } else if (TARGET_LONG_BITS == 32) {
             return &e_f_b;   /* temps available r0, r2, r3, r12 */
         } else {
-- 
2.17.2
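
A note on the MO_64 bswap path in tcg_out_qemu_st_index above: it relies on
the identity that byte-swapping a 64-bit value is the same as byte-swapping
each 32-bit half and exchanging the two halves. A minimal sketch of that
identity in plain C follows; the helper names are hypothetical and are not
QEMU functions.

```c
#include <stdint.h>

/* Portable 32-bit byte swap (hypothetical helper, not a QEMU function). */
static uint32_t sketch_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Byte-swap a 64-bit value supplied as two 32-bit halves.
 * The new low word is the swapped old high word and vice versa,
 * mirroring the patch: bswap32 datalo <- R5 (old hi),
 *                      bswap32 datahi <- R4 (old lo).
 */
static uint64_t sketch_bswap64_halves(uint32_t lo, uint32_t hi)
{
    uint32_t new_lo = sketch_bswap32(hi);
    uint32_t new_hi = sketch_bswap32(lo);
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

For example, sketch_bswap64_halves(0x55667788, 0x11223344) yields
0x8877665544332211, the byte-reversed form of 0x1122334455667788.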

* [Qemu-devel] [PATCH for-4.0 v2 33/37] tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (31 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 32/37] tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 34/37] tcg/i386: Restrict user-only qemu_st_i32 values to q-regs Richard Henderson
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

With is64 available, sign-extending loads request REX.W only when the
result is 64-bit.  This can save a few REX prefixes for qemu_ld_i32.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
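
Reader's note, not part of the commit: the patch computes the prefix flag as
"is64 * P_REXW", which is either 0 or P_REXW because a C bool promotes to 0
or 1. A minimal sketch of the idiom follows. SKETCH_P_REXW is a hypothetical
stand-in value; the real P_REXW is defined in tcg/i386/tcg-target.inc.c.

```c
#include <stdbool.h>

/* Hypothetical flag bit for illustration only; not the real P_REXW. */
#define SKETCH_P_REXW 0x1000

/*
 * Combine an opcode with the REX.W flag only when a 64-bit result
 * is wanted.  For a 32-bit result the flag is left out, so the
 * emitter may be able to omit the prefix byte entirely.
 */
static int sketch_sext_load_flags(int opc, bool is64)
{
    int rexw = is64 * SKETCH_P_REXW;   /* bool promotes to 0 or 1 */
    return opc + rexw;
}
```

A REX prefix is still required when an extended register (r8-r15) is
involved, which is why the saving applies only to some loads.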
 tcg/i386/tcg-target.inc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 76235e90c9..5cad31cfe5 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1815,10 +1815,11 @@ static inline void setup_guest_base_seg(void) { }
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                    TCGReg base, int index, intptr_t ofs,
-                                   int seg, TCGMemOp memop)
+                                   int seg, bool is64, TCGMemOp memop)
 {
     bool use_bswap = memop & MO_BSWAP;
     bool use_movbe = false;
+    int rexw = is64 * P_REXW;
     int movop = OPC_MOVL_GvEv;
 
     /*
@@ -1843,7 +1844,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
                                  base, index, 0, ofs);
         break;
     case MO_SB:
-        tcg_out_modrm_sib_offset(s, OPC_MOVSBL + P_REXW + seg, datalo,
+        tcg_out_modrm_sib_offset(s, OPC_MOVSBL + rexw + seg, datalo,
                                  base, index, 0, ofs);
         break;
     case MO_UW:
@@ -1871,14 +1872,15 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
         if (use_movbe) {
             tcg_out_modrm_sib_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
                                      datalo, base, index, 0, ofs);
-            tcg_out_ext16s(s, datalo, datalo, P_REXW);
-        } else {
-            tcg_out_modrm_sib_offset(s, OPC_MOVSWL + P_REXW + seg,
+            tcg_out_ext16s(s, datalo, datalo, rexw);
+        } else if (use_bswap) {
+            tcg_out_modrm_sib_offset(s, OPC_MOVSWL + seg,
+                                     datalo, base, index, 0, ofs);
+            tcg_out_rolw_8(s, datalo);
+            tcg_out_ext16s(s, datalo, datalo, rexw);
+        } else {
+            tcg_out_modrm_sib_offset(s, OPC_MOVSWL + rexw + seg,
                                      datalo, base, index, 0, ofs);
-            if (use_bswap) {
-                tcg_out_rolw_8(s, datalo);
-                tcg_out_ext16s(s, datalo, datalo, P_REXW);
-            }
         }
         break;
     case MO_UL:
@@ -2004,7 +2006,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
         }
 
         tcg_out_qemu_ld_direct(s, datalo, datahi,
-                               base, index, offset, seg, opc);
+                               base, index, offset, seg, is64, opc);
     }
 #endif
 }
@@ -2202,7 +2204,7 @@ static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
 
     /* TLB Hit.  */
     if (is_ld) {
-        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, is_64, opc);
     } else {
         tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
     }
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 34/37] tcg/i386: Restrict user-only qemu_st_i32 values to q-regs
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (32 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 33/37] tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 35/37] tcg/i386: Add setup_guest_base_seg for FreeBSD Richard Henderson
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

This is one more step toward the removal of all scratch registers
during user-only guest memory operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
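
Reader's note, not part of the commit: the "q" constraint exists because a
32-bit x86 host can only store a byte from the low byte of eax, ecx, edx or
ebx, which are register numbers 0-3 in the usual encoding order. A minimal
sketch of the predicate that the new tcg_debug_assert checks follows;
host_bits stands in for TCG_TARGET_REG_BITS and the function name is
hypothetical.

```c
#include <stdbool.h>

/*
 * Can register number `reg` be the source of a byte store?
 * On a 64-bit host any register works (a REX prefix selects the
 * low byte); on a 32-bit host only registers 0-3 have one.
 */
static bool sketch_byte_store_ok(int host_bits, int reg)
{
    return host_bits == 64 || reg < 4;
}
```

Since the register allocator cannot know whether a given qemu_st_i32 will be
a byte store, the constraint has to cover every 32-bit store value on 32-bit
hosts.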
 tcg/i386/tcg-target.inc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5cad31cfe5..79de8d0cd2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -240,7 +240,17 @@ static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
 #else
 static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
 {
-    return "L";
+    if (TCG_TARGET_REG_BITS == 64) {
+        /* Temps are still needed for guest_base && !guest_base_flags.  */
+        return "L";
+    } else if (type == ARG_STVAL && !is_64) {
+        /* Byte stores must happen from q-regs.  Because of this, we must
+         * constrain all INDEX_op_qemu_st_i32 to use q-regs.
+         */
+        return "q";
+    } else {
+        return "r";
+    }
 }
 #endif /* CONFIG_SOFTMMU */
 
@@ -2038,15 +2048,8 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 
     switch (memop & MO_SIZE) {
     case MO_8:
-        /*
-         * In 32-bit mode, 8-bit stores can only happen from [abcd]x.
-         * ??? Adjust constraints such that this is is forced, then
-         * we won't need a scratch at all for user-only.
-         */
-        if (TCG_TARGET_REG_BITS == 32 && datalo >= 4) {
-            tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
-            datalo = scratch;
-        }
+        /* In 32-bit mode, 8-bit stores can only happen from [abcd]x.  */
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || datalo < 4);
         tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R + seg,
                              datalo, base, ofs);
         break;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 35/37] tcg/i386: Add setup_guest_base_seg for FreeBSD
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (33 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 34/37] tcg/i386: Restrict user-only qemu_st_i32 values to q-regs Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 36/37] tcg/i386: Require segment syscalls to succeed Richard Henderson
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 79de8d0cd2..55c5a8516c 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1818,6 +1818,16 @@ static inline void setup_guest_base_seg(void)
         guest_base_flags = P_GS;
     }
 }
+#elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)
+# include <machine/sysarch.h>
+
+static int guest_base_flags;
+static inline void setup_guest_base_seg(void)
+{
+    if (sysarch(AMD64_SET_GSBASE, &guest_base) == 0) {
+        guest_base_flags = P_GS;
+    }
+}
 #else
 # define guest_base_flags 0
 static inline void setup_guest_base_seg(void) { }
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 36/37] tcg/i386: Require segment syscalls to succeed
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (34 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 35/37] tcg/i386: Add setup_guest_base_seg for FreeBSD Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 37/37] tcg/i386: Remove L constraint Richard Henderson
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

There ought to be no reason they should ever fail.  If we don't know
how to set a segment base register for user-only (NetBSD, OpenBSD?),
then error out if we cannot proceed without scratch registers.

This is one more step toward the removal of all scratch registers
during user-only guest memory operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
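
Reader's note, not part of the commit: on hosts without a known way to set a
segment base, the fallback setup_guest_base_seg refuses to start when a
scratch register would be unavoidable. A minimal sketch of that condition as
a standalone predicate follows; the function and parameter names are
hypothetical stand-ins for TCG_TARGET_REG_BITS, TARGET_LONG_BITS and
guest_base.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Would a user-only memory op need a scratch register when no
 * segment base is available?
 *  - A 32-bit guest address on a 64-bit host must be zero-extended.
 *  - A guest_base above INT32_MAX does not fit in the signed 32-bit
 *    displacement of an x86 address, so it must be loaded first.
 */
static bool sketch_needs_scratch(int host_bits, int guest_bits,
                                 uint64_t guest_base)
{
    return host_bits == 64
        && (guest_bits == 32 || guest_base > (uint64_t)INT32_MAX);
}
```

A 32-bit host never trips the check, and a 64-bit guest on a 64-bit host
only does so when guest_base is too large for a displacement.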
 tcg/i386/tcg-target.inc.c | 54 +++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 55c5a8516c..19a0fa8a03 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1814,9 +1814,12 @@ int arch_prctl(int code, unsigned long addr);
 static int guest_base_flags;
 static inline void setup_guest_base_seg(void)
 {
-    if (arch_prctl(ARCH_SET_GS, guest_base) == 0) {
-        guest_base_flags = P_GS;
+    /* There is no reason this syscall should fail.  */
+    if (arch_prctl(ARCH_SET_GS, guest_base) < 0) {
+        perror("arch_prctl(ARCH_SET_GS)");
+        exit(1);
     }
+    guest_base_flags = P_GS;
 }
 #elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)
 # include <machine/sysarch.h>
@@ -1824,13 +1827,28 @@ static inline void setup_guest_base_seg(void)
 static int guest_base_flags;
 static inline void setup_guest_base_seg(void)
 {
-    if (sysarch(AMD64_SET_GSBASE, &guest_base) == 0) {
-        guest_base_flags = P_GS;
+    /* There is no reason this syscall should fail.  */
+    if (sysarch(AMD64_SET_GSBASE, &guest_base) < 0) {
+        perror("sysarch(AMD64_SET_GSBASE)");
+        exit(1);
     }
+    guest_base_flags = P_GS;
 }
 #else
 # define guest_base_flags 0
-static inline void setup_guest_base_seg(void) { }
+static inline void setup_guest_base_seg(void)
+{
+    /*
+     * Verify we can proceed without scratch registers.
+     * If guest_base > INT32_MAX, then it would need to be loaded.
+     * If 32-bit guest, the address would need to be zero-extended.
+     */
+    if (TCG_TARGET_REG_BITS == 64
+        && (TARGET_LONG_BITS == 32 || guest_base > INT32_MAX)) {
+        error_report("Segment base register not supported on this OS");
+        exit(1);
+    }
+}
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
@@ -2013,16 +2031,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
             if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
                 seg |= P_ADDR32;
             }
-        } else if (TCG_TARGET_REG_BITS == 64) {
-            if (TARGET_LONG_BITS == 32) {
-                tcg_out_ext32u(s, TCG_REG_L0, base);
-                base = TCG_REG_L0;
-            }
-            if (offset != guest_base) {
-                tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
-                index = TCG_REG_L1;
-                offset = 0;
-            }
         }
 
         tcg_out_qemu_ld_direct(s, datalo, datahi,
@@ -2156,22 +2164,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
             if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
                 seg |= P_ADDR32;
             }
-        } else if (TCG_TARGET_REG_BITS == 64) {
-            /* ??? Note that we can't use the same SIB addressing scheme
-               as for loads, since we require L0 free for bswap.  */
-            if (offset != guest_base) {
-                if (TARGET_LONG_BITS == 32) {
-                    tcg_out_ext32u(s, TCG_REG_L0, base);
-                    base = TCG_REG_L0;
-                }
-                tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
-                tgen_arithr(s, ARITH_ADD + P_REXW, TCG_REG_L1, base);
-                base = TCG_REG_L1;
-                offset = 0;
-            } else if (TARGET_LONG_BITS == 32) {
-                tcg_out_ext32u(s, TCG_REG_L1, base);
-                base = TCG_REG_L1;
-            }
         }
 
         tcg_out_qemu_st_direct(s, datalo, datahi, base, offset, seg, opc);
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 v2 37/37] tcg/i386: Remove L constraint
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (35 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 36/37] tcg/i386: Require segment syscalls to succeed Richard Henderson
@ 2018-11-23 14:45 ` Richard Henderson
  2018-11-23 21:04 ` [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups no-reply
  2018-11-26  0:30 ` Emilio G. Cota
  38 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-23 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis

We no longer need any scratch registers for user-only memory ops.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
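
Reader's note, not part of the commit: the removed 'L' constraint built a
register mask covering every general register and then cleared the two
scratch registers, so the allocator could never place a memory-op operand
there. A minimal sketch of that mask computation follows. The register
numbers are hypothetical; the real TCG_REG_L0 and TCG_REG_L1 are defined in
tcg/i386/tcg-target.inc.c.

```c
#include <stdint.h>

/* Hypothetical register numbers for the two scratch registers. */
enum { SKETCH_REG_L0 = 0, SKETCH_REG_L1 = 2 };

/*
 * Registers permitted by the old 'L' constraint: all 16 (64-bit host)
 * or all 8 (32-bit host), minus the two scratch registers.
 */
static uint32_t sketch_l_constraint_regs(int host_bits)
{
    uint32_t regs = (host_bits == 64) ? 0xffffu : 0xffu;
    regs &= ~(1u << SKETCH_REG_L0);
    regs &= ~(1u << SKETCH_REG_L1);
    return regs;
}
```

With the scratch registers no longer needed, constrain_memop_arg can return
"r" on 64-bit hosts, which leaves the full register set available.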
 tcg/i386/tcg-target.inc.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 19a0fa8a03..2815dd25a0 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -240,10 +240,7 @@ static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
 #else
 static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
 {
-    if (TCG_TARGET_REG_BITS == 64) {
-        /* Temps are still needed for guest_base && !guest_base_flags.  */
-        return "L";
-    } else if (type == ARG_STVAL && !is_64) {
+    if (TCG_TARGET_REG_BITS == 32 && type == ARG_STVAL && !is_64) {
         /* Byte stores must happen from q-regs.  Because of this, we must
          * constrain all INDEX_op_qemu_st_i32 to use q-regs.
          */
@@ -353,14 +350,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->u.regs |= ALL_VECTOR_REGS;
         break;
 
-        /* qemu_ld/st address constraint */
-    case 'L':
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xff;
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_L0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_L1);
-        break;
-
     case 'e':
         ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_S32);
         break;
-- 
2.17.2

* Re: [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (36 preceding siblings ...)
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 37/37] tcg/i386: Remove L constraint Richard Henderson
@ 2018-11-23 21:04 ` no-reply
  2018-11-26  0:30 ` Emilio G. Cota
  38 siblings, 0 replies; 55+ messages in thread
From: no-reply @ 2018-11-23 21:04 UTC (permalink / raw)
  To: richard.henderson; +Cc: famz, qemu-devel, Alistair.Francis

Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20181123144558.5048-1-richard.henderson@linaro.org
Type: series
Subject: [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fc046e8 tcg/i386: Remove L constraint
25a6e0e tcg/i386: Require segment syscalls to succeed
5e0fd5d tcg/i386: Add setup_guest_base_seg for FreeBSD
d68fe20 tcg/i386: Restrict user-only qemu_st_i32 values to q-regs
85fcd2d tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
a3178a9 tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only
4d0c5fc tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false
676c67e tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP
38c85b7 tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
d3a1899 tcg/optimize: Optimize bswap
3ad8388 tcg: Clean up generic bswap64
affd2d8 tcg: Clean up generic bswap32
fd89430 tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS
8aa5784 tcg/ppc: Force qemu_ld/st arguments into fixed registers
af59cb6 tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool
26bfc9a tcg/ppc: Add constraints for R7-R8
6cd8567 tcg/ppc: Split out tcg_out_call_int
d40e505 tcg/ppc: Parameterize the temps for tcg_out_tlb_read
acfcb89 tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS
5516052 tcg/arm: Force qemu_ld/st arguments into fixed registers
46ad1f6 tcg/arm: Reduce the number of temps for tcg_out_tlb_read
24abc30 tcg/arm: Add constraints for R0-R5
94b24c4 tcg/arm: Parameterize the temps for tcg_out_tlb_read
91b0db1 tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
7129f8e tcg/aarch64: Use B not BL for tcg_out_goto_long
c770cee tcg/aarch64: Parameterize the temp for tcg_out_goto_long
412bf17 tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
2ffa3ee tcg/aarch64: Add constraints for x0, x1, x2
a403293 tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
a65b8a9 tcg/i386: Force qemu_ld/st arguments into fixed registers
4053d4c tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
c9e6cd5 tcg/i386: Return a base register from tcg_out_tlb_load
a601058 tcg/i386: Add constraints for r8 and r9
41701df tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
fdb09e0 tcg: Return success from patch_reloc
8dab265 tcg/i386: Move TCG_REG_CALL_STACK from define to enum
58990b6 tcg/i386: Always use %ebp for TCG_AREG0

=== OUTPUT BEGIN ===
Checking PATCH 1/37: tcg/i386: Always use %ebp for TCG_AREG0...
Checking PATCH 2/37: tcg/i386: Move TCG_REG_CALL_STACK from define to enum...
Checking PATCH 3/37: tcg: Return success from patch_reloc...
Checking PATCH 4/37: tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#47: 
new file mode 100644

total: 0 errors, 1 warnings, 192 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 5/37: tcg/i386: Add constraints for r8 and r9...
Checking PATCH 6/37: tcg/i386: Return a base register from tcg_out_tlb_load...
Checking PATCH 7/37: tcg/i386: Change TCG_REG_L[01] to not overlap function arguments...
Checking PATCH 8/37: tcg/i386: Force qemu_ld/st arguments into fixed registers...
Checking PATCH 9/37: tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
Checking PATCH 10/37: tcg/aarch64: Add constraints for x0, x1, x2...
Checking PATCH 11/37: tcg/aarch64: Parameterize the temps for tcg_out_tlb_read...
Checking PATCH 12/37: tcg/aarch64: Parameterize the temp for tcg_out_goto_long...
Checking PATCH 13/37: tcg/aarch64: Use B not BL for tcg_out_goto_long...
Checking PATCH 14/37: tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
Checking PATCH 15/37: tcg/arm: Parameterize the temps for tcg_out_tlb_read...
Checking PATCH 16/37: tcg/arm: Add constraints for R0-R5...
Checking PATCH 17/37: tcg/arm: Reduce the number of temps for tcg_out_tlb_read...
Checking PATCH 18/37: tcg/arm: Force qemu_ld/st arguments into fixed registers...
Checking PATCH 19/37: tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
ERROR: externs should be avoided in .c files
#169: FILE: tcg/arm/tcg-target.inc.c:1483:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#170: FILE: tcg/arm/tcg-target.inc.c:1484:
+    TCGReg addrhi __attribute__((unused));

ERROR: externs should be avoided in .c files
#171: FILE: tcg/arm/tcg-target.inc.c:1485:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#172: FILE: tcg/arm/tcg-target.inc.c:1486:
+    TCGReg datahi __attribute__((unused));

ERROR: externs should be avoided in .c files
#224: FILE: tcg/arm/tcg-target.inc.c:1602:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#225: FILE: tcg/arm/tcg-target.inc.c:1603:
+    TCGReg addrhi __attribute__((unused));

ERROR: externs should be avoided in .c files
#226: FILE: tcg/arm/tcg-target.inc.c:1604:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#227: FILE: tcg/arm/tcg-target.inc.c:1605:
+    TCGReg datahi __attribute__((unused));

total: 8 errors, 0 warnings, 368 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 20/37: tcg/ppc: Parameterize the temps for tcg_out_tlb_read...
Checking PATCH 21/37: tcg/ppc: Split out tcg_out_call_int...
Checking PATCH 22/37: tcg/ppc: Add constraints for R7-R8...
Checking PATCH 23/37: tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool...
Checking PATCH 24/37: tcg/ppc: Force qemu_ld/st arguments into fixed registers...
ERROR: do not initialise statics to 0 or NULL
#52: FILE: tcg/ppc/tcg-target.inc.c:1749:
+    static bool is_be = false;

total: 1 errors, 0 warnings, 207 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 25/37: tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
ERROR: externs should be avoided in .c files
#262: FILE: tcg/ppc/tcg-target.inc.c:1690:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#263: FILE: tcg/ppc/tcg-target.inc.c:1691:
+    TCGReg datahi __attribute__((unused));

ERROR: externs should be avoided in .c files
#264: FILE: tcg/ppc/tcg-target.inc.c:1692:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#322: FILE: tcg/ppc/tcg-target.inc.c:1748:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#323: FILE: tcg/ppc/tcg-target.inc.c:1749:
+    TCGReg datahi __attribute__((unused));

ERROR: externs should be avoided in .c files
#324: FILE: tcg/ppc/tcg-target.inc.c:1750:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#325: FILE: tcg/ppc/tcg-target.inc.c:1751:
+    TCGReg addrhi __attribute__((unused));

total: 7 errors, 0 warnings, 414 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 26/37: tcg: Clean up generic bswap32...
Checking PATCH 27/37: tcg: Clean up generic bswap64...
Checking PATCH 28/37: tcg/optimize: Optimize bswap...
ERROR: spaces required around that ':' (ctx:VxE)
#21: FILE: tcg/optimize.c:356:
+    CASE_OP_32_64(bswap16):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#24: FILE: tcg/optimize.c:359:
+    CASE_OP_32_64(bswap32):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#37: FILE: tcg/optimize.c:1117:
+        CASE_OP_32_64(bswap16):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#38: FILE: tcg/optimize.c:1118:
+        CASE_OP_32_64(bswap32):
                               ^

total: 4 errors, 0 warnings, 24 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 29/37: tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP...
Checking PATCH 30/37: tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP...
Checking PATCH 31/37: tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false...
Checking PATCH 32/37: tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only...
ERROR: space prohibited after that '&' (ctx:WxW)
#206: FILE: tcg/arm/tcg-target.inc.c:1525:
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
                        ^

total: 1 errors, 0 warnings, 289 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 33/37: tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct...
Checking PATCH 34/37: tcg/i386: Restrict user-only qemu_st_i32 values to q-regs...
Checking PATCH 35/37: tcg/i386: Add setup_guest_base_seg for FreeBSD...
ERROR: space prohibited between function name and open parenthesis '('
#17: FILE: tcg/i386/tcg-target.inc.c:1821:
+#elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)

ERROR: space prohibited between function name and open parenthesis '('
#17: FILE: tcg/i386/tcg-target.inc.c:1821:
+#elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)

total: 2 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 36/37: tcg/i386: Require segment syscalls to succeed...
Checking PATCH 37/37: tcg/i386: Remove L constraint...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups
  2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
                   ` (37 preceding siblings ...)
  2018-11-23 21:04 ` [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups no-reply
@ 2018-11-26  0:30 ` Emilio G. Cota
  38 siblings, 0 replies; 55+ messages in thread
From: Emilio G. Cota @ 2018-11-26  0:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Alistair.Francis

On Fri, Nov 23, 2018 at 15:45:21 +0100, Richard Henderson wrote:
> This includes everything queued so far -- softmmu out-of-line
> patches

Reviewed-by: Emilio G. Cota <cota@braap.org>
for patches 1-9.

I am sad to report that on a Skylake host, this series gives
a ~10% average slowdown for x86_64-softmmu SPEC06int
(I'm reporting speedup, so <1 means slowdown):
  https://imgur.com/a/25iu8Yl

Turns out that despite the higher icache hit rate, the IPC
ends up being lower. For instance, here are perf counts when
running hmmer x3 right after booting up (bootup is included
in the counts, but hmmer is run 3 times in a row):

- Before:
   249,392,070,159      cycles
   781,327,593,681      instructions              #    3.13  insn per cycle
    85,914,418,873      branches
       242,572,820      branch-misses             #    0.28% of all branches
     1,567,954,032      L1-icache-load-misses

      70.559864567 seconds time elapsed

- After:
   277,806,651,701      cycles
   813,619,725,225      instructions              #    2.93  insn per cycle
   132,453,633,831      branches
       306,969,989      branch-misses             #    0.23% of all branches
     1,250,619,057      L1-icache-load-misses

      78.420517079 seconds time elapsed
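
The derived figures can be re-computed from the raw counters above. What
follows is a minimal C sketch, not part of the series: the helper names
(demo_ratio, demo_check_hmmer) are made up for illustration, and the
constants are simply the perf counts quoted above.

```c
#include <assert.h>

/* Hypothetical helper: plain ratio of two counter values. */
static double demo_ratio(double num, double den)
{
    return num / den;
}

/* Re-derive the figures quoted in the hmmer perf output above. */
static void demo_check_hmmer(void)
{
    /* Counters copied from the "Before" and "After" runs. */
    const double cyc_before = 249392070159.0, insn_before = 781327593681.0;
    const double cyc_after  = 277806651701.0, insn_after  = 813619725225.0;
    const double sec_before = 70.559864567,   sec_after   = 78.420517079;

    double ipc_before = demo_ratio(insn_before, cyc_before);  /* about 3.13 */
    double ipc_after  = demo_ratio(insn_after, cyc_after);    /* about 2.93 */
    double speedup    = demo_ratio(sec_before, sec_after);    /* about 0.90 */

    /* More instructions retire after the series, but at a lower IPC. */
    assert(insn_after > insn_before);
    assert(ipc_after < ipc_before);
    /* A speedup below 1 means a slowdown, matching the graphs. */
    assert(speedup < 1.0);
}
```

For this one workload that works out to about 4% more instructions
executed at roughly 6.5% lower IPC, i.e. about 10% more wall-clock time.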

On the bright side, on an older system (Sandy Bridge), I get
a fairly neutral average perf impact, with some workloads
speeding up and others slowing down:
  https://imgur.com/a/AokDbkm
(Note that v1 of this series gave an overall slowdown, so that's
progress.)

Given the above, perhaps the best way forward is to add a
configure flag to disable OOL thunks, unless you have any
further optimizations coming up.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-26  0:31   ` Emilio G. Cota
  0 siblings, 0 replies; 55+ messages in thread
From: Emilio G. Cota @ 2018-11-26  0:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Alistair.Francis

On Fri, Nov 23, 2018 at 15:45:25 +0100, Richard Henderson wrote:
> This variant of tcg-ldst.inc.c allows the entire thunk to be
> moved out-of-line, with caching across TBs within a region.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
(snip)
> +++ b/tcg/tcg-ldst-ool.inc.c
(snip)
> +typedef struct TCGLabelQemuLdstOol {
> +    QSIMPLEQ_ENTRY(TCGLabelQemuLdstOol) next;
> +    tcg_insn_unit *label;   /* label pointer to be updated */
> +    int reloc;              /* relocation type from label_ptr */
> +    intptr_t addend;        /* relocation addend from label_ptr */
> +    uint32_t key;           /* oi : is_64 : is_ld */
> +} TCGLabelQemuLdstOol;

Just a tiny nit, here we can move reloc down to plug a hole.
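
To illustrate the hole, here is a stand-alone sketch. It assumes that
QSIMPLEQ_ENTRY expands to a single pointer and uses made-up Demo* names
in place of the real QEMU types; only the field order is taken from the
quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in so the sketch compiles alone; the real type lives in QEMU. */
typedef uint32_t demo_insn_unit;

/* Field order as posted (the queue entry is modelled as one pointer). */
struct DemoLabelPosted {
    struct DemoLabelPosted *next;
    demo_insn_unit *label;
    int reloc;          /* on LP64, 4 bytes of padding follow */
    intptr_t addend;
    uint32_t key;       /* on LP64, 4 bytes of tail padding follow */
};

/* Suggested order: reloc sits next to key, so both share one 8-byte slot. */
struct DemoLabelPacked {
    struct DemoLabelPacked *next;
    demo_insn_unit *label;
    intptr_t addend;
    int reloc;
    uint32_t key;
};

/* Reordering can only shrink the struct or leave it unchanged. */
static_assert(sizeof(struct DemoLabelPacked) <= sizeof(struct DemoLabelPosted),
              "reordering must never grow the struct");
```

On a typical 64-bit host with 4-byte int this should take the struct
from 40 to 32 bytes; on a 32-bit host the two layouts are the same size.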

		Emilio


* Re: [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0 Richard Henderson
@ 2018-11-29 12:52   ` Alex Bennée
  2018-11-29 14:55     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 12:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> For x86_64, this can result in smaller code when manipulating
> TCG_TYPE_I32, as we can omit a REX prefix.

I take it you mean passing TCG_TYPE_I32 back and forth from the register
backing store in CPUEnv which TCG_AREG0 points at?
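
As general background on the REX remark (a reading of the commit
message, not the author's answer): in 64-bit mode an instruction needs a
REX prefix byte when it uses a 64-bit operand size or names any of
r8-r15. A simplified sketch of that rule, with hypothetical helper names
and ignoring index registers and the byte-register special cases:

```c
#include <assert.h>
#include <stdbool.h>

/* Hardware register numbers of the old and new choices for TCG_AREG0. */
enum { DEMO_REG_EBP = 5, DEMO_REG_R14 = 14 };

/*
 * Simplified rule, assuming 64-bit mode: a REX prefix is required for a
 * 64-bit operand size (REX.W) or when a register number is 8 or above
 * (REX.R for the data register, REX.B for the base register).
 */
static bool demo_needs_rex(bool op64, int data_reg, int base_reg)
{
    return op64 || data_reg >= 8 || base_reg >= 8;
}

static void demo_check_rex(void)
{
    /* 32-bit load into %eax (register 0) from a field relative to env. */
    assert(!demo_needs_rex(false, 0, DEMO_REG_EBP));  /* no prefix byte */
    assert(demo_needs_rex(false, 0, DEMO_REG_R14));   /* REX.B needed */
    /* 64-bit loads carry REX.W either way, so nothing is saved there. */
    assert(demo_needs_rex(true, 0, DEMO_REG_EBP));
}
```

Under this rule the saving applies only to 32-bit accesses whose other
register is also one of the low eight.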

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.h | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 9fdf37f23c..7488c3d869 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -84,6 +84,8 @@ typedef enum {
>      TCG_REG_RBP = TCG_REG_EBP,
>      TCG_REG_RSI = TCG_REG_ESI,
>      TCG_REG_RDI = TCG_REG_EDI,
> +
> +    TCG_AREG0 = TCG_REG_EBP,
>  } TCGReg;
>
>  /* used for function call generation */
> @@ -194,12 +196,6 @@ extern bool have_avx2;
>  #define TCG_TARGET_extract_i64_valid(ofs, len) \
>      (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
>
> -#if TCG_TARGET_REG_BITS == 64
> -# define TCG_AREG0 TCG_REG_R14
> -#else
> -# define TCG_AREG0 TCG_REG_EBP
> -#endif
> -
>  static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
>  {
>  }


--
Alex Bennée


* Re: [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum Richard Henderson
@ 2018-11-29 12:52   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 12:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/i386/tcg-target.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 7488c3d869..2441658865 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -86,10 +86,10 @@ typedef enum {
>      TCG_REG_RDI = TCG_REG_EDI,
>
>      TCG_AREG0 = TCG_REG_EBP,
> +    TCG_REG_CALL_STACK = TCG_REG_ESP
>  } TCGReg;
>
>  /* used for function call generation */
> -#define TCG_REG_CALL_STACK TCG_REG_ESP
>  #define TCG_TARGET_STACK_ALIGN 16
>  #if defined(_WIN64)
>  #define TCG_TARGET_CALL_STACK_OFFSET 32


--
Alex Bennée


* Re: [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc Richard Henderson
@ 2018-11-29 14:47   ` Alex Bennée
  2018-11-29 17:35     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> This moves the assert for success from inside patch_reloc
> to outside patch_reloc.  This touches all tcg backends.

s/outside/above/?

We also seem to be dropping a bunch of reloc_atomic functions (which are
no longer used?). Perhaps that should be a separate patch to make the
series cleaner?

>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/aarch64/tcg-target.inc.c | 44 ++++++++++++++-------------------
>  tcg/arm/tcg-target.inc.c     | 26 +++++++++-----------
>  tcg/i386/tcg-target.inc.c    | 17 +++++++------
>  tcg/mips/tcg-target.inc.c    | 29 +++++++++-------------
>  tcg/ppc/tcg-target.inc.c     | 47 ++++++++++++++++++++++--------------
>  tcg/s390/tcg-target.inc.c    | 37 +++++++++++++++++++---------
>  tcg/sparc/tcg-target.inc.c   | 13 ++++++----
>  tcg/tcg-pool.inc.c           |  5 +++-
>  tcg/tcg.c                    |  8 +++---
>  tcg/tci/tcg-target.inc.c     |  3 ++-
>  10 files changed, 125 insertions(+), 104 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 083592a4d7..30091f6a69 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -78,48 +78,40 @@ static const int tcg_target_call_oarg_regs[1] = {
>  #define TCG_REG_GUEST_BASE TCG_REG_X28
>  #endif
>
> -static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
> +static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
>  {
>      ptrdiff_t offset = target - code_ptr;
> -    tcg_debug_assert(offset == sextract64(offset, 0, 26));
> -    /* read instruction, mask away previous PC_REL26 parameter contents,
> -       set the proper offset, then write back the instruction. */
> -    *code_ptr = deposit32(*code_ptr, 0, 26, offset);
> +    if (offset == sextract64(offset, 0, 26)) {
> +        /* read instruction, mask away previous PC_REL26 parameter contents,
> +           set the proper offset, then write back the instruction. */
> +        *code_ptr = deposit32(*code_ptr, 0, 26, offset);
> +        return true;
> +    }
> +    return false;
>  }
>
> -static inline void reloc_pc26_atomic(tcg_insn_unit *code_ptr,
> -                                     tcg_insn_unit *target)
> +static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
>  {
>      ptrdiff_t offset = target - code_ptr;
> -    tcg_insn_unit insn;
> -    tcg_debug_assert(offset == sextract64(offset, 0, 26));
> -    /* read instruction, mask away previous PC_REL26 parameter contents,
> -       set the proper offset, then write back the instruction. */
> -    insn = atomic_read(code_ptr);
> -    atomic_set(code_ptr, deposit32(insn, 0, 26, offset));
> +    if (offset == sextract64(offset, 0, 19)) {
> +        *code_ptr = deposit32(*code_ptr, 5, 19, offset);
> +        return true;
> +    }
> +    return false;
>  }
>
> -static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
> -{
> -    ptrdiff_t offset = target - code_ptr;
> -    tcg_debug_assert(offset == sextract64(offset, 0, 19));
> -    *code_ptr = deposit32(*code_ptr, 5, 19, offset);
> -}
> -
> -static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                                 intptr_t value, intptr_t addend)
>  {
>      tcg_debug_assert(addend == 0);
>      switch (type) {
>      case R_AARCH64_JUMP26:
>      case R_AARCH64_CALL26:
> -        reloc_pc26(code_ptr, (tcg_insn_unit *)value);
> -        break;
> +        return reloc_pc26(code_ptr, (tcg_insn_unit *)value);
>      case R_AARCH64_CONDBR19:
> -        reloc_pc19(code_ptr, (tcg_insn_unit *)value);
> -        break;
> +        return reloc_pc19(code_ptr, (tcg_insn_unit *)value);
>      default:
> -        tcg_abort();
> +        g_assert_not_reached();
>      }
>  }
>
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index e1fbf465cb..80d174ef44 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -187,27 +187,23 @@ static const uint8_t tcg_cond_to_arm_cond[] = {
>      [TCG_COND_GTU] = COND_HI,
>  };
>
> -static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
> +static inline bool reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
>  {
>      ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
> -    *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
> +    if (offset == sextract32(offset, 0, 24)) {
> +        *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
> +        return true;
> +    }
> +    return false;
>  }
>
> -static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
> -{
> -    ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
> -    tcg_insn_unit insn = atomic_read(code_ptr);
> -    tcg_debug_assert(offset == sextract32(offset, 0, 24));
> -    atomic_set(code_ptr, deposit32(insn, 0, 24, offset));
> -}
> -
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      tcg_debug_assert(addend == 0);
>
>      if (type == R_ARM_PC24) {
> -        reloc_pc24(code_ptr, (tcg_insn_unit *)value);
> +        return reloc_pc24(code_ptr, (tcg_insn_unit *)value);
>      } else if (type == R_ARM_PC13) {
>          intptr_t diff = value - (uintptr_t)(code_ptr + 2);
>          tcg_insn_unit insn = *code_ptr;
> @@ -218,10 +214,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>              if (!u) {
>                  diff = -diff;
>              }
> -        } else {
> +        } else if (diff >= 0x1000 && diff < 0x100000) {
>              int rd = extract32(insn, 12, 4);
>              int rt = rd == TCG_REG_PC ? TCG_REG_TMP : rd;
> -            assert(diff >= 0x1000 && diff < 0x100000);
>              /* add rt, pc, #high */
>              *code_ptr++ = ((insn & 0xf0000000) | (1 << 25) | ARITH_ADD
>                             | (TCG_REG_PC << 16) | (rt << 12)
> @@ -230,10 +225,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>              insn = deposit32(insn, 12, 4, rt);
>              diff &= 0xfff;
>              u = 1;
> +        } else {
> +            return false;
>          }
>          insn = deposit32(insn, 23, 1, u);
>          insn = deposit32(insn, 0, 12, diff);
>          *code_ptr = insn;
> +        return true;
>      } else {
>          g_assert_not_reached();
>      }
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 436195894b..4f66a0c5ae 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -167,29 +167,32 @@ static bool have_lzcnt;
>
>  static tcg_insn_unit *tb_ret_addr;
>
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      value += addend;
> -    switch(type) {
> +
> +    switch (type) {
>      case R_386_PC32:
>          value -= (uintptr_t)code_ptr;
>          if (value != (int32_t)value) {
> -            tcg_abort();
> +            return false;
>          }
>          /* FALLTHRU */
>      case R_386_32:
>          tcg_patch32(code_ptr, value);
> -        break;
> +        return true;
> +
>      case R_386_PC8:
>          value -= (uintptr_t)code_ptr;
>          if (value != (int8_t)value) {
> -            tcg_abort();
> +            return false;
>          }
>          tcg_patch8(code_ptr, value);
> -        break;
> +        return true;
> +
>      default:
> -        tcg_abort();
> +        g_assert_not_reached();
>      }
>  }
>
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index cff525373b..e59c66b607 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -144,36 +144,29 @@ static tcg_insn_unit *bswap32_addr;
>  static tcg_insn_unit *bswap32u_addr;
>  static tcg_insn_unit *bswap64_addr;
>
> -static inline uint32_t reloc_pc16_val(tcg_insn_unit *pc, tcg_insn_unit *target)
> +static bool reloc_pc16_cond(tcg_insn_unit *pc, tcg_insn_unit *target)

What is the cond here anyway? Given we pass through below with a
function with the same signature it makes me wonder if there shouldn't
just be one reloc_pc16 function.

>  {
>      /* Let the compiler perform the right-shift as part of the arithmetic.  */
>      ptrdiff_t disp = target - (pc + 1);
> -    tcg_debug_assert(disp == (int16_t)disp);
> -    return disp & 0xffff;
> +    if (disp == (int16_t)disp) {
> +        *pc = deposit32(*pc, 0, 16, disp);
> +        return true;
> +    } else {
> +        return false;
> +    }
>  }
>
> -static inline void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
> +static bool reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
>  {
> -    *pc = deposit32(*pc, 0, 16, reloc_pc16_val(pc, target));
> +    tcg_debug_assert(reloc_pc16_cond(pc, target));

Having side effects in tcg_debug_assert seems like bad style. Besides,
should we not be passing the result up to the caller?

In fact I think this breaks the Shippable build anyway:

In file included from /root/src/github.com/stsquad/qemu/tcg/tcg.c:320:0:
/root/src/github.com/stsquad/qemu/tcg/mips/tcg-target.inc.c: In function 'reloc_pc16':
/root/src/github.com/stsquad/qemu/tcg/mips/tcg-target.inc.c:162:1: error: control reaches end of non-void function [-Werror=return-type]
 }
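
A sketch of the concern, using a hypothetical demo_debug_assert that
compiles to nothing outside debug builds. This is an assumption made for
illustration: whether the real tcg_debug_assert drops its argument
depends on its definition, which is not shown in this thread. All demo_*
names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debug-only assertion; NOT the real tcg_debug_assert. */
#ifdef DEMO_DEBUG
# define demo_debug_assert(X) assert(X)
#else
# define demo_debug_assert(X) ((void)0)
#endif

/* Patch the low 16 bits of an insn if the displacement fits. */
static bool demo_reloc16(uint32_t *insn, long disp)
{
    if (disp < INT16_MIN || disp > INT16_MAX) {
        return false;
    }
    *insn = (*insn & ~0xffffu) | ((uint32_t)disp & 0xffffu);
    return true;
}

/* Fragile: the patching happens only if the macro evaluates its argument. */
static void demo_patch_fragile(uint32_t *insn, long disp)
{
    demo_debug_assert(demo_reloc16(insn, disp));
}

/* Robust: do the work first, assert on the result, and hand it back. */
static bool demo_patch_checked(uint32_t *insn, long disp)
{
    bool ok = demo_reloc16(insn, disp);
    demo_debug_assert(ok);
    return ok;
}
```

The second form is the pattern the patch already uses in tcg.c and
tcg-pool.inc.c, and returning the result also avoids falling off the end
of a non-void function.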

>  }
>
> -static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
> -{
> -    tcg_debug_assert((((uintptr_t)pc ^ (uintptr_t)target) & 0xf0000000) == 0);
> -    return ((uintptr_t)target >> 2) & 0x3ffffff;
> -}
> -
> -static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
> -{
> -    *pc = deposit32(*pc, 0, 26, reloc_26_val(pc, target));
> -}
> -
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      tcg_debug_assert(type == R_MIPS_PC16);
>      tcg_debug_assert(addend == 0);
> -    reloc_pc16(code_ptr, (tcg_insn_unit *)value);
> +    return reloc_pc16_cond(code_ptr, (tcg_insn_unit *)value);

See above.

>  }
>
>  #define TCG_CT_CONST_ZERO 0x100
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index c2f729ee8f..656a9ff603 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -186,16 +186,14 @@ static inline bool in_range_b(tcg_target_long target)
>      return target == sextract64(target, 0, 26);
>  }
>
> -static uint32_t reloc_pc24_val(tcg_insn_unit *pc, tcg_insn_unit *target)
> +static bool reloc_pc24_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
>  {
>      ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
> -    tcg_debug_assert(in_range_b(disp));
> -    return disp & 0x3fffffc;
> -}
> -
> -static void reloc_pc24(tcg_insn_unit *pc, tcg_insn_unit *target)
> -{
> -    *pc = (*pc & ~0x3fffffc) | reloc_pc24_val(pc, target);
> +    if (in_range_b(disp)) {
> +        *pc = (*pc & ~0x3fffffc) | (disp & 0x3fffffc);
> +        return true;
> +    }
> +    return false;
>  }
>
>  static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
> @@ -205,10 +203,22 @@ static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
>      return disp & 0xfffc;
>  }
>
> +static bool reloc_pc14_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
> +{
> +    ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
> +    if (disp == (int16_t) disp) {
> +        *pc = (*pc & ~0xfffc) | (disp & 0xfffc);
> +        return true;
> +    }
> +    return false;
> +}
> +
> +#ifdef CONFIG_SOFTMMU
>  static void reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
>  {
> -    *pc = (*pc & ~0xfffc) | reloc_pc14_val(pc, target);
> +    tcg_debug_assert(reloc_pc14_cond(pc, target));

Again side effects in assert.

>  }
> +#endif
>
>  static inline void tcg_out_b_noaddr(TCGContext *s, int insn)
>  {
> @@ -525,7 +535,7 @@ static const uint32_t tcg_to_isel[] = {
>      [TCG_COND_GTU] = ISEL | BC_(7, CR_GT),
>  };
>
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      tcg_insn_unit *target;
> @@ -536,11 +546,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>
>      switch (type) {
>      case R_PPC_REL14:
> -        reloc_pc14(code_ptr, target);
> -        break;
> +        return reloc_pc14_cond(code_ptr, target);
>      case R_PPC_REL24:
> -        reloc_pc24(code_ptr, target);
> -        break;
> +        return reloc_pc24_cond(code_ptr, target);
>      case R_PPC_ADDR16:
>          /* We are abusing this relocation type.  This points to a pair
>             of insns, addis + load.  If the displacement is small, we
> @@ -552,11 +560,14 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>          } else {
>              int16_t lo = value;
>              int hi = value - lo;
> -            assert(hi + lo == value);
> -            code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
> -            code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
> +            if (hi + lo == value) {
> +                code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
> +                code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
> +            } else {
> +                return false;
> +            }
>          }
> -        break;
> +        return true;
>      default:
>          g_assert_not_reached();
>      }
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index 17c435ade5..a8d72dd630 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -366,7 +366,7 @@ static void * const qemu_st_helpers[16] = {
>  static tcg_insn_unit *tb_ret_addr;
>  uint64_t s390_facilities;
>
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      intptr_t pcrel2;
> @@ -377,22 +377,35 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>
>      switch (type) {
>      case R_390_PC16DBL:
> -        assert(pcrel2 == (int16_t)pcrel2);
> -        tcg_patch16(code_ptr, pcrel2);
> +        if (pcrel2 == (int16_t)pcrel2) {
> +            tcg_patch16(code_ptr, pcrel2);
> +            return true;
> +        }
>          break;
>      case R_390_PC32DBL:
> -        assert(pcrel2 == (int32_t)pcrel2);
> -        tcg_patch32(code_ptr, pcrel2);
> +        if (pcrel2 == (int32_t)pcrel2) {
> +            tcg_patch32(code_ptr, pcrel2);
> +            return true;
> +        }
>          break;
>      case R_390_20:
> -        assert(value == sextract64(value, 0, 20));
> -        old = *(uint32_t *)code_ptr & 0xf00000ff;
> -        old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
> -        tcg_patch32(code_ptr, old);
> +        if (value == sextract64(value, 0, 20)) {
> +            old = *(uint32_t *)code_ptr & 0xf00000ff;
> +            old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
> +            tcg_patch32(code_ptr, old);
> +            return true;
> +        }
>          break;
>      default:
>          g_assert_not_reached();
>      }
> +    return false;
> +}
> +
> +static void patch_reloc_force(tcg_insn_unit *code_ptr, int type,
> +                              intptr_t value, intptr_t addend)
> +{
> +    tcg_debug_assert(patch_reloc(code_ptr, type, value, addend));

Side effect in assert.

Also, as patch_reloc_force is only called for softmmu, it needs a guard
to stop the compiler complaining for a linux-user build:


In file included from /root/src/github.com/stsquad/qemu/tcg/tcg.c:320:0:
/root/src/github.com/stsquad/qemu/tcg/s390/tcg-target.inc.c:405:13: error: 'patch_reloc_force' defined but not used [-Werror=unused-function]
 static void patch_reloc_force(tcg_insn_unit *code_ptr, int type,
             ^~~~~~~~~~~~~~~~~
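
A minimal sketch of such a guard, mirroring the #ifdef CONFIG_SOFTMMU
that the ppc hunk puts around reloc_pc14. The demo_* functions are
hypothetical stand-ins, not the s390 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fallible patcher: stores value if it fits in 16 bits. */
static bool demo_patch16(int16_t *slot, long value)
{
    if (value < INT16_MIN || value > INT16_MAX) {
        return false;
    }
    *slot = (int16_t)value;
    return true;
}

/*
 * The asserting wrapper is needed only by the softmmu slow paths, so it
 * is compiled only there; a user-only build never sees an unused static.
 */
#ifdef CONFIG_SOFTMMU
static void demo_patch16_force(int16_t *slot, long value)
{
    bool ok = demo_patch16(slot, value);
    assert(ok);
    (void)ok;   /* keeps NDEBUG builds free of unused-variable warnings */
}
#endif
```

Calling the patcher outside the assertion also sidesteps the side-effect
issue noted above.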

>  }
>
>  /* parse target specific constraints */
> @@ -1618,7 +1631,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>      TCGMemOpIdx oi = lb->oi;
>      TCGMemOp opc = get_memop(oi);
>
> -    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
> +    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
> +                      (intptr_t)s->code_ptr, 2);
>
>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
>      if (TARGET_LONG_BITS == 64) {
> @@ -1639,7 +1653,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>      TCGMemOpIdx oi = lb->oi;
>      TCGMemOp opc = get_memop(oi);
>
> -    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
> +    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
> +                      (intptr_t)s->code_ptr, 2);
>
>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
>      if (TARGET_LONG_BITS == 64) {
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 04bdc3df5e..111f3312d3 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -291,32 +291,34 @@ static inline int check_fit_i32(int32_t val, unsigned int bits)
>  # define check_fit_ptr  check_fit_i32
>  #endif
>
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      uint32_t insn = *code_ptr;
>      intptr_t pcrel;
> +    bool ret;
>
>      value += addend;
>      pcrel = tcg_ptr_byte_diff((tcg_insn_unit *)value, code_ptr);
>
>      switch (type) {
>      case R_SPARC_WDISP16:
> -        assert(check_fit_ptr(pcrel >> 2, 16));
> +        ret = check_fit_ptr(pcrel >> 2, 16);
>          insn &= ~INSN_OFF16(-1);
>          insn |= INSN_OFF16(pcrel);
>          break;
>      case R_SPARC_WDISP19:
> -        assert(check_fit_ptr(pcrel >> 2, 19));
> +        ret = check_fit_ptr(pcrel >> 2, 19);
>          insn &= ~INSN_OFF19(-1);
>          insn |= INSN_OFF19(pcrel);
>          break;
>      case R_SPARC_13:
>          /* Note that we're abusing this reloc type for our own needs.  */
> +        ret = true;
>          if (!check_fit_ptr(value, 13)) {
>              int adj = (value > 0 ? 0xff8 : -0x1000);
>              value -= adj;
> -            assert(check_fit_ptr(value, 13));
> +            ret = check_fit_ptr(value, 13);
>              *code_ptr++ = (ARITH_ADD | INSN_RD(TCG_REG_T2)
>                             | INSN_RS1(TCG_REG_TB) | INSN_IMM13(adj));
>              insn ^= INSN_RS1(TCG_REG_TB) ^ INSN_RS1(TCG_REG_T2);
> @@ -328,12 +330,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>          /* Note that we're abusing this reloc type for our own needs.  */
>          code_ptr[0] = deposit32(code_ptr[0], 0, 22, value >> 10);
>          code_ptr[1] = deposit32(code_ptr[1], 0, 10, value);
> -        return;
> +        return value == (intptr_t)(uint32_t)value;
>      default:
>          g_assert_not_reached();
>      }
>
>      *code_ptr = insn;
> +    return ret;
>  }
>
>  /* parse target specific constraints */
> diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c
> index 7af5513ff3..ab8f6df8b0 100644
> --- a/tcg/tcg-pool.inc.c
> +++ b/tcg/tcg-pool.inc.c
> @@ -140,6 +140,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
>
>      for (; p != NULL; p = p->next) {
>          size_t size = sizeof(tcg_target_ulong) * p->nlong;
> +        bool ok;
> +
>          if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) {
>              if (unlikely(a > s->code_gen_highwater)) {
>                  return false;
> @@ -148,7 +150,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
>              a += size;
>              l = p;
>          }
> -        patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
> +        ok = patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
> +        tcg_debug_assert(ok);
>      }
>
>      s->code_ptr = a;
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index e85133ef05..54f1272187 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -66,7 +66,7 @@
>  static void tcg_target_init(TCGContext *s);
>  static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode);
>  static void tcg_target_qemu_prologue(TCGContext *s);
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend);
>
>  /* The CIE and FDE header definitions will be common to all hosts.  */
> @@ -268,7 +268,8 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
>          /* FIXME: This may break relocations on RISC targets that
>             modify instruction fields in place.  The caller may not have
>             written the initial value.  */
> -        patch_reloc(code_ptr, type, l->u.value, addend);
> +        bool ok = patch_reloc(code_ptr, type, l->u.value, addend);
> +        tcg_debug_assert(ok);
>      } else {
>          /* add a new relocation entry */
>          r = tcg_malloc(sizeof(TCGRelocation));
> @@ -288,7 +289,8 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
>      tcg_debug_assert(!l->has_value);
>
>      for (r = l->u.first_reloc; r != NULL; r = r->next) {
> -        patch_reloc(r->ptr, r->type, value, r->addend);
> +        bool ok = patch_reloc(r->ptr, r->type, value, r->addend);
> +        tcg_debug_assert(ok);
>      }
>
>      l->has_value = 1;
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 62ed097254..0015a98485 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -369,7 +369,7 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  };
>  #endif
>
> -static void patch_reloc(tcg_insn_unit *code_ptr, int type,
> +static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
>      /* tcg_out_reloc always uses the same type, addend. */
> @@ -381,6 +381,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>      } else {
>          tcg_patch64(code_ptr, value);
>      }
> +    return true;
>  }
>
>  /* Parse target specific constraints. */
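
For readers skimming the diff above, the pattern being introduced is that
patch_reloc reports whether the value fits the instruction field and the
caller decides how to react (for now, tcg_debug_assert). A minimal
standalone sketch of that shape follows. The names fits_signed and
patch_imm13_demo, and the 13-bit field layout, are invented for
illustration and are not taken from any backend in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if value is representable as a signed
   two's complement field of the given width (1 <= bits <= 63).  */
bool fits_signed(int64_t value, unsigned bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);
    return value >= -lim && value < lim;
}

/* Hypothetical relocation patcher: store a 13-bit signed displacement
   in the low bits of *insn.  Instead of asserting, report failure and
   leave the instruction untouched.  */
bool patch_imm13_demo(uint32_t *insn, int64_t value)
{
    if (!fits_signed(value, 13)) {
        return false;
    }
    *insn = (*insn & ~0x1fffu) | ((uint32_t)value & 0x1fffu);
    return true;
}
```

A caller that cannot recover yet can simply assert on the result, which
is what the hunks in tcg.c and tcg-pool.inc.c do.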


--
Alex Bennée

* Re: [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0
  2018-11-29 12:52   ` Alex Bennée
@ 2018-11-29 14:55     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-29 14:55 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Alistair.Francis

On 11/29/18 4:52 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> For x86_64, this can result in smaller code when manipulating
>> TCG_TYPE_I32, as we can omit a REX prefix.
> 
> I take it you mean passing TCG_TYPE_I32 back and forth from the register
> backing store in CPUEnv which TCG_AREG0 points at?

Yes, exactly.  Perhaps I should expand my comment...
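
As a rough illustration of the size argument: a 32-bit load of the form
"mov disp8(%base), %reg" needs a REX prefix byte only when it touches one
of r8-r15 or uses a 64-bit operand size. The sketch below computes the
encoded length under the assumption of an 8-bit displacement; the
function name and the enum are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 register numbers: 0-7 are the legacy registers, 8-15 need REX. */
enum { REG_RAX = 0, REG_RSP = 4, REG_RBP = 5, REG_R12 = 12, REG_R14 = 14 };

/* Length in bytes of "mov disp8(base), reg" (opcode 0x8B).
   is64 selects a 64-bit operand size (REX.W).  */
int mov_load_disp8_len(int reg, int base, bool is64)
{
    int len = 1 + 1 + 1;                /* opcode + ModRM + disp8 */

    if (is64 || reg >= 8 || base >= 8) {
        len += 1;                       /* REX prefix */
    }
    if ((base & 7) == 4) {
        len += 1;                       /* rsp/r12 as base need a SIB byte */
    }
    return len;
}
```

With %rbp as the base a 32-bit load into %eax is 3 bytes; with a high
register such as %r14 as the base it is 4.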


r~

* Re: [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9 Richard Henderson
@ 2018-11-29 15:00   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 15:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> These are function call arguments for x86_64 we will need soon.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.inc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 4f66a0c5ae..8aef66e430 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -233,6 +233,14 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI);
>          break;
> +    case 'E': /* "Eight", r8 */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set_reg(ct->u.regs, TCG_REG_R8);
> +        break;
> +    case 'N': /* "Nine", r9 */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set_reg(ct->u.regs, TCG_REG_R9);
> +        break;

It would be nice to flesh out the missing comments in tcg.h:

#define TCG_CT_ALIAS  0x80
#define TCG_CT_IALIAS 0x40
#define TCG_CT_NEWREG 0x20 /* output requires a new register */
#define TCG_CT_REG    0x01
#define TCG_CT_CONST  0x02 /* any constant of register size */

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>      case 'q':
>          /* A register that can be used as a byte operand.  */
>          ct->ct |= TCG_CT_REG;
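
For anyone unfamiliar with how these single-letter constraints work, here
is a simplified standalone sketch: each letter sets a "must be a
register" flag and adds exactly one register to the set of acceptable
registers. All Demo* names are invented for this example, and the meaning
given for the flag is my reading rather than documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { DEMO_REG_EDI = 7, DEMO_REG_R8 = 8, DEMO_REG_R9 = 9 };

#define DEMO_CT_REG 0x01   /* operand must live in a register from 'regs' */

typedef struct {
    unsigned ct;           /* constraint flags */
    uint32_t regs;         /* bitmask of acceptable registers */
} DemoConstraint;

/* Parse one constraint letter; returns false for letters this sketch
   does not know about.  */
bool demo_parse_constraint(DemoConstraint *c, char letter)
{
    int reg;

    switch (letter) {
    case 'D': reg = DEMO_REG_EDI; break;
    case 'E': reg = DEMO_REG_R8;  break;   /* "Eight" */
    case 'N': reg = DEMO_REG_R9;  break;   /* "Nine" */
    default:
        return false;
    }
    c->ct |= DEMO_CT_REG;
    c->regs |= UINT32_C(1) << reg;
    return true;
}
```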


--
Alex Bennée

* Re: [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
@ 2018-11-29 16:34   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 16:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> We will shortly be asking the hot path not to assume TCG_REG_L1
> for the host base address.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/i386/tcg-target.inc.c | 56 ++++++++++++++++++++-------------------
>  1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 8aef66e430..3234a8d8bf 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -1614,9 +1614,9 @@ static void * const qemu_st_helpers[16] = {
>
>     First argument register is clobbered.  */
>
> -static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
> -                                    int mem_index, TCGMemOp opc,
> -                                    tcg_insn_unit **label_ptr, int which)
> +static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
> +                               int mem_index, TCGMemOp opc,
> +                               tcg_insn_unit **label_ptr, int which)
>  {
>      const TCGReg r0 = TCG_REG_L0;
>      const TCGReg r1 = TCG_REG_L1;
> @@ -1696,6 +1696,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>      /* add addend(r0), r1 */
>      tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
>                           offsetof(CPUTLBEntry, addend) - which);
> +
> +    return r1;
>  }
>
>  /*
> @@ -2001,10 +2003,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>      TCGReg addrhi __attribute__((unused));
>      TCGMemOpIdx oi;
>      TCGMemOp opc;
> -#if defined(CONFIG_SOFTMMU)
> -    int mem_index;
> -    tcg_insn_unit *label_ptr[2];
> -#endif
>
>      datalo = *args++;
>      datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
> @@ -2014,17 +2012,21 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>      opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> -    mem_index = get_mmuidx(oi);
> +    {
> +        int mem_index = get_mmuidx(oi);
> +        tcg_insn_unit *label_ptr[2];
> +        TCGReg base;
>
> -    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> -                     label_ptr, offsetof(CPUTLBEntry, addr_read));
> +        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> +                                label_ptr, offsetof(CPUTLBEntry, addr_read));
>
> -    /* TLB Hit.  */
> -    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
> +        /* TLB Hit.  */
> +        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
>
> -    /* Record the current context of a load into ldst label */
> -    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
> -                        s->code_ptr, label_ptr);
> +        /* Record the current context of a load into ldst label */
> +        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
> +                            s->code_ptr, label_ptr);
> +    }
>  #else
>      {
>          int32_t offset = guest_base;
> @@ -2141,10 +2143,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>      TCGReg addrhi __attribute__((unused));
>      TCGMemOpIdx oi;
>      TCGMemOp opc;
> -#if defined(CONFIG_SOFTMMU)
> -    int mem_index;
> -    tcg_insn_unit *label_ptr[2];
> -#endif
>
>      datalo = *args++;
>      datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
> @@ -2154,17 +2152,21 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>      opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> -    mem_index = get_mmuidx(oi);
> +    {
> +        int mem_index = get_mmuidx(oi);
> +        tcg_insn_unit *label_ptr[2];
> +        TCGReg base;
>
> -    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> -                     label_ptr, offsetof(CPUTLBEntry, addr_write));
> +        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> +                                label_ptr, offsetof(CPUTLBEntry, addr_write));
>
> -    /* TLB Hit.  */
> -    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
> +        /* TLB Hit.  */
> +        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
>
> -    /* Record the current context of a store into ldst label */
> -    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
> -                        s->code_ptr, label_ptr);
> +        /* Record the current context of a store into ldst label */
> +        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
> +                            s->code_ptr, label_ptr);
> +    }
>  #else
>      {
>          int32_t offset = guest_base;


--
Alex Bennée

* Re: [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
@ 2018-11-29 17:13   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-29 17:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> We will shortly be forcing qemu_ld/st arguments into registers
> that match the function call abi of the host, which means that
> the temps must be elsewhere.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.inc.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 3234a8d8bf..07df4b2b12 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -121,12 +121,16 @@ static const int tcg_target_call_oarg_regs[] = {
>  #define TCG_CT_CONST_I32 0x400
>  #define TCG_CT_CONST_WSZ 0x800
>
> -/* Registers used with L constraint, which are the first argument
> -   registers on x86_64, and two random call clobbered registers on
> -   i386. */
> +/* Registers used with L constraint, which are two random
> + * call clobbered registers.  These should be free.
> + */

"These should be free by the time we have committed to making a procedure
call and won't be needed afterwards."?

>  #if TCG_TARGET_REG_BITS == 64
> -# define TCG_REG_L0 tcg_target_call_iarg_regs[0]
> -# define TCG_REG_L1 tcg_target_call_iarg_regs[1]

I guess we don't need this type of assignment enough to have a
tcg_target_call_clobber_regs array we can fill from?

/me digs deeper

ahh I see we have tcg_target_call_clobber_regs but that's a bitmap for
use by the register allocator... never mind.

> +# define TCG_REG_L0   TCG_REG_RAX
> +# ifdef _WIN64
> +#  define TCG_REG_L1  TCG_REG_R10
> +# else
> +#  define TCG_REG_L1  TCG_REG_RDI
> +# endif
>  #else
>  # define TCG_REG_L0 TCG_REG_EAX
>  # define TCG_REG_L1 TCG_REG_EDX
> @@ -1628,6 +1632,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>      unsigned a_mask = (1 << a_bits) - 1;
>      unsigned s_mask = (1 << s_bits) - 1;
>      target_ulong tlb_mask;
> +    TCGReg base;
>
>      if (TCG_TARGET_REG_BITS == 64) {
>          if (TARGET_LONG_BITS == 64) {
> @@ -1674,7 +1679,12 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>         before the fastpath ADDQ below.  For 64-bit guest and x32 host, MOVQ
>         copies the entire guest address for the slow path, while truncation
>         for the 32-bit host happens with the fastpath ADDL below.  */
> -    tcg_out_mov(s, ttype, r1, addrlo);
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        base = tcg_target_call_iarg_regs[1];
> +    } else {
> +        base = r1;
> +    }
> +    tcg_out_mov(s, ttype, base, addrlo);
>
>      /* jne slow_path */
>      tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> @@ -1693,11 +1703,11 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>
>      /* TLB Hit.  */
>
> -    /* add addend(r0), r1 */
> -    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
> +    /* add addend(r0), base */
> +    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, base, r0,
>                           offsetof(CPUTLBEntry, addend) - which);
>
> -    return r1;
> +    return base;
>  }
>
>  /*

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
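
A small sketch of the overlap check implied by the commit message, using
the standard x86 register numbers and the SysV and Win64 integer argument
orders. The function name and the "first" parameter are invented here.
Note that %rdi is SysV argument 0, so it only counts as free if slot 0 is
excluded; my reading (an assumption, based on the comment in patch 08
about the first argument being reserved for ENV) is that slot 0 is filled
late, which is why %rdi can serve as a temp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { RAX = 0, RCX = 1, RDX = 2, RSI = 6, RDI = 7, R8 = 8, R9 = 9, R10 = 10 };

const int sysv_iargs[]  = { RDI, RSI, RDX, RCX, R8, R9 };
const int win64_iargs[] = { RCX, RDX, R8, R9 };

/* True if 'reg' is one of the argument registers iargs[first..n-1]. */
bool overlaps_args(int reg, const int *iargs, size_t n, size_t first)
{
    for (size_t i = first; i < n; i++) {
        if (iargs[i] == reg) {
            return true;
        }
    }
    return false;
}
```

With these tables, RAX and R10 never collide with an argument register,
while RDI is clear only when the scan starts at slot 1.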

--
Alex Bennée

* Re: [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc
  2018-11-29 14:47   ` Alex Bennée
@ 2018-11-29 17:35     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2018-11-29 17:35 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Alistair.Francis

On 11/29/18 6:47 AM, Alex Bennée wrote:
> We also seem to be dropping a bunch of reloc_atomic functions (which are
> no longer used?). Perhaps that should be a separate patch to make the
> series cleaner?

Yes, they're no longer used.

I should do a full cleanup of "atomic" stuff that's no longer required due to
the fact that we no longer re-translate for unwinding.


r~

* Re: [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-30 16:16   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-30 16:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> This is an incremental step toward moving the qemu_ld/st
> code sequence out of line.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.inc.c | 203 +++++++++++++++++++++++++++++++-------
>  1 file changed, 169 insertions(+), 34 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 07df4b2b12..50e5dc31b3 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -171,6 +171,80 @@ static bool have_lzcnt;
>
>  static tcg_insn_unit *tb_ret_addr;
>
> +typedef enum {
> +    ARG_ADDR,
> +    ARG_STVAL,
> +    ARG_LDVAL,
> +} QemuMemArgType;
> +
> +#ifdef CONFIG_SOFTMMU
> +/*
> + * Constraint to choose a particular register.  This is used for softmmu
> + * loads and stores.  Registers with no assignment get an empty string.
> + */
> +static const char * const one_reg_constraint[TCG_TARGET_NB_REGS] = {
> +    [TCG_REG_EAX] = "a",
> +    [TCG_REG_EBX] = "b",
> +    [TCG_REG_ECX] = "c",
> +    [TCG_REG_EDX] = "d",
> +    [TCG_REG_ESI] = "S",
> +    [TCG_REG_EDI] = "D",
> +#if TCG_TARGET_REG_BITS == 64
> +    [TCG_REG_R8]  = "E",
> +    [TCG_REG_R9]  = "N",
> +#endif
> +};
> +
> +/*
> + * Calling convention for the softmmu load and store thunks.
> + *
> + * For 64-bit, we mostly use the host calling convention, therefore the
> + * real first argument is reserved for the ENV parameter that is passed
> + * on to the slow path helpers.
> + *
> + * For 32-bit, the host calling convention is stack based; we invent a
> + * private convention that uses 4 of the 6 available host registers.
> + * We reserve EAX and EDX as temporaries for use by the thunk, we require
> + * INDEX_op_qemu_st_i32 to have a 'q' register from which to store, and
> + * further complicate this last by wanting a call-clobbered for that store.
> + * The 'q' requirement allows MO_8 stores at all; the call-clobbered part
> + * allows bswap to operate in-place, clobbering the input.
> + */
> +static TCGReg softmmu_arg(QemuMemArgType type, bool is_64, int hi)
> +{
> +    switch (type) {
> +    case ARG_ADDR:
> +        tcg_debug_assert(!hi || TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
> +        if (TCG_TARGET_REG_BITS == 64) {
> +            return tcg_target_call_iarg_regs[1];
> +        } else {
> +            return hi ? TCG_REG_EDI : TCG_REG_ESI;
> +        }
> +    case ARG_STVAL:
> +        tcg_debug_assert(!hi || (TCG_TARGET_REG_BITS == 32 && is_64));
> +        if (TCG_TARGET_REG_BITS == 64) {
> +            return tcg_target_call_iarg_regs[2];
> +        } else {
> +            return hi ? TCG_REG_EBX : TCG_REG_ECX;
> +        }
> +    case ARG_LDVAL:
> +        tcg_debug_assert(!hi || (TCG_TARGET_REG_BITS == 32 && is_64));
> +        return tcg_target_call_oarg_regs[hi];
> +    }
> +    g_assert_not_reached();
> +}
> +
> +static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
> +{
> +    return one_reg_constraint[softmmu_arg(type, is_64, hi)];
> +}
> +#else
> +static const char *constrain_memop_arg(QemuMemArgType type, bool is_64, int hi)
> +{
> +    return "L";
> +}
> +#endif /* CONFIG_SOFTMMU */
> +
>  static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend)
>  {
> @@ -1680,11 +1754,15 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>         copies the entire guest address for the slow path, while truncation
>         for the 32-bit host happens with the fastpath ADDL below.  */
>      if (TCG_TARGET_REG_BITS == 64) {
> -        base = tcg_target_call_iarg_regs[1];
> +        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
> +        if (TARGET_LONG_BITS == 32) {
> +            tcg_out_ext32u(s, addrlo, addrlo);
> +        }
> +        base = addrlo;
>      } else {
>          base = r1;
> +        tcg_out_mov(s, ttype, base, addrlo);
>      }
> -    tcg_out_mov(s, ttype, base, addrlo);
>
>      /* jne slow_path */
>      tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> @@ -2009,16 +2087,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
>     common. */
>  static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>  {
> -    TCGReg datalo, datahi, addrlo;
> -    TCGReg addrhi __attribute__((unused));
> +    TCGReg datalo, addrlo;
> +    TCGReg datahi __attribute__((unused)) = -1;
> +    TCGReg addrhi __attribute__((unused)) = -1;
>      TCGMemOpIdx oi;
>      TCGMemOp opc;
> +    int i = -1;
>
> -    datalo = *args++;
> -    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
> -    addrlo = *args++;
> -    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
> -    oi = *args++;
> +    datalo = args[++i];

Starting at i = -1 and pre-incrementing seems a little unnatural here
compared to the more usual i = 0 with i++, unless there was a specific
reason to have i in the range of 2-4?
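
To make the comparison concrete, here are the two styles side by side in
a standalone sketch. The struct and function names are invented, and the
two booleans stand in for the TCG_TARGET_REG_BITS / TARGET_LONG_BITS
conditions. Both unpack the same operands; the only difference is that i
ends as the index of the last operand consumed in the first form and as
the count of operands consumed in the second.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int datalo, datahi, addrlo, addrhi, oi; } DemoOperands;

/* Style used in the patch: start at -1 and pre-increment. */
DemoOperands unpack_pre(const int *args, bool has_datahi, bool has_addrhi)
{
    DemoOperands o = { .datahi = -1, .addrhi = -1 };
    int i = -1;

    o.datalo = args[++i];
    if (has_datahi) {
        o.datahi = args[++i];
    }
    o.addrlo = args[++i];
    if (has_addrhi) {
        o.addrhi = args[++i];
    }
    o.oi = args[++i];
    return o;
}

/* Alternative: start at 0 and post-increment. */
DemoOperands unpack_post(const int *args, bool has_datahi, bool has_addrhi)
{
    DemoOperands o = { .datahi = -1, .addrhi = -1 };
    int i = 0;

    o.datalo = args[i++];
    if (has_datahi) {
        o.datahi = args[i++];
    }
    o.addrlo = args[i++];
    if (has_addrhi) {
        o.addrhi = args[i++];
    }
    o.oi = args[i++];
    return o;
}
```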

> +    if (TCG_TARGET_REG_BITS == 32 && is64) {
> +        datahi = args[++i];
> +    }
> +    addrlo = args[++i];
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        addrhi = args[++i];
> +    }
> +    oi = args[++i];
>      opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> @@ -2027,6 +2111,15 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>          tcg_insn_unit *label_ptr[2];
>          TCGReg base;
>
> +        tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
> +        if (TCG_TARGET_REG_BITS == 32 && is64) {
> +            tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
> +        }
> +        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> +        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> +        }
> +
>          base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
>                                  label_ptr, offsetof(CPUTLBEntry, addr_read));
>
> @@ -2149,16 +2242,22 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
>
>  static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>  {
> -    TCGReg datalo, datahi, addrlo;
> -    TCGReg addrhi __attribute__((unused));
> +    TCGReg datalo, addrlo;
> +    TCGReg datahi __attribute__((unused)) = -1;
> +    TCGReg addrhi __attribute__((unused)) = -1;
>      TCGMemOpIdx oi;
>      TCGMemOp opc;
> +    int i = -1;

And again here

>
> -    datalo = *args++;
> -    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
> -    addrlo = *args++;
> -    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
> -    oi = *args++;
> +    datalo = args[++i];
> +    if (TCG_TARGET_REG_BITS == 32 && is64) {
> +        datahi = args[++i];
> +    }
> +    addrlo = args[++i];
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        addrhi = args[++i];
> +    }
> +    oi = args[++i];
>      opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> @@ -2167,6 +2266,15 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>          tcg_insn_unit *label_ptr[2];
>          TCGReg base;
>
> +        tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
> +        if (TCG_TARGET_REG_BITS == 32 && is64) {
> +            tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
> +        }
> +        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> +        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> +        }
> +
>          base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
>                                  label_ptr, offsetof(CPUTLBEntry, addr_write));
>
> @@ -2836,15 +2944,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
>      static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
>      static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
>      static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
> -    static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
> -    static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
> -    static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
> -    static const TCGTargetOpDef r_r_L = { .args_ct_str = { "r", "r", "L" } };
> -    static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
> -    static const TCGTargetOpDef r_r_L_L
> -        = { .args_ct_str = { "r", "r", "L", "L" } };
> -    static const TCGTargetOpDef L_L_L_L
> -        = { .args_ct_str = { "L", "L", "L", "L" } };
>      static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } };
>      static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } };
>      static const TCGTargetOpDef x_x_x_x
> @@ -3026,17 +3125,53 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
>          }
>
>      case INDEX_op_qemu_ld_i32:
> -        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_L : &r_L_L;
> -    case INDEX_op_qemu_st_i32:
> -        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L : &L_L_L;
> +        {
> +            static TCGTargetOpDef ld32;
> +            int i;
> +
> +            ld32.args_ct_str[0] = constrain_memop_arg(ARG_LDVAL, 0, 0);
> +            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {

This formulation is a bit weird to follow; at first I thought it was
going to overflow TCG_MAX_OP_ARGS until I realised what it was doing.
Maybe a helper function would be a little clearer:

               for (i = 0; i < tcg_target_regs_for(TARGET_LONG_BITS); ++i)

Same comment applies below.
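
A possible shape for such a helper, as a standalone sketch: the name
demo_regs_for and the explicit reg_bits parameter are hypothetical (the
suggestion above only names the call, not its definition). The second
function counts iterations of the loop form used in the patch so the two
can be compared directly.

```c
#include <assert.h>

/* Hypothetical helper: how many host registers of reg_bits each are
   needed to hold a value that is 'bits' wide (rounding up).  */
int demo_regs_for(int bits, int reg_bits)
{
    return (bits + reg_bits - 1) / reg_bits;
}

/* The loop condition used in the patch, reduced to an iteration count. */
int demo_loop_count(int bits, int reg_bits)
{
    int i;

    for (i = 0; i * reg_bits < bits; ++i) {
        continue;
    }
    return i;
}
```

For the combinations that matter here (32 or 64 bit guest on a 32 or 64
bit host) both give 1 except for a 64-bit value on a 32-bit host, which
gives 2.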

> +                ld32.args_ct_str[i + 1] = constrain_memop_arg(ARG_ADDR, 0, i);
> +            }
> +            return &ld32;
> +        }
>      case INDEX_op_qemu_ld_i64:
> -        return (TCG_TARGET_REG_BITS == 64 ? &r_L
> -                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_r_L
> -                : &r_r_L_L);
> +        {
> +            static TCGTargetOpDef ld64;
> +            int i, j = 0;
> +
> +            for (i = 0; i * TCG_TARGET_REG_BITS < 64; ++i) {
> +                ld64.args_ct_str[j++] = constrain_memop_arg(ARG_LDVAL, 1, i);
> +            }
> +            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
> +                ld64.args_ct_str[j++] = constrain_memop_arg(ARG_ADDR, 0, i);
> +            }
> +            return &ld64;
> +        }
> +    case INDEX_op_qemu_st_i32:
> +        {
> +            static TCGTargetOpDef st32;
> +            int i;
> +
> +            st32.args_ct_str[0] = constrain_memop_arg(ARG_STVAL, 0, 0);
> +            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
> +                st32.args_ct_str[i + 1] = constrain_memop_arg(ARG_ADDR, 0, i);
> +            }
> +            return &st32;
> +        }
>      case INDEX_op_qemu_st_i64:
> -        return (TCG_TARGET_REG_BITS == 64 ? &L_L
> -                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L_L
> -                : &L_L_L_L);
> +        {
> +            static TCGTargetOpDef st64;
> +            int i, j = 0;
> +
> +            for (i = 0; i * TCG_TARGET_REG_BITS < 64; ++i) {
> +                st64.args_ct_str[j++] = constrain_memop_arg(ARG_STVAL, 1, i);
> +            }
> +            for (i = 0; i * TCG_TARGET_REG_BITS < TARGET_LONG_BITS; ++i) {
> +                st64.args_ct_str[j++] = constrain_memop_arg(ARG_ADDR, 0, i);
> +            }
> +            return &st64;
> +        }
>
>      case INDEX_op_brcond2_i32:
>          {


--
Alex Bennée

* Re: [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-30 17:22   ` Alex Bennée
  2018-11-30 17:37     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2018-11-30 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> Move the entire memory operation out of line.

Given Emilio's numbers and the variability on x86, is it likely we will
want to support both options?

>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.h     |   2 +-
>  tcg/i386/tcg-target.inc.c | 391 ++++++++++++++++----------------------
>  2 files changed, 162 insertions(+), 231 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 2441658865..1b2d4e1b0d 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -220,7 +220,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>
>  #ifdef CONFIG_SOFTMMU
> -#define TCG_TARGET_NEED_LDST_LABELS
> +#define TCG_TARGET_NEED_LDST_OOL_LABELS
>  #endif
>  #define TCG_TARGET_NEED_POOL_LABELS
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 50e5dc31b3..5c68cbd43d 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -1643,7 +1643,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
>  }
>
>  #if defined(CONFIG_SOFTMMU)
> -#include "tcg-ldst.inc.c"
> +#include "tcg-ldst-ool.inc.c"
>
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
> @@ -1656,6 +1656,14 @@ static void * const qemu_ld_helpers[16] = {
>      [MO_BEUW] = helper_be_lduw_mmu,
>      [MO_BEUL] = helper_be_ldul_mmu,
>      [MO_BEQ]  = helper_be_ldq_mmu,
> +
> +    [MO_SB]   = helper_ret_ldsb_mmu,
> +    [MO_LESW] = helper_le_ldsw_mmu,
> +    [MO_BESW] = helper_be_ldsw_mmu,
> +#if TCG_TARGET_REG_BITS == 64
> +    [MO_LESL] = helper_le_ldsl_mmu,
> +    [MO_BESL] = helper_be_ldsl_mmu,
> +#endif

Can we mention why these are added in the commit message please?

<stsquad> rth: why has qemu_ld_helpers been filled out? Did those loads not
    happen before?
<rth> stsquad, previously we performed sign-extensions inline after
    returning from the helper; with the change to a tail call we can't
    do that anymore.
<stsquad> rth: maybe that could go in the commit message then...
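
The point from the IRC exchange above, as a plain C analogy: if the slow
path calls a helper and then continues, it can extend the result itself;
if it tail-calls, nothing runs after the helper, so the helper must
return the value already sign-extended. Everything here (demo_mem and the
demo_/slow_path_ function names) is invented for illustration and is not
the QEMU helper API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for guest memory. */
static const uint8_t demo_mem[] = { 0x80, 0x7f };

/* Unsigned helper: returns the byte zero-extended. */
uint64_t demo_ldub(unsigned addr)
{
    return demo_mem[addr];
}

/* Signed helper: performs the sign extension itself. */
uint64_t demo_ldsb(unsigned addr)
{
    return (uint64_t)(int64_t)(int8_t)demo_mem[addr];
}

/* Old shape: call the unsigned helper, then extend in the caller. */
uint64_t slow_path_call_then_extend(unsigned addr)
{
    uint64_t v = demo_ldub(addr);
    return (uint64_t)(int64_t)(int8_t)v;
}

/* New shape: nothing may follow the helper, so use the signed one. */
uint64_t slow_path_tail_call(unsigned addr)
{
    return demo_ldsb(addr);
}
```

Both shapes produce the same value; only the place where the extension
happens differs, which is why the signed entries need to exist in the
helper table.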


>  };
>
>  /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
> @@ -1765,18 +1773,18 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>      }
>
>      /* jne slow_path */
> -    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> +    tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
>      label_ptr[0] = s->code_ptr;
> -    s->code_ptr += 4;
> +    s->code_ptr += 1;
>
>      if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
>          /* cmp 4(r0), addrhi */
>          tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4);
>
>          /* jne slow_path */
> -        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> +        tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
>          label_ptr[1] = s->code_ptr;
> -        s->code_ptr += 4;
> +        s->code_ptr += 1;
>      }
>
>      /* TLB Hit.  */
> @@ -1788,181 +1796,6 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
>      return base;
>  }
>
> -/*
> - * Record the context of a call to the out of line helper code for the slow path
> - * for a load or store, so that we can later generate the correct helper code
> - */
> -static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
> -                                TCGReg datalo, TCGReg datahi,
> -                                TCGReg addrlo, TCGReg addrhi,
> -                                tcg_insn_unit *raddr,
> -                                tcg_insn_unit **label_ptr)
> -{
> -    TCGLabelQemuLdst *label = new_ldst_label(s);
> -
> -    label->is_ld = is_ld;
> -    label->oi = oi;
> -    label->datalo_reg = datalo;
> -    label->datahi_reg = datahi;
> -    label->addrlo_reg = addrlo;
> -    label->addrhi_reg = addrhi;
> -    label->raddr = raddr;
> -    label->label_ptr[0] = label_ptr[0];
> -    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> -        label->label_ptr[1] = label_ptr[1];
> -    }
> -}
> -
> -/*
> - * Generate code for the slow path for a load at the end of block
> - */
> -static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
> -{
> -    TCGMemOpIdx oi = l->oi;
> -    TCGMemOp opc = get_memop(oi);
> -    TCGReg data_reg;
> -    tcg_insn_unit **label_ptr = &l->label_ptr[0];
> -
> -    /* resolve label address */
> -    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
> -    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> -        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
> -    }
> -
> -    if (TCG_TARGET_REG_BITS == 32) {
> -        int ofs = 0;
> -
> -        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        if (TARGET_LONG_BITS == 64) {
> -            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
> -            ofs += 4;
> -        }
> -
> -        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
> -    } else {
> -        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
> -        /* The second argument is already loaded with addrlo.  */
> -        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
> -        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
> -                     (uintptr_t)l->raddr);
> -    }
> -
> -    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
> -
> -    data_reg = l->datalo_reg;
> -    switch (opc & MO_SSIZE) {
> -    case MO_SB:
> -        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
> -        break;
> -    case MO_SW:
> -        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
> -        break;
> -#if TCG_TARGET_REG_BITS == 64
> -    case MO_SL:
> -        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
> -        break;
> -#endif
> -    case MO_UB:
> -    case MO_UW:
> -        /* Note that the helpers have zero-extended to tcg_target_long.  */
> -    case MO_UL:
> -        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
> -        break;
> -    case MO_Q:
> -        if (TCG_TARGET_REG_BITS == 64) {
> -            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
> -        } else if (data_reg == TCG_REG_EDX) {
> -            /* xchg %edx, %eax */
> -            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
> -            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
> -        } else {
> -            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
> -            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
> -        }
> -        break;
> -    default:
> -        tcg_abort();
> -    }
> -
> -    /* Jump to the code corresponding to next IR of qemu_st */
> -    tcg_out_jmp(s, l->raddr);
> -}
> -
> -/*
> - * Generate code for the slow path for a store at the end of block
> - */
> -static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
> -{
> -    TCGMemOpIdx oi = l->oi;
> -    TCGMemOp opc = get_memop(oi);
> -    TCGMemOp s_bits = opc & MO_SIZE;
> -    tcg_insn_unit **label_ptr = &l->label_ptr[0];
> -    TCGReg retaddr;
> -
> -    /* resolve label address */
> -    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
> -    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> -        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
> -    }
> -
> -    if (TCG_TARGET_REG_BITS == 32) {
> -        int ofs = 0;
> -
> -        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        if (TARGET_LONG_BITS == 64) {
> -            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
> -            ofs += 4;
> -        }
> -
> -        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        if (s_bits == MO_64) {
> -            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
> -            ofs += 4;
> -        }
> -
> -        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
> -        ofs += 4;
> -
> -        retaddr = TCG_REG_EAX;
> -        tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
> -        tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
> -    } else {
> -        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
> -        /* The second argument is already loaded with addrlo.  */
> -        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
> -                    tcg_target_call_iarg_regs[2], l->datalo_reg);
> -        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
> -
> -        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
> -            retaddr = tcg_target_call_iarg_regs[4];
> -            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
> -        } else {
> -            retaddr = TCG_REG_RAX;
> -            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
> -            tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP,
> -                       TCG_TARGET_CALL_STACK_OFFSET);
> -        }
> -    }
> -
> -    /* "Tail call" to the helper, with the return address back inline.  */
> -    tcg_out_push(s, retaddr);
> -    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
> -}
>  #elif defined(__x86_64__) && defined(__linux__)
>  # include <asm/prctl.h>
>  # include <sys/prctl.h>
> @@ -2091,7 +1924,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>      TCGReg datahi __attribute__((unused)) = -1;
>      TCGReg addrhi __attribute__((unused)) = -1;
>      TCGMemOpIdx oi;
> -    TCGMemOp opc;
>      int i = -1;
>
>      datalo = args[++i];
> @@ -2103,35 +1935,25 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
>          addrhi = args[++i];
>      }
>      oi = args[++i];
> -    opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> -    {
> -        int mem_index = get_mmuidx(oi);
> -        tcg_insn_unit *label_ptr[2];
> -        TCGReg base;
> -
> -        tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
> -        if (TCG_TARGET_REG_BITS == 32 && is64) {
> -            tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
> -        }
> -        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> -        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> -            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> -        }
> -
> -        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> -                                label_ptr, offsetof(CPUTLBEntry, addr_read));
> -
> -        /* TLB Hit.  */
> -        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
> -
> -        /* Record the current context of a load into ldst label */
> -        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
> -                            s->code_ptr, label_ptr);
> +    /* Assert that we've set up the constraints properly.  */
> +    tcg_debug_assert(datalo == softmmu_arg(ARG_LDVAL, is64, 0));
> +    if (TCG_TARGET_REG_BITS == 32 && is64) {
> +        tcg_debug_assert(datahi == softmmu_arg(ARG_LDVAL, is64, 1));
>      }
> +    tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> +    }
> +
> +    /* Call to thunk.  */
> +    tcg_out8(s, OPC_CALL_Jz);
> +    add_ldst_ool_label(s, true, is64, oi, R_386_PC32, -4);
> +    s->code_ptr += 4;
>  #else
>      {
> +        TCGMemOp opc = get_memop(oi);
>          int32_t offset = guest_base;
>          TCGReg base = addrlo;
>          int index = -1;
> @@ -2246,7 +2068,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>      TCGReg datahi __attribute__((unused)) = -1;
>      TCGReg addrhi __attribute__((unused)) = -1;
>      TCGMemOpIdx oi;
> -    TCGMemOp opc;
>      int i = -1;
>
>      datalo = args[++i];
> @@ -2258,35 +2079,25 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>          addrhi = args[++i];
>      }
>      oi = args[++i];
> -    opc = get_memop(oi);
>
>  #if defined(CONFIG_SOFTMMU)
> -    {
> -        int mem_index = get_mmuidx(oi);
> -        tcg_insn_unit *label_ptr[2];
> -        TCGReg base;
> -
> -        tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
> -        if (TCG_TARGET_REG_BITS == 32 && is64) {
> -            tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
> -        }
> -        tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> -        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> -            tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> -        }
> -
> -        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
> -                                label_ptr, offsetof(CPUTLBEntry, addr_write));
> -
> -        /* TLB Hit.  */
> -        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
> -
> -        /* Record the current context of a store into ldst label */
> -        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
> -                            s->code_ptr, label_ptr);
> +    /* Assert that we've set up the constraints properly.  */
> +    tcg_debug_assert(datalo == softmmu_arg(ARG_STVAL, is64, 0));
> +    if (TCG_TARGET_REG_BITS == 32 && is64) {
> +        tcg_debug_assert(datahi == softmmu_arg(ARG_STVAL, is64, 1));
>      }
> +    tcg_debug_assert(addrlo == softmmu_arg(ARG_ADDR, 0, 0));
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        tcg_debug_assert(addrhi == softmmu_arg(ARG_ADDR, 0, 1));
> +    }
> +
> +    /* Call to thunk.  */
> +    tcg_out8(s, OPC_CALL_Jz);
> +    add_ldst_ool_label(s, false, is64, oi, R_386_PC32, -4);
> +    s->code_ptr += 4;
>  #else
>      {
> +        TCGMemOp opc = get_memop(oi);
>          int32_t offset = guest_base;
>          TCGReg base = addrlo;
>          int seg = 0;
> @@ -2321,6 +2132,126 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
>  #endif
>  }
>
> +#if defined(CONFIG_SOFTMMU)
> +/*
> + * Generate code for an out-of-line thunk performing a load.
> + */
> +static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
> +                                            bool is_64, TCGMemOpIdx oi)
> +{
> +    TCGMemOp opc = get_memop(oi);
> +    int mem_index = get_mmuidx(oi);
> +    tcg_insn_unit *label_ptr[2], *thunk;
> +    TCGReg datalo, addrlo, base;
> +    TCGReg datahi __attribute__((unused)) = -1;
> +    TCGReg addrhi __attribute__((unused)) = -1;
> +    int i;
> +
> +    /* Since we're amortizing the cost, align the thunk.  */
> +    thunk = QEMU_ALIGN_PTR_UP(s->code_ptr, 16);
> +    if (thunk != s->code_ptr) {
> +        memset(s->code_ptr, 0x90, thunk - s->code_ptr);
> +        s->code_ptr = thunk;
> +    }
> +
> +    /* Discover where the inputs are held.  */
> +    addrlo = softmmu_arg(ARG_ADDR, 0, 0);
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        addrhi = softmmu_arg(ARG_ADDR, 0, 1);
> +    }
> +    datalo = softmmu_arg(is_ld ? ARG_LDVAL : ARG_STVAL, is_64, 0);
> +    if (TCG_TARGET_REG_BITS == 32 && is_64) {
> +        datahi = softmmu_arg(is_ld ? ARG_LDVAL : ARG_STVAL, is_64, 1);
> +    }
> +
> +    base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc, label_ptr,
> +                            is_ld ? offsetof(CPUTLBEntry, addr_read)
> +                            : offsetof(CPUTLBEntry, addr_write));
> +
> +    /* TLB Hit.  */
> +    if (is_ld) {
> +        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
> +    } else {
> +        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
> +    }
> +    tcg_out_opc(s, OPC_RET, 0, 0, 0);
> +
> +    /* TLB Miss.  */
> +
> +    /* resolve label address */
> +    tcg_patch8(label_ptr[0], s->code_ptr - label_ptr[0] - 1);
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        tcg_patch8(label_ptr[1], s->code_ptr - label_ptr[1] - 1);
> +    }
> +
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        /* Copy the return address into a temporary.  */
> +        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, 0);
> +        i = 4;
> +
> +        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, i);
> +        i += 4;
> +
> +        tcg_out_st(s, TCG_TYPE_I32, addrlo, TCG_REG_ESP, i);
> +        i += 4;
> +
> +        if (TARGET_LONG_BITS == 64) {
> +            tcg_out_st(s, TCG_TYPE_I32, addrhi, TCG_REG_ESP, i);
> +            i += 4;
> +        }
> +
> +        if (!is_ld) {
> +            tcg_out_st(s, TCG_TYPE_I32, datalo, TCG_REG_ESP, i);
> +            i += 4;
> +
> +            if (is_64) {
> +                tcg_out_st(s, TCG_TYPE_I32, datahi, TCG_REG_ESP, i);
> +                i += 4;
> +            }
> +        }
> +
> +        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, i);
> +        i += 4;
> +
> +        tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, i);
> +    } else {
> +        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +
> +        /* The address and data values have been placed by constraints.  */
> +        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
> +        if (is_ld) {
> +            i = 2;
> +        } else {
> +            tcg_debug_assert(datalo == tcg_target_call_iarg_regs[2]);
> +            i = 3;
> +        }
> +
> +        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[i++], oi);
> +
> +        /* Copy the return address from the stack to the rvalue argument.
> +         * WIN64 runs out of argument registers for stores.
> +         */
> +        if (i < (int)ARRAY_SIZE(tcg_target_call_iarg_regs)) {
> +            tcg_out_ld(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[i],
> +                       TCG_REG_ESP, 0);
> +        } else {
> +            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP, 0);
> +            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP,
> +                       TCG_TARGET_CALL_STACK_OFFSET + 8);
> +        }
> +    }
> +
> +    /* Tail call to the helper.  */
> +    if (is_ld) {
> +        tcg_out_jmp(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
> +    } else {
> +        tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
> +    }
> +
> +    return thunk;
> +}
> +#endif
> +
>  static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                                const TCGArg *args, const int *const_args)
>  {

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


--
Alex Bennée
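
A minimal sketch of the displacement arithmetic behind the "Call to thunk"
sequence and the short-jump patching quoted above, assuming standard x86
relative encodings. The helper names (rel32_disp, rel8_disp, emit_call) and
the flat byte buffer are hypothetical and are not taken from the patch. An
R_386_PC32 relocation with addend -4 resolves to target - 4 - field, which is
the distance from the byte after the 4-byte field. The tcg_patch8() calls use
the same idea with a 1-byte field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: displacement for a 4-byte PC-relative field
 * located at `field` that should reach `target`.  This mirrors
 * R_386_PC32 with addend -4: value = target + (-4) - field. */
static int32_t rel32_disp(const uint8_t *field, const uint8_t *target)
{
    intptr_t disp = (intptr_t)(target - field) - 4;
    assert(disp == (int32_t)disp);      /* must fit in 32 bits */
    return (int32_t)disp;
}

/* Hypothetical helper: displacement for a 1-byte field, as used by a
 * short conditional jump (code_ptr - label_ptr - 1 in the patch). */
static int8_t rel8_disp(const uint8_t *field, const uint8_t *target)
{
    intptr_t disp = (intptr_t)(target - field) - 1;
    assert(disp == (int8_t)disp);       /* short jumps reach -128..127 */
    return (int8_t)disp;
}

/* Emit CALL rel32 at buf[pos] targeting buf[target_pos]; returns the
 * position after the instruction.  Assumes a little-endian host. */
static size_t emit_call(uint8_t *buf, size_t pos, size_t target_pos)
{
    int32_t disp;

    buf[pos++] = 0xe8;                  /* CALL rel32 opcode */
    disp = rel32_disp(buf + pos, buf + target_pos);
    memcpy(buf + pos, &disp, sizeof(disp));
    return pos + 4;
}
```

In the patch itself the target is not known when the call is emitted, so the
field is reserved with s->code_ptr += 4 and filled in later once the thunk
address exists. The sketch resolves it immediately only to keep the
arithmetic visible.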


* Re: [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
@ 2018-11-30 17:25   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-30 17:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> These are function call arguments that we will need soon.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/aarch64/tcg-target.inc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 30091f6a69..148de0b7f2 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -125,6 +125,18 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
>                                             const char *ct_str, TCGType type)
>  {
>      switch (*ct_str++) {
> +    case 'a': /* x0 */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set_reg(ct->u.regs, TCG_REG_X0);
> +        break;
> +    case 'b': /* x1 */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set_reg(ct->u.regs, TCG_REG_X1);
> +        break;
> +    case 'c': /* x2 */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set_reg(ct->u.regs, TCG_REG_X2);
> +        break;
>      case 'r': /* general registers */
>          ct->ct |= TCG_CT_REG;
>          ct->u.regs |= 0xffffffffu;


--
Alex Bennée
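
A minimal sketch of what the new constraint letters do, assuming a register
set is a plain bitmask with one bit per host register. The types RegSet and
Constraint and the function parse_constraint are hypothetical stand-ins, not
the real TCGArgConstraint API. Each of 'a', 'b' and 'c' narrows the operand
to a single register, while 'r' admits any of the 32 general registers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t RegSet;                /* hypothetical: one bit per register */

enum { REG_X0 = 0, REG_X1 = 1, REG_X2 = 2 };

typedef struct {
    int is_reg;                         /* operand must live in a register */
    RegSet regs;                        /* registers the operand may use */
} Constraint;

/* Parse one constraint letter; returns 0 on success, -1 if unknown. */
static int parse_constraint(Constraint *ct, char letter)
{
    switch (letter) {
    case 'a':                           /* x0 only */
        ct->is_reg = 1;
        ct->regs |= (RegSet)1 << REG_X0;
        break;
    case 'b':                           /* x1 only */
        ct->is_reg = 1;
        ct->regs |= (RegSet)1 << REG_X1;
        break;
    case 'c':                           /* x2 only */
        ct->is_reg = 1;
        ct->regs |= (RegSet)1 << REG_X2;
        break;
    case 'r':                           /* any general register */
        ct->is_reg = 1;
        ct->regs |= 0xffffffffu;
        break;
    default:
        return -1;
    }
    return 0;
}
```

A single-register set like this is what lets a later patch rely on an
argument already sitting in its function-call register.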


* Re: [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-30 17:22   ` Alex Bennée
@ 2018-11-30 17:37     ` Richard Henderson
  2018-11-30 17:52       ` Alex Bennée
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2018-11-30 17:37 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Alistair.Francis

On 11/30/18 9:22 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Move the entire memory operation out of line.
> 
> Given Emilio's numbers is it likely we will want to support both options
> given the variability on x86?

No, I don't want to support two methods in any one tcg backend.
Which is why I'm not really sure what to do about Emilio's results.


r~


* Re: [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-30 17:50   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-30 17:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> When moving the qemu_ld/st arguments to the right place for
> a function call, we'll need to move the temps out of the way.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/aarch64/tcg-target.inc.c | 74 +++++++++++++++++++-----------------
>  1 file changed, 40 insertions(+), 34 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 148de0b7f2..c0ba9a6d50 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1467,13 +1467,15 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
>      label->label_ptr[0] = label_ptr;
>  }
>
> -/* Load and compare a TLB entry, emitting the conditional jump to the
> -   slow path for the failure case, which will be patched later when finalizing
> -   the slow path. Generated code returns the host addend in X1,
> -   clobbers X0,X2,X3,TMP. */
> -static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
> -                             tcg_insn_unit **label_ptr, int mem_index,
> -                             bool is_read)
> +/*
> + * Load and compare a TLB entry, emitting the conditional jump to the
> + * slow path on failure.  Returns the register for the host addend.
> + * Clobbers t0, t1, t2, t3.
> + */
> +static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
> +                               tcg_insn_unit **label_ptr, int mem_index,
> +                               bool is_read, TCGReg t0, TCGReg t1,
> +                               TCGReg t2, TCGReg t3)
>  {
>      int tlb_offset = is_read ?
>          offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
> @@ -1491,55 +1493,56 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
>      if (a_bits >= s_bits) {
>          x3 = addr_reg;
>      } else {
> +        x3 = t3;
>          tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
> -                     TCG_REG_X3, addr_reg, s_mask - a_mask);
> -        x3 = TCG_REG_X3;
> +                     x3, addr_reg, s_mask - a_mask);
>      }
>      tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
>
> -    /* Extract the TLB index from the address into X0.
> -       X0<CPU_TLB_BITS:0> =
> +    /* Extract the TLB index from the address into T0.
> +       T0<CPU_TLB_BITS:0> =
>         addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
> -    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, TCG_REG_X0, addr_reg,
> +    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, t0, addr_reg,
>                   TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
>
> -    /* Store the page mask part of the address into X3.  */
> +    /* Store the page mask part of the address into T3.  */
>      tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64,
> -                     TCG_REG_X3, x3, tlb_mask);
> +                     t3, x3, tlb_mask);
>
> -    /* Add any "high bits" from the tlb offset to the env address into X2,
> +    /* Add any "high bits" from the tlb offset to the env address into T2,
>         to take advantage of the LSL12 form of the ADDI instruction.
> -       X2 = env + (tlb_offset & 0xfff000) */
> +       T2 = env + (tlb_offset & 0xfff000) */
>      if (tlb_offset & 0xfff000) {
> -        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_X2, base,
> +        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, t2, base,
>                       tlb_offset & 0xfff000);
> -        base = TCG_REG_X2;
> +        base = t2;
>      }
>
> -    /* Merge the tlb index contribution into X2.
> -       X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
> -    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64, TCG_REG_X2, base,
> -                 TCG_REG_X0, CPU_TLB_ENTRY_BITS);
> +    /* Merge the tlb index contribution into T2.
> +       T2 = T2 + (T0 << CPU_TLB_ENTRY_BITS) */
> +    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64,
> +                 t2, base, t0, CPU_TLB_ENTRY_BITS);
>
> -    /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
> -       X0 = load [X2 + (tlb_offset & 0x000fff)] */
> +    /* Merge "low bits" from tlb offset, load the tlb comparator into T0.
> +       T0 = load [T2 + (tlb_offset & 0x000fff)] */
>      tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
> -                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff,
> -                 TARGET_LONG_BITS == 32 ? 2 : 3);
> +                 t0, t2, tlb_offset & 0xfff, TARGET_LONG_BITS == 32 ? 2 : 3);
>
>      /* Load the tlb addend. Do that early to avoid stalling.
> -       X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
> -    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
> +       T1 = load [T2 + (tlb_offset & 0xfff) + offsetof(addend)] */
> +    tcg_out_ldst(s, I3312_LDRX, t1, t2,
>                   (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
>                   (is_read ? offsetof(CPUTLBEntry, addr_read)
>                    : offsetof(CPUTLBEntry, addr_write)), 3);
>
>      /* Perform the address comparison. */
> -    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
> +    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), t0, t3, 0);
>
>      /* If not equal, we jump to the slow path. */
>      *label_ptr = s->code_ptr;
>      tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
> +
> +    return t1;
>  }
>
>  #endif /* CONFIG_SOFTMMU */
> @@ -1644,10 +1647,12 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
>  #ifdef CONFIG_SOFTMMU
>      unsigned mem_index = get_mmuidx(oi);
>      tcg_insn_unit *label_ptr;
> +    TCGReg base;
>
> -    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1);
> +    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1,
> +                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
>      tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
> -                           TCG_REG_X1, otype, addr_reg);
> +                           base, otype, addr_reg);
>      add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
>                          s->code_ptr, label_ptr);
>  #else /* !CONFIG_SOFTMMU */
> @@ -1669,10 +1674,11 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
>  #ifdef CONFIG_SOFTMMU
>      unsigned mem_index = get_mmuidx(oi);
>      tcg_insn_unit *label_ptr;
> +    TCGReg base;
>
> -    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0);
> -    tcg_out_qemu_st_direct(s, memop, data_reg,
> -                           TCG_REG_X1, otype, addr_reg);
> +    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0,
> +                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
> +    tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
>      add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
>                          data_reg, addr_reg, s->code_ptr, label_ptr);
>  #else /* !CONFIG_SOFTMMU */


--
Alex Bennée
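
A minimal sketch of the value that tcg_out_tlb_read compares against the TLB
comparator, written as plain C rather than emitted AArch64 instructions. The
page size (PAGE_BITS = 12) and the function names are hypothetical. The
definitions of a_mask and s_mask as (1 << bits) - 1 are an assumption, since
the quoted hunk does not show them. When the required alignment is smaller
than the access size, adding s_mask - a_mask pushes an access that crosses a
page boundary onto the next page, so the comparison fails and the slow path
is taken.

```c
#include <assert.h>
#include <stdint.h>

enum { PAGE_BITS = 12 };                /* hypothetical guest page size */
#define PAGE_MASK (~(((uint64_t)1 << PAGE_BITS) - 1))

/* Compute the "page mask part" of an address.
 * a_bits: log2 of the required alignment; s_bits: log2 of access size. */
static uint64_t tlb_compare_value(uint64_t addr, unsigned a_bits,
                                  unsigned s_bits)
{
    uint64_t a_mask = ((uint64_t)1 << a_bits) - 1;   /* assumed definition */
    uint64_t s_mask = ((uint64_t)1 << s_bits) - 1;   /* assumed definition */
    uint64_t x = addr;

    if (a_bits < s_bits) {
        /* Point at the last byte that needs checking for page crossing. */
        x = addr + (s_mask - a_mask);
    }
    /* Keep the page number plus any low bits that must be zero. */
    return x & (PAGE_MASK | a_mask);
}

/* A hit requires the stored comparator to equal the computed value. */
static int tlb_hit(uint64_t comparator, uint64_t addr, unsigned a_bits,
                   unsigned s_bits)
{
    return comparator == tlb_compare_value(addr, a_bits, s_bits);
}
```

With a comparator of 0x1000, an 8-byte access at 0x1008 hits, one at 0x1ffc
misses because it crosses into the next page, and an access at 0x1004 that
requires 8-byte alignment misses because the low bits survive the mask.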


* Re: [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-30 17:37     ` Richard Henderson
@ 2018-11-30 17:52       ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2018-11-30 17:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Alistair.Francis


Richard Henderson <richard.henderson@linaro.org> writes:

> On 11/30/18 9:22 AM, Alex Bennée wrote:
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> Move the entire memory operation out of line.
>>
>> Given Emilio's numbers is it likely we will want to support both options
>> given the variability on x86?
>
> No, I don't want to support two methods in any one tcg backend.
> Which is why I'm not really sure what to do about Emilio's results.

They at least seem pretty positive on aarch64 backends....

--
Alex Bennée


Thread overview: 55+ messages
2018-11-23 14:45 [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 01/37] tcg/i386: Always use %ebp for TCG_AREG0 Richard Henderson
2018-11-29 12:52   ` Alex Bennée
2018-11-29 14:55     ` Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 02/37] tcg/i386: Move TCG_REG_CALL_STACK from define to enum Richard Henderson
2018-11-29 12:52   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 03/37] tcg: Return success from patch_reloc Richard Henderson
2018-11-29 14:47   ` Alex Bennée
2018-11-29 17:35     ` Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 04/37] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-26  0:31   ` Emilio G. Cota
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 05/37] tcg/i386: Add constraints for r8 and r9 Richard Henderson
2018-11-29 15:00   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 06/37] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
2018-11-29 16:34   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 07/37] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
2018-11-29 17:13   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 08/37] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
2018-11-30 16:16   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 09/37] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-30 17:22   ` Alex Bennée
2018-11-30 17:37     ` Richard Henderson
2018-11-30 17:52       ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 10/37] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
2018-11-30 17:25   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 11/37] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
2018-11-30 17:50   ` Alex Bennée
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 12/37] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 13/37] tcg/aarch64: Use B not BL " Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 14/37] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 15/37] tcg/arm: Parameterize the temps for tcg_out_tlb_read Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 16/37] tcg/arm: Add constraints for R0-R5 Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 17/37] tcg/arm: Reduce the number of temps for tcg_out_tlb_read Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 18/37] tcg/arm: Force qemu_ld/st arguments into fixed registers Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 19/37] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 20/37] tcg/ppc: Parameterize the temps for tcg_out_tlb_read Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 21/37] tcg/ppc: Split out tcg_out_call_int Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 22/37] tcg/ppc: Add constraints for R7-R8 Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 23/37] tcg/ppc: Change TCG_TARGET_CALL_ALIGN_ARGS to bool Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 24/37] tcg/ppc: Force qemu_ld/st arguments into fixed registers Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 25/37] tcg/ppc: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 26/37] tcg: Clean up generic bswap32 Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 27/37] tcg: Clean up generic bswap64 Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 28/37] tcg/optimize: Optimize bswap Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 29/37] tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 30/37] tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 31/37] tcg/aarch64: Set TCG_TARGET_HAS_MEMORY_BSWAP to false Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 32/37] tcg/arm: Set TCG_TARGET_HAS_MEMORY_BSWAP to false for user-only Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 33/37] tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 34/37] tcg/i386: Restrict user-only qemu_st_i32 values to q-regs Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 35/37] tcg/i386: Add setup_guest_base_seg for FreeBSD Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 36/37] tcg/i386: Require segment syscalls to succeed Richard Henderson
2018-11-23 14:45 ` [Qemu-devel] [PATCH for-4.0 v2 37/37] tcg/i386: Remove L constraint Richard Henderson
2018-11-23 21:04 ` [Qemu-devel] [PATCH for-4.0 v2 00/37] tcg: Assorted cleanups no-reply
2018-11-26  0:30 ` Emilio G. Cota
