* [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
@ 2018-11-12 21:44 Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 01/17] tcg/i386: Add constraints for r8 and r9 Richard Henderson
                   ` (18 more replies)
  0 siblings, 19 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Based on an idea forwarded by Emilio, which suggests a 5-6%
speed gain is possible.  I have not spent too much time
measuring this, as the code size gains are significant.

I believe that I posted an x86_64-only patch some time ago,
but this now includes i386, aarch64 and arm32.  In late
testing I do see some failures on i386, with a sparc guest.  I'll
follow up on that later.

The main feature here is sharing code to place these out-of-line
thunks.  We want them to be within range of a direct call.  Once
we've emitted a thunk we remember it (at least within a given
tcg_region), reusing it until we find that the relocation is out
of range, at which point we generate another copy.

The second main change is that the entire TCGMemOpIdx is built
into each thunk.  There simply are not enough free registers for
i386 (or arm32 for that matter) to pass in the mmu_idx to the thunk.
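Concretely, a TCGMemOpIdx packs the memory operation and mmu_idx into a single value (in QEMU of this era, op << 4 | idx), so a thunk specialized on the whole packed value carries the mmu_idx implicitly.  A minimal restatement:

```c
#include <assert.h>

/* Minimal restatement of the TCGMemOpIdx packing: the memory operation
 * and the mmu_idx share one value, so a thunk specialized per packed
 * value needs no spare register to pass mmu_idx at run time. */

typedef unsigned TCGMemOp;
typedef unsigned TCGMemOpIdx;

static inline TCGMemOpIdx make_memop_idx(TCGMemOp op, unsigned idx)
{
    return (op << 4) | idx;     /* mmu_idx lives in the low 4 bits */
}

static inline TCGMemOp get_memop(TCGMemOpIdx oi)
{
    return oi >> 4;
}

static inline unsigned get_mmuidx(TCGMemOpIdx oi)
{
    return oi & 15;
}
```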

For x86, this displacement is 2GB, and we've already constrained
the whole code_gen_buffer to be in range.  For aarch64, this
displacement is 128MB; for arm32 it is 16MB.  In every case
the range is significant, and for an smp guest it may well cover
the entire tcg_region.
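To see why those reaches suffice: with a signed displacement of +/-reach bytes, any two points in a region are mutually reachable exactly when the region is no larger than reach.  A small check (helper name is ours; constants are the ones quoted above):

```c
#include <assert.h>
#include <stdint.h>

/* Constants as quoted above; the helper is ours, not QEMU's.  With a
 * signed +/-reach displacement, the worst case is caller and thunk at
 * opposite ends of the region. */

#define REACH_X86_64   (2ull << 30)    /* +/-2GB  */
#define REACH_AARCH64  (128ull << 20)  /* +/-128MB */
#define REACH_ARM32    (16ull << 20)   /* +/-16MB */

static int always_within_reach(uint64_t region_size, uint64_t reach)
{
    return region_size <= reach;
}
```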

Other than these three targets, I have compile-tested the generic
change on ppc64le.  I have not even compile-tested the mips, s390x,
or sparc hosts.


r~


Richard Henderson (17):
  tcg/i386: Add constraints for r8 and r9
  tcg/i386: Return a base register from tcg_out_tlb_load
  tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  tcg/i386: Force qemu_ld/st arguments into fixed registers
  tcg: Return success from patch_reloc
  tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/aarch64: Add constraints for x0, x1, x2
  tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  tcg/aarch64: Parameterize the temp for tcg_out_goto_long
  tcg/aarch64: Use B not BL for tcg_out_goto_long
  tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/arm: Parameterize the temps for tcg_out_tlb_read
  tcg/arm: Add constraints for R0-R5
  tcg/arm: Reduce the number of temps for tcg_out_tlb_read
  tcg/arm: Force qemu_ld/st arguments into fixed registers
  tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS

 tcg/aarch64/tcg-target.h     |   2 +-
 tcg/arm/tcg-target.h         |   2 +-
 tcg/i386/tcg-target.h        |   2 +-
 tcg/tcg.h                    |   4 +
 tcg/aarch64/tcg-target.inc.c | 318 +++++++++---------
 tcg/arm/tcg-target.inc.c     | 535 +++++++++++++++---------------
 tcg/i386/tcg-target.inc.c    | 611 ++++++++++++++++++++---------------
 tcg/mips/tcg-target.inc.c    |  29 +-
 tcg/ppc/tcg-target.inc.c     |  47 +--
 tcg/s390/tcg-target.inc.c    |  37 ++-
 tcg/sparc/tcg-target.inc.c   |  13 +-
 tcg/tcg-ldst-ool.inc.c       |  94 ++++++
 tcg/tcg-pool.inc.c           |   5 +-
 tcg/tcg.c                    |  28 +-
 tcg/tci/tcg-target.inc.c     |   3 +-
 15 files changed, 974 insertions(+), 756 deletions(-)
 create mode 100644 tcg/tcg-ldst-ool.inc.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH for-4.0 01/17] tcg/i386: Add constraints for r8 and r9
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 02/17] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

These are the x86_64 function call argument registers that we will
need soon.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 436195894b..e4d9be57ff 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -230,6 +230,14 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI);
         break;
+    case 'E': /* "Eight", r8 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R8);
+        break;
+    case 'N': /* "Nine", r9 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R9);
+        break;
     case 'q':
         /* A register that can be used as a byte operand.  */
         ct->ct |= TCG_CT_REG;
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 02/17] tcg/i386: Return a base register from tcg_out_tlb_load
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 01/17] tcg/i386: Add constraints for r8 and r9 Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 03/17] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

We will shortly be asking the hot path not to assume TCG_REG_L1
for the host base address.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 56 ++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index e4d9be57ff..4435a7bb52 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1611,9 +1611,9 @@ static void * const qemu_st_helpers[16] = {
 
    First argument register is clobbered.  */
 
-static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                                    int mem_index, TCGMemOp opc,
-                                    tcg_insn_unit **label_ptr, int which)
+static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
+                               int mem_index, TCGMemOp opc,
+                               tcg_insn_unit **label_ptr, int which)
 {
     const TCGReg r0 = TCG_REG_L0;
     const TCGReg r1 = TCG_REG_L1;
@@ -1693,6 +1693,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     /* add addend(r0), r1 */
     tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
                          offsetof(CPUTLBEntry, addend) - which);
+
+    return r1;
 }
 
 /*
@@ -1998,10 +2000,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#endif
 
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
@@ -2011,17 +2009,21 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
+    {
+        int mem_index = get_mmuidx(oi);
+        tcg_insn_unit *label_ptr[2];
+        TCGReg base;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                     label_ptr, offsetof(CPUTLBEntry, addr_read));
+        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+                                label_ptr, offsetof(CPUTLBEntry, addr_read));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
+        /* TLB Hit.  */
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
 
-    /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+        /* Record the current context of a load into ldst label */
+        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+                            s->code_ptr, label_ptr);
+    }
 #else
     {
         int32_t offset = guest_base;
@@ -2138,10 +2140,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
-#if defined(CONFIG_SOFTMMU)
-    int mem_index;
-    tcg_insn_unit *label_ptr[2];
-#endif
 
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
@@ -2151,17 +2149,21 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = get_mmuidx(oi);
+    {
+        int mem_index = get_mmuidx(oi);
+        tcg_insn_unit *label_ptr[2];
+        TCGReg base;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                     label_ptr, offsetof(CPUTLBEntry, addr_write));
+        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
+                                label_ptr, offsetof(CPUTLBEntry, addr_write));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+        /* TLB Hit.  */
+        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
 
-    /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+        /* Record the current context of a store into ldst label */
+        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
+                            s->code_ptr, label_ptr);
+    }
 #else
     {
         int32_t offset = guest_base;
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 03/17] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 01/17] tcg/i386: Add constraints for r8 and r9 Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 02/17] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 04/17] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

We will shortly be forcing qemu_ld/st arguments into registers
that match the function call ABI of the host, which means that
the temps must be elsewhere.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 4435a7bb52..2a96ca4274 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -121,12 +121,16 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
 
-/* Registers used with L constraint, which are the first argument
-   registers on x86_64, and two random call clobbered registers on
-   i386. */
+/* Registers used with L constraint, which are two random
+ * call clobbered registers.  These should be free.
+ */
 #if TCG_TARGET_REG_BITS == 64
-# define TCG_REG_L0 tcg_target_call_iarg_regs[0]
-# define TCG_REG_L1 tcg_target_call_iarg_regs[1]
+# define TCG_REG_L0   TCG_REG_RAX
+# ifdef _WIN64
+#  define TCG_REG_L1  TCG_REG_R10
+# else
+#  define TCG_REG_L1  TCG_REG_RDI
+# endif
 #else
 # define TCG_REG_L0 TCG_REG_EAX
 # define TCG_REG_L1 TCG_REG_EDX
@@ -1625,6 +1629,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     unsigned a_mask = (1 << a_bits) - 1;
     unsigned s_mask = (1 << s_bits) - 1;
     target_ulong tlb_mask;
+    TCGReg base;
 
     if (TCG_TARGET_REG_BITS == 64) {
         if (TARGET_LONG_BITS == 64) {
@@ -1671,7 +1676,12 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
        before the fastpath ADDQ below.  For 64-bit guest and x32 host, MOVQ
        copies the entire guest address for the slow path, while truncation
        for the 32-bit host happens with the fastpath ADDL below.  */
-    tcg_out_mov(s, ttype, r1, addrlo);
+    if (TCG_TARGET_REG_BITS == 64) {
+        base = tcg_target_call_iarg_regs[1];
+    } else {
+        base = r1;
+    }
+    tcg_out_mov(s, ttype, base, addrlo);
 
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -1690,11 +1700,11 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     /* TLB Hit.  */
 
-    /* add addend(r0), r1 */
-    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
+    /* add addend(r0), base */
+    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, base, r0,
                          offsetof(CPUTLBEntry, addend) - which);
 
-    return r1;
+    return base;
 }
 
 /*
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 04/17] tcg/i386: Force qemu_ld/st arguments into fixed registers
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (2 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 03/17] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 05/17] tcg: Return success from patch_reloc Richard Henderson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

This is an incremental step toward moving the qemu_ld/st
code sequence out of line.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 193 +++++++++++++++++++++++++++++++-------
 1 file changed, 159 insertions(+), 34 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 2a96ca4274..8a3e7690b6 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -171,6 +171,56 @@ static bool have_lzcnt;
 
 static tcg_insn_unit *tb_ret_addr;
 
+#ifdef CONFIG_SOFTMMU
+/*
+ * Constraint to choose a particular register.  This is used for softmmu
+ * loads and stores.  Registers with no assignment get an empty string.
+ */
+static const char * const one_reg_constraint[TCG_TARGET_NB_REGS] = {
+    [TCG_REG_EAX] = "a",
+    [TCG_REG_EBX] = "b",
+    [TCG_REG_ECX] = "c",
+    [TCG_REG_EDX] = "d",
+    [TCG_REG_ESI] = "S",
+    [TCG_REG_EDI] = "D",
+#if TCG_TARGET_REG_BITS == 64
+    [TCG_REG_R8]  = "E",
+    [TCG_REG_R9]  = "N",
+#endif
+};
+
+/*
+ * Calling convention for the softmmu load and store thunks.
+ *
+ * For 64-bit, we mostly use the host calling convention, therefore the
+ * real first argument is reserved for the ENV parameter that is passed
+ * on to the slow path helpers.
+ *
+ * For 32-bit, the host calling convention is stack based; we invent a
+ * private convention that uses 4 of the 6 available host registers, and
+ * we reserve EAX and EDX as temporaries for use by the thunk.
+ */
+static inline TCGReg softmmu_arg(unsigned n)
+{
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_debug_assert(n < ARRAY_SIZE(tcg_target_call_iarg_regs) - 1);
+        return tcg_target_call_iarg_regs[n + 1];
+    } else {
+        static const TCGReg local_order[] = {
+            TCG_REG_ESI, TCG_REG_EDI, TCG_REG_ECX, TCG_REG_EBX
+        };
+        tcg_debug_assert(n < ARRAY_SIZE(local_order));
+        return local_order[n];
+    }
+}
+
+#define qemu_memop_arg(N)          one_reg_constraint[softmmu_arg(N)]
+#define qemu_memop_ret(N)          (N ? "d" : "a")
+#else
+#define qemu_memop_arg(N)          "L"
+#define qemu_memop_ret(N)          "L"
+#endif /* CONFIG_SOFTMMU */
+
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1677,11 +1727,15 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
        copies the entire guest address for the slow path, while truncation
        for the 32-bit host happens with the fastpath ADDL below.  */
     if (TCG_TARGET_REG_BITS == 64) {
-        base = tcg_target_call_iarg_regs[1];
+        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
+        if (TARGET_LONG_BITS == 32) {
+            tcg_out_ext32u(s, addrlo, addrlo);
+        }
+        base = addrlo;
     } else {
         base = r1;
+        tcg_out_mov(s, ttype, base, addrlo);
     }
-    tcg_out_mov(s, ttype, base, addrlo);
 
     /* jne slow_path */
     tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -2006,16 +2060,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
    common. */
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg datalo, datahi, addrlo;
-    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo, addrlo;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
     TCGMemOp opc;
+    int i = -1;
 
-    datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
-    oi = *args++;
+    datalo = args[++i];
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        datahi = args[++i];
+    }
+    addrlo = args[++i];
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = args[++i];
+    }
+    oi = args[++i];
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
@@ -2024,6 +2084,15 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
         tcg_insn_unit *label_ptr[2];
         TCGReg base;
 
+        tcg_debug_assert(datalo == tcg_target_call_oarg_regs[0]);
+        if (TCG_TARGET_REG_BITS == 32 && is64) {
+            tcg_debug_assert(datahi == tcg_target_call_oarg_regs[1]);
+        }
+        tcg_debug_assert(addrlo == softmmu_arg(0));
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            tcg_debug_assert(addrhi == softmmu_arg(1));
+        }
+
         base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                                 label_ptr, offsetof(CPUTLBEntry, addr_read));
 
@@ -2146,16 +2215,22 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg datalo, datahi, addrlo;
-    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo, addrlo;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
     TCGMemOp opc;
+    int i = -1;
 
-    datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
-    oi = *args++;
+    datalo = args[++i];
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        datahi = args[++i];
+    }
+    addrlo = args[++i];
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = args[++i];
+    }
+    oi = args[++i];
     opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
@@ -2164,6 +2239,16 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
         tcg_insn_unit *label_ptr[2];
         TCGReg base;
 
+        i = -1;
+        tcg_debug_assert(addrlo == softmmu_arg(++i));
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            tcg_debug_assert(addrhi == softmmu_arg(++i));
+        }
+        tcg_debug_assert(datalo == softmmu_arg(++i));
+        if (TCG_TARGET_REG_BITS == 32 && is64) {
+            tcg_debug_assert(datahi == softmmu_arg(++i));
+        }
+
         base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                                 label_ptr, offsetof(CPUTLBEntry, addr_write));
 
@@ -2833,15 +2918,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
     static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
     static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
-    static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
-    static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
-    static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
-    static const TCGTargetOpDef r_r_L = { .args_ct_str = { "r", "r", "L" } };
-    static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
-    static const TCGTargetOpDef r_r_L_L
-        = { .args_ct_str = { "r", "r", "L", "L" } };
-    static const TCGTargetOpDef L_L_L_L
-        = { .args_ct_str = { "L", "L", "L", "L" } };
     static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } };
     static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } };
     static const TCGTargetOpDef x_x_x_x
@@ -3023,17 +3099,66 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         }
 
     case INDEX_op_qemu_ld_i32:
-        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_L : &r_L_L;
-    case INDEX_op_qemu_st_i32:
-        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L : &L_L_L;
+        {
+            static TCGTargetOpDef ld32;
+            ld32.args_ct_str[0] = qemu_memop_ret(0);
+            ld32.args_ct_str[1] = qemu_memop_arg(0);
+            if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+                ld32.args_ct_str[2] = qemu_memop_arg(1);
+            }
+            return &ld32;
+        }
     case INDEX_op_qemu_ld_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &r_L
-                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_r_L
-                : &r_r_L_L);
+        {
+            static TCGTargetOpDef ld64;
+            if (TCG_TARGET_REG_BITS == 64) {
+                ld64.args_ct_str[0] = qemu_memop_ret(0);
+                ld64.args_ct_str[1] = qemu_memop_arg(0);
+            } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                ld64.args_ct_str[0] = qemu_memop_ret(0);
+                ld64.args_ct_str[1] = qemu_memop_ret(1);
+                ld64.args_ct_str[2] = qemu_memop_arg(0);
+            } else {
+                ld64.args_ct_str[0] = qemu_memop_ret(0);
+                ld64.args_ct_str[1] = qemu_memop_ret(1);
+                ld64.args_ct_str[2] = qemu_memop_arg(0);
+                ld64.args_ct_str[3] = qemu_memop_arg(1);
+            }
+            return &ld64;
+        }
+
+    /* Recall the store value comes before addr in the opcode args
+       and after addr in helper args.  */
+    case INDEX_op_qemu_st_i32:
+        {
+            static TCGTargetOpDef st32;
+            st32.args_ct_str[1] = qemu_memop_arg(0);
+            if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                st32.args_ct_str[0] = qemu_memop_arg(1);
+            } else {
+                st32.args_ct_str[2] = qemu_memop_arg(1);
+                st32.args_ct_str[0] = qemu_memop_arg(2);
+            }
+            return &st32;
+        }
     case INDEX_op_qemu_st_i64:
-        return (TCG_TARGET_REG_BITS == 64 ? &L_L
-                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L_L
-                : &L_L_L_L);
+        {
+            static TCGTargetOpDef st64;
+            if (TCG_TARGET_REG_BITS == 64) {
+                st64.args_ct_str[1] = qemu_memop_arg(0);
+                st64.args_ct_str[0] = qemu_memop_arg(1);
+            } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                st64.args_ct_str[2] = qemu_memop_arg(0);
+                st64.args_ct_str[0] = qemu_memop_arg(1);
+                st64.args_ct_str[1] = qemu_memop_arg(2);
+            } else {
+                st64.args_ct_str[2] = qemu_memop_arg(0);
+                st64.args_ct_str[3] = qemu_memop_arg(1);
+                st64.args_ct_str[0] = qemu_memop_arg(2);
+                st64.args_ct_str[1] = qemu_memop_arg(3);
+            }
+            return &st64;
+        }
 
     case INDEX_op_brcond2_i32:
         {
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 05/17] tcg: Return success from patch_reloc
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (3 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 04/17] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 06/17] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

This moves the assert for success from inside patch_reloc to its
callers, which touches all tcg backends.
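The shape of the change, as a simplified stand-in (not the real backend code): the range check moves out of the relocation helper, which now reports failure, and the caller decides how to react.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the new pattern: the relocation helper
 * reports failure instead of asserting, and the caller decides what
 * to do (assert here; later patches fall back to a new thunk). */

static bool fits_signed(intptr_t v, unsigned bits)
{
    intptr_t lim = (intptr_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

static bool reloc_pc19(uint32_t *code_ptr, intptr_t offset)
{
    if (!fits_signed(offset, 19)) {
        return false;                  /* caller now handles failure */
    }
    /* Deposit the 19-bit offset into bits [23:5] of the insn. */
    *code_ptr = (*code_ptr & ~(0x7ffffu << 5))
              | (((uint32_t)offset & 0x7ffff) << 5);
    return true;
}
```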

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 44 ++++++++++++++-------------------
 tcg/arm/tcg-target.inc.c     | 26 +++++++++-----------
 tcg/i386/tcg-target.inc.c    | 17 +++++++------
 tcg/mips/tcg-target.inc.c    | 29 +++++++++-------------
 tcg/ppc/tcg-target.inc.c     | 47 ++++++++++++++++++++++--------------
 tcg/s390/tcg-target.inc.c    | 37 +++++++++++++++++++---------
 tcg/sparc/tcg-target.inc.c   | 13 ++++++----
 tcg/tcg-pool.inc.c           |  5 +++-
 tcg/tcg.c                    |  8 +++---
 tcg/tci/tcg-target.inc.c     |  3 ++-
 10 files changed, 125 insertions(+), 104 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 083592a4d7..30091f6a69 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -78,48 +78,40 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - code_ptr;
-    tcg_debug_assert(offset == sextract64(offset, 0, 26));
-    /* read instruction, mask away previous PC_REL26 parameter contents,
-       set the proper offset, then write back the instruction. */
-    *code_ptr = deposit32(*code_ptr, 0, 26, offset);
+    if (offset == sextract64(offset, 0, 26)) {
+        /* read instruction, mask away previous PC_REL26 parameter contents,
+           set the proper offset, then write back the instruction. */
+        *code_ptr = deposit32(*code_ptr, 0, 26, offset);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc26_atomic(tcg_insn_unit *code_ptr,
-                                     tcg_insn_unit *target)
+static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - code_ptr;
-    tcg_insn_unit insn;
-    tcg_debug_assert(offset == sextract64(offset, 0, 26));
-    /* read instruction, mask away previous PC_REL26 parameter contents,
-       set the proper offset, then write back the instruction. */
-    insn = atomic_read(code_ptr);
-    atomic_set(code_ptr, deposit32(insn, 0, 26, offset));
+    if (offset == sextract64(offset, 0, 19)) {
+        *code_ptr = deposit32(*code_ptr, 5, 19, offset);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
-{
-    ptrdiff_t offset = target - code_ptr;
-    tcg_debug_assert(offset == sextract64(offset, 0, 19));
-    *code_ptr = deposit32(*code_ptr, 5, 19, offset);
-}
-
-static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                                intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(addend == 0);
     switch (type) {
     case R_AARCH64_JUMP26:
     case R_AARCH64_CALL26:
-        reloc_pc26(code_ptr, (tcg_insn_unit *)value);
-        break;
+        return reloc_pc26(code_ptr, (tcg_insn_unit *)value);
     case R_AARCH64_CONDBR19:
-        reloc_pc19(code_ptr, (tcg_insn_unit *)value);
-        break;
+        return reloc_pc19(code_ptr, (tcg_insn_unit *)value);
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index e1fbf465cb..80d174ef44 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -187,27 +187,23 @@ static const uint8_t tcg_cond_to_arm_cond[] = {
     [TCG_COND_GTU] = COND_HI,
 };
 
-static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-    *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+    if (offset == sextract32(offset, 0, 24)) {
+        *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+        return true;
+    }
+    return false;
 }
 
-static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
-{
-    ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-    tcg_insn_unit insn = atomic_read(code_ptr);
-    tcg_debug_assert(offset == sextract32(offset, 0, 24));
-    atomic_set(code_ptr, deposit32(insn, 0, 24, offset));
-}
-
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(addend == 0);
 
     if (type == R_ARM_PC24) {
-        reloc_pc24(code_ptr, (tcg_insn_unit *)value);
+        return reloc_pc24(code_ptr, (tcg_insn_unit *)value);
     } else if (type == R_ARM_PC13) {
         intptr_t diff = value - (uintptr_t)(code_ptr + 2);
         tcg_insn_unit insn = *code_ptr;
@@ -218,10 +214,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
             if (!u) {
                 diff = -diff;
             }
-        } else {
+        } else if (diff >= 0x1000 && diff < 0x100000) {
             int rd = extract32(insn, 12, 4);
             int rt = rd == TCG_REG_PC ? TCG_REG_TMP : rd;
-            assert(diff >= 0x1000 && diff < 0x100000);
             /* add rt, pc, #high */
             *code_ptr++ = ((insn & 0xf0000000) | (1 << 25) | ARITH_ADD
                            | (TCG_REG_PC << 16) | (rt << 12)
@@ -230,10 +225,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
             insn = deposit32(insn, 12, 4, rt);
             diff &= 0xfff;
             u = 1;
+        } else {
+            return false;
         }
         insn = deposit32(insn, 23, 1, u);
         insn = deposit32(insn, 0, 12, diff);
         *code_ptr = insn;
+        return true;
     } else {
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 8a3e7690b6..16d5af76ad 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -221,29 +221,32 @@ static inline TCGReg softmmu_arg(unsigned n)
 #define qemu_memop_ret(N)          "L"
 #endif /* CONFIG_SOFTMMU */
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     value += addend;
-    switch(type) {
+
+    switch (type) {
     case R_386_PC32:
         value -= (uintptr_t)code_ptr;
         if (value != (int32_t)value) {
-            tcg_abort();
+            return false;
         }
         /* FALLTHRU */
     case R_386_32:
         tcg_patch32(code_ptr, value);
-        break;
+        return true;
+
     case R_386_PC8:
         value -= (uintptr_t)code_ptr;
         if (value != (int8_t)value) {
-            tcg_abort();
+            return false;
         }
         tcg_patch8(code_ptr, value);
-        break;
+        return true;
+
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 }
 
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index cff525373b..e59c66b607 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -144,36 +144,30 @@ static tcg_insn_unit *bswap32_addr;
 static tcg_insn_unit *bswap32u_addr;
 static tcg_insn_unit *bswap64_addr;
 
-static inline uint32_t reloc_pc16_val(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc16_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
     /* Let the compiler perform the right-shift as part of the arithmetic.  */
     ptrdiff_t disp = target - (pc + 1);
-    tcg_debug_assert(disp == (int16_t)disp);
-    return disp & 0xffff;
+    if (disp == (int16_t)disp) {
+        *pc = deposit32(*pc, 0, 16, disp);
+        return true;
+    } else {
+        return false;
+    }
 }
 
-static inline void reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc16(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-    *pc = deposit32(*pc, 0, 16, reloc_pc16_val(pc, target));
+    tcg_debug_assert(reloc_pc16_cond(pc, target));
+    return true;
 }
 
-static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    tcg_debug_assert((((uintptr_t)pc ^ (uintptr_t)target) & 0xf0000000) == 0);
-    return ((uintptr_t)target >> 2) & 0x3ffffff;
-}
-
-static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    *pc = deposit32(*pc, 0, 26, reloc_26_val(pc, target));
-}
-
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(type == R_MIPS_PC16);
     tcg_debug_assert(addend == 0);
-    reloc_pc16(code_ptr, (tcg_insn_unit *)value);
+    return reloc_pc16_cond(code_ptr, (tcg_insn_unit *)value);
 }
 
 #define TCG_CT_CONST_ZERO 0x100
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c2f729ee8f..656a9ff603 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -186,16 +186,14 @@ static inline bool in_range_b(tcg_target_long target)
     return target == sextract64(target, 0, 26);
 }
 
-static uint32_t reloc_pc24_val(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc24_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
     ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
-    tcg_debug_assert(in_range_b(disp));
-    return disp & 0x3fffffc;
-}
-
-static void reloc_pc24(tcg_insn_unit *pc, tcg_insn_unit *target)
-{
-    *pc = (*pc & ~0x3fffffc) | reloc_pc24_val(pc, target);
+    if (in_range_b(disp)) {
+        *pc = (*pc & ~0x3fffffc) | (disp & 0x3fffffc);
+        return true;
+    }
+    return false;
 }
 
 static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
@@ -205,10 +203,22 @@ static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
     return disp & 0xfffc;
 }
 
+static bool reloc_pc14_cond(tcg_insn_unit *pc, tcg_insn_unit *target)
+{
+    ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
+    if (disp == (int16_t) disp) {
+        *pc = (*pc & ~0xfffc) | (disp & 0xfffc);
+        return true;
+    }
+    return false;
+}
+
+#ifdef CONFIG_SOFTMMU
 static void reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-    *pc = (*pc & ~0xfffc) | reloc_pc14_val(pc, target);
+    tcg_debug_assert(reloc_pc14_cond(pc, target));
 }
+#endif
 
 static inline void tcg_out_b_noaddr(TCGContext *s, int insn)
 {
@@ -525,7 +535,7 @@ static const uint32_t tcg_to_isel[] = {
     [TCG_COND_GTU] = ISEL | BC_(7, CR_GT),
 };
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_insn_unit *target;
@@ -536,11 +546,9 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 
     switch (type) {
     case R_PPC_REL14:
-        reloc_pc14(code_ptr, target);
-        break;
+        return reloc_pc14_cond(code_ptr, target);
     case R_PPC_REL24:
-        reloc_pc24(code_ptr, target);
-        break;
+        return reloc_pc24_cond(code_ptr, target);
     case R_PPC_ADDR16:
         /* We are abusing this relocation type.  This points to a pair
            of insns, addis + load.  If the displacement is small, we
@@ -552,11 +560,14 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
         } else {
             int16_t lo = value;
             int hi = value - lo;
-            assert(hi + lo == value);
-            code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
-            code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
+            if (hi + lo == value) {
+                code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
+                code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
+            } else {
+                return false;
+            }
         }
-        break;
+        return true;
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 17c435ade5..a8d72dd630 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -366,7 +366,7 @@ static void * const qemu_st_helpers[16] = {
 static tcg_insn_unit *tb_ret_addr;
 uint64_t s390_facilities;
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     intptr_t pcrel2;
@@ -377,22 +377,35 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 
     switch (type) {
     case R_390_PC16DBL:
-        assert(pcrel2 == (int16_t)pcrel2);
-        tcg_patch16(code_ptr, pcrel2);
+        if (pcrel2 == (int16_t)pcrel2) {
+            tcg_patch16(code_ptr, pcrel2);
+            return true;
+        }
         break;
     case R_390_PC32DBL:
-        assert(pcrel2 == (int32_t)pcrel2);
-        tcg_patch32(code_ptr, pcrel2);
+        if (pcrel2 == (int32_t)pcrel2) {
+            tcg_patch32(code_ptr, pcrel2);
+            return true;
+        }
         break;
     case R_390_20:
-        assert(value == sextract64(value, 0, 20));
-        old = *(uint32_t *)code_ptr & 0xf00000ff;
-        old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
-        tcg_patch32(code_ptr, old);
+        if (value == sextract64(value, 0, 20)) {
+            old = *(uint32_t *)code_ptr & 0xf00000ff;
+            old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
+            tcg_patch32(code_ptr, old);
+            return true;
+        }
         break;
     default:
         g_assert_not_reached();
     }
+    return false;
+}
+
+static void patch_reloc_force(tcg_insn_unit *code_ptr, int type,
+                              intptr_t value, intptr_t addend)
+{
+    tcg_debug_assert(patch_reloc(code_ptr, type, value, addend));
 }
 
 /* parse target specific constraints */
@@ -1618,7 +1631,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
+                      (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
@@ -1639,7 +1653,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+    patch_reloc_force(lb->label_ptr[0], R_390_PC16DBL,
+                      (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 04bdc3df5e..111f3312d3 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -291,32 +291,34 @@ static inline int check_fit_i32(int32_t val, unsigned int bits)
 # define check_fit_ptr  check_fit_i32
 #endif
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     uint32_t insn = *code_ptr;
     intptr_t pcrel;
+    bool ret;
 
     value += addend;
     pcrel = tcg_ptr_byte_diff((tcg_insn_unit *)value, code_ptr);
 
     switch (type) {
     case R_SPARC_WDISP16:
-        assert(check_fit_ptr(pcrel >> 2, 16));
+        ret = check_fit_ptr(pcrel >> 2, 16);
         insn &= ~INSN_OFF16(-1);
         insn |= INSN_OFF16(pcrel);
         break;
     case R_SPARC_WDISP19:
-        assert(check_fit_ptr(pcrel >> 2, 19));
+        ret = check_fit_ptr(pcrel >> 2, 19);
         insn &= ~INSN_OFF19(-1);
         insn |= INSN_OFF19(pcrel);
         break;
     case R_SPARC_13:
         /* Note that we're abusing this reloc type for our own needs.  */
+        ret = true;
         if (!check_fit_ptr(value, 13)) {
             int adj = (value > 0 ? 0xff8 : -0x1000);
             value -= adj;
-            assert(check_fit_ptr(value, 13));
+            ret = check_fit_ptr(value, 13);
             *code_ptr++ = (ARITH_ADD | INSN_RD(TCG_REG_T2)
                            | INSN_RS1(TCG_REG_TB) | INSN_IMM13(adj));
             insn ^= INSN_RS1(TCG_REG_TB) ^ INSN_RS1(TCG_REG_T2);
@@ -328,12 +330,13 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
         /* Note that we're abusing this reloc type for our own needs.  */
         code_ptr[0] = deposit32(code_ptr[0], 0, 22, value >> 10);
         code_ptr[1] = deposit32(code_ptr[1], 0, 10, value);
-        return;
+        return value == (intptr_t)(uint32_t)value;
     default:
         g_assert_not_reached();
     }
 
     *code_ptr = insn;
+    return ret;
 }
 
 /* parse target specific constraints */
diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c
index 7af5513ff3..ab8f6df8b0 100644
--- a/tcg/tcg-pool.inc.c
+++ b/tcg/tcg-pool.inc.c
@@ -140,6 +140,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
 
     for (; p != NULL; p = p->next) {
         size_t size = sizeof(tcg_target_ulong) * p->nlong;
+        bool ok;
+
         if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) {
             if (unlikely(a > s->code_gen_highwater)) {
                 return false;
@@ -148,7 +150,8 @@ static bool tcg_out_pool_finalize(TCGContext *s)
             a += size;
             l = p;
         }
-        patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
+        ok = patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend);
+        tcg_debug_assert(ok);
     }
 
     s->code_ptr = a;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e85133ef05..54f1272187 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -66,7 +66,7 @@
 static void tcg_target_init(TCGContext *s);
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode);
 static void tcg_target_qemu_prologue(TCGContext *s);
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend);
 
 /* The CIE and FDE header definitions will be common to all hosts.  */
@@ -268,7 +268,8 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
         /* FIXME: This may break relocations on RISC targets that
            modify instruction fields in place.  The caller may not have 
            written the initial value.  */
-        patch_reloc(code_ptr, type, l->u.value, addend);
+        bool ok = patch_reloc(code_ptr, type, l->u.value, addend);
+        tcg_debug_assert(ok);
     } else {
         /* add a new relocation entry */
         r = tcg_malloc(sizeof(TCGRelocation));
@@ -288,7 +289,8 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
     tcg_debug_assert(!l->has_value);
 
     for (r = l->u.first_reloc; r != NULL; r = r->next) {
-        patch_reloc(r->ptr, r->type, value, r->addend);
+        bool ok = patch_reloc(r->ptr, r->type, value, r->addend);
+        tcg_debug_assert(ok);
     }
 
     l->has_value = 1;
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 62ed097254..0015a98485 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -369,7 +369,7 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     /* tcg_out_reloc always uses the same type, addend. */
@@ -381,6 +381,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
     } else {
         tcg_patch64(code_ptr, value);
     }
+    return true;
 }
 
 /* Parse target specific constraints. */
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 06/17] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (4 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 05/17] tcg: Return success from patch_reloc Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 07/17] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

This variant of tcg-ldst.inc.c allows the entire thunk to be
moved out-of-line, with caching across TBs within a region.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.h              |  4 ++
 tcg/tcg-ldst-ool.inc.c | 94 ++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c              | 20 +++++++++
 3 files changed, 118 insertions(+)
 create mode 100644 tcg/tcg-ldst-ool.inc.c

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f4efbaa680..1255d2a2c6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -706,6 +706,10 @@ struct TCGContext {
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     QSIMPLEQ_HEAD(ldst_labels, TCGLabelQemuLdst) ldst_labels;
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    QSIMPLEQ_HEAD(ldst_ool_labels, TCGLabelQemuLdstOol) ldst_ool_labels;
+    GHashTable *ldst_ool_thunks;
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     struct TCGLabelPoolData *pool_labels;
 #endif
diff --git a/tcg/tcg-ldst-ool.inc.c b/tcg/tcg-ldst-ool.inc.c
new file mode 100644
index 0000000000..8fb6550a8d
--- /dev/null
+++ b/tcg/tcg-ldst-ool.inc.c
@@ -0,0 +1,94 @@
+/*
+ * TCG Backend Data: load-store optimization only.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+typedef struct TCGLabelQemuLdstOol {
+    QSIMPLEQ_ENTRY(TCGLabelQemuLdstOol) next;
+    tcg_insn_unit *label;   /* label pointer to be updated */
+    int reloc;              /* relocation type from label_ptr */
+    intptr_t addend;        /* relocation addend from label_ptr */
+    uint32_t key;           /* oi : is_64 : is_ld */
+} TCGLabelQemuLdstOol;
+
+
+/*
+ * Generate TB finalization at the end of block
+ */
+
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is64, TCGMemOpIdx oi);
+
+static bool tcg_out_ldst_ool_finalize(TCGContext *s)
+{
+    TCGLabelQemuLdstOol *lb;
+
+    /* qemu_ld/st slow paths */
+    QSIMPLEQ_FOREACH(lb, &s->ldst_ool_labels, next) {
+        gpointer dest, key = (gpointer)(uintptr_t)lb->key;
+        TCGMemOpIdx oi;
+        bool is_ld, is_64, ok;
+
+        /* If we have generated the thunk, and it's still in range, all ok.  */
+        dest = g_hash_table_lookup(s->ldst_ool_thunks, key);
+        if (dest &&
+            patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend)) {
+            continue;
+        }
+
+        /* Generate a new thunk.  */
+        is_ld = extract32(lb->key, 0, 1);
+        is_64 = extract32(lb->key, 1, 1);
+        oi = extract32(lb->key, 2, 30);
+        dest = tcg_out_qemu_ldst_ool(s, is_ld, is_64, oi);
+
+        /* Test for (pending) buffer overflow.  The assumption is that any
+           one thunk beginning below the high water mark cannot overrun
+           the buffer completely.  Thus we can test for overflow after
+           generating code without having to check during generation.  */
+        if (unlikely((void *)s->code_ptr > s->code_gen_highwater)) {
+            return false;
+        }
+
+        /* Remember the thunk for next time.  */
+        g_hash_table_replace(s->ldst_ool_thunks, key, dest);
+
+        /* The new thunk must be in range.  */
+        ok = patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend);
+        tcg_debug_assert(ok);
+    }
+    return true;
+}
+
+/*
+ * Allocate a new TCGLabelQemuLdstOol entry.
+ */
+
+static void add_ldst_ool_label(TCGContext *s, bool is_ld, bool is_64,
+                               TCGMemOpIdx oi, int reloc, intptr_t addend)
+{
+    TCGLabelQemuLdstOol *lb = tcg_malloc(sizeof(*lb));
+
+    QSIMPLEQ_INSERT_TAIL(&s->ldst_ool_labels, lb, next);
+    lb->label = s->code_ptr;
+    lb->reloc = reloc;
+    lb->addend = addend;
+    lb->key = is_ld | (is_64 << 1) | (oi << 2);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 54f1272187..885d842a12 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -521,6 +521,13 @@ static void tcg_region_assign(TCGContext *s, size_t curr_region)
     s->code_gen_ptr = start;
     s->code_gen_buffer_size = end - start;
     s->code_gen_highwater = end - TCG_HIGHWATER;
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* No thunks have been generated in this region yet.  Even if old
+       thunks were still in range, this is also the most convenient place
+       to clear the table after a full tb_flush.  */
+    g_hash_table_remove_all(s->ldst_ool_thunks);
+#endif
 }
 
 static bool tcg_region_alloc__locked(TCGContext *s)
@@ -964,6 +971,11 @@ void tcg_context_init(TCGContext *s)
     tcg_debug_assert(!tcg_regset_test_reg(s->reserved_regs, TCG_AREG0));
     ts = tcg_global_reg_new_internal(s, TCG_TYPE_PTR, TCG_AREG0, "env");
     cpu_env = temp_tcgv_ptr(ts);
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* Both key and value are raw pointers.  */
+    s->ldst_ool_thunks = g_hash_table_new(NULL, NULL);
+#endif
 }
 
 /*
@@ -3540,6 +3552,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     QSIMPLEQ_INIT(&s->ldst_labels);
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    QSIMPLEQ_INIT(&s->ldst_ool_labels);
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     s->pool_labels = NULL;
 #endif
@@ -3620,6 +3635,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         return -1;
     }
 #endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    if (!tcg_out_ldst_ool_finalize(s)) {
+        return -1;
+    }
+#endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     if (!tcg_out_pool_finalize(s)) {
         return -1;
-- 
2.17.2


* [Qemu-devel] [PATCH for-4.0 07/17] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (5 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 06/17] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 08/17] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Move the entire memory operation out of line.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |   2 +-
 tcg/i386/tcg-target.inc.c | 401 ++++++++++++++++----------------------
 2 files changed, 171 insertions(+), 232 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 9fdf37f23c..c2d84cf1d2 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -224,7 +224,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 16d5af76ad..1833f4c2b2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1619,7 +1619,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
 }
 
 #if defined(CONFIG_SOFTMMU)
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1632,6 +1632,14 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+
+    [MO_SB]   = helper_ret_ldsb_mmu,
+    [MO_LESW] = helper_le_ldsw_mmu,
+    [MO_BESW] = helper_be_ldsw_mmu,
+#if TCG_TARGET_REG_BITS == 64
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
+#endif
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
@@ -1741,18 +1749,18 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     /* jne slow_path */
-    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+    tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
     label_ptr[0] = s->code_ptr;
-    s->code_ptr += 4;
+    s->code_ptr += 1;
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r0), addrhi */
         tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4);
 
         /* jne slow_path */
-        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+        tcg_out_opc(s, OPC_JCC_short + JCC_JNE, 0, 0, 0);
         label_ptr[1] = s->code_ptr;
-        s->code_ptr += 4;
+        s->code_ptr += 1;
     }
 
     /* TLB Hit.  */
@@ -1764,181 +1772,6 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     return base;
 }
 
-/*
- * Record the context of a call to the out of line helper code for the slow path
- * for a load or store, so that we can later generate the correct helper code
- */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo, TCGReg datahi,
-                                TCGReg addrlo, TCGReg addrhi,
-                                tcg_insn_unit *raddr,
-                                tcg_insn_unit **label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->datalo_reg = datalo;
-    label->datahi_reg = datahi;
-    label->addrlo_reg = addrlo;
-    label->addrhi_reg = addrhi;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr[0];
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        label->label_ptr[1] = label_ptr[1];
-    }
-}
-
-/*
- * Generate code for the slow path for a load at the end of block
- */
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOpIdx oi = l->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGReg data_reg;
-    tcg_insn_unit **label_ptr = &l->label_ptr[0];
-
-    /* resolve label address */
-    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
-        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
-                     (uintptr_t)l->raddr);
-    }
-
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-
-    data_reg = l->datalo_reg;
-    switch (opc & MO_SSIZE) {
-    case MO_SB:
-        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case MO_SW:
-        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-#if TCG_TARGET_REG_BITS == 64
-    case MO_SL:
-        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
-        break;
-#endif
-    case MO_UB:
-    case MO_UW:
-        /* Note that the helpers have zero-extended to tcg_target_long.  */
-    case MO_UL:
-        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-        break;
-    case MO_Q:
-        if (TCG_TARGET_REG_BITS == 64) {
-            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-
-    /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_jmp(s, l->raddr);
-}
-
-/*
- * Generate code for the slow path for a store at the end of block
- */
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOpIdx oi = l->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp s_bits = opc & MO_SIZE;
-    tcg_insn_unit **label_ptr = &l->label_ptr[0];
-    TCGReg retaddr;
-
-    /* resolve label address */
-    tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (s_bits == MO_64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        retaddr = TCG_REG_EAX;
-        tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-        tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                    tcg_target_call_iarg_regs[2], l->datalo_reg);
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
-
-        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
-            retaddr = tcg_target_call_iarg_regs[4];
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-        } else {
-            retaddr = TCG_REG_RAX;
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-            tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP,
-                       TCG_TARGET_CALL_STACK_OFFSET);
-        }
-    }
-
-    /* "Tail call" to the helper, with the return address back inline.  */
-    tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-}
 #elif defined(__x86_64__) && defined(__linux__)
 # include <asm/prctl.h>
 # include <sys/prctl.h>
@@ -2067,7 +1900,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datahi __attribute__((unused)) = -1;
     TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
-    TCGMemOp opc;
     int i = -1;
 
     datalo = args[++i];
@@ -2079,35 +1911,25 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
         addrhi = args[++i];
     }
     oi = args[++i];
-    opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    {
-        int mem_index = get_mmuidx(oi);
-        tcg_insn_unit *label_ptr[2];
-        TCGReg base;
-
-        tcg_debug_assert(datalo == tcg_target_call_oarg_regs[0]);
-        if (TCG_TARGET_REG_BITS == 32 && is64) {
-            tcg_debug_assert(datahi == tcg_target_call_oarg_regs[1]);
-        }
-        tcg_debug_assert(addrlo == softmmu_arg(0));
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_debug_assert(addrhi == softmmu_arg(1));
-        }
-
-        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                                label_ptr, offsetof(CPUTLBEntry, addr_read));
-
-        /* TLB Hit.  */
-        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
-
-        /* Record the current context of a load into ldst label */
-        add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                            s->code_ptr, label_ptr);
+    /* Assert that we've set up the constraints properly.  */
+    tcg_debug_assert(datalo == tcg_target_call_oarg_regs[0]);
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        tcg_debug_assert(datahi == tcg_target_call_oarg_regs[1]);
     }
+    tcg_debug_assert(addrlo == softmmu_arg(0));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_debug_assert(addrhi == softmmu_arg(1));
+    }
+
+    /* Call to thunk.  */
+    tcg_out8(s, OPC_CALL_Jz);
+    add_ldst_ool_label(s, true, is64, oi, R_386_PC32, -4);
+    s->code_ptr += 4;
 #else
     {
+        TCGMemOp opc = get_memop(oi);
         int32_t offset = guest_base;
         TCGReg base = addrlo;
         int index = -1;
@@ -2222,7 +2044,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGReg datahi __attribute__((unused)) = -1;
     TCGReg addrhi __attribute__((unused)) = -1;
     TCGMemOpIdx oi;
-    TCGMemOp opc;
     int i = -1;
 
     datalo = args[++i];
@@ -2234,36 +2055,26 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
         addrhi = args[++i];
     }
     oi = args[++i];
-    opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
-    {
-        int mem_index = get_mmuidx(oi);
-        tcg_insn_unit *label_ptr[2];
-        TCGReg base;
-
-        i = -1;
-        tcg_debug_assert(addrlo == softmmu_arg(++i));
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_debug_assert(addrhi == softmmu_arg(++i));
-        }
-        tcg_debug_assert(datalo == softmmu_arg(++i));
-        if (TCG_TARGET_REG_BITS == 32 && is64) {
-            tcg_debug_assert(datahi == softmmu_arg(++i));
-        }
-
-        base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
-                                label_ptr, offsetof(CPUTLBEntry, addr_write));
-
-        /* TLB Hit.  */
-        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
-
-        /* Record the current context of a store into ldst label */
-        add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                            s->code_ptr, label_ptr);
+    /* Assert that we've set up the constraints properly.  */
+    i = -1;
+    tcg_debug_assert(addrlo == softmmu_arg(++i));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_debug_assert(addrhi == softmmu_arg(++i));
     }
+    tcg_debug_assert(datalo == softmmu_arg(++i));
+    if (TCG_TARGET_REG_BITS == 32 && is64) {
+        tcg_debug_assert(datahi == softmmu_arg(++i));
+    }
+
+    /* Call to thunk.  */
+    tcg_out8(s, OPC_CALL_Jz);
+    add_ldst_ool_label(s, false, is64, oi, R_386_PC32, -4);
+    s->code_ptr += 4;
 #else
     {
+        TCGMemOp opc = get_memop(oi);
         int32_t offset = guest_base;
         TCGReg base = addrlo;
         int seg = 0;
@@ -2298,6 +2109,134 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 }
 
+#if defined(CONFIG_SOFTMMU)
+/*
+ * Generate code for an out-of-line thunk performing a load.
+ */
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is64, TCGMemOpIdx oi)
+{
+    TCGMemOp opc = get_memop(oi);
+    int mem_index = get_mmuidx(oi);
+    tcg_insn_unit *label_ptr[2], *thunk;
+    TCGReg datalo, addrlo, base;
+    TCGReg datahi __attribute__((unused)) = -1;
+    TCGReg addrhi __attribute__((unused)) = -1;
+    int i;
+
+    /* Since we're amortizing the cost, align the thunk.  */
+    thunk = QEMU_ALIGN_PTR_UP(s->code_ptr, 16);
+    if (thunk != s->code_ptr) {
+        memset(s->code_ptr, 0x90, thunk - s->code_ptr);
+        s->code_ptr = thunk;
+    }
+
+    /* Discover where the inputs are held.  */
+    i = -1;
+    addrlo = softmmu_arg(++i);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        addrhi = softmmu_arg(++i);
+    }
+    if (is_ld) {
+        datalo = tcg_target_call_oarg_regs[0];
+        if (TCG_TARGET_REG_BITS == 32 && is64) {
+            datahi = tcg_target_call_oarg_regs[1];
+        }
+    } else {
+        datalo = softmmu_arg(++i);
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            datahi = softmmu_arg(++i);
+        }
+    }
+
+    base = tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc, label_ptr,
+                            is_ld ? offsetof(CPUTLBEntry, addr_read)
+                            : offsetof(CPUTLBEntry, addr_write));
+
+    /* TLB Hit.  */
+    if (is_ld) {
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, -1, 0, 0, opc);
+    } else {
+        tcg_out_qemu_st_direct(s, datalo, datahi, base, 0, 0, opc);
+    }
+    tcg_out_opc(s, OPC_RET, 0, 0, 0);
+
+    /* TLB Miss.  */
+
+    /* resolve label address */
+    tcg_patch8(label_ptr[0], s->code_ptr - label_ptr[0] - 1);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_patch8(label_ptr[1], s->code_ptr - label_ptr[1] - 1);
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* Copy the return address into a temporary.  */
+        tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, 0);
+        i = 4;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, i);
+        i += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, addrlo, TCG_REG_ESP, i);
+        i += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, addrhi, TCG_REG_ESP, i);
+            i += 4;
+        }
+
+        if (!is_ld) {
+            tcg_out_st(s, TCG_TYPE_I32, datalo, TCG_REG_ESP, i);
+            i += 4;
+
+            if (is64) {
+                tcg_out_st(s, TCG_TYPE_I32, datahi, TCG_REG_ESP, i);
+                i += 4;
+            }
+        }
+
+        tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, i);
+        i += 4;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_ESP, i);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+
+        /* The address and data values have been placed by constraints.  */
+        tcg_debug_assert(addrlo == tcg_target_call_iarg_regs[1]);
+        if (is_ld) {
+            i = 2;
+        } else {
+            tcg_debug_assert(datalo == tcg_target_call_iarg_regs[2]);
+            i = 3;
+        }
+
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[i++], oi);
+
+        /* Copy the return address from the stack to the rvalue argument.
+         * WIN64 runs out of argument registers for stores.
+         */
+        if (i < (int)ARRAY_SIZE(tcg_target_call_iarg_regs)) {
+            tcg_out_ld(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[i],
+                       TCG_REG_ESP, 0);
+        } else {
+            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP, 0);
+            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP,
+                       TCG_TARGET_CALL_STACK_OFFSET + 8);
+        }
+    }
+
+    /* Tail call to the helper.  */
+    if (is_ld) {
+        tcg_out_jmp(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
+    } else {
+        tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    }
+
+    return thunk;
+}
+#endif
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread
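To make the thunk-alignment step in `tcg_out_qemu_ldst_ool` above concrete: the thunk entry is rounded up to a 16-byte boundary and the gap is filled with x86 NOPs. The sketch below restates that step in isolation; `align_thunk` is an invented name (the real code uses `QEMU_ALIGN_PTR_UP` and writes through `s->code_ptr`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the alignment step in tcg_out_qemu_ldst_ool:
 * round the code pointer up to a 16-byte boundary and pad the gap with
 * x86 NOPs (0x90), so each shared thunk starts at an aligned address. */
static uint8_t *align_thunk(uint8_t *code_ptr)
{
    uint8_t *aligned =
        (uint8_t *)(((uintptr_t)code_ptr + 15) & ~(uintptr_t)15);
    if (aligned != code_ptr) {
        memset(code_ptr, 0x90, aligned - code_ptr);  /* NOP fill */
    }
    return aligned;
}
```

Since the thunk is emitted once and reused by many call sites, the padding cost is amortized, which is why the alignment is worthwhile here but not for inline slow paths.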

* [Qemu-devel] [PATCH for-4.0 08/17] tcg/aarch64: Add constraints for x0, x1, x2
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (6 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 07/17] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 09/17] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

These are function call arguments that we will need soon.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 30091f6a69..148de0b7f2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -125,6 +125,18 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type)
 {
     switch (*ct_str++) {
+    case 'a': /* x0 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X0);
+        break;
+    case 'b': /* x1 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X1);
+        break;
+    case 'c': /* x2 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_X2);
+        break;
     case 'r': /* general registers */
         ct->ct |= TCG_CT_REG;
         ct->u.regs |= 0xffffffffu;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread
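The new constraint letters each pin an operand to a single fixed register, in contrast to 'r', which permits any general register. A minimal sketch of the mapping, under the assumption that a register set is a one-bit-per-register mask; `reg_set_for_constraint` is a made-up helper (the real code fills in a `TCGArgConstraint` instead of returning a mask):

```c
#include <assert.h>
#include <stdint.h>

/* Map a constraint letter to its allowed-register mask: 'a'/'b'/'c'
 * are singletons for x0/x1/x2, 'r' allows all 32 general registers. */
static uint64_t reg_set_for_constraint(char c)
{
    switch (c) {
    case 'a': return UINT64_C(1) << 0;   /* x0 only */
    case 'b': return UINT64_C(1) << 1;   /* x1 only */
    case 'c': return UINT64_C(1) << 2;   /* x2 only */
    case 'r': return 0xffffffffu;        /* any general register */
    default:  return 0;
    }
}
```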

* [Qemu-devel] [PATCH for-4.0 09/17] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (7 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 08/17] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 10/17] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

When moving the qemu_ld/st arguments to the right place for
a function call, we'll need to move the temps out of the way.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 74 +++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 148de0b7f2..c0ba9a6d50 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1467,13 +1467,15 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
     label->label_ptr[0] = label_ptr;
 }
 
-/* Load and compare a TLB entry, emitting the conditional jump to the
-   slow path for the failure case, which will be patched later when finalizing
-   the slow path. Generated code returns the host addend in X1,
-   clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
-                             tcg_insn_unit **label_ptr, int mem_index,
-                             bool is_read)
+/*
+ * Load and compare a TLB entry, emitting the conditional jump to the
+ * slow path on failure.  Returns the register for the host addend.
+ * Clobbers t0, t1, t2, t3.
+ */
+static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
+                               tcg_insn_unit **label_ptr, int mem_index,
+                               bool is_read, TCGReg t0, TCGReg t1,
+                               TCGReg t2, TCGReg t3)
 {
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
@@ -1491,55 +1493,56 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
     if (a_bits >= s_bits) {
         x3 = addr_reg;
     } else {
+        x3 = t3;
         tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, addr_reg, s_mask - a_mask);
-        x3 = TCG_REG_X3;
+                     x3, addr_reg, s_mask - a_mask);
     }
     tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 
-    /* Extract the TLB index from the address into X0.
-       X0<CPU_TLB_BITS:0> =
+    /* Extract the TLB index from the address into T0.
+       T0<CPU_TLB_BITS:0> =
        addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
-    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, TCG_REG_X0, addr_reg,
+    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, t0, addr_reg,
                  TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
 
-    /* Store the page mask part of the address into X3.  */
+    /* Store the page mask part of the address into T3.  */
     tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, x3, tlb_mask);
+                     t3, x3, tlb_mask);
 
-    /* Add any "high bits" from the tlb offset to the env address into X2,
+    /* Add any "high bits" from the tlb offset to the env address into T2,
        to take advantage of the LSL12 form of the ADDI instruction.
-       X2 = env + (tlb_offset & 0xfff000) */
+       T2 = env + (tlb_offset & 0xfff000) */
     if (tlb_offset & 0xfff000) {
-        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_X2, base,
+        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, t2, base,
                      tlb_offset & 0xfff000);
-        base = TCG_REG_X2;
+        base = t2;
     }
 
-    /* Merge the tlb index contribution into X2.
-       X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64, TCG_REG_X2, base,
-                 TCG_REG_X0, CPU_TLB_ENTRY_BITS);
+    /* Merge the tlb index contribution into T2.
+       T2 = T2 + (T0 << CPU_TLB_ENTRY_BITS) */
+    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64,
+                 t2, base, t0, CPU_TLB_ENTRY_BITS);
 
-    /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
-       X0 = load [X2 + (tlb_offset & 0x000fff)] */
+    /* Merge "low bits" from tlb offset, load the tlb comparator into T0.
+       T0 = load [T2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
-                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff,
-                 TARGET_LONG_BITS == 32 ? 2 : 3);
+                 t0, t2, tlb_offset & 0xfff, TARGET_LONG_BITS == 32 ? 2 : 3);
 
     /* Load the tlb addend. Do that early to avoid stalling.
-       X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
-    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
+       T1 = load [T2 + (tlb_offset & 0xfff) + offsetof(addend)] */
+    tcg_out_ldst(s, I3312_LDRX, t1, t2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)), 3);
 
     /* Perform the address comparison. */
-    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
+    tcg_out_cmp(s, (TARGET_LONG_BITS == 64), t0, t3, 0);
 
     /* If not equal, we jump to the slow path. */
     *label_ptr = s->code_ptr;
     tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
+
+    return t1;
 }
 
 #endif /* CONFIG_SOFTMMU */
@@ -1644,10 +1647,12 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
     tcg_insn_unit *label_ptr;
+    TCGReg base;
 
-    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1);
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1,
+                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
     tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
-                           TCG_REG_X1, otype, addr_reg);
+                           base, otype, addr_reg);
     add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
                         s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
@@ -1669,10 +1674,11 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
     tcg_insn_unit *label_ptr;
+    TCGReg base;
 
-    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0);
-    tcg_out_qemu_st_direct(s, memop, data_reg,
-                           TCG_REG_X1, otype, addr_reg);
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0,
+                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
+    tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
     add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
                         data_reg, addr_reg, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH for-4.0 10/17] tcg/aarch64: Parameterize the temp for tcg_out_goto_long
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (8 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 09/17] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 11/17] tcg/aarch64: Use B not BL " Richard Henderson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

We cannot use TCG_REG_LR (aka TCG_REG_TMP) for tail calls.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c0ba9a6d50..ea5fe33fca 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1134,14 +1134,15 @@ static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
     tcg_out_insn(s, 3206, B, offset);
 }
 
-static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target,
+                                     TCGReg scratch)
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
         tcg_out_insn(s, 3206, BL, offset);
     } else {
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
-        tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
+        tcg_out_movi(s, TCG_TYPE_I64, scratch, (intptr_t)target);
+        tcg_out_insn(s, 3207, BR, scratch);
     }
 }
 
@@ -1716,10 +1717,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_exit_tb:
         /* Reuse the zeroing that exists for goto_ptr.  */
         if (a0 == 0) {
-            tcg_out_goto_long(s, s->code_gen_epilogue);
+            tcg_out_goto_long(s, s->code_gen_epilogue, TCG_REG_TMP);
         } else {
             tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
-            tcg_out_goto_long(s, tb_ret_addr);
+            tcg_out_goto_long(s, tb_ret_addr, TCG_REG_TMP);
         }
         break;
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH for-4.0 11/17] tcg/aarch64: Use B not BL for tcg_out_goto_long
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (9 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 10/17] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 12/17] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

This was apparently a typo, copied over from tcg_out_call.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index ea5fe33fca..403f5caf14 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1139,7 +1139,7 @@ static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target,
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
-        tcg_out_insn(s, 3206, BL, offset);
+        tcg_out_insn(s, 3206, B, offset);
     } else {
         tcg_out_movi(s, TCG_TYPE_I64, scratch, (intptr_t)target);
         tcg_out_insn(s, 3207, BR, scratch);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread
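Why the one-letter fix above matters: AArch64 B and BL differ only in bit 31, but BL also writes the return address into x30 (LR), which a tail call must not clobber. A sketch of the two encodings per the ARMv8-A ISA (B = 0x14000000, BL = 0x94000000, each with a signed 26-bit word offset in the low bits); `encode_branch` is an invented helper for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Encode an AArch64 immediate branch: link selects BL (which writes
 * x30) over B.  word_offset is the signed displacement in instructions. */
static uint32_t encode_branch(int64_t word_offset, int link)
{
    uint32_t base = link ? 0x94000000u : 0x14000000u;
    return base | ((uint32_t)word_offset & 0x03ffffffu);
}
```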

* [Qemu-devel] [PATCH for-4.0 12/17] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (10 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 11/17] tcg/aarch64: Use B not BL " Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 13/17] tcg/arm: Parameterize the temps for tcg_out_tlb_read Richard Henderson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |   2 +-
 tcg/aarch64/tcg-target.inc.c | 191 +++++++++++++++++------------------
 2 files changed, 93 insertions(+), 100 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9aea1d1771..d1bd77c41d 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -146,7 +146,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 403f5caf14..8edea527f7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -145,18 +145,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         ct->u.regs |= 0xffffffff00000000ull;
         break;
-    case 'l': /* qemu_ld / qemu_st address, data_reg */
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffffffffu;
-#ifdef CONFIG_SOFTMMU
-        /* x0 and x1 will be overwritten when reading the tlb entry,
-           and x2, and x3 for helper args, better to avoid using them. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X1);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X2);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_X3);
-#endif
-        break;
     case 'A': /* Valid for arithmetic immediate (positive or negative).  */
         ct->ct |= TCG_CT_CONST_AIMM;
         break;
@@ -1378,7 +1366,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 }
 
 #ifdef CONFIG_SOFTMMU
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1391,6 +1379,12 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+
+    [MO_SB]   = helper_ret_ldsb_mmu,
+    [MO_LESW] = helper_le_ldsw_mmu,
+    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_BESW] = helper_be_ldsw_mmu,
+    [MO_BESL] = helper_be_ldsl_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
@@ -1407,67 +1401,6 @@ static void * const qemu_st_helpers[16] = {
     [MO_BEQ]  = helper_be_stq_mmu,
 };
 
-static inline void tcg_out_adr(TCGContext *s, TCGReg rd, void *target)
-{
-    ptrdiff_t offset = tcg_pcrel_diff(s, target);
-    tcg_debug_assert(offset == sextract64(offset, 0, 21));
-    tcg_out_insn(s, 3406, ADR, rd, offset);
-}
-
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp size = opc & MO_SIZE;
-
-    reloc_pc19(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, oi);
-    tcg_out_adr(s, TCG_REG_X3, lb->raddr);
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-    if (opc & MO_SIGN) {
-        tcg_out_sxt(s, lb->type, size, lb->datalo_reg, TCG_REG_X0);
-    } else {
-        tcg_out_mov(s, size == MO_64, lb->datalo_reg, TCG_REG_X0);
-    }
-
-    tcg_out_goto(s, lb->raddr);
-}
-
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    TCGMemOp size = opc & MO_SIZE;
-
-    reloc_pc19(lb->label_ptr[0], s->code_ptr);
-
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, oi);
-    tcg_out_adr(s, TCG_REG_X4, lb->raddr);
-    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-    tcg_out_goto(s, lb->raddr);
-}
-
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGType ext, TCGReg data_reg, TCGReg addr_reg,
-                                tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->type = ext;
-    label->datalo_reg = data_reg;
-    label->addrlo_reg = addr_reg;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr;
-}
-
 /*
  * Load and compare a TLB entry, emitting the conditional jump to the
  * slow path on failure.  Returns the register for the host addend.
@@ -1644,19 +1577,22 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
                             TCGMemOpIdx oi, TCGType ext)
 {
     TCGMemOp memop = get_memop(oi);
-    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
-#ifdef CONFIG_SOFTMMU
-    unsigned mem_index = get_mmuidx(oi);
-    tcg_insn_unit *label_ptr;
-    TCGReg base;
 
-    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1,
-                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
-    tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
-                           base, otype, addr_reg);
-    add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
-                        s->code_ptr, label_ptr);
+#ifdef CONFIG_SOFTMMU
+    /* Ignore the requested "ext".  We get the same correct result from
+     * a 16-bit sign-extended to 64-bit as we do sign-extended to 32-bit,
+     * and we create fewer out-of-line thunks.
+     */
+    bool is_64 = (memop & MO_SIGN) || ((memop & MO_SIZE) == MO_64);
+
+    tcg_debug_assert(data_reg == TCG_REG_X0);
+    tcg_debug_assert(addr_reg == TCG_REG_X1);
+
+    add_ldst_ool_label(s, true, is_64, oi, R_AARCH64_JUMP26, 0);
+    tcg_out_insn(s, 3206, BL, 0);
 #else /* !CONFIG_SOFTMMU */
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+
     if (USE_GUEST_BASE) {
         tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
                                TCG_REG_GUEST_BASE, otype, addr_reg);
@@ -1671,18 +1607,18 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
                             TCGMemOpIdx oi)
 {
     TCGMemOp memop = get_memop(oi);
-    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
-#ifdef CONFIG_SOFTMMU
-    unsigned mem_index = get_mmuidx(oi);
-    tcg_insn_unit *label_ptr;
-    TCGReg base;
 
-    base = tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0,
-                            TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3);
-    tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
-    add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
-                        data_reg, addr_reg, s->code_ptr, label_ptr);
+#ifdef CONFIG_SOFTMMU
+    bool is_64 = (memop & MO_SIZE) == MO_64;
+
+    tcg_debug_assert(addr_reg == TCG_REG_X1);
+    tcg_debug_assert(data_reg == TCG_REG_X2);
+
+    add_ldst_ool_label(s, false, is_64, oi, R_AARCH64_JUMP26, 0);
+    tcg_out_insn(s, 3206, BL, 0);
 #else /* !CONFIG_SOFTMMU */
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+
     if (USE_GUEST_BASE) {
         tcg_out_qemu_st_direct(s, memop, data_reg,
                                TCG_REG_GUEST_BASE, otype, addr_reg);
@@ -1693,6 +1629,52 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #endif /* CONFIG_SOFTMMU */
 }
 
+#ifdef CONFIG_SOFTMMU
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+    const TCGMemOp memop = get_memop(oi);
+    const unsigned mem_index = get_mmuidx(oi);
+    const TCGReg addr_reg = TCG_REG_X1;
+    const TCGReg data_reg = is_ld ? TCG_REG_X0 : TCG_REG_X2;
+    tcg_insn_unit * const thunk = s->code_ptr;
+    tcg_insn_unit *label;
+    TCGReg base, arg;
+
+    base = tcg_out_tlb_read(s, addr_reg, memop, &label, mem_index, is_ld,
+                            TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7);
+
+    /* TLB Hit */
+    if (is_ld) {
+        tcg_out_qemu_ld_direct(s, memop, is_64, data_reg,
+                               base, otype, addr_reg);
+    } else {
+        tcg_out_qemu_st_direct(s, memop, data_reg, base, otype, addr_reg);
+    }
+    tcg_out_insn(s, 3207, RET, TCG_REG_LR);
+
+    /* TLB Miss */
+    reloc_pc19(label, s->code_ptr);
+
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
+    /* addr_reg and data_reg are already in place.  */
+    arg = is_ld ? TCG_REG_X2 : TCG_REG_X3;
+    tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
+    tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_REG_LR);
+
+    if (is_ld) {
+        tcg_out_goto_long(s, qemu_ld_helpers[memop & (MO_BSWAP | MO_SSIZE)],
+                          TCG_REG_X7);
+    } else {
+        tcg_out_goto_long(s, qemu_st_helpers[memop & (MO_BSWAP | MO_SIZE)],
+                          TCG_REG_X7);
+    }
+
+    return thunk;
+}
+#endif
+
 static tcg_insn_unit *tb_ret_addr;
 
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -2262,10 +2244,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef w_w = { .args_ct_str = { "w", "w" } };
     static const TCGTargetOpDef w_r = { .args_ct_str = { "w", "r" } };
     static const TCGTargetOpDef w_wr = { .args_ct_str = { "w", "wr" } };
-    static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } };
     static const TCGTargetOpDef r_rA = { .args_ct_str = { "r", "rA" } };
     static const TCGTargetOpDef rZ_r = { .args_ct_str = { "rZ", "r" } };
-    static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
+#ifdef CONFIG_SOFTMMU
+    static const TCGTargetOpDef a_b = { .args_ct_str = { "a", "b" } };
+    static const TCGTargetOpDef c_b = { .args_ct_str = { "c", "b" } };
+#endif
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
     static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
     static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
@@ -2397,10 +2381,19 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_ld_i64:
-        return &r_l;
+#ifdef CONFIG_SOFTMMU
+        return &a_b;
+#else
+        return &r_r;
+#endif
+
     case INDEX_op_qemu_st_i32:
     case INDEX_op_qemu_st_i64:
-        return &lZ_l;
+#ifdef CONFIG_SOFTMMU
+        return &c_b;
+#else
+        return &r_r;
+#endif
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH for-4.0 13/17] tcg/arm: Parameterize the temps for tcg_out_tlb_read
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (11 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 12/17] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-12 21:44 ` Richard Henderson
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 14/17] tcg/arm: Add constraints for R0-R5 Richard Henderson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

When moving the qemu_ld/st arguments to the right place for
a function call, we'll need to move the temps out of the way.
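[Editor's note: a minimal C sketch of the interface change this patch makes.
The scratch registers used by the TLB read become parameters chosen by the
caller instead of being hard-coded.  The type and register names below are
illustrative stand-ins, not QEMU's actual enums.]

```c
#include <assert.h>

typedef int TCGReg;
enum { R0, R1, R2, R12_TMP, R14_LR };

/* New shape: the caller supplies the temps t0..t3; the register
 * holding the TLB addend (t2, whatever the caller picked) is
 * returned, rather than always being R2. */
static TCGReg tlb_read(TCGReg addrlo, TCGReg t0, TCGReg t1,
                       TCGReg t2, TCGReg t3)
{
    (void)addrlo; (void)t0; (void)t1; (void)t3;
    return t2;      /* addend lands in whichever register was passed as t2 */
}
```

With this shape, one call site can pass R14 as a temp while another passes
R12, which is what the later patches in the series need.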

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 89 +++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 43 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 80d174ef44..414c91c9ea 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1245,11 +1245,14 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 /* We're expecting to use an 8-bit immediate and to mask.  */
 QEMU_BUILD_BUG_ON(CPU_TLB_BITS > 8);
 
-/* Load and compare a TLB entry, leaving the flags set.  Returns the register
-   containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
-
+/*
+ *Load and compare a TLB entry, leaving the flags set.  Returns the register
+ * containing the addend of the tlb entry.  Clobbers t0, t1, t2, t3.
+ * T0 and T1 must be consecutive for LDRD.
+ */
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               TCGMemOp opc, int mem_index, bool is_load)
+                               TCGMemOp opc, int mem_index, bool is_load,
+                               TCGReg t0, TCGReg t1, TCGReg t2, TCGReg t3)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1262,36 +1265,37 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     unsigned a_bits = get_alignment_bits(opc);
 
     /* V7 generates the following:
-     *   ubfx   r0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
-     *   add    r2, env, #high
-     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    r0, [r2, #cmp]
-     *   ldr    r2, [r2, #add]
-     *   movw   tmp, #page_align_mask
-     *   bic    tmp, addrlo, tmp
-     *   cmp    r0, tmp
+     *   ubfx   t0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
+     *   add    t2, env, #high
+     *   add    t2, t2, r0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t2, [t2, #add]
+     *   movw   t3, #page_align_mask
+     *   bic    t3, addrlo, t3
+     *   cmp    t0, t3
      *
      * Otherwise we generate:
-     *   shr    tmp, addrlo, #TARGET_PAGE_BITS
-     *   add    r2, env, #high
-     *   and    r0, tmp, #(CPU_TLB_SIZE - 1)
-     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    r0, [r2, #cmp]
-     *   ldr    r2, [r2, #add]
+     *   shr    t3, addrlo, #TARGET_PAGE_BITS
+     *   add    t2, env, #high
+     *   and    t0, t3, #(CPU_TLB_SIZE - 1)
+     *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t2, [t2, #add]
      *   tst    addrlo, #s_mask
-     *   cmpeq  r0, tmp, lsl #TARGET_PAGE_BITS
+     *   cmpeq  t0, t3, lsl #TARGET_PAGE_BITS
      */
     if (use_armv7_instructions) {
-        tcg_out_extract(s, COND_AL, TCG_REG_R0, addrlo,
+        tcg_out_extract(s, COND_AL, t0, addrlo,
                         TARGET_PAGE_BITS, CPU_TLB_BITS);
     } else {
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t3,
                         0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     }
 
     /* Add portions of the offset until the memory access is in range.
      * If we plan on using ldrd, reduce to an 8-bit offset; otherwise
-     * we can use a 12-bit offset.  */
+     * we can use a 12-bit offset.
+     */
     if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
         mask_off = 0xff;
     } else {
@@ -1301,34 +1305,33 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int shift = ctz32(cmp_off & ~mask_off) & ~1;
         int rot = ((32 - shift) << 7) & 0xf00;
         int addend = cmp_off & (0xff << shift);
-        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
+        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, t2, base,
                         rot | ((cmp_off >> shift) & 0xff));
-        base = TCG_REG_R2;
+        base = t2;
         add_off -= addend;
         cmp_off -= addend;
     }
 
     if (!use_armv7_instructions) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                        TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
+        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t3, CPU_TLB_SIZE - 1);
     }
-    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
-                    TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
+    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, t2, base, t0,
+                    SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
     /* Load the tlb comparator.  Use ldrd if needed and available,
        but due to how the pointer needs setting up, ldm isn't useful.
        Base arm5 doesn't have ldrd, but armv5te does.  */
     if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        tcg_out_ldrd_8(s, COND_AL, TCG_REG_R0, TCG_REG_R2, cmp_off);
+        tcg_out_ldrd_8(s, COND_AL, t0, t2, cmp_off);
     } else {
-        tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R2, cmp_off);
+        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
         if (TARGET_LONG_BITS == 64) {
-            tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2, cmp_off + 4);
+            tcg_out_ld32_12(s, COND_AL, t1, t2, cmp_off + 4);
         }
     }
 
     /* Load the tlb addend.  */
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R2, TCG_REG_R2, add_off);
+    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
 
     /* Check alignment.  We don't support inline unaligned acceses,
        but we can easily support overalignment checks.  */
@@ -1341,29 +1344,27 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int rot = encode_imm(mask);
 
         if (rot >= 0) { 
-            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, TCG_REG_TMP, addrlo,
+            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t3, addrlo,
                             rotl(mask, rot) | (rot << 7));
         } else {
-            tcg_out_movi32(s, COND_AL, TCG_REG_TMP, mask);
-            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, TCG_REG_TMP,
-                            addrlo, TCG_REG_TMP, 0);
+            tcg_out_movi32(s, COND_AL, t3, mask);
+            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t3, addrlo, t3, 0);
         }
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R0, TCG_REG_TMP, 0);
+        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t3, 0);
     } else {
         if (a_bits) {
             tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo,
                             (1 << a_bits) - 1);
         }
         tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP,
-                        0, TCG_REG_R0, TCG_REG_TMP,
-                        SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                        0, t0, t3, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     }
 
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, TCG_REG_R1, addrhi, 0);
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t1, addrhi, 0);
     }
 
-    return TCG_REG_R2;
+    return t2;
 }
 
 /* Record the context of a call to the out of line helper code for the slow
@@ -1629,7 +1630,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1760,7 +1762,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 14/17] tcg/arm: Add constraints for R0-R5
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (12 preceding siblings ...)
  2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 13/17] tcg/arm: Parameterize the temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-12 21:45 ` Richard Henderson
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 15/17] tcg/arm: Reduce the number of temps for tcg_out_tlb_read Richard Henderson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

These are function call arguments that we will need soon.
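[Editor's note: a small sketch of the constraint-letter mapping the diff
below introduces, assuming the GNU case-range extension (`case 'a' ... 'f':`)
that the patch relies on.  The helper name and return convention are invented
here for illustration; the real code sets bits in a TCGArgConstraint.]

```c
#include <assert.h>

/* Map a single-register constraint letter onto a register number:
 * 'a' -> r0, 'b' -> r1, ... 'f' -> r5.  Anything else is not a
 * single-register constraint in this sketch. */
static int constraint_to_reg(char c)
{
    switch (c) {
    case 'a' ... 'f':           /* r0 - r5, one letter per register */
        return c - 'a';         /* TCG_REG_R0 + (c - 'a') in the patch */
    default:
        return -1;
    }
}
```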

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 414c91c9ea..4339c472e8 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -246,7 +246,12 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type)
 {
-    switch (*ct_str++) {
+    char c = *ct_str++;
+    switch (c) {
+    case 'a' ... 'f': /* r0 - r5 */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set_reg(ct->u.regs, TCG_REG_R0 + (c - 'a'));
+        break;
     case 'I':
         ct->ct |= TCG_CT_CONST_ARM;
         break;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 15/17] tcg/arm: Reduce the number of temps for tcg_out_tlb_read
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (13 preceding siblings ...)
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 14/17] tcg/arm: Add constraints for R0-R5 Richard Henderson
@ 2018-11-12 21:45 ` Richard Henderson
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 16/17] tcg/arm: Force qemu_ld/st arguments into fixed registers Richard Henderson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

When moving the qemu_ld/st thunk out of line, we no longer have LR for
use as a temporary.  In the worst case we must make do with 3 temps,
when dealing with a 64-bit guest address.  This in turn implies that we
cannot use LDRD anymore, as there are not enough temps.
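[Editor's note: an editorial C model, not QEMU code, of the revised 64-bit
compare schedule shown in the diff's comment block.  With only t0, t1, t2
available, the comparator is loaded in two 32-bit halves and t0 is reused
for the high half, instead of one LDRD filling the t0/t1 pair.  The 12-bit
page mask below assumes TARGET_PAGE_BITS == 12.]

```c
#include <assert.h>
#include <stdint.h>

struct tlb_entry { uint32_t addr_lo, addr_hi, addend; };

static int tlb_hit64(const struct tlb_entry *e,
                     uint32_t addrlo, uint32_t addrhi, uint32_t *addend)
{
    uint32_t t0, t1;
    int eq;

    t0 = e->addr_lo;            /* ldr   t0, [t2, #cmplo] */
    t1 = addrlo & ~0xfffu;      /* movw/bic: page-aligned low address */
    eq = (t0 == t1);            /* cmp   t0, t1 */
    t0 = e->addr_hi;            /* ldr   t0, [t2, #cmphi] -- t0 reused */
    *addend = e->addend;        /* ldr   t2, [t2, #add] */
    return eq && t0 == addrhi;  /* cmpeq t0, addrhi */
}
```

The point of the reordering is that the high comparator load can only
overwrite t0 once the low-half compare has consumed it, which is why the
two loads migrate below the first cmp in the diff.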

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 97 ++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 44 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 4339c472e8..2deeb1f5d1 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1251,13 +1251,12 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 QEMU_BUILD_BUG_ON(CPU_TLB_BITS > 8);
 
 /*
- *Load and compare a TLB entry, leaving the flags set.  Returns the register
- * containing the addend of the tlb entry.  Clobbers t0, t1, t2, t3.
- * T0 and T1 must be consecutive for LDRD.
+ * Load and compare a TLB entry, leaving the flags set.  Returns the register
+ * containing the addend of the tlb entry.  Clobbers t0, t1, t2.
  */
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
                                TCGMemOp opc, int mem_index, bool is_load,
-                               TCGReg t0, TCGReg t1, TCGReg t2, TCGReg t3)
+                               TCGReg t0, TCGReg t1, TCGReg t2)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1265,49 +1264,64 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
          ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
          : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
-    int mask_off;
     unsigned s_bits = opc & MO_SIZE;
     unsigned a_bits = get_alignment_bits(opc);
 
     /* V7 generates the following:
      *   ubfx   t0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
      *   add    t2, env, #high
-     *   add    t2, t2, r0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    t0, [t2, #cmp]
      *   ldr    t2, [t2, #add]
-     *   movw   t3, #page_align_mask
-     *   bic    t3, addrlo, t3
-     *   cmp    t0, t3
+     *   movw   t1, #page_align_mask
+     *   bic    t1, addrlo, t1
+     *   cmp    t0, t1
+     *
+     *   ubfx   t0, addrlo, #TPB, #CTB   -- 64-bit address
+     *   add    t2, env, #high
+     *   add    t2, t2, t0, lsl #CTEB
+     *   ldr    t0, [t2, #cmplo]
+     *   movw   t1, #page_align_mask
+     *   bic    t1, addrlo, t1
+     *   cmp    t0, t1
+     *   ldr    t0, [t2, #cmphi]
+     *   ldr    t2, [t2, #add]
+     *   cmpeq  t0, addrhi
      *
      * Otherwise we generate:
      *   shr    t3, addrlo, #TARGET_PAGE_BITS
      *   add    t2, env, #high
      *   and    t0, t3, #(CPU_TLB_SIZE - 1)
      *   add    t2, t2, t0, lsl #CPU_TLB_ENTRY_BITS
-     *   ldr    t0, [t2, #cmp]  (and t1 w/ldrd)
+     *   ldr    t0, [t2, #cmp]
      *   ldr    t2, [t2, #add]
      *   tst    addrlo, #s_mask
      *   cmpeq  t0, t3, lsl #TARGET_PAGE_BITS
+     *
+     *   shr    t1, addrlo, #TPB         -- 64-bit address
+     *   add    t2, env, #high
+     *   and    t0, t1, #CTS-1
+     *   add    t2, t2, t0, lsl #CTEB
+     *   ldr    t0, [t2, #cmplo]
+     *   tst    addrlo, #s_mask
+     *   cmpeq  t0, t1, lsl #TBP
+     *   ldr    t0, [t2, #cmphi]
+     *   ldr    t2, [t2, #add]
+     *   cmpeq  t0, addrhi
      */
     if (use_armv7_instructions) {
         tcg_out_extract(s, COND_AL, t0, addrlo,
                         TARGET_PAGE_BITS, CPU_TLB_BITS);
     } else {
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t3,
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, t1,
                         0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     }
 
     /* Add portions of the offset until the memory access is in range.
-     * If we plan on using ldrd, reduce to an 8-bit offset; otherwise
-     * we can use a 12-bit offset.
+     * We are not using ldrd, so we can use a 12-bit offset.
      */
-    if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        mask_off = 0xff;
-    } else {
-        mask_off = 0xfff;
-    }
-    while (cmp_off > mask_off) {
-        int shift = ctz32(cmp_off & ~mask_off) & ~1;
+    while (cmp_off > 0xfff) {
+        int shift = ctz32(cmp_off & ~0xfff) & ~1;
         int rot = ((32 - shift) << 7) & 0xf00;
         int addend = cmp_off & (0xff << shift);
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, t2, base,
@@ -1318,25 +1332,13 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     if (!use_armv7_instructions) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t3, CPU_TLB_SIZE - 1);
+        tcg_out_dat_imm(s, COND_AL, ARITH_AND, t0, t1, CPU_TLB_SIZE - 1);
     }
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, t2, base, t0,
                     SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
-    /* Load the tlb comparator.  Use ldrd if needed and available,
-       but due to how the pointer needs setting up, ldm isn't useful.
-       Base arm5 doesn't have ldrd, but armv5te does.  */
-    if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
-        tcg_out_ldrd_8(s, COND_AL, t0, t2, cmp_off);
-    } else {
-        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_ld32_12(s, COND_AL, t1, t2, cmp_off + 4);
-        }
-    }
-
-    /* Load the tlb addend.  */
-    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
+    /* Load the tlb comparator (low part).  */
+    tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off);
 
     /* Check alignment.  We don't support inline unaligned acceses,
        but we can easily support overalignment checks.  */
@@ -1349,24 +1351,31 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         int rot = encode_imm(mask);
 
         if (rot >= 0) { 
-            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t3, addrlo,
+            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, t1, addrlo,
                             rotl(mask, rot) | (rot << 7));
         } else {
-            tcg_out_movi32(s, COND_AL, t3, mask);
-            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t3, addrlo, t3, 0);
+            tcg_out_movi32(s, COND_AL, t1, mask);
+            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, t1, addrlo, t1, 0);
         }
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t3, 0);
+        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, t0, t1, 0);
     } else {
         if (a_bits) {
             tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo,
                             (1 << a_bits) - 1);
         }
         tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP,
-                        0, t0, t3, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                        0, t0, t1, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     }
 
+    /* Load the tlb comparator (high part).  */
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t1, addrhi, 0);
+        tcg_out_ld32_12(s, COND_AL, t0, t2, cmp_off + 4);
+    }
+    /* Load the tlb addend.  */
+    tcg_out_ld32_12(s, COND_AL, t2, t2, add_off);
+
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, t0, addrhi, 0);
     }
 
     return t2;
@@ -1636,7 +1645,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1768,7 +1777,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R14);
+                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 16/17] tcg/arm: Force qemu_ld/st arguments into fixed registers
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (14 preceding siblings ...)
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 15/17] tcg/arm: Reduce the number of temps for tcg_out_tlb_read Richard Henderson
@ 2018-11-12 21:45 ` Richard Henderson
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 17/17] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

This is an incremental step toward moving the qemu_ld/st
code sequence out of line.
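[Editor's note: a sketch of the availability-mask trick the diff below uses
to choose the second TLB temp.  Start from r0-r3, clear the bits of
registers already holding address/data operands, reserve r0 for t0, and
take the lowest remaining bit.  Registers are plain ints here rather than
TCGReg, and `__builtin_ctz` stands in for QEMU's ctz32.]

```c
#include <assert.h>

static unsigned pick_t1(unsigned addrlo, unsigned addrhi, int is_64bit_addr)
{
    unsigned avail = 0xf;               /* candidate temps: r0, r1, r2, r3 */

    avail &= ~(1u << addrlo);           /* in use by the low address */
    if (is_64bit_addr) {
        avail &= ~(1u << addrhi);       /* in use by the high address */
    }
    avail &= ~1u;                       /* r0 is always claimed as t0 */
    return __builtin_ctz(avail);        /* lowest free register becomes t1 */
}
```

The fixed operand constraints added by this patch guarantee the mask never
empties, which is what the `tcg_debug_assert(avail != 0)` in the diff checks.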

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.inc.c | 116 +++++++++++++++++++++++++--------------
 1 file changed, 75 insertions(+), 41 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 2deeb1f5d1..75589b43e2 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -270,38 +270,15 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->u.regs = 0xffff;
         break;
 
-    /* qemu_ld address */
-    case 'l':
-        ct->ct |= TCG_CT_REG;
-        ct->u.regs = 0xffff;
-#ifdef CONFIG_SOFTMMU
-        /* r0-r2,lr will be overwritten when reading the tlb entry,
-           so don't use these. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R14);
-#endif
-        break;
-
+#ifndef CONFIG_SOFTMMU
     /* qemu_st address & data */
     case 's':
         ct->ct |= TCG_CT_REG;
         ct->u.regs = 0xffff;
-        /* r0-r2 will be overwritten when reading the tlb entry (softmmu only)
-           and r0-r1 doing the byte swapping, so don't use these. */
+        /* r0 and tmp are needed for byte swapping.  */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#if defined(CONFIG_SOFTMMU)
-        /* Avoid clashes with registers being used for helper args */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
-#if TARGET_LONG_BITS == 64
-        /* Avoid clashes with registers being used for helper args */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
-#endif
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R14);
-#endif
         break;
+#endif
 
     default:
         return NULL;
@@ -1630,8 +1607,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOpIdx oi;
     TCGMemOp opc;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGReg addend;
+    int mem_index, avail;
+    TCGReg addend, t0, t1;
     tcg_insn_unit *label_ptr;
 #endif
 
@@ -1644,8 +1621,20 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
+
+    avail = 0xf;
+    avail &= ~(1 << addrlo);
+    if (TARGET_LONG_BITS == 64) {
+        avail &= ~(1 << addrhi);
+    }
+    tcg_debug_assert(avail & 1);
+    t0 = TCG_REG_R0;
+    avail &= ~1;
+    tcg_debug_assert(avail != 0);
+    t1 = ctz32(avail);
+
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
+                              t0, t1, TCG_REG_TMP);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1762,8 +1751,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOpIdx oi;
     TCGMemOp opc;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGReg addend;
+    int mem_index, avail;
+    TCGReg addend, t0, t1;
     tcg_insn_unit *label_ptr;
 #endif
 
@@ -1776,8 +1765,24 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
+
+    avail = 0xf;
+    avail &= ~(1 << addrlo);
+    avail &= ~(1 << datalo);
+    if (TARGET_LONG_BITS == 64) {
+        avail &= ~(1 << addrhi);
+    }
+    if (is64) {
+        avail &= ~(1 << datahi);
+    }
+    tcg_debug_assert(avail & 1);
+    t0 = TCG_REG_R0;
+    avail &= ~1;
+    tcg_debug_assert(avail != 0);
+    t1 = ctz32(avail);
+
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
-                              TCG_REG_R0, TCG_REG_R1, TCG_REG_TMP);
+                              t0, t1, TCG_REG_TMP);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
@@ -2118,11 +2123,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
     static const TCGTargetOpDef s_s = { .args_ct_str = { "s", "s" } };
-    static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } };
+    static const TCGTargetOpDef a_b = { .args_ct_str = { "a", "b" } };
+    static const TCGTargetOpDef c_b = { .args_ct_str = { "c", "b" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
-    static const TCGTargetOpDef r_r_l = { .args_ct_str = { "r", "r", "l" } };
-    static const TCGTargetOpDef r_l_l = { .args_ct_str = { "r", "l", "l" } };
     static const TCGTargetOpDef s_s_s = { .args_ct_str = { "s", "s", "s" } };
+    static const TCGTargetOpDef a_c_d = { .args_ct_str = { "a", "c", "d" } };
+    static const TCGTargetOpDef a_b_b = { .args_ct_str = { "a", "b", "b" } };
+    static const TCGTargetOpDef e_c_d = { .args_ct_str = { "e", "c", "d" } };
+    static const TCGTargetOpDef e_f_b = { .args_ct_str = { "e", "f", "b" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rI = { .args_ct_str = { "r", "r", "rI" } };
     static const TCGTargetOpDef r_r_rIN
@@ -2131,10 +2139,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rIK" } };
     static const TCGTargetOpDef r_r_r_r
         = { .args_ct_str = { "r", "r", "r", "r" } };
-    static const TCGTargetOpDef r_r_l_l
-        = { .args_ct_str = { "r", "r", "l", "l" } };
     static const TCGTargetOpDef s_s_s_s
         = { .args_ct_str = { "s", "s", "s", "s" } };
+    static const TCGTargetOpDef a_b_c_d
+        = { .args_ct_str = { "a", "b", "c", "d" } };
+    static const TCGTargetOpDef e_f_c_d
+        = { .args_ct_str = { "e", "f", "c", "d" } };
     static const TCGTargetOpDef br
         = { .args_ct_str = { "r", "rIN" } };
     static const TCGTargetOpDef dep
@@ -2215,13 +2225,37 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &setc2;
 
     case INDEX_op_qemu_ld_i32:
-        return TARGET_LONG_BITS == 32 ? &r_l : &r_l_l;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &r_r : &r_r_r;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &a_b;     /* temps available r0, r2, r3, r12 */
+        } else {
+            return &a_c_d;   /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_ld_i64:
-        return TARGET_LONG_BITS == 32 ? &r_r_l : &r_r_l_l;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &r_r_r : &r_r_r_r;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &a_b_b;   /* temps available r0, r2, r3, r12 */
+        } else {
+            return &a_b_c_d; /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_st_i32:
-        return TARGET_LONG_BITS == 32 ? &s_s : &s_s_s;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &s_s : &s_s_s;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &c_b;     /* temps available r0, r3, r12 */
+        } else {
+            return &e_c_d;   /* temps available r0, r1, r12 */
+        }
     case INDEX_op_qemu_st_i64:
-        return TARGET_LONG_BITS == 32 ? &s_s_s : &s_s_s_s;
+        if (!USING_SOFTMMU) {
+            return TARGET_LONG_BITS == 32 ? &s_s_s : &s_s_s_s;
+        } else if (TARGET_LONG_BITS == 32) {
+            return &e_f_b;   /* temps available r0, r2, r3, r12 */
+        } else {
+            return &e_f_c_d; /* temps available r0, r1, r12 */
+        }
 
     default:
         return NULL;
-- 
2.17.2

* [Qemu-devel] [PATCH for-4.0 17/17] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (15 preceding siblings ...)
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 16/17] tcg/arm: Force qemu_ld/st arguments into fixed registers Richard Henderson
@ 2018-11-12 21:45 ` Richard Henderson
  2018-11-13  9:00 ` [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line no-reply
  2018-11-14  1:00 ` Emilio G. Cota
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-12 21:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.h     |   2 +-
 tcg/arm/tcg-target.inc.c | 302 +++++++++++++++------------------------
 2 files changed, 118 insertions(+), 186 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 94b3578c55..02981abdcc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -141,7 +141,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_LDST_OOL_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 75589b43e2..1e9ff693d9 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1134,7 +1134,7 @@ static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg *args,
 }
 
 #ifdef CONFIG_SOFTMMU
-#include "tcg-ldst.inc.c"
+#include "tcg-ldst-ool.inc.c"
 
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1358,127 +1358,6 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     return t2;
 }
 
-/* Record the context of a call to the out of line helper code for the slow
-   path for a load or store, so that we can later generate the correct
-   helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo, TCGReg datahi, TCGReg addrlo,
-                                TCGReg addrhi, tcg_insn_unit *raddr,
-                                tcg_insn_unit *label_ptr)
-{
-    TCGLabelQemuLdst *label = new_ldst_label(s);
-
-    label->is_ld = is_ld;
-    label->oi = oi;
-    label->datalo_reg = datalo;
-    label->datahi_reg = datahi;
-    label->addrlo_reg = addrlo;
-    label->addrhi_reg = addrhi;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr;
-}
-
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGReg argreg, datalo, datahi;
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-    void *func;
-
-    reloc_pc24(lb->label_ptr[0], s->code_ptr);
-
-    argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
-    }
-    argreg = tcg_out_arg_imm32(s, argreg, oi);
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
-
-    /* For armv6 we can use the canonical unsigned helpers and minimize
-       icache usage.  For pre-armv6, use the signed helpers since we do
-       not have a single insn sign-extend.  */
-    if (use_armv6_instructions) {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)];
-    } else {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
-        if (opc & MO_SIGN) {
-            opc = MO_UL;
-        }
-    }
-    tcg_out_call(s, func);
-
-    datalo = lb->datalo_reg;
-    datahi = lb->datahi_reg;
-    switch (opc & MO_SSIZE) {
-    case MO_SB:
-        tcg_out_ext8s(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    case MO_SW:
-        tcg_out_ext16s(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    default:
-        tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-        break;
-    case MO_Q:
-        if (datalo != TCG_REG_R1) {
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-        } else if (datahi != TCG_REG_R0) {
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-        } else {
-            tcg_out_mov_reg(s, COND_AL, TCG_REG_TMP, TCG_REG_R0);
-            tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-            tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_TMP);
-        }
-        break;
-    }
-
-    tcg_out_goto(s, COND_AL, lb->raddr);
-}
-
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
-{
-    TCGReg argreg, datalo, datahi;
-    TCGMemOpIdx oi = lb->oi;
-    TCGMemOp opc = get_memop(oi);
-
-    reloc_pc24(lb->label_ptr[0], s->code_ptr);
-
-    argreg = TCG_REG_R0;
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
-    }
-
-    datalo = lb->datalo_reg;
-    datahi = lb->datahi_reg;
-    switch (opc & MO_SIZE) {
-    case MO_8:
-        argreg = tcg_out_arg_reg8(s, argreg, datalo);
-        break;
-    case MO_16:
-        argreg = tcg_out_arg_reg16(s, argreg, datalo);
-        break;
-    case MO_32:
-    default:
-        argreg = tcg_out_arg_reg32(s, argreg, datalo);
-        break;
-    case MO_64:
-        argreg = tcg_out_arg_reg64(s, argreg, datalo, datahi);
-        break;
-    }
-
-    argreg = tcg_out_arg_imm32(s, argreg, oi);
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
-
-    /* Tail-call to the helper, which will return to the fast path.  */
-    tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-}
 #endif /* SOFTMMU */
 
 static inline void tcg_out_qemu_ld_index(TCGContext *s, TCGMemOp opc,
@@ -1603,54 +1482,28 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc,
 
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
+    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
-    TCGMemOp opc;
-#ifdef CONFIG_SOFTMMU
-    int mem_index, avail;
-    TCGReg addend, t0, t1;
-    tcg_insn_unit *label_ptr;
-#endif
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     oi = *args++;
-    opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
-
-    avail = 0xf;
-    avail &= ~(1 << addrlo);
-    if (TARGET_LONG_BITS == 64) {
-        avail &= ~(1 << addrhi);
-    }
-    tcg_debug_assert(avail & 1);
-    t0 = TCG_REG_R0;
-    avail &= ~1;
-    tcg_debug_assert(avail != 0);
-    t1 = ctz32(avail);
-
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1,
-                              t0, t1, TCG_REG_TMP);
-
-    /* This a conditional BL only to load a pointer within this opcode into LR
-       for the slow path.  We will not be using the value for a tail call.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
-
-    tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
-
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
-#else /* !CONFIG_SOFTMMU */
+    add_ldst_ool_label(s, true, is64, oi, R_ARM_PC24, 0);
+    tcg_out_bl_noaddr(s, COND_AL);
+#else
     if (guest_base) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
-        tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, TCG_REG_TMP);
+        tcg_out_qemu_ld_index(s, get_memop(oi), datalo, datahi,
+                              addrlo, TCG_REG_TMP);
     } else {
-        tcg_out_qemu_ld_direct(s, opc, datalo, datahi, addrlo);
+        tcg_out_qemu_ld_direct(s, get_memop(oi), datalo, datahi, addrlo);
     }
 #endif
 }
@@ -1747,33 +1600,76 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc,
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
+    TCGReg addrlo __attribute__((unused));
+    TCGReg addrhi __attribute__((unused));
+    TCGReg datalo __attribute__((unused));
+    TCGReg datahi __attribute__((unused));
     TCGMemOpIdx oi;
-    TCGMemOp opc;
-#ifdef CONFIG_SOFTMMU
-    int mem_index, avail;
-    TCGReg addend, t0, t1;
-    tcg_insn_unit *label_ptr;
-#endif
 
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
     addrlo = *args++;
     addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     oi = *args++;
-    opc = get_memop(oi);
 
 #ifdef CONFIG_SOFTMMU
-    mem_index = get_mmuidx(oi);
+    add_ldst_ool_label(s, false, is64, oi, R_ARM_PC24, 0);
+    tcg_out_bl_noaddr(s, COND_AL);
+#else
+    if (guest_base) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
+        tcg_out_qemu_st_index(s, COND_AL, get_memop(oi), datalo,
+                              datahi, addrlo, TCG_REG_TMP);
+    } else {
+        tcg_out_qemu_st_direct(s, get_memop(oi), datalo, datahi, addrlo);
+    }
+#endif
+}
 
+#ifdef CONFIG_SOFTMMU
+static tcg_insn_unit *tcg_out_qemu_ldst_ool(TCGContext *s, bool is_ld,
+                                            bool is_64, TCGMemOpIdx oi)
+{
+    TCGReg addrlo, addrhi, datalo, datahi, addend, argreg, t0, t1;
+    TCGMemOp opc = get_memop(oi);
+    int mem_index = get_mmuidx(oi);
+    tcg_insn_unit *thunk = s->code_ptr;
+    tcg_insn_unit *label;
+    uintptr_t func;
+    int avail;
+
+    /* Pick out where the arguments are located.  A 64-bit address is
+     * aligned in the register pair R2:R3.  Loads return into R0:R1.
+     * A 32-bit store with a 32-bit address has room at R2, but
+     * otherwise uses R4:R5.
+     */
+    if (TARGET_LONG_BITS == 64) {
+        addrlo = TCG_REG_R2, addrhi = TCG_REG_R3;
+    } else {
+        addrlo = TCG_REG_R1, addrhi = -1;
+    }
+    if (is_ld) {
+        datalo = TCG_REG_R0;
+    } else if (TARGET_LONG_BITS == 64 || is_64) {
+        datalo = TCG_REG_R4;
+    } else {
+        datalo = TCG_REG_R2;
+    }
+    datahi = (is_64 ? datalo + 1 : -1);
+
+    /* We need 3 call-clobbered temps.  One of them is always R12,
+     * one of them is always R0.  The third is somewhere in R[1-3].
+     */
     avail = 0xf;
     avail &= ~(1 << addrlo);
-    avail &= ~(1 << datalo);
     if (TARGET_LONG_BITS == 64) {
         avail &= ~(1 << addrhi);
     }
-    if (is64) {
-        avail &= ~(1 << datahi);
+    if (!is_ld) {
+        avail &= ~(1 << datalo);
+        if (is_64) {
+            avail &= ~(1 << datahi);
+        }
     }
     tcg_debug_assert(avail & 1);
     t0 = TCG_REG_R0;
@@ -1781,27 +1677,63 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_debug_assert(avail != 0);
     t1 = ctz32(avail);
 
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0,
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, is_ld,
                               t0, t1, TCG_REG_TMP);
 
-    tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
+    label = s->code_ptr;
+    tcg_out_b_noaddr(s, COND_NE);
 
-    /* The conditional call must come last, as we're going to return here.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
-
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
-#else /* !CONFIG_SOFTMMU */
-    if (guest_base) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, guest_base);
-        tcg_out_qemu_st_index(s, COND_AL, opc, datalo,
-                              datahi, addrlo, TCG_REG_TMP);
+    /* TLB Hit.  */
+    if (is_ld) {
+        tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
     } else {
-        tcg_out_qemu_st_direct(s, opc, datalo, datahi, addrlo);
+        tcg_out_qemu_st_index(s, COND_AL, opc, datalo, datahi, addrlo, addend);
     }
-#endif
+    tcg_out_bx(s, COND_AL, TCG_REG_R14);
+
+    /* TLB Miss.  */
+    reloc_pc24(label, s->code_ptr);
+
+    tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
+    /* addrlo and addrhi are in place -- see above */
+    argreg = addrlo + (TARGET_LONG_BITS / 32);
+    if (!is_ld) {
+        switch (opc & MO_SIZE) {
+        case MO_8:
+            argreg = tcg_out_arg_reg8(s, argreg, datalo);
+            break;
+        case MO_16:
+            argreg = tcg_out_arg_reg16(s, argreg, datalo);
+            break;
+        case MO_32:
+            argreg = tcg_out_arg_reg32(s, argreg, datalo);
+            break;
+        case MO_64:
+            argreg = tcg_out_arg_reg64(s, argreg, datalo, datahi);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+    argreg = tcg_out_arg_imm32(s, argreg, oi);
+    argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
+
+    /* Tail call to the helper.  */
+    if (is_ld) {
+        func = (uintptr_t)qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
+    } else {
+        func = (uintptr_t)qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)];
+    }
+    if (use_armv7_instructions) {
+        tcg_out_movi32(s, COND_AL, TCG_REG_TMP, func);
+        tcg_out_bx(s, COND_AL, TCG_REG_TMP);
+    } else {
+        tcg_out_movi_pool(s, COND_AL, TCG_REG_PC, func);
+    }
+
+    return thunk;
 }
+#endif
 
 static tcg_insn_unit *tb_ret_addr;
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (16 preceding siblings ...)
  2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 17/17] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
@ 2018-11-13  9:00 ` no-reply
  2018-11-14  1:00 ` Emilio G. Cota
  18 siblings, 0 replies; 30+ messages in thread
From: no-reply @ 2018-11-13  9:00 UTC (permalink / raw)
  To: richard.henderson; +Cc: famz, qemu-devel, cota

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20181112214503.22941-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20181113083104.2692-1-aik@ozlabs.ru -> patchew/20181113083104.2692-1-aik@ozlabs.ru
Switched to a new branch 'test'
e69b196b86 tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS
6fd30d145e tcg/arm: Force qemu_ld/st arguments into fixed registers
bb1d7b7a2a tcg/arm: Reduce the number of temps for tcg_out_tlb_read
58ad5e1707 tcg/arm: Add constraints for R0-R5
0ae2a6fe31 tcg/arm: Parameterize the temps for tcg_out_tlb_read
d9cc700ca9 tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
81d088cf09 tcg/aarch64: Use B not BL for tcg_out_goto_long
9b4254898d tcg/aarch64: Parameterize the temp for tcg_out_goto_long
88f66038f2 tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
70870da04b tcg/aarch64: Add constraints for x0, x1, x2
c13bc30fe2 tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
f89e8cc968 tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
61aa1913f6 tcg: Return success from patch_reloc
299cd2f2dc tcg/i386: Force qemu_ld/st arguments into fixed registers
29999f5f40 tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
8c6efd075f tcg/i386: Return a base register from tcg_out_tlb_load
8c82ba4a0e tcg/i386: Add constraints for r8 and r9

=== OUTPUT BEGIN ===
Checking PATCH 1/17: tcg/i386: Add constraints for r8 and r9...
Checking PATCH 2/17: tcg/i386: Return a base register from tcg_out_tlb_load...
Checking PATCH 3/17: tcg/i386: Change TCG_REG_L[01] to not overlap function arguments...
Checking PATCH 4/17: tcg/i386: Force qemu_ld/st arguments into fixed registers...
Checking PATCH 5/17: tcg: Return success from patch_reloc...
Checking PATCH 6/17: tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#13: 
new file mode 100644

total: 0 errors, 1 warnings, 148 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 7/17: tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
Checking PATCH 8/17: tcg/aarch64: Add constraints for x0, x1, x2...
Checking PATCH 9/17: tcg/aarch64: Parameterize the temps for tcg_out_tlb_read...
Checking PATCH 10/17: tcg/aarch64: Parameterize the temp for tcg_out_goto_long...
Checking PATCH 11/17: tcg/aarch64: Use B not BL for tcg_out_goto_long...
Checking PATCH 12/17: tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
Checking PATCH 13/17: tcg/arm: Parameterize the temps for tcg_out_tlb_read...
Checking PATCH 14/17: tcg/arm: Add constraints for R0-R5...
Checking PATCH 15/17: tcg/arm: Reduce the number of temps for tcg_out_tlb_read...
Checking PATCH 16/17: tcg/arm: Force qemu_ld/st arguments into fixed registers...
Checking PATCH 17/17: tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS...
ERROR: externs should be avoided in .c files
#168: FILE: tcg/arm/tcg-target.inc.c:1485:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#169: FILE: tcg/arm/tcg-target.inc.c:1486:
+    TCGReg addrhi __attribute__((unused));

ERROR: externs should be avoided in .c files
#170: FILE: tcg/arm/tcg-target.inc.c:1487:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#171: FILE: tcg/arm/tcg-target.inc.c:1488:
+    TCGReg datahi __attribute__((unused));

ERROR: externs should be avoided in .c files
#233: FILE: tcg/arm/tcg-target.inc.c:1603:
+    TCGReg addrlo __attribute__((unused));

ERROR: externs should be avoided in .c files
#234: FILE: tcg/arm/tcg-target.inc.c:1604:
+    TCGReg addrhi __attribute__((unused));

ERROR: externs should be avoided in .c files
#235: FILE: tcg/arm/tcg-target.inc.c:1605:
+    TCGReg datalo __attribute__((unused));

ERROR: externs should be avoided in .c files
#236: FILE: tcg/arm/tcg-target.inc.c:1606:
+    TCGReg datahi __attribute__((unused));

total: 8 errors, 0 warnings, 373 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
                   ` (17 preceding siblings ...)
  2018-11-13  9:00 ` [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line no-reply
@ 2018-11-14  1:00 ` Emilio G. Cota
  2018-11-15 11:32   ` Richard Henderson
  18 siblings, 1 reply; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-14  1:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Nov 12, 2018 at 22:44:46 +0100, Richard Henderson wrote:
> Based on an idea forwarded by Emilio, which suggests a 5-6%
> speed gain is possible.  I have not spent too much time
> measuring this, as the code size gains are significant.

Nice!

> I believe that I posted an x86_64-only patch some time ago,
> but this now includes i386, aarch64 and arm32.  In late
> testing I do see some failures on i386, for sparc guest.  I'll
> follow up on that later.

The following might be related: I'm seeing segfaults with -smp 8
and beyond when doing bootup+shutdown of an aarch64 guest on
an x86-64 host. -smp 1 is stable AFAICT. The first commit that
shows these crashes is "tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS",
that is f7ec49a51c8 in your tcg-tlb-x86 github branch.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-14  1:00 ` Emilio G. Cota
@ 2018-11-15 11:32   ` Richard Henderson
  2018-11-15 18:48     ` Emilio G. Cota
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-11-15 11:32 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 11/14/18 2:00 AM, Emilio G. Cota wrote:
> The following might be related: I'm seeing segfaults with -smp 8
> and beyond when doing bootup+shutdown of an aarch64 guest on
> an x86-64 host.

I'm not seeing that.  Anything else special on the command-line?
Are the segv in the code_gen_buffer or elsewhere?


r~


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-15 11:32   ` Richard Henderson
@ 2018-11-15 18:48     ` Emilio G. Cota
  2018-11-15 18:54       ` Richard Henderson
  2018-11-15 22:04       ` Richard Henderson
  0 siblings, 2 replies; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-15 18:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Nov 15, 2018 at 12:32:00 +0100, Richard Henderson wrote:
> On 11/14/18 2:00 AM, Emilio G. Cota wrote:
> > The following might be related: I'm seeing segfaults with -smp 8
> > and beyond when doing bootup+shutdown of an aarch64 guest on
> > an x86-64 host.
> 
> I'm not seeing that.  Anything else special on the command-line?
> Are the segv in the code_gen_buffer or elsewhere?

I just spent some time on this. I've noticed two issues:

- All TCG contexts end up using the same hash table, since
  we only allocate one table in tcg_context_init. This leads
  to memory corruption.
  This fixes it (confirmed that there aren't races with helgrind):

--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -763,6 +763,14 @@ void tcg_register_thread(void)
     err = tcg_region_initial_alloc__locked(tcg_ctx);
     g_assert(!err);
     qemu_mutex_unlock(&region.lock);
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    /* if n == 0, keep the hash table we allocated in tcg_context_init */
+    if (n) {
+        /* Both key and value are raw pointers.  */
+        s->ldst_ool_thunks = g_hash_table_new(NULL, NULL);
+    }
+#endif
 }
 #endif /* !CONFIG_USER_ONLY */

- Segfault in code_gen_buffer. This one I don't have a fix for,
  but it's *much* easier to reproduce when -tb-size is very small,
  e.g. "-tb-size 5 -smp 2" (BTW it crashes with x86_64 guests too.)
  So at first I thought the code cache flushing was the problem,
  but I don't see how that could be, at least from a TCGContext
  viewpoint -- I agree that clearing the hash table in
  tcg_region_assign is a good place to do so.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-15 18:48     ` Emilio G. Cota
@ 2018-11-15 18:54       ` Richard Henderson
  2018-11-15 22:04       ` Richard Henderson
  1 sibling, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-11-15 18:54 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 11/15/18 7:48 PM, Emilio G. Cota wrote:
> - All TCG contexts end up using the same hash table, since
>   we only allocate one table in tcg_context_init. This leads
>   to memory corruption.

Bah.  I forgot we only call tcg_context_init once.
Thanks for the fix.

In the meantime I've fixed the i386 problem.  A bit of silliness wrt 64-bit stores.


r~


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-15 18:48     ` Emilio G. Cota
  2018-11-15 18:54       ` Richard Henderson
@ 2018-11-15 22:04       ` Richard Henderson
  2018-11-16  1:13         ` Emilio G. Cota
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-11-15 22:04 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 11/15/18 7:48 PM, Emilio G. Cota wrote:
> - Segfault in code_gen_buffer. This one I don't have a fix for,
>   but it's *much* easier to reproduce when -tb-size is very small,
>   e.g. "-tb-size 5 -smp 2" (BTW it crashes with x86_64 guests too.)
>   So at first I thought the code cache flushing was the problem,
>   but I don't see how that could be, at least from a TCGContext
>   viewpoint -- I agree that clearing the hash table in
>   tcg_region_assign is a good place to do so.

Ho hum.

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 639f0b2728..115ea186e5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1831,10 +1831,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     existing_tb = tb_link_page(tb, phys_pc, phys_page2);
     /* if the TB already exists, discard what we just translated */
     if (unlikely(existing_tb != tb)) {
-        uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
-
-        orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
-        atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
         return existing_tb;
     }
     tcg_tb_insert(tb);

We can't easily undo the hash table insert, and for a relatively rare
occurrence it's not worth the effort.


r~


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-15 22:04       ` Richard Henderson
@ 2018-11-16  1:13         ` Emilio G. Cota
  2018-11-16  5:10           ` Emilio G. Cota
  2018-11-16  8:10           ` Richard Henderson
  0 siblings, 2 replies; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-16  1:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Nov 15, 2018 at 23:04:50 +0100, Richard Henderson wrote:
> On 11/15/18 7:48 PM, Emilio G. Cota wrote:
> > - Segfault in code_gen_buffer. This one I don't have a fix for,
> >   but it's *much* easier to reproduce when -tb-size is very small,
> >   e.g. "-tb-size 5 -smp 2" (BTW it crashes with x86_64 guests too.)
> >   So at first I thought the code cache flushing was the problem,
> >   but I don't see how that could be, at least from a TCGContext
> >   viewpoint -- I agree that clearing the hash table in
> >   tcg_region_assign is a good place to do so.
> 
> Ho hum.
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 639f0b2728..115ea186e5 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1831,10 +1831,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      existing_tb = tb_link_page(tb, phys_pc, phys_page2);
>      /* if the TB already exists, discard what we just translated */
>      if (unlikely(existing_tb != tb)) {
> -        uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
> -
> -        orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
> -        atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
>          return existing_tb;
>      }
>      tcg_tb_insert(tb);
> 
> We can't easily undo the hash table insert, and for a relatively rare
> occurrence it's not worth the effort.

Nice catch! Everything works now =D

In the bootup+shutdown aarch64 test with -smp 12, we end up
discarding ~2500 TBs -- that's ~439K of space for code that we
do not waste; note that I'm assuming 180 host bytes per TB,
which is the average reported by info jit.

We can still discard most of these by increasing a counter every
time we insert a new element into the OOL table, and checking
this counter before/after tcg_gen_code. (Note that checking
g_hash_table_size before/after is not enough, because we might
have replaced an existing item from the table.)
Then, we discard a TB iff an OOL thunk was generated. (Diff below.)

This allows us to discard most TBs; in the example above,
we end up *not* discarding only ~70 TBs, that is we end up keeping
only 70/2500 = 2.8% of the TBs that we'd discard without OOL.

Performance-wise it doesn't make a difference for -smp 1:

Host: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (5 runs):

- Before (3.1.0-rc1):

      14351.436177      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.24% )
    49,963,260,126      cycles                    #    3.481 GHz                      ( +-  0.22% )  (83.32%)
    26,047,650,654      stalled-cycles-frontend   #   52.13% frontend cycles idle     ( +-  0.29% )  (83.34%)
    19,717,480,482      stalled-cycles-backend    #   39.46% backend  cycles idle     ( +-  0.27% )  (66.67%)
    59,278,011,067      instructions              #    1.19  insns per cycle        
                                                  #    0.44  stalled cycles per insn  ( +-  0.17% )  (83.34%)
    10,632,601,608      branches                  #  740.874 M/sec                    ( +-  0.17% )  (83.34%)
       236,153,469      branch-misses             #    2.22% of all branches          ( +-  0.16% )  (83.35%)

      14.382847823 seconds time elapsed                                          ( +-  0.25% )

- After this series (with the fixes we've discussed):

      13256.198927      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.04% )
    46,146,457,353      cycles                    #    3.481 GHz                      ( +-  0.08% )  (83.34%)
    22,632,342,565      stalled-cycles-frontend   #   49.04% frontend cycles idle     ( +-  0.12% )  (83.35%)
    16,534,690,741      stalled-cycles-backend    #   35.83% backend  cycles idle     ( +-  0.15% )  (66.67%)
    58,047,832,548      instructions              #    1.26  insns per cycle        
                                                  #    0.39  stalled cycles per insn  ( +-  0.18% )  (83.34%)
    11,031,634,880      branches                  #  832.187 M/sec                    ( +-  0.12% )  (83.33%)
       210,593,929      branch-misses             #    1.91% of all branches          ( +-  0.30% )  (83.33%)

      13.285023783 seconds time elapsed                                          ( +-  0.05% )

- After the fixup below:

      13240.889734      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.19% )
    46,074,292,775      cycles                    #    3.480 GHz                      ( +-  0.12% )  (83.35%)
    22,670,132,770      stalled-cycles-frontend   #   49.20% frontend cycles idle     ( +-  0.17% )  (83.35%)
    16,598,822,504      stalled-cycles-backend    #   36.03% backend  cycles idle     ( +-  0.26% )  (66.66%)
    57,796,083,344      instructions              #    1.25  insns per cycle        
                                                  #    0.39  stalled cycles per insn  ( +-  0.16% )  (83.34%)
    11,002,340,174      branches                  #  830.937 M/sec                    ( +-  0.11% )  (83.35%)
       211,023,549      branch-misses             #    1.92% of all branches          ( +-  0.22% )  (83.32%)

      13.264499034 seconds time elapsed                                          ( +-  0.19% )

I'll generate now some more perf numbers that we could include in the
commit logs.

Thanks,

		Emilio

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 115ea18..15f7d4e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1678,6 +1678,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size;
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    size_t n_ool_thunks;
+#endif
 #ifdef CONFIG_PROFILER
     TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti;
@@ -1744,6 +1747,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     ti = profile_getclock();
 #endif
 
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+    n_ool_thunks = tcg_ctx->n_ool_thunks;
+#endif
+
     /* ??? Overflow could be handled better here.  In particular, we
        don't need to re-do gen_intermediate_code, nor should we re-do
        the tcg optimization currently hidden inside tcg_gen_code.  All
@@ -1831,6 +1838,18 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     existing_tb = tb_link_page(tb, phys_pc, phys_page2);
     /* if the TB already exists, discard what we just translated */
     if (unlikely(existing_tb != tb)) {
+        bool discard = true;
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+        /* only discard the TB if we didn't generate an OOL thunk */
+        discard = tcg_ctx->n_ool_thunks == n_ool_thunks;
+#endif
+        if (discard) {
+            uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
+
+            orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
+            atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
+        }
         return existing_tb;
     }
     tcg_tb_insert(tb);
diff --git a/tcg/tcg-ldst-ool.inc.c b/tcg/tcg-ldst-ool.inc.c
index 8fb6550..61da060 100644
--- a/tcg/tcg-ldst-ool.inc.c
+++ b/tcg/tcg-ldst-ool.inc.c
@@ -69,6 +69,7 @@ static bool tcg_out_ldst_ool_finalize(TCGContext *s)
 
         /* Remember the thunk for next time.  */
         g_hash_table_replace(s->ldst_ool_thunks, key, dest);
+        s->n_ool_thunks++;
 
         /* The new thunk must be in range.  */
         ok = patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1255d2a..d4f07a6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -709,6 +709,7 @@ struct TCGContext {
 #ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
     QSIMPLEQ_HEAD(ldst_labels, TCGLabelQemuLdstOol) ldst_ool_labels;
     GHashTable *ldst_ool_thunks;
+    size_t n_ool_thunks;
 #endif
 #ifdef TCG_TARGET_NEED_POOL_LABELS
     struct TCGLabelPoolData *pool_labels;


* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-16  1:13         ` Emilio G. Cota
@ 2018-11-16  5:10           ` Emilio G. Cota
  2018-11-16  8:07             ` Richard Henderson
  2018-11-16  8:10           ` Richard Henderson
  1 sibling, 1 reply; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-16  5:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Nov 15, 2018 at 20:13:38 -0500, Emilio G. Cota wrote:
> I'll generate now some more perf numbers that we could include in the
> commit logs.

SPEC numbers are a net perf decrease, unfortunately:

                     Softmmu speedup for SPEC06int (test set)
   1.1 +-+--+----+----+----+----+----+----+---+----+----+----+----+----+--+-+
       |                                                                    |
       |                                                      aft+++        |
  1.05 +-+........................................................|.......+-+
       |                                               +++        |         |
       |                       +++                      |         |         |
       |   +++                  |                       |         |         |
     1 +-++++++++++++++++****++++++++++++++++++++++++++++++++++++***+++++++-+
       |    |         |  *  * ****                     ****      *|*        |
       |   ***  +++   |  *  * * |*       +++           *| *      *|*        |
  0.95 +-+.*|*..***...|..*..*.*.|*..+++...|............*|.*.+++..*|*..+++.+-+
       |   *|*  *+*  *** *  * * |*   |    |  +++       *| * ***  *|*  ***   |
       |   *+*  * *  *|* *  * *++*   |  ****  |        *| * *+*  *|*  *+*   |
       |   * *  * *  *|* *  * *  * **** * |* ****      *++* * *  *+*  * *   |
   0.9 +-+.*.*..*.*..*+*.*..*.*..*.*.|*.*.|*.*|.*......*..*.*.*..*.*..*.*.+-+
       |   * *  * *  * * *  * *  * *++* *++* *++* +++  *  * * *  * *  * *   |
       |   * *  * *  * * *  * *  * *  * *  * *  *  |   *  * * *  * *  * *   |
  0.85 +-+.*.*..*.*..*.*.*..*.*..*.*..*.*..*.*..*..|...*..*.*.*..*.*..*.*.+-+
       |   * *  * *  * * *  * *  * *  * *  * *  *  |   *  * * *  * *  * *   |
       |   * *  * *  * * *  * *  * *  * *  * *  * **** *  * * *  * *  * *   |
       |   * *  * *  * * *  * *  * *  * *  * *  * *| * *  * * *  * *  * *   |
   0.8 +-+.*.*..*.*..*.*.*..*.*..*.*..*.*..*.*..*.*|.*.*..*.*.*..*.*..*.*.+-+
       |   * *  * *  * * *  * *  * *  * *  * *  * *| * *  * * *  * *  * *   |
       |   * *  * *  * * *  * *  * *  * *  * *  * *++* *  * * *  * *  * *   |
  0.75 +-+-***--***--***-****-****-****-****-****-****-****-***--***--***-+-+
        401.bzi403.g429445.g456.462.libq464.h471.omn4483.xalancbgeomean
  png: https://imgur.com/aO39gyP

Turns out that the additional instructions are the problem,
despite the much lower icache miss rate. For instance, here
are some numbers for h264ref running on the not-so-recent
Xeon E5-2643 (i.e. Sandy Bridge):

- Before:
 1,137,737,512,668      instructions              #    2.02  insns per cycle
   563,574,505,040      cycles
     5,663,616,681      L1-icache-load-misses
     164.091239774 seconds time elapsed

- After:
 1,216,600,582,476      instructions              #    2.06  insns per cycle
   591,888,969,223      cycles
     3,082,426,508      L1-icache-load-misses

     172.232292897 seconds time elapsed

It's possible that newer machines with larger reorder buffers
will be able to take better advantage of the higher instruction
locality, hiding the latency of having to execute more instructions.
I'll test on Skylake tomorrow.

Thanks,

		E.

* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-16  5:10           ` Emilio G. Cota
@ 2018-11-16  8:07             ` Richard Henderson
  2018-11-16 15:07               ` Emilio G. Cota
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-11-16  8:07 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 11/16/18 6:10 AM, Emilio G. Cota wrote:
> It's possible that newer machines with larger reorder buffers
> will be able to take better advantage of the higher instruction
> locality, hiding the latency of having to execute more instructions.
> I'll test on Skylake tomorrow.

I've noticed that the code we generate for calls has twice as many instructions
as really needed for setting up the arguments.  I have a plan to fix that,
which hopefully will solve this problem.


r~

* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-16  1:13         ` Emilio G. Cota
  2018-11-16  5:10           ` Emilio G. Cota
@ 2018-11-16  8:10           ` Richard Henderson
  2018-11-16 15:10             ` Emilio G. Cota
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-11-16  8:10 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 11/16/18 2:13 AM, Emilio G. Cota wrote:
> This allows us to discard most TBs; in the example above, only ~70 TBs
> end up *not* being discarded, i.e. we keep just 70/2500 = 2.8% of the
> TBs that we'd otherwise discard without OOL.

Thanks.

When I apply this I think I'll rename "n_ool_thunks" to "ool_generation".  We
don't care about the absolute count, only whether it has changed.


r~

* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-16  8:07             ` Richard Henderson
@ 2018-11-16 15:07               ` Emilio G. Cota
  0 siblings, 0 replies; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-16 15:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Nov 16, 2018 at 09:07:50 +0100, Richard Henderson wrote:
> On 11/16/18 6:10 AM, Emilio G. Cota wrote:
> > It's possible that newer machines with larger reorder buffers
> > will be able to take better advantage of the higher instruction
> > locality, hiding the latency of having to execute more instructions.
> > I'll test on Skylake tomorrow.
> 
> I've noticed that the code we generate for calls has twice as many instructions
> as really needed for setting up the arguments.  I have a plan to fix that,
> which hopefully will solve this problem.

Ah that's great. I'll do more tests and a full review when those
changes come out, then.

Thanks,

		E.

* Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
  2018-11-16  8:10           ` Richard Henderson
@ 2018-11-16 15:10             ` Emilio G. Cota
  0 siblings, 0 replies; 30+ messages in thread
From: Emilio G. Cota @ 2018-11-16 15:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Nov 16, 2018 at 09:10:32 +0100, Richard Henderson wrote:
> On 11/16/18 2:13 AM, Emilio G. Cota wrote:
> > This allows us to discard most TBs; in the example above, only ~70 TBs
> > end up *not* being discarded, i.e. we keep just 70/2500 = 2.8% of the
> > TBs that we'd otherwise discard without OOL.
> 
> Thanks.
> 
> When I apply this I think I'll rename "n_ool_thunks" to "ool_generation".  We
> don't care about the absolute count, only whether it has changed.

Agreed, that's much better.

Thanks,

		E.

2018-11-12 21:44 [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 01/17] tcg/i386: Add constraints for r8 and r9 Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 02/17] tcg/i386: Return a base register from tcg_out_tlb_load Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 03/17] tcg/i386: Change TCG_REG_L[01] to not overlap function arguments Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 04/17] tcg/i386: Force qemu_ld/st arguments into fixed registers Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 05/17] tcg: Return success from patch_reloc Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 06/17] tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 07/17] tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 08/17] tcg/aarch64: Add constraints for x0, x1, x2 Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 09/17] tcg/aarch64: Parameterize the temps for tcg_out_tlb_read Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 10/17] tcg/aarch64: Parameterize the temp for tcg_out_goto_long Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 11/17] tcg/aarch64: Use B not BL " Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 12/17] tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-12 21:44 ` [Qemu-devel] [PATCH for-4.0 13/17] tcg/arm: Parameterize the temps for tcg_out_tlb_read Richard Henderson
2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 14/17] tcg/arm: Add constraints for R0-R5 Richard Henderson
2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 15/17] tcg/arm: Reduce the number of temps for tcg_out_tlb_read Richard Henderson
2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 16/17] tcg/arm: Force qemu_ld/st arguments into fixed registers Richard Henderson
2018-11-12 21:45 ` [Qemu-devel] [PATCH for-4.0 17/17] tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS Richard Henderson
2018-11-13  9:00 ` [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line no-reply
2018-11-14  1:00 ` Emilio G. Cota
2018-11-15 11:32   ` Richard Henderson
2018-11-15 18:48     ` Emilio G. Cota
2018-11-15 18:54       ` Richard Henderson
2018-11-15 22:04       ` Richard Henderson
2018-11-16  1:13         ` Emilio G. Cota
2018-11-16  5:10           ` Emilio G. Cota
2018-11-16  8:07             ` Richard Henderson
2018-11-16 15:07               ` Emilio G. Cota
2018-11-16  8:10           ` Richard Henderson
2018-11-16 15:10             ` Emilio G. Cota
