* [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools
@ 2017-08-04  5:44 Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 01/23] tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Richard Henderson
                   ` (22 more replies)
  0 siblings, 23 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC
  To: qemu-devel

RISC machines often require many instructions in order to construct large
constants from the immediate values available to individual instructions.
Static compilers like GCC often place these large constants into read-only
memory and use one load instruction to fetch the constant instead; a
collection of these is known as a "constant pool".

TCG currently generates all constants from immediate values.  This can
require 4 insns for a full 64-bit value for AArch64, 4 insns for a
full 32-bit value for AArch32 v6.  s390x z9 needs 4, ppc64 and sparc64
need 5, mips64 needs 6.

Moreover, entries in the constant pool may be used more than once.  For
instance, if there are 3 consecutive guest loads, then we can enter the
host address of helper_le_ldul_mmu into the constant pool once for the 3
call invocations.  Depending on the host memory map, the result may be a
savings of (4*3*4) - (1*3*4+1*8) = 28 bytes.
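
As a rough check of the arithmetic above, the sketch below computes both
sizes.  It assumes 4-byte instructions, 4 instructions to build the constant
inline, one load per use from the pool, and one 8-byte pool entry.  The
function names are made up for illustration and are not part of TCG.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed when every use builds the constant inline from immediates. */
static size_t inline_cost(size_t uses, size_t insns_per_const,
                          size_t insn_bytes)
{
    return uses * insns_per_const * insn_bytes;
}

/* Bytes needed when every use is one load and the constant is stored once. */
static size_t pooled_cost(size_t uses, size_t insn_bytes, size_t entry_bytes)
{
    return uses * insn_bytes + entry_bytes;
}

/* Net saving in bytes; a negative result means the pool is larger. */
static long pool_savings(size_t uses, size_t insns_per_const,
                         size_t insn_bytes, size_t entry_bytes)
{
    return (long)inline_cost(uses, insns_per_const, insn_bytes)
         - (long)pooled_cost(uses, insn_bytes, entry_bytes);
}
```

Calling pool_savings(3, 4, 4, 8) reproduces the 48 - 20 = 28 bytes quoted
above.  The model ignores any extra instructions a host might need in order
to reach the pool.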

This last is even true for the x86_64 host, where

    movq $helper_le_ldul_mmu, %rax; call *%rax

costs 10+2 bytes, but

    call *label(%rip); .quad helper_le_ldul_mmu

costs 6+8 bytes, plus the ability to share the 8 bytes for the entry.
Two calls sharing one entry take 6+6+8 = 20 bytes instead of 24.
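
The sharing of entries can be pictured with a toy pool that hands back the
same slot when a value is entered twice.  This is only a sketch of the idea,
not the tcg-pool.inc.c code from this series.  ConstPool, pool_intern,
pool_bytes and the fixed capacity are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX 16   /* arbitrary capacity for the sketch */

typedef struct {
    uint64_t entry[POOL_MAX];
    size_t count;
} ConstPool;

/* Return the index of value in the pool, appending it only if absent.
 * Returns -1 when the pool is full. */
static int pool_intern(ConstPool *p, uint64_t value)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->entry[i] == value) {
            return (int)i;      /* reuse the existing entry */
        }
    }
    if (p->count == POOL_MAX) {
        return -1;
    }
    p->entry[p->count] = value;
    return (int)p->count++;
}

/* Bytes of read-only data the pool occupies. */
static size_t pool_bytes(const ConstPool *p)
{
    return p->count * sizeof(p->entry[0]);
}
```

Entering the same helper address three times leaves the pool at a single
8-byte entry, so only the per-call instruction is paid again.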


r~


Richard Henderson (23):
  tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
  tcg: Rearrange ldst label tracking
  tcg: Infrastructure for managing constant pools
  tcg/i386: Store out-of-range call targets in constant pool
  tcg/s390: Introduce TCG_REG_TB
  tcg/s390: Fix sign of patch_reloc addend
  tcg/s390: Use constant pool for movi
  tcg/s390: Use constant pool for andi
  tcg/s390: Use constant pool for ori
  tcg/s390: Use constant pool for xori
  tcg/s390: Use constant pool for cmpi
  tcg/aarch64: Use constant pool for movi
  tcg/sparc: Introduce TCG_REG_TB
  tcg/sparc: Use constant pool for movi
  tcg/arm: Improve tlb load for armv7
  tcg/arm: Tighten tlb indexing offset test
  tcg/arm: Code rearrangement
  tcg/arm: Extract INSN_NOP
  tcg/arm: Use constant pool for movi
  tcg/arm: Use constant pool for call
  tcg/ppc: Change TCG_REG_RA to TCG_REG_TB
  tcg/ppc: Look for shifted constants
  tcg/ppc: Use constant pool for movi

 include/elf.h                         |   3 +-
 include/exec/exec-all.h               |  95 +----
 tcg/aarch64/tcg-target.h              |   8 +
 tcg/arm/tcg-target.h                  |   9 +
 tcg/i386/tcg-target.h                 |  14 +
 tcg/ia64/tcg-target.h                 |   8 +
 tcg/mips/tcg-target.h                 |   7 +
 tcg/ppc/tcg-target.h                  |   7 +
 tcg/s390/tcg-target.h                 |  15 +
 tcg/sparc/tcg-target.h                |   5 +
 tcg/tcg-be-null.h                     |  44 --
 tcg/tcg.h                             |  14 +-
 tcg/tci/tcg-target.h                  |   9 +
 accel/tcg/cpu-exec.c                  |  35 ++
 accel/tcg/translate-all.c             |  36 +-
 tcg/aarch64/tcg-target.inc.c          |  78 ++--
 tcg/arm/tcg-target.inc.c              | 780 +++++++++++++++++++---------------
 tcg/i386/tcg-target.inc.c             |  20 +-
 tcg/ia64/tcg-target.inc.c             |  19 +-
 tcg/mips/tcg-target.inc.c             |   7 +-
 tcg/ppc/tcg-target.inc.c              | 320 +++++++-------
 tcg/s390/tcg-target.inc.c             | 527 +++++++++++++----------
 tcg/sparc/tcg-target.inc.c            | 240 ++++++++---
 tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} |  27 +-
 tcg/tcg-pool.inc.c                    |  85 ++++
 tcg/tcg.c                             |  26 +-
 tcg/tci/tcg-target.inc.c              |   2 -
 27 files changed, 1422 insertions(+), 1018 deletions(-)
 delete mode 100644 tcg/tcg-be-null.h
 rename tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} (85%)
 create mode 100644 tcg/tcg-pool.inc.c

-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 01/23] tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking Richard Henderson
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC
  To: qemu-devel

Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test.  Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.

This opens the possibility for TCG_TARGET_HAS_direct_jump to be
a runtime decision -- based on host cpu capabilities, the size of
code_gen_buffer, or a future debugging switch.
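
A minimal sketch of the resulting shape, assuming an x86-style rel32 patch.
ToyTB, toy_patch_rel32 and toy_set_jmp_target are hypothetical names.  The
real code stores with atomic_set rather than memcpy and takes the boolean
from TCG_TARGET_HAS_direct_jump rather than from a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TranslationBlock; only the fields used here. */
typedef struct {
    uint8_t code[32];            /* stand-in for generated code at tc_ptr */
    uintptr_t jmp_target_arg[2]; /* offset (direct) or address (indirect) */
} ToyTB;

/* x86-style patch: the rel32 displacement is measured from the end of the
 * 4-byte field.  Not atomic, so not thread-safe; illustration only. */
static void toy_patch_rel32(uintptr_t jmp_addr, uintptr_t addr)
{
    int32_t disp = (int32_t)(addr - (jmp_addr + 4));
    memcpy((void *)jmp_addr, &disp, sizeof(disp));
}

/* One field serves both modes; an ordinary boolean test picks the mode. */
static void toy_set_jmp_target(ToyTB *tb, int n, uintptr_t addr,
                               bool direct_jump)
{
    if (direct_jump) {
        uintptr_t tc_ptr = (uintptr_t)tb->code;
        toy_patch_rel32(tc_ptr + tb->jmp_target_arg[n], addr);
    } else {
        tb->jmp_target_arg[n] = addr;
    }
}
```

Because the test is an ordinary C expression, it could later read a variable
set at startup, which is the run-time possibility mentioned above.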

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h      | 95 ++------------------------------------------
 tcg/aarch64/tcg-target.h     |  3 ++
 tcg/arm/tcg-target.h         |  4 ++
 tcg/i386/tcg-target.h        |  9 +++++
 tcg/ia64/tcg-target.h        |  4 ++
 tcg/mips/tcg-target.h        |  3 ++
 tcg/ppc/tcg-target.h         |  2 +
 tcg/s390/tcg-target.h        | 10 +++++
 tcg/sparc/tcg-target.h       |  3 ++
 tcg/tcg.h                    |  4 +-
 tcg/tci/tcg-target.h         |  9 +++++
 accel/tcg/cpu-exec.c         | 35 ++++++++++++++++
 accel/tcg/translate-all.c    | 14 +++----
 tcg/aarch64/tcg-target.inc.c | 13 +++---
 tcg/mips/tcg-target.inc.c    |  3 +-
 tcg/ppc/tcg-target.inc.c     |  6 ++-
 tcg/sparc/tcg-target.inc.c   |  3 +-
 17 files changed, 107 insertions(+), 113 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 440fc31b37..aff4d33e3c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -330,15 +330,6 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 #define CODE_GEN_AVG_BLOCK_SIZE 150
 #endif
 
-#if defined(_ARCH_PPC) \
-    || defined(__x86_64__) || defined(__i386__) \
-    || defined(__sparc__) || defined(__aarch64__) \
-    || defined(__s390x__) || defined(__mips__) \
-    || defined(CONFIG_TCG_INTERPRETER)
-/* NOTE: Direct jump patching must be atomic to be thread-safe. */
-#define USE_DIRECT_JUMP
-#endif
-
 struct TranslationBlock {
     target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
     target_ulong cs_base; /* CS base for this block */
@@ -376,11 +367,8 @@ struct TranslationBlock {
      */
     uint16_t jmp_reset_offset[2]; /* offset of original jump target */
 #define TB_JMP_RESET_OFFSET_INVALID 0xffff /* indicates no jump generated */
-#ifdef USE_DIRECT_JUMP
-    uint16_t jmp_insn_offset[2]; /* offset of native jump instruction */
-#else
-    uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
-#endif
+    uintptr_t jmp_target_arg[2];  /* target address or offset */
+
     /* Each TB has an assosiated circular list of TBs jumping to this one.
      * jmp_list_first points to the first TB jumping to this one.
      * jmp_list_next is used to point to the next TB in a list.
@@ -402,84 +390,7 @@ void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
                                    target_ulong cs_base, uint32_t flags);
-
-#if defined(USE_DIRECT_JUMP)
-
-#if defined(CONFIG_TCG_INTERPRETER)
-static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
-{
-    /* patch the branch destination */
-    atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
-    /* no need to flush icache explicitly */
-}
-#elif defined(_ARCH_PPC)
-void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr);
-#define tb_set_jmp_target1 ppc_tb_set_jmp_target
-#elif defined(__i386__) || defined(__x86_64__)
-static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
-{
-    /* patch the branch destination */
-    atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
-    /* no need to flush icache explicitly */
-}
-#elif defined(__s390x__)
-static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
-{
-    /* patch the branch destination */
-    intptr_t disp = addr - (jmp_addr - 2);
-    atomic_set((int32_t *)jmp_addr, disp / 2);
-    /* no need to flush icache explicitly */
-}
-#elif defined(__aarch64__)
-void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr);
-#define tb_set_jmp_target1 aarch64_tb_set_jmp_target
-#elif defined(__sparc__) || defined(__mips__)
-void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
-#else
-#error tb_set_jmp_target1 is missing
-#endif
-
-static inline void tb_set_jmp_target(TranslationBlock *tb,
-                                     int n, uintptr_t addr)
-{
-    uint16_t offset = tb->jmp_insn_offset[n];
-    tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
-}
-
-#else
-
-/* set the jump target */
-static inline void tb_set_jmp_target(TranslationBlock *tb,
-                                     int n, uintptr_t addr)
-{
-    tb->jmp_target_addr[n] = addr;
-}
-
-#endif
-
-/* Called with tb_lock held.  */
-static inline void tb_add_jump(TranslationBlock *tb, int n,
-                               TranslationBlock *tb_next)
-{
-    assert(n < ARRAY_SIZE(tb->jmp_list_next));
-    if (tb->jmp_list_next[n]) {
-        /* Another thread has already done this while we were
-         * outside of the lock; nothing to do in this case */
-        return;
-    }
-    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
-                           "Linking TBs %p [" TARGET_FMT_lx
-                           "] index %d -> %p [" TARGET_FMT_lx "]\n",
-                           tb->tc_ptr, tb->pc, n,
-                           tb_next->tc_ptr, tb_next->pc);
-
-    /* patch the native jump address */
-    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
-
-    /* add in TB jmp circular list */
-    tb->jmp_list_next[n] = tb_next->jmp_list_first;
-    tb_next->jmp_list_first = (uintptr_t)tb | n;
-}
+void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 55a46ac825..3c3b1e603d 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -111,10 +111,13 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
+#define TCG_TARGET_HAS_direct_jump      1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
     __builtin___clear_cache((char *)start, (char *)stop);
 }
 
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+
 #endif /* AARCH64_TCG_TARGET_H */
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 5ef1086710..b836f7f127 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -124,6 +124,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
 #define TCG_TARGET_HAS_goto_ptr         1
+#define TCG_TARGET_HAS_direct_jump      0
 
 enum {
     TCG_AREG0 = TCG_REG_R6,
@@ -134,4 +135,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
     __builtin___clear_cache((char *) start, (char *) stop);
 }
 
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+
 #endif
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 73a15f7e80..2fd28fa6a5 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -108,6 +108,7 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_goto_ptr         1
+#define TCG_TARGET_HAS_direct_jump      1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
@@ -166,6 +167,14 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
 
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
+                                            uintptr_t jmp_addr, uintptr_t addr)
+{
+    /* patch the branch destination */
+    atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
+    /* no need to flush icache explicitly */
+}
+
 /* This defines the natural memory order supported by this
  * architecture before guarantees made by various barrier
  * instructions.
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 901bb7575d..5c9ca8c1ce 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -174,6 +174,7 @@ typedef enum {
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
 #define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_direct_jump      0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
@@ -195,4 +196,7 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
     asm volatile (";;sync.i;;srlz.i;;");
 }
 
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+
 #endif
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index d75cb63ed3..557c8ddc46 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -131,6 +131,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_mulsh_i32        1
 #define TCG_TARGET_HAS_bswap32_i32      1
 #define TCG_TARGET_HAS_goto_ptr         1
+#define TCG_TARGET_HAS_direct_jump      1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
@@ -206,4 +207,6 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
     cacheflush ((void *)start, stop-start, ICACHE);
 }
 
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+
 #endif
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 5f4a40a5b4..5bab3387e5 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -83,6 +83,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
 #define TCG_TARGET_HAS_goto_ptr         1
+#define TCG_TARGET_HAS_direct_jump      1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
@@ -124,5 +125,6 @@ extern bool have_isa_3_00;
 #endif
 
 void flush_icache_range(uintptr_t start, uintptr_t stop);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #endif
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 81fc179459..1398952d6b 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -95,6 +95,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
 #define TCG_TARGET_HAS_goto_ptr       1
+#define TCG_TARGET_HAS_direct_jump    1
 
 #define TCG_TARGET_HAS_div2_i64       1
 #define TCG_TARGET_HAS_rot_i64        1
@@ -143,4 +144,13 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
 
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
+                                            uintptr_t jmp_addr, uintptr_t addr)
+{
+    /* patch the branch destination */
+    intptr_t disp = addr - (jmp_addr - 2);
+    atomic_set((int32_t *)jmp_addr, disp / 2);
+    /* no need to flush icache explicitly */
+}
+
 #endif
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 854a0afd70..3ac0bd33d3 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -124,6 +124,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_goto_ptr         1
+#define TCG_TARGET_HAS_direct_jump      1
 
 #define TCG_TARGET_HAS_extrl_i64_i32    1
 #define TCG_TARGET_HAS_extrh_i64_i32    1
@@ -170,4 +171,6 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
     }
 }
 
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 17b7750ee6..46957d9bd7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -652,8 +652,8 @@ struct TCGContext {
     /* goto_tb support */
     tcg_insn_unit *code_buf;
     uint16_t *tb_jmp_reset_offset; /* tb->jmp_reset_offset */
-    uint16_t *tb_jmp_insn_offset; /* tb->jmp_insn_offset if USE_DIRECT_JUMP */
-    uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
+    uintptr_t *tb_jmp_insn_offset; /* tb->jmp_target_arg if direct_jump */
+    uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_arg if !direct_jump */
 
     TCGRegSet reserved_regs;
     intptr_t current_frame_offset;
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 06963288dc..036e8049c8 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -86,6 +86,7 @@
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_direct_jump      1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
@@ -192,4 +193,12 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
 
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
+                                            uintptr_t jmp_addr, uintptr_t addr)
+{
+    /* patch the branch destination */
+    atomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
+    /* no need to flush icache explicitly */
+}
+
 #endif /* TCG_TARGET_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d84b01d1b8..ff6866624a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -329,6 +329,41 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
+void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr)
+{
+    if (TCG_TARGET_HAS_direct_jump) {
+        uintptr_t offset = tb->jmp_target_arg[n];
+        uintptr_t tc_ptr = (uintptr_t)tb->tc_ptr;
+        tb_target_set_jmp_target(tc_ptr, tc_ptr + offset, addr);
+    } else {
+        tb->jmp_target_arg[n] = addr;
+    }
+}
+
+/* Called with tb_lock held.  */
+static inline void tb_add_jump(TranslationBlock *tb, int n,
+                               TranslationBlock *tb_next)
+{
+    assert(n < ARRAY_SIZE(tb->jmp_list_next));
+    if (tb->jmp_list_next[n]) {
+        /* Another thread has already done this while we were
+         * outside of the lock; nothing to do in this case */
+        return;
+    }
+    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+                           "Linking TBs %p [" TARGET_FMT_lx
+                           "] index %d -> %p [" TARGET_FMT_lx "]\n",
+                           tb->tc_ptr, tb->pc, n,
+                           tb_next->tc_ptr, tb_next->pc);
+
+    /* patch the native jump address */
+    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+
+    /* add in TB jmp circular list */
+    tb->jmp_list_next[n] = tb_next->jmp_list_first;
+    tb_next->jmp_list_first = (uintptr_t)tb | n;
+}
+
 static inline TranslationBlock *tb_find(CPUState *cpu,
                                         TranslationBlock *last_tb,
                                         int tb_exit)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 37ecafa931..93a1cf2ba8 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1289,13 +1289,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
     tb->jmp_reset_offset[1] = TB_JMP_RESET_OFFSET_INVALID;
     tcg_ctx.tb_jmp_reset_offset = tb->jmp_reset_offset;
-#ifdef USE_DIRECT_JUMP
-    tcg_ctx.tb_jmp_insn_offset = tb->jmp_insn_offset;
-    tcg_ctx.tb_jmp_target_addr = NULL;
-#else
-    tcg_ctx.tb_jmp_insn_offset = NULL;
-    tcg_ctx.tb_jmp_target_addr = tb->jmp_target_addr;
-#endif
+    if (TCG_TARGET_HAS_direct_jump) {
+        tcg_ctx.tb_jmp_insn_offset = tb->jmp_target_arg;
+        tcg_ctx.tb_jmp_target_addr = NULL;
+    } else {
+        tcg_ctx.tb_jmp_insn_offset = NULL;
+        tcg_ctx.tb_jmp_target_addr = tb->jmp_target_arg;
+    }
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count++;
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 04bc369a92..a1e5dd2f03 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -871,9 +871,8 @@ static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
     }
 }
 
-#ifdef USE_DIRECT_JUMP
-
-void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
+                              uintptr_t addr)
 {
     tcg_insn_unit i1, i2;
     TCGType rt = TCG_TYPE_I64;
@@ -898,8 +897,6 @@ void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
     flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 
-#endif
-
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
 {
     if (!l->has_value) {
@@ -1412,7 +1409,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_goto_tb:
         if (s->tb_jmp_insn_offset != NULL) {
-            /* USE_DIRECT_JUMP */
+            /* TCG_TARGET_HAS_direct_jump */
             /* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
                write can be used to patch the target address. */
             if ((uintptr_t)s->code_ptr & 7) {
@@ -1420,11 +1417,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
             }
             s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
             /* actual branch destination will be patched by
-               aarch64_tb_set_jmp_target later. */
+               tb_target_set_jmp_target later. */
             tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
             tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
         } else {
-            /* !USE_DIRECT_JUMP */
+            /* !TCG_TARGET_HAS_direct_jump */
             tcg_debug_assert(s->tb_jmp_target_addr != NULL);
             intptr_t offset = tcg_pcrel_diff(s, (s->tb_jmp_target_addr + a0)) >> 2;
             tcg_out_insn(s, 3305, LDR, offset, TCG_REG_TMP);
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 1a8169f5fc..04f8c839fe 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2642,7 +2642,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);   /* global pointer */
 }
 
-void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
+                              uintptr_t addr)
 {
     atomic_set((uint32_t *)jmp_addr, deposit32(OPC_J, 0, 26, addr >> 2));
     flush_icache_range(jmp_addr, jmp_addr + 4);
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 1f690df20d..018c240f6d 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1296,7 +1296,8 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 }
 
 #ifdef __powerpc64__
-void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
+                              uintptr_t addr)
 {
     tcg_insn_unit i1, i2;
     uint64_t pair;
@@ -1328,7 +1329,8 @@ void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
     flush_icache_range(jmp_addr, jmp_addr + 8);
 }
 #else
-void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
+                              uintptr_t addr)
 {
     intptr_t diff = addr - jmp_addr;
     tcg_debug_assert(in_range_b(diff));
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 18afce2f87..06cabbedf5 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1708,7 +1708,8 @@ void tcg_register_jit(void *buf, size_t buf_size)
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
 
-void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
+                              uintptr_t addr)
 {
     uint32_t *ptr = (uint32_t *)jmp_addr;
     uintptr_t disp = addr - jmp_addr;
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 01/23] tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04 10:33   ` Paolo Bonzini
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 03/23] tcg: Infrastructure for managing constant pools Richard Henderson
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC
  To: qemu-devel

Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer.  Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs.  Rename tcg-be-ldst.h to tcg-ldst.inc.c.
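
The pattern can be sketched as follows.  All Toy* names are invented for the
example, and the linked-list layout of the labels is an assumption rather
than something taken from tcg-ldst.inc.c.

```c
#include <assert.h>
#include <stddef.h>

/* In the series the define lives in the per-target tcg-target.h;
 * the name here is a stand-in. */
#define TOY_TARGET_NEED_LDST_LABELS

typedef struct ToyLdstLabel {
    struct ToyLdstLabel *next;   /* assumed list linkage */
    int is_ld;                   /* nonzero for a load, zero for a store */
} ToyLdstLabel;

typedef struct {
    int nb_ops;                  /* placeholder for unrelated state */
#ifdef TOY_TARGET_NEED_LDST_LABELS
    ToyLdstLabel *ldst_labels;   /* direct member, no wrapper struct */
#endif
} ToyContext;

/* Reset per-block state; a target without the define has no list to reset. */
static void toy_tb_init(ToyContext *s)
{
    s->nb_ops = 0;
#ifdef TOY_TARGET_NEED_LDST_LABELS
    s->ldst_labels = NULL;
#endif
}

#ifdef TOY_TARGET_NEED_LDST_LABELS
/* Push a label onto the front of the list. */
static void toy_add_ldst_label(ToyContext *s, ToyLdstLabel *l)
{
    l->next = s->ldst_labels;
    s->ldst_labels = l;
}

/* Count the labels still pending for this block. */
static size_t toy_count_ldst_labels(const ToyContext *s)
{
    size_t n = 0;
    for (const ToyLdstLabel *l = s->ldst_labels; l != NULL; l = l->next) {
        n++;
    }
    return n;
}
#endif
```

A target that does not set the define pays for neither the member nor the
helpers, which is what makes separate no-op stubs unnecessary.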

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h              |  4 ++++
 tcg/arm/tcg-target.h                  |  4 ++++
 tcg/i386/tcg-target.h                 |  4 ++++
 tcg/ia64/tcg-target.h                 |  4 ++++
 tcg/mips/tcg-target.h                 |  4 ++++
 tcg/ppc/tcg-target.h                  |  4 ++++
 tcg/s390/tcg-target.h                 |  4 ++++
 tcg/tcg-be-null.h                     | 44 -----------------------------------
 tcg/tcg.h                             |  6 +++--
 tcg/aarch64/tcg-target.inc.c          |  3 ++-
 tcg/arm/tcg-target.inc.c              |  3 ++-
 tcg/i386/tcg-target.inc.c             |  4 ++--
 tcg/ia64/tcg-target.inc.c             | 19 ++++-----------
 tcg/mips/tcg-target.inc.c             |  4 ++--
 tcg/ppc/tcg-target.inc.c              |  4 ++--
 tcg/s390/tcg-target.inc.c             |  4 ++--
 tcg/sparc/tcg-target.inc.c            |  2 --
 tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} | 27 ++++-----------------
 tcg/tcg.c                             | 17 +++++++-------
 tcg/tci/tcg-target.inc.c              |  2 --
 20 files changed, 61 insertions(+), 106 deletions(-)
 delete mode 100644 tcg/tcg-be-null.h
 rename tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} (85%)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 3c3b1e603d..484cf6236c 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -120,4 +120,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif /* AARCH64_TCG_TARGET_H */
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b836f7f127..55de35a691 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -138,4 +138,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 /* not defined -- call should be eliminated at compile time */
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 2fd28fa6a5..11ee7fadd1 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -186,4 +186,8 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 5c9ca8c1ce..83107e1407 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -199,4 +199,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 /* not defined -- call should be eliminated at compile time */
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 557c8ddc46..bea5290b9f 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -209,4 +209,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 5bab3387e5..c1226ea5b6 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -127,4 +127,8 @@ extern bool have_isa_3_00;
 void flush_icache_range(uintptr_t start, uintptr_t stop);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 1398952d6b..8fea9646b4 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -153,4 +153,8 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
     /* no need to flush icache explicitly */
 }
 
+#ifdef CONFIG_SOFTMMU
+#define TCG_TARGET_NEED_LDST_LABELS
+#endif
+
 #endif
diff --git a/tcg/tcg-be-null.h b/tcg/tcg-be-null.h
deleted file mode 100644
index 5222fe29e2..0000000000
--- a/tcg/tcg-be-null.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * TCG Backend Data: No backend data
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-typedef struct TCGBackendData {
-    /* Empty */
-    char dummy;
-} TCGBackendData;
-
-
-/*
- * Initialize TB backend data at the beginning of the TB.
- */
-
-static inline void tcg_out_tb_init(TCGContext *s)
-{
-}
-
-/*
- * Generate TB finalization at the end of block
- */
-
-static inline bool tcg_out_tb_finalize(TCGContext *s)
-{
-    return true;
-}
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 46957d9bd7..b0e00e744e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -712,8 +712,10 @@ struct TCGContext {
     CPUState *cpu;                      /* *_trans */
     TCGv_env tcg_env;                   /* *_exec  */
 
-    /* The TCGBackendData structure is private to tcg-target.inc.c.  */
-    struct TCGBackendData *be;
+    /* These structures are private to tcg-target.inc.c.  */
+#ifdef TCG_TARGET_NEED_LDST_LABELS
+    struct TCGLabelQemuLdst *ldst_labels;
+#endif
 
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index a1e5dd2f03..c7c751bafc 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -10,7 +10,6 @@
  * See the COPYING file in the top-level directory for details.
  */
 
-#include "tcg-be-ldst.h"
 #include "qemu/bitops.h"
 
 /* We're going to re-use TCGType in setting of the SF bit, which controls
@@ -1070,6 +1069,8 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 }
 
 #ifdef CONFIG_SOFTMMU
+#include "tcg-ldst.inc.c"
+
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
  */
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 37efcf06af..81ea900852 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -23,7 +23,6 @@
  */
 
 #include "elf.h"
-#include "tcg-be-ldst.h"
 
 int arm_arch = __ARM_ARCH;
 
@@ -1060,6 +1059,8 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
 }
 
 #ifdef CONFIG_SOFTMMU
+#include "tcg-ldst.inc.c"
+
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index e4b120a40c..1a1ad96906 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -22,8 +22,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-ldst.h"
-
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
@@ -1214,6 +1212,8 @@ static void tcg_out_nopn(TCGContext *s, int n)
 }
 
 #if defined(CONFIG_SOFTMMU)
+#include "tcg-ldst.inc.c"
+
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index bf9a97d75c..3569f2b457 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -1565,29 +1565,19 @@ typedef struct TCGLabelQemuLdst {
     struct TCGLabelQemuLdst *next;
 } TCGLabelQemuLdst;
 
-typedef struct TCGBackendData {
-    TCGLabelQemuLdst *labels;
-} TCGBackendData;
-
-static inline void tcg_out_tb_init(TCGContext *s)
-{
-    s->be->labels = NULL;
-}
-
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOp opc,
                                 tcg_insn_unit *label_ptr)
 {
-    TCGBackendData *be = s->be;
     TCGLabelQemuLdst *l = tcg_malloc(sizeof(*l));
 
     l->is_ld = is_ld;
     l->size = opc & MO_SIZE;
     l->label_ptr = label_ptr;
-    l->next = be->labels;
-    be->labels = l;
+    l->next = s->ldst_labels;
+    s->ldst_labels = l;
 }
 
-static bool tcg_out_tb_finalize(TCGContext *s)
+static bool tcg_out_ldst_finalize(TCGContext *s)
 {
     static const void * const helpers[8] = {
         helper_ret_stb_mmu,
@@ -1602,7 +1592,7 @@ static bool tcg_out_tb_finalize(TCGContext *s)
     tcg_insn_unit *thunks[8] = { };
     TCGLabelQemuLdst *l;
 
-    for (l = s->be->labels; l != NULL; l = l->next) {
+    for (l = s->ldst_labels; l != NULL; l = l->next) {
         long x = l->is_ld * 4 + l->size;
         tcg_insn_unit *dest = thunks[x];
 
@@ -1767,7 +1757,6 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args)
 }
 
 #else /* !CONFIG_SOFTMMU */
-# include "tcg-be-null.h"
 
 static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args)
 {
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 04f8c839fe..750baadf37 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -24,8 +24,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-ldst.h"
-
 #ifdef HOST_WORDS_BIGENDIAN
 # define MIPS_BE  1
 #else
@@ -1112,6 +1110,8 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
 }
 
 #if defined(CONFIG_SOFTMMU)
+#include "tcg-ldst.inc.c"
+
 static void * const qemu_ld_helpers[16] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_SB]   = helper_ret_ldsb_mmu,
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 018c240f6d..d772faf7be 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -22,8 +22,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-ldst.h"
-
 #if defined _CALL_DARWIN || defined __APPLE__
 #define TCG_TARGET_CALL_DARWIN
 #endif
@@ -1418,6 +1416,8 @@ static const uint32_t qemu_exts_opc[4] = {
 };
 
 #if defined (CONFIG_SOFTMMU)
+#include "tcg-ldst.inc.c"
+
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
  *                                 int mmu_idx, uintptr_t ra)
  */
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 38b9e791ee..ee0dff995a 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -24,8 +24,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-ldst.h"
-
 /* We only support generating code for 64-bit mode.  */
 #if TCG_TARGET_REG_BITS != 64
 #error "unsupported code generation mode"
@@ -1458,6 +1456,8 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc, TCGReg data,
 }
 
 #if defined(CONFIG_SOFTMMU)
+#include "tcg-ldst.inc.c"
+
 /* We're expecting to use a 20-bit signed offset on the tlb memory ops.
    Using the offset of the second entry in the last tlb table ensures
    that we can index all of the elements of the first entry.  */
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 06cabbedf5..bb7f7e8906 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -22,8 +22,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-null.h"
-
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%g0",
diff --git a/tcg/tcg-be-ldst.h b/tcg/tcg-ldst.inc.c
similarity index 85%
rename from tcg/tcg-be-ldst.h
rename to tcg/tcg-ldst.inc.c
index 17777aec5a..0e14cf4357 100644
--- a/tcg/tcg-be-ldst.h
+++ b/tcg/tcg-ldst.inc.c
@@ -20,8 +20,6 @@
  * THE SOFTWARE.
  */
 
-#ifdef CONFIG_SOFTMMU
-
 typedef struct TCGLabelQemuLdst {
     bool is_ld;             /* qemu_ld: true, qemu_st: false */
     TCGMemOpIdx oi;
@@ -35,19 +33,6 @@ typedef struct TCGLabelQemuLdst {
     struct TCGLabelQemuLdst *next;
 } TCGLabelQemuLdst;
 
-typedef struct TCGBackendData {
-    TCGLabelQemuLdst *labels;
-} TCGBackendData;
-
-
-/*
- * Initialize TB backend data at the beginning of the TB.
- */
-
-static inline void tcg_out_tb_init(TCGContext *s)
-{
-    s->be->labels = NULL;
-}
 
 /*
  * Generate TB finalization at the end of block
@@ -56,12 +41,12 @@ static inline void tcg_out_tb_init(TCGContext *s)
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l);
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l);
 
-static bool tcg_out_tb_finalize(TCGContext *s)
+static bool tcg_out_ldst_finalize(TCGContext *s)
 {
     TCGLabelQemuLdst *lb;
 
     /* qemu_ld/st slow paths */
-    for (lb = s->be->labels; lb != NULL; lb = lb->next) {
+    for (lb = s->ldst_labels; lb != NULL; lb = lb->next) {
         if (lb->is_ld) {
             tcg_out_qemu_ld_slow_path(s, lb);
         } else {
@@ -85,13 +70,9 @@ static bool tcg_out_tb_finalize(TCGContext *s)
 
 static inline TCGLabelQemuLdst *new_ldst_label(TCGContext *s)
 {
-    TCGBackendData *be = s->be;
     TCGLabelQemuLdst *l = tcg_malloc(sizeof(*l));
 
-    l->next = be->labels;
-    be->labels = l;
+    l->next = s->ldst_labels;
+    s->ldst_labels = l;
     return l;
 }
-#else
-#include "tcg-be-null.h"
-#endif /* CONFIG_SOFTMMU */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 35598296c5..dd74eabb0a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -112,10 +112,9 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
                                   const TCGArgConstraint *arg_ct);
-static void tcg_out_tb_init(TCGContext *s);
-static bool tcg_out_tb_finalize(TCGContext *s);
-
-
+#ifdef TCG_TARGET_NEED_LDST_LABELS
+static bool tcg_out_ldst_finalize(TCGContext *s);
+#endif
 
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
@@ -470,8 +469,6 @@ void tcg_func_start(TCGContext *s)
     s->gen_op_buf[0].prev = 0;
     s->gen_next_op_idx = 1;
     s->gen_next_parm_idx = 0;
-
-    s->be = tcg_malloc(sizeof(TCGBackendData));
 }
 
 static inline int temp_idx(TCGContext *s, TCGTemp *ts)
@@ -2619,7 +2616,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->code_buf = tb->tc_ptr;
     s->code_ptr = tb->tc_ptr;
 
-    tcg_out_tb_init(s);
+#ifdef TCG_TARGET_NEED_LDST_LABELS
+    s->ldst_labels = NULL;
+#endif
 
     num_insns = -1;
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
@@ -2694,9 +2693,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->gen_insn_end_off[num_insns] = tcg_current_code_size(s);
 
     /* Generate TB finalization at the end of block */
-    if (!tcg_out_tb_finalize(s)) {
+#ifdef TCG_TARGET_NEED_LDST_LABELS
+    if (!tcg_out_ldst_finalize(s)) {
         return -1;
     }
+#endif
 
     /* flush instruction cache */
     flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index b6a15569f8..94461b2baf 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -22,8 +22,6 @@
  * THE SOFTWARE.
  */
 
-#include "tcg-be-null.h"
-
 /* TODO list:
  * - See TODO comments in code.
  */
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 03/23] tcg: Infrastructure for managing constant pools
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 01/23] tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 04/23] tcg/i386: Store out-of-range call targets in constant pool Richard Henderson
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

A new shared file, tcg-pool.inc.c, adds new_pool_label, which
registers a tcg_target_ulong to be emitted after the generated
code, together with the relocation data needed to patch a
reference to that constant into the instruction that uses it.

A new pointer, data_gen_ptr, is added to TCGContext to mark where
the pool begins, so that the disassembly log dumps the constant
pool as data, not code.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
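A note for readers of the archive: the pool handling in this patch comes
down to a sorted insert followed by a de-duplicating emit pass. The sketch
below illustrates that idea in isolation; it is not the QEMU code. PoolUse,
pool_add, pool_emit, pool_free and the integer "site" id are hypothetical
names standing in for TCGLabelPoolData, new_pool_label,
tcg_out_pool_finalize and the patch_reloc call. It also tests for the first
entry with n == 0 where the patch seeds d with p->data + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for TCGLabelPoolData: one use of one constant. */
typedef struct PoolUse {
    struct PoolUse *next;
    uint64_t data;   /* constant to place in the pool */
    int site;        /* hypothetical id of the instruction to patch */
} PoolUse;

/* Insert so the list stays sorted by data; equal values become adjacent. */
static void pool_add(PoolUse **head, uint64_t data, int site)
{
    PoolUse *n = malloc(sizeof(*n));
    PoolUse **pp = head;

    assert(n != NULL);
    n->data = data;
    n->site = site;
    while (*pp != NULL && (*pp)->data < data) {
        pp = &(*pp)->next;
    }
    n->next = *pp;
    *pp = n;
}

/*
 * Write each distinct constant once into out[], and record in
 * slot_of[site] which slot that use must reference.  Returns the
 * number of slots written.  The caller sizes out[] and slot_of[];
 * the real code instead checks against code_gen_highwater.
 */
static size_t pool_emit(const PoolUse *p, uint64_t *out, size_t *slot_of)
{
    size_t n = 0;

    for (; p != NULL; p = p->next) {
        if (n == 0 || out[n - 1] != p->data) {
            out[n++] = p->data;
        }
        slot_of[p->site] = n - 1;
    }
    return n;
}

/* Release every node of the list. */
static void pool_free(PoolUse *p)
{
    while (p != NULL) {
        PoolUse *next = p->next;
        free(p);
        p = next;
    }
}
```

Because the list is kept sorted, duplicates are always neighbours, so the
emit pass only needs to compare against the last value written.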
 tcg/tcg.h                 |  4 +++
 accel/tcg/translate-all.c | 22 +++++++++++-
 tcg/tcg-pool.inc.c        | 85 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                 |  9 +++++
 4 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 tcg/tcg-pool.inc.c

diff --git a/tcg/tcg.h b/tcg/tcg.h
index b0e00e744e..ac94133870 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -702,6 +702,7 @@ struct TCGContext {
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     void *code_gen_ptr;
+    void *data_gen_ptr;
 
     /* Threshold to flush the translated code buffer.  */
     void *code_gen_highwater;
@@ -716,6 +717,9 @@ struct TCGContext {
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     struct TCGLabelQemuLdst *ldst_labels;
 #endif
+#ifdef TCG_TARGET_NEED_POOL_LABELS
+    struct TCGLabelPoolData *pool_labels;
+#endif
 
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 93a1cf2ba8..2d1ed06065 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1329,7 +1329,27 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         qemu_log_in_addr_range(tb->pc)) {
         qemu_log_lock();
         qemu_log("OUT: [size=%d]\n", gen_code_size);
-        log_disas(tb->tc_ptr, gen_code_size);
+        if (tcg_ctx.data_gen_ptr) {
+            size_t code_size = tcg_ctx.data_gen_ptr - tb->tc_ptr;
+            size_t data_size = gen_code_size - code_size;
+            size_t i;
+
+            log_disas(tb->tc_ptr, code_size);
+
+            for (i = 0; i < data_size; i += sizeof(tcg_target_ulong)) {
+                if (sizeof(tcg_target_ulong) == 8) {
+                    qemu_log("0x%08" PRIxPTR ":  .quad  0x%016" PRIx64 "\n",
+                             (uintptr_t)tcg_ctx.data_gen_ptr + i,
+                             *(uint64_t *)(tcg_ctx.data_gen_ptr + i));
+                } else {
+                    qemu_log("0x%08" PRIxPTR ":  .long  0x%08x\n",
+                             (uintptr_t)tcg_ctx.data_gen_ptr + i,
+                             *(uint32_t *)(tcg_ctx.data_gen_ptr + i));
+                }
+            }
+        } else {
+            log_disas(tb->tc_ptr, gen_code_size);
+        }
         qemu_log("\n");
         qemu_log_flush();
         qemu_log_unlock();
diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c
new file mode 100644
index 0000000000..8a85131405
--- /dev/null
+++ b/tcg/tcg-pool.inc.c
@@ -0,0 +1,85 @@
+/*
+ * TCG Backend Data: constant pool.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+typedef struct TCGLabelPoolData {
+    struct TCGLabelPoolData *next;
+    tcg_target_ulong data;
+    tcg_insn_unit *label;
+    intptr_t addend;
+    int type;
+} TCGLabelPoolData;
+
+
+static void new_pool_label(TCGContext *s, tcg_target_ulong data, int type,
+                           tcg_insn_unit *label, intptr_t addend)
+{
+    TCGLabelPoolData *n = tcg_malloc(sizeof(*n));
+    TCGLabelPoolData *i, **pp;
+
+    n->data = data;
+    n->label = label;
+    n->type = type;
+    n->addend = addend;
+
+    /* Insertion sort on the pool.  */
+    for (pp = &s->pool_labels; (i = *pp) && i->data < data; pp = &i->next) {
+        continue;
+    }
+    n->next = *pp;
+    *pp = n;
+}
+
+/* To be provided by cpu/tcg-target.inc.c.  */
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count);
+
+static bool tcg_out_pool_finalize(TCGContext *s)
+{
+    TCGLabelPoolData *p = s->pool_labels;
+    tcg_target_ulong d, *a;
+
+    if (p == NULL) {
+        return true;
+    }
+
+    /* ??? Round up to qemu_icache_linesize, but then do not round
+       again when allocating the next TranslationBlock structure.  */
+    a = (void *)ROUND_UP((uintptr_t)s->code_ptr, sizeof(tcg_target_ulong));
+    tcg_out_nop_fill(s->code_ptr, (tcg_insn_unit *)a - s->code_ptr);
+    s->data_gen_ptr = a;
+
+    /* Ensure the first comparison fails.  */
+    d = p->data + 1;
+
+    for (; p != NULL; p = p->next) {
+        if (p->data != d) {
+            d = p->data;
+            if (unlikely((void *)a > s->code_gen_highwater)) {
+                return false;
+            }
+            *a++ = d;
+        }
+        patch_reloc(p->label, p->type, (intptr_t)(a - 1), p->addend);
+    }
+
+    s->code_ptr = (void *)a;
+    return true;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dd74eabb0a..fd8a3dfe93 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -399,6 +399,7 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
         return NULL;
     }
     s->code_gen_ptr = next;
+    s->data_gen_ptr = NULL;
     return tb;
 }
 
@@ -2619,6 +2620,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     s->ldst_labels = NULL;
 #endif
+#ifdef TCG_TARGET_NEED_POOL_LABELS
+    s->pool_labels = NULL;
+#endif
 
     num_insns = -1;
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
@@ -2698,6 +2702,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         return -1;
     }
 #endif
+#ifdef TCG_TARGET_NEED_POOL_LABELS
+    if (!tcg_out_pool_finalize(s)) {
+        return -1;
+    }
+#endif
 
     /* flush instruction cache */
     flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 04/23] tcg/i386: Store out-of-range call targets in constant pool
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (2 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 03/23] tcg: Infrastructure for managing constant pools Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 05/23] tcg/s390: Introduce TCG_REG_TB Richard Henderson
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

This already saves 2 bytes per call (6 + 8 = 14 bytes versus
10 + 6 = 16 bytes), and the constant pool entry may well be
shared across multiple calls.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
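A note for readers of the archive: the new branch path emits an indirect
call or jump through a rip-relative memory operand and leaves the 32-bit
displacement to be patched with a PC32 relocation and an addend of -4. The
sketch below shows how those bytes and that displacement come out. It is an
illustration under stated assumptions, not the backend code: modrm_rip,
rip_disp and encode_indirect are hypothetical helpers, and GRP5_CALL /
GRP5_JMP are the standard x86 /2 and /4 opcode extensions of 0xFF, which is
presumably what EXT5_CALLN_Ev and EXT5_JMPN_Ev expand to.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode extensions of 0xFF: FF /2 is CALL r/m64, FF /4 is JMP r/m64. */
enum { GRP5_CALL = 2, GRP5_JMP = 4 };

/* ModRM with mod=00 and rm=101 selects [rip + disp32] in 64-bit mode. */
static uint8_t modrm_rip(int ext)
{
    return (uint8_t)((ext << 3) | 5);
}

/*
 * The CPU adds disp32 to the address of the next instruction, which is
 * the address of the displacement field plus 4.  Hence
 * disp = pool - (field + 4) = pool + (-4) - field, i.e. a PC32
 * relocation with an addend of -4.
 */
static int32_t rip_disp(int64_t pool, int64_t field)
{
    int64_t d = pool - 4 - field;

    assert(d == (int32_t)d);
    return (int32_t)d;
}

/* Encode the 6-byte instruction assumed to live at address insn_addr. */
static void encode_indirect(uint8_t *buf, int ext,
                            int64_t insn_addr, int64_t pool)
{
    uint32_t disp = (uint32_t)rip_disp(pool, insn_addr + 2);
    int i;

    buf[0] = 0xff;
    buf[1] = modrm_rip(ext);
    for (i = 0; i < 4; i++) {
        buf[2 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian */
    }
}
```

For a call at 0x1000 whose pool entry sits at 0x1100, this yields
ff 15 fa 00 00 00: the instruction ends at 0x1006, and 0x1100 - 0x1006 is
0xfa.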
 tcg/i386/tcg-target.h     |  1 +
 tcg/i386/tcg-target.inc.c | 18 +++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 11ee7fadd1..b89dababf4 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -189,5 +189,6 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1a1ad96906..5231056fd3 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -22,6 +22,8 @@
  * THE SOFTWARE.
  */
 
+#include "tcg-pool.inc.c"
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
@@ -1180,9 +1182,14 @@ static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest)
         tcg_out_opc(s, call ? OPC_CALL_Jz : OPC_JMP_long, 0, 0, 0);
         tcg_out32(s, disp);
     } else {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R10, (uintptr_t)dest);
-        tcg_out_modrm(s, OPC_GRP5,
-                      call ? EXT5_CALLN_Ev : EXT5_JMPN_Ev, TCG_REG_R10);
+        /* rip-relative addressing into the constant pool.
+           This is 6 + 8 = 14 bytes, as compared to using an
+           immediate load 10 + 6 = 16 bytes, plus we may
+           be able to re-use the pool constant for more calls.  */
+        tcg_out_opc(s, OPC_GRP5, 0, 0, 0);
+        tcg_out8(s, (call ? EXT5_CALLN_Ev : EXT5_JMPN_Ev) << 3 | 5);
+        new_pool_label(s, (uintptr_t)dest, R_386_PC32, s->code_ptr, -4);
+        tcg_out32(s, 0);
     }
 }
 
@@ -2595,6 +2602,11 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 #endif
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    memset(p, 0x90, count);
+}
+
 static void tcg_target_init(TCGContext *s)
 {
 #ifdef CONFIG_CPUID_H
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 05/23] tcg/s390: Introduce TCG_REG_TB
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (3 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 04/23] tcg/i386: Store out-of-range call targets in constant pool Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 06/23] tcg/s390: Fix sign of patch_reloc addend Richard Henderson
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
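A note for readers of the archive: both new TCG_REG_TB paths in this patch
(the LAY in tcg_out_movi_int and the load in tcg_out_ld_abs) are taken only
when the offset from the TB base fits a 20-bit signed displacement, tested
as off == sextract64(off, 0, 20). The sketch below shows that range check
on its own. sext_low is a hypothetical local stand-in for QEMU's
sextract64 with a start bit of 0, and fits_disp20 and reachable_from_tb
are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (stand-in for sextract64(v, 0, bits)). */
static int64_t sext_low(int64_t v, unsigned bits)
{
    uint64_t m = 1ull << (bits - 1);            /* sign bit of the field */
    uint64_t x = (uint64_t)v & ((m << 1) - 1);  /* keep the low bits only */

    return (int64_t)((x ^ m) - m);
}

/* True if off survives truncation to 20 signed bits: -524288..524287. */
static bool fits_disp20(int64_t off)
{
    return off == sext_low(off, 20);
}

/* Can addr be addressed as base register + 20-bit displacement? */
static bool reachable_from_tb(int64_t tb_start, int64_t addr)
{
    return fits_disp20(addr - tb_start);
}
```

Anything outside that window falls through to the longer sequences that
follow in tcg_out_movi_int and tcg_out_ld_abs.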
 tcg/s390/tcg-target.h     |  2 +-
 tcg/s390/tcg-target.inc.c | 71 +++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 8fea9646b4..8c5a30ccf8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -95,7 +95,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
 #define TCG_TARGET_HAS_goto_ptr       1
-#define TCG_TARGET_HAS_direct_jump    1
+#define TCG_TARGET_HAS_direct_jump    (s390_facilities & FACILITY_GEN_INST_EXT)
 
 #define TCG_TARGET_HAS_div2_i64       1
 #define TCG_TARGET_HAS_rot_i64        1
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index ee0dff995a..e007586315 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -51,6 +51,12 @@
 /* A scratch register that may be be used throughout the backend.  */
 #define TCG_TMP0        TCG_REG_R1
 
+/* A scratch register that holds a pointer to the beginning of the TB.
+   We don't need this when we have pc-relative loads with the general
+   instructions extension facility.  */
+#define TCG_REG_TB      TCG_REG_R12
+#define USE_REG_TB      (!(s390_facilities & FACILITY_GEN_INST_EXT))
+
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_R13
 #endif
@@ -556,8 +562,8 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
 }
 
 /* load a register with an immediate value */
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long sval)
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long sval, bool in_prologue)
 {
     static const S390Opcode lli_insns[4] = {
         RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
@@ -601,13 +607,22 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         }
     }
 
-    /* Try for PC-relative address load.  */
+    /* Try for PC-relative address load.  For odd addresses,
+       attempt to use an offset from the start of the TB.  */
     if ((sval & 1) == 0) {
         ptrdiff_t off = tcg_pcrel_diff(s, (void *)sval) >> 1;
         if (off == (int32_t)off) {
             tcg_out_insn(s, RIL, LARL, ret, off);
             return;
         }
+    } else if (USE_REG_TB && !in_prologue) {
+        ptrdiff_t off = sval - (uintptr_t)s->code_gen_ptr;
+        if (off == sextract64(off, 0, 20)) {
+            /* This is certain to be an address within TB, and therefore
+               OFF will be negative; don't try RX_LA.  */
+            tcg_out_insn(s, RXY, LAY, ret, TCG_REG_TB, TCG_REG_NONE, off);
+            return;
+        }
     }
 
     /* If extended immediates are not present, then we may have to issue
@@ -663,6 +678,11 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long sval)
+{
+    tcg_out_movi_int(s, type, ret, sval, false);
+}
 
 /* Emit a load/store type instruction.  Inputs are:
    DATA:     The register to be loaded or stored.
@@ -739,6 +759,13 @@ static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
             return;
         }
     }
+    if (USE_REG_TB) {
+        ptrdiff_t disp = abs - (void *)s->code_gen_ptr;
+        if (disp == sextract64(disp, 0, 20)) {
+            tcg_out_ld(s, type, dest, TCG_REG_TB, disp);
+            return;
+        }
+    }
 
     tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0xffff);
     tcg_out_ld(s, type, dest, dest, addr & 0xffff);
@@ -1690,6 +1717,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_goto_tb:
+        a0 = args[0];
         if (s->tb_jmp_insn_offset) {
             /* branch displacement must be aligned for atomic patching;
              * see if we need to add extra nop before branch
@@ -1697,21 +1725,34 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
                 tcg_out16(s, NOP);
             }
+            tcg_debug_assert(!USE_REG_TB);
             tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
-            s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+            s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
             s->code_ptr += 2;
         } else {
-            /* load address stored at s->tb_jmp_target_addr + args[0] */
-            tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_TMP0,
-                           s->tb_jmp_target_addr + args[0]);
+            /* load address stored at s->tb_jmp_target_addr + a0 */
+            tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_REG_TB,
+                           s->tb_jmp_target_addr + a0);
             /* and go there */
-            tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_TMP0);
+            tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_REG_TB);
+        }
+        s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
+
+        /* For the unlinked path of goto_tb, we need to reset
+           TCG_REG_TB to the beginning of this TB.  */
+        if (USE_REG_TB) {
+            int ofs = -tcg_current_code_size(s);
+            assert(ofs == (int16_t)ofs);
+            tcg_out_insn(s, RI, AGHI, TCG_REG_TB, ofs);
         }
-        s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
         break;
 
     case INDEX_op_goto_ptr:
-        tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, args[0]);
+        a0 = args[0];
+        if (USE_REG_TB) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, a0);
+        }
+        tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, a0);
         break;
 
     OP_32_64(ld8u):
@@ -2476,6 +2517,9 @@ static void tcg_target_init(TCGContext *s)
     /* XXX many insns can't be used with R0, so we better avoid it for now */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+    if (USE_REG_TB) {
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB);
+    }
 }
 
 #define FRAME_SIZE  ((int)(TCG_TARGET_CALL_STACK_OFFSET          \
@@ -2496,12 +2540,17 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
     if (guest_base >= 0x80000) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
         tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
+    if (USE_REG_TB) {
+        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB,
+                    tcg_target_call_iarg_regs[1]);
+    }
+
     /* br %r3 (go to TB) */
     tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, tcg_target_call_iarg_regs[1]);
 
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 06/23] tcg/s390: Fix sign of patch_reloc addend
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (4 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 05/23] tcg/s390: Introduce TCG_REG_TB Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 07/23] tcg/s390: Use constant pool for movi Richard Henderson
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

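We were passing in -2 instead of +2, but patch_reloc then ignored
the addend entirely, asserting that it was -2 and hard-coding the
adjustment as (code_ptr - 1).  Pass +2 and apply the addend in the
calculation.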

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
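A note for readers of the archive: the old and new computations give the
same halfword displacement once the addend is +2, because the displacement
field sits 2 bytes past the start of the instruction that the branch is
relative to. The sketch below checks that with plain integer addresses. It
is an illustration only: pcrel2_old and pcrel2_new are hypothetical names
that mirror the two forms of the expression in patch_reloc, with byte
addresses in place of tcg_insn_unit pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old form: the addend is ignored and the 2-byte bias is hard-coded,
 * like "(tcg_insn_unit *)value - (code_ptr - 1)".  `field` is the byte
 * address of the displacement field; the result is in halfwords.
 */
static intptr_t pcrel2_old(intptr_t field, intptr_t target)
{
    return (target - (field - 2)) / 2;
}

/*
 * New form: apply the addend first, like
 * "value += addend; pcrel2 = (tcg_insn_unit *)value - code_ptr".
 */
static intptr_t pcrel2_new(intptr_t field, intptr_t target, intptr_t addend)
{
    return (target + addend - field) / 2;
}
```

With the instruction at 0x1000 (field at 0x1002) and a target of 0x1010,
both forms give 8 halfwords when the addend is +2. Feeding -2 into the new
form would give 6, which is why every caller is switched to +2 in the same
patch.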
 tcg/s390/tcg-target.inc.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index e007586315..59c0da0922 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -360,21 +360,22 @@ uint64_t s390_facilities;
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    intptr_t pcrel2 = (tcg_insn_unit *)value - (code_ptr - 1);
-    tcg_debug_assert(addend == -2);
+    intptr_t pcrel2;
+
+    value += addend;
+    pcrel2 = (tcg_insn_unit *)value - code_ptr;
 
     switch (type) {
     case R_390_PC16DBL:
-        tcg_debug_assert(pcrel2 == (int16_t)pcrel2);
+        assert(pcrel2 == (int16_t)pcrel2);
         tcg_patch16(code_ptr, pcrel2);
         break;
     case R_390_PC32DBL:
-        tcg_debug_assert(pcrel2 == (int32_t)pcrel2);
+        assert(pcrel2 == (int32_t)pcrel2);
         tcg_patch32(code_ptr, pcrel2);
         break;
     default:
-        tcg_abort();
-        break;
+        g_assert_not_reached();
     }
 }
 
@@ -1270,11 +1271,11 @@ static void tgen_branch(TCGContext *s, int cc, TCGLabel *l)
         tgen_gotoi(s, cc, l->u.value_ptr);
     } else if (USE_LONG_BRANCHES) {
         tcg_out16(s, RIL_BRCL | (cc << 4));
-        tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, l, -2);
+        tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, l, 2);
         s->code_ptr += 2;
     } else {
         tcg_out16(s, RI_BRC | (cc << 4));
-        tcg_out_reloc(s, s->code_ptr, R_390_PC16DBL, l, -2);
+        tcg_out_reloc(s, s->code_ptr, R_390_PC16DBL, l, 2);
         s->code_ptr += 1;
     }
 }
@@ -1289,7 +1290,7 @@ static void tgen_compare_branch(TCGContext *s, S390Opcode opc, int cc,
     } else {
         /* We need to keep the offset unchanged for retranslation.  */
         off = s->code_ptr[1];
-        tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, -2);
+        tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
     }
 
     tcg_out16(s, (opc & 0xff00) | (r1 << 4) | r2);
@@ -1307,7 +1308,7 @@ static void tgen_compare_imm_branch(TCGContext *s, S390Opcode opc, int cc,
     } else {
         /* We need to keep the offset unchanged for retranslation.  */
         off = s->code_ptr[1];
-        tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, -2);
+        tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
     }
 
     tcg_out16(s, (opc & 0xff00) | (r1 << 4) | cc);
@@ -1571,7 +1572,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, -2);
+    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
@@ -1592,7 +1593,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     TCGMemOpIdx oi = lb->oi;
     TCGMemOp opc = get_memop(oi);
 
-    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, -2);
+    patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
-- 
2.13.3
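
Note on the +2 addend: a minimal model, not QEMU code, assuming the
relocated field sits one halfword past the start of the branch (the
tcg_out16() calls above emit the opcode halfword first) and that the
displacement counts halfwords from the start of the instruction.  The
function name and the integer-address interface are hypothetical.

```c
#include <stdint.h>

/* Model of "value += addend; pcrel2 = value - code_ptr" on byte
   addresses, with the result expressed in 2-byte instruction units.
   field_addr is the address of the displacement field, i.e. the
   branch address plus 2.  */
static int64_t branch_disp_halfwords(int64_t field_addr,
                                     int64_t target_addr,
                                     int64_t addend)
{
    return (target_addr + addend - field_addr) / 2;
}
```

With a branch at 0x1000 (field at 0x1002) and a target at 0x1010, an
addend of +2 yields 8 halfwords, which equals (0x1010 - 0x1000) / 2.
An addend of -2 would yield 6 under this model.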

* [Qemu-devel] [PATCH for-2.11 07/23] tcg/s390: Use constant pool for movi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (5 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 06/23] tcg/s390: Fix sign of patch_reloc addend Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 08/23] tcg/s390: Use constant pool for andi Richard Henderson
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Split out maybe_out_small_movi so that other operations that add
to the constant pool can reuse it.  Outside the prologue, constants
with no short immediate sequence are now loaded from the pool.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
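Note on the R_390_20 case added to patch_reloc below: a standalone
sketch of the same bit manipulation, assuming the 32-bit word covers
the last four bytes of an RXY instruction, laid out as B2 (4 bits),
DL (12 bits), DH (8 bits), low opcode byte (8 bits).  The helper names
are hypothetical; the decode helper exists only to check the round trip.

```c
#include <stdint.h>

/* Insert a signed 20-bit displacement, keeping B2 and the opcode byte. */
static uint32_t rxy_set_disp20(uint32_t word, int32_t disp)
{
    uint32_t v = (uint32_t)disp;

    word &= 0xf00000ffu;            /* keep B2 and the low opcode byte */
    word |= (v & 0xfffu) << 16;     /* DL: displacement bits 0..11 */
    word |= (v & 0xff000u) >> 4;    /* DH: displacement bits 12..19 */
    return word;
}

/* Recover the signed 20-bit displacement from the same word. */
static int32_t rxy_get_disp20(uint32_t word)
{
    uint32_t dl = (word >> 16) & 0xfffu;
    uint32_t dh = (word >> 8) & 0xffu;
    uint32_t raw = (dh << 12) | dl;

    /* Sign-extend from bit 19. */
    return (int32_t)(raw ^ 0x80000u) - 0x80000;
}
```

The split into DL and DH is why the low 12 bits are shifted up while
the high 8 bits are shifted down.
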
 include/elf.h             |   3 +-
 tcg/s390/tcg-target.h     |   1 +
 tcg/s390/tcg-target.inc.c | 130 +++++++++++++++++++++++++++-------------------
 3 files changed, 80 insertions(+), 54 deletions(-)

diff --git a/include/elf.h b/include/elf.h
index cd51434877..e8a515ce3d 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -942,8 +942,9 @@ typedef struct {
 #define R_390_TLS_DTPOFF	55	/* Offset in TLS block.  */
 #define R_390_TLS_TPOFF		56	/* Negate offset in static TLS
                                            block.  */
+#define R_390_20                57
 /* Keep this the last entry.  */
-#define R_390_NUM	57
+#define R_390_NUM               58
 
 /* x86-64 relocation types */
 #define R_X86_64_NONE		0	/* No reloc */
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 8c5a30ccf8..18a0efac3e 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -156,5 +156,6 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 59c0da0922..29b77ff67f 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -29,6 +29,7 @@
 #error "unsupported code generation mode"
 #endif
 
+#include "tcg-pool.inc.c"
 #include "elf.h"
 
 /* ??? The translation blocks produced by TCG are generally small enough to
@@ -361,6 +362,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     intptr_t pcrel2;
+    uint32_t old;
 
     value += addend;
     pcrel2 = (tcg_insn_unit *)value - code_ptr;
@@ -374,6 +376,12 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
         assert(pcrel2 == (int32_t)pcrel2);
         tcg_patch32(code_ptr, pcrel2);
         break;
+    case R_390_20:
+        assert(value == sextract64(value, 0, 20));
+        old = *(uint32_t *)code_ptr & 0xf00000ff;
+        old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
+        tcg_patch32(code_ptr, old);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -562,14 +570,16 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
     }
 }
 
-/* load a register with an immediate value */
-static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
-                             tcg_target_long sval, bool in_prologue)
-{
-    static const S390Opcode lli_insns[4] = {
-        RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
-    };
+static const S390Opcode lli_insns[4] = {
+    RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
+};
+static const S390Opcode ii_insns[4] = {
+    RI_IILL, RI_IILH, RI_IIHL, RI_IIHH
+};
 
+static bool maybe_out_small_movi(TCGContext *s, TCGType type,
+                                 TCGReg ret, tcg_target_long sval)
+{
     tcg_target_ulong uval = sval;
     int i;
 
@@ -581,17 +591,37 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
     /* Try all 32-bit insns that can load it in one go.  */
     if (sval >= -0x8000 && sval < 0x8000) {
         tcg_out_insn(s, RI, LGHI, ret, sval);
-        return;
+        return true;
     }
 
     for (i = 0; i < 4; i++) {
         tcg_target_long mask = 0xffffull << i*16;
         if ((uval & mask) == uval) {
             tcg_out_insn_RI(s, lli_insns[i], ret, uval >> i*16);
-            return;
+            return true;
         }
     }
 
+    return false;
+}
+
+/* load a register with an immediate value */
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long sval, bool in_prologue)
+{
+    tcg_target_ulong uval;
+
+    /* Try all 32-bit insns that can load it in one go.  */
+    if (maybe_out_small_movi(s, type, ret, sval)) {
+        return;
+    }
+
+    uval = sval;
+    if (type == TCG_TYPE_I32) {
+        uval = (uint32_t)sval;
+        sval = (int32_t)sval;
+    }
+
     /* Try all 48-bit insns that can load it in one go.  */
     if (s390_facilities & FACILITY_EXT_IMM) {
         if (sval == (int32_t)sval) {
@@ -603,7 +633,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
             return;
         }
         if ((uval & 0xffffffff) == 0) {
-            tcg_out_insn(s, RIL, LLIHF, ret, uval >> 31 >> 1);
+            tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
             return;
         }
     }
@@ -626,55 +656,44 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
         }
     }
 
-    /* If extended immediates are not present, then we may have to issue
-       several instructions to load the low 32 bits.  */
-    if (!(s390_facilities & FACILITY_EXT_IMM)) {
-        /* A 32-bit unsigned value can be loaded in 2 insns.  And given
-           that the lli_insns loop above did not succeed, we know that
-           both insns are required.  */
-        if (uval <= 0xffffffff) {
-            tcg_out_insn(s, RI, LLILL, ret, uval);
-            tcg_out_insn(s, RI, IILH, ret, uval >> 16);
-            return;
-        }
+    /* A 32-bit unsigned value can be loaded in 2 insns.  And given
+       that LLILL, LLILH, LLILF above did not succeed, we know that
+       both insns are required.  */
+    if (uval <= 0xffffffff) {
+        tcg_out_insn(s, RI, LLILL, ret, uval);
+        tcg_out_insn(s, RI, IILH, ret, uval >> 16);
+        return;
+    }
 
-        /* If all high bits are set, the value can be loaded in 2 or 3 insns.
-           We first want to make sure that all the high bits get set.  With
-           luck the low 16-bits can be considered negative to perform that for
-           free, otherwise we load an explicit -1.  */
-        if (sval >> 31 >> 1 == -1) {
-            if (uval & 0x8000) {
-                tcg_out_insn(s, RI, LGHI, ret, uval);
-            } else {
-                tcg_out_insn(s, RI, LGHI, ret, -1);
-                tcg_out_insn(s, RI, IILL, ret, uval);
-            }
-            tcg_out_insn(s, RI, IILH, ret, uval >> 16);
-            return;
+    /* When allowed, stuff it in the constant pool.  */
+    if (!in_prologue) {
+        if (USE_REG_TB) {
+            tcg_out_insn(s, RXY, LG, ret, TCG_REG_TB, TCG_REG_NONE, 0);
+            new_pool_label(s, sval, R_390_20, s->code_ptr - 2,
+                           -(intptr_t)s->code_gen_ptr);
+        } else {
+            tcg_out_insn(s, RIL, LGRL, ret, 0);
+            new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
         }
+        return;
     }
 
-    /* If we get here, both the high and low parts have non-zero bits.  */
-
-    /* Recurse to load the lower 32-bits.  */
-    tcg_out_movi(s, TCG_TYPE_I64, ret, uval & 0xffffffff);
-
-    /* Insert data into the high 32-bits.  */
-    uval = uval >> 31 >> 1;
+    /* What's left is for the prologue, loading GUEST_BASE, and because
+       it failed to match above, is known to be a full 64-bit quantity.
+       We could try more than this, but it probably wouldn't pay off.  */
     if (s390_facilities & FACILITY_EXT_IMM) {
-        if (uval < 0x10000) {
-            tcg_out_insn(s, RI, IIHL, ret, uval);
-        } else if ((uval & 0xffff) == 0) {
-            tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
-        } else {
-            tcg_out_insn(s, RIL, IIHF, ret, uval);
-        }
+        tcg_out_insn(s, RIL, LLILF, ret, uval);
+        tcg_out_insn(s, RIL, IIHF, ret, uval >> 32);
     } else {
-        if (uval & 0xffff) {
-            tcg_out_insn(s, RI, IIHL, ret, uval);
-        }
-        if (uval & 0xffff0000) {
-            tcg_out_insn(s, RI, IIHH, ret, uval >> 16);
+        const S390Opcode *insns = lli_insns;
+        int i;
+
+        for (i = 0; i < 4; i++) {
+            uint16_t part = uval >> (16 * i);
+            if (part) {
+                tcg_out_insn_RI(s, insns[i], ret, part);
+                insns = ii_insns;
+            }
         }
     }
 }
@@ -2573,6 +2592,11 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_REG_R14);
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    memset(p, 0x07, count * sizeof(tcg_insn_unit));
+}
+
 typedef struct {
     DebugFrameHeader h;
     uint8_t fde_def_cfa[4];
-- 
2.13.3

* [Qemu-devel] [PATCH for-2.11 08/23] tcg/s390: Use constant pool for andi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (6 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 07/23] tcg/s390: Use constant pool for movi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 09/23] tcg/s390: Use constant pool for ori Richard Henderson
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
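Note on "not for small constants": the fallback below skips the pool
whenever maybe_out_small_movi succeeds.  A rough standalone predicate
for that test, assuming a 64-bit operand (the TCG_TYPE_I32 truncation
is left out) and using a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the value can be loaded by one 32-bit instruction: either a
   sign-extended 16-bit immediate (LGHI) or a single non-zero 16-bit
   chunk with zeros elsewhere (LLILL, LLILH, LLIHL, LLIHH).  */
static bool fits_single_32bit_insn(int64_t sval)
{
    uint64_t uval = (uint64_t)sval;

    if (sval >= -0x8000 && sval < 0x8000) {
        return true;
    }
    for (int i = 0; i < 4; i++) {
        uint64_t mask = 0xffffull << (i * 16);
        if ((uval & mask) == uval) {
            return true;
        }
    }
    return false;
}
```

Anything this rejects would take the pool path when USE_REG_TB is set.
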
 tcg/s390/tcg-target.inc.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 29b77ff67f..4be57c5765 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -224,6 +224,7 @@ typedef enum S390Opcode {
     RXY_LRVG    = 0xe30f,
     RXY_LRVH    = 0xe31f,
     RXY_LY      = 0xe358,
+    RXY_NG      = 0xe380,
     RXY_STCY    = 0xe372,
     RXY_STG     = 0xe324,
     RXY_STHY    = 0xe370,
@@ -985,8 +986,17 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         return;
     }
 
-    /* Fall back to loading the constant.  */
-    tcg_out_movi(s, type, TCG_TMP0, val);
+    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
+    if (USE_REG_TB) {
+        if (!maybe_out_small_movi(s, type, TCG_TMP0, val)) {
+            tcg_out_insn(s, RXY, NG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
+            new_pool_label(s, val & valid, R_390_20, s->code_ptr - 2,
+                           -(intptr_t)s->code_gen_ptr);
+            return;
+        }
+    } else {
+        tcg_out_movi(s, type, TCG_TMP0, val);
+    }
     if (type == TCG_TYPE_I32) {
         tcg_out_insn(s, RR, NR, dest, TCG_TMP0);
     } else {
@@ -2341,6 +2351,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &r_r_ri;
     case INDEX_op_sub_i32:
     case INDEX_op_sub_i64:
+    case INDEX_op_and_i32:
+    case INDEX_op_and_i64:
         return (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri);
 
     case INDEX_op_mul_i32:
@@ -2375,10 +2387,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
                 ? (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_rM : &r_0_rM)
                 : &r_0_r);
 
-    case INDEX_op_and_i32:
-    case INDEX_op_and_i64:
-        return (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri);
-
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
     case INDEX_op_sar_i32:
-- 
2.13.3

* [Qemu-devel] [PATCH for-2.11 09/23] tcg/s390: Use constant pool for ori
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (7 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 08/23] tcg/s390: Use constant pool for andi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 10/23] tcg/s390: Use constant pool for xori Richard Henderson
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
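Note on the order of attempts in tgen_ori below: a sketch of the
decision sequence as a pure function, with the facility bits passed in
as plain booleans.  The enum, the helper names and the flag parameters
are hypothetical, and the TCG_TYPE_I32 handling is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

enum or_plan {
    OR_NOP,         /* val == 0, nothing to emit */
    OR_RI16,        /* one of OILL, OILH, OIHL, OIHH */
    OR_RIL32,       /* OILF or OIHF, needs extended immediates */
    OR_SMALL_MOVI,  /* one-insn load into a temp, then reg-reg OR */
    OR_POOL,        /* OG from the constant pool */
    OR_SPLIT        /* low half, then high half */
};

/* True if every set bit of val lies inside one aligned chunk. */
static bool one_chunk(uint64_t val, int width)
{
    uint64_t base = (width == 32 ? 0xffffffffull : 0xffffull);

    for (int shift = 0; shift < 64; shift += width) {
        uint64_t mask = base << shift;
        if ((val & mask) != 0 && (val & ~mask) == 0) {
            return true;
        }
    }
    return false;
}

/* True if one 32-bit load instruction can produce the value. */
static bool small_movi(int64_t sval)
{
    return (sval >= -0x8000 && sval < 0x8000)
        || one_chunk((uint64_t)sval, 16);
}

static enum or_plan plan_ori(uint64_t val, bool ext_imm, bool use_reg_tb)
{
    if (val == 0) {
        return OR_NOP;
    }
    if (one_chunk(val, 16)) {
        return OR_RI16;
    }
    if (ext_imm && one_chunk(val, 32)) {
        return OR_RIL32;
    }
    if (small_movi((int64_t)val)) {
        return OR_SMALL_MOVI;
    }
    return use_reg_tb ? OR_POOL : OR_SPLIT;
}
```

The patch asserts FACILITY_EXT_IMM on the last path; the sketch does
not model that assertion.
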
 tcg/s390/tcg-target.inc.c | 74 ++++++++++++++++++++++-------------------------
 1 file changed, 34 insertions(+), 40 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 4be57c5765..83fac71c31 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -225,6 +225,7 @@ typedef enum S390Opcode {
     RXY_LRVH    = 0xe31f,
     RXY_LY      = 0xe358,
     RXY_NG      = 0xe380,
+    RXY_OG      = 0xe381,
     RXY_STCY    = 0xe372,
     RXY_STG     = 0xe324,
     RXY_STHY    = 0xe370,
@@ -1004,55 +1005,60 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 }
 
-static void tgen64_ori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
+static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 {
     static const S390Opcode oi_insns[4] = {
         RI_OILL, RI_OILH, RI_OIHL, RI_OIHH
     };
-    static const S390Opcode nif_insns[2] = {
+    static const S390Opcode oif_insns[2] = {
         RIL_OILF, RIL_OIHF
     };
 
     int i;
 
     /* Look for no-op.  */
-    if (val == 0) {
+    if (unlikely(val == 0)) {
         return;
     }
 
-    if (s390_facilities & FACILITY_EXT_IMM) {
-        /* Try all 32-bit insns that can perform it in one go.  */
-        for (i = 0; i < 4; i++) {
-            tcg_target_ulong mask = (0xffffull << i*16);
-            if ((val & mask) != 0 && (val & ~mask) == 0) {
-                tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
-                return;
-            }
+    /* Try all 32-bit insns that can perform it in one go.  */
+    for (i = 0; i < 4; i++) {
+        tcg_target_ulong mask = (0xffffull << i*16);
+        if ((val & mask) != 0 && (val & ~mask) == 0) {
+            tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
+            return;
         }
+    }
 
-        /* Try all 48-bit insns that can perform it in one go.  */
+    /* Try all 48-bit insns that can perform it in one go.  */
+    if (s390_facilities & FACILITY_EXT_IMM) {
         for (i = 0; i < 2; i++) {
             tcg_target_ulong mask = (0xffffffffull << i*32);
             if ((val & mask) != 0 && (val & ~mask) == 0) {
-                tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i*32);
+                tcg_out_insn_RIL(s, oif_insns[i], dest, val >> i*32);
                 return;
             }
         }
+    }
 
+    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
+    if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
+        if (type == TCG_TYPE_I32) {
+            tcg_out_insn(s, RR, OR, dest, TCG_TMP0);
+        } else {
+            tcg_out_insn(s, RRE, OGR, dest, TCG_TMP0);
+        }
+    } else if (USE_REG_TB) {
+        tcg_out_insn(s, RXY, OG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
+        new_pool_label(s, val, R_390_20, s->code_ptr - 2,
+                       -(intptr_t)s->code_gen_ptr);
+    } else {
         /* Perform the OR via sequential modifications to the high and
            low parts.  Do this via recursion to handle 16-bit vs 32-bit
            masks in each half.  */
-        tgen64_ori(s, dest, val & 0x00000000ffffffffull);
-        tgen64_ori(s, dest, val & 0xffffffff00000000ull);
-    } else {
-        /* With no extended-immediate facility, we don't need to be so
-           clever.  Just iterate over the insns and mask in the constant.  */
-        for (i = 0; i < 4; i++) {
-            tcg_target_ulong mask = (0xffffull << i*16);
-            if ((val & mask) != 0) {
-                tcg_out_insn_RI(s, oi_insns[i], dest, val >> i*16);
-            }
-        }
+        tcg_debug_assert(s390_facilities & FACILITY_EXT_IMM);
+        tgen_ori(s, type, dest, val & 0x00000000ffffffffull);
+        tgen_ori(s, type, dest, val & 0xffffffff00000000ull);
     }
 }
 
@@ -1872,7 +1878,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen64_ori(s, a0, a2);
+            tgen_ori(s, TCG_TYPE_I32, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, OR, a0, a2);
         } else {
@@ -2104,7 +2110,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen64_ori(s, a0, a2);
+            tgen_ori(s, TCG_TYPE_I64, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RRE, OGR, a0, a2);
         } else {
@@ -2312,7 +2318,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_0_ri = { .args_ct_str = { "r", "0", "ri" } };
     static const TCGTargetOpDef r_0_rI = { .args_ct_str = { "r", "0", "rI" } };
     static const TCGTargetOpDef r_0_rJ = { .args_ct_str = { "r", "0", "rJ" } };
-    static const TCGTargetOpDef r_0_rN = { .args_ct_str = { "r", "0", "rN" } };
     static const TCGTargetOpDef r_0_rM = { .args_ct_str = { "r", "0", "rM" } };
     static const TCGTargetOpDef a2_r
         = { .args_ct_str = { "r", "r", "0", "1", "r", "r" } };
@@ -2353,6 +2358,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub_i64:
     case INDEX_op_and_i32:
     case INDEX_op_and_i64:
+    case INDEX_op_or_i32:
+    case INDEX_op_or_i64:
         return (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri);
 
     case INDEX_op_mul_i32:
@@ -2363,19 +2370,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_mul_i64:
         return (s390_facilities & FACILITY_GEN_INST_EXT ? &r_0_rJ : &r_0_rI);
 
-    case INDEX_op_or_i32:
-        /* The use of [iNM] constraints are optimization only, since a full
-           64-bit immediate OR can always be performed with 4 sequential
-           OI[LH][LH] instructions.  By rejecting certain negative ranges,
-           the immediate load plus the reg-reg OR is smaller.  */
-        return (s390_facilities & FACILITY_EXT_IMM
-                ? (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri)
-                : &r_0_rN);
-    case INDEX_op_or_i64:
-        return (s390_facilities & FACILITY_EXT_IMM
-                ? (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_rM : &r_0_rM)
-                : &r_0_rN);
-
     case INDEX_op_xor_i32:
         /* Without EXT_IMM, no immediates are supported.  Otherwise,
            rejecting certain negative ranges leads to smaller code.  */
-- 
2.13.3

* [Qemu-devel] [PATCH for-2.11 10/23] tcg/s390: Use constant pool for xori
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (8 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 09/23] tcg/s390: Use constant pool for ori Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 11/23] tcg/s390: Use constant pool for cmpi Richard Henderson
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.inc.c | 77 ++++++++++++++++++++++++-----------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 83fac71c31..b0b34fa5ab 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -39,11 +39,9 @@
 
 #define TCG_CT_CONST_S16   0x100
 #define TCG_CT_CONST_S32   0x200
-#define TCG_CT_CONST_NN16  0x400
-#define TCG_CT_CONST_NN32  0x800
-#define TCG_CT_CONST_U31   0x1000
-#define TCG_CT_CONST_S33   0x2000
-#define TCG_CT_CONST_ZERO  0x4000
+#define TCG_CT_CONST_U31   0x400
+#define TCG_CT_CONST_S33   0x800
+#define TCG_CT_CONST_ZERO  0x1000
 
 /* Several places within the instruction set 0 means "no register"
    rather than TCG_REG_R0.  */
@@ -234,6 +232,7 @@ typedef enum S390Opcode {
     RXY_STRVG   = 0xe32f,
     RXY_STRVH   = 0xe33f,
     RXY_STY     = 0xe350,
+    RXY_XG      = 0xe382,
 
     RX_A        = 0x5a,
     RX_C        = 0x59,
@@ -424,12 +423,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'J':
         ct->ct |= TCG_CT_CONST_S32;
         break;
-    case 'N':
-        ct->ct |= TCG_CT_CONST_NN16;
-        break;
-    case 'M':
-        ct->ct |= TCG_CT_CONST_NN32;
-        break;
     case 'C':
         /* ??? We have no insight here into whether the comparison is
            signed or unsigned.  The COMPARE IMMEDIATE insn uses a 32-bit
@@ -474,10 +467,6 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return val == (int32_t)val;
     } else if (ct & TCG_CT_CONST_S33) {
         return val >= -0xffffffffll && val <= 0xffffffffll;
-    } else if (ct & TCG_CT_CONST_NN16) {
-        return !(val < 0 && val == (int16_t)val);
-    } else if (ct & TCG_CT_CONST_NN32) {
-        return !(val < 0 && val == (int32_t)val);
     } else if (ct & TCG_CT_CONST_U31) {
         return val >= 0 && val <= 0x7fffffff;
     } else if (ct & TCG_CT_CONST_ZERO) {
@@ -1062,14 +1051,40 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 }
 
-static void tgen64_xori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
+static void tgen_xori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 {
-    /* Perform the xor by parts.  */
-    if (val & 0xffffffff) {
-        tcg_out_insn(s, RIL, XILF, dest, val);
+    /* Try all 48-bit insns that can perform it in one go.  */
+    if (s390_facilities & FACILITY_EXT_IMM) {
+        if ((val & 0xffffffff00000000ull) == 0) {
+            tcg_out_insn(s, RIL, XILF, dest, val);
+            return;
+        }
+        if ((val & 0x00000000ffffffffull) == 0) {
+            tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
+            return;
+        }
     }
-    if (val > 0xffffffff) {
-        tcg_out_insn(s, RIL, XIHF, dest, val >> 31 >> 1);
+
+    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
+    if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
+        if (type == TCG_TYPE_I32) {
+            tcg_out_insn(s, RR, XR, dest, TCG_TMP0);
+        } else {
+            tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
+        }
+    } else if (USE_REG_TB) {
+        tcg_out_insn(s, RXY, XG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
+        new_pool_label(s, val, R_390_20, s->code_ptr - 2,
+                       -(intptr_t)s->code_gen_ptr);
+    } else {
+        /* Perform the xor by parts.  */
+        tcg_debug_assert(s390_facilities & FACILITY_EXT_IMM);
+        if (val & 0xffffffff) {
+            tcg_out_insn(s, RIL, XILF, dest, val);
+        }
+        if (val > 0xffffffff) {
+            tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
+        }
     }
 }
 
@@ -1889,7 +1904,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen64_xori(s, a0, a2);
+            tgen_xori(s, TCG_TYPE_I32, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, XR, args[0], args[2]);
         } else {
@@ -2121,7 +2136,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen64_xori(s, a0, a2);
+            tgen_xori(s, TCG_TYPE_I64, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RRE, XGR, a0, a2);
         } else {
@@ -2313,12 +2328,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_rC = { .args_ct_str = { "r", "rC" } };
     static const TCGTargetOpDef r_rZ = { .args_ct_str = { "r", "rZ" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
-    static const TCGTargetOpDef r_r_rM = { .args_ct_str = { "r", "r", "rM" } };
-    static const TCGTargetOpDef r_0_r = { .args_ct_str = { "r", "0", "r" } };
     static const TCGTargetOpDef r_0_ri = { .args_ct_str = { "r", "0", "ri" } };
     static const TCGTargetOpDef r_0_rI = { .args_ct_str = { "r", "0", "rI" } };
     static const TCGTargetOpDef r_0_rJ = { .args_ct_str = { "r", "0", "rJ" } };
-    static const TCGTargetOpDef r_0_rM = { .args_ct_str = { "r", "0", "rM" } };
     static const TCGTargetOpDef a2_r
         = { .args_ct_str = { "r", "r", "0", "1", "r", "r" } };
     static const TCGTargetOpDef a2_ri
@@ -2360,6 +2372,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_and_i64:
     case INDEX_op_or_i32:
     case INDEX_op_or_i64:
+    case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
         return (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri);
 
     case INDEX_op_mul_i32:
@@ -2370,17 +2384,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_mul_i64:
         return (s390_facilities & FACILITY_GEN_INST_EXT ? &r_0_rJ : &r_0_rI);
 
-    case INDEX_op_xor_i32:
-        /* Without EXT_IMM, no immediates are supported.  Otherwise,
-           rejecting certain negative ranges leads to smaller code.  */
-        return (s390_facilities & FACILITY_EXT_IMM
-                ? (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_ri : &r_0_ri)
-                : &r_0_r);
-    case INDEX_op_xor_i64:
-        return (s390_facilities & FACILITY_EXT_IMM
-                ? (s390_facilities & FACILITY_DISTINCT_OPS ? &r_r_rM : &r_0_rM)
-                : &r_0_r);
-
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
     case INDEX_op_sar_i32:
-- 
2.13.3

* [Qemu-devel] [PATCH for-2.11 11/23] tcg/s390: Use constant pool for cmpi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (9 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 10/23] tcg/s390: Use constant pool for xori Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 12/23] tcg/aarch64: Use constant pool for movi Richard Henderson
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Also use CHI/CGHI for 16-bit signed constants.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
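Note on the immediate forms tried in tgen_cmp below, before a temp
register or the pool is used: a sketch of the selection as a pure
function.  The enum, the function name and the boolean parameters are
hypothetical stand-ins for the TCG types and facility tests.

```c
#include <stdbool.h>
#include <stdint.h>

enum cmp_imm_plan {
    CMP_LOAD_AND_TEST,  /* compare against zero via load-and-test */
    CMP_RI16,           /* CHI or CGHI */
    CMP_RIL32,          /* CFI, CLFI, CGFI or CLGFI */
    CMP_NO_IMMEDIATE    /* fall back to a temp register or the pool */
};

static enum cmp_imm_plan plan_cmp_imm(int64_t c2, bool is_64bit,
                                      bool is_unsigned, bool need_carry,
                                      bool ext_imm)
{
    if (c2 == 0 && !(is_unsigned && need_carry)) {
        return CMP_LOAD_AND_TEST;
    }
    /* The 16-bit forms are signed compares only. */
    if (!is_unsigned && c2 >= INT16_MIN && c2 <= INT16_MAX) {
        return CMP_RI16;
    }
    if (ext_imm) {
        if (!is_64bit) {
            return CMP_RIL32;
        }
        /* A 64-bit compare needs the constant to fit the 32-bit
           immediate with the matching signedness.  */
        if (is_unsigned ? (c2 >= 0 && c2 <= (int64_t)UINT32_MAX)
                        : (c2 >= INT32_MIN && c2 <= INT32_MAX)) {
            return CMP_RIL32;
        }
    }
    return CMP_NO_IMMEDIATE;
}
```

For example, a signed 64-bit compare against 0x80000000 has no
immediate form here, while the unsigned compare does.
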
 tcg/s390/tcg-target.inc.c | 136 +++++++++++++++++++++++-----------------------
 1 file changed, 67 insertions(+), 69 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index b0b34fa5ab..e7ab8e4df3 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -39,9 +39,8 @@
 
 #define TCG_CT_CONST_S16   0x100
 #define TCG_CT_CONST_S32   0x200
-#define TCG_CT_CONST_U31   0x400
-#define TCG_CT_CONST_S33   0x800
-#define TCG_CT_CONST_ZERO  0x1000
+#define TCG_CT_CONST_S33   0x400
+#define TCG_CT_CONST_ZERO  0x800
 
 /* Several places within the instruction set 0 means "no register"
    rather than TCG_REG_R0.  */
@@ -75,6 +74,10 @@ typedef enum S390Opcode {
     RIL_CGFI    = 0xc20c,
     RIL_CLFI    = 0xc20f,
     RIL_CLGFI   = 0xc20e,
+    RIL_CLRL    = 0xc60f,
+    RIL_CLGRL   = 0xc60a,
+    RIL_CRL     = 0xc60d,
+    RIL_CGRL    = 0xc608,
     RIL_IIHF    = 0xc008,
     RIL_IILF    = 0xc009,
     RIL_LARL    = 0xc000,
@@ -97,6 +100,8 @@ typedef enum S390Opcode {
     RI_AGHI     = 0xa70b,
     RI_AHI      = 0xa70a,
     RI_BRC      = 0xa704,
+    RI_CHI      = 0xa70e,
+    RI_CGHI     = 0xa70f,
     RI_IIHH     = 0xa500,
     RI_IIHL     = 0xa501,
     RI_IILH     = 0xa502,
@@ -206,6 +211,8 @@ typedef enum S390Opcode {
     RXY_AG      = 0xe308,
     RXY_AY      = 0xe35a,
     RXY_CG      = 0xe320,
+    RXY_CLG     = 0xe321,
+    RXY_CLY     = 0xe355,
     RXY_CY      = 0xe359,
     RXY_LAY     = 0xe371,
     RXY_LB      = 0xe376,
@@ -423,20 +430,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'J':
         ct->ct |= TCG_CT_CONST_S32;
         break;
-    case 'C':
-        /* ??? We have no insight here into whether the comparison is
-           signed or unsigned.  The COMPARE IMMEDIATE insn uses a 32-bit
-           signed immediate, and the COMPARE LOGICAL IMMEDIATE insn uses
-           a 32-bit unsigned immediate.  If we were to use the (semi)
-           obvious "val == (int32_t)val" we would be enabling unsigned
-           comparisons vs very large numbers.  The only solution is to
-           take the intersection of the ranges.  */
-        /* ??? Another possible solution is to simply lie and allow all
-           constants here and force the out-of-range values into a temp
-           register in tgen_cmp when we have knowledge of the actual
-           comparison code in use.  */
-        ct->ct |= TCG_CT_CONST_U31;
-        break;
     case 'Z':
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -467,8 +460,6 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return val == (int32_t)val;
     } else if (ct & TCG_CT_CONST_S33) {
         return val >= -0xffffffffll && val <= 0xffffffffll;
-    } else if (ct & TCG_CT_CONST_U31) {
-        return val >= 0 && val <= 0x7fffffff;
     } else if (ct & TCG_CT_CONST_ZERO) {
         return val == 0;
     }
@@ -1092,6 +1083,8 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
                     TCGArg c2, bool c2const, bool need_carry)
 {
     bool is_unsigned = is_unsigned_cond(c);
+    S390Opcode op;
+
     if (c2const) {
         if (c2 == 0) {
             if (!(is_unsigned && need_carry)) {
@@ -1102,44 +1095,67 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
                 }
                 return tcg_cond_to_ltr_cond[c];
             }
-            /* If we only got here because of load-and-test,
-               and we couldn't use that, then we need to load
-               the constant into a register.  */
-            if (!(s390_facilities & FACILITY_EXT_IMM)) {
-                c2 = TCG_TMP0;
-                tcg_out_movi(s, type, c2, 0);
-                goto do_reg;
-            }
         }
-        if (is_unsigned) {
-            if (type == TCG_TYPE_I32) {
-                tcg_out_insn(s, RIL, CLFI, r1, c2);
-            } else {
-                tcg_out_insn(s, RIL, CLGFI, r1, c2);
-            }
-        } else {
+
+        if (!is_unsigned && c2 == (int16_t)c2) {
+            op = (type == TCG_TYPE_I32 ? RI_CHI : RI_CGHI);
+            tcg_out_insn_RI(s, op, r1, c2);
+            goto exit;
+        }
+
+        if (s390_facilities & FACILITY_EXT_IMM) {
             if (type == TCG_TYPE_I32) {
-                tcg_out_insn(s, RIL, CFI, r1, c2);
-            } else {
-                tcg_out_insn(s, RIL, CGFI, r1, c2);
+                op = (is_unsigned ? RIL_CLFI : RIL_CFI);
+                tcg_out_insn_RIL(s, op, r1, c2);
+                goto exit;
+            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
+                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
+                tcg_out_insn_RIL(s, op, r1, c2);
+                goto exit;
             }
         }
-    } else {
-    do_reg:
-        if (is_unsigned) {
+
+        /* Use the constant pool, but not for small constants.  */
+        if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
+            c2 = TCG_TMP0;
+            /* fall through to reg-reg */
+        } else if (USE_REG_TB) {
             if (type == TCG_TYPE_I32) {
-                tcg_out_insn(s, RR, CLR, r1, c2);
+                op = (is_unsigned ? RXY_CLY : RXY_CY);
+                tcg_out_insn_RXY(s, op, r1, TCG_REG_TB, TCG_REG_NONE, 0);
+                new_pool_label(s, (uint32_t)c2, R_390_20, s->code_ptr - 2,
+                               4 - (intptr_t)s->code_gen_ptr);
             } else {
-                tcg_out_insn(s, RRE, CLGR, r1, c2);
+                op = (is_unsigned ? RXY_CLG : RXY_CG);
+                tcg_out_insn_RXY(s, op, r1, TCG_REG_TB, TCG_REG_NONE, 0);
+                new_pool_label(s, c2, R_390_20, s->code_ptr - 2,
+                               -(intptr_t)s->code_gen_ptr);
             }
+            goto exit;
         } else {
             if (type == TCG_TYPE_I32) {
-                tcg_out_insn(s, RR, CR, r1, c2);
+                op = (is_unsigned ? RIL_CLRL : RIL_CRL);
+                tcg_out_insn_RIL(s, op, r1, 0);
+                new_pool_label(s, (uint32_t)c2, R_390_PC32DBL,
+                               s->code_ptr - 2, 2 + 4);
             } else {
-                tcg_out_insn(s, RRE, CGR, r1, c2);
+                op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
+                tcg_out_insn_RIL(s, op, r1, 0);
+                new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
             }
+            goto exit;
         }
     }
+
+    if (type == TCG_TYPE_I32) {
+        op = (is_unsigned ? RR_CLR : RR_CR);
+        tcg_out_insn_RR(s, op, r1, c2);
+    } else {
+        op = (is_unsigned ? RRE_CLGR : RRE_CGR);
+        tcg_out_insn_RRE(s, op, r1, c2);
+    }
+
+ exit:
     return tcg_cond_to_s390_cond[c];
 }
 
@@ -2325,8 +2341,6 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
     static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
     static const TCGTargetOpDef r_ri = { .args_ct_str = { "r", "ri" } };
-    static const TCGTargetOpDef r_rC = { .args_ct_str = { "r", "rC" } };
-    static const TCGTargetOpDef r_rZ = { .args_ct_str = { "r", "rZ" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_0_ri = { .args_ct_str = { "r", "0", "ri" } };
     static const TCGTargetOpDef r_0_rI = { .args_ct_str = { "r", "0", "rI" } };
@@ -2401,10 +2415,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &r_r_ri;
 
     case INDEX_op_brcond_i32:
-        /* Without EXT_IMM, only the LOAD AND TEST insn is available.  */
-        return (s390_facilities & FACILITY_EXT_IMM ? &r_ri : &r_rZ);
     case INDEX_op_brcond_i64:
-        return (s390_facilities & FACILITY_EXT_IMM ? &r_rC : &r_rZ);
+        return &r_ri;
 
     case INDEX_op_bswap16_i32:
     case INDEX_op_bswap16_i64:
@@ -2430,6 +2442,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &r_r;
 
     case INDEX_op_clz_i64:
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
         return &r_r_ri;
 
     case INDEX_op_qemu_ld_i32:
@@ -2446,30 +2460,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
                 = { .args_ct_str = { "r", "rZ", "r" } };
             return &dep;
         }
-    case INDEX_op_setcond_i32:
-    case INDEX_op_setcond_i64:
-        {
-            /* Without EXT_IMM, only the LOAD AND TEST insn is available.  */
-            static const TCGTargetOpDef setc_z
-                = { .args_ct_str = { "r", "r", "rZ" } };
-            static const TCGTargetOpDef setc_c
-                = { .args_ct_str = { "r", "r", "rC" } };
-            return (s390_facilities & FACILITY_EXT_IMM ? &setc_c : &setc_z);
-        }
     case INDEX_op_movcond_i32:
     case INDEX_op_movcond_i64:
         {
-            /* Without EXT_IMM, only the LOAD AND TEST insn is available.  */
-            static const TCGTargetOpDef movc_z
-                = { .args_ct_str = { "r", "r", "rZ", "r", "0" } };
-            static const TCGTargetOpDef movc_c
-                = { .args_ct_str = { "r", "r", "rC", "r", "0" } };
+            static const TCGTargetOpDef movc
+                = { .args_ct_str = { "r", "r", "ri", "r", "0" } };
             static const TCGTargetOpDef movc_l
-                = { .args_ct_str = { "r", "r", "rC", "rI", "0" } };
-            return (s390_facilities & FACILITY_EXT_IMM
-                    ? (s390_facilities & FACILITY_LOAD_ON_COND2
-                       ? &movc_l : &movc_c)
-                    : &movc_z);
+                = { .args_ct_str = { "r", "r", "ri", "rI", "0" } };
+            return (s390_facilities & FACILITY_LOAD_ON_COND2 ? &movc_l : &movc);
         }
     case INDEX_op_div2_i32:
     case INDEX_op_div2_i64:
-- 
2.13.3
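
For readers following the tgen_cmp rework above, the order in which an encoding is chosen for a constant second operand can be sketched roughly as follows. This is a simplified model and not code from the patch: CmpForm, classify_cmp_const and type_bits are names made up for illustration. The load-and-test shortcut for zero and the small-constant movi path are left out, and each arm of the 32-bit range test is widened to int64_t explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the outcomes; these are not QEMU identifiers. */
typedef enum {
    CMP_IMM16,    /* 16-bit signed immediate: CHI / CGHI */
    CMP_IMM32,    /* 32-bit immediate: CFI / CLFI / CGFI / CLGFI */
    CMP_FALLBACK  /* temp register or constant pool entry */
} CmpForm;

static CmpForm classify_cmp_const(int64_t c2, int type_bits,
                                  bool is_unsigned, bool have_ext_imm)
{
    assert(type_bits == 32 || type_bits == 64);

    /* Signed compares against a small value need no extra facility. */
    if (!is_unsigned && c2 == (int16_t)c2) {
        return CMP_IMM16;
    }
    if (have_ext_imm) {
        if (type_bits == 32) {
            return CMP_IMM32;
        }
        /* 64-bit compare: the immediate is extended from 32 bits. */
        if (is_unsigned ? c2 == (int64_t)(uint32_t)c2
                        : c2 == (int64_t)(int32_t)c2) {
            return CMP_IMM32;
        }
    }
    return CMP_FALLBACK;
}
```

Without the extended-immediate facility, anything outside the 16-bit signed range ends up in the fallback class, which is where the constant pool comes in.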


* [Qemu-devel] [PATCH for-2.11 12/23] tcg/aarch64: Use constant pool for movi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (10 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 11/23] tcg/s390: Use constant pool for cmpi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 13/23] tcg/sparc: Introduce TCG_REG_TB Richard Henderson
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/aarch64/tcg-target.inc.c | 62 +++++++++++++++++++++++---------------------
 2 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 484cf6236c..e86c2684fb 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -123,5 +123,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif /* AARCH64_TCG_TARGET_H */
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c7c751bafc..c2f3812214 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -10,6 +10,7 @@
  * See the COPYING file in the top-level directory for details.
  */
 
+#include "tcg-pool.inc.c"
 #include "qemu/bitops.h"
 
 /* We're going to re-use TCGType in setting of the SF bit, which controls
@@ -587,9 +588,11 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
-    int i, wantinv, shift;
     tcg_target_long svalue = value;
     tcg_target_long ivalue = ~value;
+    tcg_target_long t0, t1, t2;
+    int s0, s1;
+    AArch64Insn opc;
 
     /* For 32-bit values, discard potential garbage in value.  For 64-bit
        values within [2**31, 2**32-1], we can create smaller sequences by
@@ -638,38 +641,29 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
         }
     }
 
-    /* Would it take fewer insns to begin with MOVN?  For the value and its
-       inverse, count the number of 16-bit lanes that are 0.  */
-    for (i = wantinv = 0; i < 64; i += 16) {
-        tcg_target_long mask = 0xffffull << i;
-        wantinv -= ((value & mask) == 0);
-        wantinv += ((ivalue & mask) == 0);
-    }
-
-    if (wantinv <= 0) {
-        /* Find the lowest lane that is not 0x0000.  */
-        shift = ctz64(value) & (63 & -16);
-        tcg_out_insn(s, 3405, MOVZ, type, rd, value >> shift, shift);
-        /* Clear out the lane that we just set.  */
-        value &= ~(0xffffUL << shift);
-        /* Iterate until all non-zero lanes have been processed.  */
-        while (value) {
-            shift = ctz64(value) & (63 & -16);
-            tcg_out_insn(s, 3405, MOVK, type, rd, value >> shift, shift);
-            value &= ~(0xffffUL << shift);
-        }
+    /* Would it take fewer insns to begin with MOVN?  */
+    if (ctpop64(value) >= 32) {
+        t0 = ivalue;
+        opc = I3405_MOVN;
     } else {
-        /* Like above, but with the inverted value and MOVN to start.  */
-        shift = ctz64(ivalue) & (63 & -16);
-        tcg_out_insn(s, 3405, MOVN, type, rd, ivalue >> shift, shift);
-        ivalue &= ~(0xffffUL << shift);
-        while (ivalue) {
-            shift = ctz64(ivalue) & (63 & -16);
-            /* Provide MOVK with the non-inverted value.  */
-            tcg_out_insn(s, 3405, MOVK, type, rd, ~(ivalue >> shift), shift);
-            ivalue &= ~(0xffffUL << shift);
+        t0 = value;
+        opc = I3405_MOVZ;
+    }
+    s0 = ctz64(t0) & (63 & -16);
+    t1 = t0 & ~(0xffffUL << s0);
+    s1 = ctz64(t1) & (63 & -16);
+    t2 = t1 & ~(0xffffUL << s1);
+    if (t2 == 0) {
+        tcg_out_insn_3405(s, opc, type, rd, t0 >> s0, s0);
+        if (t1 != 0) {
+            tcg_out_insn(s, 3405, MOVK, type, rd, value >> s1, s1);
         }
+        return;
     }
+
+    /* For more than 2 insns, dump it into the constant pool.  */
+    new_pool_label(s, value, R_AARCH64_CONDBR19, s->code_ptr, 0);
+    tcg_out_insn(s, 3305, LDR, 0, rd);
 }
 
 /* Define something more legible for general use.  */
@@ -2030,6 +2024,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_insn(s, 3207, RET, TCG_REG_LR);
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    int i;
+    for (i = 0; i < count; ++i) {
+        p[i] = NOP;
+    }
+}
+
 typedef struct {
     DebugFrameHeader h;
     uint8_t fde_def_cfa[4];
-- 
2.13.3
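
The new tail of tcg_out_movi above decides between a one- or two-insn MOVZ/MOVN(+MOVK) sequence and a constant pool load. A standalone sketch of just that decision is given below. The names movi_short_len, ctz64_sketch and ctpop64_sketch are invented for this example, the bit helpers are portable stand-ins for QEMU's ctz64 and ctpop64, and the earlier shortcuts in tcg_out_movi that lie outside this hunk are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's ctz64, which returns 64 for a zero input. */
static int ctz64_sketch(uint64_t v)
{
    int n = 0;
    if (v == 0) {
        return 64;
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Stand-in for QEMU's ctpop64: number of set bits. */
static int ctpop64_sketch(uint64_t v)
{
    int n = 0;
    for (; v != 0; v &= v - 1) {
        n++;
    }
    return n;
}

/* Returns 1 or 2 when MOVZ/MOVN plus at most one MOVK suffices,
   or 0 when the value would go to the constant pool. */
static int movi_short_len(uint64_t value)
{
    /* Mostly-ones values are cheaper to build from the inverse. */
    uint64_t t0 = ctpop64_sketch(value) >= 32 ? ~value : value;
    int s0 = ctz64_sketch(t0) & (63 & -16);       /* lowest non-zero lane */
    uint64_t t1 = t0 & ~(UINT64_C(0xffff) << s0); /* clear that lane */
    int s1 = ctz64_sketch(t1) & (63 & -16);       /* next non-zero lane */
    uint64_t t2 = t1 & ~(UINT64_C(0xffff) << s1);

    assert(s0 % 16 == 0 && s1 % 16 == 0);
    if (t2 != 0) {
        return 0;
    }
    return t1 != 0 ? 2 : 1;
}
```

Because ctz64 of zero is 64 and 64 & 48 is 0, the shift amounts stay in range even when a remaining value is already zero.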


* [Qemu-devel] [PATCH for-2.11 13/23] tcg/sparc: Introduce TCG_REG_TB
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (11 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 12/23] tcg/aarch64: Use constant pool for movi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 14/23] tcg/sparc: Use constant pool for movi Richard Henderson
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.inc.c | 170 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 140 insertions(+), 30 deletions(-)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index bb7f7e8906..7d73c25347 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -85,6 +85,9 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 # define TCG_GUEST_BASE_REG TCG_REG_I5
 #endif
 
+#define TCG_REG_TB  TCG_REG_I1
+#define USE_REG_TB  (sizeof(void *) > 4)
+
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_L0,
     TCG_REG_L1,
@@ -249,6 +252,8 @@ static const int tcg_target_call_oarg_regs[] = {
 
 #define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
 
+#define NOP        (SETHI | INSN_RD(TCG_REG_G0) | 0)
+
 #ifndef ASI_PRIMARY_LITTLE
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
@@ -423,10 +428,11 @@ static inline void tcg_out_movi_imm13(TCGContext *s, TCGReg ret, int32_t arg)
     tcg_out_arithi(s, ret, TCG_REG_G0, arg, ARITH_OR);
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long arg)
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long arg, bool in_prologue)
 {
     tcg_target_long hi, lo = (int32_t)arg;
+    tcg_target_long test, lsb;
 
     /* Make sure we test 32-bit constants for imm13 properly.  */
     if (type == TCG_TYPE_I32) {
@@ -455,6 +461,27 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         return;
     }
 
+    /* A 21-bit constant, shifted.  */
+    lsb = ctz64(arg);
+    test = (tcg_target_long)arg >> lsb;
+    if (check_fit_tl(test, 13)) {
+        tcg_out_movi_imm13(s, ret, test);
+        tcg_out_arithi(s, ret, ret, lsb, SHIFT_SLLX);
+        return;
+    } else if (lsb > 10 && test == extract64(test, 0, 21)) {
+        tcg_out_sethi(s, ret, test << 10);
+        tcg_out_arithi(s, ret, ret, lsb - 10, SHIFT_SLLX);
+        return;
+    }
+
+    if (USE_REG_TB && !in_prologue) {
+        intptr_t diff = arg - (uintptr_t)s->code_gen_ptr;
+        if (check_fit_ptr(diff, 13)) {
+            tcg_out_arithi(s, ret, TCG_REG_TB, diff, ARITH_ADD);
+            return;
+        }
+    }
+
     /* A 64-bit constant decomposed into 2 32-bit pieces.  */
     if (check_fit_i32(lo, 13)) {
         hi = (arg - lo) >> 32;
@@ -470,6 +497,12 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
+static inline void tcg_out_movi(TCGContext *s, TCGType type,
+                                TCGReg ret, tcg_target_long arg)
+{
+    tcg_out_movi_int(s, type, ret, arg, false);
+}
+
 static inline void tcg_out_ldst_rr(TCGContext *s, TCGReg data, TCGReg a1,
                                    TCGReg a2, int op)
 {
@@ -512,6 +545,11 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 
 static void tcg_out_ld_ptr(TCGContext *s, TCGReg ret, uintptr_t arg)
 {
+    intptr_t diff = arg - (uintptr_t)s->code_gen_ptr;
+    if (USE_REG_TB && check_fit_ptr(diff, 13)) {
+        tcg_out_ld(s, TCG_TYPE_PTR, ret, TCG_REG_TB, diff);
+        return;
+    }
     tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
     tcg_out_ld(s, TCG_TYPE_PTR, ret, ret, arg & 0x3ff);
 }
@@ -543,7 +581,7 @@ static void tcg_out_div32(TCGContext *s, TCGReg rd, TCGReg rs1,
 
 static inline void tcg_out_nop(TCGContext *s)
 {
-    tcg_out_sethi(s, TCG_REG_G0, 0);
+    tcg_out32(s, NOP);
 }
 
 static const uint8_t tcg_cond_to_bcond[] = {
@@ -812,7 +850,8 @@ static void tcg_out_addsub2_i64(TCGContext *s, TCGReg rl, TCGReg rh,
     tcg_out_mov(s, TCG_TYPE_I64, rl, tmp);
 }
 
-static void tcg_out_call_nodelay(TCGContext *s, tcg_insn_unit *dest)
+static void tcg_out_call_nodelay(TCGContext *s, tcg_insn_unit *dest,
+                                 bool in_prologue)
 {
     ptrdiff_t disp = tcg_pcrel_diff(s, dest);
 
@@ -820,14 +859,15 @@ static void tcg_out_call_nodelay(TCGContext *s, tcg_insn_unit *dest)
         tcg_out32(s, CALL | (uint32_t)disp >> 2);
     } else {
         uintptr_t desti = (uintptr_t)dest;
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, desti & ~0xfff);
+        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_REG_T1,
+                         desti & ~0xfff, in_prologue);
         tcg_out_arithi(s, TCG_REG_O7, TCG_REG_T1, desti & 0xfff, JMPL);
     }
 }
 
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
 {
-    tcg_out_call_nodelay(s, dest);
+    tcg_out_call_nodelay(s, dest, false);
     tcg_out_nop(s);
 }
 
@@ -915,7 +955,7 @@ static void build_trampolines(TCGContext *s)
         /* Set the env operand.  */
         tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O0, TCG_AREG0);
         /* Tail call.  */
-        tcg_out_call_nodelay(s, qemu_ld_helpers[i]);
+        tcg_out_call_nodelay(s, qemu_ld_helpers[i], true);
         tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
     }
 
@@ -964,7 +1004,7 @@ static void build_trampolines(TCGContext *s)
         /* Set the env operand.  */
         tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O0, TCG_AREG0);
         /* Tail call.  */
-        tcg_out_call_nodelay(s, qemu_st_helpers[i]);
+        tcg_out_call_nodelay(s, qemu_st_helpers[i], true);
         tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
     }
 }
@@ -992,11 +1032,17 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
     if (guest_base != 0) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
         tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
+    /* We choose TCG_REG_TB such that no move is required.  */
+    if (USE_REG_TB) {
+        QEMU_BUILD_BUG_ON(TCG_REG_TB != TCG_REG_I1);
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB);
+    }
+
     tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I1, 0, JMPL);
     /* delay slot */
     tcg_out_nop(s);
@@ -1156,7 +1202,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
         func = qemu_ld_trampoline[memop & (MO_BSWAP | MO_SSIZE)];
     }
     tcg_debug_assert(func != NULL);
-    tcg_out_call_nodelay(s, func);
+    tcg_out_call_nodelay(s, func, false);
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_I32, param, oi);
 
@@ -1235,7 +1281,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
 
     func = qemu_st_trampoline[memop & (MO_BSWAP | MO_SIZE)];
     tcg_debug_assert(func != NULL);
-    tcg_out_call_nodelay(s, func);
+    tcg_out_call_nodelay(s, func, false);
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_I32, param, oi);
 
@@ -1269,30 +1315,67 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (check_fit_ptr(a0, 13)) {
             tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
             tcg_out_movi_imm13(s, TCG_REG_O0, a0);
-        } else {
-            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, a0 & ~0x3ff);
-            tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
-            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0, a0 & 0x3ff, ARITH_OR);
+            break;
+        } else if (USE_REG_TB) {
+            intptr_t tb_diff = a0 - (uintptr_t)s->code_gen_ptr;
+            if (check_fit_ptr(tb_diff, 13)) {
+                tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
+                /* Note that TCG_REG_TB has been unwound to O1.  */
+                tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O1, tb_diff, ARITH_ADD);
+                break;
+            }
         }
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, a0 & ~0x3ff);
+        tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
+        tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0, a0 & 0x3ff, ARITH_OR);
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_insn_offset) {
             /* direct jump method */
-            s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
-            /* Make sure to preserve links during retranslation.  */
-            tcg_out32(s, CALL | (*s->code_ptr & ~INSN_OP(-1)));
+            if (USE_REG_TB) {
+                /* make sure the patch is 8-byte aligned.  */
+                if ((intptr_t)s->code_ptr & 4) {
+                    tcg_out_nop(s);
+                }
+                s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+                tcg_out_sethi(s, TCG_REG_T1, 0);
+                tcg_out_arithi(s, TCG_REG_T1, TCG_REG_T1, 0, ARITH_OR);
+                tcg_out_arith(s, TCG_REG_G0, TCG_REG_TB, TCG_REG_T1, JMPL);
+                tcg_out_arith(s, TCG_REG_TB, TCG_REG_TB, TCG_REG_T1, ARITH_ADD);
+            } else {
+                s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+                tcg_out32(s, CALL);
+                tcg_out_nop(s);
+            }
         } else {
             /* indirect jump method */
-            tcg_out_ld_ptr(s, TCG_REG_T1,
+            tcg_out_ld_ptr(s, TCG_REG_TB,
                            (uintptr_t)(s->tb_jmp_target_addr + a0));
-            tcg_out_arithi(s, TCG_REG_G0, TCG_REG_T1, 0, JMPL);
+            tcg_out_arithi(s, TCG_REG_G0, TCG_REG_TB, 0, JMPL);
+            tcg_out_nop(s);
+        }
+        s->tb_jmp_reset_offset[a0] = c = tcg_current_code_size(s);
+
+        /* For the unlinked path of goto_tb, we need to reset
+           TCG_REG_TB to the beginning of this TB.  */
+        if (USE_REG_TB) {
+            c = -c;
+            if (check_fit_i32(c, 13)) {
+                tcg_out_arithi(s, TCG_REG_TB, TCG_REG_TB, c, ARITH_ADD);
+            } else {
+                tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, c);
+                tcg_out_arith(s, TCG_REG_TB, TCG_REG_TB,
+                              TCG_REG_T1, ARITH_ADD);
+            }
         }
-        tcg_out_nop(s);
-        s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
     case INDEX_op_goto_ptr:
         tcg_out_arithi(s, TCG_REG_G0, a0, 0, JMPL);
-        tcg_out_nop(s);
+        if (USE_REG_TB) {
+            tcg_out_arith(s, TCG_REG_TB, a0, TCG_REG_G0, ARITH_OR);
+        } else {
+            tcg_out_nop(s);
+        }
         break;
     case INDEX_op_br:
         tcg_out_bpcc(s, COND_A, BPCC_PT, arg_label(a0));
@@ -1709,13 +1792,40 @@ void tcg_register_jit(void *buf, size_t buf_size)
 void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
                               uintptr_t addr)
 {
-    uint32_t *ptr = (uint32_t *)jmp_addr;
-    uintptr_t disp = addr - jmp_addr;
+    intptr_t tb_disp = addr - tc_ptr;
+    intptr_t br_disp = addr - jmp_addr;
+    tcg_insn_unit i1, i2;
+
+    /* We can reach the entire address space for ILP32.
+       For LP64, the code_gen_buffer can't be larger than 2GB.  */
+    tcg_debug_assert(tb_disp == (int32_t)tb_disp);
+    tcg_debug_assert(br_disp == (int32_t)br_disp);
+
+    if (!USE_REG_TB) {
+        atomic_set((uint32_t *)jmp_addr, deposit32(CALL, 0, 30, br_disp >> 2));
+        flush_icache_range(jmp_addr, jmp_addr + 4);
+        return;
+    }
 
-    /* We can reach the entire address space for 32-bit.  For 64-bit
-       the code_gen_buffer can't be larger than 2GB.  */
-    tcg_debug_assert(disp == (int32_t)disp);
+    /* Requiring tb_disp to fit in 13 bits uses only a fraction of the
+       branch's range, but the delay-slot add must still be able to form
+       the new value of TCG_REG_TB.  This case still occurs quite often.  */
+    if (check_fit_ptr(tb_disp, 13)) {
+        /* ba,pt %icc, addr */
+        i1 = (INSN_OP(0) | INSN_OP2(1) | INSN_COND(COND_A)
+              | BPCC_ICC | BPCC_PT | INSN_OFF19(br_disp));
+        i2 = (ARITH_ADD | INSN_RD(TCG_REG_TB) | INSN_RS1(TCG_REG_TB)
+              | INSN_IMM13(tb_disp));
+    } else if (tb_disp >= 0) {
+        i1 = SETHI | INSN_RD(TCG_REG_T1) | ((tb_disp & 0xfffffc00) >> 10);
+        i2 = (ARITH_OR | INSN_RD(TCG_REG_T1) | INSN_RS1(TCG_REG_T1)
+              | INSN_IMM13(tb_disp & 0x3ff));
+    } else {
+        i1 = SETHI | INSN_RD(TCG_REG_T1) | ((~tb_disp & 0xfffffc00) >> 10);
+        i2 = (ARITH_XOR | INSN_RD(TCG_REG_T1) | INSN_RS1(TCG_REG_T1)
+              | INSN_IMM13((tb_disp & 0x3ff) | -0x400));
+    }
 
-    atomic_set(ptr, deposit32(CALL, 0, 30, disp >> 2));
-    flush_icache_range(jmp_addr, jmp_addr + 4);
+    atomic_set((uint64_t *)jmp_addr, deposit64(i2, 32, 32, i1));
+    flush_icache_range(jmp_addr, jmp_addr + 8);
 }
-- 
2.13.3
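
The SETHI+OR and SETHI+XOR pairs that tb_target_set_jmp_target patches in above both rebuild a 32-bit displacement in a 64-bit register. A small model of the arithmetic is sketched below. sethi_result, disp_via_or and disp_via_xor are names invented here, and the model relies only on SETHI zero-extending its result and on the 13-bit immediate being sign-extended.

```c
#include <assert.h>
#include <stdint.h>

/* SETHI writes imm22 << 10 and clears the rest of the register. */
static uint64_t sethi_result(uint32_t imm22)
{
    return (uint64_t)(imm22 & 0x3fffff) << 10;
}

/* Non-negative displacement: SETHI, then OR in the low 10 bits. */
static int64_t disp_via_or(int32_t disp)
{
    assert(disp >= 0);
    uint64_t hi = sethi_result(((uint32_t)disp & 0xfffffc00u) >> 10);
    return (int64_t)(hi | ((uint32_t)disp & 0x3ff));
}

/* Negative displacement: SETHI of the inverted high bits, then XOR
   with a negative simm13.  Its sign extension has bits 10..63 set,
   which flips the inverted bits back and sets the upper 32 bits. */
static int64_t disp_via_xor(int32_t disp)
{
    assert(disp < 0);
    uint64_t hi = sethi_result((~(uint32_t)disp & 0xfffffc00u) >> 10);
    int64_t simm13 = (int64_t)((disp & 0x3ff) | -0x400);
    return (int64_t)(hi ^ (uint64_t)simm13);
}
```

The XOR form is what makes a negative value reachable in two insns: SETHI alone can only produce a zero-extended result.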


* [Qemu-devel] [PATCH for-2.11 14/23] tcg/sparc: Use constant pool for movi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (12 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 13/23] tcg/sparc: Introduce TCG_REG_TB Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 15/23] tcg/arm: Improve tlb load for armv7 Richard Henderson
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.h     |  2 ++
 tcg/sparc/tcg-target.inc.c | 77 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 3ac0bd33d3..83f9397e04 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -173,4 +173,6 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
+#define TCG_TARGET_NEED_POOL_LABELS
+
 #endif
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 7d73c25347..bd7c1461c6 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -22,6 +22,8 @@
  * THE SOFTWARE.
  */
 
+#include "tcg-pool.inc.c"
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%g0",
@@ -292,33 +294,46 @@ static inline int check_fit_i32(int32_t val, unsigned int bits)
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    uint32_t insn;
+    uint32_t insn = *code_ptr;
+    intptr_t pcrel;
 
-    tcg_debug_assert(addend == 0);
-    value = tcg_ptr_byte_diff((tcg_insn_unit *)value, code_ptr);
+    value += addend;
+    pcrel = tcg_ptr_byte_diff((tcg_insn_unit *)value, code_ptr);
 
     switch (type) {
     case R_SPARC_WDISP16:
-        if (!check_fit_ptr(value >> 2, 16)) {
-            tcg_abort();
-        }
-        insn = *code_ptr;
+        assert(check_fit_ptr(pcrel >> 2, 16));
         insn &= ~INSN_OFF16(-1);
-        insn |= INSN_OFF16(value);
-        *code_ptr = insn;
+        insn |= INSN_OFF16(pcrel);
         break;
     case R_SPARC_WDISP19:
-        if (!check_fit_ptr(value >> 2, 19)) {
-            tcg_abort();
-        }
-        insn = *code_ptr;
+        assert(check_fit_ptr(pcrel >> 2, 19));
         insn &= ~INSN_OFF19(-1);
-        insn |= INSN_OFF19(value);
-        *code_ptr = insn;
+        insn |= INSN_OFF19(pcrel);
+        break;
+    case R_SPARC_13:
+        /* Note that we're abusing this reloc type for our own needs.  */
+        if (!check_fit_ptr(value, 13)) {
+            int adj = (value > 0 ? 0xff8 : -0x1000);
+            value -= adj;
+            assert(check_fit_ptr(value, 13));
+            *code_ptr++ = (ARITH_ADD | INSN_RD(TCG_REG_T2)
+                           | INSN_RS1(TCG_REG_TB) | INSN_IMM13(adj));
+            insn ^= INSN_RS1(TCG_REG_TB) ^ INSN_RS1(TCG_REG_T2);
+        }
+        insn &= ~INSN_IMM13(-1);
+        insn |= INSN_IMM13(value);
         break;
+    case R_SPARC_32:
+        /* Note that we're abusing this reloc type for our own needs.  */
+        code_ptr[0] = deposit32(code_ptr[0], 0, 22, value >> 10);
+        code_ptr[1] = deposit32(code_ptr[1], 0, 10, value);
+        return;
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
+
+    *code_ptr = insn;
 }
 
 /* parse target specific constraints */
@@ -474,12 +489,24 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
         return;
     }
 
-    if (USE_REG_TB && !in_prologue) {
-        intptr_t diff = arg - (uintptr_t)s->code_gen_ptr;
-        if (check_fit_ptr(diff, 13)) {
-            tcg_out_arithi(s, ret, TCG_REG_TB, diff, ARITH_ADD);
-            return;
+    if (!in_prologue) {
+        if (USE_REG_TB) {
+            intptr_t diff = arg - (uintptr_t)s->code_gen_ptr;
+            if (check_fit_ptr(diff, 13)) {
+                tcg_out_arithi(s, ret, TCG_REG_TB, diff, ARITH_ADD);
+            } else {
+                new_pool_label(s, arg, R_SPARC_13, s->code_ptr,
+                               -(intptr_t)s->code_gen_ptr);
+                tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(TCG_REG_TB));
+                /* May be used to extend the 13-bit range in patch_reloc.  */
+                tcg_out32(s, NOP);
+            }
+        } else {
+            new_pool_label(s, arg, R_SPARC_32, s->code_ptr, 0);
+            tcg_out_sethi(s, ret, 0);
+            tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(ret) | INSN_IMM13(0));
         }
+        return;
     }
 
     /* A 64-bit constant decomposed into 2 32-bit pieces.  */
@@ -1058,6 +1085,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 #endif
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    int i;
+    for (i = 0; i < count; ++i) {
+        p[i] = NOP;
+    }
+}
+
 #if defined(CONFIG_SOFTMMU)
 /* Perform the TLB load and compare.
 
-- 
2.13.3
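
The R_SPARC_13 case in patch_reloc above extends the reach of a TB-relative pool load by turning the reserved NOP slot into an ADD. The split of the offset can be sketched on its own as below. pool_adj, pool_rem and pool_reachable are hypothetical helper names, and only the arithmetic is modelled, not the insn rewriting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a signed 13-bit immediate. */
static bool fits_simm13(intptr_t v)
{
    return v >= -4096 && v <= 4095;
}

/* Amount pre-added to the TB register in the extra ADD; 0 if unused. */
static int pool_adj(intptr_t value)
{
    if (fits_simm13(value)) {
        return 0;
    }
    return value > 0 ? 0xff8 : -0x1000;
}

/* True if the offset is reachable with at most the one extra ADD. */
static bool pool_reachable(intptr_t value)
{
    return fits_simm13(value - pool_adj(value));
}

/* Immediate left for the load itself; mirrors the patch's assert. */
static intptr_t pool_rem(intptr_t value)
{
    intptr_t rem = value - pool_adj(value);
    assert(fits_simm13(rem));
    return rem;
}
```

Under this model the reachable offsets run from -8192 to 8183 bytes relative to the start of the TB.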


* [Qemu-devel] [PATCH for-2.11 15/23] tcg/arm: Improve tlb load for armv7
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (13 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 14/23] tcg/sparc: Use constant pool for movi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 16/23] tcg/arm: Tighten tlb indexing offset test Richard Henderson
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Use UBFX to avoid the limitation on CPU_TLB_BITS.  Since we're dropping
the initial shift, we need to replace the page masking.  We can use
MOVW+BIC to do this without shifting.  The result is the same size
as the armv6 path with one fewer conditional instruction.
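
As a rough illustration of the two steps described above, the following sketch models the bit manipulation in plain C. The function names are invented for the example, and the page size, TLB size and alignment are passed in as parameters rather than taken from any real configuration.

```c
#include <assert.h>
#include <stdint.h>

/* UBFX: extract `width` bits starting at bit `lsb`, zero-extended. */
static uint32_t ubfx32(uint32_t x, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
    return (x >> lsb) & mask;
}

/* Pre-v7 path: shift right by the page bits, then AND with size - 1. */
static uint32_t tlb_index_shift_and(uint32_t addr, unsigned page_bits,
                                    unsigned tlb_bits)
{
    assert(page_bits < 32 && tlb_bits < 32);
    return (addr >> page_bits) & ((UINT32_C(1) << tlb_bits) - 1);
}

/* BIC with ~(page_mask | align_mask): the page number and the low
   alignment bits are kept, so any set alignment bit survives into
   the comparison with the TLB entry. */
static uint32_t tlb_compare_value(uint32_t addr, unsigned page_bits,
                                  unsigned a_bits)
{
    assert(page_bits < 32 && a_bits < 32);
    uint32_t page_mask = ~((UINT32_C(1) << page_bits) - 1);
    uint32_t align_mask = (UINT32_C(1) << a_bits) - 1;
    uint32_t bic_mask = ~(page_mask | align_mask);
    return addr & ~bic_mask;
}
```

With 12 page bits and 8 TLB index bits (values picked only for the example), ubfx32 and the shift-and-mask form give the same index, and the UBFX form does not need the mask encoded as an immediate.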

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 72 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 81ea900852..66c369c239 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1173,18 +1173,33 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     unsigned s_bits = opc & MO_SIZE;
     unsigned a_bits = get_alignment_bits(opc);
 
-    /* Should generate something like the following:
-     *   shr    tmp, addrlo, #TARGET_PAGE_BITS                    (1)
+    /* V7 generates the following:
+     *   ubfx   r0, addrlo, #TARGET_PAGE_BITS, #CPU_TLB_BITS
      *   add    r2, env, #high
-     *   and    r0, tmp, #(CPU_TLB_SIZE - 1)                      (2)
-     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS               (3)
-     *   ldr    r0, [r2, #cmp]                                    (4)
+     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    r0, [r2, #cmp]
+     *   ldr    r2, [r2, #add]
+     *   movw   tmp, #page_align_mask
+     *   bic    tmp, addrlo, tmp
+     *   cmp    r0, tmp
+     *
+     * Otherwise we generate:
+     *   shr    tmp, addrlo, #TARGET_PAGE_BITS
+     *   add    r2, env, #high
+     *   and    r0, tmp, #(CPU_TLB_SIZE - 1)
+     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS
+     *   ldr    r0, [r2, #cmp]
+     *   ldr    r2, [r2, #add]
      *   tst    addrlo, #s_mask
-     *   ldr    r2, [r2, #add]                                    (5)
      *   cmpeq  r0, tmp, lsl #TARGET_PAGE_BITS
      */
-    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
-                    0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
+    if (use_armv7_instructions) {
+        tcg_out_extract(s, COND_AL, TCG_REG_R0, addrlo,
+                        TARGET_PAGE_BITS, CPU_TLB_BITS);
+    } else {
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
+                        0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
+    }
 
     /* We checked that the offset is contained within 16 bits above.  */
     if (add_off > 0xfff || (use_armv6_instructions && cmp_off > 0xff)) {
@@ -1194,9 +1209,10 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         add_off -= cmp_off & 0xff00;
         cmp_off &= 0xff;
     }
-
-    tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                    TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
+    if (!use_armv7_instructions) {
+        tcg_out_dat_imm(s, COND_AL, ARITH_AND,
+                        TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
+    }
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
@@ -1212,24 +1228,40 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         }
     }
 
+    /* Load the tlb addend.  */
+    tcg_out_ld32_12(s, COND_AL, TCG_REG_R2, TCG_REG_R2, add_off);
+
     /* Check alignment.  We don't support inline unaligned acceses,
        but we can easily support overalignment checks.  */
     if (a_bits < s_bits) {
         a_bits = s_bits;
     }
-    if (a_bits) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo, (1 << a_bits) - 1);
-    }
 
-    /* Load the tlb addend.  */
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R2, TCG_REG_R2, add_off);
+    if (use_armv7_instructions) {
+        tcg_target_ulong mask = ~(TARGET_PAGE_MASK | ((1 << a_bits) - 1));
+        int rot = encode_imm(mask);
 
-    tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP, 0,
-                    TCG_REG_R0, TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+        if (rot >= 0) {
+            tcg_out_dat_imm(s, COND_AL, ARITH_BIC, TCG_REG_TMP, addrlo,
+                            rotl(mask, rot) | (rot << 7));
+        } else {
+            tcg_out_movi32(s, COND_AL, TCG_REG_TMP, mask);
+            tcg_out_dat_reg(s, COND_AL, ARITH_BIC, TCG_REG_TMP,
+                            addrlo, TCG_REG_TMP, 0);
+        }
+        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R0, TCG_REG_TMP, 0);
+    } else {
+        if (a_bits) {
+            tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo,
+                            (1 << a_bits) - 1);
+        }
+        tcg_out_dat_reg(s, (a_bits ? COND_EQ : COND_AL), ARITH_CMP,
+                        0, TCG_REG_R0, TCG_REG_TMP,
+                        SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+    }
 
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
-                        TCG_REG_R1, addrhi, SHIFT_IMM_LSL(0));
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, TCG_REG_R1, addrhi, 0);
     }
 
     return TCG_REG_R2;
-- 
2.13.3
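
For readers comparing the two instruction sequences in the comment above,
here is a minimal C model of the comparison each one performs.  It is only a
sketch: the SKETCH_* names and both functions are hypothetical, a 32-bit
guest address and a 4 KiB page are assumed, and the comparator loaded into
r0 is assumed to be page-aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration only; the real value depends on the guest. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint32_t)1 << SKETCH_PAGE_BITS) - 1))

/* Pre-v7 sequence: shr tmp ; tst addrlo, #amask ; cmpeq r0, tmp, lsl #bits */
static bool tlb_hit_pre_v7(uint32_t cmp, uint32_t addrlo, unsigned a_bits)
{
    uint32_t amask = ((uint32_t)1 << a_bits) - 1;
    uint32_t tmp = addrlo >> SKETCH_PAGE_BITS;

    if ((addrlo & amask) != 0) {
        return false;               /* tst leaves NE, so cmpeq is skipped */
    }
    return cmp == (tmp << SKETCH_PAGE_BITS);
}

/* v7 sequence: bic tmp, addrlo, #mask ; cmp r0, tmp */
static bool tlb_hit_v7(uint32_t cmp, uint32_t addrlo, unsigned a_bits)
{
    uint32_t amask = ((uint32_t)1 << a_bits) - 1;
    uint32_t mask = ~(SKETCH_PAGE_MASK | amask);   /* bits that bic clears */
    uint32_t tmp = addrlo & ~mask;                 /* page bits + low bits */

    return cmp == tmp;
}
```

Under those assumptions the two models agree: any misaligned low bit
survives the bic and makes the single cmp fail, which is what the separate
tst did before.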

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH for-2.11 16/23] tcg/arm: Tighten tlb indexing offset test
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (14 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 15/23] tcg/arm: Improve tlb load for armv7 Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 17/23] tcg/arm: Code rearrangement Richard Henderson
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

For 32-bit guests the comparator is loaded with ldr rather than ldrd,
so cmp_off need not be limited to the 8-bit ldrd offset in that case.
This eliminates one insn from the tlb load for some guests.
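
As a rough illustration, the test before and after the change can be
modelled as two predicates.  The function names and the explicit
parameters are hypothetical; in the patch these are a single inline
condition over use_armv6_instructions and TARGET_LONG_BITS.

```c
#include <stdbool.h>

/* Before: any armv6+ host limits cmp_off to the 8-bit ldrd offset. */
static bool needs_base_adjust_old(int add_off, int cmp_off, bool use_armv6)
{
    return add_off > 0xfff || (use_armv6 && cmp_off > 0xff);
}

/* After: the 8-bit limit applies only when the guest address is 64-bit. */
static bool needs_base_adjust_new(int add_off, int cmp_off, bool use_armv6,
                                  int target_long_bits)
{
    return add_off > 0xfff
        || (use_armv6 && target_long_bits == 64 && cmp_off > 0xff);
}
```

With offsets such as add_off = 0x20c and cmp_off = 0x200 on an armv6+
host, the old test asks for the extra add while the new one does not for
a 32-bit guest.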

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 66c369c239..6c12b169ce 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1202,7 +1202,9 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     /* We checked that the offset is contained within 16 bits above.  */
-    if (add_off > 0xfff || (use_armv6_instructions && cmp_off > 0xff)) {
+    if (add_off > 0xfff
+        || (use_armv6_instructions && TARGET_LONG_BITS == 64
+            && cmp_off > 0xff)) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
                         (24 << 7) | (cmp_off >> 8));
         base = TCG_REG_R2;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH for-2.11 17/23] tcg/arm: Code rearrangement
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (15 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 16/23] tcg/arm: Tighten tlb indexing offset test Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 18/23] tcg/arm: Extract INSN_NOP Richard Henderson
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Move the constants before all of the functions, and move the
tcg_out_<format> functions before all of the others.
No functional change.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 599 +++++++++++++++++++++++------------------------
 1 file changed, 299 insertions(+), 300 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6c12b169ce..f40e87066f 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -85,6 +85,97 @@ static const int tcg_target_call_oarg_regs[2] = {
 
 #define TCG_REG_TMP  TCG_REG_R12
 
+enum arm_cond_code_e {
+    COND_EQ = 0x0,
+    COND_NE = 0x1,
+    COND_CS = 0x2,	/* Unsigned greater or equal */
+    COND_CC = 0x3,	/* Unsigned less than */
+    COND_MI = 0x4,	/* Negative */
+    COND_PL = 0x5,	/* Zero or greater */
+    COND_VS = 0x6,	/* Overflow */
+    COND_VC = 0x7,	/* No overflow */
+    COND_HI = 0x8,	/* Unsigned greater than */
+    COND_LS = 0x9,	/* Unsigned less or equal */
+    COND_GE = 0xa,
+    COND_LT = 0xb,
+    COND_GT = 0xc,
+    COND_LE = 0xd,
+    COND_AL = 0xe,
+};
+
+#define TO_CPSR (1 << 20)
+
+#define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
+#define SHIFT_IMM_LSR(im)	(((im) << 7) | 0x20)
+#define SHIFT_IMM_ASR(im)	(((im) << 7) | 0x40)
+#define SHIFT_IMM_ROR(im)	(((im) << 7) | 0x60)
+#define SHIFT_REG_LSL(rs)	(((rs) << 8) | 0x10)
+#define SHIFT_REG_LSR(rs)	(((rs) << 8) | 0x30)
+#define SHIFT_REG_ASR(rs)	(((rs) << 8) | 0x50)
+#define SHIFT_REG_ROR(rs)	(((rs) << 8) | 0x70)
+
+typedef enum {
+    ARITH_AND = 0x0 << 21,
+    ARITH_EOR = 0x1 << 21,
+    ARITH_SUB = 0x2 << 21,
+    ARITH_RSB = 0x3 << 21,
+    ARITH_ADD = 0x4 << 21,
+    ARITH_ADC = 0x5 << 21,
+    ARITH_SBC = 0x6 << 21,
+    ARITH_RSC = 0x7 << 21,
+    ARITH_TST = 0x8 << 21 | TO_CPSR,
+    ARITH_CMP = 0xa << 21 | TO_CPSR,
+    ARITH_CMN = 0xb << 21 | TO_CPSR,
+    ARITH_ORR = 0xc << 21,
+    ARITH_MOV = 0xd << 21,
+    ARITH_BIC = 0xe << 21,
+    ARITH_MVN = 0xf << 21,
+
+    INSN_CLZ       = 0x016f0f10,
+    INSN_RBIT      = 0x06ff0f30,
+
+    INSN_LDR_IMM   = 0x04100000,
+    INSN_LDR_REG   = 0x06100000,
+    INSN_STR_IMM   = 0x04000000,
+    INSN_STR_REG   = 0x06000000,
+
+    INSN_LDRH_IMM  = 0x005000b0,
+    INSN_LDRH_REG  = 0x001000b0,
+    INSN_LDRSH_IMM = 0x005000f0,
+    INSN_LDRSH_REG = 0x001000f0,
+    INSN_STRH_IMM  = 0x004000b0,
+    INSN_STRH_REG  = 0x000000b0,
+
+    INSN_LDRB_IMM  = 0x04500000,
+    INSN_LDRB_REG  = 0x06500000,
+    INSN_LDRSB_IMM = 0x005000d0,
+    INSN_LDRSB_REG = 0x001000d0,
+    INSN_STRB_IMM  = 0x04400000,
+    INSN_STRB_REG  = 0x06400000,
+
+    INSN_LDRD_IMM  = 0x004000d0,
+    INSN_LDRD_REG  = 0x000000d0,
+    INSN_STRD_IMM  = 0x004000f0,
+    INSN_STRD_REG  = 0x000000f0,
+
+    INSN_DMB_ISH   = 0x5bf07ff5,
+    INSN_DMB_MCR   = 0xba0f07ee,
+} ARMInsn;
+
+static const uint8_t tcg_cond_to_arm_cond[] = {
+    [TCG_COND_EQ] = COND_EQ,
+    [TCG_COND_NE] = COND_NE,
+    [TCG_COND_LT] = COND_LT,
+    [TCG_COND_GE] = COND_GE,
+    [TCG_COND_LE] = COND_LE,
+    [TCG_COND_GT] = COND_GT,
+    /* unsigned */
+    [TCG_COND_LTU] = COND_CC,
+    [TCG_COND_GEU] = COND_CS,
+    [TCG_COND_LEU] = COND_LS,
+    [TCG_COND_GTU] = COND_HI,
+};
+
 static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
     ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
@@ -236,183 +327,257 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
     }
 }
 
-#define TO_CPSR (1 << 20)
+static inline void tcg_out_b(TCGContext *s, int cond, int32_t offset)
+{
+    tcg_out32(s, (cond << 28) | 0x0a000000 |
+                    (((offset - 8) >> 2) & 0x00ffffff));
+}
 
-typedef enum {
-    ARITH_AND = 0x0 << 21,
-    ARITH_EOR = 0x1 << 21,
-    ARITH_SUB = 0x2 << 21,
-    ARITH_RSB = 0x3 << 21,
-    ARITH_ADD = 0x4 << 21,
-    ARITH_ADC = 0x5 << 21,
-    ARITH_SBC = 0x6 << 21,
-    ARITH_RSC = 0x7 << 21,
-    ARITH_TST = 0x8 << 21 | TO_CPSR,
-    ARITH_CMP = 0xa << 21 | TO_CPSR,
-    ARITH_CMN = 0xb << 21 | TO_CPSR,
-    ARITH_ORR = 0xc << 21,
-    ARITH_MOV = 0xd << 21,
-    ARITH_BIC = 0xe << 21,
-    ARITH_MVN = 0xf << 21,
+static inline void tcg_out_b_noaddr(TCGContext *s, int cond)
+{
+    /* We pay attention here to not modify the branch target by masking
+       the corresponding bytes.  This ensure that caches and memory are
+       kept coherent during retranslation. */
+    tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0a));
+}
 
-    INSN_CLZ       = 0x016f0f10,
-    INSN_RBIT      = 0x06ff0f30,
+static inline void tcg_out_bl_noaddr(TCGContext *s, int cond)
+{
+    /* We pay attention here to not modify the branch target by masking
+       the corresponding bytes.  This ensure that caches and memory are
+       kept coherent during retranslation. */
+    tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0b));
+}
 
-    INSN_LDR_IMM   = 0x04100000,
-    INSN_LDR_REG   = 0x06100000,
-    INSN_STR_IMM   = 0x04000000,
-    INSN_STR_REG   = 0x06000000,
+static inline void tcg_out_bl(TCGContext *s, int cond, int32_t offset)
+{
+    tcg_out32(s, (cond << 28) | 0x0b000000 |
+                    (((offset - 8) >> 2) & 0x00ffffff));
+}
 
-    INSN_LDRH_IMM  = 0x005000b0,
-    INSN_LDRH_REG  = 0x001000b0,
-    INSN_LDRSH_IMM = 0x005000f0,
-    INSN_LDRSH_REG = 0x001000f0,
-    INSN_STRH_IMM  = 0x004000b0,
-    INSN_STRH_REG  = 0x000000b0,
+static inline void tcg_out_blx(TCGContext *s, int cond, int rn)
+{
+    tcg_out32(s, (cond << 28) | 0x012fff30 | rn);
+}
 
-    INSN_LDRB_IMM  = 0x04500000,
-    INSN_LDRB_REG  = 0x06500000,
-    INSN_LDRSB_IMM = 0x005000d0,
-    INSN_LDRSB_REG = 0x001000d0,
-    INSN_STRB_IMM  = 0x04400000,
-    INSN_STRB_REG  = 0x06400000,
+static inline void tcg_out_blx_imm(TCGContext *s, int32_t offset)
+{
+    tcg_out32(s, 0xfa000000 | ((offset & 2) << 23) |
+                (((offset - 8) >> 2) & 0x00ffffff));
+}
 
-    INSN_LDRD_IMM  = 0x004000d0,
-    INSN_LDRD_REG  = 0x000000d0,
-    INSN_STRD_IMM  = 0x004000f0,
-    INSN_STRD_REG  = 0x000000f0,
+static inline void tcg_out_dat_reg(TCGContext *s,
+                int cond, int opc, int rd, int rn, int rm, int shift)
+{
+    tcg_out32(s, (cond << 28) | (0 << 25) | opc |
+                    (rn << 16) | (rd << 12) | shift | rm);
+}
 
-    INSN_DMB_ISH   = 0x5bf07ff5,
-    INSN_DMB_MCR   = 0xba0f07ee,
+static inline void tcg_out_nop(TCGContext *s)
+{
+    if (use_armv7_instructions) {
+        /* Architected nop introduced in v6k.  */
+        /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
+           also Just So Happened to do nothing on pre-v6k so that we
+           don't need to conditionalize it?  */
+        tcg_out32(s, 0xe320f000);
+    } else {
+        /* Prior to that the assembler uses mov r0, r0.  */
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, 0, 0, 0, SHIFT_IMM_LSL(0));
+    }
+}
 
-} ARMInsn;
+static inline void tcg_out_mov_reg(TCGContext *s, int cond, int rd, int rm)
+{
+    /* Simple reg-reg move, optimising out the 'do nothing' case */
+    if (rd != rm) {
+        tcg_out_dat_reg(s, cond, ARITH_MOV, rd, 0, rm, SHIFT_IMM_LSL(0));
+    }
+}
 
-#define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
-#define SHIFT_IMM_LSR(im)	(((im) << 7) | 0x20)
-#define SHIFT_IMM_ASR(im)	(((im) << 7) | 0x40)
-#define SHIFT_IMM_ROR(im)	(((im) << 7) | 0x60)
-#define SHIFT_REG_LSL(rs)	(((rs) << 8) | 0x10)
-#define SHIFT_REG_LSR(rs)	(((rs) << 8) | 0x30)
-#define SHIFT_REG_ASR(rs)	(((rs) << 8) | 0x50)
-#define SHIFT_REG_ROR(rs)	(((rs) << 8) | 0x70)
+static inline void tcg_out_bx(TCGContext *s, int cond, TCGReg rn)
+{
+    /* Unless the C portion of QEMU is compiled as thumb, we don't
+       actually need true BX semantics; merely a branch to an address
+       held in a register.  */
+    if (use_armv5t_instructions) {
+        tcg_out32(s, (cond << 28) | 0x012fff10 | rn);
+    } else {
+        tcg_out_mov_reg(s, cond, TCG_REG_PC, rn);
+    }
+}
 
-enum arm_cond_code_e {
-    COND_EQ = 0x0,
-    COND_NE = 0x1,
-    COND_CS = 0x2,	/* Unsigned greater or equal */
-    COND_CC = 0x3,	/* Unsigned less than */
-    COND_MI = 0x4,	/* Negative */
-    COND_PL = 0x5,	/* Zero or greater */
-    COND_VS = 0x6,	/* Overflow */
-    COND_VC = 0x7,	/* No overflow */
-    COND_HI = 0x8,	/* Unsigned greater than */
-    COND_LS = 0x9,	/* Unsigned less or equal */
-    COND_GE = 0xa,
-    COND_LT = 0xb,
-    COND_GT = 0xc,
-    COND_LE = 0xd,
-    COND_AL = 0xe,
-};
+static inline void tcg_out_dat_imm(TCGContext *s,
+                int cond, int opc, int rd, int rn, int im)
+{
+    tcg_out32(s, (cond << 28) | (1 << 25) | opc |
+                    (rn << 16) | (rd << 12) | im);
+}
 
-static const uint8_t tcg_cond_to_arm_cond[] = {
-    [TCG_COND_EQ] = COND_EQ,
-    [TCG_COND_NE] = COND_NE,
-    [TCG_COND_LT] = COND_LT,
-    [TCG_COND_GE] = COND_GE,
-    [TCG_COND_LE] = COND_LE,
-    [TCG_COND_GT] = COND_GT,
-    /* unsigned */
-    [TCG_COND_LTU] = COND_CC,
-    [TCG_COND_GEU] = COND_CS,
-    [TCG_COND_LEU] = COND_LS,
-    [TCG_COND_GTU] = COND_HI,
-};
+/* Note that this routine is used for both LDR and LDRH formats, so we do
+   not wish to include an immediate shift at this point.  */
+static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                            TCGReg rn, TCGReg rm, bool u, bool p, bool w)
+{
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24)
+              | (w << 21) | (rn << 16) | (rt << 12) | rm);
+}
+
+static void tcg_out_memop_8(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                            TCGReg rn, int imm8, bool p, bool w)
+{
+    bool u = 1;
+    if (imm8 < 0) {
+        imm8 = -imm8;
+        u = 0;
+    }
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
+              (rn << 16) | (rt << 12) | ((imm8 & 0xf0) << 4) | (imm8 & 0xf));
+}
+
+static void tcg_out_memop_12(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                             TCGReg rn, int imm12, bool p, bool w)
+{
+    bool u = 1;
+    if (imm12 < 0) {
+        imm12 = -imm12;
+        u = 0;
+    }
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
+              (rn << 16) | (rt << 12) | imm12);
+}
+
+static inline void tcg_out_ld32_12(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm12)
+{
+    tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 0);
+}
+
+static inline void tcg_out_st32_12(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm12)
+{
+    tcg_out_memop_12(s, cond, INSN_STR_IMM, rt, rn, imm12, 1, 0);
+}
+
+static inline void tcg_out_ld32_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 0);
+}
+
+static inline void tcg_out_st32_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 0);
+}
+
+static inline void tcg_out_ldrd_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
+{
+    tcg_out_memop_8(s, cond, INSN_LDRD_IMM, rt, rn, imm8, 1, 0);
+}
+
+static inline void tcg_out_ldrd_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_LDRD_REG, rt, rn, rm, 1, 1, 0);
+}
+
+static inline void tcg_out_strd_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
+{
+    tcg_out_memop_8(s, cond, INSN_STRD_IMM, rt, rn, imm8, 1, 0);
+}
+
+static inline void tcg_out_strd_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_STRD_REG, rt, rn, rm, 1, 1, 0);
+}
+
+/* Register pre-increment with base writeback.  */
+static inline void tcg_out_ld32_rwb(TCGContext *s, int cond, TCGReg rt,
+                                    TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 1);
+}
+
+static inline void tcg_out_st32_rwb(TCGContext *s, int cond, TCGReg rt,
+                                    TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 1);
+}
+
+static inline void tcg_out_ld16u_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
+{
+    tcg_out_memop_8(s, cond, INSN_LDRH_IMM, rt, rn, imm8, 1, 0);
+}
 
-static inline void tcg_out_b(TCGContext *s, int cond, int32_t offset)
+static inline void tcg_out_st16_8(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm8)
 {
-    tcg_out32(s, (cond << 28) | 0x0a000000 |
-                    (((offset - 8) >> 2) & 0x00ffffff));
+    tcg_out_memop_8(s, cond, INSN_STRH_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_b_noaddr(TCGContext *s, int cond)
+static inline void tcg_out_ld16u_r(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, TCGReg rm)
 {
-    /* We pay attention here to not modify the branch target by masking
-       the corresponding bytes.  This ensure that caches and memory are
-       kept coherent during retranslation. */
-    tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0a));
+    tcg_out_memop_r(s, cond, INSN_LDRH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_bl_noaddr(TCGContext *s, int cond)
+static inline void tcg_out_st16_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    /* We pay attention here to not modify the branch target by masking
-       the corresponding bytes.  This ensure that caches and memory are
-       kept coherent during retranslation. */
-    tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0b));
+    tcg_out_memop_r(s, cond, INSN_STRH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_bl(TCGContext *s, int cond, int32_t offset)
+static inline void tcg_out_ld16s_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
 {
-    tcg_out32(s, (cond << 28) | 0x0b000000 |
-                    (((offset - 8) >> 2) & 0x00ffffff));
+    tcg_out_memop_8(s, cond, INSN_LDRSH_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_blx(TCGContext *s, int cond, int rn)
+static inline void tcg_out_ld16s_r(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x012fff30 | rn);
+    tcg_out_memop_r(s, cond, INSN_LDRSH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_blx_imm(TCGContext *s, int32_t offset)
+static inline void tcg_out_ld8_12(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm12)
 {
-    tcg_out32(s, 0xfa000000 | ((offset & 2) << 23) |
-                (((offset - 8) >> 2) & 0x00ffffff));
+    tcg_out_memop_12(s, cond, INSN_LDRB_IMM, rt, rn, imm12, 1, 0);
 }
 
-static inline void tcg_out_dat_reg(TCGContext *s,
-                int cond, int opc, int rd, int rn, int rm, int shift)
+static inline void tcg_out_st8_12(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm12)
 {
-    tcg_out32(s, (cond << 28) | (0 << 25) | opc |
-                    (rn << 16) | (rd << 12) | shift | rm);
+    tcg_out_memop_12(s, cond, INSN_STRB_IMM, rt, rn, imm12, 1, 0);
 }
 
-static inline void tcg_out_nop(TCGContext *s)
+static inline void tcg_out_ld8_r(TCGContext *s, int cond, TCGReg rt,
+                                 TCGReg rn, TCGReg rm)
 {
-    if (use_armv7_instructions) {
-        /* Architected nop introduced in v6k.  */
-        /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
-           also Just So Happened to do nothing on pre-v6k so that we
-           don't need to conditionalize it?  */
-        tcg_out32(s, 0xe320f000);
-    } else {
-        /* Prior to that the assembler uses mov r0, r0.  */
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, 0, 0, 0, SHIFT_IMM_LSL(0));
-    }
+    tcg_out_memop_r(s, cond, INSN_LDRB_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_mov_reg(TCGContext *s, int cond, int rd, int rm)
+static inline void tcg_out_st8_r(TCGContext *s, int cond, TCGReg rt,
+                                 TCGReg rn, TCGReg rm)
 {
-    /* Simple reg-reg move, optimising out the 'do nothing' case */
-    if (rd != rm) {
-        tcg_out_dat_reg(s, cond, ARITH_MOV, rd, 0, rm, SHIFT_IMM_LSL(0));
-    }
+    tcg_out_memop_r(s, cond, INSN_STRB_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_bx(TCGContext *s, int cond, TCGReg rn)
+static inline void tcg_out_ld8s_8(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm8)
 {
-    /* Unless the C portion of QEMU is compiled as thumb, we don't
-       actually need true BX semantics; merely a branch to an address
-       held in a register.  */
-    if (use_armv5t_instructions) {
-        tcg_out32(s, (cond << 28) | 0x012fff10 | rn);
-    } else {
-        tcg_out_mov_reg(s, cond, TCG_REG_PC, rn);
-    }
+    tcg_out_memop_8(s, cond, INSN_LDRSB_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_dat_imm(TCGContext *s,
-                int cond, int opc, int rd, int rn, int im)
+static inline void tcg_out_ld8s_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | (1 << 25) | opc |
-                    (rn << 16) | (rd << 12) | im);
+    tcg_out_memop_r(s, cond, INSN_LDRSB_REG, rt, rn, rm, 1, 1, 0);
 }
 
 static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
@@ -747,172 +912,6 @@ static inline void tcg_out_sextract(TCGContext *s, int cond, TCGReg rd,
               | (ofs << 7) | ((len - 1) << 16));
 }
 
-/* Note that this routine is used for both LDR and LDRH formats, so we do
-   not wish to include an immediate shift at this point.  */
-static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
-                            TCGReg rn, TCGReg rm, bool u, bool p, bool w)
-{
-    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24)
-              | (w << 21) | (rn << 16) | (rt << 12) | rm);
-}
-
-static void tcg_out_memop_8(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
-                            TCGReg rn, int imm8, bool p, bool w)
-{
-    bool u = 1;
-    if (imm8 < 0) {
-        imm8 = -imm8;
-        u = 0;
-    }
-    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
-              (rn << 16) | (rt << 12) | ((imm8 & 0xf0) << 4) | (imm8 & 0xf));
-}
-
-static void tcg_out_memop_12(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
-                             TCGReg rn, int imm12, bool p, bool w)
-{
-    bool u = 1;
-    if (imm12 < 0) {
-        imm12 = -imm12;
-        u = 0;
-    }
-    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
-              (rn << 16) | (rt << 12) | imm12);
-}
-
-static inline void tcg_out_ld32_12(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm12)
-{
-    tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 0);
-}
-
-static inline void tcg_out_st32_12(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm12)
-{
-    tcg_out_memop_12(s, cond, INSN_STR_IMM, rt, rn, imm12, 1, 0);
-}
-
-static inline void tcg_out_ld32_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_st32_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_ldrd_8(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_LDRD_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_ldrd_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDRD_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_strd_8(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_STRD_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_strd_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_STRD_REG, rt, rn, rm, 1, 1, 0);
-}
-
-/* Register pre-increment with base writeback.  */
-static inline void tcg_out_ld32_rwb(TCGContext *s, int cond, TCGReg rt,
-                                    TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 1);
-}
-
-static inline void tcg_out_st32_rwb(TCGContext *s, int cond, TCGReg rt,
-                                    TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 1);
-}
-
-static inline void tcg_out_ld16u_8(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_LDRH_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_st16_8(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_STRH_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_ld16u_r(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDRH_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_st16_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_STRH_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_ld16s_8(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_LDRSH_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_ld16s_r(TCGContext *s, int cond, TCGReg rt,
-                                   TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDRSH_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_ld8_12(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, int imm12)
-{
-    tcg_out_memop_12(s, cond, INSN_LDRB_IMM, rt, rn, imm12, 1, 0);
-}
-
-static inline void tcg_out_st8_12(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, int imm12)
-{
-    tcg_out_memop_12(s, cond, INSN_STRB_IMM, rt, rn, imm12, 1, 0);
-}
-
-static inline void tcg_out_ld8_r(TCGContext *s, int cond, TCGReg rt,
-                                 TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDRB_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_st8_r(TCGContext *s, int cond, TCGReg rt,
-                                 TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_STRB_REG, rt, rn, rm, 1, 1, 0);
-}
-
-static inline void tcg_out_ld8s_8(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, int imm8)
-{
-    tcg_out_memop_8(s, cond, INSN_LDRSB_IMM, rt, rn, imm8, 1, 0);
-}
-
-static inline void tcg_out_ld8s_r(TCGContext *s, int cond, TCGReg rt,
-                                  TCGReg rn, TCGReg rm)
-{
-    tcg_out_memop_r(s, cond, INSN_LDRSB_REG, rt, rn, rm, 1, 1, 0);
-}
-
 static inline void tcg_out_ld32u(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH for-2.11 18/23] tcg/arm: Extract INSN_NOP
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (16 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 17/23] tcg/arm: Code rearrangement Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 19/23] tcg/arm: Use constant pool for movi Richard Henderson
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Extract the nop encodings into INSN_NOP.
We'll want this for tcg_out_nop_fill.
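
The definition of tcg_out_nop_fill is not part of this patch, so the
following is only a guess at how a fill routine might use a single nop
encoding.  All sketch_* and SKETCH_* names are hypothetical, and the
use_armv7 parameter stands in for use_armv7_instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NOP_V6K 0xe320f000u                    /* architected nop */
#define SKETCH_NOP_V4  ((0xeu << 28) | (0xdu << 21))  /* mov r0, r0 */

/* Pick the nop encoding the same way the INSN_NOP macro does. */
static uint32_t sketch_insn_nop(bool use_armv7)
{
    return use_armv7 ? SKETCH_NOP_V6K : SKETCH_NOP_V4;
}

/* Hypothetical fill helper: write `count` nops starting at `p`. */
static void sketch_nop_fill(uint32_t *p, int count, bool use_armv7)
{
    uint32_t nop = sketch_insn_nop(use_armv7);

    for (int i = 0; i < count; i++) {
        p[i] = nop;
    }
}
```

Having the encoding available as a plain value means a fill loop can store
it directly instead of calling an emitter once per instruction.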

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index f40e87066f..78603a19db 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -160,8 +160,18 @@ typedef enum {
 
     INSN_DMB_ISH   = 0x5bf07ff5,
     INSN_DMB_MCR   = 0xba0f07ee,
+
+    /* Architected nop introduced in v6k.  */
+    /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
+       also Just So Happened to do nothing on pre-v6k so that we
+       don't need to conditionalize it?  */
+    INSN_NOP_v6k   = 0xe320f000,
+    /* Otherwise the assembler uses mov r0,r0 */
+    INSN_NOP_v4    = (COND_AL << 28) | ARITH_MOV,
 } ARMInsn;
 
+#define INSN_NOP   (use_armv7_instructions ? INSN_NOP_v6k : INSN_NOP_v4)
+
 static const uint8_t tcg_cond_to_arm_cond[] = {
     [TCG_COND_EQ] = COND_EQ,
     [TCG_COND_NE] = COND_NE,
@@ -375,16 +385,7 @@ static inline void tcg_out_dat_reg(TCGContext *s,
 
 static inline void tcg_out_nop(TCGContext *s)
 {
-    if (use_armv7_instructions) {
-        /* Architected nop introduced in v6k.  */
-        /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
-           also Just So Happened to do nothing on pre-v6k so that we
-           don't need to conditionalize it?  */
-        tcg_out32(s, 0xe320f000);
-    } else {
-        /* Prior to that the assembler uses mov r0, r0.  */
-        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, 0, 0, 0, SHIFT_IMM_LSL(0));
-    }
+    tcg_out32(s, INSN_NOP);
 }
 
 static inline void tcg_out_mov_reg(TCGContext *s, int cond, int rd, int rm)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH for-2.11 19/23] tcg/arm: Use constant pool for movi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (17 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 18/23] tcg/arm: Extract INSN_NOP Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 20/23] tcg/arm: Use constant pool for call Richard Henderson
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
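The R_ARM_PC13 case in patch_reloc below chooses between a lone ldr and an add+ldr pair depending on how far away the pool entry is.  A minimal sketch of just that decision, separated from the instruction encoding; Pc13Split and split_pc13 are hypothetical names, and diff is assumed to be already relative to the ldr's pc (insn address + 8):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool needs_add;   /* the nop slot must become "add rt, pc, #high" */
    bool up;          /* U bit of the ldr: add (true) or subtract */
    uint32_t high;    /* 8-bit immediate for the add, in units of 4096 */
    uint32_t low;     /* 12-bit offset for the ldr */
} Pc13Split;

/* Returns false if the displacement fits neither form.  */
static bool split_pc13(intptr_t diff, Pc13Split *out)
{
    if (diff >= -0xfff && diff <= 0xfff) {
        /* Near: a single ldr with a 12-bit magnitude and a sign bit.  */
        out->needs_add = false;
        out->up = diff >= 0;
        out->high = 0;
        out->low = (uint32_t)(diff >= 0 ? diff : -diff);
        return true;
    }
    if (diff >= 0x1000 && diff < 0x100000) {
        /* Far, forward only: the add supplies bits [19:12], the ldr
           supplies bits [11:0].  */
        out->needs_add = true;
        out->up = true;
        out->high = (uint32_t)(diff >> 12);
        out->low = (uint32_t)(diff & 0xfff);
        return true;
    }
    return false;
}
```

For example, a displacement of 0x12345 splits into high = 0x12 and low = 0x345, while -8 stays a single ldr with the U bit clear.
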
 tcg/arm/tcg-target.h     |  1 +
 tcg/arm/tcg-target.inc.c | 92 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 55de35a691..0f71a85a45 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -141,5 +141,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 78603a19db..2736022d5a 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -23,6 +23,7 @@
  */
 
 #include "elf.h"
+#include "tcg-pool.inc.c"
 
 int arm_arch = __ARM_ARCH;
 
@@ -203,9 +204,39 @@ static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit *tar
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    tcg_debug_assert(type == R_ARM_PC24);
     tcg_debug_assert(addend == 0);
-    reloc_pc24(code_ptr, (tcg_insn_unit *)value);
+
+    if (type == R_ARM_PC24) {
+        reloc_pc24(code_ptr, (tcg_insn_unit *)value);
+    } else if (type == R_ARM_PC13) {
+        intptr_t diff = value - (uintptr_t)(code_ptr + 2);
+        tcg_insn_unit insn = *code_ptr;
+        bool u;
+
+        if (diff >= -0xfff && diff <= 0xfff) {
+            u = (diff >= 0);
+            if (!u) {
+                diff = -diff;
+            }
+        } else {
+            int rd = extract32(insn, 12, 4);
+            int rt = rd == TCG_REG_PC ? TCG_REG_TMP : rd;
+            assert(diff >= 0x1000 && diff < 0x100000);
+            /* add rt, pc, #high */
+            *code_ptr++ = ((insn & 0xf0000000) | (1 << 25) | ARITH_ADD
+                           | (TCG_REG_PC << 16) | (rt << 12)
+                           | (20 << 7) | (diff >> 12));
+            /* ldr rd, [rt, #low] */
+            insn = deposit32(insn, 12, 4, rt);
+            diff &= 0xfff;
+            u = 1;
+        }
+        insn = deposit32(insn, 23, 1, u);
+        insn = deposit32(insn, 0, 12, diff);
+        *code_ptr = insn;
+    } else {
+        g_assert_not_reached();
+    }
 }
 
 #define TCG_CT_CONST_ARM  0x100
@@ -581,9 +612,20 @@ static inline void tcg_out_ld8s_r(TCGContext *s, int cond, TCGReg rt,
     tcg_out_memop_r(s, cond, INSN_LDRSB_REG, rt, rn, rm, 1, 1, 0);
 }
 
+static void tcg_out_movi_pool(TCGContext *s, int cond, int rd, uint32_t arg)
+{
+    /* The 12-bit range on the ldr insn is sometimes a bit too small.
+       In order to get around that we require two insns, one of which
+       will usually be a nop, but may be replaced in patch_reloc.  */
+    new_pool_label(s, arg, R_ARM_PC13, s->code_ptr, 0);
+    tcg_out_ld32_12(s, cond, rd, TCG_REG_PC, 0);
+    tcg_out_nop(s);
+}
+
 static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
 {
-    int rot, opc, rn, diff;
+    int rot, diff, opc, sh1, sh2;
+    uint32_t tt0, tt1, tt2;
 
     /* Check a single MOV/MVN before anything else.  */
     rot = encode_imm(arg);
@@ -631,24 +673,30 @@ static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
         return;
     }
 
-    /* TODO: This is very suboptimal, we can easily have a constant
-       pool somewhere after all the instructions.  */
+    /* Look for sequences of two insns.  If we have lots of 1's, we can
+       shorten the sequence by beginning with mvn and then clearing
+       higher bits with eor.  */
+    tt0 = arg;
     opc = ARITH_MOV;
-    rn = 0;
-    /* If we have lots of leading 1's, we can shorten the sequence by
-       beginning with mvn and then clearing higher bits with eor.  */
-    if (clz32(~arg) > clz32(arg)) {
-        opc = ARITH_MVN, arg = ~arg;
+    if (ctpop32(arg) > 16) {
+        tt0 = ~arg;
+        opc = ARITH_MVN;
+    }
+    sh1 = ctz32(tt0) & ~1;
+    tt1 = tt0 & ~(0xff << sh1);
+    sh2 = ctz32(tt1) & ~1;
+    tt2 = tt1 & ~(0xff << sh2);
+    if (tt2 == 0) {
+        rot = ((32 - sh1) << 7) & 0xf00;
+        tcg_out_dat_imm(s, cond, opc, rd,  0, ((tt0 >> sh1) & 0xff) | rot);
+        rot = ((32 - sh2) << 7) & 0xf00;
+        tcg_out_dat_imm(s, cond, ARITH_EOR, rd, rd,
+                        ((tt0 >> sh2) & 0xff) | rot);
+        return;
     }
-    do {
-        int i = ctz32(arg) & ~1;
-        rot = ((32 - i) << 7) & 0xf00;
-        tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
-        arg &= ~(0xff << i);
 
-        opc = ARITH_EOR;
-        rn = rd;
-    } while (arg);
+    /* Otherwise, drop it into the constant pool.  */
+    tcg_out_movi_pool(s, cond, rd, arg);
 }
 
 static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
@@ -2164,6 +2212,14 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
     tcg_out_movi32(s, COND_AL, ret, arg);
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    int i;
+    for (i = 0; i < count; ++i) {
+        p[i] = INSN_NOP;
+    }
+}
+
 /* Compute frame size via macros, to share between tcg_target_qemu_prologue
    and tcg_register_jit.  */
 
-- 
2.13.3
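
The two-insn test in tcg_out_movi32 above can be tried out on its own.  The sketch below follows the same steps (pick mov or mvn by population count, peel off two even-aligned 8-bit windows, see if anything is left), but it is not the patch's code: the sketch_* helpers are portable stand-ins for ctz32/ctpop32, two_insn_split is a hypothetical name, and values that a single mov/mvn could already handle are simply rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count trailing zeros; the caller guarantees v != 0.  */
static int sketch_ctz32(uint32_t v)
{
    int n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Count set bits.  */
static int sketch_ctpop32(uint32_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Returns true if arg can be built as mov/mvn #imm1 ; eor #imm2.
   imm1 and imm2 are returned as full 32-bit masks, not as rotated
   ARM immediates.  */
static bool two_insn_split(uint32_t arg, bool *use_mvn,
                           uint32_t *imm1, uint32_t *imm2)
{
    uint32_t tt0 = arg, tt1, tt2;
    int sh1, sh2;

    *use_mvn = sketch_ctpop32(arg) > 16;
    if (*use_mvn) {
        tt0 = ~arg;
    }
    if (tt0 == 0) {
        return false;           /* single mov/mvn case, not handled here */
    }
    sh1 = sketch_ctz32(tt0) & ~1;
    tt1 = tt0 & ~(0xffu << sh1);
    if (tt1 == 0) {
        return false;           /* single mov/mvn case, not handled here */
    }
    sh2 = sketch_ctz32(tt1) & ~1;
    tt2 = tt1 & ~(0xffu << sh2);
    if (tt2 != 0) {
        return false;           /* needs three or more insns: use the pool */
    }
    *imm1 = tt0 & (0xffu << sh1);
    *imm2 = tt0 & (0xffu << sh2);
    return true;
}
```

For the mov form the result is imm1 ^ imm2; for the mvn form it is ~imm1 ^ imm2, which is why eor works as the second insn in both cases.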


* [Qemu-devel] [PATCH for-2.11 20/23] tcg/arm: Use constant pool for call
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (18 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 19/23] tcg/arm: Use constant pool for movi Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 21/23] tcg/ppc: Change TCG_REG_RA to TCG_REG_TB Richard Henderson
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 2736022d5a..db46aea38c 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1054,10 +1054,7 @@ static void tcg_out_goto(TCGContext *s, int cond, tcg_insn_unit *addr)
         tcg_out_b(s, cond, disp);
         return;
     }
-
-    assert(use_armv5t_instructions || (addri & 1) == 0);
-    tcg_out_movi32(s, cond, TCG_REG_TMP, addri);
-    tcg_out_bx(s, cond, TCG_REG_TMP);
+    tcg_out_movi_pool(s, cond, TCG_REG_PC, addri);
 }
 
 /* The call case is mostly used for helpers - so it's not unreasonable
@@ -1081,9 +1078,9 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *addr)
         tcg_out_movi32(s, COND_AL, TCG_REG_TMP, addri);
         tcg_out_blx(s, COND_AL, TCG_REG_TMP);
     } else {
+        /* ??? Relies on movi_pool emitting exactly 2 insns.  */
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
-        tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-        tcg_out32(s, addri);
+        tcg_out_movi_pool(s, COND_AL, TCG_REG_PC, addri);
     }
 }
 
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 21/23] tcg/ppc: Change TCG_REG_RA to TCG_REG_TB
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (19 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 20/23] tcg/arm: Use constant pool for call Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants Richard Henderson
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 23/23] tcg/ppc: Use constant pool for movi Richard Henderson
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

At this point the conversion is a wash: loading of TB+ofs is smaller,
but the actual return address from exit_tb is larger, and a few more
insns are required to transition between TBs.

The expectation, however, is that accesses to the constant pool will
on the whole be smaller.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
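The goto_tb slot patched in tb_target_set_jmp_target below is written with a single 64-bit store, which is why the insn pair has to be packed according to host byte order.  A minimal sketch of that packing in plain C; pack_insn_pair and host_is_big_endian are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time by looking at the first byte
   of a known 16-bit value.  */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Pack two insns so that, once stored as one 64-bit word, i1 lands
   at the lower address and i2 right after it.  */
static uint64_t pack_insn_pair(uint32_t i1, uint32_t i2, int big_endian)
{
    if (big_endian) {
        return (uint64_t)i1 << 32 | i2;
    }
    return (uint64_t)i2 << 32 | i1;
}
```

Storing the packed value and reading it back as two 32-bit words should give i1 followed by i2 on either kind of host.
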
 tcg/ppc/tcg-target.inc.c | 273 +++++++++++++++++++++--------------------------
 1 file changed, 122 insertions(+), 151 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index d772faf7be..bc14d2c9c6 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -39,29 +39,8 @@
 # define TCG_REG_TMP1   TCG_REG_R12
 #endif
 
-/* For the 64-bit target, we don't like the 5 insn sequence needed to build
-   full 64-bit addresses.  Better to have a base register to which we can
-   apply a 32-bit displacement.
-
-   There are generally three items of interest:
-   (1) helper functions in the main executable,
-   (2) TranslationBlock data structures,
-   (3) the return address in the epilogue.
-
-   For user-only, we USE_STATIC_CODE_GEN_BUFFER, so the code_gen_buffer
-   will be inside the main executable, and thus near enough to make a
-   pointer to the epilogue be within 2GB of all helper functions.
-
-   For softmmu, we'll let the kernel choose the address of code_gen_buffer,
-   and odds are it'll be somewhere close to the main malloc arena, and so
-   a pointer to the epilogue will be within 2GB of the TranslationBlocks.
-
-   For --enable-pie, everything will be kinda near everything else,
-   somewhere in high memory.
-
-   Thus we choose to keep the return address in a call-saved register.  */
-#define TCG_REG_RA     TCG_REG_R31
-#define USE_REG_RA     (TCG_TARGET_REG_BITS == 64)
+#define TCG_REG_TB     TCG_REG_R31
+#define USE_REG_TB     (TCG_TARGET_REG_BITS == 64)
 
 /* Shorthand for size of a pointer.  Avoid promotion to unsigned.  */
 #define SZP  ((int)sizeof(void *))
@@ -614,50 +593,68 @@ static inline void tcg_out_shri64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out_rld(s, RLDICL, dst, src, 64 - c, c);
 }
 
-static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long arg, bool in_prologue)
 {
-    if (arg == (int16_t) arg) {
+    intptr_t tb_diff;
+    int32_t high;
+
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
+
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        arg = (int32_t)arg;
+    }
+
+    /* Load 16-bit immediates with one insn.  */
+    if (arg == (int16_t)arg) {
         tcg_out32(s, ADDI | TAI(ret, 0, arg));
-    } else {
+        return;
+    }
+
+    /* Load addresses within the TB with one insn.  */
+    tb_diff = arg - (intptr_t)s->code_gen_ptr;
+    if (!in_prologue && USE_REG_TB && tb_diff == (int16_t)tb_diff) {
+        tcg_out32(s, ADDI | TAI(ret, TCG_REG_TB, tb_diff));
+        return;
+    }
+
+    /* Load 32-bit immediates with two insns.  */
+    if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
         tcg_out32(s, ADDIS | TAI(ret, 0, arg >> 16));
         if (arg & 0xffff) {
             tcg_out32(s, ORI | SAI(ret, ret, arg));
         }
+        return;
     }
-}
-
-static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
-                         tcg_target_long arg)
-{
-    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
-    if (type == TCG_TYPE_I32 || arg == (int32_t)arg) {
-        tcg_out_movi32(s, ret, arg);
-    } else if (arg == (uint32_t)arg && !(arg & 0x8000)) {
+    if (arg == (uint32_t)arg && !(arg & 0x8000)) {
         tcg_out32(s, ADDI | TAI(ret, 0, arg));
         tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-    } else {
-        int32_t high;
+        return;
+    }
 
-        if (USE_REG_RA) {
-            intptr_t diff = arg - (intptr_t)tb_ret_addr;
-            if (diff == (int32_t)diff) {
-                tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_RA, diff);
-                return;
-            }
-        }
+    /* Load addresses within 2GB of TB with 2 (or rarely 3) insns.  */
+    if (!in_prologue && USE_REG_TB && tb_diff == (int32_t)tb_diff) {
+        tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_TB, tb_diff);
+        return;
+    }
 
-        high = arg >> 31 >> 1;
-        tcg_out_movi32(s, ret, high);
-        if (high) {
-            tcg_out_shli64(s, ret, ret, 32);
-        }
-        if (arg & 0xffff0000) {
-            tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-        }
-        if (arg & 0xffff) {
-            tcg_out32(s, ORI | SAI(ret, ret, arg));
-        }
+    high = arg >> 31 >> 1;
+    tcg_out_movi(s, TCG_TYPE_I32, ret, high);
+    if (high) {
+        tcg_out_shli64(s, ret, ret, 32);
     }
+    if (arg & 0xffff0000) {
+        tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
+    }
+    if (arg & 0xffff) {
+        tcg_out32(s, ORI | SAI(ret, ret, arg));
+    }
+}
+
+static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
+                                tcg_target_long arg)
+{
+    tcg_out_movi_int(s, type, ret, arg, false);
 }
 
 static bool mask_operand(uint32_t c, int *mb, int *me)
@@ -1293,49 +1290,43 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out32(s, insn);
 }
 
-#ifdef __powerpc64__
 void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
                               uintptr_t addr)
 {
-    tcg_insn_unit i1, i2;
-    uint64_t pair;
-    intptr_t diff = addr - jmp_addr;
-
-    if (in_range_b(diff)) {
-        i1 = B | (diff & 0x3fffffc);
-        i2 = NOP;
-    } else if (USE_REG_RA) {
-        intptr_t lo, hi;
-        diff = addr - (uintptr_t)tb_ret_addr;
-        lo = (int16_t)diff;
-        hi = (int32_t)(diff - lo);
-        tcg_debug_assert(diff == hi + lo);
-        i1 = ADDIS | TAI(TCG_REG_TMP1, TCG_REG_RA, hi >> 16);
-        i2 = ADDI | TAI(TCG_REG_TMP1, TCG_REG_TMP1, lo);
-    } else {
-        tcg_debug_assert(TCG_TARGET_REG_BITS == 32 || addr == (int32_t)addr);
-        i1 = ADDIS | TAI(TCG_REG_TMP1, 0, addr >> 16);
-        i2 = ORI | SAI(TCG_REG_TMP1, TCG_REG_TMP1, addr);
-    }
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_insn_unit i1, i2;
+        intptr_t tb_diff = addr - tc_ptr;
+        intptr_t br_diff = addr - (jmp_addr + 4);
+        uint64_t pair;
+
+        /* This does not exercise the range of the branch, but we do
+           still need to be able to load the new value of TCG_REG_TB.
+           But this does still happen quite often.  */
+        if (tb_diff == (int16_t)tb_diff) {
+            i1 = ADDI | TAI(TCG_REG_TB, TCG_REG_TB, tb_diff);
+            i2 = B | (br_diff & 0x3fffffc);
+        } else {
+            intptr_t lo = (int16_t)tb_diff;
+            intptr_t hi = (int32_t)(tb_diff - lo);
+            assert(tb_diff == hi + lo);
+            i1 = ADDIS | TAI(TCG_REG_TB, TCG_REG_TB, hi >> 16);
+            i2 = ADDI | TAI(TCG_REG_TB, TCG_REG_TB, lo);
+        }
 #ifdef HOST_WORDS_BIGENDIAN
-    pair = (uint64_t)i1 << 32 | i2;
+        pair = (uint64_t)i1 << 32 | i2;
 #else
-    pair = (uint64_t)i2 << 32 | i1;
+        pair = (uint64_t)i2 << 32 | i1;
 #endif
 
-    atomic_set((uint64_t *)jmp_addr, pair);
-    flush_icache_range(jmp_addr, jmp_addr + 8);
-}
-#else
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
-{
-    intptr_t diff = addr - jmp_addr;
-    tcg_debug_assert(in_range_b(diff));
-    atomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
-    flush_icache_range(jmp_addr, jmp_addr + 4);
+        atomic_set((uint64_t *)jmp_addr, pair);
+        flush_icache_range(jmp_addr, jmp_addr + 8);
+    } else {
+        intptr_t diff = addr - jmp_addr;
+        tcg_debug_assert(in_range_b(diff));
+        atomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
+        flush_icache_range(jmp_addr, jmp_addr + 4);
+    }
 }
-#endif
 
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 {
@@ -1897,44 +1888,20 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
     if (guest_base) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
         tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
     tcg_out32(s, MTSPR | RS(tcg_target_call_iarg_regs[1]) | CTR);
-
-    if (USE_REG_RA) {
-#ifdef _CALL_AIX
-        /* Make the caller load the value as the TOC into R2.  */
-        tb_ret_addr = s->code_ptr + 2;
-        desc[1] = tb_ret_addr;
-        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_RA, TCG_REG_R2);
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-#elif defined(_CALL_ELF) && _CALL_ELF == 2
-        /* Compute from the incoming R12 value.  */
-        tb_ret_addr = s->code_ptr + 2;
-        tcg_out32(s, ADDI | TAI(TCG_REG_RA, TCG_REG_R12,
-                                tcg_ptr_byte_diff(tb_ret_addr, s->code_buf)));
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-#else
-        /* Reserve max 5 insns for the constant load.  */
-        tb_ret_addr = s->code_ptr + 6;
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RA, (intptr_t)tb_ret_addr);
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-        while (s->code_ptr < tb_ret_addr) {
-            tcg_out32(s, NOP);
-        }
-#endif
-    } else {
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-        tb_ret_addr = s->code_ptr;
+    if (USE_REG_TB) {
+        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
     }
+    tcg_out32(s, BCCTR | BO_ALWAYS);
 
     /* Epilogue */
-    tcg_debug_assert(tb_ret_addr == s->code_ptr);
-    s->code_gen_epilogue = tb_ret_addr;
+    s->code_gen_epilogue = tb_ret_addr = s->code_ptr;
 
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R0, TCG_REG_R1, FRAME_SIZE+LR_OFFSET);
     for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); ++i) {
@@ -1954,44 +1921,48 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        if (USE_REG_RA) {
-            ptrdiff_t disp = tcg_pcrel_diff(s, tb_ret_addr);
-
-            /* Use a direct branch if we can, otherwise use the value in RA.
-               Note that the direct branch is always backward, thus we need
-               to account for the possibility of 5 insns from the movi.  */
-            if (!in_range_b(disp - 20)) {
-                tcg_out32(s, MTSPR | RS(TCG_REG_RA) | CTR);
-                tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, args[0]);
-                tcg_out32(s, BCCTR | BO_ALWAYS);
-                break;
-            }
-        }
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, args[0]);
         tcg_out_b(s, 0, tb_ret_addr);
         break;
     case INDEX_op_goto_tb:
-        tcg_debug_assert(s->tb_jmp_insn_offset);
-        /* Direct jump. */
-#ifdef __powerpc64__
-        /* Ensure the next insns are 8-byte aligned. */
-        if ((uintptr_t)s->code_ptr & 7) {
-            tcg_out32(s, NOP);
-        }
-        s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
-        /* To be replaced by either a branch+nop or a load into TMP1.  */
-        s->code_ptr += 2;
-        tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
+        if (s->tb_jmp_insn_offset) {
+            /* Direct jump. */
+            if (TCG_TARGET_REG_BITS == 64) {
+                /* Ensure the next insns are 8-byte aligned. */
+                if ((uintptr_t)s->code_ptr & 7) {
+                    tcg_out32(s, NOP);
+                }
+                s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+                tcg_out32(s, ADDIS | TAI(TCG_REG_TB, TCG_REG_TB, 0));
+                tcg_out32(s, ADDI | TAI(TCG_REG_TB, TCG_REG_TB, 0));
+            } else {
+                s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+                tcg_out32(s, B);
+                s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+                break;
+            }
+        } else {
+            /* Indirect jump. */
+            tcg_debug_assert(s->tb_jmp_insn_offset == NULL);
+            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TB, 0,
+                       (intptr_t)(s->tb_jmp_target_addr + args[0]));
+        }
+        tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
         tcg_out32(s, BCCTR | BO_ALWAYS);
-#else
-        /* To be replaced by a branch.  */
-        s->code_ptr++;
-#endif
-        s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+        s->tb_jmp_reset_offset[args[0]] = c = tcg_current_code_size(s);
+        if (USE_REG_TB) {
+            /* For the unlinked case, need to reset TCG_REG_TB.  */
+            c = -c;
+            assert(c == (int16_t)c);
+            tcg_out32(s, ADDI | TAI(TCG_REG_TB, TCG_REG_TB, c));
+        }
         break;
     case INDEX_op_goto_ptr:
         tcg_out32(s, MTSPR | RS(args[0]) | CTR);
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, 0);
+        if (USE_REG_TB) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, args[0]);
+        }
+        tcg_out32(s, ADDI | TAI(TCG_REG_R3, 0, 0));
         tcg_out32(s, BCCTR | BO_ALWAYS);
         break;
     case INDEX_op_br:
@@ -2761,8 +2732,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* thread pointer */
 #endif
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1); /* mem temp */
-    if (USE_REG_RA) {
-        tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);  /* return addr */
+    if (USE_REG_TB) {
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB);  /* tb->tc_ptr */
     }
 }
 
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (20 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 21/23] tcg/ppc: Change TCG_REG_RA to TCG_REG_TB Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  2017-08-04 16:39   ` Philippe Mathieu-Daudé
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 23/23] tcg/ppc: Use constant pool for movi Richard Henderson
  22 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.inc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index bc14d2c9c6..4b32809217 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -598,6 +598,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 {
     intptr_t tb_diff;
     int32_t high;
+    int lsb;
 
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
@@ -638,6 +639,14 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
         return;
     }
 
+    lsb = ctz64(arg);
+    high = arg >> lsb;
+    if ((arg >> lsb) == (int16_t)high) {
+        tcg_out32(s, ADDI | TAI(ret, 0, high));
+        tcg_out_shli64(s, ret, ret, lsb);
+        return;
+    }
+
     high = arg >> 31 >> 1;
     tcg_out_movi(s, TCG_TYPE_I32, ret, high);
     if (high) {
-- 
2.13.3


* [Qemu-devel] [PATCH for-2.11 23/23] tcg/ppc: Use constant pool for movi
  2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
                   ` (21 preceding siblings ...)
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants Richard Henderson
@ 2017-08-04  5:44 ` Richard Henderson
  22 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04  5:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.h     |  1 +
 tcg/ppc/tcg-target.inc.c | 34 ++++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c1226ea5b6..e10d7e4411 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -130,5 +130,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 4b32809217..7598157e5f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -22,6 +22,9 @@
  * THE SOFTWARE.
  */
 
+#include "elf.h"
+#include "tcg-pool.inc.c"
+
 #if defined _CALL_DARWIN || defined __APPLE__
 #define TCG_TARGET_CALL_DARWIN
 #endif
@@ -58,8 +61,6 @@
 
 static tcg_insn_unit *tb_ret_addr;
 
-#include "elf.h"
-
 bool have_isa_2_06;
 bool have_isa_3_00;
 
@@ -224,9 +225,12 @@ static inline void tcg_out_bc_noaddr(TCGContext *s, int insn)
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    tcg_insn_unit *target = (tcg_insn_unit *)value;
+    tcg_insn_unit *target;
+    tcg_insn_unit old;
+
+    value += addend;
+    target = (tcg_insn_unit *)value;
 
-    tcg_debug_assert(addend == 0);
     switch (type) {
     case R_PPC_REL14:
         reloc_pc14(code_ptr, target);
@@ -234,6 +238,12 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
     case R_PPC_REL24:
         reloc_pc24(code_ptr, target);
         break;
+    case R_PPC_ADDR16:
+        assert(value == (int16_t)value);
+        old = *code_ptr;
+        old = deposit32(old, 0, 16, value);
+        *code_ptr = old;
+        break;
     default:
         tcg_abort();
     }
@@ -647,6 +657,14 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
         return;
     }
 
+    /* Use the constant pool, if possible.  */
+    if (!in_prologue && USE_REG_TB) {
+        new_pool_label(s, arg, R_PPC_ADDR16, s->code_ptr,
+                       -(intptr_t)s->code_gen_ptr);
+        tcg_out32(s, LD | TAI(ret, TCG_REG_TB, 0));
+        return;
+    }
+
     high = arg >> 31 >> 1;
     tcg_out_movi(s, TCG_TYPE_I32, ret, high);
     if (high) {
@@ -1829,6 +1847,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    int i;
+    for (i = 0; i < count; ++i) {
+        p[i] = NOP;
+    }
+}
+
 /* Parameters for function call generation, used in tcg.c.  */
 #define TCG_TARGET_STACK_ALIGN       16
 #define TCG_TARGET_EXTEND_ARGS       1
-- 
2.13.3


* Re: [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking Richard Henderson
@ 2017-08-04 10:33   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-08-04 10:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 04/08/2017 07:44, Richard Henderson wrote:
> Dispense with TCGBackendData, as it has never been used for more than
> holding a single pointer.  Use a define in the cpu/tcg-target.h to
> signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
> tcg-be-null.h stubs.  Rename tcg-be-ldst.h to tcg-ldst.inc.c.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

> ---
>  tcg/aarch64/tcg-target.h              |  4 ++++
>  tcg/arm/tcg-target.h                  |  4 ++++
>  tcg/i386/tcg-target.h                 |  4 ++++
>  tcg/ia64/tcg-target.h                 |  4 ++++
>  tcg/mips/tcg-target.h                 |  4 ++++
>  tcg/ppc/tcg-target.h                  |  4 ++++
>  tcg/s390/tcg-target.h                 |  4 ++++
>  tcg/tcg-be-null.h                     | 44 -----------------------------------
>  tcg/tcg.h                             |  6 +++--
>  tcg/aarch64/tcg-target.inc.c          |  3 ++-
>  tcg/arm/tcg-target.inc.c              |  3 ++-
>  tcg/i386/tcg-target.inc.c             |  4 ++--
>  tcg/ia64/tcg-target.inc.c             | 19 ++++-----------
>  tcg/mips/tcg-target.inc.c             |  4 ++--
>  tcg/ppc/tcg-target.inc.c              |  4 ++--
>  tcg/s390/tcg-target.inc.c             |  4 ++--
>  tcg/sparc/tcg-target.inc.c            |  2 --
>  tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} | 27 ++++-----------------
>  tcg/tcg.c                             | 17 +++++++-------
>  tcg/tci/tcg-target.inc.c              |  2 --
>  20 files changed, 61 insertions(+), 106 deletions(-)
>  delete mode 100644 tcg/tcg-be-null.h
>  rename tcg/{tcg-be-ldst.h => tcg-ldst.inc.c} (85%)
> 
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 3c3b1e603d..484cf6236c 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -120,4 +120,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
>  
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif /* AARCH64_TCG_TARGET_H */
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index b836f7f127..55de35a691 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -138,4 +138,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
>  /* not defined -- call should be eliminated at compile time */
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 2fd28fa6a5..11ee7fadd1 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -186,4 +186,8 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
>  
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 5c9ca8c1ce..83107e1407 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -199,4 +199,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
>  /* not defined -- call should be eliminated at compile time */
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 557c8ddc46..bea5290b9f 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -209,4 +209,8 @@ static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
>  
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 5bab3387e5..c1226ea5b6 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -127,4 +127,8 @@ extern bool have_isa_3_00;
>  void flush_icache_range(uintptr_t start, uintptr_t stop);
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 1398952d6b..8fea9646b4 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -153,4 +153,8 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
>      /* no need to flush icache explicitly */
>  }
>  
> +#ifdef CONFIG_SOFTMMU
> +#define TCG_TARGET_NEED_LDST_LABELS
> +#endif
> +
>  #endif
> diff --git a/tcg/tcg-be-null.h b/tcg/tcg-be-null.h
> deleted file mode 100644
> index 5222fe29e2..0000000000
> --- a/tcg/tcg-be-null.h
> +++ /dev/null
> @@ -1,44 +0,0 @@
> -/*
> - * TCG Backend Data: No backend data
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a copy
> - * of this software and associated documentation files (the "Software"), to deal
> - * in the Software without restriction, including without limitation the rights
> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - * copies of the Software, and to permit persons to whom the Software is
> - * furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - * THE SOFTWARE.
> - */
> -
> -typedef struct TCGBackendData {
> -    /* Empty */
> -    char dummy;
> -} TCGBackendData;
> -
> -
> -/*
> - * Initialize TB backend data at the beginning of the TB.
> - */
> -
> -static inline void tcg_out_tb_init(TCGContext *s)
> -{
> -}
> -
> -/*
> - * Generate TB finalization at the end of block
> - */
> -
> -static inline bool tcg_out_tb_finalize(TCGContext *s)
> -{
> -    return true;
> -}
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 46957d9bd7..b0e00e744e 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -712,8 +712,10 @@ struct TCGContext {
>      CPUState *cpu;                      /* *_trans */
>      TCGv_env tcg_env;                   /* *_exec  */
>  
> -    /* The TCGBackendData structure is private to tcg-target.inc.c.  */
> -    struct TCGBackendData *be;
> +    /* These structures are private to tcg-target.inc.c.  */
> +#ifdef TCG_TARGET_NEED_LDST_LABELS
> +    struct TCGLabelQemuLdst *ldst_labels;
> +#endif
>  
>      TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
>      TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index a1e5dd2f03..c7c751bafc 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -10,7 +10,6 @@
>   * See the COPYING file in the top-level directory for details.
>   */
>  
> -#include "tcg-be-ldst.h"
>  #include "qemu/bitops.h"
>  
>  /* We're going to re-use TCGType in setting of the SF bit, which controls
> @@ -1070,6 +1069,8 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
>  }
>  
>  #ifdef CONFIG_SOFTMMU
> +#include "tcg-ldst.inc.c"
> +
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     TCGMemOpIdx oi, uintptr_t ra)
>   */
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 37efcf06af..81ea900852 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -23,7 +23,6 @@
>   */
>  
>  #include "elf.h"
> -#include "tcg-be-ldst.h"
>  
>  int arm_arch = __ARM_ARCH;
>  
> @@ -1060,6 +1059,8 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
>  }
>  
>  #ifdef CONFIG_SOFTMMU
> +#include "tcg-ldst.inc.c"
> +
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
>   */
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index e4b120a40c..1a1ad96906 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -22,8 +22,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-ldst.h"
> -
>  #ifdef CONFIG_DEBUG_TCG
>  static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  #if TCG_TARGET_REG_BITS == 64
> @@ -1214,6 +1212,8 @@ static void tcg_out_nopn(TCGContext *s, int n)
>  }
>  
>  #if defined(CONFIG_SOFTMMU)
> +#include "tcg-ldst.inc.c"
> +
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
>   */
> diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
> index bf9a97d75c..3569f2b457 100644
> --- a/tcg/ia64/tcg-target.inc.c
> +++ b/tcg/ia64/tcg-target.inc.c
> @@ -1565,29 +1565,19 @@ typedef struct TCGLabelQemuLdst {
>      struct TCGLabelQemuLdst *next;
>  } TCGLabelQemuLdst;
>  
> -typedef struct TCGBackendData {
> -    TCGLabelQemuLdst *labels;
> -} TCGBackendData;
> -
> -static inline void tcg_out_tb_init(TCGContext *s)
> -{
> -    s->be->labels = NULL;
> -}
> -
>  static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOp opc,
>                                  tcg_insn_unit *label_ptr)
>  {
> -    TCGBackendData *be = s->be;
>      TCGLabelQemuLdst *l = tcg_malloc(sizeof(*l));
>  
>      l->is_ld = is_ld;
>      l->size = opc & MO_SIZE;
>      l->label_ptr = label_ptr;
> -    l->next = be->labels;
> -    be->labels = l;
> +    l->next = s->ldst_labels;
> +    s->ldst_labels = l;
>  }
>  
> -static bool tcg_out_tb_finalize(TCGContext *s)
> +static bool tcg_out_ldst_finalize(TCGContext *s)
>  {
>      static const void * const helpers[8] = {
>          helper_ret_stb_mmu,
> @@ -1602,7 +1592,7 @@ static bool tcg_out_tb_finalize(TCGContext *s)
>      tcg_insn_unit *thunks[8] = { };
>      TCGLabelQemuLdst *l;
>  
> -    for (l = s->be->labels; l != NULL; l = l->next) {
> +    for (l = s->ldst_labels; l != NULL; l = l->next) {
>          long x = l->is_ld * 4 + l->size;
>          tcg_insn_unit *dest = thunks[x];
>  
> @@ -1767,7 +1757,6 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args)
>  }
>  
>  #else /* !CONFIG_SOFTMMU */
> -# include "tcg-be-null.h"
>  
>  static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args)
>  {
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 04f8c839fe..750baadf37 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -24,8 +24,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-ldst.h"
> -
>  #ifdef HOST_WORDS_BIGENDIAN
>  # define MIPS_BE  1
>  #else
> @@ -1112,6 +1110,8 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
>  }
>  
>  #if defined(CONFIG_SOFTMMU)
> +#include "tcg-ldst.inc.c"
> +
>  static void * const qemu_ld_helpers[16] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_SB]   = helper_ret_ldsb_mmu,
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 018c240f6d..d772faf7be 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -22,8 +22,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-ldst.h"
> -
>  #if defined _CALL_DARWIN || defined __APPLE__
>  #define TCG_TARGET_CALL_DARWIN
>  #endif
> @@ -1418,6 +1416,8 @@ static const uint32_t qemu_exts_opc[4] = {
>  };
>  
>  #if defined (CONFIG_SOFTMMU)
> +#include "tcg-ldst.inc.c"
> +
>  /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
>   *                                 int mmu_idx, uintptr_t ra)
>   */
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index 38b9e791ee..ee0dff995a 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -24,8 +24,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-ldst.h"
> -
>  /* We only support generating code for 64-bit mode.  */
>  #if TCG_TARGET_REG_BITS != 64
>  #error "unsupported code generation mode"
> @@ -1458,6 +1456,8 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc, TCGReg data,
>  }
>  
>  #if defined(CONFIG_SOFTMMU)
> +#include "tcg-ldst.inc.c"
> +
>  /* We're expecting to use a 20-bit signed offset on the tlb memory ops.
>     Using the offset of the second entry in the last tlb table ensures
>     that we can index all of the elements of the first entry.  */
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 06cabbedf5..bb7f7e8906 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -22,8 +22,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-null.h"
> -
>  #ifdef CONFIG_DEBUG_TCG
>  static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>      "%g0",
> diff --git a/tcg/tcg-be-ldst.h b/tcg/tcg-ldst.inc.c
> similarity index 85%
> rename from tcg/tcg-be-ldst.h
> rename to tcg/tcg-ldst.inc.c
> index 17777aec5a..0e14cf4357 100644
> --- a/tcg/tcg-be-ldst.h
> +++ b/tcg/tcg-ldst.inc.c
> @@ -20,8 +20,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#ifdef CONFIG_SOFTMMU
> -
>  typedef struct TCGLabelQemuLdst {
>      bool is_ld;             /* qemu_ld: true, qemu_st: false */
>      TCGMemOpIdx oi;
> @@ -35,19 +33,6 @@ typedef struct TCGLabelQemuLdst {
>      struct TCGLabelQemuLdst *next;
>  } TCGLabelQemuLdst;
>  
> -typedef struct TCGBackendData {
> -    TCGLabelQemuLdst *labels;
> -} TCGBackendData;
> -
> -
> -/*
> - * Initialize TB backend data at the beginning of the TB.
> - */
> -
> -static inline void tcg_out_tb_init(TCGContext *s)
> -{
> -    s->be->labels = NULL;
> -}
>  
>  /*
>   * Generate TB finalization at the end of block
> @@ -56,12 +41,12 @@ static inline void tcg_out_tb_init(TCGContext *s)
>  static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l);
>  static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l);
>  
> -static bool tcg_out_tb_finalize(TCGContext *s)
> +static bool tcg_out_ldst_finalize(TCGContext *s)
>  {
>      TCGLabelQemuLdst *lb;
>  
>      /* qemu_ld/st slow paths */
> -    for (lb = s->be->labels; lb != NULL; lb = lb->next) {
> +    for (lb = s->ldst_labels; lb != NULL; lb = lb->next) {
>          if (lb->is_ld) {
>              tcg_out_qemu_ld_slow_path(s, lb);
>          } else {
> @@ -85,13 +70,9 @@ static bool tcg_out_tb_finalize(TCGContext *s)
>  
>  static inline TCGLabelQemuLdst *new_ldst_label(TCGContext *s)
>  {
> -    TCGBackendData *be = s->be;
>      TCGLabelQemuLdst *l = tcg_malloc(sizeof(*l));
>  
> -    l->next = be->labels;
> -    be->labels = l;
> +    l->next = s->ldst_labels;
> +    s->ldst_labels = l;
>      return l;
>  }
> -#else
> -#include "tcg-be-null.h"
> -#endif /* CONFIG_SOFTMMU */
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 35598296c5..dd74eabb0a 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -112,10 +112,9 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
>  static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
>  static int tcg_target_const_match(tcg_target_long val, TCGType type,
>                                    const TCGArgConstraint *arg_ct);
> -static void tcg_out_tb_init(TCGContext *s);
> -static bool tcg_out_tb_finalize(TCGContext *s);
> -
> -
> +#ifdef TCG_TARGET_NEED_LDST_LABELS
> +static bool tcg_out_ldst_finalize(TCGContext *s);
> +#endif
>  
>  static TCGRegSet tcg_target_available_regs[2];
>  static TCGRegSet tcg_target_call_clobber_regs;
> @@ -470,8 +469,6 @@ void tcg_func_start(TCGContext *s)
>      s->gen_op_buf[0].prev = 0;
>      s->gen_next_op_idx = 1;
>      s->gen_next_parm_idx = 0;
> -
> -    s->be = tcg_malloc(sizeof(TCGBackendData));
>  }
>  
>  static inline int temp_idx(TCGContext *s, TCGTemp *ts)
> @@ -2619,7 +2616,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      s->code_buf = tb->tc_ptr;
>      s->code_ptr = tb->tc_ptr;
>  
> -    tcg_out_tb_init(s);
> +#ifdef TCG_TARGET_NEED_LDST_LABELS
> +    s->ldst_labels = NULL;
> +#endif
>  
>      num_insns = -1;
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
> @@ -2694,9 +2693,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      s->gen_insn_end_off[num_insns] = tcg_current_code_size(s);
>  
>      /* Generate TB finalization at the end of block */
> -    if (!tcg_out_tb_finalize(s)) {
> +#ifdef TCG_TARGET_NEED_LDST_LABELS
> +    if (!tcg_out_ldst_finalize(s)) {
>          return -1;
>      }
> +#endif
>  
>      /* flush instruction cache */
>      flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index b6a15569f8..94461b2baf 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -22,8 +22,6 @@
>   * THE SOFTWARE.
>   */
>  
> -#include "tcg-be-null.h"
> -
>  /* TODO list:
>   * - See TODO comments in code.
>   */
> 
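
For anyone reading the quoted hunks without the tree at hand: the patch drops
the per-TB TCGBackendData allocation and keeps the list head directly in
TCGContext, resetting it at the start of tcg_gen_code and walking it in
tcg_out_ldst_finalize. Below is a minimal standalone sketch of that pattern,
not QEMU code. Ctx, LdstLabel, ctx_start_tb, push_ldst_label and the two
counters are hypothetical names, and plain malloc/free stands in for
tcg_malloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for TCGLabelQemuLdst: one pending slow path. */
typedef struct LdstLabel {
    bool is_ld;                 /* load: true, store: false */
    struct LdstLabel *next;
} LdstLabel;

/* Hypothetical stand-in for TCGContext: the list head lives here directly. */
typedef struct Ctx {
    LdstLabel *ldst_labels;
    int ld_slow_paths;          /* counters replace real code emission */
    int st_slow_paths;
} Ctx;

/* Reset per-TB state, as the patch does inline in tcg_gen_code. */
static void ctx_start_tb(Ctx *s)
{
    assert(s != NULL);
    s->ldst_labels = NULL;
    s->ld_slow_paths = 0;
    s->st_slow_paths = 0;
}

/* Prepend a label to the list; returns NULL if allocation fails. */
static LdstLabel *push_ldst_label(Ctx *s, bool is_ld)
{
    LdstLabel *l = malloc(sizeof(*l));

    if (l == NULL) {
        return NULL;
    }
    l->is_ld = is_ld;
    l->next = s->ldst_labels;
    s->ldst_labels = l;
    return l;
}

/* Walk the list once at the end of the TB, "emitting" each slow path. */
static bool ldst_finalize(Ctx *s)
{
    LdstLabel *l = s->ldst_labels;

    while (l != NULL) {
        LdstLabel *next = l->next;

        if (l->is_ld) {
            s->ld_slow_paths++;
        } else {
            s->st_slow_paths++;
        }
        free(l);
        l = next;
    }
    s->ldst_labels = NULL;
    return true;
}
```

Because labels are prepended, the walk visits them in reverse order of
creation. That is harmless here since each slow path is independent of the
others.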

* Re: [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants
  2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants Richard Henderson
@ 2017-08-04 16:39   ` Philippe Mathieu-Daudé
  2017-08-04 16:58     ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-08-04 16:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 08/04/2017 02:44 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>   tcg/ppc/tcg-target.inc.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index bc14d2c9c6..4b32809217 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -598,6 +598,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
>   {
>       intptr_t tb_diff;
>       int32_t high;
> +    int lsb;
>   
>       tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
>   
> @@ -638,6 +639,14 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
>           return;
>       }
>   
> +    lsb = ctz64(arg);
> +    high = arg >> lsb;
> +    if (arg == (int16_t)arg) {

Can you move these here?

+    lsb = ctz64(arg);
+    high = arg >> lsb;

> +        tcg_out32(s, ADDI | TAI(ret, 0, high));
> +        tcg_out_shli64(s, ret, ret, lsb);
> +        return;
> +    }
> +
>       high = arg >> 31 >> 1;
>       tcg_out_movi(s, TCG_TYPE_I32, ret, high);
>       if (high) {
> 

Regards,

Phil.

* Re: [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants
  2017-08-04 16:39   ` Philippe Mathieu-Daudé
@ 2017-08-04 16:58     ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2017-08-04 16:58 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel

On 08/04/2017 09:39 AM, Philippe Mathieu-Daudé wrote:
>> @@ -638,6 +639,14 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
>>           return;
>>       }
>>   
>> +    lsb = ctz64(arg);
>> +    high = arg >> lsb;
>> +    if (arg == (int16_t)arg) {
> 
> Can you move these here?
> 
> +    lsb = ctz64(arg);
> +    high = arg >> lsb;

No, because you've found a bug -- the if should be testing high, not arg.  ;-)

Thanks,


r~

Thread overview: 27+ messages
2017-08-04  5:44 [Qemu-devel] [PATCH for-2.11 00/23] tcg constant pools Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 01/23] tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 02/23] tcg: Rearrange ldst label tracking Richard Henderson
2017-08-04 10:33   ` Paolo Bonzini
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 03/23] tcg: Infrastructure for managing constant pools Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 04/23] tcg/i386: Store out-of-range call targets in constant pool Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 05/23] tcg/s390: Introduce TCG_REG_TB Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 06/23] tcg/s390: Fix sign of patch_reloc addend Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 07/23] tcg/s390: Use constant pool for movi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 08/23] tcg/s390: Use constant pool for andi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 09/23] tcg/s390: Use constant pool for ori Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 10/23] tcg/s390: Use constant pool for xori Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 11/23] tcg/s390: Use constant pool for cmpi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 12/23] tcg/aarch64: Use constant pool for movi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 13/23] tcg/sparc: Introduce TCG_REG_TB Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 14/23] tcg/sparc: Use constant pool for movi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 15/23] tcg/arm: Improve tlb load for armv7 Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 16/23] tcg/arm: Tighten tlb indexing offset test Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 17/23] tcg/arm: Code rearrangement Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 18/23] tcg/arm: Extract INSN_NOP Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 19/23] tcg/arm: Use constant pool for movi Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 20/23] tcg/arm: Use constant pool for call Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 21/23] tcg/ppc: Change TCG_REG_RA to TCG_REG_TB Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 22/23] tcg/ppc: Look for shifted constants Richard Henderson
2017-08-04 16:39   ` Philippe Mathieu-Daudé
2017-08-04 16:58     ` Richard Henderson
2017-08-04  5:44 ` [Qemu-devel] [PATCH for-2.11 23/23] tcg/ppc: Use constant pool for movi Richard Henderson
