* [Qemu-devel] [PULL 00/18] tcg queued patches
@ 2016-09-07 21:10 Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 01/18] tcg: Support arbitrary size + alignment Richard Henderson
                   ` (18 more replies)
  0 siblings, 19 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Three unrelated patches, plus Pranith's memory barrier patch set.

The alignment patch is in support of Sparc's ldf instructions:
8- and 16-byte memory operations that require only 4-byte alignment.
It's just as easy to support this kind of misalignment as any other.
As mentioned in the commit, we'd also forgotten to properly handle
arm32, mips, ia64 and sparc when it came to overalignment.
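
To make the ldf case concrete, here is a minimal C sketch of the
slow-path alignment test once size and alignment are decoupled.
nbits() mirrors the helper that patch 01 adds to tcg/tcg.h;
is_misaligned() is a hypothetical name used only for illustration
and does not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set (same shape as nbits() in patch 01). */
static inline unsigned nbits(unsigned bits)
{
    assert(bits < 32);
    return (1U << bits) - 1;
}

/* Hypothetical helper: true if addr violates a 2^a_bits byte alignment.
 * a_bits is independent of the access size, so it may be smaller
 * (ldf-style) or larger (overalignment) than log2 of the size. */
static inline bool is_misaligned(uint64_t addr, unsigned a_bits)
{
    return (addr & nbits(a_bits)) != 0;
}
```

For an 8-byte load that needs only 4-byte alignment (a_bits = 2),
address 0x1004 passes and 0x1002 is rejected; under natural 8-byte
alignment (a_bits = 3) the same 0x1004 would be rejected.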

I have a follow-up patch set to make use of this for target-sparc.

I've tweaked the memory barrier patch set.  For aarch64, ppc
and sparc, I've fixed the insn selection a bit.  I merged the
fence optimization pass into the existing optimization pass.


r~


Pranith Kumar (15):
  Introduce TCGOpcode for memory barrier
  tcg/i386: Add support for fence
  tcg/aarch64: Add support for fence
  tcg/arm: Add support for fence
  tcg/ia64: Add support for fence
  tcg/mips: Add support for fence
  tcg/ppc: Add support for fence
  tcg/s390: Add support for fence
  tcg/sparc: Add support for fence
  tcg/tci: Add support for fence
  target-arm: Generate fences in ARMv7 frontend
  target-alpha: Generate fence op
  target-aarch64: Generate fences for aarch64
  target-i386: Generate fences for x86
  tcg: Optimize fence instructions

Richard Henderson (3):
  tcg: Support arbitrary size + alignment
  tcg: Merge GETPC and GETRA
  cpu-exec: Check -dfilter for -d cpu

 cpu-exec.c                   |  3 +-
 cputlb.c                     |  6 ++--
 include/exec/exec-all.h      |  9 ++---
 softmmu_template.h           | 48 ++++++++------------------
 target-alpha/translate.c     |  4 +--
 target-arm/helper.c          |  6 ++--
 target-arm/translate-a64.c   | 14 +++++++-
 target-arm/translate.c       |  4 +--
 target-i386/translate.c      |  8 +++++
 target-mips/op_helper.c      | 18 +++++-----
 tcg/README                   | 17 ++++++++++
 tcg/aarch64/tcg-target.inc.c | 35 +++++++++++++++----
 tcg/arm/tcg-target.inc.c     | 37 ++++++++++++++++----
 tcg/i386/tcg-target.inc.c    | 33 +++++++++++++-----
 tcg/ia64/tcg-target.inc.c    | 27 +++++++++++----
 tcg/mips/tcg-target.inc.c    | 18 ++++++++--
 tcg/optimize.c               | 54 +++++++++++++++++++++++++++++
 tcg/ppc/tcg-target.inc.c     | 78 +++++++++++++++++++++++++++---------------
 tcg/s390/tcg-target.inc.c    | 24 ++++++++-----
 tcg/sparc/tcg-target.inc.c   | 30 ++++++++++++----
 tcg/tcg-op.c                 | 17 ++++++++++
 tcg/tcg-op.h                 |  2 ++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    | 81 +++++++++++++++++++++++++++-----------------
 tcg/tci/tcg-target.inc.c     |  3 ++
 tci.c                        |  4 +++
 translate-all.c              |  1 +
 user-exec.c                  |  7 ++--
 28 files changed, 421 insertions(+), 169 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PULL 01/18] tcg: Support arbitrary size + alignment
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 02/18] tcg: Merge GETPC and GETRA Richard Henderson
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.

In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
not being enforced.
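
As a rough illustration of the fast-path idea used by the i386,
aarch64, s390 and 64-bit ppc backends below, the following C sketch
computes the same offset and mask as the generated code.  It is not
the code the backends emit: PAGE_BITS = 12 and the function name
fast_path_hit() are assumptions made for the example, and the TLB
entry is reduced to a bare page address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12                         /* assumed page size: 4 KiB */

/* Mask with the low `bits` bits set (same shape as nbits() in this patch). */
static inline unsigned nbits(unsigned bits)
{
    assert(bits < 32);
    return (1U << bits) - 1;
}

#define PAGE_MASK (~(uint64_t)nbits(PAGE_BITS))

/* Hypothetical model of the inline TLB compare.
 * s_bits: log2 of the access size; a_bits: log2 of the required alignment.
 * Returns true only if the access is sufficiently aligned and lies
 * entirely within the page whose address is tlb_page. */
static bool fast_path_hit(uint64_t addr, uint64_t tlb_page,
                          unsigned s_bits, unsigned a_bits)
{
    /* For under-aligned accesses, probe the last aligned unit of the
     * access so that a page-crossing access lands on the next page. */
    uint64_t a_off = a_bits >= s_bits ? 0 : nbits(s_bits) - nbits(a_bits);
    /* Keep the alignment bits in the compare so misalignment mismatches. */
    uint64_t tlb_mask = PAGE_MASK | nbits(a_bits);

    return ((addr + a_off) & tlb_mask) == tlb_page;
}
```

The offset is a multiple of the alignment, so adding it leaves the low
a_bits bits of the address untouched; one compare therefore covers both
the alignment check and the page-crossing check.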

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 softmmu_template.h           | 16 ++++++------
 tcg/aarch64/tcg-target.inc.c | 11 ++++----
 tcg/arm/tcg-target.inc.c     | 19 +++++++++-----
 tcg/i386/tcg-target.inc.c    | 16 ++++++------
 tcg/ia64/tcg-target.inc.c    | 22 +++++++++++-----
 tcg/mips/tcg-target.inc.c    | 12 ++++++---
 tcg/ppc/tcg-target.inc.c     | 57 +++++++++++++++++++++-------------------
 tcg/s390/tcg-target.inc.c    | 13 +++-------
 tcg/sparc/tcg-target.inc.c   | 17 +++++++-----
 tcg/tcg.h                    | 62 +++++++++++++++++++++-----------------------
 10 files changed, 132 insertions(+), 113 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 284ab2c..7ea0a41 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -146,14 +146,14 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
     DATA_TYPE res;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
     }
@@ -220,14 +220,14 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
     DATA_TYPE res;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
     }
@@ -331,13 +331,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
     }
@@ -414,13 +414,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
     }
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 08b2d03..2f5629e 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1081,23 +1081,22 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
-    int a_bits = get_alignment_bits(opc);
+    unsigned a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
     TCGReg base = TCG_AREG0, x3;
     uint64_t tlb_mask;
 
     /* For aligned accesses, we check the first byte and include the alignment
        bits within the address.  For unaligned access, we check that we don't
        cross pages using the address of the last byte of the access.  */
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
+    if (a_bits >= s_bits) {
         x3 = addr_reg;
     } else {
         tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, addr_reg, (1 << (opc & MO_SIZE)) - 1);
-        tlb_mask = TARGET_PAGE_MASK;
+                     TCG_REG_X3, addr_reg, nbits(s_bits) - nbits(a_bits));
         x3 = TCG_REG_X3;
     }
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     /* Extract the TLB index from the address into X0.
        X0<CPU_TLB_BITS:0> =
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 172feba..58ffc0d 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1168,7 +1168,7 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
    containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
 
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               TCGMemOp s_bits, int mem_index, bool is_load)
+                               TCGMemOp opc, int mem_index, bool is_load)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1176,6 +1176,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
          ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
          : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
 
     /* Should generate something like the following:
      *   shr    tmp, addrlo, #TARGET_PAGE_BITS                    (1)
@@ -1216,10 +1218,13 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         }
     }
 
-    /* Check alignment.  */
-    if (s_bits) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_TST,
-                        0, addrlo, (1 << s_bits) - 1);
+    /* Check alignment.  We don't support inline unaligned accesses,
+       but we can easily support overalignment checks.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+    if (a_bits) {
+        tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo, nbits(a_bits));
     }
 
     /* Load the tlb addend.  */
@@ -1499,7 +1504,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 1);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1630,7 +1635,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 0);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 6f8cdca..1573e69 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1202,7 +1202,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     TCGType ttype = TCG_TYPE_I32;
     TCGType tlbtype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0, tlbrexw = 0;
-    int a_bits = get_alignment_bits(opc);
+    unsigned a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
     target_ulong tlb_mask;
 
     if (TCG_TARGET_REG_BITS == 64) {
@@ -1220,17 +1221,16 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     tcg_out_mov(s, tlbtype, r0, addrlo);
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
+    /* If the required alignment is at least as large as the access, simply
+       copy the address and mask.  For lesser alignments, check that we don't
+       cross pages for the complete access.  */
+    if (a_bits >= s_bits) {
         tcg_out_mov(s, ttype, r1, addrlo);
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
     } else {
-        /* For unaligned access check that we don't cross pages using
-           the page address of the last byte.  */
         tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo,
-                             (1 << (opc & MO_SIZE)) - 1);
-        tlb_mask = TARGET_PAGE_MASK;
+                             nbits(s_bits) - nbits(a_bits));
     }
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
                    TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index c91f392..7642390 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -1496,10 +1496,18 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
    R1, R3 are clobbered, leaving R56 free for...
    BSWAP_1, BSWAP_2 and I-slot insns for swapping data for store.  */
 static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
-                                    TCGMemOp s_bits, int off_rw, int off_add,
+                                    TCGMemOp opc, int off_rw, int off_add,
                                     uint64_t bswap1, uint64_t bswap2)
 {
-     /*
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
+
+    /* We don't support unaligned accesses, but overalignment is easy.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+
+    /*
         .mii
         mov	r2 = off_rw
         extr.u	r3 = addr_reg, ...		# extract tlb page
@@ -1521,7 +1529,7 @@ static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
         cmp.eq	p6, p7 = r3, r58
         nop
         ;;
-      */
+    */
     tcg_out_bundle(s, miI,
                    tcg_opc_movi_a(TCG_REG_P0, TCG_REG_R2, off_rw),
                    tcg_opc_i11(TCG_REG_P0, OPC_EXTR_U_I11, TCG_REG_R3,
@@ -1536,8 +1544,8 @@ static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
                                TCG_REG_R3, 63 - CPU_TLB_ENTRY_BITS,
                                63 - CPU_TLB_ENTRY_BITS),
                    tcg_opc_i14(TCG_REG_P0, OPC_DEP_I14, TCG_REG_R1, 0,
-                               TCG_REG_R57, 63 - s_bits,
-                               TARGET_PAGE_BITS - s_bits - 1));
+                               TCG_REG_R57, 63 - a_bits,
+                               TARGET_PAGE_BITS - a_bits - 1));
     tcg_out_bundle(s, MmI,
                    tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1,
                                TCG_REG_R2, TCG_REG_R2, TCG_REG_R3),
@@ -1661,7 +1669,7 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args)
     s_bits = opc & MO_SIZE;
 
     /* Read the TLB entry */
-    tcg_out_qemu_tlb(s, addr_reg, s_bits,
+    tcg_out_qemu_tlb(s, addr_reg, opc,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read),
                      offsetof(CPUArchState, tlb_table[mem_index][0].addend),
                      INSN_NOP_I, INSN_NOP_I);
@@ -1739,7 +1747,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args)
         pre1 = tcg_opc_ext_i(TCG_REG_P0, opc, TCG_REG_R58, data_reg);
     }
 
-    tcg_out_qemu_tlb(s, addr_reg, s_bits,
+    tcg_out_qemu_tlb(s, addr_reg, opc,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_write),
                      offsetof(CPUArchState, tlb_table[mem_index][0].addend),
                      pre1, pre2);
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 2f9be48..8614ff8 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1040,7 +1040,9 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
                              TCGReg addrh, TCGMemOpIdx oi,
                              tcg_insn_unit *label_ptr[2], bool is_load)
 {
-    TCGMemOp s_bits = get_memop(oi) & MO_SIZE;
+    TCGMemOp opc = get_memop(oi);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int mem_index = get_mmuidx(oi);
     int cmp_off
         = (is_load
@@ -1071,10 +1073,14 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
     tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, TCG_REG_A0,
                     cmp_off + (TARGET_LONG_BITS == 64 ? LO_OFF : 0));
 
+    /* We don't currently support unaligned accesses.
+       We could do so with mips32r6.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
     /* Mask the page bits, keeping the alignment bits to compare against.
        In between on 32-bit targets, load the tlb addend for the fast path.  */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1,
-                 TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, TARGET_PAGE_MASK | nbits(a_bits));
     if (TARGET_LONG_BITS == 32) {
         tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, TCG_REG_A0, add_off);
     }
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index eaf1bd9..82ac4b3 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1404,8 +1404,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
            : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
     TCGReg base = TCG_AREG0;
-    TCGMemOp s_bits = opc & MO_SIZE;
-    int a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
 
     /* Extract the page index, shifted into place for tlb index.  */
     if (TCG_TARGET_REG_BITS == 64) {
@@ -1458,39 +1458,42 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3, add_off);
 
     /* Clear the non-page, non-alignment bits from the address */
-    if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
-        /* We don't support unaligned accesses on 32-bits, preserve
-         * the bottom bits and thus trigger a comparison failure on
-         * unaligned accesses
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* We don't support unaligned accesses on 32-bits.
+         * Preserve the bottom bits and thus trigger a comparison
+         * failure on unaligned accesses.
          */
-        if (a_bits < 0) {
+        if (a_bits < s_bits) {
             a_bits = s_bits;
         }
         tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
                     (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-    } else if (a_bits) {
-        /* More than byte access, we need to handle alignment */
-        if (a_bits > 0) {
-            /* Alignment required by the front-end, same as 32-bits */
-            tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
+    } else {
+        TCGReg t = addrlo;
+
+        /* If the access is unaligned, we need to make sure we fail if we
+         * cross a page boundary.  The trick is to add the access size-1
+         * to the address before masking the low bits.  That will make the
+         * address overflow to the next page if we cross a page boundary,
+         * which will then force a mismatch of the TLB compare.
+         */
+        if (a_bits < s_bits) {
+            tcg_out32(s, ADDI | TAI(TCG_REG_R0, t,
+                                    nbits(s_bits) - nbits(a_bits)));
+            t = TCG_REG_R0;
+        }
+
+        /* Mask the address for the requested alignment.  */
+        if (TARGET_LONG_BITS == 32) {
+            tcg_out_rlw(s, RLWINM, TCG_REG_R0, t, 0,
+                        (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
+        } else if (a_bits == 0) {
+            tcg_out_rld(s, RLDICR, TCG_REG_R0, t, 0, 63 - TARGET_PAGE_BITS);
+        } else {
+            tcg_out_rld(s, RLDICL, TCG_REG_R0, t,
                         64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - a_bits);
             tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
-       } else {
-           /* We support unaligned accesses, we need to make sure we fail
-            * if we cross a page boundary. The trick is to add the
-            * access_size-1 to the address before masking the low bits.
-            * That will make the address overflow to the next page if we
-            * cross a page boundary which will then force a mismatch of
-            * the TLB compare since the next page cannot possibly be in
-            * the same TLB index.
-            */
-            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
-            tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
-                        0, 63 - TARGET_PAGE_BITS);
         }
-    } else {
-        /* Byte access, just chop off the bits below the page index */
-        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo, 0, 63 - TARGET_PAGE_BITS);
     }
 
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 5a7495b..c30a7ef 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1505,21 +1505,16 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
 static TCGReg tcg_out_tlb_read(TCGContext* s, TCGReg addr_reg, TCGMemOp opc,
                                int mem_index, bool is_ld)
 {
-    int a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int ofs, a_off;
     uint64_t tlb_mask;
 
     /* For aligned accesses, we check the first byte and include the alignment
        bits within the address.  For unaligned access, we check that we don't
        cross pages using the address of the last byte of the access.  */
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
-        a_off = 0;
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
-    } else {
-        a_off = (1 << (opc & MO_SIZE)) - 1;
-        tlb_mask = TARGET_PAGE_MASK;
-    }
+    a_off = (a_bits >= s_bits ? 0 : nbits(s_bits) - nbits(a_bits));
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     if (facilities & FACILITY_GEN_INST_EXT) {
         tcg_out_risbg(s, TCG_REG_R2, addr_reg,
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 8e98172..10e4126 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -996,19 +996,24 @@ static void tcg_target_qemu_prologue(TCGContext *s)
    is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
 
 static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
-                               TCGMemOp s_bits, int which)
+                               TCGMemOp opc, int which)
 {
     const TCGReg r0 = TCG_REG_O0;
     const TCGReg r1 = TCG_REG_O1;
     const TCGReg r2 = TCG_REG_O2;
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int tlb_ofs;
 
     /* Shift the page number down.  */
     tcg_out_arithi(s, r1, addr, TARGET_PAGE_BITS, SHIFT_SRL);
 
-    /* Mask out the page offset, except for the required alignment.  */
-    tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_T1,
-                 TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    /* Mask out the page offset, except for the required alignment.
+       We don't support unaligned accesses.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+    tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_T1, TARGET_PAGE_MASK | nbits(a_bits));
 
     /* Mask the tlb index.  */
     tcg_out_arithi(s, r1, r1, CPU_TLB_SIZE - 1, ARITH_AND);
@@ -1087,7 +1092,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
     tcg_insn_unit *func;
     tcg_insn_unit *label_ptr;
 
-    addrz = tcg_out_tlb_load(s, addr, memi, memop & MO_SIZE,
+    addrz = tcg_out_tlb_load(s, addr, memi, memop,
                              offsetof(CPUTLBEntry, addr_read));
 
     /* The fast path is exactly one insn.  Thus we can perform the
@@ -1169,7 +1174,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
     tcg_insn_unit *func;
     tcg_insn_unit *label_ptr;
 
-    addrz = tcg_out_tlb_load(s, addr, memi, memop & MO_SIZE,
+    addrz = tcg_out_tlb_load(s, addr, memi, memop,
                              offsetof(CPUTLBEntry, addr_write));
 
     /* The fast path is exactly one insn.  Thus we can perform the entire
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1bcabca..8856f02 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -287,20 +287,19 @@ typedef enum TCGMemOp {
      * MO_ALIGN accesses will result in a call to the CPU's
      * do_unaligned_access hook if the guest address is not aligned.
      * The default depends on whether the target CPU defines ALIGNED_ONLY.
+     *
      * Some architectures (e.g. ARMv8) need the address which is aligned
      * to a size more than the size of the memory access.
-     * To support such check it's enough the current costless alignment
-     * check implementation in QEMU, but we need to support
-     * an alignment size specifying.
-     * MO_ALIGN supposes a natural alignment
-     * (i.e. the alignment size is the size of a memory access).
-     * Note that an alignment size must be equal or greater
-     * than an access size.
+     * Some architectures (e.g. SPARCv9) need an address which is aligned,
+     * but less strictly than the natural alignment.
+     *
+     * MO_ALIGN supposes the alignment size is the size of a memory access.
+     *
      * There are three options:
-     * - an alignment to the size of an access (MO_ALIGN);
-     * - an alignment to the specified size that is equal or greater than
-     *   an access size (MO_ALIGN_x where 'x' is a size in bytes);
      * - unaligned access permitted (MO_UNALN).
+     * - an alignment to the size of an access (MO_ALIGN);
+     * - an alignment to a specified size, which may be more or less than
+     *   the access size (MO_ALIGN_x where 'x' is a size in bytes);
      */
     MO_ASHIFT = 4,
     MO_AMASK = 7 << MO_ASHIFT,
@@ -349,42 +348,41 @@ typedef enum TCGMemOp {
 } TCGMemOp;
 
 /**
+ * nbits
+ * @bits: number of bits
+ *
+ * Return a mask of 0 to 31 low bits set.
+ */
+static inline unsigned nbits(unsigned bits)
+{
+    return (1U << bits) - 1;
+}
+
+/**
  * get_alignment_bits
  * @memop: TCGMemOp value
  *
  * Extract the alignment size from the memop.
- *
- * Returns: 0 in case of byte access (which is always aligned);
- *          positive value - number of alignment bits;
- *          negative value if unaligned access enabled
- *          and this is not a byte access.
  */
-static inline int get_alignment_bits(TCGMemOp memop)
+static inline unsigned get_alignment_bits(TCGMemOp memop)
 {
-    int a = memop & MO_AMASK;
-    int s = memop & MO_SIZE;
-    int r;
+    unsigned a = memop & MO_AMASK;
 
     if (a == MO_UNALN) {
-        /* Negative value if unaligned access enabled,
-         * or zero value in case of byte access.
-         */
-        return -s;
+        /* No alignment required.  */
+        a = 0;
     } else if (a == MO_ALIGN) {
-        /* A natural alignment: return a number of access size bits */
-        r = s;
+        /* A natural alignment requirement.  */
+        a = memop & MO_SIZE;
     } else {
-        /* Specific alignment size. It must be equal or greater
-         * than the access size.
-         */
-        r = a >> MO_ASHIFT;
-        tcg_debug_assert(r >= s);
+        /* A specific alignment requirement.  */
+        a = a >> MO_ASHIFT;
     }
 #if defined(CONFIG_SOFTMMU)
     /* The requested alignment cannot overlap the TLB flags.  */
-    tcg_debug_assert((TLB_FLAGS_MASK & ((1 << r) - 1)) == 0);
+    tcg_debug_assert((TLB_FLAGS_MASK & nbits(a)) == 0);
 #endif
-    return r;
+    return a;
 }
 
 typedef tcg_target_ulong TCGArg;
-- 
2.7.4


* [Qemu-devel] [PULL 02/18] tcg: Merge GETPC and GETRA
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 01/18] tcg: Support arbitrary size + alignment Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 03/18] cpu-exec: Check -dfilter for -d cpu Richard Henderson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The return address argument to the softmmu template helpers was
confused.  In the legacy case, we wanted to indicate that there
is no return address, and so passed in NULL.  However, we then
immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
value, indicating the presence of an (invalid) return address.

Push the GETPC_ADJ subtraction down to the only point it's required:
immediately before use within cpu_restore_state, after all NULL pointer
checks have been completed.  This makes GETPC and GETRA identical.

Remove GETRA as the lesser used macro, replacing all uses with GETPC.
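
The problem can be shown in a few lines of C.  This is only a sketch:
GETPC_ADJ = 2 is taken from include/exec/exec-all.h as shown in the
diff, while old_has_retaddr(), new_has_retaddr() and new_search_pc()
are hypothetical stand-ins for the helper entry and for the lookup
done in cpu_restore_state, not functions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GETPC_ADJ 2   /* value shown in include/exec/exec-all.h */

/* Old scheme: the helper adjusts eagerly, so a "no return address"
 * argument of 0 wraps around to a non-zero value. */
static bool old_has_retaddr(uintptr_t retaddr)
{
    retaddr -= GETPC_ADJ;
    return retaddr != 0;
}

/* New scheme: the helper passes the value through untouched,
 * so 0 still means "no return address". */
static bool new_has_retaddr(uintptr_t retaddr)
{
    return retaddr != 0;
}

/* New scheme: the adjustment happens only at the point of use,
 * after the NULL check has already been made. */
static uintptr_t new_search_pc(uintptr_t retaddr)
{
    assert(retaddr != 0);
    return retaddr - GETPC_ADJ;
}
```

With the old scheme, a caller passing 0 is treated as if it had supplied
a (bogus) return address; with the new one, 0 survives to the NULL check.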

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cputlb.c                |  6 ++----
 include/exec/exec-all.h |  9 +++------
 softmmu_template.h      | 32 ++++++--------------------------
 target-arm/helper.c     |  6 +++---
 target-mips/op_helper.c | 18 +++++++++---------
 translate-all.c         |  1 +
 user-exec.c             |  7 +++++--
 7 files changed, 29 insertions(+), 50 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index d068ee5..3c99c34 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -543,10 +543,8 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 #undef MMUSUFFIX
 
 #define MMUSUFFIX _cmmu
-#undef GETPC_ADJ
-#define GETPC_ADJ 0
-#undef GETRA
-#define GETRA() ((uintptr_t)0)
+#undef GETPC
+#define GETPC() ((uintptr_t)0)
 #define SOFTMMU_CODE_ACCESS
 
 #define SHIFT 0
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d008296..8b557d8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -347,13 +347,12 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
     tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 
-/* GETRA is the true target of the return instruction that we'll execute,
-   defined here for simplicity of defining the follow-up macros.  */
+/* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
 extern uintptr_t tci_tb_ptr;
-# define GETRA() tci_tb_ptr
+# define GETPC() tci_tb_ptr
 #else
-# define GETRA() \
+# define GETPC() \
     ((uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))
 #endif
 
@@ -366,8 +365,6 @@ extern uintptr_t tci_tb_ptr;
    smaller than 4 bytes, so we don't worry about special-casing this.  */
 #define GETPC_ADJ   2
 
-#define GETPC()  (GETRA() - GETPC_ADJ)
-
 #if !defined(CONFIG_USER_ONLY)
 
 struct MemoryRegion *iotlb_to_region(CPUState *cpu,
diff --git a/softmmu_template.h b/softmmu_template.h
index 7ea0a41..b3eb821 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -150,9 +150,6 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     uintptr_t haddr;
     DATA_TYPE res;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
@@ -193,10 +190,8 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     do_unaligned_access:
         addr1 = addr & ~(DATA_SIZE - 1);
         addr2 = addr1 + DATA_SIZE;
-        /* Note the adjustment at the beginning of the function.
-           Undo that for the recursion.  */
-        res1 = helper_le_ld_name(env, addr1, oi, retaddr + GETPC_ADJ);
-        res2 = helper_le_ld_name(env, addr2, oi, retaddr + GETPC_ADJ);
+        res1 = helper_le_ld_name(env, addr1, oi, retaddr);
+        res2 = helper_le_ld_name(env, addr2, oi, retaddr);
         shift = (addr & (DATA_SIZE - 1)) * 8;
 
         /* Little-endian combine.  */
@@ -224,9 +219,6 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     uintptr_t haddr;
     DATA_TYPE res;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
@@ -267,10 +259,8 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     do_unaligned_access:
         addr1 = addr & ~(DATA_SIZE - 1);
         addr2 = addr1 + DATA_SIZE;
-        /* Note the adjustment at the beginning of the function.
-           Undo that for the recursion.  */
-        res1 = helper_be_ld_name(env, addr1, oi, retaddr + GETPC_ADJ);
-        res2 = helper_be_ld_name(env, addr2, oi, retaddr + GETPC_ADJ);
+        res1 = helper_be_ld_name(env, addr1, oi, retaddr);
+        res2 = helper_be_ld_name(env, addr2, oi, retaddr);
         shift = (addr & (DATA_SIZE - 1)) * 8;
 
         /* Big-endian combine.  */
@@ -334,9 +324,6 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
@@ -391,10 +378,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         for (i = 0; i < DATA_SIZE; ++i) {
             /* Little-endian extract.  */
             uint8_t val8 = val >> (i * 8);
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
             glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+                                            oi, retaddr);
         }
         return;
     }
@@ -417,9 +402,6 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
@@ -474,10 +456,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         for (i = 0; i < DATA_SIZE; ++i) {
             /* Big-endian extract.  */
             uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
             glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+                                            oi, retaddr);
         }
         return;
     }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index bdb842c..915fe0f 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -8310,12 +8310,12 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
              * this purpose use the actual register value passed to us
              * so that we get the fault address right.
              */
-            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETRA());
+            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
             /* Now we can populate the other TLB entries, if any */
             for (i = 0; i < maxidx; i++) {
                 uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
                 if (va != (vaddr_in & TARGET_PAGE_MASK)) {
-                    helper_ret_stb_mmu(env, va, 0, oi, GETRA());
+                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
                 }
             }
         }
@@ -8332,7 +8332,7 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
          *    bounce buffer was in use
          */
         for (i = 0; i < blocklen; i++) {
-            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETRA());
+            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
         }
     }
 #else
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index ea2f2ab..7af4c2f 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -4122,10 +4122,10 @@ void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
 }
 
 #if !defined(CONFIG_USER_ONLY)
-MSA_LD_DF(DF_BYTE,   b, helper_ret_ldub_mmu, oi, GETRA())
-MSA_LD_DF(DF_HALF,   h, helper_ret_lduw_mmu, oi, GETRA())
-MSA_LD_DF(DF_WORD,   w, helper_ret_ldul_mmu, oi, GETRA())
-MSA_LD_DF(DF_DOUBLE, d, helper_ret_ldq_mmu,  oi, GETRA())
+MSA_LD_DF(DF_BYTE,   b, helper_ret_ldub_mmu, oi, GETPC())
+MSA_LD_DF(DF_HALF,   h, helper_ret_lduw_mmu, oi, GETPC())
+MSA_LD_DF(DF_WORD,   w, helper_ret_ldul_mmu, oi, GETPC())
+MSA_LD_DF(DF_DOUBLE, d, helper_ret_ldq_mmu,  oi, GETPC())
 #else
 MSA_LD_DF(DF_BYTE,   b, cpu_ldub_data)
 MSA_LD_DF(DF_HALF,   h, cpu_lduw_data)
@@ -4161,17 +4161,17 @@ void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
     int mmu_idx = cpu_mmu_index(env, false);				\
     int i;                                                              \
     MEMOP_IDX(DF)                                                       \
-    ensure_writable_pages(env, addr, mmu_idx, GETRA());                 \
+    ensure_writable_pages(env, addr, mmu_idx, GETPC());                 \
     for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
         ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__);    \
     }                                                                   \
 }
 
 #if !defined(CONFIG_USER_ONLY)
-MSA_ST_DF(DF_BYTE,   b, helper_ret_stb_mmu, oi, GETRA())
-MSA_ST_DF(DF_HALF,   h, helper_ret_stw_mmu, oi, GETRA())
-MSA_ST_DF(DF_WORD,   w, helper_ret_stl_mmu, oi, GETRA())
-MSA_ST_DF(DF_DOUBLE, d, helper_ret_stq_mmu, oi, GETRA())
+MSA_ST_DF(DF_BYTE,   b, helper_ret_stb_mmu, oi, GETPC())
+MSA_ST_DF(DF_HALF,   h, helper_ret_stw_mmu, oi, GETPC())
+MSA_ST_DF(DF_WORD,   w, helper_ret_stl_mmu, oi, GETPC())
+MSA_ST_DF(DF_DOUBLE, d, helper_ret_stq_mmu, oi, GETPC())
 #else
 MSA_ST_DF(DF_BYTE,   b, cpu_stb_data)
 MSA_ST_DF(DF_HALF,   h, cpu_stw_data)
diff --git a/translate-all.c b/translate-all.c
index 0dd6466..9cef15a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -299,6 +299,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 {
     TranslationBlock *tb;
 
+    retaddr -= GETPC_ADJ;
     tb = tb_find_pc(retaddr);
     if (tb) {
         cpu_restore_state_from_tb(cpu, tb, retaddr);
diff --git a/user-exec.c b/user-exec.c
index 95f9f97..6db0758 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -105,8 +105,11 @@ static inline int handle_cpu_signal(uintptr_t pc, unsigned long address,
     if (ret == 0) {
         return 1; /* the MMU fault was handled without causing real CPU fault */
     }
-    /* now we have a real cpu fault */
-    cpu_restore_state(cpu, pc);
+
+    /* Now we have a real cpu fault.  Since this is the exact location of
+     * the exception, we must undo the adjustment done by cpu_restore_state
+     * for handling call return addresses.  */
+    cpu_restore_state(cpu, pc + GETPC_ADJ);
 
     sigprocmask(SIG_SETMASK, old_set, NULL);
     cpu_loop_exit(cpu);
-- 
2.7.4
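
For readers following the thread, the net effect of the patch above can
be sketched in isolation.  Helpers now pass the raw return address around,
and the one lookup routine subtracts GETPC_ADJ so that the address falls
inside the call insn rather than just past it.  The names find_block,
restore_state, restore_state_at_fault and the block table are hypothetical
stand-ins, not QEMU APIs; only the value of GETPC_ADJ is taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GETPC_ADJ 2   /* same value as in include/exec/exec-all.h */

/* Hypothetical stand-in for a translation block: a half-open range
   [start, end) of host code addresses.  */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} Block;

static const Block blocks[] = {
    { 0x1000, 0x1040 },
    { 0x1040, 0x1080 },
};

/* Hypothetical analogue of tb_find_pc: return the index of the block
   containing ADDR, or -1 if there is none.  */
static int find_block(uintptr_t addr)
{
    for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
        if (addr >= blocks[i].start && addr < blocks[i].end) {
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical analogue of cpu_restore_state: RETADDR is a call return
   address, so it points just past the call insn.  Adjusting it here,
   once, keeps every caller free of GETPC_ADJ arithmetic.  */
static int restore_state(uintptr_t retaddr)
{
    assert(retaddr >= GETPC_ADJ);   /* guard against wraparound */
    retaddr -= GETPC_ADJ;
    return find_block(retaddr);
}

/* Hypothetical analogue of the signal path: FAULT_PC is the exact
   faulting address, so pre-add the adjustment to cancel it out.  */
static int restore_state_at_fault(uintptr_t fault_pc)
{
    return restore_state(fault_pc + GETPC_ADJ);
}
```

A return address of 0x1040 resolves to the first block (the call was its
last insn), while a fault at exactly 0x1040 resolves to the second.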

* [Qemu-devel] [PULL 03/18] cpu-exec: Check -dfilter for -d cpu
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 01/18] tcg: Support arbitrary size + alignment Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 02/18] tcg: Merge GETPC and GETRA Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 04/18] Introduce TCGOpcode for memory barrier Richard Henderson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cpu-exec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..e7f851c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -147,7 +147,8 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
                            itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
 
 #if defined(DEBUG_DISAS)
-    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
+    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
+        && qemu_log_in_addr_range(itb->pc)) {
 #if defined(TARGET_I386)
         log_cpu_state(cpu, CPU_DUMP_CCOP);
 #elif defined(TARGET_M68K)
-- 
2.7.4
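
The condition added above is the conjunction of a log-mask test and an
address-range test.  A minimal sketch of that shape follows; LOG_TB_CPU,
Range, in_addr_range and should_dump_cpu are hypothetical names, and the
"empty filter means log everything" behaviour is an assumption about how
such a filter would normally work, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_TB_CPU (1u << 0)   /* hypothetical stand-in for CPU_LOG_TB_CPU */

typedef struct {
    uint64_t begin;
    uint64_t end;              /* inclusive */
} Range;

/* Assumed semantics: an empty filter list matches every address.  */
static bool in_addr_range(const Range *r, size_t n, uint64_t pc)
{
    if (n == 0) {
        return true;
    }
    for (size_t i = 0; i < n; i++) {
        assert(r[i].begin <= r[i].end);
        if (pc >= r[i].begin && pc <= r[i].end) {
            return true;
        }
    }
    return false;
}

/* Dump CPU state only if the log flag is set and PC passes the filter.  */
static bool should_dump_cpu(unsigned log_mask, const Range *r, size_t n,
                            uint64_t pc)
{
    return (log_mask & LOG_TB_CPU) && in_addr_range(r, n, pc);
}
```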

* [Qemu-devel] [PULL 04/18] Introduce TCGOpcode for memory barrier
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (2 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 03/18] cpu-exec: Check -dfilter for -d cpu Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 05/18] tcg/i386: Add support for fence Richard Henderson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Introduce a TCGOpcode, mb, for memory barrier instructions.

The opcode takes one constant argument, which gives the type of
memory barrier to generate.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-2-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README    | 17 +++++++++++++++++
 tcg/tcg-op.c  | 17 +++++++++++++++++
 tcg/tcg-op.h  |  2 ++
 tcg/tcg-opc.h |  2 ++
 tcg/tcg.h     | 19 +++++++++++++++++++
 5 files changed, 57 insertions(+)

diff --git a/tcg/README b/tcg/README
index ce8beba..1d48aa9 100644
--- a/tcg/README
+++ b/tcg/README
@@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
 
 Similar to mulu2, except the two inputs T1 and T2 are signed.
 
+********* Memory Barrier support
+
+* mb <$arg>
+
+Generate a target memory barrier instruction to ensure memory ordering as being
+enforced by a corresponding guest memory barrier instruction. The ordering
+enforced by the backend may be stricter than the ordering required by the guest.
+It cannot be weaker. This opcode takes a constant argument which is required to
+generate the appropriate barrier instruction. The backend should take care to
+emit the target barrier instruction only when necessary i.e., for SMP guests and
+when MTTCG is enabled.
+
+The guest translators should generate this opcode for all guest instructions
+which have ordering side effects.
+
+Please see docs/atomics.txt for more information on memory barriers.
+
 ********* 64-bit guest on 32-bit host support
 
 The following opcodes are internal to TCG.  Thus they are to be implemented by
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0243c99..e3af4dd 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -148,6 +148,23 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
     tcg_emit_op(ctx, opc, pi);
 }
 
+void tcg_gen_mb(TCGArg mb_type)
+{
+    bool emit_barriers = true;
+
+#ifndef CONFIG_USER_ONLY
+    /* TODO: When MTTCG is available for system mode, we will check
+     * the following condition and enable emit_barriers
+     * (qemu_tcg_mttcg_enabled() && smp_cpus > 1)
+     */
+    emit_barriers = false;
+#endif
+
+    if (emit_barriers) {
+        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
+    }
+}
+
 /* 32 bit ops */
 
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index f217e80..41890cc 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
     tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
 }
 
+void tcg_gen_mb(TCGArg a);
+
 /* Helper calls. */
 
 /* 32 bit ops */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 6d0410c..45528d2 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 # define IMPL64  TCG_OPF_64BIT
 #endif
 
+DEF(mb, 0, 0, 1, 0)
+
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 8856f02..f8dfe4c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -476,6 +476,25 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
+/* used to indicate the type of accesses on which ordering is to be
+   ensured. Modeled after SPARC barriers */
+typedef enum {
+    TCG_MO_LD_LD    = 1,
+    TCG_MO_ST_LD    = 2,
+    TCG_MO_LD_ST    = 4,
+    TCG_MO_ST_ST    = 8,
+    TCG_MO_ALL      = 0xF, /* OR of all above */
+} TCGOrder;
+
+/* used to indicate the kind of ordering which is to be ensured by the
+   instruction. These types are derived from x86/aarch64 instructions.
+   It should be noted that these are different from C11 semantics */
+typedef enum {
+    TCG_BAR_LDAQ     = 0x10, /* generated for aarch64 load-acquire inst. */
+    TCG_BAR_STRL     = 0x20, /* generated for aarch64 store-rel inst. */
+    TCG_BAR_SC       = 0x40, /* generated for all other ordering inst. */
+} TCGBar;
+
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
      bit 0 is used for inverting;
-- 
2.7.4
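
The two enums in the tcg.h hunk occupy disjoint bit ranges, so one
argument can carry both the access-ordering mask and the barrier kind.
The sketch below shows one way to build and inspect such an argument.
The enum values are copied from the patch; mb_arg and mb_order_covers are
hypothetical helpers, and combining the two fields with a plain OR is an
assumption drawn from the bit layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the tcg/tcg.h hunk.  */
enum {
    TCG_MO_LD_LD = 1,
    TCG_MO_ST_LD = 2,
    TCG_MO_LD_ST = 4,
    TCG_MO_ST_ST = 8,
    TCG_MO_ALL   = 0xF,
};
enum {
    TCG_BAR_LDAQ = 0x10,
    TCG_BAR_STRL = 0x20,
    TCG_BAR_SC   = 0x40,
};

/* Hypothetical helper: build the single argument passed to the mb op.  */
static uint32_t mb_arg(uint32_t order, uint32_t bar)
{
    assert((order & ~(uint32_t)TCG_MO_ALL) == 0);
    return order | bar;
}

/* Hypothetical helper: true if the ordering bits in HAVE include every
   ordering bit requested in WANT, i.e. HAVE is at least as strict.  */
static bool mb_order_covers(uint32_t have, uint32_t want)
{
    return (want & TCG_MO_ALL & ~have) == 0;
}
```

The second helper mirrors the README rule that a backend may enforce a
stricter ordering than the guest asked for, but never a weaker one.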

* [Qemu-devel] [PULL 05/18] tcg/i386: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (3 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 04/18] Introduce TCGOpcode for memory barrier Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 06/18] tcg/aarch64: " Richard Henderson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of
mfence.  The two have similar ordering semantics, and the locked
instruction is experimentally faster.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1573e69..b4f3223 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -686,6 +686,18 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    /* Given the strength of x86 memory ordering, we only need care for
+       store-load ordering.  Experimentally, "lock orl $0,0(%esp)" is
+       faster than "mfence", so don't bother with the sse insn.  */
+    if (a0 & TCG_MO_ST_LD) {
+        tcg_out8(s, 0xf0);
+        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
+        tcg_out8(s, 0);
+    }
+}
+
 static inline void tcg_out_push(TCGContext *s, int reg)
 {
     tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
@@ -2130,6 +2142,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2195,6 +2210,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
     { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
 
+    { INDEX_op_mb, { } },
+
 #if TCG_TARGET_REG_BITS == 32
     { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
-- 
2.7.4
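
For reference, the three emitter calls in tcg_out_mb above should amount
to a five-byte sequence.  The sketch below writes those bytes into a
plain buffer.  The byte values are a hand assembly of the instruction
named in the patch comment, not output captured from QEMU, and
emit_x86_mb is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCG_MO_ST_LD 2   /* value from tcg/tcg.h in this series */

/* Append the bytes for "lock orl $0,0(%esp)" to BUF when the barrier
   requests store-load ordering; return the number of bytes written.
   BUF must have room for 5 bytes.  */
static size_t emit_x86_mb(uint8_t *buf, unsigned mb_type)
{
    assert(buf != NULL);
    if (!(mb_type & TCG_MO_ST_LD)) {
        return 0;       /* the other orderings need no insn on x86 */
    }
    buf[0] = 0xf0;      /* LOCK prefix */
    buf[1] = 0x83;      /* group-1 arithmetic, r/m32 with imm8 */
    buf[2] = 0x0c;      /* ModRM: mod=00, reg=/1 (OR), rm=100 (SIB) */
    buf[3] = 0x24;      /* SIB: no index, base=%esp */
    buf[4] = 0x00;      /* imm8 = 0 */
    return 5;
}
```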

* [Qemu-devel] [PULL 06/18] tcg/aarch64: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (4 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 05/18] tcg/i386: Add support for fence Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 07/18] tcg/arm: " Richard Henderson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Claudio Fontana

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Claudio Fontana <claudio.fontana@gmail.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-4-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.inc.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 2f5629e..6caa9a4 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -372,6 +372,11 @@ typedef enum {
     I3510_EOR       = 0x4a000000,
     I3510_EON       = 0x4a200000,
     I3510_ANDS      = 0x6a000000,
+
+    /* System instructions.  */
+    DMB_ISH         = 0xd50338bf,
+    DMB_LD          = 0x00000100,
+    DMB_ST          = 0x00000200,
 } AArch64Insn;
 
 static inline uint32_t tcg_in32(TCGContext *s)
@@ -981,6 +986,20 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
     tcg_out_mov(s, ext, orig_rl, rl);
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    uint32_t dmb_type = DMB_ISH;
+    a0 &= TCG_MO_ALL;
+    if (a0 == TCG_MO_LD_LD) {
+        dmb_type |= DMB_LD;
+    } else if (a0 == TCG_MO_ST_ST) {
+        dmb_type |= DMB_ST;
+    } else {
+        dmb_type |= DMB_LD | DMB_ST;
+    }
+    tcg_out32(s, dmb_type);
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1647,6 +1666,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1771,6 +1794,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4
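
The selection logic in the aarch64 tcg_out_mb can be checked on its own.
The sketch below reuses the three constants from the patch and returns
the instruction word instead of emitting it; aarch64_dmb_for is a
hypothetical name.  Reading the three possible results as DMB ISHLD,
DMB ISHST and DMB ISH is my interpretation of the A64 encoding rather
than something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from tcg/tcg.h in this series.  */
enum { TCG_MO_LD_LD = 1, TCG_MO_ST_ST = 8, TCG_MO_ALL = 0xF };

/* Values copied from the AArch64Insn hunk.  */
#define DMB_ISH 0xd50338bfu
#define DMB_LD  0x00000100u
#define DMB_ST  0x00000200u

/* Return the DMB word for MB_TYPE.  Only a pure load-load or pure
   store-store request gets the weaker variant; any other mask falls
   back to the full barrier.  */
static uint32_t aarch64_dmb_for(uint32_t mb_type)
{
    uint32_t insn = DMB_ISH;

    /* The base word must leave both option bits clear.  */
    assert((DMB_ISH & (DMB_LD | DMB_ST)) == 0);

    switch (mb_type & TCG_MO_ALL) {
    case TCG_MO_LD_LD:
        insn |= DMB_LD;
        break;
    case TCG_MO_ST_ST:
        insn |= DMB_ST;
        break;
    default:
        insn |= DMB_LD | DMB_ST;
        break;
    }
    return insn;
}
```

Note that the TCG_BAR_* bits are masked off first, so they do not affect
which DMB variant is chosen.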

* [Qemu-devel] [PULL 07/18] tcg/arm: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (5 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 06/18] tcg/aarch64: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 08/18] tcg/ia64: " Richard Henderson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Andrzej Zaborowski

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-5-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 58ffc0d..f3ff6f2 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -313,6 +313,10 @@ typedef enum {
     INSN_LDRD_REG  = 0x000000d0,
     INSN_STRD_IMM  = 0x004000f0,
     INSN_STRD_REG  = 0x000000f0,
+
+    INSN_DMB_ISH   = 0xf57ff05b,  /* dmb ish */
+    INSN_DMB_MCR   = 0xee070fba,  /* mcr p15, 0, r0, c7, c10, 5 */
+
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -1066,6 +1070,15 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, TCGLabel *l)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    if (use_armv7_instructions) {
+        tcg_out32(s, INSN_DMB_ISH);
+    } else if (use_armv6_instructions) {
+        tcg_out32(s, INSN_DMB_MCR);
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1928,6 +1941,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -2002,6 +2019,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4
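
As a cross-check on the two barrier opcodes, the A32 instruction words
for "dmb ish" and for the ARMv6 CP15 barrier "mcr p15, 0, r0, c7, c10, 5"
can be assembled from their fields.  The helper names a32_dmb and a32_mcr
are hypothetical, and the field layout is my reading of the ARM
architecture manual, so treat this as a sketch rather than a reference.

```c
#include <assert.h>
#include <stdint.h>

/* A32 "DMB <option>": fixed bits 0xf57ff050, option in bits [3:0].
   The inner-shareable option (ISH) is 0b1011.  */
static uint32_t a32_dmb(uint32_t option)
{
    assert(option <= 0xf);
    return 0xf57ff050u | option;
}

/* A32 "MCR p<coproc>, <opc1>, r<rt>, c<crn>, c<crm>, <opc2>" with the
   always (AL) condition.  */
static uint32_t a32_mcr(uint32_t coproc, uint32_t opc1, uint32_t rt,
                        uint32_t crn, uint32_t crm, uint32_t opc2)
{
    assert(coproc <= 15 && opc1 <= 7 && rt <= 15);
    assert(crn <= 15 && crm <= 15 && opc2 <= 7);
    return 0xee000010u | (opc1 << 21) | (crn << 16) | (rt << 12)
           | (coproc << 8) | (opc2 << 5) | crm;
}
```

These are instruction word values as passed to a 32-bit emitter; the
byte order in memory then depends on the host endianness.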

* [Qemu-devel] [PULL 08/18] tcg/ia64: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (6 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 07/18] tcg/arm: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 09/18] tcg/mips: " Richard Henderson
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Aurelien Jarno

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-6-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ia64/tcg-target.inc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 7642390..b04d716 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -247,6 +247,7 @@ enum {
     OPC_LD4_M3                = 0x0a080000000ull,
     OPC_LD8_M1                = 0x080c0000000ull,
     OPC_LD8_M3                = 0x0a0c0000000ull,
+    OPC_MF_M24                = 0x00110000000ull,
     OPC_MUX1_I3               = 0x0eca0000000ull,
     OPC_NOP_B9                = 0x04008000000ull,
     OPC_NOP_F16               = 0x00008000000ull,
@@ -2231,6 +2232,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_qemu_st(s, args);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_bundle(s, mmI, OPC_MF_M24, INSN_NOP_M, INSN_NOP_I);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2344,6 +2348,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "SZ", "r" } },
     { INDEX_op_qemu_st_i64, { "SZ", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

* [Qemu-devel] [PULL 09/18] tcg/mips: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (7 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 08/18] tcg/ia64: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 10/18] tcg/ppc: " Richard Henderson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-7-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.inc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 8614ff8..a59a311 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -292,6 +292,7 @@ typedef enum {
     OPC_JALR     = OPC_SPECIAL | 0x09,
     OPC_MOVZ     = OPC_SPECIAL | 0x0A,
     OPC_MOVN     = OPC_SPECIAL | 0x0B,
+    OPC_SYNC     = OPC_SPECIAL | 0x0F,
     OPC_MFHI     = OPC_SPECIAL | 0x10,
     OPC_MFLO     = OPC_SPECIAL | 0x12,
     OPC_MULT     = OPC_SPECIAL | 0x18,
@@ -1652,6 +1653,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         const_args[4], const_args[5], true);
         break;
 
+    case INDEX_op_mb:
+        tcg_out32(s, OPC_SYNC);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1732,6 +1736,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_qemu_ld_i64, { "L", "L", "lZ", "lZ" } },
     { INDEX_op_qemu_st_i64, { "SZ", "SZ", "SZ", "SZ" } },
 #endif
+
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

* [Qemu-devel] [PULL 10/18] tcg/ppc: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (8 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 09/18] tcg/mips: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 11/18] tcg/s390: " Richard Henderson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-8-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.inc.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 82ac4b3..4aee8ea 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STHX   XO31(407)
 #define STWX   XO31(151)
 
+#define EIEIO  XO31(854)
+#define HWSYNC XO31(598)
+#define LWSYNC (HWSYNC | (1u << 21))
+
 #define SPR(a, b) ((((a)<<5)|(b))<<11)
 #define LR     SPR(8, 0)
 #define CTR    SPR(9, 0)
@@ -1243,6 +1247,18 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
     tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    uint32_t insn = HWSYNC;
+    a0 &= TCG_MO_ALL;
+    if (a0 == TCG_MO_LD_LD) {
+        insn = LWSYNC;
+    } else if (a0 == TCG_MO_ST_ST) {
+        insn = EIEIO;
+    }
+    tcg_out32(s, insn);
+}
+
 #ifdef __powerpc64__
 void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
@@ -2452,6 +2468,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
@@ -2599,6 +2619,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4
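
The ppc selection between sync, lwsync and eieio can likewise be checked
outside the backend.  In the sketch below the XO31 macro is an assumption
about how the file builds X-form opcodes (primary opcode 31, extended
opcode shifted left by one); the three derived constants and the
selection logic follow the patch.  ppc_barrier_for is a hypothetical
name.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from tcg/tcg.h in this series.  */
enum { TCG_MO_LD_LD = 1, TCG_MO_ST_ST = 8, TCG_MO_ALL = 0xF };

/* Assumed definition: primary opcode 31, extended opcode in bits [10:1].  */
#define XO31(xo) ((31u << 26) | ((uint32_t)(xo) << 1))

#define EIEIO  XO31(854)
#define HWSYNC XO31(598)
#define LWSYNC (HWSYNC | (1u << 21))

/* Return the barrier insn for MB_TYPE: lwsync for pure load-load,
   eieio for pure store-store, and the heavyweight sync otherwise.  */
static uint32_t ppc_barrier_for(uint32_t mb_type)
{
    /* Plain sync must have the bit that selects lwsync clear.  */
    assert((HWSYNC & (1u << 21)) == 0);

    switch (mb_type & TCG_MO_ALL) {
    case TCG_MO_LD_LD:
        return LWSYNC;
    case TCG_MO_ST_ST:
        return EIEIO;
    default:
        return HWSYNC;
    }
}
```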

* [Qemu-devel] [PULL 11/18] tcg/s390: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (9 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 10/18] tcg/ppc: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 12/18] tcg/sparc: " Richard Henderson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Alexander Graf

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-9-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.inc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index c30a7ef..ada607f 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -343,6 +343,7 @@ static tcg_insn_unit *tb_ret_addr;
 #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
 #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
 #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
 
 static uint64_t facilities;
 
@@ -2167,6 +2168,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
 
+    case INDEX_op_mb:
+        /* The host memory model is quite strong; we simply need to
+           serialize the instruction stream.  */
+        if (args[0] & TCG_MO_ST_LD) {
+            tcg_out_insn(s, RR, BCR,
+                         facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+        }
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2288,6 +2298,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4
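
A note on the INDEX_op_mb case above, as a minimal standalone sketch rather than the backend code itself.  It emits a barrier only when store-load ordering is requested, and picks BCR mask 14 when the fast-BCR-serialization facility is present, else 15.  The names mb_insn and has_fast_bcr_ser are hypothetical.  The value 0x02 for TCG_MO_ST_LD and the 0x07 opcode for BCR are assumptions taken from tcg/tcg.h and the z/Architecture RR format, not from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match TCG_MO_ST_LD in tcg/tcg.h. */
#define MO_ST_LD 0x02

/* Hypothetical helper: returns the 16-bit BCR encoding to emit for a
   barrier argument, or 0 when no instruction is needed.  */
static uint16_t mb_insn(uint32_t mb_arg, bool has_fast_bcr_ser)
{
    if (!(mb_arg & MO_ST_LD)) {
        /* Only store-load reordering needs an explicit serialization.  */
        return 0;
    }
    unsigned mask = has_fast_bcr_ser ? 14 : 15;
    /* RR format: 8-bit opcode (assumed 0x07 for BCR), M1 nibble, R2 = 0.  */
    return (uint16_t)((0x07u << 8) | (mask << 4) | 0u);
}
```

With these assumptions, a full barrier on a host with the facility encodes as "bcr 14,0", and a store-store-only barrier emits nothing.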


* [Qemu-devel] [PULL 12/18] tcg/sparc: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (10 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 11/18] tcg/s390: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 13/18] tcg/tci: " Richard Henderson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-10-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.inc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 10e4126..bca25e2 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -249,6 +249,8 @@ static const int tcg_target_call_oarg_regs[] = {
 #define STWA       (INSN_OP(3) | INSN_OP3(0x14))
 #define STXA       (INSN_OP(3) | INSN_OP3(0x1e))
 
+#define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
+
 #ifndef ASI_PRIMARY_LITTLE
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
@@ -835,6 +837,12 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
     tcg_out_nop(s);
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    /* Note that the TCG memory order constants mirror the Sparc MEMBAR.  */
+    tcg_out32(s, MEMBAR | (a0 & TCG_MO_ALL));
+}
+
 #ifdef CONFIG_SOFTMMU
 static tcg_insn_unit *qemu_ld_trampoline[16];
 static tcg_insn_unit *qemu_st_trampoline[16];
@@ -1465,6 +1473,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
 	break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1566,6 +1578,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "sZ", "A" } },
     { INDEX_op_qemu_st_i64, { "SZ", "A" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4
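
To illustrate the comment in tcg_out_mb above, here is a standalone sketch of how the barrier argument folds into the MEMBAR encoding.  The INSN_* field shifts and the TCG_MO_* values are assumptions copied from memory of tcg/sparc/tcg-target.inc.c and tcg/tcg.h (they are not part of this patch), and membar_insn is a hypothetical name.

```c
#include <stdint.h>

/* Field helpers, assumed to match tcg/sparc/tcg-target.inc.c.  */
#define INSN_OP(x)   ((uint32_t)(x) << 30)
#define INSN_OP3(x)  ((uint32_t)(x) << 19)
#define INSN_RS1(x)  ((uint32_t)(x) << 14)

#define MEMBAR (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1u << 13))

/* Assumed TCG memory-order bits; they line up with the MEMBAR mmask
   bits #LoadLoad, #StoreLoad, #LoadStore, #StoreStore.  */
#define MO_LD_LD 0x01
#define MO_ST_LD 0x02
#define MO_LD_ST 0x04
#define MO_ST_ST 0x08
#define MO_ALL   0x0F

/* Hypothetical helper: the instruction word tcg_out_mb would emit.
   The barrier-type bits above the low nibble are masked off.  */
static uint32_t membar_insn(uint32_t mb_arg)
{
    return MEMBAR | (mb_arg & MO_ALL);
}
```

Because the low four bits already match the mmask layout, no translation table is needed; only the barrier-type bits have to be dropped.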


* [Qemu-devel] [PULL 13/18] tcg/tci: Add support for fence
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (11 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 12/18] tcg/sparc: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 14/18] target-arm: Generate fences in ARMv7 frontend Richard Henderson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Stefan Weil

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-11-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tci/tcg-target.inc.c | 3 +++
 tci.c                    | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 3c47ea7..9dbf4d5 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -255,6 +255,7 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
     { INDEX_op_bswap32_i32, { R, R } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
@@ -800,6 +801,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+    case INDEX_op_mb:
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
diff --git a/tci.c b/tci.c
index b488c0d..4bdc645 100644
--- a/tci.c
+++ b/tci.c
@@ -1236,6 +1236,10 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
                 tcg_abort();
             }
             break;
+        case INDEX_op_mb:
+            /* Ensure ordering for all kinds */
+            smp_mb();
+            break;
         default:
             TODO();
             break;
-- 
2.7.4


* [Qemu-devel] [PULL 14/18] target-arm: Generate fences in ARMv7 frontend
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (12 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 13/18] tcg/tci: " Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 15/18] target-alpha: Generate fence op Richard Henderson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-12-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bd5d5cb..693d4bc 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8083,7 +8083,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             case 4: /* dsb */
             case 5: /* dmb */
                 ARCH(7);
-                /* We don't emulate caches so these are a no-op.  */
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                 return;
             case 6: /* isb */
                 /* We need to break the TB after this insn to execute
@@ -10432,7 +10432,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                             break;
                         case 4: /* dsb */
                         case 5: /* dmb */
-                            /* These execute as NOPs.  */
+                            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                             break;
                         case 6: /* isb */
                             /* We need to break the TB after this insn
-- 
2.7.4


* [Qemu-devel] [PULL 15/18] target-alpha: Generate fence op
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (13 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 14/18] target-arm: Generate fences in ARMv7 frontend Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 16/18] target-aarch64: Generate fences for aarch64 Richard Henderson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-13-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 0ea0e6e..c27c7b9 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2338,11 +2338,11 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             break;
         case 0x4000:
             /* MB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
         case 0x4400:
             /* WMB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0x8000:
             /* FETCH */
-- 
2.7.4


* [Qemu-devel] [PULL 16/18] target-aarch64: Generate fences for aarch64
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (14 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 15/18] target-alpha: Generate fence op Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 17/18] target-i386: Generate fences for x86 Richard Henderson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-14-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f5e29d2..09877bc 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1305,7 +1305,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
         return;
     case 4: /* DSB */
     case 5: /* DMB */
-        /* We don't emulate caches so barriers are no-ops */
+        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
         return;
     case 6: /* ISB */
         /* We need to break the TB after this insn to execute
@@ -1934,7 +1934,13 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
         if (!is_store) {
             s->is_ldex = true;
             gen_load_exclusive(s, rt, rt2, tcg_addr, size, is_pair);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+            }
         } else {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+            }
             gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, is_pair);
         }
     } else {
@@ -1943,11 +1949,17 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
 
         /* Generate ISS for non-exclusive accesses including LASR.  */
         if (is_store) {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+            }
             do_gpr_st(s, tcg_rt, tcg_addr, size,
                       true, rt, iss_sf, is_lasr);
         } else {
             do_gpr_ld(s, tcg_rt, tcg_addr, size, false, false,
                       true, rt, iss_sf, is_lasr);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+            }
         }
     }
 }
-- 
2.7.4


* [Qemu-devel] [PULL 17/18] target-i386: Generate fences for x86
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (15 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 16/18] target-aarch64: Generate fences for aarch64 Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-07 21:10 ` [Qemu-devel] [PULL 18/18] tcg: Optimize fence instructions Richard Henderson
  2016-09-08 17:15 ` [Qemu-devel] [PULL v2 00/18] tcg queued patches Richard Henderson
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-15-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index fa2ac48..9447557 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8012,13 +8012,21 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0xe8 ... 0xef: /* lfence */
+            if (!(s->cpuid_features & CPUID_SSE)
+                || (prefixes & PREFIX_LOCK)) {
+                goto illegal_op;
+            }
+            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
+            break;
         case 0xf0 ... 0xf7: /* mfence */
             if (!(s->cpuid_features & CPUID_SSE2)
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
 
         default:
-- 
2.7.4


* [Qemu-devel] [PULL 18/18] tcg: Optimize fence instructions
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (16 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 17/18] target-i386: Generate fences for x86 Richard Henderson
@ 2016-09-07 21:10 ` Richard Henderson
  2016-09-08 17:15 ` [Qemu-devel] [PULL v2 00/18] tcg queued patches Richard Henderson
  18 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-07 21:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

This commit optimizes fence instructions.  Two optimizations are
currently implemented: (1) eliminating unnecessary duplicate fence
instructions, and (2) merging weaker fences into a stronger fence.

[rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only
loop over the opcode stream once.  Merge "unrelated" weaker barriers
into one stronger barrier.]

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cffe89b..0455285 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -542,6 +542,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 void tcg_optimize(TCGContext *s)
 {
     int oi, oi_next, nb_temps, nb_globals;
+    TCGArg *prev_mb_args = NULL;
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -1295,5 +1296,58 @@ void tcg_optimize(TCGContext *s)
             }
             break;
         }
+
+        /* Eliminate duplicate and redundant fence instructions.  */
+        if (prev_mb_args) {
+            TCGArg pop, cop;
+            TCGBar pty, cty;
+
+            switch (opc) {
+            case INDEX_op_mb:
+                pop = prev_mb_args[0];
+                cop = args[0];
+                pty = pop & 0xF0;
+                cty = cop & 0xF0;
+
+                if (cty == pty) {
+                    /* Two barriers of the same type.  Merge the set of
+                     * memories to which this applies.  */
+                    pop |= cop & 0x0F;
+                } else {
+                    /* Merge a weaker barrier into a stronger one,
+                     * or two weaker barriers into a stronger one.
+                     *   mb; strl => mb; st
+                     *   ldaq; mb => ld; mb
+                     *   ldaq; strl => ld; mb; st
+                     * Other combinations are also merged into a strong
+                     * barrier.  This is stricter than specified but for
+                     * the purposes of TCG is better than not optimizing.
+                     */
+                    pop = TCG_BAR_SC | ((cop | pop) & 0x0F);
+                }
+                /* Change the previous barrier to the merged state.
+                 * Then we can remove the current barrier.  */
+                prev_mb_args[0] = pop;
+                tcg_op_remove(s, op);
+                break;
+
+            default:
+                /* Opcodes that end the block stop the optimization.  */
+                if ((def->flags & TCG_OPF_BB_END) == 0) {
+                    break;
+                }
+                /* fallthru */
+            case INDEX_op_qemu_ld_i32:
+            case INDEX_op_qemu_ld_i64:
+            case INDEX_op_qemu_st_i32:
+            case INDEX_op_qemu_st_i64:
+            case INDEX_op_call:
+                /* Opcodes that touch guest memory stop the optimization.  */
+                prev_mb_args = NULL;
+                break;
+            }
+        } else if (opc == INDEX_op_mb) {
+            prev_mb_args = args;
+        }
     }
 }
-- 
2.7.4
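
The merge rule in the hunk above can be restated as a small pure function.  This is a standalone sketch, not the optimizer code: merge_mb is a hypothetical name, and the BAR_* values are assumed to match TCG_BAR_LDAQ, TCG_BAR_STRL and TCG_BAR_SC in tcg/tcg.h.

```c
#include <stdint.h>

/* Assumed to match the TCG_BAR_* barrier types in tcg/tcg.h.  */
#define BAR_LDAQ 0x10
#define BAR_STRL 0x20
#define BAR_SC   0x30

/* Hypothetical helper: combine two adjacent barrier arguments into the
   single argument that the previous barrier is rewritten to.  */
static uint32_t merge_mb(uint32_t prev, uint32_t cur)
{
    uint32_t pty = prev & 0xF0;   /* barrier type of the previous op */
    uint32_t cty = cur & 0xF0;    /* barrier type of the current op */

    if (cty == pty) {
        /* Same type: keep it and take the union of the ordering bits.  */
        return prev | (cur & 0x0F);
    }
    /* Different types: fall back to a full SC barrier over the union.  */
    return BAR_SC | ((cur | prev) & 0x0F);
}
```

For example, under these assumptions an acquire barrier on load-load (0x11) followed by a release barrier on store-store (0x28) merges to 0x39, a sequentially consistent barrier covering both orderings.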


* [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-07 21:10 [Qemu-devel] [PULL 00/18] tcg queued patches Richard Henderson
                   ` (17 preceding siblings ...)
  2016-09-07 21:10 ` [Qemu-devel] [PULL 18/18] tcg: Optimize fence instructions Richard Henderson
@ 2016-09-08 17:15 ` Richard Henderson
  2016-09-08 20:38   ` Richard Henderson
  18 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-09-08 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Three unrelated patches and Pranith's memory barrier patch sets.

The alignment patch is in support of Sparc's ldf instructions:
8 and 16-byte memory operations that require only 4-byte alignment.
It's just as easy to support this kind of misalignment as any other.
As mentioned in the commit, we'd also forgotten to properly handle
arm32, mips, ia64 and sparc when it came to overalignment.

I have a follow up patch set to make use of this for target-sparc.

I've tweaked the memory barrier patch set.  For aarch64, ppc
and sparc, I've fixed the insn selection a bit.  I merged the
optimization pass into the current optimization pass.


r~


[v2: With actual pull data this time, cover letter only.]


The following changes since commit 2926375cffce464fde6b4dabaed1e133d549af39:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2016-09-06 17:18:17 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20160908

for you to fetch changes up to a36ae2f6ad489d510ada40bf2aab9539e4a97511:

  tcg: Optimize fence instructions (2016-09-07 13:46:01 -0700)

----------------------------------------------------------------
Alignment, memory barrier, and misc patches.

----------------------------------------------------------------
Pranith Kumar (15):
      Introduce TCGOpcode for memory barrier
      tcg/i386: Add support for fence
      tcg/aarch64: Add support for fence
      tcg/arm: Add support for fence
      tcg/ia64: Add support for fence
      tcg/mips: Add support for fence
      tcg/ppc: Add support for fence
      tcg/s390: Add support for fence
      tcg/sparc: Add support for fence
      tcg/tci: Add support for fence
      target-arm: Generate fences in ARMv7 frontend
      target-alpha: Generate fence op
      target-aarch64: Generate fences for aarch64
      target-i386: Generate fences for x86
      tcg: Optimize fence instructions

Richard Henderson (3):
      tcg: Support arbitrary size + alignment
      tcg: Merge GETPC and GETRA
      cpu-exec: Check -dfilter for -d cpu

 cpu-exec.c                   |  3 +-
 cputlb.c                     |  6 ++--
 include/exec/exec-all.h      |  9 ++---
 softmmu_template.h           | 48 ++++++++------------------
 target-alpha/translate.c     |  4 +--
 target-arm/helper.c          |  6 ++--
 target-arm/translate-a64.c   | 14 +++++++-
 target-arm/translate.c       |  4 +--
 target-i386/translate.c      |  8 +++++
 target-mips/op_helper.c      | 18 +++++-----
 tcg/README                   | 17 ++++++++++
 tcg/aarch64/tcg-target.inc.c | 35 +++++++++++++++----
 tcg/arm/tcg-target.inc.c     | 37 ++++++++++++++++----
 tcg/i386/tcg-target.inc.c    | 33 +++++++++++++-----
 tcg/ia64/tcg-target.inc.c    | 27 +++++++++++----
 tcg/mips/tcg-target.inc.c    | 18 ++++++++--
 tcg/optimize.c               | 54 +++++++++++++++++++++++++++++
 tcg/ppc/tcg-target.inc.c     | 78 +++++++++++++++++++++++++++---------------
 tcg/s390/tcg-target.inc.c    | 24 ++++++++-----
 tcg/sparc/tcg-target.inc.c   | 30 ++++++++++++----
 tcg/tcg-op.c                 | 17 ++++++++++
 tcg/tcg-op.h                 |  2 ++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    | 81 +++++++++++++++++++++++++++-----------------
 tcg/tci/tcg-target.inc.c     |  3 ++
 tci.c                        |  4 +++
 translate-all.c              |  1 +
 user-exec.c                  |  7 ++--
 28 files changed, 421 insertions(+), 169 deletions(-)


* Re: [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-08 17:15 ` [Qemu-devel] [PULL v2 00/18] tcg queued patches Richard Henderson
@ 2016-09-08 20:38   ` Richard Henderson
  2016-09-08 23:49     ` Pranith Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-09-08 20:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

On 09/08/2016 10:15 AM, Richard Henderson wrote:
> Three unrelated patches and Pranith's memory barrier patch sets.
>
> The alignment patch is in support of Sparc's ldf instructions:
> 8 and 16-byte memory operations that require only 4-byte alignment.
> It's just as easy to support this kind of misalignment as any other.
> As mentioned in the commit, we'd also forgotten to properly handle
> arm32, mips, ia64 and sparc when it came to overalignment.
>
> I have a follow up patch set to make use of this for target-sparc.
>
> I've tweaked the memory barrier patch set.  For aarch64, ppc
> and sparc, I've fixed the insn selection a bit.  I merged the
> optimization pass into the current optimization pass.

Ho hum.  I think I've mucked something up here too.
Please ignore this pull.


r~


* Re: [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-08 20:38   ` Richard Henderson
@ 2016-09-08 23:49     ` Pranith Kumar
  2016-09-09  0:06       ` Pranith Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Pranith Kumar @ 2016-09-08 23:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Peter Maydell

On Thu, Sep 8, 2016 at 4:38 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 09/08/2016 10:15 AM, Richard Henderson wrote:
>>
>> Three unrelated patches and Pranith's memory barrier patch sets.
>>
>> The alignment patch is in support of Sparc's ldf instructions:
>> 8 and 16-byte memory operations that require only 4-byte alignment.
>> It's just as easy to support this kind of misalignment as any other.
>> As mentioned in the commit, we'd also forgotten to properly handle
>> arm32, mips, ia64 and sparc when it came to overalignment.
>>
>> I have a follow up patch set to make use of this for target-sparc.
>>
>> I've tweaked the memory barrier patch set.  For aarch64, ppc
>> and sparc, I've fixed the insn selection a bit.  I merged the
>> optimization pass into the current optimization pass.
>
>
> Ho hum.  I think I've mucked something up here too.
> Please ignore this pull.
>

I think I found the error. It looks like the fence optimization patch
is causing the error.

I will reply in that patch.

--
Pranith


* Re: [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-08 23:49     ` Pranith Kumar
@ 2016-09-09  0:06       ` Pranith Kumar
  2016-09-09  6:51         ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Pranith Kumar @ 2016-09-09  0:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Peter Maydell

On Thu, Sep 8, 2016 at 7:49 PM, Pranith Kumar
<bobby.prani+qemu@gmail.com> wrote:
>> On 09/08/2016 10:15 AM, Richard Henderson wrote:
>> Ho hum.  I think I've mucked something up here too.
>> Please ignore this pull.
>>
>
> I think I found the error. It looks like the fence optimization patch
> is causing the error.
>
> I will reply in that patch.
>

Hmmm, looking closer, the optimization patch looks fine.

What error are you seeing?

--
Pranith


* Re: [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-09  0:06       ` Pranith Kumar
@ 2016-09-09  6:51         ` Richard Henderson
  2016-09-12 17:23           ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-09-09  6:51 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: qemu-devel, Peter Maydell

On 09/08/2016 05:06 PM, Pranith Kumar wrote:
> What error are you seeing?

Something else entirely.

My alpha kernel is crashing.  I thought it was something in this patch set, 
after I rebased, but now I can see it's in HEAD too.  I'll have to spend some 
time tracking it down.


r~


* Re: [Qemu-devel] [PULL v2 00/18] tcg queued patches
  2016-09-09  6:51         ` Richard Henderson
@ 2016-09-12 17:23           ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-09-12 17:23 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Peter Maydell, qemu-devel

On 09/08/2016 11:51 PM, Richard Henderson wrote:
> On 09/08/2016 05:06 PM, Pranith Kumar wrote:
>> What error are you seeing?
>
> Something else entirely.
>
> My alpha kernel is crashing.  I thought it was something in this patch set,
> after I rebased, but now I can see it's in HEAD too.  I'll have to spend some
> time tracking it down.

Well, as best I can figure, we've got a missing dependency in the makefiles
somewhere, which does explain the segv backtrace that made no sense.

I can't imagine what is wrong with the makefile, seeing as I used "make clean"
from time to time in order to drop optimization from cflags.  But I built quite
a lot of git versions, all of which failed until I deleted the build directory.
After that, everything works.

Ho hum.  I'll set up another pull request.


r~

