* [Qemu-devel] [PULL v3 00/18] tcg queued patches
@ 2016-09-12 23:39 Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment Richard Henderson
                   ` (18 more replies)
  0 siblings, 19 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Mostly the same as v2, except rebased and the tcg/mips patch
adjusted for the mips32r6 discussion with Leon.


r~


The following changes since commit c2a57aae9a1c3dd7de77daf5478df10379aeeebf:

  Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging (2016-09-09 12:49:41 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20160912

for you to fetch changes up to 3c8951649d996facd9581752b87b6c09d2dbab06:

  tcg: Optimize fence instructions (2016-09-12 16:29:50 -0700)

----------------------------------------------------------------
queued tcg patches

----------------------------------------------------------------
Pranith Kumar (15):
      Introduce TCGOpcode for memory barrier
      tcg/i386: Add support for fence
      tcg/aarch64: Add support for fence
      tcg/arm: Add support for fence
      tcg/ia64: Add support for fence
      tcg/mips: Add support for fence
      tcg/ppc: Add support for fence
      tcg/s390: Add support for fence
      tcg/sparc: Add support for fence
      tcg/tci: Add support for fence
      target-arm: Generate fences in ARMv7 frontend
      target-alpha: Generate fence op
      target-aarch64: Generate fences for aarch64
      target-i386: Generate fences for x86
      tcg: Optimize fence instructions

Richard Henderson (3):
      tcg: Support arbitrary size + alignment
      tcg: Merge GETPC and GETRA
      cpu-exec: Check -dfilter for -d cpu

 cpu-exec.c                   |  3 +-
 cputlb.c                     |  6 ++--
 include/exec/exec-all.h      |  9 ++---
 softmmu_template.h           | 48 ++++++++------------------
 target-alpha/translate.c     |  4 +--
 target-arm/helper.c          |  6 ++--
 target-arm/translate-a64.c   | 14 +++++++-
 target-arm/translate.c       |  4 +--
 target-i386/translate.c      |  8 +++++
 target-mips/op_helper.c      | 18 +++++-----
 tcg/README                   | 17 ++++++++++
 tcg/aarch64/tcg-target.inc.c | 35 +++++++++++++++----
 tcg/arm/tcg-target.inc.c     | 37 ++++++++++++++++----
 tcg/i386/tcg-target.inc.c    | 33 +++++++++++++-----
 tcg/ia64/tcg-target.inc.c    | 27 +++++++++++----
 tcg/mips/tcg-target.inc.c    | 42 +++++++++++++++++++++--
 tcg/optimize.c               | 54 +++++++++++++++++++++++++++++
 tcg/ppc/tcg-target.inc.c     | 78 +++++++++++++++++++++++++++---------------
 tcg/s390/tcg-target.inc.c    | 24 ++++++++-----
 tcg/sparc/tcg-target.inc.c   | 30 ++++++++++++----
 tcg/tcg-op.c                 | 17 ++++++++++
 tcg/tcg-op.h                 |  2 ++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    | 81 +++++++++++++++++++++++++++-----------------
 tcg/tci/tcg-target.inc.c     |  3 ++
 tci.c                        |  4 +++
 translate-all.c              |  1 +
 user-exec.c                  |  7 ++--
 28 files changed, 445 insertions(+), 169 deletions(-)

* [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-20 10:16   ` Bharata B Rao
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 02/18] tcg: Merge GETPC and GETRA Richard Henderson
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.

In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
being enforced.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
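A sketch of the new semantics (the surrounding helper context is
hypothetical; the names match the tcg/tcg.h hunk below).  The slow-path
alignment test reduces to a single mask, and MO_ALIGN_x may now request
less than the natural alignment of the access:

    unsigned a_bits = get_alignment_bits(get_memop(oi)); /* 0 for MO_UNALN */
    if (addr & nbits(a_bits)) {            /* nbits(0) == 0: never traps */
        cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                             mmu_idx, retaddr);
    }
    /* E.g. MO_ALIGN_2 on an 8-byte access gives a_bits == 1, so only
       an odd address faults.  */
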
 softmmu_template.h           | 16 ++++++------
 tcg/aarch64/tcg-target.inc.c | 11 ++++----
 tcg/arm/tcg-target.inc.c     | 19 +++++++++-----
 tcg/i386/tcg-target.inc.c    | 16 ++++++------
 tcg/ia64/tcg-target.inc.c    | 22 +++++++++++-----
 tcg/mips/tcg-target.inc.c    | 12 ++++++---
 tcg/ppc/tcg-target.inc.c     | 57 +++++++++++++++++++++-------------------
 tcg/s390/tcg-target.inc.c    | 13 +++-------
 tcg/sparc/tcg-target.inc.c   | 17 +++++++-----
 tcg/tcg.h                    | 62 +++++++++++++++++++++-----------------------
 10 files changed, 132 insertions(+), 113 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 284ab2c..7ea0a41 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -146,14 +146,14 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
     DATA_TYPE res;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
     }
@@ -220,14 +220,14 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
     DATA_TYPE res;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
     }
@@ -331,13 +331,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
     }
@@ -414,13 +414,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    int a_bits = get_alignment_bits(get_memop(oi));
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
 
-    if (a_bits > 0 && (addr & ((1 << a_bits) - 1)) != 0) {
+    if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
     }
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 08b2d03..2f5629e 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1081,23 +1081,22 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
-    int a_bits = get_alignment_bits(opc);
+    unsigned a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
     TCGReg base = TCG_AREG0, x3;
     uint64_t tlb_mask;
 
     /* For aligned accesses, we check the first byte and include the alignment
        bits within the address.  For unaligned access, we check that we don't
        cross pages using the address of the last byte of the access.  */
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
+    if (a_bits >= s_bits) {
         x3 = addr_reg;
     } else {
         tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
-                     TCG_REG_X3, addr_reg, (1 << (opc & MO_SIZE)) - 1);
-        tlb_mask = TARGET_PAGE_MASK;
+                     TCG_REG_X3, addr_reg, nbits(s_bits) - nbits(a_bits));
         x3 = TCG_REG_X3;
     }
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     /* Extract the TLB index from the address into X0.
        X0<CPU_TLB_BITS:0> =
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 172feba..58ffc0d 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1168,7 +1168,7 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
    containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
 
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               TCGMemOp s_bits, int mem_index, bool is_load)
+                               TCGMemOp opc, int mem_index, bool is_load)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1176,6 +1176,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
          ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
          : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
 
     /* Should generate something like the following:
      *   shr    tmp, addrlo, #TARGET_PAGE_BITS                    (1)
@@ -1216,10 +1218,13 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         }
     }
 
-    /* Check alignment.  */
-    if (s_bits) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_TST,
-                        0, addrlo, (1 << s_bits) - 1);
+    /* Check alignment.  We don't support inline unaligned accesses,
+       but we can easily support overalignment checks.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+    if (a_bits) {
+        tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo, nbits(a_bits));
     }
 
     /* Load the tlb addend.  */
@@ -1499,7 +1504,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 1);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 1);
 
     /* This is a conditional BL only to load a pointer within this opcode into LR
        for the slow path.  We will not be using the value for a tail call.  */
@@ -1630,7 +1635,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 0);
+    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc, mem_index, 0);
 
     tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 6f8cdca..1573e69 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1202,7 +1202,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     TCGType ttype = TCG_TYPE_I32;
     TCGType tlbtype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0, tlbrexw = 0;
-    int a_bits = get_alignment_bits(opc);
+    unsigned a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
     target_ulong tlb_mask;
 
     if (TCG_TARGET_REG_BITS == 64) {
@@ -1220,17 +1221,16 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     tcg_out_mov(s, tlbtype, r0, addrlo);
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
+    /* If the required alignment is at least as large as the access, simply
+       copy the address and mask.  For lesser alignments, check that we don't
+       cross pages for the complete access.  */
+    if (a_bits >= s_bits) {
         tcg_out_mov(s, ttype, r1, addrlo);
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
     } else {
-        /* For unaligned access check that we don't cross pages using
-           the page address of the last byte.  */
         tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo,
-                             (1 << (opc & MO_SIZE)) - 1);
-        tlb_mask = TARGET_PAGE_MASK;
+                             nbits(s_bits) - nbits(a_bits));
     }
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
                    TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index c91f392..7642390 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -1496,10 +1496,18 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
    R1, R3 are clobbered, leaving R56 free for...
    BSWAP_1, BSWAP_2 and I-slot insns for swapping data for store.  */
 static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
-                                    TCGMemOp s_bits, int off_rw, int off_add,
+                                    TCGMemOp opc, int off_rw, int off_add,
                                     uint64_t bswap1, uint64_t bswap2)
 {
-     /*
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
+
+    /* We don't support unaligned accesses, but overalignment is easy.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+
+    /*
         .mii
         mov	r2 = off_rw
         extr.u	r3 = addr_reg, ...		# extract tlb page
@@ -1521,7 +1529,7 @@ static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
         cmp.eq	p6, p7 = r3, r58
         nop
         ;;
-      */
+    */
     tcg_out_bundle(s, miI,
                    tcg_opc_movi_a(TCG_REG_P0, TCG_REG_R2, off_rw),
                    tcg_opc_i11(TCG_REG_P0, OPC_EXTR_U_I11, TCG_REG_R3,
@@ -1536,8 +1544,8 @@ static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
                                TCG_REG_R3, 63 - CPU_TLB_ENTRY_BITS,
                                63 - CPU_TLB_ENTRY_BITS),
                    tcg_opc_i14(TCG_REG_P0, OPC_DEP_I14, TCG_REG_R1, 0,
-                               TCG_REG_R57, 63 - s_bits,
-                               TARGET_PAGE_BITS - s_bits - 1));
+                               TCG_REG_R57, 63 - a_bits,
+                               TARGET_PAGE_BITS - a_bits - 1));
     tcg_out_bundle(s, MmI,
                    tcg_opc_a1 (TCG_REG_P0, OPC_ADD_A1,
                                TCG_REG_R2, TCG_REG_R2, TCG_REG_R3),
@@ -1661,7 +1669,7 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args)
     s_bits = opc & MO_SIZE;
 
     /* Read the TLB entry */
-    tcg_out_qemu_tlb(s, addr_reg, s_bits,
+    tcg_out_qemu_tlb(s, addr_reg, opc,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read),
                      offsetof(CPUArchState, tlb_table[mem_index][0].addend),
                      INSN_NOP_I, INSN_NOP_I);
@@ -1739,7 +1747,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args)
         pre1 = tcg_opc_ext_i(TCG_REG_P0, opc, TCG_REG_R58, data_reg);
     }
 
-    tcg_out_qemu_tlb(s, addr_reg, s_bits,
+    tcg_out_qemu_tlb(s, addr_reg, opc,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_write),
                      offsetof(CPUArchState, tlb_table[mem_index][0].addend),
                      pre1, pre2);
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 2f9be48..8614ff8 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1040,7 +1040,9 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
                              TCGReg addrh, TCGMemOpIdx oi,
                              tcg_insn_unit *label_ptr[2], bool is_load)
 {
-    TCGMemOp s_bits = get_memop(oi) & MO_SIZE;
+    TCGMemOp opc = get_memop(oi);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int mem_index = get_mmuidx(oi);
     int cmp_off
         = (is_load
@@ -1071,10 +1073,14 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
     tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, TCG_REG_A0,
                     cmp_off + (TARGET_LONG_BITS == 64 ? LO_OFF : 0));
 
+    /* We don't currently support unaligned accesses.
+       We could do so with mips32r6.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
     /* Mask the page bits, keeping the alignment bits to compare against.
        In between on 32-bit targets, load the tlb addend for the fast path.  */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1,
-                 TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, TARGET_PAGE_MASK | nbits(a_bits));
     if (TARGET_LONG_BITS == 32) {
         tcg_out_opc_imm(s, OPC_LW, TCG_REG_A0, TCG_REG_A0, add_off);
     }
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index eaf1bd9..82ac4b3 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1404,8 +1404,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
            : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
     TCGReg base = TCG_AREG0;
-    TCGMemOp s_bits = opc & MO_SIZE;
-    int a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
 
     /* Extract the page index, shifted into place for tlb index.  */
     if (TCG_TARGET_REG_BITS == 64) {
@@ -1458,39 +1458,42 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3, add_off);
 
     /* Clear the non-page, non-alignment bits from the address */
-    if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
-        /* We don't support unaligned accesses on 32-bits, preserve
-         * the bottom bits and thus trigger a comparison failure on
-         * unaligned accesses
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* We don't support unaligned accesses on 32-bits.
+         * Preserve the bottom bits and thus trigger a comparison
+         * failure on unaligned accesses.
          */
-        if (a_bits < 0) {
+        if (a_bits < s_bits) {
             a_bits = s_bits;
         }
         tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
                     (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-    } else if (a_bits) {
-        /* More than byte access, we need to handle alignment */
-        if (a_bits > 0) {
-            /* Alignment required by the front-end, same as 32-bits */
-            tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
+    } else {
+        TCGReg t = addrlo;
+
+        /* If the access is unaligned, we need to make sure we fail if we
+         * cross a page boundary.  The trick is to add the access size-1
+         * to the address before masking the low bits.  That will make the
+         * address overflow to the next page if we cross a page boundary,
+         * which will then force a mismatch of the TLB compare.
+         */
+        if (a_bits < s_bits) {
+            tcg_out32(s, ADDI | TAI(TCG_REG_R0, t,
+                                    nbits(s_bits) - nbits(a_bits)));
+            t = TCG_REG_R0;
+        }
+
+        /* Mask the address for the requested alignment.  */
+        if (TARGET_LONG_BITS == 32) {
+            tcg_out_rlw(s, RLWINM, TCG_REG_R0, t, 0,
+                        (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
+        } else if (a_bits == 0) {
+            tcg_out_rld(s, RLDICR, TCG_REG_R0, t, 0, 63 - TARGET_PAGE_BITS);
+        } else {
+            tcg_out_rld(s, RLDICL, TCG_REG_R0, t,
                         64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - a_bits);
             tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
-       } else {
-           /* We support unaligned accesses, we need to make sure we fail
-            * if we cross a page boundary. The trick is to add the
-            * access_size-1 to the address before masking the low bits.
-            * That will make the address overflow to the next page if we
-            * cross a page boundary which will then force a mismatch of
-            * the TLB compare since the next page cannot possibly be in
-            * the same TLB index.
-            */
-            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
-            tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
-                        0, 63 - TARGET_PAGE_BITS);
         }
-    } else {
-        /* Byte access, just chop off the bits below the page index */
-        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo, 0, 63 - TARGET_PAGE_BITS);
     }
 
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 5a7495b..c30a7ef 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1505,21 +1505,16 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
 static TCGReg tcg_out_tlb_read(TCGContext* s, TCGReg addr_reg, TCGMemOp opc,
                                int mem_index, bool is_ld)
 {
-    int a_bits = get_alignment_bits(opc);
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int ofs, a_off;
     uint64_t tlb_mask;
 
     /* For aligned accesses, we check the first byte and include the alignment
        bits within the address.  For unaligned access, we check that we don't
        cross pages using the address of the last byte of the access.  */
-    if (a_bits >= 0) {
-        /* A byte access or an alignment check required */
-        a_off = 0;
-        tlb_mask = TARGET_PAGE_MASK | ((1 << a_bits) - 1);
-    } else {
-        a_off = (1 << (opc & MO_SIZE)) - 1;
-        tlb_mask = TARGET_PAGE_MASK;
-    }
+    a_off = (a_bits >= s_bits ? 0 : nbits(s_bits) - nbits(a_bits));
+    tlb_mask = TARGET_PAGE_MASK | nbits(a_bits);
 
     if (facilities & FACILITY_GEN_INST_EXT) {
         tcg_out_risbg(s, TCG_REG_R2, addr_reg,
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 8e98172..10e4126 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -996,19 +996,24 @@ static void tcg_target_qemu_prologue(TCGContext *s)
    is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
 
 static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
-                               TCGMemOp s_bits, int which)
+                               TCGMemOp opc, int which)
 {
     const TCGReg r0 = TCG_REG_O0;
     const TCGReg r1 = TCG_REG_O1;
     const TCGReg r2 = TCG_REG_O2;
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned a_bits = get_alignment_bits(opc);
     int tlb_ofs;
 
     /* Shift the page number down.  */
     tcg_out_arithi(s, r1, addr, TARGET_PAGE_BITS, SHIFT_SRL);
 
-    /* Mask out the page offset, except for the required alignment.  */
-    tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_T1,
-                 TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    /* Mask out the page offset, except for the required alignment.
+       We don't support unaligned accesses.  */
+    if (a_bits < s_bits) {
+        a_bits = s_bits;
+    }
+    tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_T1, TARGET_PAGE_MASK | nbits(a_bits));
 
     /* Mask the tlb index.  */
     tcg_out_arithi(s, r1, r1, CPU_TLB_SIZE - 1, ARITH_AND);
@@ -1087,7 +1092,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
     tcg_insn_unit *func;
     tcg_insn_unit *label_ptr;
 
-    addrz = tcg_out_tlb_load(s, addr, memi, memop & MO_SIZE,
+    addrz = tcg_out_tlb_load(s, addr, memi, memop,
                              offsetof(CPUTLBEntry, addr_read));
 
     /* The fast path is exactly one insn.  Thus we can perform the
@@ -1169,7 +1174,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
     tcg_insn_unit *func;
     tcg_insn_unit *label_ptr;
 
-    addrz = tcg_out_tlb_load(s, addr, memi, memop & MO_SIZE,
+    addrz = tcg_out_tlb_load(s, addr, memi, memop,
                              offsetof(CPUTLBEntry, addr_write));
 
     /* The fast path is exactly one insn.  Thus we can perform the entire
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1bcabca..8856f02 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -287,20 +287,19 @@ typedef enum TCGMemOp {
      * MO_ALIGN accesses will result in a call to the CPU's
      * do_unaligned_access hook if the guest address is not aligned.
      * The default depends on whether the target CPU defines ALIGNED_ONLY.
+     *
      * Some architectures (e.g. ARMv8) need the address which is aligned
      * to a size more than the size of the memory access.
-     * To support such check it's enough the current costless alignment
-     * check implementation in QEMU, but we need to support
-     * an alignment size specifying.
-     * MO_ALIGN supposes a natural alignment
-     * (i.e. the alignment size is the size of a memory access).
-     * Note that an alignment size must be equal or greater
-     * than an access size.
+     * Some architectures (e.g. SPARCv9) need an address which is aligned,
+     * but less strictly than the natural alignment.
+     *
+     * MO_ALIGN supposes the alignment size is the size of a memory access.
+     *
      * There are three options:
-     * - an alignment to the size of an access (MO_ALIGN);
-     * - an alignment to the specified size that is equal or greater than
-     *   an access size (MO_ALIGN_x where 'x' is a size in bytes);
+     * - unaligned access permitted (MO_UNALN);
+     * - an alignment to the size of an access (MO_ALIGN);
+     * - an alignment to a specified size, which may be more or less than
+     *   the access size (MO_ALIGN_x where 'x' is a size in bytes).
      */
     MO_ASHIFT = 4,
     MO_AMASK = 7 << MO_ASHIFT,
@@ -349,42 +348,41 @@ typedef enum TCGMemOp {
 } TCGMemOp;
 
 /**
+ * nbits
+ * @bits: number of low bits to set, in the range 0 to 31
+ *
+ * Return a mask with the @bits least significant bits set.
+ */
+static inline unsigned nbits(unsigned bits)
+{
+    return (1U << bits) - 1;
+}
+
+/**
  * get_alignment_bits
  * @memop: TCGMemOp value
  *
  * Extract the alignment size from the memop.
- *
- * Returns: 0 in case of byte access (which is always aligned);
- *          positive value - number of alignment bits;
- *          negative value if unaligned access enabled
- *          and this is not a byte access.
  */
-static inline int get_alignment_bits(TCGMemOp memop)
+static inline unsigned get_alignment_bits(TCGMemOp memop)
 {
-    int a = memop & MO_AMASK;
-    int s = memop & MO_SIZE;
-    int r;
+    unsigned a = memop & MO_AMASK;
 
     if (a == MO_UNALN) {
-        /* Negative value if unaligned access enabled,
-         * or zero value in case of byte access.
-         */
-        return -s;
+        /* No alignment required.  */
+        a = 0;
     } else if (a == MO_ALIGN) {
-        /* A natural alignment: return a number of access size bits */
-        r = s;
+        /* A natural alignment requirement.  */
+        a = memop & MO_SIZE;
     } else {
-        /* Specific alignment size. It must be equal or greater
-         * than the access size.
-         */
-        r = a >> MO_ASHIFT;
-        tcg_debug_assert(r >= s);
+        /* A specific alignment requirement.  */
+        a = a >> MO_ASHIFT;
     }
 #if defined(CONFIG_SOFTMMU)
     /* The requested alignment cannot overlap the TLB flags.  */
-    tcg_debug_assert((TLB_FLAGS_MASK & ((1 << r) - 1)) == 0);
+    tcg_debug_assert((TLB_FLAGS_MASK & nbits(a)) == 0);
 #endif
-    return r;
+    return a;
 }
 
 typedef tcg_target_ulong TCGArg;
-- 
2.7.4

* [Qemu-devel] [PULL v3 02/18] tcg: Merge GETPC and GETRA
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 03/18] cpu-exec: Check -dfilter for -d cpu Richard Henderson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The return address argument to the softmmu template helpers was
confused.  In the legacy case, we wanted to indicate that there
is no return address, and so passed in NULL.  However, we then
immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
value, indicating the presence of an (invalid) return address.

Push the GETPC_ADJ subtraction down to the only point it's required:
immediately before use within cpu_restore_state, after all NULL pointer
checks have been completed.  This makes GETPC and GETRA identical.

Remove GETRA as the lesser used macro, replacing all uses with GETPC.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
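A sketch of the resulting convention (helper_example is hypothetical;
cpu_restore_state is per the translate-all.c hunk below).  Helpers pass
the raw return address along, and the GETPC_ADJ bias is applied exactly
once:

    void helper_example(CPUArchState *env, target_ulong addr)
    {
        uintptr_t ra = GETPC();        /* raw return address, no bias */
        /* ... on a fault ... */
        cpu_restore_state(ENV_GET_CPU(env), ra);  /* applies GETPC_ADJ */
    }
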
 cputlb.c                |  6 ++----
 include/exec/exec-all.h |  9 +++------
 softmmu_template.h      | 32 ++++++--------------------------
 target-arm/helper.c     |  6 +++---
 target-mips/op_helper.c | 18 +++++++++---------
 translate-all.c         |  1 +
 user-exec.c             |  7 +++++--
 7 files changed, 29 insertions(+), 50 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index d068ee5..3c99c34 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -543,10 +543,8 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 #undef MMUSUFFIX
 
 #define MMUSUFFIX _cmmu
-#undef GETPC_ADJ
-#define GETPC_ADJ 0
-#undef GETRA
-#define GETRA() ((uintptr_t)0)
+#undef GETPC
+#define GETPC() ((uintptr_t)0)
 #define SOFTMMU_CODE_ACCESS
 
 #define SHIFT 0
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d008296..8b557d8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -347,13 +347,12 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
     tb_next->jmp_list_first = (uintptr_t)tb | n;
 }
 
-/* GETRA is the true target of the return instruction that we'll execute,
-   defined here for simplicity of defining the follow-up macros.  */
+/* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
 extern uintptr_t tci_tb_ptr;
-# define GETRA() tci_tb_ptr
+# define GETPC() tci_tb_ptr
 #else
-# define GETRA() \
+# define GETPC() \
     ((uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))
 #endif
 
@@ -366,8 +365,6 @@ extern uintptr_t tci_tb_ptr;
    smaller than 4 bytes, so we don't worry about special-casing this.  */
 #define GETPC_ADJ   2
 
-#define GETPC()  (GETRA() - GETPC_ADJ)
-
 #if !defined(CONFIG_USER_ONLY)
 
 struct MemoryRegion *iotlb_to_region(CPUState *cpu,
diff --git a/softmmu_template.h b/softmmu_template.h
index 7ea0a41..b3eb821 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -150,9 +150,6 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     uintptr_t haddr;
     DATA_TYPE res;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
@@ -193,10 +190,8 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
     do_unaligned_access:
         addr1 = addr & ~(DATA_SIZE - 1);
         addr2 = addr1 + DATA_SIZE;
-        /* Note the adjustment at the beginning of the function.
-           Undo that for the recursion.  */
-        res1 = helper_le_ld_name(env, addr1, oi, retaddr + GETPC_ADJ);
-        res2 = helper_le_ld_name(env, addr2, oi, retaddr + GETPC_ADJ);
+        res1 = helper_le_ld_name(env, addr1, oi, retaddr);
+        res2 = helper_le_ld_name(env, addr2, oi, retaddr);
         shift = (addr & (DATA_SIZE - 1)) * 8;
 
         /* Little-endian combine.  */
@@ -224,9 +219,6 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     uintptr_t haddr;
     DATA_TYPE res;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
                              mmu_idx, retaddr);
@@ -267,10 +259,8 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
     do_unaligned_access:
         addr1 = addr & ~(DATA_SIZE - 1);
         addr2 = addr1 + DATA_SIZE;
-        /* Note the adjustment at the beginning of the function.
-           Undo that for the recursion.  */
-        res1 = helper_be_ld_name(env, addr1, oi, retaddr + GETPC_ADJ);
-        res2 = helper_be_ld_name(env, addr2, oi, retaddr + GETPC_ADJ);
+        res1 = helper_be_ld_name(env, addr1, oi, retaddr);
+        res2 = helper_be_ld_name(env, addr2, oi, retaddr);
         shift = (addr & (DATA_SIZE - 1)) * 8;
 
         /* Big-endian combine.  */
@@ -334,9 +324,6 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
@@ -391,10 +378,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         for (i = 0; i < DATA_SIZE; ++i) {
             /* Little-endian extract.  */
             uint8_t val8 = val >> (i * 8);
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
             glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+                                            oi, retaddr);
         }
         return;
     }
@@ -417,9 +402,6 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
-    /* Adjust the given return address.  */
-    retaddr -= GETPC_ADJ;
-
     if (addr & nbits(a_bits)) {
         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
                              mmu_idx, retaddr);
@@ -474,10 +456,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         for (i = 0; i < DATA_SIZE; ++i) {
             /* Big-endian extract.  */
             uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
             glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+                                            oi, retaddr);
         }
         return;
     }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index bdb842c..915fe0f 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -8310,12 +8310,12 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
              * this purpose use the actual register value passed to us
              * so that we get the fault address right.
              */
-            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETRA());
+            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
             /* Now we can populate the other TLB entries, if any */
             for (i = 0; i < maxidx; i++) {
                 uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
                 if (va != (vaddr_in & TARGET_PAGE_MASK)) {
-                    helper_ret_stb_mmu(env, va, 0, oi, GETRA());
+                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
                 }
             }
         }
@@ -8332,7 +8332,7 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
          *    bounce buffer was in use
          */
         for (i = 0; i < blocklen; i++) {
-            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETRA());
+            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
         }
     }
 #else
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index ea2f2ab..7af4c2f 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -4122,10 +4122,10 @@ void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
 }
 
 #if !defined(CONFIG_USER_ONLY)
-MSA_LD_DF(DF_BYTE,   b, helper_ret_ldub_mmu, oi, GETRA())
-MSA_LD_DF(DF_HALF,   h, helper_ret_lduw_mmu, oi, GETRA())
-MSA_LD_DF(DF_WORD,   w, helper_ret_ldul_mmu, oi, GETRA())
-MSA_LD_DF(DF_DOUBLE, d, helper_ret_ldq_mmu,  oi, GETRA())
+MSA_LD_DF(DF_BYTE,   b, helper_ret_ldub_mmu, oi, GETPC())
+MSA_LD_DF(DF_HALF,   h, helper_ret_lduw_mmu, oi, GETPC())
+MSA_LD_DF(DF_WORD,   w, helper_ret_ldul_mmu, oi, GETPC())
+MSA_LD_DF(DF_DOUBLE, d, helper_ret_ldq_mmu,  oi, GETPC())
 #else
 MSA_LD_DF(DF_BYTE,   b, cpu_ldub_data)
 MSA_LD_DF(DF_HALF,   h, cpu_lduw_data)
@@ -4161,17 +4161,17 @@ void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd,             \
     int mmu_idx = cpu_mmu_index(env, false);				\
     int i;                                                              \
     MEMOP_IDX(DF)                                                       \
-    ensure_writable_pages(env, addr, mmu_idx, GETRA());                 \
+    ensure_writable_pages(env, addr, mmu_idx, GETPC());                 \
     for (i = 0; i < DF_ELEMENTS(DF); i++) {                             \
         ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__);    \
     }                                                                   \
 }
 
 #if !defined(CONFIG_USER_ONLY)
-MSA_ST_DF(DF_BYTE,   b, helper_ret_stb_mmu, oi, GETRA())
-MSA_ST_DF(DF_HALF,   h, helper_ret_stw_mmu, oi, GETRA())
-MSA_ST_DF(DF_WORD,   w, helper_ret_stl_mmu, oi, GETRA())
-MSA_ST_DF(DF_DOUBLE, d, helper_ret_stq_mmu, oi, GETRA())
+MSA_ST_DF(DF_BYTE,   b, helper_ret_stb_mmu, oi, GETPC())
+MSA_ST_DF(DF_HALF,   h, helper_ret_stw_mmu, oi, GETPC())
+MSA_ST_DF(DF_WORD,   w, helper_ret_stl_mmu, oi, GETPC())
+MSA_ST_DF(DF_DOUBLE, d, helper_ret_stq_mmu, oi, GETPC())
 #else
 MSA_ST_DF(DF_BYTE,   b, cpu_stb_data)
 MSA_ST_DF(DF_HALF,   h, cpu_stw_data)
diff --git a/translate-all.c b/translate-all.c
index 0dd6466..9cef15a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -299,6 +299,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 {
     TranslationBlock *tb;
 
+    retaddr -= GETPC_ADJ;
     tb = tb_find_pc(retaddr);
     if (tb) {
         cpu_restore_state_from_tb(cpu, tb, retaddr);
diff --git a/user-exec.c b/user-exec.c
index 95f9f97..6db0758 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -105,8 +105,11 @@ static inline int handle_cpu_signal(uintptr_t pc, unsigned long address,
     if (ret == 0) {
         return 1; /* the MMU fault was handled without causing real CPU fault */
     }
-    /* now we have a real cpu fault */
-    cpu_restore_state(cpu, pc);
+
+    /* Now we have a real cpu fault.  Since this is the exact location of
+     * the exception, we must undo the adjustment done by cpu_restore_state
+     * for handling call return addresses.  */
+    cpu_restore_state(cpu, pc + GETPC_ADJ);
 
     sigprocmask(SIG_SETMASK, old_set, NULL);
     cpu_loop_exit(cpu);
-- 
2.7.4

* [Qemu-devel] [PULL v3 03/18] cpu-exec: Check -dfilter for -d cpu
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 02/18] tcg: Merge GETPC and GETRA Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 04/18] Introduce TCGOpcode for memory barrier Richard Henderson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
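A usage sketch (the address range here is made up): with this change,
-d cpu state dumps honor the same -dfilter ranges as the other TB debug
options, e.g.

    qemu-system-arm -d cpu -dfilter 0x8000+0x100 ...

dumps CPU state only for TBs whose pc lies within that range.
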
 cpu-exec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..e7f851c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -147,7 +147,8 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
                            itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
 
 #if defined(DEBUG_DISAS)
-    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
+    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
+        && qemu_log_in_addr_range(itb->pc)) {
 #if defined(TARGET_I386)
         log_cpu_state(cpu, CPU_DUMP_CCOP);
 #elif defined(TARGET_M68K)
-- 
2.7.4

* [Qemu-devel] [PULL v3 04/18] Introduce TCGOpcode for memory barrier
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (2 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 03/18] cpu-exec: Check -dfilter for -d cpu Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 05/18] tcg/i386: Add support for fence Richard Henderson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

This commit introduces a TCGOpcode for the memory barrier instruction.

This opcode takes an argument specifying the type of memory barrier
to be generated.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-2-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
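As a usage sketch (the constants are defined in the tcg/tcg.h hunk below;
later patches in this series emit this form from the guest frontends):

    /* Full barrier, e.g. for a guest DMB- or MFENCE-like instruction.  */
    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);

Weaker requests OR together only the required TCG_MO_* bits with a
TCG_BAR_* kind.
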
 tcg/README    | 17 +++++++++++++++++
 tcg/tcg-op.c  | 17 +++++++++++++++++
 tcg/tcg-op.h  |  2 ++
 tcg/tcg-opc.h |  2 ++
 tcg/tcg.h     | 19 +++++++++++++++++++
 5 files changed, 57 insertions(+)

diff --git a/tcg/README b/tcg/README
index ce8beba..1d48aa9 100644
--- a/tcg/README
+++ b/tcg/README
@@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
 
 Similar to mulu2, except the two inputs T1 and T2 are signed.
 
+********* Memory Barrier support
+
+* mb <$arg>
+
+Generate a target memory barrier instruction to ensure memory ordering at
+least as strong as that enforced by a corresponding guest memory barrier
+instruction. The ordering enforced by the backend may be stricter than the
+ordering required by the guest; it cannot be weaker. This opcode takes a
+constant argument which selects the appropriate barrier instruction. The
+backend should take care to emit the target barrier instruction only when
+necessary, i.e. for SMP guests and when MTTCG is enabled.
+
+The guest translators should generate this opcode for all guest instructions
+which have ordering side effects.
+
+Please see docs/atomics.txt for more information on memory barriers.
+
 ********* 64-bit guest on 32-bit host support
 
 The following opcodes are internal to TCG.  Thus they are to be implemented by
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0243c99..e3af4dd 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -148,6 +148,23 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
     tcg_emit_op(ctx, opc, pi);
 }
 
+void tcg_gen_mb(TCGArg mb_type)
+{
+    bool emit_barriers = true;
+
+#ifndef CONFIG_USER_ONLY
+    /* TODO: When MTTCG is available for system mode, we will check
+     * the following condition and enable emit_barriers
+     * (qemu_tcg_mttcg_enabled() && smp_cpus > 1)
+     */
+    emit_barriers = false;
+#endif
+
+    if (emit_barriers) {
+        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
+    }
+}
+
 /* 32 bit ops */
 
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index f217e80..41890cc 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
     tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
 }
 
+void tcg_gen_mb(TCGArg a);
+
 /* Helper calls. */
 
 /* 32 bit ops */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 6d0410c..45528d2 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 # define IMPL64  TCG_OPF_64BIT
 #endif
 
+DEF(mb, 0, 0, 1, 0)
+
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 8856f02..f8dfe4c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -476,6 +476,25 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
+/* used to indicate the type of accesses on which ordering is to be
+   ensured. Modeled after SPARC barriers */
+typedef enum {
+    TCG_MO_LD_LD    = 1,
+    TCG_MO_ST_LD    = 2,
+    TCG_MO_LD_ST    = 4,
+    TCG_MO_ST_ST    = 8,
+    TCG_MO_ALL      = 0xF, /* OR of all above */
+} TCGOrder;
+
+/* used to indicate the kind of ordering which is to be ensured by the
+   instruction. These types are derived from x86/aarch64 instructions.
+   It should be noted that these are different from C11 semantics */
+typedef enum {
+    TCG_BAR_LDAQ     = 0x10, /* generated for aarch64 load-acquire inst. */
+    TCG_BAR_STRL     = 0x20, /* generated for aarch64 store-rel inst. */
+    TCG_BAR_SC       = 0x40, /* generated for all other ordering inst. */
+} TCGBar;
+
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
      bit 0 is used for inverting;
-- 
2.7.4

* [Qemu-devel] [PULL v3 05/18] tcg/i386: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (3 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 04/18] Introduce TCGOpcode for memory barrier Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 06/18] tcg/aarch64: " Richard Henderson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of
mfence, which has similar ordering semantics but is slower in practice.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
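For reference, a sketch of what the new tcg_out_mb emits for a store-load
barrier (assuming the usual ModRM+SIB form for an %esp base):

    f0 83 0c 24 00          lock orl $0x0,(%esp)
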
 tcg/i386/tcg-target.inc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1573e69..b4f3223 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -686,6 +686,18 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    /* Given the strength of x86 memory ordering, we only need care for
+       store-load ordering.  Experimentally, "lock orl $0,0(%esp)" is
+       faster than "mfence", so don't bother with the sse insn.  */
+    if (a0 & TCG_MO_ST_LD) {
+        tcg_out8(s, 0xf0);
+        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
+        tcg_out8(s, 0);
+    }
+}
+
 static inline void tcg_out_push(TCGContext *s, int reg)
 {
     tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
@@ -2130,6 +2142,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2195,6 +2210,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
     { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
 
+    { INDEX_op_mb, { } },
+
 #if TCG_TARGET_REG_BITS == 32
     { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
-- 
2.7.4

* [Qemu-devel] [PULL v3 06/18] tcg/aarch64: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (4 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 05/18] tcg/i386: Add support for fence Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 07/18] tcg/arm: " Richard Henderson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Claudio Fontana

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Claudio Fontana <claudio.fontana@gmail.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-4-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
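A worked note on the encodings (an observation, not from the commit
message): DMB_ISH is the dmb base word with CRm = 8, and OR-ing in DMB_LD
and/or DMB_ST selects CRm = 9, 10 or 11, so tcg_out_mb emits:

    TCG_MO_LD_LD only   ->  0xd50339bf   dmb ishld
    TCG_MO_ST_ST only   ->  0xd5033abf   dmb ishst
    anything else       ->  0xd5033bbf   dmb ish
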
 tcg/aarch64/tcg-target.inc.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 2f5629e..6caa9a4 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -372,6 +372,11 @@ typedef enum {
     I3510_EOR       = 0x4a000000,
     I3510_EON       = 0x4a200000,
     I3510_ANDS      = 0x6a000000,
+
+    /* System instructions.  */
+    DMB_ISH         = 0xd50338bf,
+    DMB_LD          = 0x00000100,
+    DMB_ST          = 0x00000200,
 } AArch64Insn;
 
 static inline uint32_t tcg_in32(TCGContext *s)
@@ -981,6 +986,20 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
     tcg_out_mov(s, ext, orig_rl, rl);
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    uint32_t dmb_type = DMB_ISH;
+    a0 &= TCG_MO_ALL;
+    if (a0 == TCG_MO_LD_LD) {
+        dmb_type |= DMB_LD;
+    } else if (a0 == TCG_MO_ST_ST) {
+        dmb_type |= DMB_ST;
+    } else {
+        dmb_type |= DMB_LD | DMB_ST;
+    }
+    tcg_out32(s, dmb_type);
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1647,6 +1666,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1771,6 +1794,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

* [Qemu-devel] [PULL v3 07/18] tcg/arm: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (5 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 06/18] tcg/aarch64: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 08/18] tcg/ia64: " Richard Henderson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Andrzej Zaborowski

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-5-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
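A note on the two encodings (an observation, not from the commit message):
INSN_DMB_ISH is the ARMv7 'dmb ish' instruction, while INSN_DMB_MCR is the
ARMv6 CP15 equivalent, 'mcr p15, 0, r0, c7, c10, 5'.  Cores older than
ARMv6 get no barrier at all, as tcg_out_mb falls through.
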
 tcg/arm/tcg-target.inc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 58ffc0d..f3ff6f2 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -313,6 +313,10 @@ typedef enum {
     INSN_LDRD_REG  = 0x000000d0,
     INSN_STRD_IMM  = 0x004000f0,
     INSN_STRD_REG  = 0x000000f0,
+
+    INSN_DMB_ISH   = 0xf57ff05b,
+    INSN_DMB_MCR   = 0xee070fba,
+
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -1066,6 +1070,15 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, TCGLabel *l)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    if (use_armv7_instructions) {
+        tcg_out32(s, INSN_DMB_ISH);
+    } else if (use_armv6_instructions) {
+        tcg_out32(s, INSN_DMB_MCR);
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1928,6 +1941,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -2002,6 +2019,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

* [Qemu-devel] [PULL v3 08/18] tcg/ia64: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (6 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 07/18] tcg/arm: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 09/18] tcg/mips: " Richard Henderson
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Aurelien Jarno

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-6-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ia64/tcg-target.inc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 7642390..b04d716 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -247,6 +247,7 @@ enum {
     OPC_LD4_M3                = 0x0a080000000ull,
     OPC_LD8_M1                = 0x080c0000000ull,
     OPC_LD8_M3                = 0x0a0c0000000ull,
+    OPC_MF_M24                = 0x00110000000ull,
     OPC_MUX1_I3               = 0x0eca0000000ull,
     OPC_NOP_B9                = 0x04008000000ull,
     OPC_NOP_F16               = 0x00008000000ull,
@@ -2231,6 +2232,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_qemu_st(s, args);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_bundle(s, mmI, OPC_MF_M24, INSN_NOP_M, INSN_NOP_I);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2344,6 +2348,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "SZ", "r" } },
     { INDEX_op_qemu_st_i64, { "SZ", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
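
Itanium packs instructions into 128-bit bundles of three 41-bit slots
plus a 5-bit template, so a lone fence is emitted as an mmI bundle
holding mf and two nops, exactly as tcg_out_bundle() is called above.
A conceptual sketch of the padding (mnemonics only, not the real bit
layout):

    #include <stdio.h>

    /* Conceptual only: a real bundle is 128 bits, not three strings. */
    struct bundle { const char *slot[3]; };

    static struct bundle lone_mf(void)
    {
        /* mmI template: two memory slots and one integer slot. */
        struct bundle b = { { "mf", "nop.m", "nop.i" } };
        return b;
    }

    int main(void)
    {
        struct bundle b = lone_mf();
        printf("%s | %s | %s\n", b.slot[0], b.slot[1], b.slot[2]);
        return 0;
    }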

* [Qemu-devel] [PULL v3 09/18] tcg/mips: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (7 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 08/18] tcg/ia64: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 10/18] tcg/ppc: " Richard Henderson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-7-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.inc.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 8614ff8..8b9fb73 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -292,6 +292,7 @@ typedef enum {
     OPC_JALR     = OPC_SPECIAL | 0x09,
     OPC_MOVZ     = OPC_SPECIAL | 0x0A,
     OPC_MOVN     = OPC_SPECIAL | 0x0B,
+    OPC_SYNC     = OPC_SPECIAL | 0x0F,
     OPC_MFHI     = OPC_SPECIAL | 0x10,
     OPC_MFLO     = OPC_SPECIAL | 0x12,
     OPC_MULT     = OPC_SPECIAL | 0x18,
@@ -339,6 +340,14 @@ typedef enum {
      * backwards-compatible at the assembly level.
      */
     OPC_MUL      = use_mips32r6_instructions ? OPC_MUL_R6 : OPC_MUL_R5,
+
+    /* MIPS r6 introduced names for weaker variants of SYNC.  These are
+       backward compatible with previous architecture revisions.  */
+    OPC_SYNC_WMB     = OPC_SYNC | 0x04 << 5,
+    OPC_SYNC_MB      = OPC_SYNC | 0x10 << 5,
+    OPC_SYNC_ACQUIRE = OPC_SYNC | 0x11 << 5,
+    OPC_SYNC_RELEASE = OPC_SYNC | 0x12 << 5,
+    OPC_SYNC_RMB     = OPC_SYNC | 0x13 << 5,
 } MIPSInsn;
 
 /*
@@ -1383,6 +1392,22 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    static const MIPSInsn sync[] = {
+        /* Note that SYNC_MB is slightly weaker than SYNC 0,
+           as the former is an ordering barrier and the latter
+           is a completion barrier.  */
+        [0 ... TCG_MO_ALL]            = OPC_SYNC_MB,
+        [TCG_MO_LD_LD]                = OPC_SYNC_RMB,
+        [TCG_MO_ST_ST]                = OPC_SYNC_WMB,
+        [TCG_MO_LD_ST]                = OPC_SYNC_RELEASE,
+        [TCG_MO_LD_ST | TCG_MO_ST_ST] = OPC_SYNC_RELEASE,
+        [TCG_MO_LD_ST | TCG_MO_LD_LD] = OPC_SYNC_ACQUIRE,
+    };
+    tcg_out32(s, sync[a0 & TCG_MO_ALL]);
+}
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
@@ -1652,6 +1677,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         const_args[4], const_args[5], true);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1732,6 +1760,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_qemu_ld_i64, { "L", "L", "lZ", "lZ" } },
     { INDEX_op_qemu_st_i64, { "SZ", "SZ", "SZ", "SZ" } },
 #endif
+
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
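
Each variant is the base SYNC opcode with a nonzero stype field in bits
10:6; implementations that predate r6 are required to treat an
unrecognized stype as SYNC 0, which is what makes the table above safe
on older cores.  A quick standalone check of the field packing,
mirroring the OPC_SYNC | x << 5 arithmetic:

    #include <stdio.h>

    enum { OPC_SPECIAL = 0 << 26, OPC_SYNC = OPC_SPECIAL | 0x0f };

    /* SYNC's stype field occupies bits 10:6 of the instruction word. */
    static unsigned sync_with_stype(unsigned stype)
    {
        return OPC_SYNC | (stype & 0x1f) << 5;
    }

    int main(void)
    {
        printf("sync         = 0x%08x\n", sync_with_stype(0x00));
        printf("sync wmb     = 0x%08x\n", sync_with_stype(0x04));
        printf("sync mb      = 0x%08x\n", sync_with_stype(0x10));
        printf("sync acquire = 0x%08x\n", sync_with_stype(0x11));
        printf("sync release = 0x%08x\n", sync_with_stype(0x12));
        printf("sync rmb     = 0x%08x\n", sync_with_stype(0x13));
        return 0;
    }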

* [Qemu-devel] [PULL v3 10/18] tcg/ppc: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (8 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 09/18] tcg/mips: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 11/18] tcg/s390: " Richard Henderson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-8-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.inc.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 82ac4b3..4aee8ea 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STHX   XO31(407)
 #define STWX   XO31(151)
 
+#define EIEIO  XO31(854)
+#define HWSYNC XO31(598)
+#define LWSYNC (HWSYNC | (1u << 21))
+
 #define SPR(a, b) ((((a)<<5)|(b))<<11)
 #define LR     SPR(8, 0)
 #define CTR    SPR(9, 0)
@@ -1243,6 +1247,18 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
     tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    uint32_t insn = HWSYNC;
+    a0 &= TCG_MO_ALL;
+    if (a0 == TCG_MO_LD_LD) {
+        insn = LWSYNC;
+    } else if (a0 == TCG_MO_ST_ST) {
+        insn = EIEIO;
+    }
+    tcg_out32(s, insn);
+}
+
 #ifdef __powerpc64__
 void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
@@ -2452,6 +2468,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
@@ -2599,6 +2619,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
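
The fallback is the full sync (HWSYNC); lwsync suffices when only load
ordering is requested, since it orders everything except store-to-load,
and eieio covers the store-to-store case.  A standalone sketch of the
same decision, with the TCG_MO_* values as defined in tcg/tcg.h:

    #include <stdio.h>

    enum {
        TCG_MO_LD_LD = 0x01, TCG_MO_ST_LD = 0x02,
        TCG_MO_LD_ST = 0x04, TCG_MO_ST_ST = 0x08,
        TCG_MO_ALL   = 0x0f,
    };

    /* Mirror of the backend's choice above. */
    static const char *ppc_barrier_for(unsigned mo)
    {
        mo &= TCG_MO_ALL;
        if (mo == TCG_MO_LD_LD) {
            return "lwsync";    /* orders all but store->load */
        } else if (mo == TCG_MO_ST_ST) {
            return "eieio";     /* store->store only */
        }
        return "sync";          /* HWSYNC, the full barrier */
    }

    int main(void)
    {
        printf("%s\n", ppc_barrier_for(TCG_MO_LD_LD));
        printf("%s\n", ppc_barrier_for(TCG_MO_ALL));
        return 0;
    }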

* [Qemu-devel] [PULL v3 11/18] tcg/s390: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (9 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 10/18] tcg/ppc: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 12/18] tcg/sparc: " Richard Henderson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Alexander Graf

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-9-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.inc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index c30a7ef..ada607f 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -343,6 +343,7 @@ static tcg_insn_unit *tb_ret_addr;
 #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
 #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
 #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
 
 static uint64_t facilities;
 
@@ -2167,6 +2168,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
 
+    case INDEX_op_mb:
+        /* The host memory model is quite strong; we simply need to
+           serialize the instruction stream.  */
+        if (args[0] & TCG_MO_ST_LD) {
+            tcg_out_insn(s, RR, BCR,
+                         facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+        }
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2288,6 +2298,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
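
Only the store-to-load case needs an instruction on this host, and
bcr 14,0 is the cheaper fast-serialization form when the facility is
installed (it shares a facility bit with load-on-condition, hence the
define above).  A minimal sketch of the decision:

    #include <stdio.h>

    enum { TCG_MO_ST_LD = 0x02 };

    static const char *s390_barrier_for(unsigned mo, int have_fast_bcr)
    {
        if (!(mo & TCG_MO_ST_LD)) {
            return NULL;        /* already ordered by the host */
        }
        return have_fast_bcr ? "bcr 14,0" : "bcr 15,0";
    }

    int main(void)
    {
        const char *insn = s390_barrier_for(TCG_MO_ST_LD, 1);
        printf("%s\n", insn ? insn : "(nothing)");
        return 0;
    }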

* [Qemu-devel] [PULL v3 12/18] tcg/sparc: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (10 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 11/18] tcg/s390: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 13/18] tcg/tci: " Richard Henderson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-10-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.inc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 10e4126..bca25e2 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -249,6 +249,8 @@ static const int tcg_target_call_oarg_regs[] = {
 #define STWA       (INSN_OP(3) | INSN_OP3(0x14))
 #define STXA       (INSN_OP(3) | INSN_OP3(0x1e))
 
+#define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
+
 #ifndef ASI_PRIMARY_LITTLE
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
@@ -835,6 +837,12 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
     tcg_out_nop(s);
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    /* Note that the TCG memory order constants mirror the Sparc MEMBAR.  */
+    tcg_out32(s, MEMBAR | (a0 & TCG_MO_ALL));
+}
+
 #ifdef CONFIG_SOFTMMU
 static tcg_insn_unit *qemu_ld_trampoline[16];
 static tcg_insn_unit *qemu_st_trampoline[16];
@@ -1465,6 +1473,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
 	break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1566,6 +1578,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "sZ", "A" } },
     { INDEX_op_qemu_st_i64, { "SZ", "A" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
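
Since the TCG_MO_* values were chosen to match Sparc's mmask bits
(#LoadLoad = 1, #StoreLoad = 2, #LoadStore = 4, #StoreStore = 8), the
whole translation is one mask.  A standalone sketch:

    #include <stdio.h>

    enum {
        TCG_MO_LD_LD = 0x01, TCG_MO_ST_LD = 0x02,
        TCG_MO_LD_ST = 0x04, TCG_MO_ST_ST = 0x08,
        TCG_MO_ALL   = 0x0f,
    };

    /* The backend's entire job here: mmask == the low MO bits. */
    static unsigned membar_mmask(unsigned a0)
    {
        return a0 & TCG_MO_ALL;
    }

    int main(void)
    {
        printf("membar mmask for LD_LD|ST_ST = %#x\n",
               membar_mmask(TCG_MO_LD_LD | TCG_MO_ST_ST));
        return 0;
    }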

* [Qemu-devel] [PULL v3 13/18] tcg/tci: Add support for fence
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (11 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 12/18] tcg/sparc: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 14/18] target-arm: Generate fences in ARMv7 frontend Richard Henderson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar, Stefan Weil

From: Pranith Kumar <bobby.prani@gmail.com>

Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-11-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tci/tcg-target.inc.c | 3 +++
 tci.c                    | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 3c47ea7..9dbf4d5 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -255,6 +255,7 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
     { INDEX_op_bswap32_i32, { R, R } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
@@ -800,6 +801,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+    case INDEX_op_mb:
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
diff --git a/tci.c b/tci.c
index b488c0d..4bdc645 100644
--- a/tci.c
+++ b/tci.c
@@ -1236,6 +1236,10 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
                 tcg_abort();
             }
             break;
+        case INDEX_op_mb:
+            /* Ensure ordering for all kinds of barriers.  */
+            smp_mb();
+            break;
         default:
             TODO();
             break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
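
The interpreter ignores which TCG_MO_* bits were requested and issues a
full host barrier for every mb op; correctness only requires at least
the requested strength.  A sketch of the same idea with the C11 fence
as a hypothetical stand-in for QEMU's smp_mb():

    #include <stdatomic.h>
    #include <stdio.h>

    static void interp_mb(unsigned tcg_mo_bits)
    {
        (void)tcg_mo_bits;   /* strength deliberately over-approximated */
        atomic_thread_fence(memory_order_seq_cst);
    }

    int main(void)
    {
        interp_mb(0x0f);
        puts("fenced");
        return 0;
    }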

* [Qemu-devel] [PULL v3 14/18] target-arm: Generate fences in ARMv7 frontend
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (12 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 13/18] tcg/tci: " Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 15/18] target-alpha: Generate fence op Richard Henderson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-12-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bd5d5cb..693d4bc 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8083,7 +8083,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             case 4: /* dsb */
             case 5: /* dmb */
                 ARCH(7);
-                /* We don't emulate caches so these are a no-op.  */
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                 return;
             case 6: /* isb */
                 /* We need to break the TB after this insn to execute
@@ -10432,7 +10432,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                             break;
                         case 4: /* dsb */
                         case 5: /* dmb */
-                            /* These execute as NOPs.  */
+                            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                             break;
                         case 6: /* isb */
                             /* We need to break the TB after this insn
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
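
DMB and DSB both become the strongest TCG barrier; DSB's completion
semantics have no TCG analogue, so over-approximating with a full
sequentially-consistent fence is the conservative choice.  A sketch of
the mapping with a stubbed tcg_gen_mb (constants as in tcg/tcg.h):

    #include <stdio.h>

    enum { TCG_MO_ALL = 0x0f, TCG_BAR_SC = 0x30 };

    /* Stand-in for tcg_gen_mb(); a real frontend emits an op. */
    static void tcg_gen_mb_stub(int flags)
    {
        printf("emit INDEX_op_mb, arg = 0x%x\n", flags);
    }

    /* 4 = dsb, 5 = dmb, 6 = isb, matching the cases above. */
    static void decode_barrier(int op)
    {
        switch (op) {
        case 4: /* dsb */
        case 5: /* dmb */
            tcg_gen_mb_stub(TCG_MO_ALL | TCG_BAR_SC);
            break;
        case 6: /* isb */
            /* Handled by ending the TB, not by a memory barrier. */
            break;
        }
    }

    int main(void)
    {
        decode_barrier(5);
        return 0;
    }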

* [Qemu-devel] [PULL v3 15/18] target-alpha: Generate fence op
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (13 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 14/18] target-arm: Generate fences in ARMv7 frontend Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 16/18] target-aarch64: Generate fences for aarch64 Richard Henderson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-13-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 0ea0e6e..c27c7b9 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2338,11 +2338,11 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             break;
         case 0x4000:
             /* MB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
         case 0x4400:
             /* WMB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0x8000:
             /* FETCH */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
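
MB maps to a full barrier and WMB to a store-store barrier; Alpha has
no load-only barrier to map.  A tiny sketch keyed on the function codes
from the switch above (TCG constants as in tcg/tcg.h):

    #include <stdio.h>

    enum { TCG_MO_ST_ST = 0x08, TCG_MO_ALL = 0x0f, TCG_BAR_SC = 0x30 };

    static int tcg_arg_for_alpha(unsigned fn)
    {
        switch (fn) {
        case 0x4000:                         /* MB */
            return TCG_MO_ALL | TCG_BAR_SC;
        case 0x4400:                         /* WMB */
            return TCG_MO_ST_ST | TCG_BAR_SC;
        default:
            return 0;                        /* not a barrier */
        }
    }

    int main(void)
    {
        printf("MB  -> 0x%x\n", tcg_arg_for_alpha(0x4000));
        printf("WMB -> 0x%x\n", tcg_arg_for_alpha(0x4400));
        return 0;
    }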

* [Qemu-devel] [PULL v3 16/18] target-aarch64: Generate fences for aarch64
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (14 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 15/18] target-alpha: Generate fence op Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 17/18] target-i386: Generate fences for x86 Richard Henderson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-14-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f5e29d2..09877bc 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1305,7 +1305,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
         return;
     case 4: /* DSB */
     case 5: /* DMB */
-        /* We don't emulate caches so barriers are no-ops */
+        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
         return;
     case 6: /* ISB */
         /* We need to break the TB after this insn to execute
@@ -1934,7 +1934,13 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
         if (!is_store) {
             s->is_ldex = true;
             gen_load_exclusive(s, rt, rt2, tcg_addr, size, is_pair);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+            }
         } else {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+            }
             gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, is_pair);
         }
     } else {
@@ -1943,11 +1949,17 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
 
         /* Generate ISS for non-exclusive accesses including LASR.  */
         if (is_store) {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+            }
             do_gpr_st(s, tcg_rt, tcg_addr, size,
                       true, rt, iss_sf, is_lasr);
         } else {
             do_gpr_ld(s, tcg_rt, tcg_addr, size, false, false,
                       true, rt, iss_sf, is_lasr);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+            }
         }
     }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
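
Note the asymmetry in where the fences land: for a load-acquire the
barrier is emitted after the load, and for a store-release it is
emitted before the store, which is the standard way acquire/release
collapse onto fences.  A sketch with stubbed emitters:

    #include <stdio.h>

    enum { TCG_MO_ALL = 0x0f, TCG_BAR_LDAQ = 0x10, TCG_BAR_STRL = 0x20 };

    static void emit_load(void)    { printf("load\n"); }
    static void emit_store(void)   { printf("store\n"); }
    static void emit_mb(int flags) { printf("mb 0x%x\n", flags); }

    /* Acquire: later accesses may not hoist above the load, so the
     * fence follows it.  Release: earlier accesses may not sink below
     * the store, so the fence precedes it.  */
    static void ldar(void)
    {
        emit_load();
        emit_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
    }

    static void stlr(void)
    {
        emit_mb(TCG_MO_ALL | TCG_BAR_STRL);
        emit_store();
    }

    int main(void)
    {
        ldar();
        stlr();
        return 0;
    }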

* [Qemu-devel] [PULL v3 17/18] target-i386: Generate fences for x86
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (15 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 16/18] target-aarch64: Generate fences for aarch64 Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 18/18] tcg: Optimize fence instructions Richard Henderson
  2016-09-13 10:39 ` [Qemu-devel] [PULL v3 00/18] tcg queued patches Peter Maydell
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-15-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index fa2ac48..9447557 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8012,13 +8012,21 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0xe8 ... 0xef: /* lfence */
+            if (!(s->cpuid_features & CPUID_SSE)
+                || (prefixes & PREFIX_LOCK)) {
+                goto illegal_op;
+            }
+            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
+            break;
         case 0xf0 ... 0xf7: /* mfence */
             if (!(s->cpuid_features & CPUID_SSE2)
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
 
         default:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
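
The three x86 fences map onto TCG barriers over the matching subset of
accesses: sfence orders stores, lfence orders loads, and mfence orders
everything.  A standalone sketch of the mapping (constants as in
tcg/tcg.h):

    #include <stdio.h>

    enum { TCG_MO_LD_LD = 0x01, TCG_MO_ST_ST = 0x08,
           TCG_MO_ALL = 0x0f, TCG_BAR_SC = 0x30 };

    enum x86_fence { SFENCE, LFENCE, MFENCE };

    static int tcg_arg_for(enum x86_fence f)
    {
        switch (f) {
        case SFENCE: return TCG_MO_ST_ST | TCG_BAR_SC;
        case LFENCE: return TCG_MO_LD_LD | TCG_BAR_SC;
        default:     return TCG_MO_ALL   | TCG_BAR_SC;
        }
    }

    int main(void)
    {
        printf("sfence -> 0x%x\n", tcg_arg_for(SFENCE));
        printf("lfence -> 0x%x\n", tcg_arg_for(LFENCE));
        printf("mfence -> 0x%x\n", tcg_arg_for(MFENCE));
        return 0;
    }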

* [Qemu-devel] [PULL v3 18/18] tcg: Optimize fence instructions
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (16 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 17/18] target-i386: Generate fences for x86 Richard Henderson
@ 2016-09-12 23:39 ` Richard Henderson
  2016-09-13 10:39 ` [Qemu-devel] [PULL v3 00/18] tcg queued patches Peter Maydell
  18 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-12 23:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Pranith Kumar

From: Pranith Kumar <bobby.prani@gmail.com>

This commit optimizes fence instructions.  Two optimizations are
currently implemented: (1) eliding unnecessary duplicate fence
instructions, and (2) merging weaker fences into a stronger fence.

[rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only
loop over the opcode stream once.  Merge "unrelated" weaker barriers
into one stronger barrier.]

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cffe89b..0455285 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -542,6 +542,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 void tcg_optimize(TCGContext *s)
 {
     int oi, oi_next, nb_temps, nb_globals;
+    TCGArg *prev_mb_args = NULL;
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -1295,5 +1296,58 @@ void tcg_optimize(TCGContext *s)
             }
             break;
         }
+
+        /* Eliminate duplicate and redundant fence instructions.  */
+        if (prev_mb_args) {
+            TCGArg pop, cop;
+            TCGBar pty, cty;
+
+            switch (opc) {
+            case INDEX_op_mb:
+                pop = prev_mb_args[0];
+                cop = args[0];
+                pty = pop & 0xF0;
+                cty = cop & 0xF0;
+
+                if (cty == pty) {
+                    /* Two barriers of the same type.  Merge the set of
+                     * memories to which this applies.  */
+                    pop |= cop & 0x0F;
+                } else {
+                    /* Merge a weaker barrier into a stronger one,
+                     * or two weaker barriers into a stronger one.
+                     *   mb; strl => mb; st
+                     *   ldaq; mb => ld; mb
+                     *   ldaq; strl => ld; mb; st
+                     * Other combinations are also merged into a strong
+                     * barrier.  This is stricter than specified but for
+                     * the purposes of TCG is better than not optimizing.
+                     */
+                    pop = TCG_BAR_SC | ((cop | pop) & 0x0F);
+                }
+                /* Change the previous barrier to the merged state.
+                 * Then we can remove the current barrier.  */
+                prev_mb_args[0] = pop;
+                tcg_op_remove(s, op);
+                break;
+
+            default:
+                /* Opcodes that end the block stop the optimization.  */
+                if ((def->flags & TCG_OPF_BB_END) == 0) {
+                    break;
+                }
+                /* fallthru */
+            case INDEX_op_qemu_ld_i32:
+            case INDEX_op_qemu_ld_i64:
+            case INDEX_op_qemu_st_i32:
+            case INDEX_op_qemu_st_i64:
+            case INDEX_op_call:
+                /* Opcodes that touch guest memory stop the optimization.  */
+                prev_mb_args = NULL;
+                break;
+            }
+        } else if (opc == INDEX_op_mb) {
+            prev_mb_args = args;
+        }
     }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread
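
The merge rule is compact enough to state on its own: two adjacent
barriers of the same type union their memory sets, while mixed types
are promoted to one full sequentially-consistent barrier over the
union.  A standalone sketch of just that rule, using the same masks as
the pass above:

    #include <stdio.h>

    enum { MO_MASK = 0x0f, TYPE_MASK = 0xf0, BAR_SC = 0x30 };

    static unsigned merge_mb(unsigned prev, unsigned curr)
    {
        if ((prev & TYPE_MASK) == (curr & TYPE_MASK)) {
            return prev | (curr & MO_MASK);         /* union the sets */
        }
        return BAR_SC | ((prev | curr) & MO_MASK);  /* promote to SC */
    }

    int main(void)
    {
        /* ldaq (0x10) over loads followed by strl (0x20) over stores
         * collapses to one SC barrier over both sets: 0x39.  */
        printf("0x%02x\n", merge_mb(0x10 | 0x01, 0x20 | 0x08));
        return 0;
    }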

* Re: [Qemu-devel] [PULL v3 00/18] tcg queued patches
  2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
                   ` (17 preceding siblings ...)
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 18/18] tcg: Optimize fence instructions Richard Henderson
@ 2016-09-13 10:39 ` Peter Maydell
  18 siblings, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2016-09-13 10:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On 13 September 2016 at 00:39, Richard Henderson <rth@twiddle.net> wrote:
> Mostly the same as v2, except rebased and the tcg/mips patch
> adjusted for the mips32r6 discussion with Leon.
>
>
> r~
>
>
> The following changes since commit c2a57aae9a1c3dd7de77daf5478df10379aeeebf:
>
>   Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging (2016-09-09 12:49:41 +0100)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20160912
>
> for you to fetch changes up to 3c8951649d996facd9581752b87b6c09d2dbab06:
>
>   tcg: Optimize fence instructions (2016-09-12 16:29:50 -0700)
>
> ----------------------------------------------------------------
> queued tcg patches
>
> ----------------------------------------------------------------

I see these test failures on OSX, and on aarch64, aarch32 and x86-64 Linux hosts:

TEST: tests/pxe-test... (pid=14244)
  /i386/pxe/e1000:                                                     **
ERROR:/home/petmay01/qemu/tests/boot-sector.c:111:boot_sector_test:
assertion failed (signature == SIGNATURE): (0x00000000 == 0x0000dead)
FAIL
GTester: last random seed: R02S4049f2392d6ada8fbe4f8686d839ba8a
(pid=14250)
  /i386/pxe/virtio:                                                    **
ERROR:/home/petmay01/qemu/tests/boot-sector.c:111:boot_sector_test:
assertion failed (signature == SIGNATURE): (0x00000000 == 0x0000dead)
FAIL
GTester: last random seed: R02S0bdaebf271b668b6af61e13af2119181
(pid=14255)
FAIL: tests/pxe-test

(ppc64be worked fine, for some reason -- surely this can't be the
rare "only works on bigendian" class of bug? :-))

thanks
-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment
  2016-09-12 23:39 ` [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment Richard Henderson
@ 2016-09-20 10:16   ` Bharata B Rao
  2016-09-20 18:57     ` Richard Henderson
  0 siblings, 1 reply; 22+ messages in thread
From: Bharata B Rao @ 2016-09-20 10:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Peter Maydell, David Gibson

On Tue, Sep 13, 2016 at 5:09 AM, Richard Henderson <rth@twiddle.net> wrote:
>
> Previously we allowed fully unaligned operations, but not operations
> that are aligned but with less alignment than the operation size.
>
> In addition, arm32, ia64, mips, and sparc had been omitted from the
> previous overalignment patch, which would have led to that alignment
> being enforced.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

This breaks ppc64 emulation on x86 pretty early during boot.

Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000000400000 ...

Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc3bfb700 (LWP 17798)]
0x00007fffd302030f in code_gen_buffer ()
Missing separate debuginfos, use: dnf debuginfo-install
glib2-2.48.1-1.fc24.x86_64 gmp-6.1.0-2.fc24.x86_64
gnutls-3.4.12-1.fc24.x86_64 libfdt-1.4.1-5.fc24.x86_64
libffi-3.1-9.fc24.x86_64 libgcc-6.1.1-2.fc24.x86_64
libidn-1.32-2.fc24.x86_64 libstdc++-6.1.1-2.fc24.x86_64
libtasn1-4.8-1.fc24.x86_64 libX11-1.6.3-3.fc24.x86_64
libXau-1.0.8-6.fc24.x86_64 libxcb-1.11.1-2.fc24.x86_64
ncurses-libs-6.0-6.20160709.fc24.x86_64 nettle-3.2-2.fc24.x86_64
p11-kit-0.23.2-2.fc24.x86_64 pcre-8.39-1.fc24.x86_64
pixman-0.34.0-2.fc24.x86_64 SDL-1.2.15-21.fc24.x86_64
zlib-1.2.8-10.fc24.x86_64
(gdb) bt
#0  0x00007fffd302030f in code_gen_buffer ()
#1  0x000055555576d519 in cpu_tb_exec (cpu=0x7fffc8090010,
itb=0x7fffc963c1f8) at /tmp/qemu/cpu-exec.c:166
#2  0x000055555576e035 in cpu_loop_exec_tb (cpu=0x7fffc8090010,
tb=0x7fffc963c1f8, last_tb=0x7fffc3bfab08, tb_exit=0x7fffc3bfab04,
sc=0x7fffc3bfab20)
    at /tmp/qemu/cpu-exec.c:517
#3  0x000055555576e2df in cpu_exec (cpu=0x7fffc8090010) at
/tmp/qemu/cpu-exec.c:612
#4  0x00005555557ab96c in tcg_cpu_exec (cpu=0x7fffc8090010) at
/tmp/qemu/cpus.c:1547
#5  0x00005555557aba48 in tcg_exec_all () at /tmp/qemu/cpus.c:1580
#6  0x00005555557aae3d in qemu_tcg_cpu_thread_fn (arg=0x7fffc8090010)
at /tmp/qemu/cpus.c:1177
#7  0x00007ffff6e105ba in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff50d87cd in clone () from /lib64/libc.so.6

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment
  2016-09-20 10:16   ` Bharata B Rao
@ 2016-09-20 18:57     ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2016-09-20 18:57 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: qemu-devel, Peter Maydell, David Gibson

On 09/20/2016 03:16 AM, Bharata B Rao wrote:
> This breaks ppc64 emulation on x86 pretty early during boot.
> 
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0000000000400000 ...
> 
> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffc3bfb700 (LWP 17798)]
> 0x00007fffd302030f in code_gen_buffer ()
> (gdb) bt
> #0  0x00007fffd302030f in code_gen_buffer ()

Yes, I was able to reproduce this, although I nearly got to a shell prompt with
Fedora 19 media before it occurred.

I'll send a fix right away.


r~

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-20 18:57 UTC | newest]

Thread overview: 22+ messages
2016-09-12 23:39 [Qemu-devel] [PULL v3 00/18] tcg queued patches Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 01/18] tcg: Support arbitrary size + alignment Richard Henderson
2016-09-20 10:16   ` Bharata B Rao
2016-09-20 18:57     ` Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 02/18] tcg: Merge GETPC and GETRA Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 03/18] cpu-exec: Check -dfilter for -d cpu Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 04/18] Introduce TCGOpcode for memory barrier Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 05/18] tcg/i386: Add support for fence Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 06/18] tcg/aarch64: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 07/18] tcg/arm: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 08/18] tcg/ia64: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 09/18] tcg/mips: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 10/18] tcg/ppc: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 11/18] tcg/s390: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 12/18] tcg/sparc: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 13/18] tcg/tci: " Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 14/18] target-arm: Generate fences in ARMv7 frontend Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 15/18] target-alpha: Generate fence op Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 16/18] target-aarch64: Generate fences for aarch64 Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 17/18] target-i386: Generate fences for x86 Richard Henderson
2016-09-12 23:39 ` [Qemu-devel] [PULL v3 18/18] tcg: Optimize fence instructions Richard Henderson
2016-09-13 10:39 ` [Qemu-devel] [PULL v3 00/18] tcg queued patches Peter Maydell
