All of lore.kernel.org
* [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support
@ 2016-05-31 18:39 Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
                   ` (13 more replies)
  0 siblings, 14 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  Cc: alex.bennee, serge.fdrv, rth, qemu-devel

Hello,

The following series adds fence instruction generation support to
TCG. The current work has been rebased on top of Richard's patch
series.

This has been tested and confirmed to fix ordering issues on an x86
host running an MTTCG-enabled ARMv7 guest, using KVM unit tests.

Pranith Kumar (13):
  Introduce TCGOpcode for memory barrier
  tcg/i386: Add support for fence
  tcg/aarch64: Add support for fence
  tcg/arm: Add support for fence
  tcg/ia64: Add support for fence
  tcg/mips: Add support for fence
  tcg/ppc: Add support for fence
  tcg/s390: Add support for fence
  tcg/sparc: Add support for fence
  tcg/tci: Add support for fence
  target-arm: Generate fences in ARMv7 frontend
  target-alpha: Generate fence op
  tcg: Generate fences only for SMP MTTCG guests

 target-alpha/translate.c     |  5 +++--
 target-arm/translate.c       |  7 +++++--
 tcg/README                   | 17 +++++++++++++++++
 tcg/aarch64/tcg-target.inc.c |  7 +++++++
 tcg/arm/tcg-target.inc.c     | 12 ++++++++++++
 tcg/i386/tcg-target.inc.c    | 35 +++++++++++++++++++++++++++++++++++
 tcg/ia64/tcg-target.inc.c    |  5 +++++
 tcg/mips/tcg-target.inc.c    |  6 ++++++
 tcg/ppc/tcg-target.inc.c     |  8 ++++++++
 tcg/s390/tcg-target.inc.c    |  9 +++++++++
 tcg/sparc/tcg-target.inc.c   |  8 ++++++++
 tcg/tcg-op.c                 |  9 +++++++++
 tcg/tcg-op.h                 |  2 ++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    |  8 ++++++++
 tcg/tci/tcg-target.inc.c     |  3 +++
 tci.c                        |  3 +++
 17 files changed, 142 insertions(+), 4 deletions(-)

-- 
2.8.3

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 20:24   ` Richard Henderson
  2016-06-02 16:30   ` Sergey Fedorov
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence Pranith Kumar
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Richard Henderson, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

This commit introduces a TCGOpcode for the memory barrier instruction.

The opcode takes an argument specifying the type of memory barrier to
be generated.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README    | 17 +++++++++++++++++
 tcg/tcg-op.c  |  6 ++++++
 tcg/tcg-op.h  |  2 ++
 tcg/tcg-opc.h |  2 ++
 tcg/tcg.h     |  8 ++++++++
 5 files changed, 35 insertions(+)

diff --git a/tcg/README b/tcg/README
index f4a8ac1..cfe79d7 100644
--- a/tcg/README
+++ b/tcg/README
@@ -402,6 +402,23 @@ double-word product T0.  The latter is returned in two single-word outputs.
 
 Similar to mulu2, except the two inputs T1 and T2 are signed.
 
+********* Memory Barrier support
+
+* mb <$arg>
+
+Generate a target memory barrier instruction to enforce the memory ordering
+required by a corresponding guest memory barrier instruction. The ordering
+enforced by the backend may be stricter than the ordering required by the
+guest; it cannot be weaker. This opcode takes a constant argument selecting
+the appropriate barrier instruction to generate. The backend should take care
+to emit the target barrier instruction only when necessary, i.e. for SMP
+guests and when MTTCG is enabled.
+
+The guest translators should generate this opcode for all guest instructions
+which have ordering side effects.
+
+Please see docs/atomics.txt for more information on memory barriers.
+
 ********* 64-bit guest on 32-bit host support
 
 The following opcodes are internal to TCG.  Thus they are to be implemented by
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index f554b86..a6f01a7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -143,6 +143,12 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
     tcg_emit_op(ctx, opc, pi);
 }
 
+void tcg_gen_mb(TCGArg a)
+{
+    /* ??? Enable only when MTTCG is enabled.  */
+    tcg_gen_op1(&tcg_ctx, INDEX_op_mb, a);
+}
+
 /* 32 bit ops */
 
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c446d3d..40920fb 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
     tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
 }
 
+void tcg_gen_mb(TCGArg a);
+
 /* Helper calls. */
 
 /* 32 bit ops */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 6d0410c..c0f3e83 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 # define IMPL64  TCG_OPF_64BIT
 #endif
 
+DEF(mb, 0, 1, 0, 0)
+
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a46d17c..a1d59f7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -385,6 +385,14 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
+/* TCGOpmb args */
+#define TCG_MB_FULL             ((TCGArg)(0))
+#define TCG_MB_READ             ((TCGArg)(1))
+#define TCG_MB_WRITE            ((TCGArg)(2))
+#define TCG_MB_ACQUIRE          ((TCGArg)(3))
+#define TCG_MB_RELEASE          ((TCGArg)(4))
+
+
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
      bit 0 is used for inverting;
-- 
2.8.3
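As a rough illustration of what these barrier types ask of the host (a
hypothetical sketch, not QEMU code -- real backends emit native
instructions rather than C11 fences, and the helper name below is
invented), the five TCG_MB_* values from the tcg.h hunk above map
naturally onto C11 fence strengths:

```c
#include <assert.h>
#include <stdatomic.h>

/* Values mirror the tcg.h hunk above.  C11 has no pure read-only or
   write-only fence, so READ/WRITE are over-approximated here; a
   stricter fence than requested is always legal, as the README text
   in this patch notes. */
enum {
    TCG_MB_FULL    = 0,
    TCG_MB_READ    = 1,
    TCG_MB_WRITE   = 2,
    TCG_MB_ACQUIRE = 3,
    TCG_MB_RELEASE = 4,
};

static void host_fence_for(int mb_type)
{
    switch (mb_type) {
    case TCG_MB_READ:            /* roughly smp_rmb() */
    case TCG_MB_ACQUIRE:
        atomic_thread_fence(memory_order_acquire);
        break;
    case TCG_MB_WRITE:           /* roughly smp_wmb() */
    case TCG_MB_RELEASE:
        atomic_thread_fence(memory_order_release);
        break;
    default:                     /* TCG_MB_FULL */
        atomic_thread_fence(memory_order_seq_cst);
        break;
    }
}
```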

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 20:27   ` Richard Henderson
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: " Pranith Kumar
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Richard Henderson, open list:i386 target; +Cc: alex.bennee, serge.fdrv

Generate an mfence instruction on SSE2-enabled processors. For older
processors, generate a 'lock orl $0,0(%esp)' instruction, which has
similar ordering semantics.
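For reference, the two byte sequences chosen can be sketched standalone
(tcg_out8/tcg_out_modrm_offset in the patch below stream the same bytes
into the code buffer; the helper names here are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* mfence is the three-byte sequence 0f ae f0. */
static size_t emit_mfence(unsigned char *p)
{
    p[0] = 0x0f;
    p[1] = 0xae;
    p[2] = 0xf0;
    return 3;
}

/* The pre-SSE2 fallback "lock orl $0,(%esp)" starts with the lock
   prefix and uses the group-1 or-with-imm8 opcode. */
static size_t emit_lock_or_esp(unsigned char *p)
{
    p[0] = 0xf0;   /* lock prefix */
    p[1] = 0x83;   /* group-1 Ev,Ib; /1 selects OR */
    p[2] = 0x0c;   /* modrm: mod=00, reg=/1, rm=100 -> SIB follows */
    p[3] = 0x24;   /* sib: no index, base=esp */
    p[4] = 0x00;   /* imm8 = 0 */
    return 5;
}
```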

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[rth: Check for sse2, fallback to locked memory op otherwise.]
Signed-off-by: Richard Henderson <rth@twiddle.net>

---
 tcg/i386/tcg-target.inc.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 8fd37f4..1fd5a99 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -121,6 +121,16 @@ static bool have_cmov;
 # define have_cmov 0
 #endif
 
+/* For 32-bit, we are going to attempt to determine at runtime whether
+   sse2 support is available.  */
+#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
+# define have_sse2 1
+#elif defined(CONFIG_CPUID_H) && defined(bit_SSE2)
+static bool have_sse2;
+#else
+# define have_sse2 0
+#endif
+
 /* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
    going to attempt to determine at runtime whether movbe is available.  */
 #if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
@@ -686,6 +696,21 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s)
+{
+    if (have_sse2) {
+        /* mfence */
+        tcg_out8(s, 0x0f);
+        tcg_out8(s, 0xae);
+        tcg_out8(s, 0xf0);
+    } else {
+        /* lock orl $0,0(%esp) */
+        tcg_out8(s, 0xf0);
+        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
+        tcg_out8(s, 0);
+    }
+}
+
 static inline void tcg_out_push(TCGContext *s, int reg)
 {
     tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
@@ -2114,6 +2139,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2179,6 +2207,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
     { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
 
+    { INDEX_op_mb, { "r" } },
+
 #if TCG_TARGET_REG_BITS == 32
     { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
@@ -2356,6 +2386,11 @@ static void tcg_target_init(TCGContext *s)
            available, we'll use a small forward branch.  */
         have_cmov = (d & bit_CMOV) != 0;
 #endif
+#ifndef have_sse2
+        /* Likewise, almost all hardware supports SSE2, but when it
+           does not, we fall back to a locked memory operation.  */
+        have_sse2 = (d & bit_SSE2) != 0;
+#endif
 #ifndef have_movbe
         /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
            need to probe for it.  */
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:59   ` Claudio Fontana
  2016-05-31 20:34   ` Richard Henderson
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 04/13] tcg/arm: " Pranith Kumar
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Claudio Fontana, Richard Henderson, open list:AArch64 target,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, Claudio Fontana

Cc: Claudio Fontana <claudio.fontana@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/aarch64/tcg-target.inc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 08efdf4..c361a5c 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -360,6 +360,9 @@ typedef enum {
     I3510_EOR       = 0x4a000000,
     I3510_EON       = 0x4a200000,
     I3510_ANDS      = 0x6a000000,
+
+    /* System instructions.  */
+    DMB_ISH         = 0xd5033bbf,
 } AArch64Insn;
 
 static inline uint32_t tcg_in32(TCGContext *s)
@@ -1625,6 +1628,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
         break;
 
+    case INDEX_op_mb:
+        tcg_out32(s, DMB_ISH);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1749,6 +1755,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3
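For reference, DMB_ISH above matches the architectural A64 encoding:
the DMB base opcode with the barrier option in bits [11:8], where ISH
(inner shareable, all accesses) is option 0b1011. A standalone
arithmetic check (field layout assumed from the ARMv8 ARM):

```c
#include <assert.h>
#include <stdint.h>

/* A64 DMB <option>: base 0xd50330bf, option in bits [11:8]. */
static uint32_t a64_dmb(uint32_t option)
{
    return 0xd50330bfu | ((option & 0xfu) << 8);
}
```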

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 04/13] tcg/arm: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (2 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 05/13] tcg/ia64: " Pranith Kumar
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Andrzej Zaborowski, Richard Henderson, open list:ARM target,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, Peter Maydell

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index a914762..e88d8ce 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -305,6 +305,10 @@ typedef enum {
     INSN_LDRD_REG  = 0x000000d0,
     INSN_STRD_IMM  = 0x004000f0,
     INSN_STRD_REG  = 0x000000f0,
+
+    INSN_DMB_ISH   = 0xf57ff05b,
+    INSN_DMB_MCR   = 0xee070fba,
+
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -1905,6 +1909,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
         break;
 
+    case INDEX_op_mb:
+        if (use_armv7_instructions) {
+            tcg_out32(s, INSN_DMB_ISH);
+        } else if (use_armv6_instructions) {
+            tcg_out32(s, INSN_DMB_MCR);
+        }
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1979,6 +1990,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3
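For reference, the architectural encodings involved (standalone sketch;
field layout assumed from the ARM ARM): A32 "dmb ish" assembles to
0xf57ff05b, and the ARMv6 CP15 fallback "mcr p15, 0, r0, c7, c10, 5"
to 0xee070fba. Since tcg_out32() stores the value in instruction-fetch
order, the constants must hold these architectural encodings, not
byte-swapped forms.

```c
#include <assert.h>
#include <stdint.h>

/* A32 DMB <option>: fixed bits 0xf57ff050, option in bits [3:0];
   ISH (inner shareable, all accesses) is option 0b1011. */
static uint32_t a32_dmb(uint32_t option)
{
    return 0xf57ff050u | (option & 0xfu);
}

/* ARMv6 CP15 barrier: mcr p15, <opc1>, <Rt>, <CRn>, <CRm>, <opc2>. */
static uint32_t a32_mcr_p15(uint32_t opc1, uint32_t crn, uint32_t rt,
                            uint32_t crm, uint32_t opc2)
{
    return 0xee000010u | (opc1 << 21) | (crn << 16) | (rt << 12)
           | (15u << 8) | (opc2 << 5) | crm;
}
```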

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 05/13] tcg/ia64: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (3 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 04/13] tcg/arm: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 06/13] tcg/mips: " Pranith Kumar
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Aurelien Jarno, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/ia64/tcg-target.inc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 261861f..88cc560 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -247,6 +247,7 @@ enum {
     OPC_LD4_M3                = 0x0a080000000ull,
     OPC_LD8_M1                = 0x080c0000000ull,
     OPC_LD8_M3                = 0x0a0c0000000ull,
+    OPC_MF_M24                = 0x00110000000ull,
     OPC_MUX1_I3               = 0x0eca0000000ull,
     OPC_NOP_B9                = 0x04008000000ull,
     OPC_NOP_F16               = 0x00008000000ull,
@@ -2213,6 +2214,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_qemu_st(s, args);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_bundle(s, mmI, OPC_MF_M24, INSN_NOP_M, INSN_NOP_I);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2326,6 +2330,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "SZ", "r" } },
     { INDEX_op_qemu_st_i64, { "SZ", "r" } },
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 06/13] tcg/mips: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (4 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 05/13] tcg/ia64: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: " Pranith Kumar
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Aurelien Jarno, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/mips/tcg-target.inc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index b2a839a..fc9c7fb 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -292,6 +292,7 @@ typedef enum {
     OPC_JALR     = OPC_SPECIAL | 0x09,
     OPC_MOVZ     = OPC_SPECIAL | 0x0A,
     OPC_MOVN     = OPC_SPECIAL | 0x0B,
+    OPC_SYNC     = OPC_SPECIAL | 0x0F,
     OPC_MFHI     = OPC_SPECIAL | 0x10,
     OPC_MFLO     = OPC_SPECIAL | 0x12,
     OPC_MULT     = OPC_SPECIAL | 0x18,
@@ -1636,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         const_args[4], const_args[5], true);
         break;
 
+    case INDEX_op_mb:
+        tcg_out32(s, OPC_SYNC);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1716,6 +1720,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_qemu_ld_i64, { "L", "L", "lZ", "lZ" } },
     { INDEX_op_qemu_st_i64, { "SZ", "SZ", "SZ", "SZ" } },
 #endif
+
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (5 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 06/13] tcg/mips: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 20:41   ` Richard Henderson
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: " Pranith Kumar
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Vassili Karpov (malc), Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/ppc/tcg-target.inc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 1039407..45a667f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -469,6 +469,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STHX   XO31(407)
 #define STWX   XO31(151)
 
+#define HWSYNC XO31(598)
+#define LWSYNC (HWSYNC | (1u << 21))
+
 #define SPR(a, b) ((((a)<<5)|(b))<<11)
 #define LR     SPR(8, 0)
 #define CTR    SPR(9, 0)
@@ -2425,6 +2428,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
         break;
 
+    case INDEX_op_mb:
+        /* ??? Do we want SEQ_CST or ACQ_REL memory model?  */
+        tcg_out32(s, HWSYNC);
+        break;
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
@@ -2572,6 +2579,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
 #endif
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3
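For reference, these definitions expand to the well-known sync
encodings, assuming this backend's usual macros OPCD(opc) = opc << 26
and XO31(opc) = OPCD(31) | (opc << 1) (a standalone arithmetic check,
not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* X-form major opcode 31 with extended opcode in bits [10:1].
   Extended opcode 598 is sync; setting the L field (bit 21) of the
   result selects the lighter lwsync variant. */
static uint32_t xo31(uint32_t opc)
{
    return (31u << 26) | (opc << 1);
}
```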

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (6 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-06-02 19:31   ` Sergey Fedorov
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: " Pranith Kumar
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Alexander Graf, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/s390/tcg-target.inc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index e95b04b..b4f14bc 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -341,6 +341,7 @@ static tcg_insn_unit *tb_ret_addr;
 #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
 #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
 #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
 
 static uint64_t facilities;
 
@@ -2157,6 +2158,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
 
+    case INDEX_op_mb:
+        /* The host memory model is quite strong; all we need to do
+           is serialize the instruction stream.  */
+        tcg_out_insn(s, RR, BCR,
+		     facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2278,6 +2286,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3
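For reference, BCR is a two-byte RR-format instruction; a standalone
sketch of its encoding (bcr 15,0 is the traditional serializing no-op,
and bcr 14,0 is the cheaper form used when the fast-BCR-serialization
facility tested above is present):

```c
#include <assert.h>
#include <stdint.h>

/* RR format: 8-bit opcode 0x07, then r1 (here the branch mask) and
   r2 packed into the second byte. */
static uint16_t bcr(unsigned mask, unsigned r2)
{
    return (uint16_t)((0x07u << 8) | ((mask & 0xfu) << 4) | (r2 & 0xfu));
}
```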

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (7 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 20:45   ` Richard Henderson
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 10/13] tcg/tci: " Pranith Kumar
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Blue Swirl, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/sparc/tcg-target.inc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index a611885..81f263f 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -249,6 +249,8 @@ static const int tcg_target_call_oarg_regs[] = {
 #define STWA       (INSN_OP(3) | INSN_OP3(0x14))
 #define STXA       (INSN_OP(3) | INSN_OP3(0x1e))
 
+#define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
+
 #ifndef ASI_PRIMARY_LITTLE
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
@@ -1450,6 +1452,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
 	break;
 
+    case INDEX_op_mb:
+        /* membar #LoadLoad|#LoadStore|#StoreStore|#StoreLoad */
+        tcg_out32(s, MEMBAR | 15);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1551,6 +1558,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "sZ", "A" } },
     { INDEX_op_qemu_st_i64, { "SZ", "A" } },
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
-- 
2.8.3
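The MEMBAR encoding above can be checked arithmetically (a standalone
sketch assuming the backend's usual field macros INSN_OP(x) = x << 30,
INSN_OP3(x) = x << 19, INSN_RS1(x) = x << 14):

```c
#include <assert.h>
#include <stdint.h>

/* membar is op=2, op3=0x28, rs1=15, i=1, with the mmask in the low
   bits: #LoadLoad(1), #StoreLoad(2), #LoadStore(4), #StoreStore(8).
   A mask of 15 requests all four orderings, as the comment in the
   patch spells out. */
static uint32_t membar(uint32_t mmask)
{
    uint32_t insn = (2u << 30) | (0x28u << 19) | (15u << 14) | (1u << 13);
    return insn | mmask;
}
```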

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 10/13] tcg/tci: Add support for fence
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (8 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Stefan Weil, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/tci/tcg-target.inc.c | 3 +++
 tci.c                    | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 4e91687..a507ceb 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -255,6 +255,7 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
     { INDEX_op_bswap32_i32, { R, R } },
 #endif
 
+    { INDEX_op_mb, { "r" } },
     { -1 },
 };
 
@@ -798,6 +799,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+    case INDEX_op_mb:
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
diff --git a/tci.c b/tci.c
index 7cbb39e..d35b24f 100644
--- a/tci.c
+++ b/tci.c
@@ -1230,6 +1230,9 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
                 tcg_abort();
             }
             break;
+        case INDEX_op_mb:
+            smp_mb();
+            break;
         default:
             TODO();
             break;
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (9 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 10/13] tcg/tci: " Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-06-02 19:37   ` Sergey Fedorov
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 12/13] target-alpha: Generate fence op Pranith Kumar
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Peter Maydell, open list:ARM, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, rth

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index c946c0e..e1b16c0 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7980,9 +7980,11 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 gen_clrex(s);
                 return;
             case 4: /* dsb */
+                ARCH(7);
+                /* DSB is at least as strong as DMB: fall through.  */
             case 5: /* dmb */
                 ARCH(7);
-                /* We don't emulate caches so these are a no-op.  */
+                tcg_gen_mb(TCG_MB_FULL);
                 return;
             case 6: /* isb */
                 /* We need to break the TB after this insn to execute
@@ -10330,8 +10332,9 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                             gen_clrex(s);
                             break;
                         case 4: /* dsb */
+                            /* DSB: fall through to DMB.  */
                         case 5: /* dmb */
-                            /* These execute as NOPs.  */
+                            tcg_gen_mb(TCG_MB_FULL);
                             break;
                         case 6: /* isb */
                             /* We need to break the TB after this insn
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 12/13] target-alpha: Generate fence op
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (10 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 13/13] tcg: Generate fences only for SMP MTTCG guests Pranith Kumar
  2016-05-31 18:46 ` [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Richard Henderson, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 target-alpha/translate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5b86992..17b68f5 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2329,11 +2329,12 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             break;
         case 0x4000:
             /* MB */
-            /* No-op */
+            tcg_gen_mb(TCG_MB_FULL);
             break;
         case 0x4400:
             /* WMB */
-            /* No-op */
+            /* TODO: Change this to a write barrier.  */
+            tcg_gen_mb(TCG_MB_FULL);
             break;
         case 0x8000:
             /* FETCH */
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 13/13] tcg: Generate fences only for SMP MTTCG guests
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (11 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 12/13] target-alpha: Generate fence op Pranith Kumar
@ 2016-05-31 18:39 ` Pranith Kumar
  2016-05-31 18:46 ` [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:39 UTC (permalink / raw)
  To: Richard Henderson, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

We need to generate fence instructions only for SMP MTTCG guests. This
patch enforces that.
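The guard added below can be summarized as a tiny predicate (the names
here are illustrative only, not the QEMU API):

```c
#include <assert.h>
#include <stdbool.h>

/* A barrier op needs to reach the backend only when more than one
   vCPU can run concurrently, i.e. an SMP guest under MTTCG; a
   single-threaded or uniprocessor setup never reorders against
   itself. */
static bool fence_needed(bool mttcg_enabled, int smp_cpus)
{
    return mttcg_enabled && smp_cpus > 1;
}
```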

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/tcg-op.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a6f01a7..eeb0d0c 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -36,6 +36,8 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
 #define TCGV_HIGH TCGV_HIGH_link_error
 #endif
 
+extern int smp_cpus;
+
 /* Note that this is optimized for sequential allocation during translate.
    Up to and including filling in the forward link immediately.  We'll do
    proper termination of the end of the list after we finish translation.  */
@@ -145,8 +147,9 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
 
 void tcg_gen_mb(TCGArg a)
 {
-    /* ??? Enable only when MTTCG is enabled.  */
-    tcg_gen_op1(&tcg_ctx, INDEX_op_mb, 0);
+    if (qemu_tcg_mttcg_enabled() && smp_cpus > 1) {
+        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, 0);
+    }
 }
 
 /* 32 bit ops */
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support
  2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
                   ` (12 preceding siblings ...)
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 13/13] tcg: Generate fences only for SMP MTTCG guests Pranith Kumar
@ 2016-05-31 18:46 ` Pranith Kumar
  13 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-05-31 18:46 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Alex Bennée, Richard Henderson, qemu-devel, Sergey Fedorov

Added the correct email for Sergey in CC.

I apologize for getting Sergey's email wrong. Please drop/correct his
email when replying to the patches in this series; otherwise you will
see an email bounce.

On Tue, May 31, 2016 at 2:39 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> Hello,
>
> The following series adds fence instruction generation support to
> TCG. The current work has been rebased on-top of Richard's patch
> series.
>
> This has been tested and confirmed to fix ordering issues on a x86
> host with MTTCG enabled ARMv7 guest using KVM unit tests.
>
> Pranith Kumar (13):
>   Introduce TCGOpcode for memory barrier
>   tcg/i386: Add support for fence
>   tcg/aarch64: Add support for fence
>   tcg/arm: Add support for fence
>   tcg/ia64: Add support for fence
>   tcg/mips: Add support for fence
>   tcg/ppc: Add support for fence
>   tcg/s390: Add support for fence
>   tcg/sparc: Add support for fence
>   tcg/tci: Add support for fence
>   target-arm: Generate fences in ARMv7 frontend
>   target-alpha: Generate fence op
>   tcg: Generate fences only for SMP MTTCG guests
>
>  target-alpha/translate.c     |  5 +++--
>  target-arm/translate.c       |  7 +++++--
>  tcg/README                   | 17 +++++++++++++++++
>  tcg/aarch64/tcg-target.inc.c |  7 +++++++
>  tcg/arm/tcg-target.inc.c     | 12 ++++++++++++
>  tcg/i386/tcg-target.inc.c    | 35 +++++++++++++++++++++++++++++++++++
>  tcg/ia64/tcg-target.inc.c    |  5 +++++
>  tcg/mips/tcg-target.inc.c    |  6 ++++++
>  tcg/ppc/tcg-target.inc.c     |  8 ++++++++
>  tcg/s390/tcg-target.inc.c    |  9 +++++++++
>  tcg/sparc/tcg-target.inc.c   |  8 ++++++++
>  tcg/tcg-op.c                 |  9 +++++++++
>  tcg/tcg-op.h                 |  2 ++
>  tcg/tcg-opc.h                |  2 ++
>  tcg/tcg.h                    |  8 ++++++++
>  tcg/tci/tcg-target.inc.c     |  3 +++
>  tci.c                        |  3 +++
>  17 files changed, 142 insertions(+), 4 deletions(-)
>
> --
> 2.8.3
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: " Pranith Kumar
@ 2016-05-31 18:59   ` Claudio Fontana
  2016-05-31 20:34   ` Richard Henderson
  1 sibling, 0 replies; 61+ messages in thread
From: Claudio Fontana @ 2016-05-31 18:59 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Claudio Fontana, Richard Henderson, open list:AArch64 target,
	open list:All patches CC here, alex.bennee, serge.fdrv

Acked-by: Claudio Fontana <claudio.fontana@huawei.com>

On Tuesday, 31 May 2016, Pranith Kumar <bobby.prani@gmail.com> wrote:

> Cc: Claudio Fontana <claudio.fontana@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/aarch64/tcg-target.inc.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 08efdf4..c361a5c 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -360,6 +360,9 @@ typedef enum {
>      I3510_EOR       = 0x4a000000,
>      I3510_EON       = 0x4a200000,
>      I3510_ANDS      = 0x6a000000,
> +
> +    /* System instructions.  */
> +    DMB_ISH         = 0xd5033bbf,
>  } AArch64Insn;
>
>  static inline uint32_t tcg_in32(TCGContext *s)
> @@ -1625,6 +1628,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
>          break;
>
> +    case INDEX_op_mb:
> +        tcg_out32(s, DMB_ISH);
> +        break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -1749,6 +1755,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_muluh_i64, { "r", "r", "r" } },
>      { INDEX_op_mulsh_i64, { "r", "r", "r" } },
>
> +    { INDEX_op_mb, { "r" } },
>      { -1 },
>  };
>
> --
> 2.8.3
>
>

--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
@ 2016-05-31 20:24   ` Richard Henderson
  2016-06-01 18:43     ` Pranith Kumar
  2016-06-02 16:30   ` Sergey Fedorov
  1 sibling, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-05-31 20:24 UTC (permalink / raw)
  To: Pranith Kumar, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +********* Memory Barrier support
> +
> +* mb <$arg>

Document what $arg should be.

> +Generate a target memory barrier instruction to ensure memory ordering as being
> +enforced by a corresponding guest memory barrier instruction. The ordering
> +enforced by the backend may be stricter than the ordering required by the guest.
> +It cannot be weaker. This opcode takes an optional constant argument if required
> +to generate the appropriate barrier instruction. The backend should take care to

The argument is *not* optional.

> +void tcg_gen_mb(TCGArg a)
> +{
> +    /* ??? Enable only when MTTCG is enabled.  */
> +    tcg_gen_op1(&tcg_ctx, INDEX_op_mb, 0);

Pass A to tcg_gen_op1, not 0.

> +/* TCGOpmb args */
> +#define TCG_MB_FULL             ((TCGArg)(0))
> +#define TCG_MB_READ             ((TCGArg)(1))
> +#define TCG_MB_WRITE            ((TCGArg)(2))
> +#define TCG_MB_ACQUIRE          ((TCGArg)(3))
> +#define TCG_MB_RELEASE          ((TCGArg)(4))

This is, IMO, confused.  Either we should use the C++11 barrier types, or the 
Linux barrier types, but not both.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence Pranith Kumar
@ 2016-05-31 20:27   ` Richard Henderson
  2016-06-01 18:49     ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-05-31 20:27 UTC (permalink / raw)
  To: Pranith Kumar, open list:i386 target; +Cc: alex.bennee, serge.fdrv

On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +    case INDEX_op_mb:
> +        tcg_out_mb(s);

You need to look at the barrier type and DTRT.  In particular, the Linux 
smp_rmb and smp_wmb types need not emit any code.

> +    { INDEX_op_mb, { "r" } },

You certainly do *not* need the constant argument loaded into a register.  This 
should remain { }.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: " Pranith Kumar
  2016-05-31 18:59   ` Claudio Fontana
@ 2016-05-31 20:34   ` Richard Henderson
  2016-06-16 22:03     ` Pranith Kumar
  1 sibling, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-05-31 20:34 UTC (permalink / raw)
  To: Pranith Kumar, Claudio Fontana, open list:AArch64 target,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, Claudio Fontana

On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +    /* System instructions.  */
> +    DMB_ISH         = 0xd5033bbf,
...
> +    case INDEX_op_mb:
> +        tcg_out32(s, DMB_ISH);
> +        break;

With the flags argument, this needs to be split.

DMB_ISH = 0xd5033b8f
DMB_RD  = 0x00000010
DMB_WR  = 0x00000020

	if (a0 == TCG_MB_READ) {
		a0 = DMB_RD;
	} else if (a0 == TCG_MB_WRITE) {
		a0 = DMB_WR;
	} else {
		a0 = DMB_RD | DMB_WR;
	}
	tcg_out32(s, DMB_ISH | a0);



r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: " Pranith Kumar
@ 2016-05-31 20:41   ` Richard Henderson
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Henderson @ 2016-05-31 20:41 UTC (permalink / raw)
  To: Pranith Kumar, Vassili Karpov (malc), open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +#define HWSYNC XO31(598)
> +#define LWSYNC (HWSYNC | (1u << 21))
...
> +    case INDEX_op_mb:
> +        /* ??? Do we want SEQ_CST or ACQ_REL memory model.  */
> +        tcg_out32(s, HWSYNC);
> +        break;

With the flags argument, this needs to be

#define EIEIO  XO31(854)

	a0 = args[0];
	if (a0 == TCG_MB_WRITE) {
		tcg_out32(s, EIEIO);
	} else if (a0 == TCG_MB_READ) {
		tcg_out32(s, LWSYNC);
	} else {
		tcg_out32(s, HWSYNC);
	}


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: " Pranith Kumar
@ 2016-05-31 20:45   ` Richard Henderson
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Henderson @ 2016-05-31 20:45 UTC (permalink / raw)
  To: Pranith Kumar, Blue Swirl, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +    case INDEX_op_mb:
> +        /* membar #LoadLoad|#LoadStore|#StoreStore|#StoreLoad */
> +        tcg_out32(s, MEMBAR | 15);
> +        break;

With the argument, this needs to be

	if (a0 == TCG_MB_WRITE) {
		/* #StoreStore | #StoreLoad */
		a0 = 0xa;
	} else if (a0 == TCG_MB_READ) {
		/* #LoadStore | #LoadLoad */
		a0 = 0x5;
	} else {
		/* #StoreStore | #LoadStore | #StoreLoad | #LoadLoad */
		a0 = 0xf;
	}
	tcg_out32(s, MEMBAR | a0);


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-05-31 20:24   ` Richard Henderson
@ 2016-06-01 18:43     ` Pranith Kumar
  2016-06-01 21:35       ` Richard Henderson
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-01 18:43 UTC (permalink / raw)
  To: Richard Henderson
  Cc: open list:All patches CC here, Alex Bennée, Sergey Fedorov

Hi Richard,

Thanks for the review. I will make the changes you pointed out. One point below:

On Tue, May 31, 2016 at 4:24 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/31/2016 11:39 AM, Pranith Kumar wrote:

>> +/* TCGOpmb args */
>> +#define TCG_MB_FULL             ((TCGArg)(0))
>> +#define TCG_MB_READ             ((TCGArg)(1))
>> +#define TCG_MB_WRITE            ((TCGArg)(2))
>> +#define TCG_MB_ACQUIRE          ((TCGArg)(3))
>> +#define TCG_MB_RELEASE          ((TCGArg)(4))
>
>
> This is, IMO, confused.  Either we should use the C++11 barrier types, or
> the Linux barrier types, but not both.

This part of the design is still being fleshed out. The above listing
covers all the kinds of barriers we can encounter during translation.
Maybe I am missing something, but the C++11 and Linux barrier types
have higher-level semantics which are difficult to infer at the
instruction level.

All we want to do here is map a barrier instruction from the guest to
a barrier instruction on the host. This mapping is 1:1 if the host has
barrier instructions with matching semantics. Otherwise we generate a
multi-op instruction sequence. Some examples are: load acquire (ldar)
on an ARM64 guest will map to dmb+load on an ARMv7 target, store
fence (sfence) on an x86 guest will map to dmb on an ARMv7 target.

So we need to support all the kinds of barrier types we expect the
guest to have, then map them to the host.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence
  2016-05-31 20:27   ` Richard Henderson
@ 2016-06-01 18:49     ` Pranith Kumar
  2016-06-01 21:17       ` Richard Henderson
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-01 18:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: open list:i386 target, Alex Bennée, Sergey Fedorov

On Tue, May 31, 2016 at 4:27 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/31/2016 11:39 AM, Pranith Kumar wrote:
>>
>> +    case INDEX_op_mb:
>> +        tcg_out_mb(s);
>
>
> You need to look at the barrier type and DTRT.  In particular, the Linux
> smp_rmb and smp_wmb types need not emit any code.

These are converted to 'lfence' and 'sfence' instructions. Based on
the target backend, I think we still need to emit barrier
instructions. For example, if the target backend is ARMv7, we need to
emit a 'dmb' instruction for both x86 fence instructions. I am not
sure why they would not emit any code.

>
>> +    { INDEX_op_mb, { "r" } },
>
>
> You certainly do *not* need the constant argument loaded into a register.
> This should remain { }.
>

OK, I will fix this.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence
  2016-06-01 18:49     ` Pranith Kumar
@ 2016-06-01 21:17       ` Richard Henderson
  2016-06-01 21:44         ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-06-01 21:17 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: open list:i386 target, Alex Bennée, Sergey Fedorov

On 06/01/2016 11:49 AM, Pranith Kumar wrote:
> On Tue, May 31, 2016 at 4:27 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 05/31/2016 11:39 AM, Pranith Kumar wrote:
>>>
>>> +    case INDEX_op_mb:
>>> +        tcg_out_mb(s);
>>
>>
>> You need to look at the barrier type and DTRT.  In particular, the Linux
>> smp_rmb and smp_wmb types need not emit any code.
>
> These are converted to 'lfence' and 'sfence' instructions. Based on
> the target backend, I think we still need to emit barrier
> instructions. For example, if target backend is ARMv7 we need to emit
> 'dmb' instruction for both x86 fence instructions. I am not sure why
> they do not emit any code?

Because x86 has a strong memory model.

It does not require barriers to keep normal loads and stores in order.  The 
primary reason for the *fence instructions is to order the "non-temporal" 
memory operations that are part of the SSE instruction set, which we're not 
using at all.

This is why you'll find

/*
  * Because of the strongly ordered storage model, wmb() and rmb() are nops
  * here (a compiler barrier only).  QEMU doesn't do accesses to write-combining
  * qemu memory or non-temporal load/stores from C code.
  */
#define smp_wmb()   barrier()
#define smp_rmb()   barrier()

for x86 and s390.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-01 18:43     ` Pranith Kumar
@ 2016-06-01 21:35       ` Richard Henderson
  2016-06-02 16:18         ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-06-01 21:35 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: open list:All patches CC here, Alex Bennée, Sergey Fedorov

On 06/01/2016 11:43 AM, Pranith Kumar wrote:
>> This is, IMO, confused.  Either we should use the C++11 barrier types, or
>> the Linux barrier types, but not both.
>
> This part of the design is still being fleshed out. The above listing
> is all the kinds of barriers we can encounter during translation. May
> be I am missing something, but C++11 and Linux barrier types have
> higher level semantics which is difficult to infer at the instruction
> level.

Fair enough.

> All we want to do here is map a barrier instruction from guest to a
> barrier instruction on hist. This mapping is 1:1 if the host has
> barrier instructions with matching semantics. Otherwise we generate a
> multi-op instruction sequence. Some examples are: load acquire(ldar)
> on ARM64 guest will map to dmb+load on ARMv7 target, store
> fence(sfence) on x86 guest will map to dmb on ARMv7 target.

Perhaps we should model this like the Sparc membar instruction, with individual 
bits for all combinations of Load x Store.  One can then describe exactly what 
the target semantics are for each barrier.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence
  2016-06-01 21:17       ` Richard Henderson
@ 2016-06-01 21:44         ` Pranith Kumar
  0 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-06-01 21:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: open list:i386 target, Alex Bennée, Sergey Fedorov

On Wed, Jun 1, 2016 at 5:17 PM, Richard Henderson <rth@twiddle.net> wrote:
>
> Because x86 has a strong memory model.
>
> It does not require barriers to keep normal loads and stores in order.  The
> primary reason for the *fence instructions is to order the "non-temporal"
> memory operations that are part of the SSE instruction set, which we're not
> using at all.
>
> This is why you'll find
>
> /*
>  * Because of the strongly ordered storage model, wmb() and rmb() are nops
>  * here (a compiler barrier only).  QEMU doesn't do accesses to
> write-combining
>  * qemu memory or non-temporal load/stores from C code.
>  */
> #define smp_wmb()   barrier()
> #define smp_rmb()   barrier()
>
> for x86 and s390.


OK. For the x86 target, that is true. I think I got the context
confused. On an x86 target, we can elide the read and write barriers.
But we still need to generate 'mfence' to prevent store-after-load
reordering. I will refine this in the next version.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-01 21:35       ` Richard Henderson
@ 2016-06-02 16:18         ` Sergey Fedorov
  0 siblings, 0 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 16:18 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar
  Cc: open list:All patches CC here, Alex Bennée

On 02/06/16 00:35, Richard Henderson wrote:
> On 06/01/2016 11:43 AM, Pranith Kumar wrote:
>> All we want to do here is map a barrier instruction from guest to a
>> barrier instruction on host. This mapping is 1:1 if the host has
>> barrier instructions with matching semantics. Otherwise we generate a
>> multi-op instruction sequence. Some examples are: load acquire(ldar)
>> on ARM64 guest will map to dmb+load on ARMv7 target, store
>> fence(sfence) on x86 guest will map to dmb on ARMv7 target.
>
> Perhaps we should model this like the Sparc membar instruction, with
> individual bits for all combinations of Load x Store.  One can then
> describe exactly what the target semantics are for each barrier.

Seconded. Given that we don't support an Alpha host, we can ignore its
unique "Dependent loads reordered" feature. I suppose we're not going to
add Alpha host support :)

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
  2016-05-31 20:24   ` Richard Henderson
@ 2016-06-02 16:30   ` Sergey Fedorov
  2016-06-02 18:42     ` Pranith Kumar
  2016-06-02 20:36     ` Richard Henderson
  1 sibling, 2 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 16:30 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 31/05/16 21:39, Pranith Kumar wrote:
> diff --git a/tcg/README b/tcg/README
> index f4a8ac1..cfe79d7 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
>  
>  Similar to mulu2, except the two inputs T1 and T2 are signed.
>  
> +********* Memory Barrier support
> +
> +* mb <$arg>
> +
> +Generate a target memory barrier instruction to ensure memory ordering as being
> +enforced by a corresponding guest memory barrier instruction. The ordering
> +enforced by the backend may be stricter than the ordering required by the guest.
> +It cannot be weaker. This opcode takes an optional constant argument if required
> +to generate the appropriate barrier instruction. The backend should take care to
> +emit the target barrier instruction only when necessary i.e., for SMP guests and
> +when MTTCG is enabled.
> +
> +The guest translators should generate this opcode for all guest instructions
> +which have ordering side effects.

I think we need to extend TCG load/store instruction attributes to
provide information about guest ordering requirements and leave this TCG
operation only for explicit barrier instruction translation.

> +
> +Please see docs/atomics.txt for more information on memory barriers.
> +
>  ********* 64-bit guest on 32-bit host support
>  
>  The following opcodes are internal to TCG.  Thus they are to be implemented by
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index f554b86..a6f01a7 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -143,6 +143,12 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
>      tcg_emit_op(ctx, opc, pi);
>  }
>  
> +void tcg_gen_mb(TCGArg a)
> +{
> +    /* ??? Enable only when MTTCG is enabled.  */
> +    tcg_gen_op1(&tcg_ctx, INDEX_op_mb, 0);

Yes, this could be the right place to check whether MTTCG is enabled
and how many CPUs are emulated. If we do it here, then we should
adjust the documentation, which currently states that the backend
should take care of it.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 16:30   ` Sergey Fedorov
@ 2016-06-02 18:42     ` Pranith Kumar
  2016-06-02 20:36       ` Richard Henderson
  2016-06-02 20:36     ` Richard Henderson
  1 sibling, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-02 18:42 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Richard Henderson, open list:All patches CC here, serge.fdrv,
	alex.bennee


Sergey Fedorov writes:

> On 31/05/16 21:39, Pranith Kumar wrote:
>> diff --git a/tcg/README b/tcg/README
>> index f4a8ac1..cfe79d7 100644
>> --- a/tcg/README
>> +++ b/tcg/README
>> @@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
>>  
>>  Similar to mulu2, except the two inputs T1 and T2 are signed.
>>  
>> +********* Memory Barrier support
>> +
>> +* mb <$arg>
>> +
>> +Generate a target memory barrier instruction to ensure memory ordering as being
>> +enforced by a corresponding guest memory barrier instruction. The ordering
>> +enforced by the backend may be stricter than the ordering required by the guest.
>> +It cannot be weaker. This opcode takes an optional constant argument if required
>> +to generate the appropriate barrier instruction. The backend should take care to
>> +emit the target barrier instruction only when necessary i.e., for SMP guests and
>> +when MTTCG is enabled.
>> +
>> +The guest translators should generate this opcode for all guest instructions
>> +which have ordering side effects.
>
> I think we need to extend TCG load/store instruction attributes to
> provide information about guest ordering requirements and leave this TCG
> operation only for explicit barrier instruction translation.
>

Yes, I am working on adding a flag argument to the TCG MemOp which
indicates this.

>> +
>> +Please see docs/atomics.txt for more information on memory barriers.
>> +
>>  ********* 64-bit guest on 32-bit host support
>>  
>>  The following opcodes are internal to TCG.  Thus they are to be implemented by
>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>> index f554b86..a6f01a7 100644
>> --- a/tcg/tcg-op.c
>> +++ b/tcg/tcg-op.c
>> @@ -143,6 +143,12 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
>>      tcg_emit_op(ctx, opc, pi);
>>  }
>>  
>> +void tcg_gen_mb(TCGArg a)
>> +{
>> +    /* ??? Enable only when MTTCG is enabled.  */
>> +    tcg_gen_op1(&tcg_ctx, INDEX_op_mb, 0);
>
> Yes, this could be a right place to check for MTTCG enabled and number
> of CPUs emulated. If we do it here, then we should adjust the
> documentation stating that the backend should take care of it.
>

I added the check in Patch 13. I will update the documentation to reflect this.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: Add support for fence
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: " Pranith Kumar
@ 2016-06-02 19:31   ` Sergey Fedorov
  2016-06-02 20:38     ` Richard Henderson
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 19:31 UTC (permalink / raw)
  To: Pranith Kumar, Alexander Graf, Richard Henderson,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 31/05/16 21:39, Pranith Kumar wrote:
> Cc: Alexander Graf <agraf@suse.de>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/s390/tcg-target.inc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index e95b04b..b4f14bc 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -341,6 +341,7 @@ static tcg_insn_unit *tb_ret_addr;
>  #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
>  #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
>  #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
> +#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
>  
>  static uint64_t facilities;
>  
> @@ -2157,6 +2158,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tgen_deposit(s, args[0], args[2], args[3], args[4]);
>          break;
>  
> +    case INDEX_op_mb:
> +        /* The host memory model is quite strong, we simply need to
> +           serialize the instruction stream.  */
> +        tcg_out_insn(s, RR, BCR,
> +		     facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
> +        break;
> +

Do we? What does that mean?

Kind regards,
Sergey

>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -2278,6 +2286,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
>      { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
>      { INDEX_op_deposit_i64, { "r", "0", "r" } },
>  
> +    { INDEX_op_mb, { "r" } },
>      { -1 },
>  };
>  

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend
  2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
@ 2016-06-02 19:37   ` Sergey Fedorov
  2016-06-04 14:50     ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 19:37 UTC (permalink / raw)
  To: Pranith Kumar, Peter Maydell, open list:ARM,
	open list:All patches CC here
  Cc: alex.bennee, rth

On 31/05/16 21:39, Pranith Kumar wrote:
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index c946c0e..e1b16c0 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7980,9 +7980,11 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>                  gen_clrex(s);
>                  return;
>              case 4: /* dsb */
> +                ARCH(7);

ARMv7 ARM says: "A DSB behaves as a DMB with the same arguments, and
also has the additional properties...".

Kind regards,
Sergey

> +                return;
>              case 5: /* dmb */
>                  ARCH(7);
> -                /* We don't emulate caches so these are a no-op.  */
> +                tcg_gen_mb(TCG_MB_FULL);
>                  return;
>              case 6: /* isb */
>                  /* We need to break the TB after this insn to execute
> @@ -10330,8 +10332,9 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
>                              gen_clrex(s);
>                              break;
>                          case 4: /* dsb */
> +                            break;
>                          case 5: /* dmb */
> -                            /* These execute as NOPs.  */
> +                            tcg_gen_mb(TCG_MB_FULL);
>                              break;
>                          case 6: /* isb */
>                              /* We need to break the TB after this insn

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 16:30   ` Sergey Fedorov
  2016-06-02 18:42     ` Pranith Kumar
@ 2016-06-02 20:36     ` Richard Henderson
  2016-06-02 20:38       ` Sergey Fedorov
  1 sibling, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-06-02 20:36 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
> I think we need to extend TCG load/store instruction attributes to
> provide information about guest ordering requirements and leave this TCG
> operation only for explicit barrier instruction translation.

I do not agree.  I think separate barriers are much cleaner and easier to
manage and reason about.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 18:42     ` Pranith Kumar
@ 2016-06-02 20:36       ` Richard Henderson
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Henderson @ 2016-06-02 20:36 UTC (permalink / raw)
  To: Pranith Kumar, Sergey Fedorov
  Cc: open list:All patches CC here, serge.fdrv, alex.bennee

On 06/02/2016 11:42 AM, Pranith Kumar wrote:
> Yes, I am working on adding flag argument to the TCG MemOp which indicates
> this.

Please don't.  I don't think this is the right way to go.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: Add support for fence
  2016-06-02 19:31   ` Sergey Fedorov
@ 2016-06-02 20:38     ` Richard Henderson
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Henderson @ 2016-06-02 20:38 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, Alexander Graf,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/02/2016 12:31 PM, Sergey Fedorov wrote:
> On 31/05/16 21:39, Pranith Kumar wrote:
>> Cc: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  tcg/s390/tcg-target.inc.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
>> index e95b04b..b4f14bc 100644
>> --- a/tcg/s390/tcg-target.inc.c
>> +++ b/tcg/s390/tcg-target.inc.c
>> @@ -341,6 +341,7 @@ static tcg_insn_unit *tb_ret_addr;
>>  #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
>>  #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
>>  #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
>> +#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
>>
>>  static uint64_t facilities;
>>
>> @@ -2157,6 +2158,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>>          tgen_deposit(s, args[0], args[2], args[3], args[4]);
>>          break;
>>
>> +    case INDEX_op_mb:
>> +        /* The host memory model is quite strong, we simply need to
>> +           serialize the instruction stream.  */
>> +        tcg_out_insn(s, RR, BCR,
>> +		     facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
>> +        break;
>> +
>
> Do we? What does that mean?

It's the difference between ACQ_REL and SEQ_CST for s390x.  FWIW.


r~


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 20:36     ` Richard Henderson
@ 2016-06-02 20:38       ` Sergey Fedorov
  2016-06-02 21:18         ` Richard Henderson
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 20:38 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 02/06/16 23:36, Richard Henderson wrote:
> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>> I think we need to extend TCG load/store instruction attributes to
>> provide information about guest ordering requirements and leave this TCG
>> operation only for explicit barrier instruction translation.
>
> I do not agree.  I think separate barriers are much cleaner and easier
> to manage and reason with.
>

How are we going to emulate strongly-ordered guests on weakly-ordered
hosts then? I think if every load/store operation must specify which
ordering it implies then this task would be quite simple.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 20:38       ` Sergey Fedorov
@ 2016-06-02 21:18         ` Richard Henderson
  2016-06-02 21:37           ` Sergey Fedorov
  2016-06-03 18:27           ` Pranith Kumar
  0 siblings, 2 replies; 61+ messages in thread
From: Richard Henderson @ 2016-06-02 21:18 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
> On 02/06/16 23:36, Richard Henderson wrote:
>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>> I think we need to extend TCG load/store instruction attributes to
>>> provide information about guest ordering requirements and leave this TCG
>>> operation only for explicit barrier instruction translation.
>>
>> I do not agree.  I think separate barriers are much cleaner and easier
>> to manage and reason with.
>>
>
> How are we going to emulate strongly-ordered guests on weakly-ordered
> hosts then? I think if every load/store operation must specify which
> ordering it implies then this task would be quite simple.

Hum.  That does seem helpful-ish.  But I'm not certain how helpful it is to 
complicate the helper functions even further.

What if we have tcg_canonicalize_memop (or some such) split off the barriers 
into separate opcodes.  E.g.

MO_BAR_LD_B = 32	// prevent earlier loads from crossing current op
MO_BAR_ST_B = 64	// prevent earlier stores from crossing current op
MO_BAR_LD_A = 128	// prevent later loads from crossing current op
MO_BAR_ST_A = 256	// prevent later stores from crossing current op
MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A

// Match Sparc MEMBAR as the most flexible host.
TCG_BAR_LD_LD = 1	// #LoadLoad barrier
TCG_BAR_ST_LD = 2	// #StoreLoad barrier
TCG_BAR_LD_ST = 4	// #LoadStore barrier
TCG_BAR_ST_ST = 8	// #StoreStore barrier
TCG_BAR_SYNC  = 64	// SEQ_CST barrier

where

   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)

emits

   mb		TCG_BAR_LD_LD
   qemu_ld_i32	x, y, i, m
   mb		TCG_BAR_LD_ST

We can then add an optimization pass which folds barriers with no memory 
operations in between, so that duplicates are eliminated.
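
A minimal sketch of that folding pass (a toy opcode list with hypothetical names, not the actual TCG optimizer) could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the pass described above: adjacent mb ops with no memory
   operation in between are merged by OR-ing their barrier masks, so
   duplicate barriers are eliminated. */

enum { OP_MB, OP_QEMU_LD, OP_QEMU_ST };

typedef struct {
    int opc;
    unsigned barrier_mask;  /* TCG_BAR_* bits, meaningful when opc == OP_MB */
} Op;

/* Merge runs of consecutive OP_MB ops in place; returns the new count. */
static size_t fold_barriers(Op *ops, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].opc == OP_MB && out > 0 && ops[out - 1].opc == OP_MB) {
            ops[out - 1].barrier_mask |= ops[i].barrier_mask;
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```

For instance, with back-to-back loads the emitted sequence is mb; ld; mb; mb; ld; mb, and the two barriers between the loads collapse into one mb carrying the union of both masks.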


r~


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 21:18         ` Richard Henderson
@ 2016-06-02 21:37           ` Sergey Fedorov
  2016-06-03  1:08             ` Richard Henderson
  2016-06-03 18:27           ` Pranith Kumar
  1 sibling, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-02 21:37 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 03/06/16 00:18, Richard Henderson wrote:
> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>> On 02/06/16 23:36, Richard Henderson wrote:
>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>> I think we need to extend TCG load/store instruction attributes to
>>>> provide information about guest ordering requirements and leave
>>>> this TCG
>>>> operation only for explicit barrier instruction translation.
>>>
>>> I do not agree.  I think separate barriers are much cleaner and easier
>>> to manage and reason with.
>>>
>>
>> How are we going to emulate strongly-ordered guests on weakly-ordered
>> hosts then? I think if every load/store operation must specify which
>> ordering it implies then this task would be quite simple.
>
> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it
> is to complicate the helper functions even further.
>
> What if we have tcg_canonicalize_memop (or some such) split off the
> barriers into separate opcodes.  E.g.
>
> MO_BAR_LD_B = 32    // prevent earlier loads from crossing current op
> MO_BAR_ST_B = 64    // prevent earlier stores from crossing current op
> MO_BAR_LD_A = 128    // prevent later loads from crossing current op
> MO_BAR_ST_A = 256    // prevent later stores from crossing current op
> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>
> // Match Sparc MEMBAR as the most flexible host.
> TCG_BAR_LD_LD = 1    // #LoadLoad barrier
> TCG_BAR_ST_LD = 2    // #StoreLoad barrier
> TCG_BAR_LD_ST = 4    // #LoadStore barrier
> TCG_BAR_ST_ST = 8    // #StoreStore barrier
> TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>
> where
>
>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_BEFORE | MO_BAR_ST_AFTER)
>
> emits
>
>   mb        TCG_BAR_LD_LD
>   qemu_ld_i32    x, y, i, m
>   mb        TCG_BAR_LD_ST
>
> We can then add an optimization pass which folds barriers with no
> memory operations in between, so that duplicates are eliminated.

It would give us three TCG operations for each memory operation instead
of one. But then we might like to combine these barrier operations back
with memory operations in each backend. If we propagate memory ordering
semantics up to the backend, it can decide itself what instructions are
best to generate.

So I would just focus on translating only explicit memory barrier
operations for now.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 21:37           ` Sergey Fedorov
@ 2016-06-03  1:08             ` Richard Henderson
  2016-06-03 15:16               ` Sergey Fedorov
  2016-06-03 18:30               ` Pranith Kumar
  0 siblings, 2 replies; 61+ messages in thread
From: Richard Henderson @ 2016-06-03  1:08 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
> On 03/06/16 00:18, Richard Henderson wrote:
>> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>>> On 02/06/16 23:36, Richard Henderson wrote:
>>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>>> I think we need to extend TCG load/store instruction attributes to
>>>>> provide information about guest ordering requirements and leave
>>>>> this TCG
>>>>> operation only for explicit barrier instruction translation.
>>>>
>>>> I do not agree.  I think separate barriers are much cleaner and easier
>>>> to manage and reason with.
>>>>
>>>
>>> How are we going to emulate strongly-ordered guests on weakly-ordered
>>> hosts then? I think if every load/store operation must specify which
>>> ordering it implies then this task would be quite simple.
>>
>> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it
>> is to complicate the helper functions even further.
>>
>> What if we have tcg_canonicalize_memop (or some such) split off the
>> barriers into separate opcodes.  E.g.
>>
>> MO_BAR_LD_B = 32    // prevent earlier loads from crossing current op
>> MO_BAR_ST_B = 64    // prevent earlier stores from crossing current op
>> MO_BAR_LD_A = 128    // prevent later loads from crossing current op
>> MO_BAR_ST_A = 256    // prevent later stores from crossing current op
>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>
>> // Match Sparc MEMBAR as the most flexible host.
>> TCG_BAR_LD_LD = 1    // #LoadLoad barrier
>> TCG_BAR_ST_LD = 2    // #StoreLoad barrier
>> TCG_BAR_LD_ST = 4    // #LoadStore barrier
>> TCG_BAR_ST_ST = 8    // #StoreStore barrier
>> TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>>
>> where
>>
>>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_BEFORE | MO_BAR_ST_AFTER)
>>
>> emits
>>
>>   mb        TCG_BAR_LD_LD
>>   qemu_ld_i32    x, y, i, m
>>   mb        TCG_BAR_LD_ST
>>
>> We can then add an optimization pass which folds barriers with no
>> memory operations in between, so that duplicates are eliminated.
>
> It would give us three TCG operations for each memory operation instead
> of one. But then we might like to combine these barrier operations back
> with memory operations in each backend. If we propagate memory ordering
> semantics up to the backend, it can decide itself what instructions are
> best to generate.

A strongly ordered target would generally only set BEFORE bits or AFTER bits, 
but not both (and I suggest we canonicalize on AFTER for all such targets). 
Thus a strongly ordered target would produce only 2 opcodes per memory op.

I supplied both to make it easier to handle a weakly ordered target with 
acquire/release bits.

I would *not* combine the barrier operations back with memory operations in the 
backend.  Only armv8 and ia64 can do that, and given the optimization level at 
which we generate code, I doubt it would really make much difference above 
separate barriers.

> So I would just focus on translating only explicit memory barrier
> operations for now.

Then why did you bring it up?


r~


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03  1:08             ` Richard Henderson
@ 2016-06-03 15:16               ` Sergey Fedorov
  2016-06-03 15:45                 ` Richard Henderson
  2016-06-03 18:30               ` Pranith Kumar
  1 sibling, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-03 15:16 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 03/06/16 04:08, Richard Henderson wrote:
> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>> On 03/06/16 00:18, Richard Henderson wrote:
>>> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>>>> On 02/06/16 23:36, Richard Henderson wrote:
>>>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>>>> I think we need to extend TCG load/store instruction attributes to
>>>>>> provide information about guest ordering requirements and leave
>>>>>> this TCG
>>>>>> operation only for explicit barrier instruction translation.
>>>>>
>>>>> I do not agree.  I think separate barriers are much cleaner and
>>>>> easier
>>>>> to manage and reason with.
>>>>>
>>>>
>>>> How are we going to emulate strongly-ordered guests on weakly-ordered
>>>> hosts then? I think if every load/store operation must specify which
>>>> ordering it implies then this task would be quite simple.
>>>
>>> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it
>>> is to complicate the helper functions even further.
>>>
>>> What if we have tcg_canonicalize_memop (or some such) split off the
>>> barriers into separate opcodes.  E.g.
>>>
>>> MO_BAR_LD_B = 32    // prevent earlier loads from crossing current op
>>> MO_BAR_ST_B = 64    // prevent earlier stores from crossing current op
>>> MO_BAR_LD_A = 128    // prevent later loads from crossing current op
>>> MO_BAR_ST_A = 256    // prevent later stores from crossing current op
>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>
>>> // Match Sparc MEMBAR as the most flexible host.
>>> TCG_BAR_LD_LD = 1    // #LoadLoad barrier
>>> TCG_BAR_ST_LD = 2    // #StoreLoad barrier
>>> TCG_BAR_LD_ST = 4    // #LoadStore barrier
>>> TCG_BAR_ST_ST = 8    // #StoreStore barrier
>>> TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>>>
>>> where
>>>
>>>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_BEFORE | MO_BAR_ST_AFTER)
>>>
>>> emits
>>>
>>>   mb        TCG_BAR_LD_LD
>>>   qemu_ld_i32    x, y, i, m
>>>   mb        TCG_BAR_LD_ST
>>>
>>> We can then add an optimization pass which folds barriers with no
>>> memory operations in between, so that duplicates are eliminated.
>>
>> It would give us three TCG operations for each memory operation instead
>> of one. But then we might like to combine these barrier operations back
>> with memory operations in each backend. If we propagate memory ordering
>> semantics up to the backend, it can decide itself what instructions are
>> best to generate.
>
> A strongly ordered target would generally only set BEFORE bits or
> AFTER bits, but not both (and I suggest we canonicalize on AFTER for
> all such targets). Thus a strongly ordered target would produce only 2
> opcodes per memory op.
>
> I supplied both to make it easier to handle a weakly ordered target
> with acquire/release bits.
>
> I would *not* combine the barrier operations back with memory
> operations in the backend.  Only armv8 and ia64 can do that, and given
> the optimization level at which we generate code, I doubt it would
> really make much difference above separate barriers.

So your suggestion is to generate different TCG opcode sequences
depending on the underlying target architecture? And you are against
forwarding this task further, to the backend code?

>
>> So I would just focus on translating only explicit memory barrier
>> operations for now.
>
> Then why did you bring it up?

I'm not sure I got the question right. I suggested avoiding this TCG
operation when emulating the guest's memory ordering requirements on
loads/stores. Instead, each load/store can carry its ordering
requirements as an attribute, and each backend can decide how to
translate them together with the load/store (possibly just ignoring
them, as is the case for strongly-ordered hosts). I think we only need
to translate explicit memory barrier instructions.

For example, emulating ARM guest on x86 host requires ARM dmb
instruction to be translated to x86 mfence instruction to prevent
store-after-load reordering. At the same time, we don't have to generate
anything special for loads/stores since x86 is a strongly-ordered
architecture.
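
As a toy illustration of this point (hypothetical helper, not real backend code), with the TCG_BAR_* bits proposed earlier in the thread an x86 backend would only need a real fence for barriers that include store-after-load ordering:

```c
#include <assert.h>

enum {
    TCG_BAR_LD_LD = 1,   /* #LoadLoad   */
    TCG_BAR_ST_LD = 2,   /* #StoreLoad  */
    TCG_BAR_LD_ST = 4,   /* #LoadStore  */
    TCG_BAR_ST_ST = 8    /* #StoreStore */
};

/* x86 is TSO: only store->load reordering can be observed, so only a
   barrier containing TCG_BAR_ST_LD needs an mfence; all other masks can
   generate no host code at all. */
static int x86_needs_mfence(unsigned mask)
{
    return (mask & TCG_BAR_ST_LD) != 0;
}
```

A guest dmb, which requires all four orderings, would thus map to mfence, while a pure store-store barrier would be a no-op on this host.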

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 15:16               ` Sergey Fedorov
@ 2016-06-03 15:45                 ` Richard Henderson
  2016-06-03 16:06                   ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-06-03 15:45 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/03/2016 08:16 AM, Sergey Fedorov wrote:
> On 03/06/16 04:08, Richard Henderson wrote:
> So your suggestion is to generate different TCG opcode sequences
> depending on the underlying target architecture? And you are against
> forwarding this task further, to the backend code?

Yes, I would prefer to have, in the opcode stream, a separate opcode for 
barriers.  This aids both the common case, where most of our hosts require 
separate barriers, as well as simplicity.

I am not opposed to letting the translators describe the memory model with 
barrier data along with memory operations, but I'd really prefer that those be 
split apart during initial opcode generation.

>>> So I would just focus on translating only explicit memory barrier
>>> operations for now.
>>
>> Then why did you bring it up?
>
> I'm not sure I got the question right. I suggested to avoid using this
> TCG operation to emulate guest's memory ordering requirements for
> loads/stores that can be supplied with memory ordering requirement
> information which each backend can decide how to translate together with
> the load/store (possibly just ignore it as it is the case for
> strongly-ordered hosts). I think we just need to translate explicit
> memory barrier instructions.
>
> For example, emulating ARM guest on x86 host requires ARM dmb
> instruction to be translated to x86 mfence instruction to prevent
> store-after-load reordering. At the same time, we don't have to generate
> anything special for loads/stores since x86 is a strongly-ordered
> architecture.

Ah, so you'd prefer that we not think about optimizing barriers at the moment. 
Fine, but I'd prefer to think about *how* they might be optimized now, so that 
we *can* later.


r~


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 15:45                 ` Richard Henderson
@ 2016-06-03 16:06                   ` Sergey Fedorov
  0 siblings, 0 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-03 16:06 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 03/06/16 18:45, Richard Henderson wrote:
> On 06/03/2016 08:16 AM, Sergey Fedorov wrote:
>> On 03/06/16 04:08, Richard Henderson wrote:
>> So your suggestion is to generate different TCG opcode sequences
>> depending on the underlying target architecture? And you are against
>> forwarding this task further, to the backend code?
>
> Yes, I would prefer to have, in the opcode stream, a separate opcode
> for barriers.  This aids both the common case, where most of our hosts
> require separate barriers, as well as simplicity.
>
> I am not opposed to letting the translators describe the memory model
> with barrier data along with memory operations, but I'd really prefer
> that those be split apart during initial opcode generation.
>
>>>> So I would just focus on translating only explicit memory barrier
>>>> operations for now.
>>>
>>> Then why did you bring it up?
>>
>> I'm not sure I got the question right. I suggested to avoid using this
>> TCG operation to emulate guest's memory ordering requirements for
>> loads/stores that can be supplied with memory ordering requirement
>> information which each backend can decide how to translate together with
>> the load/store (possibly just ignore it as it is the case for
>> strongly-ordered hosts). I think we just need to translate explicit
>> memory barrier instructions.
>>
>> For example, emulating ARM guest on x86 host requires ARM dmb
>> instruction to be translated to x86 mfence instruction to prevent
>> store-after-load reordering. At the same time, we don't have to generate
>> anything special for loads/stores since x86 is a strongly-ordered
>> architecture.
>
> Ah, so you'd prefer that we not think about optimizing barriers at the
> moment. Fine, but I'd prefer to think about *how* they might be
> optimized now, so that we *can* later.

Not exactly. We need to have a TCG operation for various types of
explicit barriers in order to translate guest explicit barrier
instructions. I like your idea to follow Sparc's way to specify membar
instruction attributes which can be used by the backend for generating
optimal instructions. I think we also need to associate memory ordering
attributes with load/store TCG operations. I'm not sure how best to
handle implicit load/store memory ordering requirements, but it
is probably out of the scope of this series. I'd propagate this
attribute up to the backend and let it decide what kind of instructions
to generate. I'd prefer to see only explicit barrier operations in the
TCG opcode stream.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-02 21:18         ` Richard Henderson
  2016-06-02 21:37           ` Sergey Fedorov
@ 2016-06-03 18:27           ` Pranith Kumar
  2016-06-03 19:52             ` Sergey Fedorov
  2016-06-06 15:44             ` Sergey Fedorov
  1 sibling, 2 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-06-03 18:27 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Sergey Fedorov, open list:All patches CC here, serge.fdrv,
	Alex Bennée

On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>
> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it is to
> complicate the helper functions even further.
>
> What if we have tcg_canonicalize_memop (or some such) split off the barriers
> into separate opcodes.  E.g.
>
> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>
> // Match Sparc MEMBAR as the most flexible host.
> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
> TCG_BAR_LD_ST = 4       // #LoadStore barrier
> TCG_BAR_ST_ST = 8       // #StoreStore barrier
> TCG_BAR_SYNC  = 64      // SEQ_CST barrier

I really like this format. I would also like to add to the frontend:

MO_BAR_ACQUIRE
MO_BAR_RELEASE

and the following to the backend:

TCG_BAR_ACQUIRE
TCG_BAR_RELEASE

since these are one-way barriers and the previous barrier types do not
cover them.
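
One hedged sketch of such one-way flags at the MemOp level, built from the MO_BAR_* bits quoted below (toy values, purely illustrative):

```c
#include <assert.h>

/* Toy values copied from the proposal in this thread. */
enum {
    MO_BAR_LD_B = 32,  MO_BAR_ST_B = 64,   /* "before" bits */
    MO_BAR_LD_A = 128, MO_BAR_ST_A = 256   /* "after" bits  */
};

/* Hypothetical one-way flags: an acquire load sets only the "after"
   bits (later accesses may not move up past it), a release store only
   the "before" bits (earlier accesses may not move down past it).
   Setting just one side is exactly what makes them one-way. */
enum {
    MO_BAR_ACQUIRE = MO_BAR_LD_A | MO_BAR_ST_A,
    MO_BAR_RELEASE = MO_BAR_LD_B | MO_BAR_ST_B
};

/* A one-way flag orders accesses on exactly one side of the op. */
static int mo_bar_is_one_way(unsigned m)
{
    unsigned before = m & (MO_BAR_LD_B | MO_BAR_ST_B);
    unsigned after  = m & (MO_BAR_LD_A | MO_BAR_ST_A);
    return (before == 0) != (after == 0);
}
```

Whether the backend-side TCG_BAR_ACQUIRE/TCG_BAR_RELEASE need to be distinct bits remains the open question.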

>
> where
>
>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_BEFORE | MO_BAR_ST_AFTER)
>
> emits
>
>   mb            TCG_BAR_LD_LD
>   qemu_ld_i32   x, y, i, m
>   mb            TCG_BAR_LD_ST
>
> We can then add an optimization pass which folds barriers with no memory
> operations in between, so that duplicates are eliminated.
>

Yes, folding/eliding these barriers in an optimization pass sounds
like a good idea.

Thanks,
-- 
Pranith


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03  1:08             ` Richard Henderson
  2016-06-03 15:16               ` Sergey Fedorov
@ 2016-06-03 18:30               ` Pranith Kumar
  2016-06-03 19:49                 ` Sergey Fedorov
  1 sibling, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-03 18:30 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Sergey Fedorov, open list:All patches CC here, Alex Bennée

On Thu, Jun 2, 2016 at 9:08 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>>
>>
>> It would give us three TCG operations for each memory operation instead
>> of one. But then we might like to combine these barrier operations back
>> with memory operations in each backend. If we propagate memory ordering
>> semantics up to the backend, it can decide itself what instructions are
>> best to generate.
>
>
> A strongly ordered target would generally only set BEFORE bits or AFTER
> bits, but not both (and I suggest we canonicalize on AFTER for all such
> targets). Thus a strongly ordered target would produce only 2 opcodes per
> memory op.
>
> I supplied both to make it easier to handle a weakly ordered target with
> acquire/release bits.
>
> I would *not* combine the barrier operations back with memory operations in
> the backend.  Only armv8 and ia64 can do that, and given the optimization
> level at which we generate code, I doubt it would really make much
> difference above separate barriers.
>

On armv8, using load_acquire/store_release instructions makes a
significant difference in performance when compared to a plain
dmb+memory instruction sequence. So I would really like to keep the
option of generating acq/rel instructions (by combining barrier and
memory op or some other way) open.
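
A hedged sketch of the combining being argued for here (hypothetical helper, toy bit values from the proposal earlier in the thread):

```c
#include <assert.h>

enum {
    TCG_BAR_LD_LD = 1,
    TCG_BAR_ST_LD = 2,
    TCG_BAR_LD_ST = 4,
    TCG_BAR_ST_ST = 8
};

/* Assuming the mb directly after a qemu_ld was generated from that
   load's own "after" flags, an aarch64 backend could fold it into the
   load and emit ldar instead of ldr + dmb whenever the barrier only
   orders this load against later loads/stores (acquire semantics). */
static int can_fold_into_ldar(unsigned bar_after_load)
{
    unsigned acquire = TCG_BAR_LD_LD | TCG_BAR_LD_ST;
    return bar_after_load != 0 && (bar_after_load & ~acquire) == 0;
}
```

Any stronger mask (e.g. one including #StoreLoad) would still need a separate barrier instruction.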

Thanks,
-- 
Pranith


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 18:30               ` Pranith Kumar
@ 2016-06-03 19:49                 ` Sergey Fedorov
  2016-06-03 20:43                   ` Peter Maydell
  2016-06-06 16:19                   ` Alex Bennée
  0 siblings, 2 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-03 19:49 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson
  Cc: open list:All patches CC here, Alex Bennée

On 03/06/16 21:30, Pranith Kumar wrote:
> On Thu, Jun 2, 2016 at 9:08 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>>>
>>> It would give us three TCG operations for each memory operation instead
>>> of one. But then we might like to combine these barrier operations back
>>> with memory operations in each backend. If we propagate memory ordering
>>> semantics up to the backend, it can decide itself what instructions are
>>> best to generate.
>>
>> A strongly ordered target would generally only set BEFORE bits or AFTER
>> bits, but not both (and I suggest we canonicalize on AFTER for all such
>> targets). Thus a strongly ordered target would produce only 2 opcodes per
>> memory op.
>>
>> I supplied both to make it easier to handle a weakly ordered target with
>> acquire/release bits.
>>
>> I would *not* combine the barrier operations back with memory operations in
>> the backend.  Only armv8 and ia64 can do that, and given the optimization
>> level at which we generate code, I doubt it would really make much
>> difference above separate barriers.
>>
> On armv8, using load_acquire/store_release instructions makes a
> significant difference in performance when compared to plain
> dmb+memory instruction sequence. So I would really like to keep the
> option of generating acq/rel instructions(by combining barrier and
> memory or some other way) open.

I'm not so sure about acq/rel flags. Is there any architecture which has
explicit acq/rel barriers? I suppose acq/rel memory access instructions
are always load-link and store-conditional and thus rely on exclusive
memory monitor to support that "conditional" behaviour. To emulate this
behaviour we need something more specialized, like "Slow-path for atomic
instruction translation" [1].

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/407419

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 18:27           ` Pranith Kumar
@ 2016-06-03 19:52             ` Sergey Fedorov
  2016-06-06 15:44             ` Sergey Fedorov
  1 sibling, 0 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-03 19:52 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson
  Cc: open list:All patches CC here, serge.fdrv, Alex Bennée

On 03/06/16 21:27, Pranith Kumar wrote:
> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>> >
>> > Hum.  That does seem helpful-ish.  But I'm not certain how helpful it is to
>> > complicate the helper functions even further.
>> >
>> > What if we have tcg_canonicalize_memop (or some such) split off the barriers
>> > into separate opcodes.  E.g.
>> >
>> > MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>> > MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>> > MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>> > MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>> > MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>> > MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>> > MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>> >
>> > // Match Sparc MEMBAR as the most flexible host.
>> > TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>> > TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>> > TCG_BAR_LD_ST = 4       // #LoadStore barrier
>> > TCG_BAR_ST_ST = 8       // #StoreStore barrier
>> > TCG_BAR_SYNC  = 64      // SEQ_CST barrier
> I really like this format. I would also like to add to the frontend:
>
> MO_BAR_ACQUIRE
> MO_BAR_RELEASE
>
> and the following to the backend:
>
> TCG_BAR_ACQUIRE
> TCG_BAR_RELEASE
>
> since these are one-way barriers and the previous barrier types do not
> cover them.
>

Well, my last comment about acq/rel barriers should really go here
[Message-Id: 5751DF2D.5040709@gmail.com]

Regards,
Sergey


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 19:49                 ` Sergey Fedorov
@ 2016-06-03 20:43                   ` Peter Maydell
  2016-06-03 21:33                     ` Sergey Fedorov
  2016-06-06 16:19                   ` Alex Bennée
  1 sibling, 1 reply; 61+ messages in thread
From: Peter Maydell @ 2016-06-03 20:43 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Pranith Kumar, Richard Henderson, Alex Bennée,
	open list:All patches CC here

On 3 June 2016 at 20:49, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> I'm not so sure about acq/rel flags. Is there any architecture which has
> explicit acq/rel barriers? I suppose acq/rel memory access instructions
> are always load-link and store-conditional and thus rely on exclusive
> memory monitor to support that "conditional" behaviour.

This doesn't sound right (at least not for ARM). You can have
load-acquire and store-release insns which aren't exclusives,
you can have exclusives which aren't acq/rel, and you can
have accesses which are both exclusives and acq/rel (and
just to complete the set, obviously there are accesses which
are neither). The exclusive semantics require the monitor, but
acq/rel is just an ordering constraint (sort of like an implicit
barrier, but not quite).
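
The four combinations can be illustrated with AArch64 load mnemonics (a hypothetical helper for illustration, not QEMU code):

```c
#include <assert.h>
#include <string.h>

/* Exclusivity (the monitor) and acquire ordering are orthogonal
   properties of an AArch64 load. */
static const char *a64_load_mnemonic(int exclusive, int acquire)
{
    if (exclusive && acquire) {
        return "ldaxr";  /* exclusive load with acquire ordering */
    } else if (exclusive) {
        return "ldxr";   /* exclusive: sets the monitor, no ordering */
    } else if (acquire) {
        return "ldar";   /* acquire ordering, no monitor */
    }
    return "ldr";        /* plain load */
}
```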

thanks
-- PMM


* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 20:43                   ` Peter Maydell
@ 2016-06-03 21:33                     ` Sergey Fedorov
  0 siblings, 0 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-03 21:33 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Pranith Kumar, Richard Henderson, Alex Bennée,
	open list:All patches CC here

On 03/06/16 23:43, Peter Maydell wrote:
> On 3 June 2016 at 20:49, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> I'm not so sure about acq/rel flags. Is there any architecture which has
>> explicit acq/rel barriers? I suppose acq/rel memory access instructions
>> are always load-link and store-conditional and thus rely on exclusive
>> memory monitor to support that "conditional" behaviour.
> This doesn't sound right (at least not for ARM). You can have
> load-acquire and store-release insns which aren't exclusives,
> you can have exclusives which aren't acq/rel, and you can
> have accesses which are both exclusives and acq/rel (and
> just to complete the set, obviously there are accesses which
> are neither). The exclusive semantics require the monitor, but
> acq/rel is just an ordering constraint (sort of like an implicit
> barrier, but not quite).

Thanks for clarifying this, Peter.

Regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend
  2016-06-02 19:37   ` Sergey Fedorov
@ 2016-06-04 14:50     ` Pranith Kumar
  0 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-06-04 14:50 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Peter Maydell, open list:ARM, open list:All patches CC here,
	Alex Bennée, Richard Henderson

On Thu, Jun 2, 2016 at 3:37 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 31/05/16 21:39, Pranith Kumar wrote:
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  target-arm/translate.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/target-arm/translate.c b/target-arm/translate.c
>> index c946c0e..e1b16c0 100644
>> --- a/target-arm/translate.c
>> +++ b/target-arm/translate.c
>> @@ -7980,9 +7980,11 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>>                  gen_clrex(s);
>>                  return;
>>              case 4: /* dsb */
>> +                ARCH(7);
>
> ARMv7 ARM says: "A DSB behaves as a DMB with the same arguments, and
> also has the additional properties...".
>

OK. I was just focusing on dmb for now. I will add translations for all
the other barriers too in the next version.

Thanks!
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 18:27           ` Pranith Kumar
  2016-06-03 19:52             ` Sergey Fedorov
@ 2016-06-06 15:44             ` Sergey Fedorov
  2016-06-06 15:47               ` Pranith Kumar
  1 sibling, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-06 15:44 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson
  Cc: open list:All patches CC here, serge.fdrv, Alex Bennée

On 03/06/16 21:27, Pranith Kumar wrote:
> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it is to
>> complicate the helper functions even further.
>>
>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>> into separate opcodes.  E.g.
>>
>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>
>> // Match Sparc MEMBAR as the most flexible host.
>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
> I really like this format. I would also like to add to the frontend:
>
> MO_BAR_ACQUIRE
> MO_BAR_RELEASE
>
> and the following to the backend:
>
> TCG_BAR_ACQUIRE
> TCG_BAR_RELEASE
>
> since these are one-way barriers and the previous barrier types do not
> cover them.

Actually, the acquire barrier is a combined load-load + load-store
barrier; and the release barrier is a combo of load-store + store-store
barriers.

Kind regards,
Sergey

>
>> where
>>
>>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_BEFORE | MO_BAR_ST_AFTER)
>>
>> emits
>>
>>   mb            TCG_BAR_LD_LD
>>   qemu_ld_i32   x, y, i, m
>>   mb            TCG_BAR_LD_ST
>>
>> We can then add an optimization pass which folds barriers with no memory
>> operations in between, so that duplicates are eliminated.
>>
> Yes, folding/eliding these barriers in an optimization pass sounds
> like a good idea.
>
> Thanks,

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 15:44             ` Sergey Fedorov
@ 2016-06-06 15:47               ` Pranith Kumar
  2016-06-06 15:49                 ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-06 15:47 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Richard Henderson, open list:All patches CC here, serge.fdrv,
	Alex Bennée

On Mon, Jun 6, 2016 at 11:44 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 03/06/16 21:27, Pranith Kumar wrote:
>> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>
>>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>>> into separate opcodes.  E.g.
>>>
>>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>
>>> // Match Sparc MEMBAR as the most flexible host.
>>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
>> I really like this format. I would also like to add to the frontend:
>>
>
> Actually, the acquire barrier is a combined load-load + load-store
> barrier; and the release barrier is a combo of load-store + store-store
> barriers.
>

All the above are two-way barriers. Acquire/Release are one-way
barriers. So we cannot combine the above to get acquire/release
semantics without being conservative.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 15:47               ` Pranith Kumar
@ 2016-06-06 15:49                 ` Sergey Fedorov
  2016-06-06 15:58                   ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-06 15:49 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Richard Henderson, open list:All patches CC here, serge.fdrv,
	Alex Bennée

On 06/06/16 18:47, Pranith Kumar wrote:
> On Mon, Jun 6, 2016 at 11:44 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 03/06/16 21:27, Pranith Kumar wrote:
>>> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>>>> into separate opcodes.  E.g.
>>>>
>>>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>>>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>>>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>>>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>>
>>>> // Match Sparc MEMBAR as the most flexible host.
>>>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>>>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>>>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>>>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>>>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
>>> I really like this format. I would also like to add to the frontend:
>>>
>> Actually, the acquire barrier is a combined load-load + load-store
>> barrier; and the release barrier is a combo of load-store + store-store
>> barriers.
>>
> All the above are two-way barriers. Acquire/Release are one-way
> barriers. So we cannot combine the above to get acquire/release
> semantics without being conservative.

Do you mean *barriers* or *memory access* operations implying memory
ordering?

Regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 15:49                 ` Sergey Fedorov
@ 2016-06-06 15:58                   ` Pranith Kumar
  2016-06-06 16:14                     ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-06 15:58 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Richard Henderson, open list:All patches CC here, Alex Bennée

On Mon, Jun 6, 2016 at 11:49 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 06/06/16 18:47, Pranith Kumar wrote:
>> On Mon, Jun 6, 2016 at 11:44 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>>> On 03/06/16 21:27, Pranith Kumar wrote:
>>>> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>>>>> into separate opcodes.  E.g.
>>>>>
>>>>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>>>>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>>>>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>>>>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>>>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>>>
>>>>> // Match Sparc MEMBAR as the most flexible host.
>>>>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>>>>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>>>>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>>>>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>>>>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
>>>> I really like this format. I would also like to add to the frontend:
>>>>
>>> Actually, the acquire barrier is a combined load-load + load-store
>>> barrier; and the release barrier is a combo of load-store + store-store
>>> barriers.
>>>
>> All the above are two-way barriers. Acquire/Release are one-way
>> barriers. So we cannot combine the above to get acquire/release
>> semantics without being conservative.
>
> Do you mean *barriers* or *memory access* operations implying memory
> ordering?

I meant the latter. I know of no arch which has acquire/release barriers.
Sorry for the confusion.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 15:58                   ` Pranith Kumar
@ 2016-06-06 16:14                     ` Sergey Fedorov
  2016-06-06 17:11                       ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-06 16:14 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Richard Henderson, open list:All patches CC here, Alex Bennée

On 06/06/16 18:58, Pranith Kumar wrote:
> On Mon, Jun 6, 2016 at 11:49 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 06/06/16 18:47, Pranith Kumar wrote:
>>> On Mon, Jun 6, 2016 at 11:44 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>>>> On 03/06/16 21:27, Pranith Kumar wrote:
>>>>> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>>>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>>>>>> into separate opcodes.  E.g.
>>>>>>
>>>>>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>>>>>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>>>>>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>>>>>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>>>>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>>>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>>>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>>>>
>>>>>> // Match Sparc MEMBAR as the most flexible host.
>>>>>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>>>>>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>>>>>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>>>>>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>>>>>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
>>>>> I really like this format. I would also like to add to the frontend:
>>>>>
>>>> Actually, the acquire barrier is a combined load-load + load-store
>>>> barrier; and the release barrier is a combo of load-store + store-store
>>>> barriers.
>>>>
>>> All the above are two-way barriers. Acquire/Release are one-way
>>> barriers. So we cannot combine the above to get acquire/release
>>> semantics without being conservative.
>> Do you mean *barriers* or *memory access* operations implying memory
>> ordering?
> I meant the latter. I know no arch which has acquire/release barriers.
> Sorry for the confusion.

So am I. By the way, what's the difference between a sequentially
consistent *barrier* and a combination of all the other barriers?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-03 19:49                 ` Sergey Fedorov
  2016-06-03 20:43                   ` Peter Maydell
@ 2016-06-06 16:19                   ` Alex Bennée
  1 sibling, 0 replies; 61+ messages in thread
From: Alex Bennée @ 2016-06-06 16:19 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Pranith Kumar, Richard Henderson, open list:All patches CC here


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 21:30, Pranith Kumar wrote:
>> On Thu, Jun 2, 2016 at 9:08 PM, Richard Henderson <rth@twiddle.net> wrote:
>>> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>>>>
>>>> It would give us three TCG operations for each memory operation instead
>>>> of one. But then we might like to combine these barrier operations back
>>>> with memory operations in each backend. If we propagate memory ordering
>>>> semantics up to the backend, it can decide itself what instructions are
>>>> best to generate.
>>>
>>> A strongly ordered target would generally only set BEFORE bits or AFTER
>>> bits, but not both (and I suggest we canonicalize on AFTER for all such
>>> targets). Thus a strongly ordered target would produce only 2 opcodes per
>>> memory op.
>>>
>>> I supplied both to make it easier to handle a weakly ordered target with
>>> acquire/release bits.
>>>
>>> I would *not* combine the barrier operations back with memory operations in
>>> the backend.  Only armv8 and ia64 can do that, and given the optimization
>>> level at which we generate code, I doubt it would really make much
>>> difference above separate barriers.
>>>
>> On armv8, using load_acquire/store_release instructions makes a
>> significant difference in performance when compared to plain
>> dmb+memory instruction sequence. So I would really like to keep the
>> option of generating acq/rel instructions(by combining barrier and
>> memory or some other way) open.
>
> I'm not so sure about acq/rel flags. Is there any architecture which has
> explicit acq/rel barriers? I suppose acq/rel memory access instructions
> are always load-link and store-conditional and thus rely on exclusive
> memory monitor to support that "conditional" behaviour.

Nope, you can have acq/rel memory operations without exclusive
operations (see ARMv8 ldar and stlr). The exclusive operations also have
ordered and non-ordered variants (ldxr, stxr).

> To emulate this
> behaviour we need something more special like "Slow-path for atomic
> instruction translation" [1].
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/407419
>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 16:14                     ` Sergey Fedorov
@ 2016-06-06 17:11                       ` Pranith Kumar
  2016-06-06 19:23                         ` Richard Henderson
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-06 17:11 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Richard Henderson, open list:All patches CC here, Alex Bennée

On Mon, Jun 6, 2016 at 12:14 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 06/06/16 18:58, Pranith Kumar wrote:
>> On Mon, Jun 6, 2016 at 11:49 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>>> On 06/06/16 18:47, Pranith Kumar wrote:
>>>> On Mon, Jun 6, 2016 at 11:44 AM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>>>>> On 03/06/16 21:27, Pranith Kumar wrote:
>>>>>> On Thu, Jun 2, 2016 at 5:18 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>>>>> What if we have tcg_canonicalize_memop (or some such) split off the barriers
>>>>>>> into separate opcodes.  E.g.
>>>>>>>
>>>>>>> MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
>>>>>>> MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
>>>>>>> MO_BAR_LD_A = 128       // prevent later loads from crossing current op
>>>>>>> MO_BAR_ST_A = 256       // prevent later stores from crossing current op
>>>>>>> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>>>>>> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>>>>>> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>>>>>>>
>>>>>>> // Match Sparc MEMBAR as the most flexible host.
>>>>>>> TCG_BAR_LD_LD = 1       // #LoadLoad barrier
>>>>>>> TCG_BAR_ST_LD = 2       // #StoreLoad barrier
>>>>>>> TCG_BAR_LD_ST = 4       // #LoadStore barrier
>>>>>>> TCG_BAR_ST_ST = 8       // #StoreStore barrier
>>>>>>> TCG_BAR_SYNC  = 64      // SEQ_CST barrier
>>>>>> I really like this format. I would also like to add to the frontend:
>>>>>>
>>>>> Actually, the acquire barrier is a combined load-load + load-store
>>>>> barrier; and the release barrier is a combo of load-store + store-store
>>>>> barriers.
>>>>>
>>>> All the above are two-way barriers. Acquire/Release are one-way
>>>> barriers. So we cannot combine the above to get acquire/release
>>>> semantics without being conservative.
>>> Do you mean *barriers* or *memory access* operations implying memory
>>> ordering?
>> I meant the latter. I know no arch which has acquire/release barriers.
>> Sorry for the confusion.
>
> So I am. By the way, what's the difference between sequential consistent
> *barrier* and a combination of all the other barriers?


If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
other four barriers. I am not sure if we can just construct SYNC like
this or if we need to define it explicitly though.

-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 17:11                       ` Pranith Kumar
@ 2016-06-06 19:23                         ` Richard Henderson
  2016-06-06 19:28                           ` Pranith Kumar
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Henderson @ 2016-06-06 19:23 UTC (permalink / raw)
  To: Pranith Kumar, Sergey Fedorov
  Cc: open list:All patches CC here, Alex Bennée

On 06/06/2016 10:11 AM, Pranith Kumar wrote:
> If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
> other four barriers. I am not sure if we can just construct SYNC like
> this or if we need to define it explicitly though.

AFAICS, sparc membar #sync is stronger.

Compare PowerPC hwsync and lwsync.  I would define lwsync as the OR of the 
other 4 barriers, but hwsync as TCG_BAR_SYNC.


r~

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 19:23                         ` Richard Henderson
@ 2016-06-06 19:28                           ` Pranith Kumar
  2016-06-06 20:30                             ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Pranith Kumar @ 2016-06-06 19:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Sergey Fedorov, open list:All patches CC here, Alex Bennée

On Mon, Jun 6, 2016 at 3:23 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 06/06/2016 10:11 AM, Pranith Kumar wrote:
>>
>> If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
>> other four barriers. I am not sure if we can just construct SYNC like
>> this or if we need to define it explicitly though.
>
>
> AFAICS, sparc membar #sync is stronger.

I tried looking it up but it's not clear. How is it stronger? And do
we need those strong guarantees in our front-end/back-end?

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 19:28                           ` Pranith Kumar
@ 2016-06-06 20:30                             ` Sergey Fedorov
  2016-06-06 21:00                               ` Peter Maydell
  0 siblings, 1 reply; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-06 20:30 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson
  Cc: open list:All patches CC here, Alex Bennée

On 06/06/16 22:28, Pranith Kumar wrote:
> On Mon, Jun 6, 2016 at 3:23 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 06/06/2016 10:11 AM, Pranith Kumar wrote:
>>> If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
>>> other four barriers. I am not sure if we can just construct SYNC like
>>> this or if we need to define it explicitly though.
>>
>> AFAICS, sparc membar #sync is stronger.
> I tried looking it up but it's not clear. How is it stronger? And do
> we need those strong guarantees in our front-end/back-end?

That is not clear to me either :( AFAIU, PPC's lwsync does allow stores
to be reordered after loads, but hwsync does not. I suspect Sparc's membar
#Sync is used to ensure that some system operations are complete before
proceeding with execution. I'm not sure we need to introduce this into
TCG. It needs to be clear what it is and how to use it.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 20:30                             ` Sergey Fedorov
@ 2016-06-06 21:00                               ` Peter Maydell
  2016-06-06 21:49                                 ` Sergey Fedorov
  0 siblings, 1 reply; 61+ messages in thread
From: Peter Maydell @ 2016-06-06 21:00 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Pranith Kumar, Richard Henderson, Alex Bennée,
	open list:All patches CC here

On 6 June 2016 at 21:30, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 06/06/16 22:28, Pranith Kumar wrote:
>> On Mon, Jun 6, 2016 at 3:23 PM, Richard Henderson <rth@twiddle.net> wrote:
>>> On 06/06/2016 10:11 AM, Pranith Kumar wrote:
>>>> If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
>>>> other four barriers. I am not sure if we can just construct SYNC like
>>>> this or if we need to define it explicitly though.
>>>
>>> AFAICS, sparc membar #sync is stronger.
>> I tried looking it up but it's not clear. How is it stronger? And do
>> we need those strong guarantees in our front-end/back-end?
>
> That is not clear for me either :( AFAIU, PPC's lwsync does allow stores
> to be reordered after loads but hwsync - not.

Yes, from the PoV of the other CPU. That is, for write-then-read by
CPU 0, CPU 0 will always read what it wrote, but other CPUs don't
necessarily see the write before the read is satisfied.
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
describes the difference in sections 3.2 and 3.3, and has an
example in section 6 of a situation which requires a full
(hw)sync and an lwsync is not sufficient.

> I suspect Sparc's membar
> #Sync is used to ensure that some system operations are complete before
> proceeding with execution. I'm not sure we need to introduce this into
> TCG. It needs to be clear what is it and how to use it.

My reading of the manual is that the SPARC "membar #Sync" is like ARM
ISB -- it enforces that any instruction (whether a memory access or not)
before it must finish before anything after it can start. It only
affects the CPU that issues it (assuming you didn't also specify
any of the bits requesting memory barriers!) Since TCG doesn't attempt
to reorder instructions, we likely don't need to do anything except
maybe end the current TB. Also if we're still trying to do TLB
operations on other CPUs asynchronously we need to wait for them to
finish; I forget what the conclusion was on that idea.
PPC equivalent insn is isync I think.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
  2016-06-06 21:00                               ` Peter Maydell
@ 2016-06-06 21:49                                 ` Sergey Fedorov
  0 siblings, 0 replies; 61+ messages in thread
From: Sergey Fedorov @ 2016-06-06 21:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Pranith Kumar, Richard Henderson, Alex Bennée,
	open list:All patches CC here

On 07/06/16 00:00, Peter Maydell wrote:
> On 6 June 2016 at 21:30, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 06/06/16 22:28, Pranith Kumar wrote:
>>> On Mon, Jun 6, 2016 at 3:23 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>> On 06/06/2016 10:11 AM, Pranith Kumar wrote:
>>>>> If I read it correctly TCG_BAR_SYNC is equivalent to OR of all the
>>>>> other four barriers. I am not sure if we can just construct SYNC like
>>>>> this or if we need to define it explicitly though.
>>>> AFAICS, sparc membar #sync is stronger.
>>> I tried looking it up but it's not clear. How is it stronger? And do
>>> we need those strong guarantees in our front-end/back-end?
>> That is not clear for me either :( AFAIU, PPC's lwsync does allow stores
>> to be reordered after loads but hwsync - not.
> Yes, from the PoV of the other CPU. That is, for write-then-read by
> CPU 0, CPU 0 will always read what it wrote, but other CPUs don't
> necessarily see the write before the read is satisfied.
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> describes the difference in sections 3.2 and 3.3, and has an
> example in section 6 of a situation which requires a full
> (hw)sync and an lwsync is not sufficient.
>
>> I suspect Sparc's membar
>> #Sync is used to ensure that some system operations are complete before
>> proceeding with execution. I'm not sure we need to introduce this into
>> TCG. It needs to be clear what is it and how to use it.
> My reading of the manual is that the SPARC "membar #Sync" is like ARM
> ISB -- it enforces that any instruction (whether a memory access or not)
> before it must finish before anything after it can start. It only
> affects the CPU that issues it (assuming you didn't also specify
> any of the bits requesting memory barriers!) Since TCG doesn't attempt
> to reorder instructions, we likely don't need to do anything except
> maybe end the current TB. Also if we're still trying to do TLB
> operations on other CPUs asynchronously we need to wait for them to
> finish; I forget what the conclusion was on that idea.
> PPC equivalent insn is isync I think.
>

Thanks for commenting on this, Peter. AFAIU, a sequentially consistent
barrier is stronger than a release-acquire barrier because it provides
"transitivity/commutativity" [1]. This is what a general barrier
guarantees in Linux [2]. I especially like this piece of the description
from [2]:

    ... if this example runs on a system where CPUs 1 and 2 share a
    store buffer or a level of cache, CPU 2 might have early access to
    CPU 1's writes. General barriers are therefore required to ensure
    that all CPUs agree on the combined order of CPU 1's and CPU 2's
    accesses.

The current Linux kernel implements Sparc's smp_mb()/__smp_mb()/mb() with
"membar #StoreLoad" [3]. So we'll probably be fine with just RR, RW, WR,
and WW bits in the TCG memory barrier operation attribute.

Kind regards,
Sergey

[1] https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync#Overall_Summary
[2]
http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt#L1268
[3]
http://lxr.free-electrons.com/source/arch/sparc/include/asm/barrier_64.h#L36

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: Add support for fence
  2016-05-31 20:34   ` Richard Henderson
@ 2016-06-16 22:03     ` Pranith Kumar
  0 siblings, 0 replies; 61+ messages in thread
From: Pranith Kumar @ 2016-06-16 22:03 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Claudio Fontana, open list:AArch64 target,
	open list:All patches CC here, Alex Bennée, Claudio Fontana,
	Sergey Fedorov

Hi Richard,

On Tue, May 31, 2016 at 4:34 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/31/2016 11:39 AM, Pranith Kumar wrote:
>>
>> +    /* System instructions.  */
>> +    DMB_ISH         = 0xd5033bbf,
>
> ...
>>
>> +    case INDEX_op_mb:
>> +        tcg_out32(s, DMB_ISH);
>> +        break;
>
>
> With the flags argument, this needs to be split.
>
> DMB_ISH = 0xd5033b8f

I just checked this. Shouldn't this be as follows:

DMB_ISH = 0xd50338bf
DMB_RD  = 0x00000100
DMB_WR  = 0x00000200

The logic seems to be ok.

Thanks!

> DMB_RD  = 0x00000010
> DMB_WR  = 0x00000020
>
>         if (a0 == TCG_MB_READ) {
>                 a0 = DMB_RD;
>         } else if (a0 == TCG_MB_WRITE) {
>                 a0 = DMB_WR;
>         } else {
>                 a0 = DMB_RD | DMB_WR;
>         }
>         tcg_out32(s, DMB_ISH | a0);
>
>
>
> r~



-- 
Pranith

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2016-06-16 22:03 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 18:39 [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier Pranith Kumar
2016-05-31 20:24   ` Richard Henderson
2016-06-01 18:43     ` Pranith Kumar
2016-06-01 21:35       ` Richard Henderson
2016-06-02 16:18         ` Sergey Fedorov
2016-06-02 16:30   ` Sergey Fedorov
2016-06-02 18:42     ` Pranith Kumar
2016-06-02 20:36       ` Richard Henderson
2016-06-02 20:36     ` Richard Henderson
2016-06-02 20:38       ` Sergey Fedorov
2016-06-02 21:18         ` Richard Henderson
2016-06-02 21:37           ` Sergey Fedorov
2016-06-03  1:08             ` Richard Henderson
2016-06-03 15:16               ` Sergey Fedorov
2016-06-03 15:45                 ` Richard Henderson
2016-06-03 16:06                   ` Sergey Fedorov
2016-06-03 18:30               ` Pranith Kumar
2016-06-03 19:49                 ` Sergey Fedorov
2016-06-03 20:43                   ` Peter Maydell
2016-06-03 21:33                     ` Sergey Fedorov
2016-06-06 16:19                   ` Alex Bennée
2016-06-03 18:27           ` Pranith Kumar
2016-06-03 19:52             ` Sergey Fedorov
2016-06-06 15:44             ` Sergey Fedorov
2016-06-06 15:47               ` Pranith Kumar
2016-06-06 15:49                 ` Sergey Fedorov
2016-06-06 15:58                   ` Pranith Kumar
2016-06-06 16:14                     ` Sergey Fedorov
2016-06-06 17:11                       ` Pranith Kumar
2016-06-06 19:23                         ` Richard Henderson
2016-06-06 19:28                           ` Pranith Kumar
2016-06-06 20:30                             ` Sergey Fedorov
2016-06-06 21:00                               ` Peter Maydell
2016-06-06 21:49                                 ` Sergey Fedorov
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 02/13] tcg/i386: Add support for fence Pranith Kumar
2016-05-31 20:27   ` Richard Henderson
2016-06-01 18:49     ` Pranith Kumar
2016-06-01 21:17       ` Richard Henderson
2016-06-01 21:44         ` Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 03/13] tcg/aarch64: " Pranith Kumar
2016-05-31 18:59   ` Claudio Fontana
2016-05-31 20:34   ` Richard Henderson
2016-06-16 22:03     ` Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 04/13] tcg/arm: " Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 05/13] tcg/ia64: " Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 06/13] tcg/mips: " Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 07/13] tcg/ppc: " Pranith Kumar
2016-05-31 20:41   ` Richard Henderson
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 08/13] tcg/s390: " Pranith Kumar
2016-06-02 19:31   ` Sergey Fedorov
2016-06-02 20:38     ` Richard Henderson
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 09/13] tcg/sparc: " Pranith Kumar
2016-05-31 20:45   ` Richard Henderson
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 10/13] tcg/tci: " Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 11/13] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
2016-06-02 19:37   ` Sergey Fedorov
2016-06-04 14:50     ` Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 12/13] target-alpha: Generate fence op Pranith Kumar
2016-05-31 18:39 ` [Qemu-devel] [RFC v2 PATCH 13/13] tcg: Generate fences only for SMP MTTCG guests Pranith Kumar
2016-05-31 18:46 ` [Qemu-devel] [RFC v2 PATCH 00/13] tcg: Add fence gen support Pranith Kumar
