* [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE)
@ 2015-05-08 21:02 Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson


Hi all,

These are patches I've been working on for some time now.
Since the emulation of atomic instructions has recently been getting
attention ([1], [2]), I'm submitting them for comment.

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/314406
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561

Main features of this design:

- Performance and scalability are the main design goals: guest code should
  scale as well as it would when running natively on the host.

  For this, a host lock is (if necessary) assigned to each 16-byte
  aligned chunk of the physical address space.
  The assignment (i.e. lock allocation + registration) only happens
  the first time an atomic operation is performed on a particular
  physical address. To keep track of this sparse set of locks, a
  lockless radix tree is used, so lookups are fast and scalable.

- Translation helpers are employed to call the 'aie' module, which is
  the common code that accesses the radix tree, locking the appropriate
  entry depending on the access's physical address.

- No special host atomic instructions (e.g. cmpxchg16b) are required;
  mutexes and include/qemu/atomic.h are all that's needed.

- Usermode and full-system are supported with the same code. Note that
  the newly added tiny_set module is necessary to properly emulate LL/SC,
  since the number of "cpus" (i.e. threads) is unbounded in usermode;
  for full-system mode a bitmap would have been sufficient.

- ARM: Stores concurrent with LL/SC primitives are initially not dealt
  with.
  This is a deliberate choice, since I'm assuming most sane code will
  only handle shared data atomically using LL/SC primitives. However,
  SWP can be used, so whenever a SWP instruction is issued, stores start
  being checked for clashes with concurrent SWP instructions. This is
  implemented via pre/post-store helpers. I've stress-tested this with a
  heavily contended guest lock (64 cores), and it works fine. Executing
  non-trivial pre/post-store helpers adds a 5% perf overhead to Linux
  bootup, and is negligible on regular programs. Anyway, most sane code
  doesn't use SWP (Linux bootup certainly doesn't), so this overhead is
  rarely seen.

- x86: Instead of acquiring the same host lock every time LOCK is found,
  the acquisition of an AIE lock (via the radix tree) is done when the
  address of the ensuing load/store is known.
  Loads perform this check at compile-time.
  Stores are emulated using the same trick as in ARM; non-atomic stores
  are executed as atomic stores iff there's a prior atomic operation that
  has been executed on their target address. This ensures, for instance,
  that a regular store cannot race with a cmpxchg.
  This has very small overhead (negligible with OpenSSL's bntest in
  user-only mode), and scales like native code.

- Barriers: not emulated yet. They're needed to correctly run non-trivial
  lockless code (I'm using concurrencykit's testbenches).
  The strongly-ordered-guest-on-weakly-ordered-host problem remains; my
  guess is that we'll have to sacrifice single-threaded performance to
  make it work (e.g. using pre/post ld/st helpers).

- 64-bit guest on 32-bit host: Not supported yet. Note that 64-bit
  loads/stores on a 32-bit host are not atomic, yet 64-bit guest code
  might have been written assuming that they are. Checks for this will
  be needed.

- Other ISAs: not done yet, but they should be like either ARM or x86.

- License of new files: is there a preferred license for new code?

- Please tolerate the lack of comments in the code and commit logs; when
  preparing this RFC I thought it better to put all the info here.
  If this weren't an RFC I'd have done it differently.

Thanks for reading this far, comments welcome!

		Emilio

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:51   ` Richard Henderson
  2015-05-08 21:02 ` [Qemu-devel] [RFC 2/8] softmmu: add helpers to get ld/st physical addresses Emilio G. Cota
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

Having the physical address in the TLB entry will allow us
to portably obtain the physical address of a memory access,
which will prove useful when implementing a scalable emulation
of atomic instructions.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 cputlb.c                |  1 +
 include/exec/cpu-defs.h | 11 ++++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 7606548..2cd5912 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -330,6 +330,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     } else {
         te->addr_write = -1;
     }
+    te->addr_phys = paddr;
 }
 
 /* Add a new TLB entry, but without specifying the memory
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 3f56546..67aa0a0 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -75,10 +75,10 @@ typedef uint64_t target_ulong;
 /* use a fully associative victim tlb of 8 entries */
 #define CPU_VTLB_SIZE 8
 
-#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
-#define CPU_TLB_ENTRY_BITS 4
-#else
+#if TARGET_LONG_BITS == 32
 #define CPU_TLB_ENTRY_BITS 5
+#else
+#define CPU_TLB_ENTRY_BITS 6
 #endif
 
 typedef struct CPUTLBEntry {
@@ -91,13 +91,14 @@ typedef struct CPUTLBEntry {
     target_ulong addr_read;
     target_ulong addr_write;
     target_ulong addr_code;
+    target_ulong addr_phys;
     /* Addend to virtual address to get host address.  IO accesses
        use the corresponding iotlb value.  */
     uintptr_t addend;
     /* padding to get a power of two size */
     uint8_t dummy[(1 << CPU_TLB_ENTRY_BITS) -
-                  (sizeof(target_ulong) * 3 +
-                   ((-sizeof(target_ulong) * 3) & (sizeof(uintptr_t) - 1)) +
+                  (sizeof(target_ulong) * 4 +
+                   ((-sizeof(target_ulong) * 4) & (sizeof(uintptr_t) - 1)) +
                    sizeof(uintptr_t))];
 } CPUTLBEntry;
 
-- 
1.8.3


* [Qemu-devel] [RFC 2/8] softmmu: add helpers to get ld/st physical addresses
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 3/8] tiny_set: add module to test for membership in a tiny set of pointers Emilio G. Cota
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

This will be used by the atomic instruction emulation code.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 softmmu_template.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h          |  5 +++++
 2 files changed, 53 insertions(+)

diff --git a/softmmu_template.h b/softmmu_template.h
index 16b0852..cf33d77 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -465,6 +465,54 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 #endif
 }
 
+#if DATA_SIZE == 1
+
+/* get a load's physical address */
+hwaddr helper_ret_get_ld_phys(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr)
+{
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    CPUTLBEntry *te = &env->tlb_table[mmu_idx][index];
+    target_ulong taddr;
+    target_ulong phys_addr;
+
+    retaddr -= GETPC_ADJ;
+    taddr = te->addr_read & (TARGET_PAGE_MASK | TLB_INVALID_MASK);
+    if (taddr != (addr & TARGET_PAGE_MASK)) {
+        if (!VICTIM_TLB_HIT(addr_read)) {
+            CPUState *cs = ENV_GET_CPU(env);
+
+            tlb_fill(cs, addr, MMU_DATA_LOAD, mmu_idx, retaddr);
+        }
+    }
+    phys_addr = te->addr_phys;
+    return phys_addr | (addr & ~TARGET_PAGE_MASK);
+}
+
+/* get a store's physical address */
+hwaddr helper_ret_get_st_phys(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr)
+{
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    CPUTLBEntry *te = &env->tlb_table[mmu_idx][index];
+    target_ulong taddr;
+    target_ulong phys_addr;
+
+    retaddr -= GETPC_ADJ;
+    taddr = te->addr_write & (TARGET_PAGE_MASK | TLB_INVALID_MASK);
+    if (taddr != (addr & TARGET_PAGE_MASK)) {
+        if (!VICTIM_TLB_HIT(addr_write)) {
+            CPUState *cs = ENV_GET_CPU(env);
+
+            tlb_fill(cs, addr, MMU_DATA_STORE, mmu_idx, retaddr);
+        }
+    }
+    phys_addr = te->addr_phys;
+    return phys_addr | (addr & ~TARGET_PAGE_MASK);
+}
+
+#endif /* DATA_SIZE == 1 */
+
 #if DATA_SIZE > 1
 void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        int mmu_idx, uintptr_t retaddr)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index fbb3daf..e84cf45 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -930,6 +930,11 @@ void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
 void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        int mmu_idx, uintptr_t retaddr);
 
+hwaddr helper_ret_get_ld_phys(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr);
+hwaddr helper_ret_get_st_phys(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr);
+
 /* Temporary aliases until backends are converted.  */
 #ifdef TARGET_WORDS_BIGENDIAN
 # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
-- 
1.8.3


* [Qemu-devel] [RFC 3/8] tiny_set: add module to test for membership in a tiny set of pointers
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 2/8] softmmu: add helpers to get ld/st physical addresses Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 4/8] radix-tree: add generic lockless radix tree module Emilio G. Cota
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

This will be used by the atomic instruction emulation code.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/tiny_set.h | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 include/qemu/tiny_set.h

diff --git a/include/qemu/tiny_set.h b/include/qemu/tiny_set.h
new file mode 100644
index 0000000..b9aa049
--- /dev/null
+++ b/include/qemu/tiny_set.h
@@ -0,0 +1,90 @@
+/*
+ * tiny_set - simple data structure for fast lookups on tiny sets
+ *
+ * Assumptions:
+ * - Sets are tiny, i.e. of up to a few dozen items
+ * - All operations are serialised by some external lock
+ * - Only non-NULL pointers are supported
+ * - No values stored! This is a set, and thus key-only
+ * - No check for duplicates on insert
+ *
+ * Alternatives:
+ * - bitmap: if the number of items in the set is small and bounded
+ * - hash table: if the set is not tiny
+ *
+ * Complexity:
+ * - O(n) lookup/removal, O(1) insert
+ */
+#ifndef TINY_SET_H
+#define TINY_SET_H
+
+#include <stdbool.h>
+#include <assert.h>
+
+#include <glib.h>
+
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+
+typedef struct tiny_set TinySet;
+
+struct tiny_set {
+    const void **items;
+    int max_items;
+    int n_items;
+};
+
+static inline void tiny_set_init(TinySet *ts)
+{
+    memset(ts, 0, sizeof(*ts));
+}
+
+static inline void tiny_set_insert(TinySet *ts, const void *key)
+{
+    assert(key);
+
+    if (unlikely(ts->n_items == ts->max_items)) {
+        ts->max_items = ts->max_items ? ts->max_items * 2 : 1;
+        ts->items = g_realloc_n(ts->items, ts->max_items, sizeof(*ts->items));
+    }
+    ts->items[ts->n_items] = key;
+    ts->n_items++;
+}
+
+/* returns true if key was removed, false if it wasn't found */
+static inline bool tiny_set_remove(TinySet *ts, const void *key)
+{
+    bool ret = false;
+    int i;
+
+    assert(key);
+
+    for (i = 0; i < ts->n_items; i++) {
+        if (ts->items[i] == key) {
+            ts->items[i] = NULL;
+            ret = true;
+        }
+    }
+    return ret;
+}
+
+static inline void tiny_set_remove_all(TinySet *ts)
+{
+    ts->n_items = 0;
+}
+
+static inline bool tiny_set_contains(TinySet *ts, const void *key)
+{
+    int i;
+
+    assert(key);
+
+    for (i = 0; i < ts->n_items; i++) {
+        if (ts->items[i] == key) {
+            return true;
+        }
+    }
+    return false;
+}
+
+#endif /* TINY_SET_H */
-- 
1.8.3


* [Qemu-devel] [RFC 4/8] radix-tree: add generic lockless radix tree module
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (2 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 3/8] tiny_set: add module to test for membership in a tiny set of pointers Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation Emilio G. Cota
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

This will be used by atomic instruction emulation code.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/radix-tree.h | 29 ++++++++++++++++++
 util/Makefile.objs        |  2 +-
 util/radix-tree.c         | 77 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/radix-tree.h
 create mode 100644 util/radix-tree.c

diff --git a/include/qemu/radix-tree.h b/include/qemu/radix-tree.h
new file mode 100644
index 0000000..a4e1f97
--- /dev/null
+++ b/include/qemu/radix-tree.h
@@ -0,0 +1,29 @@
+#ifndef RADIX_TREE_H
+#define RADIX_TREE_H
+
+#include <stddef.h>
+
+typedef struct QemuRadixNode QemuRadixNode;
+typedef struct QemuRadixTree QemuRadixTree;
+
+struct QemuRadixNode {
+    void *slots[0];
+};
+
+struct QemuRadixTree {
+    QemuRadixNode *root;
+    int radix;
+    int max_height;
+};
+
+void qemu_radix_tree_init(QemuRadixTree *tree, int bits, int radix);
+void *qemu_radix_tree_find_alloc(QemuRadixTree *tree, unsigned long index,
+                                 void *(*create)(unsigned long),
+                                 void (*delete)(void *));
+
+static inline void *qemu_radix_tree_find(QemuRadixTree *t, unsigned long index)
+{
+    return qemu_radix_tree_find_alloc(t, index, NULL, NULL);
+}
+
+#endif /* RADIX_TREE_H */
diff --git a/util/Makefile.objs b/util/Makefile.objs
index ceaba30..253f91e 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -1,4 +1,4 @@
-util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
+util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o radix-tree.o
 util-obj-$(CONFIG_WIN32) += oslib-win32.o qemu-thread-win32.o event_notifier-win32.o
 util-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o event_notifier-posix.o qemu-openpty.o
 util-obj-y += envlist.o path.o module.o
diff --git a/util/radix-tree.c b/util/radix-tree.c
new file mode 100644
index 0000000..9d4ebfd
--- /dev/null
+++ b/util/radix-tree.c
@@ -0,0 +1,77 @@
+/*
+ * radix-tree.c
+ * Non-blocking radix tree.
+ *
+ * Features:
+ * - Concurrent lookups and inserts.
+ * - No support for deletions.
+ *
+ * Conventions:
+ * - Height is counted starting from 0 at the bottom.
+ * - The index is used from left to right, i.e. MSBs are used first. This way
+ *   nearby addresses land in nearby slots, minimising cache/TLB misses.
+ *
+ * License: XXX
+ */
+#include <glib.h>
+
+#include "qemu/radix-tree.h"
+#include "qemu/atomic.h"
+#include "qemu/bitops.h"
+#include "qemu/osdep.h"
+
+typedef struct QemuRadixNode QemuRadixNode;
+
+void *qemu_radix_tree_find_alloc(QemuRadixTree *tree, unsigned long index,
+                                 void *(*create)(unsigned long),
+                                 void (*delete)(void *))
+{
+    QemuRadixNode *parent;
+    QemuRadixNode *node = tree->root;
+    void **slot;
+    int n_slots = BIT(tree->radix);
+    int level = tree->max_height - 1;
+    int shift = (level - 1) * tree->radix;
+
+    do {
+        parent = node;
+        slot = parent->slots + ((index >> shift) & (n_slots - 1));
+        node = atomic_read(slot);
+        smp_read_barrier_depends();
+        if (node == NULL) {
+            void *old;
+            void *new;
+
+            if (!create) {
+                return NULL;
+            }
+
+            if (level == 1) {
+                node = create(index);
+            } else {
+                node = g_malloc0(sizeof(*node) + sizeof(void *) * n_slots);
+            }
+            new = node;
+            /* atomic_cmpxchg is type-safe so we cannot use 'node' here */
+            old = atomic_cmpxchg(slot, NULL, new);
+            if (old) {
+                if (level == 1) {
+                    delete(node);
+                } else {
+                    g_free(node);
+                }
+                node = old;
+            }
+        }
+        shift -= tree->radix;
+        level--;
+    } while (level > 0);
+    return node;
+}
+
+void qemu_radix_tree_init(QemuRadixTree *tree, int bits, int radix)
+{
+    tree->radix = radix;
+    tree->max_height = 1 + DIV_ROUND_UP(bits, radix);
+    tree->root = g_malloc0(sizeof(*tree->root) + sizeof(void *) * BIT(radix));
+}
-- 
1.8.3


* [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (3 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 4/8] radix-tree: add generic lockless radix tree module Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 22:41   ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 6/8] aie: add target helpers Emilio G. Cota
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 Makefile.target    |  1 +
 aie.c              | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qemu/aie.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 translate-all.c    |  2 ++
 4 files changed, 103 insertions(+)
 create mode 100644 aie.c
 create mode 100644 include/qemu/aie.h

diff --git a/Makefile.target b/Makefile.target
index 1083377..ab2fe6c 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -83,6 +83,7 @@ all: $(PROGS) stap
 #########################################################
 # cpu emulator library
 obj-y = exec.o translate-all.o cpu-exec.o
+obj-y += aie.o
 obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
 obj-$(CONFIG_TCG_INTERPRETER) += tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
diff --git a/aie.c b/aie.c
new file mode 100644
index 0000000..a22098b
--- /dev/null
+++ b/aie.c
@@ -0,0 +1,52 @@
+/*
+ * Atomic instruction emulation (AIE).
+ * This applies to LL/SC and higher-order atomic instructions.
+ * More info:
+ *   http://en.wikipedia.org/wiki/Load-link/store-conditional
+ * License: XXX
+ */
+#include "qemu-common.h"
+#include "qemu/radix-tree.h"
+#include "qemu/thread.h"
+#include "qemu/aie.h"
+
+#if defined(CONFIG_USER_ONLY)
+# define AIE_FULL_ADDR_BITS  TARGET_VIRT_ADDR_SPACE_BITS
+#else
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+/* in this case QEMU restricts the maximum RAM size to fit in the host */
+# define AIE_FULL_ADDR_BITS  HOST_LONG_BITS
+#else
+# define AIE_FULL_ADDR_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+#endif /* CONFIG_USER_ONLY */
+
+#define AIE_ADDR_BITS  (AIE_FULL_ADDR_BITS - AIE_DISCARD_BITS)
+#define AIE_RADIX     8
+
+QemuRadixTree aie_rtree;
+
+static inline void *aie_entry_init(unsigned long index)
+{
+    AIEEntry *entry;
+
+    entry = qemu_memalign(64, sizeof(*entry));
+    tiny_set_init(&entry->ts);
+    qemu_mutex_init(&entry->lock);
+    return entry;
+}
+
+AIEEntry *aie_entry_get_lock(hwaddr addr)
+{
+    aie_addr_t laddr = to_aie(addr);
+    AIEEntry *e;
+
+    e = qemu_radix_tree_find_alloc(&aie_rtree, laddr, aie_entry_init, g_free);
+    qemu_mutex_lock(&e->lock);
+    return e;
+}
+
+void aie_init(void)
+{
+    qemu_radix_tree_init(&aie_rtree, AIE_ADDR_BITS, AIE_RADIX);
+}
diff --git a/include/qemu/aie.h b/include/qemu/aie.h
new file mode 100644
index 0000000..ae4e85e
--- /dev/null
+++ b/include/qemu/aie.h
@@ -0,0 +1,48 @@
+/*
+ * Atomic instruction emulation (AIE)
+ * License: XXX
+ */
+#ifndef AIE_H
+#define AIE_H
+
+#include "qemu/radix-tree.h"
+#include "qemu/tiny_set.h"
+#include "qemu/thread.h"
+#include "qemu/bitops.h"
+
+#include "exec/hwaddr.h"
+
+typedef hwaddr aie_addr_t;
+
+typedef struct AIEEntry AIEEntry;
+
+struct AIEEntry {
+    union {
+        struct {
+            TinySet ts;
+            QemuMutex lock;
+        };
+        uint8_t pad[64];
+    };
+} __attribute((aligned(64)));
+
+/* support atomic ops of up to 16-byte size */
+#define AIE_DISCARD_BITS 4
+
+extern QemuRadixTree aie_rtree;
+
+static inline aie_addr_t to_aie(hwaddr paddr)
+{
+    return paddr >> AIE_DISCARD_BITS;
+}
+
+void aie_init(void);
+
+AIEEntry *aie_entry_get_lock(hwaddr addr);
+
+static inline bool aie_entry_exists(hwaddr addr)
+{
+    return !!qemu_radix_tree_find(&aie_rtree, to_aie(addr));
+}
+
+#endif /* AIE_H */
diff --git a/translate-all.c b/translate-all.c
index 65a76c5..c12f333 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -61,6 +61,7 @@
 #include "translate-all.h"
 #include "qemu/bitmap.h"
 #include "qemu/timer.h"
+#include "qemu/aie.h"
 
 //#define DEBUG_TB_INVALIDATE
 //#define DEBUG_FLUSH
@@ -684,6 +685,7 @@ void tcg_exec_init(unsigned long tb_size)
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     tcg_register_jit(tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_buffer_size);
     page_init();
+    aie_init();
 #if !defined(CONFIG_USER_ONLY) || !defined(CONFIG_USE_GUEST_BASE)
     /* There's no guest base to take into account, so go ahead and
        initialize the prologue now.  */
-- 
1.8.3


* [Qemu-devel] [RFC 6/8] aie: add target helpers
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (4 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 7/8] target-arm: emulate atomic instructions using AIE Emilio G. Cota
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 aie-helper.c              | 179 ++++++++++++++++++++++++++++++++++++++++++++++
 include/exec/cpu-defs.h   |   5 ++
 include/qemu/aie-helper.h |  21 ++++++
 3 files changed, 205 insertions(+)
 create mode 100644 aie-helper.c
 create mode 100644 include/qemu/aie-helper.h

diff --git a/aie-helper.c b/aie-helper.c
new file mode 100644
index 0000000..8bc8955
--- /dev/null
+++ b/aie-helper.c
@@ -0,0 +1,179 @@
+/*
+ * To be included directly from the target's helper.c
+ */
+#include "qemu/aie.h"
+
+#ifdef CONFIG_USER_ONLY
+static inline hwaddr h_get_ld_phys(CPUArchState *env, target_ulong vaddr)
+{
+    return vaddr;
+}
+
+static inline hwaddr h_get_st_phys(CPUArchState *env, target_ulong vaddr)
+{
+    return vaddr;
+}
+#else
+/* these need to be macros due to GETRA() */
+#define h_get_ld_phys(env, vaddr)                                       \
+    helper_ret_get_ld_phys(env, vaddr, cpu_mmu_index(env), GETRA())
+#define h_get_st_phys(env, vaddr)                                       \
+    helper_ret_get_st_phys(env, vaddr, cpu_mmu_index(env), GETRA())
+#endif /* CONFIG_USER_ONLY */
+
+static inline void h_aie_lock(CPUArchState *env, hwaddr paddr)
+{
+    AIEEntry *entry = aie_entry_get_lock(paddr);
+
+    env->aie_entry = entry;
+    env->aie_locked = true;
+}
+
+static inline void h_aie_unlock(CPUArchState *env)
+{
+    assert(env->aie_entry && env->aie_locked);
+    qemu_mutex_unlock(&env->aie_entry->lock);
+    env->aie_locked = false;
+}
+
+static inline void h_aie_unlock__done(CPUArchState *env)
+{
+    h_aie_unlock(env);
+    env->aie_entry = NULL;
+}
+
+void HELPER(aie_llsc_st_tracking_enable)(CPUArchState *env)
+{
+    CPUState *other_cs;
+
+    if (likely(atomic_read(&env->aie_llsc_st_tracking))) {
+        return;
+    }
+    CPU_FOREACH(other_cs) {
+        CPUArchState *other_env = other_cs->env_ptr;
+
+        atomic_set(&other_env->aie_llsc_st_tracking, true);
+    }
+}
+
+void HELPER(aie_ld_lock)(CPUArchState *env, target_ulong vaddr)
+{
+    hwaddr paddr;
+
+    assert(!env->aie_locked);
+    paddr = h_get_ld_phys(env, vaddr);
+    h_aie_lock(env, paddr);
+}
+
+void HELPER(aie_st_lock)(CPUArchState *env, target_ulong vaddr)
+{
+    hwaddr paddr;
+
+    assert(!env->aie_locked);
+    paddr = h_get_st_phys(env, vaddr);
+    h_aie_lock(env, paddr);
+}
+
+void HELPER(aie_insert_lock)(CPUArchState *env, target_ulong vaddr)
+{
+    AIEEntry *entry;
+    hwaddr paddr;
+
+    assert(!env->aie_locked);
+    paddr = h_get_ld_phys(env, vaddr);
+    entry = aie_entry_get_lock(paddr);
+
+    tiny_set_insert(&entry->ts, current_cpu);
+    env->aie_entry = entry;
+    env->aie_locked = true;
+}
+
+uint32_t HELPER(aie_contains_lock)(CPUArchState *env)
+{
+    AIEEntry *entry = env->aie_entry;
+
+    /* clrex could arrive between ldrex and strex due to preemption */
+    if (unlikely(entry == NULL)) {
+        return -1;
+    }
+    qemu_mutex_lock(&entry->lock);
+    env->aie_locked = true;
+    if (tiny_set_contains(&entry->ts, current_cpu)) {
+        tiny_set_remove_all(&entry->ts);
+        return 0;
+    }
+    qemu_mutex_unlock(&entry->lock);
+    env->aie_locked = false;
+    env->aie_entry = NULL;
+    return -1;
+}
+
+void HELPER(aie_unlock)(CPUArchState *env)
+{
+    h_aie_unlock(env);
+}
+
+void HELPER(aie_unlock__done)(CPUArchState *env)
+{
+    h_aie_unlock__done(env);
+}
+
+void HELPER(aie_clear)(CPUArchState *env)
+{
+    AIEEntry *entry = env->aie_entry;
+
+    assert(!env->aie_locked);
+    if (!entry) {
+        return;
+    }
+
+    qemu_mutex_lock(&env->aie_entry->lock);
+    tiny_set_remove(&entry->ts, current_cpu);
+    qemu_mutex_unlock(&env->aie_entry->lock);
+    env->aie_entry = NULL;
+}
+
+void HELPER(aie_ld_pre)(CPUArchState *env, target_ulong vaddr)
+{
+    if (likely(!env->aie_lock_enabled) || env->aie_locked) {
+        return;
+    }
+    helper_aie_ld_lock(env, vaddr);
+}
+
+void HELPER(aie_st_pre)(CPUArchState *env, target_ulong vaddr)
+{
+    if (unlikely(env->aie_lock_enabled)) {
+        if (env->aie_locked) {
+            return;
+        }
+        helper_aie_st_lock(env, vaddr);
+    } else {
+        hwaddr paddr = h_get_st_phys(env, vaddr);
+
+        if (unlikely(aie_entry_exists(paddr))) {
+            h_aie_lock(env, paddr);
+        }
+    }
+}
+
+void HELPER(aie_llsc_st_pre)(CPUArchState *env, target_ulong vaddr)
+{
+    hwaddr paddr;
+
+    assert(!env->aie_locked);
+    if (!env->aie_llsc_st_tracking) {
+        return;
+    }
+    paddr = h_get_st_phys(env, vaddr);
+    if (unlikely(aie_entry_exists(paddr))) {
+        h_aie_lock(env, paddr);
+    }
+}
+
+void HELPER(aie_llsc_st_post)(CPUArchState *env)
+{
+    if (unlikely(env->aie_locked)) {
+        h_aie_unlock__done(env);
+    }
+}
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 67aa0a0..8891f16 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -27,6 +27,7 @@
 #include <inttypes.h>
 #include "qemu/osdep.h"
 #include "qemu/queue.h"
+#include "qemu/aie.h"
 #ifndef CONFIG_USER_ONLY
 #include "exec/hwaddr.h"
 #endif
@@ -135,5 +136,9 @@ typedef struct CPUIOTLBEntry {
 #define CPU_COMMON                                                      \
     /* soft mmu support */                                              \
     CPU_COMMON_TLB                                                      \
+    AIEEntry *aie_entry;                                                \
+    bool aie_locked;                                                    \
+    bool aie_lock_enabled;                                              \
+    bool aie_llsc_st_tracking;                                          \
 
 #endif
diff --git a/include/qemu/aie-helper.h b/include/qemu/aie-helper.h
new file mode 100644
index 0000000..7605e07
--- /dev/null
+++ b/include/qemu/aie-helper.h
@@ -0,0 +1,21 @@
+#ifdef TARGET_ARM
+#define AIE_VADDR_TCG_TYPE glue(i, TARGET_VIRT_ADDR_SPACE_BITS)
+#else
+#define AIE_VADDR_TCG_TYPE tl
+#endif /* TARGET_ARM */
+
+DEF_HELPER_2(aie_ld_pre, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_2(aie_st_pre, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_2(aie_llsc_st_pre, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_1(aie_llsc_st_post, void, env)
+DEF_HELPER_1(aie_llsc_st_tracking_enable, void, env)
+
+DEF_HELPER_2(aie_ld_lock, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_2(aie_st_lock, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_2(aie_insert_lock, void, env, AIE_VADDR_TCG_TYPE)
+DEF_HELPER_1(aie_contains_lock, i32, env)
+DEF_HELPER_1(aie_unlock, void, env)
+DEF_HELPER_1(aie_unlock__done, void, env)
+DEF_HELPER_1(aie_clear, void, env)
+
+#undef AIE_VADDR_TCG_TYPE
-- 
1.8.3


* [Qemu-devel] [RFC 7/8] target-arm: emulate atomic instructions using AIE
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (5 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 6/8] aie: add target helpers Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-08 21:02 ` [Qemu-devel] [RFC 8/8] target-i386: " Emilio G. Cota
  2015-05-11 16:01 ` [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Frederic Konrad
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 linux-user/main.c      |  89 -------------------------
 target-arm/helper.c    |   2 +
 target-arm/helper.h    |   2 +
 target-arm/op_helper.c |   5 ++
 target-arm/translate.c | 172 ++++++++++++++++++++++---------------------------
 5 files changed, 86 insertions(+), 184 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 3f32db0..b6f21b4 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -584,90 +584,6 @@ do_kernel_trap(CPUARMState *env)
     return 0;
 }
 
-/* Store exclusive handling for AArch32 */
-static int do_strex(CPUARMState *env)
-{
-    uint64_t val;
-    int size;
-    int rc = 1;
-    int segv = 0;
-    uint32_t addr;
-    start_exclusive();
-    if (env->exclusive_addr != env->exclusive_test) {
-        goto fail;
-    }
-    /* We know we're always AArch32 so the address is in uint32_t range
-     * unless it was the -1 exclusive-monitor-lost value (which won't
-     * match exclusive_test above).
-     */
-    assert(extract64(env->exclusive_addr, 32, 32) == 0);
-    addr = env->exclusive_addr;
-    size = env->exclusive_info & 0xf;
-    switch (size) {
-    case 0:
-        segv = get_user_u8(val, addr);
-        break;
-    case 1:
-        segv = get_user_u16(val, addr);
-        break;
-    case 2:
-    case 3:
-        segv = get_user_u32(val, addr);
-        break;
-    default:
-        abort();
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        uint32_t valhi;
-        segv = get_user_u32(valhi, addr + 4);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-        val = deposit64(val, 32, 32, valhi);
-    }
-    if (val != env->exclusive_val) {
-        goto fail;
-    }
-
-    val = env->regs[(env->exclusive_info >> 8) & 0xf];
-    switch (size) {
-    case 0:
-        segv = put_user_u8(val, addr);
-        break;
-    case 1:
-        segv = put_user_u16(val, addr);
-        break;
-    case 2:
-    case 3:
-        segv = put_user_u32(val, addr);
-        break;
-    }
-    if (segv) {
-        env->exception.vaddress = addr;
-        goto done;
-    }
-    if (size == 3) {
-        val = env->regs[(env->exclusive_info >> 12) & 0xf];
-        segv = put_user_u32(val, addr + 4);
-        if (segv) {
-            env->exception.vaddress = addr + 4;
-            goto done;
-        }
-    }
-    rc = 0;
-fail:
-    env->regs[15] += 4;
-    env->regs[(env->exclusive_info >> 4) & 0xf] = rc;
-done:
-    end_exclusive();
-    return segv;
-}
-
 void cpu_loop(CPUARMState *env)
 {
     CPUState *cs = CPU(arm_env_get_cpu(env));
@@ -833,11 +749,6 @@ void cpu_loop(CPUARMState *env)
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
-        case EXCP_STREX:
-            if (!do_strex(env)) {
-                break;
-            }
-            /* fall through for segv */
         case EXCP_PREFETCH_ABORT:
         case EXCP_DATA_ABORT:
             addr = env->exception.vaddress;
diff --git a/target-arm/helper.c b/target-arm/helper.c
index f8f8d76..742e5be 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -11,6 +11,8 @@
 #include "arm_ldst.h"
 #include <zlib.h> /* For crc32 */
 
+#include "aie-helper.c"
+
 #ifndef CONFIG_USER_ONLY
 static inline int get_phys_addr(CPUARMState *env, target_ulong address,
                                 int access_type, ARMMMUIdx mmu_idx,
diff --git a/target-arm/helper.h b/target-arm/helper.h
index dec3728..3c797d1 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -529,6 +529,8 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+#include "qemu/aie-helper.h"
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 3df9c57..ef48180 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -29,6 +29,11 @@ static void raise_exception(CPUARMState *env, int tt)
     ARMCPU *cpu = arm_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
 
+    if (unlikely(env->aie_locked)) {
+        assert(env->aie_entry);
+        qemu_mutex_unlock(&env->aie_entry->lock);
+        env->aie_locked = false;
+    }
     cs->exception_index = tt;
     cpu_loop_exit(cs);
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9116529..935011c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -65,12 +65,6 @@ TCGv_ptr cpu_env;
 static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
 static TCGv_i32 cpu_R[16];
 static TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
-static TCGv_i64 cpu_exclusive_addr;
-static TCGv_i64 cpu_exclusive_val;
-#ifdef CONFIG_USER_ONLY
-static TCGv_i64 cpu_exclusive_test;
-static TCGv_i32 cpu_exclusive_info;
-#endif
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -99,17 +93,6 @@ void arm_translate_init(void)
     cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
     cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
 
-    cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
-        offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
-    cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
-        offsetof(CPUARMState, exclusive_val), "exclusive_val");
-#ifdef CONFIG_USER_ONLY
-    cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
-        offsetof(CPUARMState, exclusive_test), "exclusive_test");
-    cpu_exclusive_info = tcg_global_mem_new_i32(TCG_AREG0,
-        offsetof(CPUARMState, exclusive_info), "exclusive_info");
-#endif
-
     a64_translate_init();
 }
 
@@ -896,6 +879,15 @@ static inline void gen_aa32_ld##SUFF(TCGv_i32 val, TCGv_i32 addr, int index) \
 #define DO_GEN_ST(SUFF, OPC)                                             \
 static inline void gen_aa32_st##SUFF(TCGv_i32 val, TCGv_i32 addr, int index) \
 {                                                                        \
+    gen_helper_aie_llsc_st_pre(cpu_env, addr);                           \
+    tcg_gen_qemu_st_i32(val, addr, index, OPC);                          \
+    gen_helper_aie_llsc_st_post(cpu_env);                                \
+}
+
+#define DO_GEN_ST_LOCKED(SUFF, OPC)                                      \
+static inline                                                            \
+void gen_aa32_st##SUFF##_locked(TCGv_i32 val, TCGv_i32 addr, int index)  \
+{                                                                        \
     tcg_gen_qemu_st_i32(val, addr, index, OPC);                          \
 }
 
@@ -906,7 +898,19 @@ static inline void gen_aa32_ld64(TCGv_i64 val, TCGv_i32 addr, int index)
 
 static inline void gen_aa32_st64(TCGv_i64 val, TCGv_i32 addr, int index)
 {
+    gen_helper_aie_llsc_st_pre(cpu_env, addr);
     tcg_gen_qemu_st_i64(val, addr, index, MO_TEQ);
+    gen_helper_aie_llsc_st_post(cpu_env);
+}
+
+static inline void gen_aa32_aie_insert_lock(TCGv_i32 addr)
+{
+    gen_helper_aie_insert_lock(cpu_env, addr);
+}
+
+static inline void gen_aa32_aie_ld_lock(TCGv_i32 addr)
+{
+    gen_helper_aie_ld_lock(cpu_env, addr);
 }
 
 #else
@@ -920,11 +924,23 @@ static inline void gen_aa32_ld##SUFF(TCGv_i32 val, TCGv_i32 addr, int index) \
     tcg_temp_free(addr64);                                               \
 }
 
-#define DO_GEN_ST(SUFF, OPC)                                             \
+#define DO_GEN_ST(SUFF, OPC)                                             \
 static inline void gen_aa32_st##SUFF(TCGv_i32 val, TCGv_i32 addr, int index) \
 {                                                                        \
     TCGv addr64 = tcg_temp_new();                                        \
     tcg_gen_extu_i32_i64(addr64, addr);                                  \
+    gen_helper_aie_llsc_st_pre(cpu_env, addr64);                         \
+    tcg_gen_qemu_st_i32(val, addr64, index, OPC);                        \
+    gen_helper_aie_llsc_st_post(cpu_env);                                \
+    tcg_temp_free(addr64);                                               \
+}
+
+#define DO_GEN_ST_LOCKED(SUFF, OPC)                                      \
+static inline                                                            \
+void gen_aa32_st##SUFF##_locked(TCGv_i32 val, TCGv_i32 addr, int index)  \
+{                                                                        \
+    TCGv addr64 = tcg_temp_new();                                        \
+    tcg_gen_extu_i32_i64(addr64, addr);                                  \
     tcg_gen_qemu_st_i32(val, addr64, index, OPC);                        \
     tcg_temp_free(addr64);                                               \
 }
@@ -941,7 +957,27 @@ static inline void gen_aa32_st64(TCGv_i64 val, TCGv_i32 addr, int index)
 {
     TCGv addr64 = tcg_temp_new();
     tcg_gen_extu_i32_i64(addr64, addr);
+    gen_helper_aie_llsc_st_pre(cpu_env, addr64);
     tcg_gen_qemu_st_i64(val, addr64, index, MO_TEQ);
+    gen_helper_aie_llsc_st_post(cpu_env);
+    tcg_temp_free(addr64);
+}
+
+static inline void gen_aa32_aie_insert_lock(TCGv_i32 addr)
+{
+    TCGv addr64 = tcg_temp_new();
+
+    tcg_gen_extu_i32_i64(addr64, addr);
+    gen_helper_aie_insert_lock(cpu_env, addr64);
+    tcg_temp_free(addr64);
+}
+
+static inline void gen_aa32_aie_ld_lock(TCGv_i32 addr)
+{
+    TCGv addr64 = tcg_temp_new();
+
+    tcg_gen_extu_i32_i64(addr64, addr);
+    gen_helper_aie_ld_lock(cpu_env, addr64);
     tcg_temp_free(addr64);
 }
 
@@ -955,6 +993,9 @@ DO_GEN_LD(32u, MO_TEUL)
 DO_GEN_ST(8, MO_UB)
 DO_GEN_ST(16, MO_TEUW)
 DO_GEN_ST(32, MO_TEUL)
+DO_GEN_ST_LOCKED(8, MO_UB)
+DO_GEN_ST_LOCKED(16, MO_TEUW)
+DO_GEN_ST_LOCKED(32, MO_TEUL)
 
 static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
 {
@@ -7372,15 +7413,6 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
     tcg_gen_or_i32(cpu_ZF, lo, hi);
 }
 
-/* Load/Store exclusive instructions are implemented by remembering
-   the value/address loaded, and seeing if these are the same
-   when the store is performed. This should be sufficient to implement
-   the architecturally mandated semantics, and avoids having to monitor
-   regular stores.
-
-   In system emulation mode only one CPU will be running at once, so
-   this sequence is effectively atomic.  In user emulation mode we
-   throw an exception and handle the atomic operation elsewhere.  */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i32 addr, int size)
 {
@@ -7388,6 +7420,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
 
     s->is_ldex = true;
 
+    gen_aa32_aie_insert_lock(addr);
     switch (size) {
     case 0:
         gen_aa32_ld8u(tmp, addr, get_mem_index(s));
@@ -7410,96 +7443,44 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
         tcg_gen_addi_i32(tmp2, addr, 4);
         gen_aa32_ld32u(tmp3, tmp2, get_mem_index(s));
         tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(cpu_exclusive_val, tmp, tmp3);
         store_reg(s, rt2, tmp3);
-    } else {
-        tcg_gen_extu_i32_i64(cpu_exclusive_val, tmp);
     }
 
     store_reg(s, rt, tmp);
-    tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
+    gen_helper_aie_unlock(cpu_env);
 }
 
 static void gen_clrex(DisasContext *s)
 {
-    tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+    gen_helper_aie_clear(cpu_env);
 }
 
-#ifdef CONFIG_USER_ONLY
-static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
-                                TCGv_i32 addr, int size)
-{
-    tcg_gen_extu_i32_i64(cpu_exclusive_test, addr);
-    tcg_gen_movi_i32(cpu_exclusive_info,
-                     size | (rd << 4) | (rt << 8) | (rt2 << 12));
-    gen_exception_internal_insn(s, 4, EXCP_STREX);
-}
-#else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i32 addr, int size)
 {
     TCGv_i32 tmp;
-    TCGv_i64 val64, extaddr;
     TCGLabel *done_label;
     TCGLabel *fail_label;
 
-    /* if (env->exclusive_addr == addr && env->exclusive_val == [addr]) {
-         [addr] = {Rt};
-         {Rd} = 0;
-       } else {
-         {Rd} = 1;
-       } */
     fail_label = gen_new_label();
     done_label = gen_new_label();
-    extaddr = tcg_temp_new_i64();
-    tcg_gen_extu_i32_i64(extaddr, addr);
-    tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
-    tcg_temp_free_i64(extaddr);
 
     tmp = tcg_temp_new_i32();
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16u(tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32u(tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
-    val64 = tcg_temp_new_i64();
-    if (size == 3) {
-        TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(tmp3, tmp2, get_mem_index(s));
-        tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(val64, tmp, tmp3);
-        tcg_temp_free_i32(tmp3);
-    } else {
-        tcg_gen_extu_i32_i64(val64, tmp);
-    }
+    gen_helper_aie_contains_lock(tmp, cpu_env);
+    tcg_gen_brcondi_i32(TCG_COND_NE, tmp, 0, fail_label);
     tcg_temp_free_i32(tmp);
 
-    tcg_gen_brcond_i64(TCG_COND_NE, val64, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(val64);
-
     tmp = load_reg(s, rt);
     switch (size) {
     case 0:
-        gen_aa32_st8(tmp, addr, get_mem_index(s));
+        gen_aa32_st8_locked(tmp, addr, get_mem_index(s));
         break;
     case 1:
-        gen_aa32_st16(tmp, addr, get_mem_index(s));
+        gen_aa32_st16_locked(tmp, addr, get_mem_index(s));
         break;
     case 2:
     case 3:
-        gen_aa32_st32(tmp, addr, get_mem_index(s));
+        gen_aa32_st32_locked(tmp, addr, get_mem_index(s));
         break;
     default:
         abort();
@@ -7508,17 +7489,16 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     if (size == 3) {
         tcg_gen_addi_i32(addr, addr, 4);
         tmp = load_reg(s, rt2);
-        gen_aa32_st32(tmp, addr, get_mem_index(s));
+        gen_aa32_st32_locked(tmp, addr, get_mem_index(s));
         tcg_temp_free_i32(tmp);
     }
     tcg_gen_movi_i32(cpu_R[rd], 0);
+    gen_helper_aie_unlock__done(cpu_env);
     tcg_gen_br(done_label);
     gen_set_label(fail_label);
     tcg_gen_movi_i32(cpu_R[rd], 1);
     gen_set_label(done_label);
-    tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
-#endif
 
 /* gen_srs:
  * @env: CPUARMState
@@ -8401,21 +8381,23 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         tcg_temp_free_i32(addr);
                     } else {
                         /* SWP instruction */
+                        int size8 = insn & (1 << 22);
+
                         rm = (insn) & 0xf;
 
-                        /* ??? This is not really atomic.  However we know
-                           we never have multiple CPUs running in parallel,
-                           so it is good enough.  */
                         addr = load_reg(s, rn);
                         tmp = load_reg(s, rm);
                         tmp2 = tcg_temp_new_i32();
-                        if (insn & (1 << 22)) {
+                        gen_helper_aie_llsc_st_tracking_enable(cpu_env);
+                        gen_aa32_aie_ld_lock(addr);
+                        if (size8) {
                             gen_aa32_ld8u(tmp2, addr, get_mem_index(s));
-                            gen_aa32_st8(tmp, addr, get_mem_index(s));
+                            gen_aa32_st8_locked(tmp, addr, get_mem_index(s));
                         } else {
                             gen_aa32_ld32u(tmp2, addr, get_mem_index(s));
-                            gen_aa32_st32(tmp, addr, get_mem_index(s));
+                            gen_aa32_st32_locked(tmp, addr, get_mem_index(s));
                         }
+                        gen_helper_aie_unlock__done(cpu_env);
                         tcg_temp_free_i32(tmp);
                         tcg_temp_free_i32(addr);
                         store_reg(s, rd, tmp2);
-- 
1.8.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread
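
[Editorial sketch, not part of the posted series.] The i386 patch that follows replaces the old global spinlock (`helper_lock`/`helper_unlock`) with a flag-based flow: `gen_helper_lock_enable` sets `env->aie_lock_enabled` when a LOCK prefix is seen, the first locked guest load then acquires the per-address lock via `aie_ld_pre`, and `gen_helper_lock_disable` releases it after the instruction. The toy state machine below models just that sequencing; `ToyCPU`, `ld_pre`, and the recorded address are illustrative stand-ins, not the actual QEMU helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the LOCK-prefix flow: enable -> first load takes the
 * per-address lock -> disable releases it. Taking the lock is modeled by
 * recording the address; 0 means no lock held. */
typedef struct {
    bool lock_enabled;      /* mirrors env->aie_lock_enabled */
    uint64_t locked_addr;   /* mirrors env->aie_entry; 0 = none */
} ToyCPU;

static void lock_enable(ToyCPU *cpu)   /* LOCK prefix decoded */
{
    cpu->lock_enabled = true;
}

static void ld_pre(ToyCPU *cpu, uint64_t addr)  /* before a guest load */
{
    if (cpu->lock_enabled && !cpu->locked_addr) {
        cpu->locked_addr = addr;   /* acquire the lock for addr's chunk */
    }
}

static void lock_disable(ToyCPU *cpu)  /* end of the locked instruction */
{
    cpu->locked_addr = 0;          /* release, as helper_lock_disable does */
    cpu->lock_enabled = false;
}
```

The point of the design is visible even in the toy: a load with no LOCK prefix pending takes no lock at all, so unlocked guest code pays nothing.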

* [Qemu-devel] [RFC 8/8] target-i386: emulate atomic instructions using AIE
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (6 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 7/8] target-arm: emulate atomic instructions using AIE Emilio G. Cota
@ 2015-05-08 21:02 ` Emilio G. Cota
  2015-05-11 16:01 ` [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Frederic Konrad
  8 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 21:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Frederic Konrad, alex.bennee, Richard Henderson

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target-i386/helper.h     |   6 +-
 target-i386/mem_helper.c |  19 +++--
 target-i386/translate.c  | 176 ++++++++++++++++++++++++++++-------------------
 3 files changed, 122 insertions(+), 79 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 8eb0145..d3335b0 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,8 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
+DEF_HELPER_1(lock_enable, void, env)
+DEF_HELPER_1(lock_disable, void, env)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
@@ -217,3 +217,5 @@ DEF_HELPER_3(rcrl, tl, env, tl, tl)
 DEF_HELPER_3(rclq, tl, env, tl, tl)
 DEF_HELPER_3(rcrq, tl, env, tl, tl)
 #endif
+
+#include "qemu/aie-helper.h"
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 1aec8a5..b3d4cf7 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -21,18 +21,19 @@
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 
-/* broken thread support */
+#include "aie-helper.c"
 
-static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
-
-void helper_lock(void)
+void helper_lock_enable(CPUX86State *env)
 {
-    spin_lock(&global_cpu_lock);
+    env->aie_lock_enabled = true;
 }
 
-void helper_unlock(void)
+void helper_lock_disable(CPUX86State *env)
 {
-    spin_unlock(&global_cpu_lock);
+    if (env->aie_entry) {
+        helper_aie_unlock__done(env);
+    }
+    env->aie_lock_enabled = false;
 }
 
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
@@ -41,6 +42,7 @@ void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
     int eflags;
 
     eflags = cpu_cc_compute_all(env, CC_OP);
+    helper_aie_ld_lock(env, a0);
     d = cpu_ldq_data(env, a0);
     if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
         cpu_stq_data(env, a0, ((uint64_t)env->regs[R_ECX] << 32) | (uint32_t)env->regs[R_EBX]);
@@ -52,6 +54,7 @@ void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
         env->regs[R_EAX] = (uint32_t)d;
         eflags &= ~CC_Z;
     }
+    helper_aie_unlock__done(env);
     CC_SRC = eflags;
 }
 
@@ -65,6 +68,7 @@ void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
         raise_exception(env, EXCP0D_GPF);
     }
     eflags = cpu_cc_compute_all(env, CC_OP);
+    helper_aie_ld_lock(env, a0);
     d0 = cpu_ldq_data(env, a0);
     d1 = cpu_ldq_data(env, a0 + 8);
     if (d0 == env->regs[R_EAX] && d1 == env->regs[R_EDX]) {
@@ -79,6 +83,7 @@ void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
         env->regs[R_EAX] = d0;
         eflags &= ~CC_Z;
     }
+    helper_aie_unlock__done(env);
     CC_SRC = eflags;
 }
 #endif
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 305ce50..8bed55c 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -300,6 +300,46 @@ static inline bool byte_reg_is_xH(int reg)
     return true;
 }
 
+static inline void gen_i386_ld_i32(DisasContext *s, TCGv_i32 val, TCGv addr,
+                                   TCGArg idx, TCGMemOp op)
+{
+    if (s->prefix & PREFIX_LOCK) {
+        gen_helper_aie_ld_pre(cpu_env, addr);
+    }
+    tcg_gen_qemu_ld_i32(val, addr, idx, op);
+}
+
+static inline void gen_i386_ld_i64(DisasContext *s, TCGv_i64 val, TCGv addr,
+                                   TCGArg idx, TCGMemOp op)
+{
+    if (s->prefix & PREFIX_LOCK) {
+        gen_helper_aie_ld_pre(cpu_env, addr);
+    }
+    tcg_gen_qemu_ld_i64(val, addr, idx, op);
+}
+
+static inline
+void gen_i386_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp op)
+{
+    gen_helper_aie_st_pre(cpu_env, addr);
+    tcg_gen_qemu_st_i32(val, addr, idx, op);
+}
+
+static inline
+void gen_i386_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp op)
+{
+    gen_helper_aie_st_pre(cpu_env, addr);
+    tcg_gen_qemu_st_i64(val, addr, idx, op);
+}
+
+#if TARGET_LONG_BITS == 32
+#define gen_i386_ld_tl  gen_i386_ld_i32
+#define gen_i386_st_tl  gen_i386_st_i32
+#else
+#define gen_i386_ld_tl  gen_i386_ld_i64
+#define gen_i386_st_tl  gen_i386_st_i64
+#endif
+
 /* Select the size of a push/pop operation.  */
 static inline TCGMemOp mo_pushpop(DisasContext *s, TCGMemOp ot)
 {
@@ -479,7 +519,7 @@ static inline void gen_op_addq_A0_reg_sN(int shift, int reg)
 
 static inline void gen_op_ld_v(DisasContext *s, int idx, TCGv t0, TCGv a0)
 {
-    tcg_gen_qemu_ld_tl(t0, a0, s->mem_index, idx | MO_LE);
+    gen_i386_ld_tl(s, t0, a0, s->mem_index, idx | MO_LE);
 }
 
 static inline void gen_op_st_v(DisasContext *s, int idx, TCGv t0, TCGv a0)
@@ -2587,23 +2627,23 @@ static void gen_jmp(DisasContext *s, target_ulong eip)
 
 static inline void gen_ldq_env_A0(DisasContext *s, int offset)
 {
-    tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
+    gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
     tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, offset);
 }
 
 static inline void gen_stq_env_A0(DisasContext *s, int offset)
 {
     tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, offset);
-    tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
+    gen_i386_st_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
 }
 
 static inline void gen_ldo_env_A0(DisasContext *s, int offset)
 {
     int mem_index = s->mem_index;
-    tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0, mem_index, MO_LEQ);
+    gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0, mem_index, MO_LEQ);
     tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(0)));
     tcg_gen_addi_tl(cpu_tmp0, cpu_A0, 8);
-    tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_tmp0, mem_index, MO_LEQ);
+    gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_tmp0, mem_index, MO_LEQ);
     tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(1)));
 }
 
@@ -2611,10 +2651,10 @@ static inline void gen_sto_env_A0(DisasContext *s, int offset)
 {
     int mem_index = s->mem_index;
     tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(0)));
-    tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0, mem_index, MO_LEQ);
+    gen_i386_st_i64(cpu_tmp1_i64, cpu_A0, mem_index, MO_LEQ);
     tcg_gen_addi_tl(cpu_tmp0, cpu_A0, 8);
     tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, offset + offsetof(XMMReg, XMM_Q(1)));
-    tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_tmp0, mem_index, MO_LEQ);
+    gen_i386_st_i64(cpu_tmp1_i64, cpu_tmp0, mem_index, MO_LEQ);
 }
 
 static inline void gen_op_movo(int d_offset, int s_offset)
@@ -3643,14 +3683,14 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         break;
                     case 0x21: case 0x31: /* pmovsxbd, pmovzxbd */
                     case 0x24: case 0x34: /* pmovsxwq, pmovzxwq */
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         tcg_gen_st_i32(cpu_tmp2_i32, cpu_env, op2_offset +
                                         offsetof(XMMReg, XMM_L(0)));
                         break;
                     case 0x22: case 0x32: /* pmovsxbq, pmovzxbq */
-                        tcg_gen_qemu_ld_tl(cpu_tmp0, cpu_A0,
-                                           s->mem_index, MO_LEUW);
+                        gen_i386_ld_tl(s, cpu_tmp0, cpu_A0, s->mem_index,
+                                       MO_LEUW);
                         tcg_gen_st16_tl(cpu_tmp0, cpu_env, op2_offset +
                                         offsetof(XMMReg, XMM_W(0)));
                         break;
@@ -3738,8 +3778,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
                 gen_lea_modrm(env, s, modrm);
                 if ((b & 1) == 0) {
-                    tcg_gen_qemu_ld_tl(cpu_T[0], cpu_A0,
-                                       s->mem_index, ot | MO_BE);
+                    gen_i386_ld_tl(s, cpu_T[0], cpu_A0, s->mem_index,
+                                   ot | MO_BE);
                     gen_op_mov_reg_v(ot, reg, cpu_T[0]);
                 } else {
                     tcg_gen_qemu_st_tl(cpu_regs[reg], cpu_A0,
@@ -4101,8 +4141,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         if (mod == 3) {
                             tcg_gen_extu_i32_tl(cpu_regs[rm], cpu_tmp2_i32);
                         } else {
-                            tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                                s->mem_index, MO_LEUL);
+                            gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                            s->mem_index, MO_LEUL);
                         }
                     } else { /* pextrq */
 #ifdef TARGET_X86_64
@@ -4112,8 +4152,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         if (mod == 3) {
                             tcg_gen_mov_i64(cpu_regs[rm], cpu_tmp1_i64);
                         } else {
-                            tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0,
-                                                s->mem_index, MO_LEQ);
+                            gen_i386_st_i64(cpu_tmp1_i64, cpu_A0,
+                                            s->mem_index, MO_LEQ);
                         }
 #else
                         goto illegal_op;
@@ -4134,8 +4174,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     if (mod == 3) {
                         gen_op_mov_v_reg(MO_32, cpu_T[0], rm);
                     } else {
-                        tcg_gen_qemu_ld_tl(cpu_T[0], cpu_A0,
-                                           s->mem_index, MO_UB);
+                        gen_i386_ld_tl(s, cpu_T[0], cpu_A0, s->mem_index,
+                                       MO_UB);
                     }
                     tcg_gen_st8_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,
                                             xmm_regs[reg].XMM_B(val & 15)));
@@ -4146,8 +4186,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                                         offsetof(CPUX86State,xmm_regs[rm]
                                                 .XMM_L((val >> 6) & 3)));
                     } else {
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                     }
                     tcg_gen_st_i32(cpu_tmp2_i32, cpu_env,
                                     offsetof(CPUX86State,xmm_regs[reg]
@@ -4174,8 +4214,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         if (mod == 3) {
                             tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_regs[rm]);
                         } else {
-                            tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                                s->mem_index, MO_LEUL);
+                            gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                            s->mem_index, MO_LEUL);
                         }
                         tcg_gen_st_i32(cpu_tmp2_i32, cpu_env,
                                         offsetof(CPUX86State,
@@ -4185,8 +4225,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         if (mod == 3) {
                             gen_op_mov_v_reg(ot, cpu_tmp1_i64, rm);
                         } else {
-                            tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0,
-                                                s->mem_index, MO_LEQ);
+                            gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0,
+                                            s->mem_index, MO_LEQ);
                         }
                         tcg_gen_st_i64(cpu_tmp1_i64, cpu_env,
                                         offsetof(CPUX86State,
@@ -4567,7 +4607,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 
     /* lock generation */
     if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
+        gen_helper_lock_enable(cpu_env);
 
     /* now check op code */
  reswitch:
@@ -5569,11 +5609,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_op_mov_v_reg(ot, cpu_T[0], reg);
             /* for xchg, lock is implicit */
             if (!(prefixes & PREFIX_LOCK))
-                gen_helper_lock();
+                gen_helper_aie_ld_lock(cpu_env, cpu_A0);
             gen_op_ld_v(s, ot, cpu_T[1], cpu_A0);
             gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
             if (!(prefixes & PREFIX_LOCK))
-                gen_helper_unlock();
+                gen_helper_aie_unlock__done(cpu_env);
             gen_op_mov_reg_v(ot, reg, cpu_T[1]);
         }
         break;
@@ -5724,24 +5764,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 
                     switch(op >> 4) {
                     case 0:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         gen_helper_flds_FT0(cpu_env, cpu_tmp2_i32);
                         break;
                     case 1:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         gen_helper_fildl_FT0(cpu_env, cpu_tmp2_i32);
                         break;
                     case 2:
-                        tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0,
-                                            s->mem_index, MO_LEQ);
+                        gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0,
+                                        s->mem_index, MO_LEQ);
                         gen_helper_fldl_FT0(cpu_env, cpu_tmp1_i64);
                         break;
                     case 3:
                     default:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LESW);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LESW);
                         gen_helper_fildl_FT0(cpu_env, cpu_tmp2_i32);
                         break;
                     }
@@ -5763,24 +5803,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 case 0:
                     switch(op >> 4) {
                     case 0:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         gen_helper_flds_ST0(cpu_env, cpu_tmp2_i32);
                         break;
                     case 1:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         gen_helper_fildl_ST0(cpu_env, cpu_tmp2_i32);
                         break;
                     case 2:
-                        tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0,
-                                            s->mem_index, MO_LEQ);
+                        gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0,
+                                        s->mem_index, MO_LEQ);
                         gen_helper_fldl_ST0(cpu_env, cpu_tmp1_i64);
                         break;
                     case 3:
                     default:
-                        tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LESW);
+                        gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LESW);
                         gen_helper_fildl_ST0(cpu_env, cpu_tmp2_i32);
                         break;
                     }
@@ -5790,19 +5830,19 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                     switch(op >> 4) {
                     case 1:
                         gen_helper_fisttl_ST0(cpu_tmp2_i32, cpu_env);
-                        tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         break;
                     case 2:
                         gen_helper_fisttll_ST0(cpu_tmp1_i64, cpu_env);
-                        tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0,
-                                            s->mem_index, MO_LEQ);
+                        gen_i386_st_i64(cpu_tmp1_i64, cpu_A0,
+                                        s->mem_index, MO_LEQ);
                         break;
                     case 3:
                     default:
                         gen_helper_fistt_ST0(cpu_tmp2_i32, cpu_env);
-                        tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUW);
+                        gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUW);
                         break;
                     }
                     gen_helper_fpop(cpu_env);
@@ -5811,24 +5851,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                     switch(op >> 4) {
                     case 0:
                         gen_helper_fsts_ST0(cpu_tmp2_i32, cpu_env);
-                        tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         break;
                     case 1:
                         gen_helper_fistl_ST0(cpu_tmp2_i32, cpu_env);
-                        tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUL);
+                        gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUL);
                         break;
                     case 2:
                         gen_helper_fstl_ST0(cpu_tmp1_i64, cpu_env);
-                        tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0,
-                                            s->mem_index, MO_LEQ);
+                        gen_i386_st_i64(cpu_tmp1_i64, cpu_A0,
+                                        s->mem_index, MO_LEQ);
                         break;
                     case 3:
                     default:
                         gen_helper_fist_ST0(cpu_tmp2_i32, cpu_env);
-                        tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                            s->mem_index, MO_LEUW);
+                        gen_i386_st_i32(cpu_tmp2_i32, cpu_A0,
+                                        s->mem_index, MO_LEUW);
                         break;
                     }
                     if ((op & 7) == 3)
@@ -5842,8 +5882,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 gen_helper_fldenv(cpu_env, cpu_A0, tcg_const_i32(dflag - 1));
                 break;
             case 0x0d: /* fldcw mem */
-                tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                    s->mem_index, MO_LEUW);
+                gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0, s->mem_index, MO_LEUW);
                 gen_helper_fldcw(cpu_env, cpu_tmp2_i32);
                 break;
             case 0x0e: /* fnstenv mem */
@@ -5853,8 +5892,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x0f: /* fnstcw mem */
                 gen_helper_fnstcw(cpu_tmp2_i32, cpu_env);
-                tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                    s->mem_index, MO_LEUW);
+                gen_i386_st_i32(cpu_tmp2_i32, cpu_A0, s->mem_index, MO_LEUW);
                 break;
             case 0x1d: /* fldt mem */
                 gen_update_cc_op(s);
@@ -5879,8 +5917,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x2f: /* fnstsw mem */
                 gen_helper_fnstsw(cpu_tmp2_i32, cpu_env);
-                tcg_gen_qemu_st_i32(cpu_tmp2_i32, cpu_A0,
-                                    s->mem_index, MO_LEUW);
+                gen_i386_st_i32(cpu_tmp2_i32, cpu_A0, s->mem_index, MO_LEUW);
                 break;
             case 0x3c: /* fbld */
                 gen_update_cc_op(s);
@@ -5894,12 +5931,12 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 gen_helper_fpop(cpu_env);
                 break;
             case 0x3d: /* fildll */
-                tcg_gen_qemu_ld_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
+                gen_i386_ld_i64(s, cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
                 gen_helper_fildll_ST0(cpu_env, cpu_tmp1_i64);
                 break;
             case 0x3f: /* fistpll */
                 gen_helper_fistll_ST0(cpu_tmp1_i64, cpu_env);
-                tcg_gen_qemu_st_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
+                gen_i386_st_i64(cpu_tmp1_i64, cpu_A0, s->mem_index, MO_LEQ);
                 gen_helper_fpop(cpu_env);
                 break;
             default:
@@ -7754,8 +7791,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
             if (op == 2) {
-                tcg_gen_qemu_ld_i32(cpu_tmp2_i32, cpu_A0,
-                                    s->mem_index, MO_LEUL);
+                gen_i386_ld_i32(s, cpu_tmp2_i32, cpu_A0, s->mem_index, MO_LEUL);
                 gen_helper_ldmxcsr(cpu_env, cpu_tmp2_i32);
             } else {
                 tcg_gen_ld32u_tl(cpu_T[0], cpu_env, offsetof(CPUX86State, mxcsr));
@@ -7841,11 +7877,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     }
     /* lock generation */
     if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
+        gen_helper_lock_disable(cpu_env);
     return s->pc;
  illegal_op:
     if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
+        gen_helper_lock_disable(cpu_env);
     /* XXX: ensure that no lock was generated */
     gen_exception(s, EXCP06_ILLOP, pc_start - s->cs_base);
     return s->pc;
-- 
1.8.3


* Re: [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry
  2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
@ 2015-05-08 21:51   ` Richard Henderson
  2015-05-10  8:07     ` Emilio G. Cota
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2015-05-08 21:51 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini, alex.bennee,
	Frederic Konrad

On 05/08/2015 02:02 PM, Emilio G. Cota wrote:
> -#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
> -#define CPU_TLB_ENTRY_BITS 4
> -#else
> +#if TARGET_LONG_BITS == 32
>  #define CPU_TLB_ENTRY_BITS 5
> +#else
> +#define CPU_TLB_ENTRY_BITS 6
>  #endif

Ouch.  24 of 64 wasted bytes for 64-bit?

I wonder if there's a better way we can encode this to avoid 3 copies of the
virtual address for read/write/code.  Or if we're better off using more than
one insn to multiply by a non-power-of-two.  Or if the hardware multiplier is
fast enough just multiply by the proper constant.


r~


* Re: [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation
  2015-05-08 21:02 ` [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation Emilio G. Cota
@ 2015-05-08 22:41   ` Emilio G. Cota
  0 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-08 22:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini,
	Richard Henderson, alex.bennee, Frederic Konrad

On Fri, May 08, 2015 at 17:02:11 -0400, Emilio G. Cota wrote:
> +++ b/aie.c
(snip)
> +static inline void *aie_entry_init(unsigned long index)
> +{
> +    AIEEntry *entry;
> +
> +    entry = qemu_memalign(64, sizeof(*entry));
> +    tiny_set_init(&entry->ts);
> +    qemu_mutex_init(&entry->lock);
> +    return entry;
> +}
> +
> +AIEEntry *aie_entry_get_lock(hwaddr addr)
> +{
> +    aie_addr_t laddr = to_aie(addr);
> +    AIEEntry *e;
> +
> +    e = qemu_radix_tree_find_alloc(&aie_rtree, laddr, aie_entry_init, g_free);

s/g_free/qemu_vfree/, since the allocation is done (above) with
qemu_memalign.

The tree doesn't support deletions, but still..

		Emilio


* Re: [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry
  2015-05-08 21:51   ` Richard Henderson
@ 2015-05-10  8:07     ` Emilio G. Cota
  0 siblings, 0 replies; 13+ messages in thread
From: Emilio G. Cota @ 2015-05-10  8:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, Peter Maydell, qemu-devel, Alvise Rigo, Paolo Bonzini,
	alex.bennee, Frederic Konrad

On Fri, May 08, 2015 at 14:51:58 -0700, Richard Henderson wrote:
> Ouch.  24 of 64 wasted bytes for 64-bit?
> 
> I wonder if there's a better way we can encode this to avoid 3 copies of the
> virtual address for read/write/code.  Or if we're better off using more than
> one insn to multiply by a non-power-of-two.

Adding one more instruction works well. Perf tests
(# time ./selftest.sh for [1]) show no appreciable difference.

Patch (i386-only) appended.

[1] http://wiki.qemu.org/download/ppc-virtexml507-linux-2_6_34.tgz

> Or if the hardware multiplier is
> fast enough just multiply by the proper constant.

This option should be slower (3-cycle latency for IMUL
on Ivy Bridge/Haswell, probably slower on others),
but would make the code very simple.
Unfortunately I couldn't write a patch to do it due to
my poor grasp of TCG backend code. If someone provides a
patch I'd be glad to test it.
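[Editor's note: the two-LEA address computation the appended patch relies on
can be sketched in plain C. The constant names mirror the patch; the helper
function itself is hypothetical and purely illustrative.]

```c
#include <stdint.h>

/* With CPU_TLB_ENTRY_IN_SHIFT = 2 and CPU_TLB_ENTRY_OUT_SHIFT = 3, the
 * entry size is (1 + (1 << 2)) << 3 = 40 bytes.  The byte offset of
 * entry i is then computed with two shift-and-add steps -- exactly what
 * x86's LEA with a scaled index does -- avoiding an IMUL by the
 * non-power-of-two constant 40. */
#define CPU_TLB_ENTRY_IN_SHIFT  2
#define CPU_TLB_ENTRY_OUT_SHIFT 3

static uintptr_t tlb_entry_offset(uintptr_t index)
{
    /* First LEA:  r0 = index + (index << IN_SHIFT)   -> index * 5  */
    uintptr_t r0 = index + (index << CPU_TLB_ENTRY_IN_SHIFT);
    /* Second LEA: offset = r0 << OUT_SHIFT           -> index * 40 */
    return r0 << CPU_TLB_ENTRY_OUT_SHIFT;
}
```

That is, entry 7 lives at byte offset (7 + 28) << 3 = 280, the same result
as 7 * sizeof(CPUTLBEntry) with a 40-byte entry.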

If the appended ends up being the preferred option I
can extend it to support all the TCG targets. I cannot,
however, test all of them--I only have access to x86 and ppc
hardware at the moment.

Thanks,

		Emilio

[PATCH] i386-only: remove sizeof(CPUTLBEntry)=pow2 constraint

Breaks all non-i386 TCG backends! Do not apply.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/cpu-defs.h | 23 +++++++++++++++++------
 tcg/i386/tcg-target.c   | 12 +++++++-----
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 8891f16..716052a 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -96,14 +96,25 @@ typedef struct CPUTLBEntry {
     /* Addend to virtual address to get host address.  IO accesses
        use the corresponding iotlb value.  */
     uintptr_t addend;
-    /* padding to get a power of two size */
-    uint8_t dummy[(1 << CPU_TLB_ENTRY_BITS) -
-                  (sizeof(target_ulong) * 4 +
-                   ((-sizeof(target_ulong) * 4) & (sizeof(uintptr_t) - 1)) +
-                   sizeof(uintptr_t))];
 } CPUTLBEntry;
 
-QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
+/*
+ * Fast TLB hits are essential for softmmu performance. Since we hand-code in
+ * assembly the TCG check for TLB hits, we define here a pair of constants that
+ * allow us to use only shifts to obtain a TLBEntry address given its index.
+ * NOTE: The constants below should thus be updated every time changes are
+ * made to the CPUTLBEntry struct. See the compile-time consistency check below.
+ */
+#if TARGET_LONG_BITS == 64
+#define CPU_TLB_ENTRY_OUT_SHIFT 3
+#define CPU_TLB_ENTRY_IN_SHIFT  2
+#else
+#define CPU_TLB_ENTRY_OUT_SHIFT (HOST_LONG_BITS == 32 ? 2 : 3)
+#define CPU_TLB_ENTRY_IN_SHIFT  (HOST_LONG_BITS == 32 ? 2 : 1)
+#endif
+
+QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != \
+                (1 + BIT(CPU_TLB_ENTRY_IN_SHIFT)) << CPU_TLB_ENTRY_OUT_SHIFT);
 
 /* The IOTLB is not accessed directly inline by generated TCG code,
  * so the CPUIOTLBEntry layout is not as critical as that of the
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ab63823..54250f5 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1195,15 +1195,17 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     tcg_out_mov(s, htype, r0, addrlo);
     tcg_out_mov(s, ttype, r1, addrlo);
 
-    tcg_out_shifti(s, SHIFT_SHR + hrexw, r0,
-                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    tcg_out_shifti(s, SHIFT_SHR + hrexw, r0, TARGET_PAGE_BITS);
 
     tgen_arithi(s, ARITH_AND + trexw, r1,
                 TARGET_PAGE_MASK | ((1 << s_bits) - 1), 0);
-    tgen_arithi(s, ARITH_AND + hrexw, r0,
-                (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0);
+    tgen_arithi(s, ARITH_AND + hrexw, r0, CPU_TLB_SIZE - 1, 0);
 
-    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0, 0,
+    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, r0, r0,
+                             CPU_TLB_ENTRY_IN_SHIFT, 0);
+
+    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0,
+                             CPU_TLB_ENTRY_OUT_SHIFT,
                              offsetof(CPUArchState, tlb_table[mem_index][0])
                              + which);
 
-- 
1.8.3


* Re: [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE)
  2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
                   ` (7 preceding siblings ...)
  2015-05-08 21:02 ` [Qemu-devel] [RFC 8/8] target-i386: " Emilio G. Cota
@ 2015-05-11 16:01 ` Frederic Konrad
  8 siblings, 0 replies; 13+ messages in thread
From: Frederic Konrad @ 2015-05-11 16:01 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: mttcg, Peter Maydell, Alvise Rigo, Paolo Bonzini, alex.bennee,
	Richard Henderson

On 08/05/2015 23:02, Emilio G. Cota wrote:
> Hi all,
>
> These are patches I've been working on for some time now.
> Since emulation of atomic instructions is recently getting
> attention([1], [2]), I'm submitting them for comment.
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/314406
> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
>
> Main features of this design:
>
> - Performance and scalability are the main design goal: guest code should
>    scale as much as it would scale running natively on the host.
>
>    For this, a host lock is (if necessary) assigned to each 16-byte
>    aligned chunk of the physical address space.
>    The assignment (i.e. lock allocation + registration) only happens
>    after an atomic operation on a particular physical address
>    is performed. To keep track of this sparse set of locks,
>    a lockless radix tree is used, so lookups are fast and scalable.
>
> - Translation helpers are employed to call the 'aie' module, which is
>    the common code that accesses the radix tree, locking the appropriate
>    entry depending on the access' physical address.
>
> - No special host atomic instructions (e.g. cmpxchg16b) are required;
>    mutexes and include/qemu/atomic.h are all that's needed.
>
> - Usermode and full-system are supported with the same code. Note that
>    the newly-added tiny_set module is necessary to properly emulate LL/SC,
>    since the number of "cpus" (i.e. threads) is unbounded in usermode--
>    for full-system mode a bitmap would have been sufficient.
>
> - ARM: Stores concurrent with LL/SC primitives are initially not dealt
>    with.
>    This is my choice, since I'm assuming most sane code will only
>    handle data atomically using LL/SC primitives. However, SWP can
>    be used, so whenever a SWP instruction is issued, stores start checking
>    that they do not clash with concurrent SWP instructions. This is
>    implemented via pre/post-store helpers. I've stress-tested this with a
>    heavily contended guest lock (64 cores), and it works fine. Executing
>    non-trivial pre/post-store helpers adds a 5% perf overhead to linux
>    bootup, and is negligible on regular programs. Anyway most
>    sane code doesn't use SWP (Linux bootup certainly doesn't), so this
>    overhead is rarely seen.
>
> - x86: Instead of acquiring the same host lock every time LOCK is found,
>    the acquisition of an AIE lock (via the radix tree) is done when the
>    address of the ensuing load/store is known.
>    Loads perform this check at compile-time.
>    Stores are emulated using the same trick as in ARM; non-atomic stores
>    are executed as atomic stores iff there's a prior atomic operation that
>    has been executed on their target address. This for instance ensures
>    that a regular store cannot race with a cmpxchg.
>    This has very small overhead (negligible with OpenSSL's bntest in
>    user-only), and scales as native code.
>
> - Barriers: not emulated yet. They're needed to correctly run non-trivial
>    lockless code (I'm using concurrencykit's testbenches).
>    The strongly-ordered-guest-on-weakly-ordered-host problem remains; my
>    guess is that we'll have to sacrifice single-threaded performance to
>    make it work (e.g. using pre-post ld/st helpers).
>
> - 64-bit guest on 32-bit host: Not supported yet. Note that 64-bit
>    loads/stores on a 32-bit guest are not atomic, yet 64-bit code might
>    have been written assuming that they are. Checks for this will be needed.
>
> - Other ISAs: not done yet, but they should be like either ARM or x86.
>
> - License of new files: is there a preferred license for new code?
I think it's GPL V2 or later.

Fred

>
> - Please tolerate the lack of comments in code and commit logs, when
>    preparing this RFC I thought it's better to put all the info
>    here. If this wasn't an RFC I'd have done it differently.
>
> Thanks for reading this far, comments welcome!
>
> 		Emilio
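
[Editor's note: the AIE lock-assignment scheme quoted above can be sketched
as follows. A flat open-addressed table stands in for the series' lockless
radix tree, and all names here are hypothetical, not the patches' API.]

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t hwaddr;

#define AIE_CHUNK_SHIFT 4   /* one lock per 16-byte aligned chunk */
#define TABLE_SIZE 64

typedef struct AIEEntry {
    hwaddr key;             /* chunk index this entry covers */
    int allocated;          /* stands in for lock allocation + registration */
} AIEEntry;

static AIEEntry table[TABLE_SIZE];

/* Map a physical address to its 16-byte chunk index. */
static hwaddr to_aie(hwaddr addr)
{
    return addr >> AIE_CHUNK_SHIFT;
}

/* Find the entry for addr, allocating it lazily on first use --
 * i.e. only once an atomic operation actually touches that chunk. */
static AIEEntry *aie_entry_get(hwaddr addr)
{
    hwaddr key = to_aie(addr);
    size_t i = key % TABLE_SIZE;

    while (table[i].allocated && table[i].key != key) {
        i = (i + 1) % TABLE_SIZE;   /* linear probing */
    }
    if (!table[i].allocated) {
        table[i].key = key;
        table[i].allocated = 1;
    }
    return &table[i];
}
```

Two atomic accesses within the same 16-byte chunk resolve to the same
entry (and thus would contend on the same host lock), while accesses to
different chunks get independent entries.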


end of thread, other threads:[~2015-05-11 16:02 UTC | newest]

Thread overview: 13+ messages
2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
2015-05-08 21:51   ` Richard Henderson
2015-05-10  8:07     ` Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 2/8] softmmu: add helpers to get ld/st physical addresses Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 3/8] tiny_set: add module to test for membership in a tiny set of pointers Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 4/8] radix-tree: add generic lockless radix tree module Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation Emilio G. Cota
2015-05-08 22:41   ` Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 6/8] aie: add target helpers Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 7/8] target-arm: emulate atomic instructions using AIE Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 8/8] target-i386: " Emilio G. Cota
2015-05-11 16:01 ` [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Frederic Konrad
