* [Qemu-devel] [RFC v2 0/7] Slow-path for atomic instruction translation
@ 2015-06-15 11:51 Alvise Rigo
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

This is the second iteration of the patch series; the relevant changes
from v1 are listed at the bottom of this cover letter.

This patch series provides an infrastructure for implementing atomic
instructions in QEMU, paving the way for TCG multi-threading. The adopted
design does not rely on host atomic instructions and proposes a 'legacy'
solution for translating guest atomic instructions.

The underlying idea is to provide new TCG instructions that guarantee
atomicity for some memory accesses or, more generally, a way to define
memory transactions. More specifically, a new pair of TCG instructions is
implemented, qemu_ldlink_i32 and qemu_stcond_i32, which behave as
LoadLink and StoreConditional primitives (only the 32-bit variant is
implemented). In order to achieve this, a new bitmap is added to the
ram_list structure (always unique) which flags all memory pages that
cannot be accessed directly through the fast-path, due to previous
exclusive operations. This new bitmap is coupled with a new TLB flag
which forces the slow-path execution. Any store performed by another
vCPU to the same (protected) address between the LL and the SC will
make the subsequent StoreConditional fail.

In theory, the provided implementation of TCG LoadLink/StoreConditional
can be used to properly handle atomic instructions on any architecture.

The new slow-path is implemented such that:
- the LoadLink behaves as a normal load slow-path, except that it also
  clears the dirty flag in the bitmap. The TLB entries created from now on
  will force the slow-path. To ensure this, we flush the TLB cache of the
  other vCPUs
- the StoreConditional behaves as a normal store slow-path, except that it
  also checks the state of the dirty bitmap and returns 0 or 1 depending on
  whether the StoreConditional succeeded (0 when no vCPU has touched the
  same memory in the meantime).

Write accesses that are forced to follow the 'legacy' slow-path will set
the accessed memory page back to dirty.
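
As a rough, self-contained model of the protocol described above (plain C,
not QEMU code; names, sizes and the page granularity are simplified):

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_BITS  12
    #define NUM_PAGES  1024

    /* 1 = dirty (normal), 0 = exclusive (a LoadLink is pending). */
    static bool page_dirty[NUM_PAGES] = { [0 ... NUM_PAGES - 1] = true };
    static uint64_t protected_addr = UINT64_MAX;

    /* LoadLink: read the value, mark the page exclusive, record the address. */
    static uint32_t load_link(const uint32_t *mem, uint64_t addr)
    {
        page_dirty[addr >> PAGE_BITS] = false;
        protected_addr = addr;
        return *mem;
    }

    /* A store from another vCPU, forced through the slow-path: it re-dirties
     * the page only when it targets the protected address. */
    static void other_vcpu_store(uint32_t *mem, uint64_t addr, uint32_t val)
    {
        if (addr == protected_addr) {
            page_dirty[addr >> PAGE_BITS] = true;
        }
        *mem = val;
    }

    /* StoreConditional: returns 0 on success, 1 on failure. */
    static int store_cond(uint32_t *mem, uint64_t addr, uint32_t val)
    {
        if (page_dirty[addr >> PAGE_BITS]) {
            return 1;       /* somebody intervened since the LoadLink */
        }
        *mem = val;
        return 0;           /* the page is lazily set back to dirty later */
    }

    int main(void)
    {
        uint32_t word = 0;
        uint32_t old = load_link(&word, 0x1000);
        other_vcpu_store(&word, 0x1000, 7);        /* breaks the reservation */
        return store_cond(&word, 0x1000, old + 1); /* returns 1: SC fails */
    }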

In this series only the ARM ldrex/strex instructions are implemented,
for ARM and i386 hosts.
The code was tested with bare-metal test cases and by booting Linux on
top of upstream QEMU.
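
For context, this is the kind of guest code that exercises the path above:
on an ARMv7 guest the compare-and-swap builtin below is typically lowered
by the compiler to an ldrex/cmp/strex retry loop (illustrative guest-side
code only, not part of the series):

    #include <stdatomic.h>

    static atomic_int lock;

    /* On ARMv7 this typically becomes an ldrex/cmp/strex retry loop. */
    void spin_lock(void)
    {
        int expected = 0;
        while (!atomic_compare_exchange_weak(&lock, &expected, 1)) {
            expected = 0;   /* a failed CAS overwrites 'expected' */
        }
    }

    void spin_unlock(void)
    {
        atomic_store(&lock, 0);
    }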

Changes from v1:
- The ram bitmap is no longer reversed: 1 = dirty, 0 = exclusive
- The way the offset used to access the bitmap is calculated has been
  improved and fixed
- Setting a page back to dirty now requires a vCPU to write to the protected
  address itself, not just to any address in the same page
- Addressed Richard Henderson's comments to improve the logic in
  softmmu_template.h and to simplify the helper generation through
  softmmu_llsc_template.h
- Added an initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386

This work has been sponsored by Huawei Technologies Duesseldorf GmbH.

Alvise Rigo (7):
  bitmap: Add bitmap_one_extend operation
  exec: Add new exclusive bitmap to ram_list
  Add new TLB_EXCL flag
  softmmu: Add helpers for a new slow-path
  tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
  target-arm: translate: implement qemu_ldlink and qemu_stcond ops
  target-i386: translate: implement qemu_ldlink and qemu_stcond ops

 cputlb.c                |  21 +++++-
 exec.c                  |   7 +-
 include/exec/cpu-all.h  |   2 +
 include/exec/cpu-defs.h |   4 +
 include/exec/memory.h   |   3 +-
 include/exec/ram_addr.h |  19 +++++
 include/qemu/bitmap.h   |   9 +++
 softmmu_llsc_template.h | 155 +++++++++++++++++++++++++++++++++++++++
 softmmu_template.h      | 191 +++++++++++++++++++++++++++++++-----------------
 target-arm/translate.c  |  87 +++++++++++++++++++++-
 tcg/arm/tcg-target.c    | 121 +++++++++++++++++++++++-------
 tcg/i386/tcg-target.c   | 136 ++++++++++++++++++++++++++++------
 tcg/tcg-be-ldst.h       |   1 +
 tcg/tcg-op.c            |  23 ++++++
 tcg/tcg-op.h            |   3 +
 tcg/tcg-opc.h           |   4 +
 tcg/tcg.c               |   2 +
 tcg/tcg.h               |  20 +++++
 18 files changed, 684 insertions(+), 124 deletions(-)
 create mode 100644 softmmu_llsc_template.h

-- 
2.4.3


* [Qemu-devel] [RFC v2 1/7] bitmap: Add bitmap_one_extend operation
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 include/qemu/bitmap.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 86dd9cd..c59e8e7 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -246,4 +246,13 @@ static inline unsigned long *bitmap_zero_extend(unsigned long *old,
     return new;
 }
 
+static inline unsigned long *bitmap_one_extend(unsigned long *old,
+                                               long old_nbits, long new_nbits)
+{
+    long new_len = BITS_TO_LONGS(new_nbits) * sizeof(unsigned long);
+    unsigned long *new = g_realloc(old, new_len);
+    bitmap_set(new, old_nbits, new_nbits - old_nbits);
+    return new;
+}
+
 #endif /* BITMAP_H */
-- 
2.4.3
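
A hypothetical caller, only to make the intended semantics concrete (new
bits come up set, the counterpart of bitmap_zero_extend() where they come
up clear); it assumes the usual QEMU bitmap helpers:

    #include <assert.h>
    #include "qemu/bitmap.h"

    /* Illustrative only: grow a per-page bitmap so that the pages added by
     * the resize are reported as dirty (bit = 1) right away. */
    static void example_one_extend(long old_pages, long new_pages)
    {
        unsigned long *dirty = bitmap_new(old_pages);     /* all bits clear */

        bitmap_set(dirty, 0, old_pages);                  /* existing pages */
        dirty = bitmap_one_extend(dirty, old_pages, new_pages);

        assert(bitmap_full(dirty, new_pages));            /* everything set */
        g_free(dirty);
    }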


* [Qemu-devel] [RFC v2 2/7] exec: Add new exclusive bitmap to ram_list
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

The purpose of this new bitmap is to flag the memory pages that are in
the middle of LL/SC operations (after a LL, before a SC).
For all these pages, the corresponding TLB entries will be generated
in such a way as to force the slow-path.
When the system starts, the whole memory is dirty (the whole bitmap is
set). A page, after being marked as exclusively-clean, will *not* be
restored as dirty after the SC; the cputlb code takes care of that,
lazily setting the page as dirty when the TLB_EXCL entry is about to be
overwritten.

The accessors to this bitmap are currently not atomic, but they will
have to be in a real multi-threaded TCG.

Also fix one bracket alignment.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 exec.c                  |  7 +++++--
 include/exec/memory.h   |  3 ++-
 include/exec/ram_addr.h | 19 +++++++++++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index 487583b..79b2571 100644
--- a/exec.c
+++ b/exec.c
@@ -1430,11 +1430,14 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
         int i;
 
         /* ram_list.dirty_memory[] is protected by the iothread lock.  */
-        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
+        for (i = 0; i < DIRTY_MEMORY_NUM - 1; i++) {
             ram_list.dirty_memory[i] =
                 bitmap_zero_extend(ram_list.dirty_memory[i],
                                    old_ram_size, new_ram_size);
-       }
+        }
+        ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] =
+            bitmap_one_extend(ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
+                              old_ram_size, new_ram_size);
     }
     cpu_physical_memory_set_dirty_range(new_block->offset,
                                         new_block->used_length,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8ae004e..5ad6f20 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -19,7 +19,8 @@
 #define DIRTY_MEMORY_VGA       0
 #define DIRTY_MEMORY_CODE      1
 #define DIRTY_MEMORY_MIGRATION 2
-#define DIRTY_MEMORY_NUM       3        /* num of dirty bits */
+#define DIRTY_MEMORY_EXCLUSIVE 3
+#define DIRTY_MEMORY_NUM       4        /* num of dirty bits */
 
 #include <stdint.h>
 #include <stdbool.h>
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c113f21..f097f1b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -21,6 +21,7 @@
 
 #ifndef CONFIG_USER_ONLY
 #include "hw/xen/xen.h"
+#include "qemu/bitmap.h"
 
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                     bool share, const char *mem_path,
@@ -249,5 +250,23 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
     return num_dirty;
 }
 
+/* Exclusive bitmap accessors. */
+static inline void cpu_physical_memory_set_excl_dirty(ram_addr_t addr)
+{
+    set_bit(addr >> TARGET_PAGE_BITS,
+            ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
+}
+
+static inline int cpu_physical_memory_excl_is_dirty(ram_addr_t addr)
+{
+    return test_bit(addr >> TARGET_PAGE_BITS,
+                    ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
+}
+
+static inline void cpu_physical_memory_clear_excl_dirty(ram_addr_t addr)
+{
+    clear_bit(addr >> TARGET_PAGE_BITS,
+              ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
+}
 #endif
 #endif
-- 
2.4.3
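
An assumed sequence of events, just to make the accessor semantics concrete
(the real callers are introduced by the following patches):

    #include <assert.h>
    #include "exec/ram_addr.h"

    /* Illustrative only: lifetime of the exclusive bit for one page. */
    static void example_excl_round_trip(ram_addr_t addr)
    {
        /* LoadLink: the page leaves the dirty state and becomes exclusive. */
        cpu_physical_memory_clear_excl_dirty(addr);
        assert(!cpu_physical_memory_excl_is_dirty(addr));

        /* A slow-path store by any vCPU to the protected address... */
        cpu_physical_memory_set_excl_dirty(addr);

        /* ...so a later StoreConditional finds the page dirty and fails. */
        assert(cpu_physical_memory_excl_is_dirty(addr));
    }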


* [Qemu-devel] [RFC v2 3/7] Add new TLB_EXCL flag
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

Add a new flag for the TLB entries to force all the accesses made to a
page to follow the slow-path.

When a TLB entry marked as EXCL is evicted, the corresponding page is
set back to dirty in the exclusive bitmap.

A write access marks the accessed page as dirty, invalidating any
pending LL/SC, only if the vCPU writes to the protected address itself.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c                |  18 ++++-
 include/exec/cpu-all.h  |   2 +
 include/exec/cpu-defs.h |   4 ++
 softmmu_template.h      | 187 ++++++++++++++++++++++++++++++------------------
 4 files changed, 142 insertions(+), 69 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index a506086..630c11c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -299,6 +299,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     env->tlb_v_table[mmu_idx][vidx] = *te;
     env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
 
+    if (te->addr_write & TLB_EXCL) {
+        /* We are evicting an exclusive entry: if the page is still marked
+         * as exclusive (i.e. not dirty), set it back to dirty. */
+        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
+                                          (te->addr_write & TARGET_PAGE_MASK);
+        if (!cpu_physical_memory_excl_is_dirty(hw_addr)) {
+            cpu_physical_memory_set_excl_dirty(hw_addr);
+        }
+    }
+
     /* refill the tlb */
     env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
     env->iotlb[mmu_idx][index].attrs = attrs;
@@ -324,7 +334,13 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
                                                    + xlat)) {
             te->addr_write = address | TLB_NOTDIRTY;
         } else {
-            te->addr_write = address;
+            if (!(address & TLB_MMIO) &&
+                !cpu_physical_memory_excl_is_dirty(section->mr->ram_addr
+                                                   + xlat)) {
+                te->addr_write = address | TLB_EXCL;
+            } else {
+                te->addr_write = address;
+            }
         }
     } else {
         te->addr_write = -1;
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ac06c67..632f6ce 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -311,6 +311,8 @@ extern RAMList ram_list;
 #define TLB_NOTDIRTY    (1 << 4)
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO        (1 << 5)
+/* Set if TLB entry refers a page that requires exclusive access.  */
+#define TLB_EXCL        (1 << 6)
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index d5aecaf..c73a75f 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -165,5 +165,9 @@ typedef struct CPUIOTLBEntry {
 #define CPU_COMMON                                                      \
     /* soft mmu support */                                              \
     CPU_COMMON_TLB                                                      \
+                                                                        \
+    /* Used for atomic instruction translation. */                      \
+    bool ll_sc_context;                                                 \
+    hwaddr excl_protected_hwaddr;                                       \
 
 #endif
diff --git a/softmmu_template.h b/softmmu_template.h
index 39f571b..f1782f6 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -141,6 +141,21 @@
     vidx >= 0;                                                                \
 })
 
+#define lookup_cpus_ll_addr(addr)                                             \
+({                                                                            \
+    CPUState *cpu;                                                            \
+    bool hit = false;                                                         \
+                                                                              \
+    CPU_FOREACH(cpu) {                                                        \
+        CPUArchState *cpu_env = (CPUArchState *)cpu->env_ptr;                 \
+        if (cpu != current_cpu && cpu_env->excl_protected_hwaddr == addr) {   \
+            hit = true;                                                       \
+            break;                                                            \
+        }                                                                     \
+    }                                                                         \
+    hit;                                                                      \
+})
+
 #ifndef SOFTMMU_CODE_ACCESS
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               CPUIOTLBEntry *iotlbentry,
@@ -409,43 +424,61 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
-    /* Handle an IO access.  */
+    /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-        iotlbentry = &env->iotlb[mmu_idx][index];
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_LE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
-        return;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-        int i;
-    do_unaligned_access:
-        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                                 mmu_idx, retaddr);
+        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+            /* The slow-path has been forced since we are writing to
+             * exclusive-protected memory. */
+            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+
+            bool set_to_dirty;
+
+            /* Two cases of invalidation: the current vCPU is writing to another
+             * vCPU's exclusive address or the vCPU that issued the LoadLink is
+             * writing to it, but not through a StoreCond. */
+            set_to_dirty = lookup_cpus_ll_addr(hw_addr);
+            set_to_dirty |= env->ll_sc_context &&
+                           (env->excl_protected_hwaddr == hw_addr);
+
+            if (set_to_dirty) {
+                cpu_physical_memory_set_excl_dirty(hw_addr);
+            } /* the vCPU is legitimately writing to the protected address */
+        } else {
+            if ((addr & (DATA_SIZE - 1)) != 0) {
+                goto do_unaligned_access;
+            }
+
+            /* ??? Note that the io helpers always read data in the target
+               byte ordering.  We should push the LE/BE request down into io. */
+            val = TGT_LE(val);
+            glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+            return;
         }
-        /* XXX: not efficient, but simple */
-        /* Note: relies on the fact that tlb_fill() does not remove the
-         * previous page from the TLB cache.  */
-        for (i = DATA_SIZE - 1; i >= 0; i--) {
-            /* Little-endian extract.  */
-            uint8_t val8 = val >> (i * 8);
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+    } else {
+        /* Handle slow unaligned access (it spans two pages or IO).  */
+        if (DATA_SIZE > 1
+            && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                         >= TARGET_PAGE_SIZE)) {
+            int i;
+        do_unaligned_access:
+            if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+                cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                                     mmu_idx, retaddr);
+            }
+            /* XXX: not efficient, but simple */
+            /* Note: relies on the fact that tlb_fill() does not remove the
+             * previous page from the TLB cache.  */
+            for (i = DATA_SIZE - 1; i >= 0; i--) {
+                /* Little-endian extract.  */
+                uint8_t val8 = val >> (i * 8);
+                /* Note the adjustment at the beginning of the function.
+                   Undo that for the recursion.  */
+                glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                                oi, retaddr + GETPC_ADJ);
+            }
+            return;
         }
-        return;
     }
 
     /* Handle aligned access or unaligned access in the same page.  */
@@ -489,43 +522,61 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
-    /* Handle an IO access.  */
+    /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-        iotlbentry = &env->iotlb[mmu_idx][index];
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_BE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
-        return;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-        int i;
-    do_unaligned_access:
-        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                                 mmu_idx, retaddr);
+        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+            /* The slow-path has been forced since we are writing to
+             * exclusive-protected memory. */
+            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+
+            bool set_to_dirty;
+
+            /* Two cases of invalidation: the current vCPU is writing to another
+             * vCPU's exclusive address or the vCPU that issued the LoadLink is
+             * writing to it, but not through a StoreCond. */
+            set_to_dirty = lookup_cpus_ll_addr(hw_addr);
+            set_to_dirty |= env->ll_sc_context &&
+                           (env->excl_protected_hwaddr == hw_addr);
+
+            if (set_to_dirty) {
+                cpu_physical_memory_set_excl_dirty(hw_addr);
+            } /* the vCPU is legitimately writing to the protected address */
+        } else {
+            if ((addr & (DATA_SIZE - 1)) != 0) {
+                goto do_unaligned_access;
+            }
+
+            /* ??? Note that the io helpers always read data in the target
+               byte ordering.  We should push the LE/BE request down into io. */
+            val = TGT_BE(val);
+            glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+            return;
         }
-        /* XXX: not efficient, but simple */
-        /* Note: relies on the fact that tlb_fill() does not remove the
-         * previous page from the TLB cache.  */
-        for (i = DATA_SIZE - 1; i >= 0; i--) {
-            /* Big-endian extract.  */
-            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
+    } else {
+        /* Handle slow unaligned access (it spans two pages or IO).  */
+        if (DATA_SIZE > 1
+            && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                         >= TARGET_PAGE_SIZE)) {
+            int i;
+        do_unaligned_access:
+            if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+                cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                                     mmu_idx, retaddr);
+            }
+            /* XXX: not efficient, but simple */
+            /* Note: relies on the fact that tlb_fill() does not remove the
+             * previous page from the TLB cache.  */
+            for (i = DATA_SIZE - 1; i >= 0; i--) {
+                /* Big-endian extract.  */
+                uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
+                /* Note the adjustment at the beginning of the function.
+                   Undo that for the recursion.  */
+                glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                                oi, retaddr + GETPC_ADJ);
+            }
+            return;
         }
-        return;
     }
 
     /* Handle aligned access or unaligned access in the same page.  */
-- 
2.4.3
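
In short, the write path above boils down to the following two tests (a
condensed restatement of the checks in softmmu_template.h, QEMU types
assumed):

    /* Any flag bit set in addr_write (TLB_NOTDIRTY, TLB_MMIO and now
     * TLB_EXCL) diverts the store away from the fast path. */
    static inline bool store_needs_special_handling(target_ulong tlb_addr)
    {
        return (tlb_addr & ~TARGET_PAGE_MASK) != 0;
    }

    /* Inside that path, an entry whose only flag is TLB_EXCL is handled by
     * the exclusive-protected branch rather than as an IO access. */
    static inline bool store_is_excl_protected(target_ulong tlb_addr)
    {
        return (tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL;
    }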


* [Qemu-devel] [RFC v2 4/7] softmmu: Add helpers for a new slow-path
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

The new helpers rely on the legacy ones to perform the actual read/write.

The StoreConditional helper (helper_le_stcond_name) returns 1 if the
store has to fail due to a concurrent access to the same page by
another vCPU.  A 'concurrent access' can be a store made by *any* vCPU
(although some implementations allow stores made by the CPU that issued
the LoadLink).

These helpers also update the TLB entry of the page involved in the
LL/SC, so that all the following accesses made by any vCPU will follow
the slow path.
In real multi-threading, these helpers will need to temporarily pause
the execution of the other vCPUs in order to update (flush) their TLB
caches accordingly.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c                |   3 +
 softmmu_llsc_template.h | 155 ++++++++++++++++++++++++++++++++++++++++++++++++
 softmmu_template.h      |   4 ++
 tcg/tcg.h               |  18 ++++++
 4 files changed, 180 insertions(+)
 create mode 100644 softmmu_llsc_template.h

diff --git a/cputlb.c b/cputlb.c
index 630c11c..1145a33 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -394,6 +394,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 
 #define MMUSUFFIX _mmu
 
+/* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
+#define GEN_EXCLUSIVE_HELPERS
 #define SHIFT 0
 #include "softmmu_template.h"
 
@@ -406,6 +408,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 #define SHIFT 3
 #include "softmmu_template.h"
 #undef MMUSUFFIX
+#undef GEN_EXCLUSIVE_HELPERS
 
 #define MMUSUFFIX _cmmu
 #undef GETPC_ADJ
diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
new file mode 100644
index 0000000..81e9d8e
--- /dev/null
+++ b/softmmu_llsc_template.h
@@ -0,0 +1,155 @@
+/*
+ *  Software MMU support (exclusive load/store operations)
+ *
+ * Generate helpers used by TCG for qemu_ldlink/stcond ops.
+ *
+ * Included from softmmu_template.h only.
+ *
+ * Copyright (c) 2015 Virtual Open Systems
+ *
+ * Authors:
+ *  Alvise Rigo <a.rigo@virtualopensystems.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * TODO configurations not implemented:
+ *     - Signed/Unsigned Big-endian
+ *     - Signed Little-endian
+ */
+
+#if DATA_SIZE > 1
+#define helper_le_ldlink_name  glue(glue(helper_le_ldlink, USUFFIX), MMUSUFFIX)
+#define helper_le_stcond_name  glue(glue(helper_le_stcond, SUFFIX), MMUSUFFIX)
+#else
+#define helper_le_ldlink_name  glue(glue(helper_ret_ldlink, USUFFIX), MMUSUFFIX)
+#define helper_le_stcond_name  glue(glue(helper_ret_stcond, SUFFIX), MMUSUFFIX)
+#endif
+
+/* helpers from cpu_ldst.h, byte-order independent versions */
+#if DATA_SIZE > 1
+#define helper_ld_legacy glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
+#define helper_st_legacy glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
+#else
+#define helper_ld_legacy glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
+#define helper_st_legacy glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
+#endif
+
+#define is_write_tlb_entry_set(env, page, index)                             \
+({                                                                           \
+    (addr & TARGET_PAGE_MASK)                                                \
+         == ((env->tlb_table[mmu_idx][index].addr_write) &                   \
+                 (TARGET_PAGE_MASK | TLB_INVALID_MASK));                     \
+})                                                                           \
+
+#define EXCLUSIVE_RESET_ADDR ULLONG_MAX
+
+WORD_TYPE helper_le_ldlink_name(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    WORD_TYPE ret;
+    int index;
+    CPUState *cpu;
+    hwaddr hw_addr;
+    unsigned mmu_idx = get_mmuidx(oi);
+
+    /* Use the proper load helper from cpu_ldst.h */
+    ret = helper_ld_legacy(env, addr, mmu_idx, retaddr);
+
+    /* The last legacy access ensures that the TLB and IOTLB entry for 'addr'
+     * have been created. */
+    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+
+    /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
+     * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
+    hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
+
+    /* Set the exclusive-protected hwaddr. */
+    env->excl_protected_hwaddr = hw_addr;
+    env->ll_sc_context = true;
+
+    /* No need to mask hw_addr with TARGET_PAGE_MASK since
+     * cpu_physical_memory_excl_is_dirty() will take care of that. */
+    if (cpu_physical_memory_excl_is_dirty(hw_addr)) {
+        cpu_physical_memory_clear_excl_dirty(hw_addr);
+
+        /* Invalidate the TLB entry for the other processors. The next TLB
+         * entries for this page will have the TLB_EXCL flag set. */
+        CPU_FOREACH(cpu) {
+            if (cpu != current_cpu) {
+                tlb_flush(cpu, 1);
+            }
+        }
+    }
+
+    /* For this vCPU, just update the TLB entry, no need to flush. */
+    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
+
+    return ret;
+}
+
+WORD_TYPE helper_le_stcond_name(CPUArchState *env, target_ulong addr,
+                                DATA_TYPE val, TCGMemOpIdx oi,
+                                uintptr_t retaddr)
+{
+    WORD_TYPE ret;
+    int index;
+    hwaddr hw_addr;
+    unsigned mmu_idx = get_mmuidx(oi);
+
+    /* If the TLB entry is not the right one, create it. */
+    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    if (!is_write_tlb_entry_set(env, addr, index)) {
+        tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
+    }
+
+    /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
+     * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
+    hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
+
+    if (!env->ll_sc_context) {
+        /* No LoadLink has been set, the StoreCond has to fail. */
+        return 1;
+    }
+
+    env->ll_sc_context = 0;
+
+    if (cpu_physical_memory_excl_is_dirty(hw_addr)) {
+        /* Another vCPU has accessed the memory after the LoadLink. */
+        ret = 1;
+    } else {
+        helper_st_legacy(env, addr, val, mmu_idx, retaddr);
+
+        /* The StoreConditional succeeded */
+        ret = 0;
+    }
+
+    env->tlb_table[mmu_idx][index].addr_write &= ~TLB_EXCL;
+    env->excl_protected_hwaddr = EXCLUSIVE_RESET_ADDR;
+    /* It's likely that the page will be used again for exclusive accesses,
+     * for this reason we don't flush any TLB cache at the price of some
+     * additional slow paths and we don't set the page bit as dirty.
+     * The EXCL TLB entries will not remain there forever since they will be
+     * eventually removed to serve another guest page; when this happens we
+     * remove also the dirty bit (see cputlb.c).
+     */
+
+    return ret;
+}
+
+#undef helper_le_ldlink_name
+#undef helper_le_stcond_name
+#undef helper_ld_legacy
+#undef helper_st_legacy
diff --git a/softmmu_template.h b/softmmu_template.h
index f1782f6..2a61284 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -601,6 +601,10 @@ glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
 
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
+#ifdef GEN_EXCLUSIVE_HELPERS
+#include "softmmu_llsc_template.h"
+#endif
+
 #undef READ_ACCESS_TYPE
 #undef SHIFT
 #undef DATA_TYPE
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 41e4869..d0180fe 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -955,6 +955,15 @@ tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                     TCGMemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr);
+/* Exclusive variants */
+tcg_target_ulong helper_ret_ldlinkub_mmu(CPUArchState *env, target_ulong addr,
+                                         int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
+                                        int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
+                                        int mmu_idx, uintptr_t retaddr);
+uint64_t helper_le_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
+                               int mmu_idx, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
 tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
@@ -982,6 +991,15 @@ void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr);
 void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr);
+/* Exclusive variants */
+tcg_target_ulong helper_ret_stcondb_mmu(CPUArchState *env, target_ulong addr,
+                                uint8_t val, int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_stcondw_mmu(CPUArchState *env, target_ulong addr,
+                                uint16_t val, int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env, target_ulong addr,
+                                uint32_t val, int mmu_idx, uintptr_t retaddr);
+uint64_t helper_le_stcondq_mmu(CPUArchState *env, target_ulong addr,
+                                uint64_t val, int mmu_idx, uintptr_t retaddr);
 
 /* Temporary aliases until backends are converted.  */
 #ifdef TARGET_WORDS_BIGENDIAN
-- 
2.4.3
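
For reference, with SHIFT defined to 2 (the only case exercised so far) the
template above is expected to expand to the 32-bit little-endian pair below;
the prototypes in tcg/tcg.h are aligned to this signature in the next patch
(a sketch of the macro expansion, not code added here):

    /* Expected expansion of softmmu_llsc_template.h for DATA_SIZE == 4: */
    tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env,
                                            target_ulong addr,
                                            TCGMemOpIdx oi, uintptr_t retaddr);
    tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env,
                                           target_ulong addr, uint32_t val,
                                           TCGMemOpIdx oi, uintptr_t retaddr);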


* [Qemu-devel] [RFC v2 5/7] tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

Create a new pair of instructions that implement a LoadLink/StoreConditional
mechanism.

It has not been possible to fold the two new opcodes into the plain
qemu_ld/qemu_st variants, since the StoreConditional always requires one
extra output argument to report whether the operation succeeded.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 tcg/tcg-be-ldst.h |  1 +
 tcg/tcg-op.c      | 23 +++++++++++++++++++++++
 tcg/tcg-op.h      |  3 +++
 tcg/tcg-opc.h     |  4 ++++
 tcg/tcg.c         |  2 ++
 tcg/tcg.h         | 18 ++++++++++--------
 6 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-be-ldst.h b/tcg/tcg-be-ldst.h
index 40a2369..b3f9c51 100644
--- a/tcg/tcg-be-ldst.h
+++ b/tcg/tcg-be-ldst.h
@@ -24,6 +24,7 @@
 
 typedef struct TCGLabelQemuLdst {
     bool is_ld;             /* qemu_ld: true, qemu_st: false */
+    TCGReg llsc_success;    /* reg index for qemu_stcond outcome */
     TCGMemOpIdx oi;
     TCGType type;           /* result type of a load */
     TCGReg addrlo_reg;      /* reg index for low word of guest virtual addr */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..a73b522 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1885,6 +1885,15 @@ static void gen_ldst_i32(TCGOpcode opc, TCGv_i32 val, TCGv addr,
 #endif
 }
 
+/* An output operand to return the StoreConditional result */
+static void gen_stcond_i32(TCGOpcode opc, TCGv_i32 is_dirty, TCGv_i32 val,
+                           TCGv addr, TCGMemOp memop, TCGArg idx)
+{
+    TCGMemOpIdx oi = make_memop_idx(memop, idx);
+
+    tcg_gen_op4i_i32(opc, is_dirty, val, addr, oi);
+}
+
 static void gen_ldst_i64(TCGOpcode opc, TCGv_i64 val, TCGv addr,
                          TCGMemOp memop, TCGArg idx)
 {
@@ -1911,12 +1920,26 @@ void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
     gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
 }
 
+void tcg_gen_qemu_ldlink_i32(TCGv_i32 val, TCGv addr, TCGArg idx,
+                             TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+    gen_ldst_i32(INDEX_op_qemu_ldlink_i32, val, addr, memop, idx);
+}
+
 void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     memop = tcg_canonicalize_memop(memop, 0, 1);
     gen_ldst_i32(INDEX_op_qemu_st_i32, val, addr, memop, idx);
 }
 
+void tcg_gen_qemu_stcond_i32(TCGv_i32 is_dirty, TCGv_i32 val, TCGv addr,
+                             TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 1);
+    gen_stcond_i32(INDEX_op_qemu_stcond_i32, is_dirty, val, addr, memop, idx);
+}
+
 void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index d1d763f..f183169 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -754,6 +754,9 @@ void tcg_gen_qemu_st_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
 void tcg_gen_qemu_ld_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
 void tcg_gen_qemu_st_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
 
+void tcg_gen_qemu_ldlink_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_stcond_i32(TCGv_i32, TCGv_i32, TCGv, TCGArg, TCGMemOp);
+
 static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
 {
     tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_UB);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..d6c0454 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -183,6 +183,10 @@ DEF(qemu_ld_i32, 1, TLADDR_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 DEF(qemu_st_i32, 0, TLADDR_ARGS + 1, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ldlink_i32, 1, TLADDR_ARGS, 2,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_stcond_i32, 1, TLADDR_ARGS + 1, 2,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT)
 DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 7e088b1..8a2265e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1068,6 +1068,8 @@ void tcg_dump_ops(TCGContext *s)
                 i = 1;
                 break;
             case INDEX_op_qemu_ld_i32:
+            case INDEX_op_qemu_ldlink_i32:
+            case INDEX_op_qemu_stcond_i32:
             case INDEX_op_qemu_st_i32:
             case INDEX_op_qemu_ld_i64:
             case INDEX_op_qemu_st_i64:
diff --git a/tcg/tcg.h b/tcg/tcg.h
index d0180fe..7acc1fa 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -282,6 +282,8 @@ typedef enum TCGMemOp {
     MO_TEQ   = MO_TE | MO_Q,
 
     MO_SSIZE = MO_SIZE | MO_SIGN,
+
+    MO_EXCL  = 32, /* Set for exclusive memory access */
 } TCGMemOp;
 
 typedef tcg_target_ulong TCGArg;
@@ -957,13 +959,13 @@ uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr);
 /* Exclusive variants */
 tcg_target_ulong helper_ret_ldlinkub_mmu(CPUArchState *env, target_ulong addr,
-                                         int mmu_idx, uintptr_t retaddr);
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
 tcg_target_ulong helper_le_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
-                                        int mmu_idx, uintptr_t retaddr);
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
 tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
-                                        int mmu_idx, uintptr_t retaddr);
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_le_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
-                               int mmu_idx, uintptr_t retaddr);
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
 tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
@@ -993,13 +995,13 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr);
 /* Exclusive variants */
 tcg_target_ulong helper_ret_stcondb_mmu(CPUArchState *env, target_ulong addr,
-                                uint8_t val, int mmu_idx, uintptr_t retaddr);
+                            uint8_t val, TCGMemOpIdx oi, uintptr_t retaddr);
 tcg_target_ulong helper_le_stcondw_mmu(CPUArchState *env, target_ulong addr,
-                                uint16_t val, int mmu_idx, uintptr_t retaddr);
+                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
 tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env, target_ulong addr,
-                                uint32_t val, int mmu_idx, uintptr_t retaddr);
+                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_le_stcondq_mmu(CPUArchState *env, target_ulong addr,
-                                uint64_t val, int mmu_idx, uintptr_t retaddr);
+                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
 
 /* Temporary aliases until backends are converted.  */
 #ifdef TARGET_WORDS_BIGENDIAN
-- 
2.4.3
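
A minimal front-end usage sketch of the new ops (a hypothetical helper,
mirroring what a target's translate.c is expected to emit for a 32-bit
exclusive pair; the actual ARM version follows in the next patch):

    /* is_dirty receives the StoreConditional outcome: 0 on success, 1 on
     * failure, matching the slow-path helpers added earlier. */
    static void gen_ll_sc_example(TCGv addr, TCGv_i32 val, TCGv_i32 is_dirty,
                                  int mem_idx)
    {
        /* LoadLink: load the value and arm the exclusive monitor. */
        tcg_gen_qemu_ldlink_i32(val, addr, mem_idx, MO_TEUL | MO_EXCL);

        /* ... the guest computes the new value into 'val' ... */

        /* StoreConditional: store only if nobody touched the page meanwhile. */
        tcg_gen_qemu_stcond_i32(is_dirty, val, addr, mem_idx, MO_TEUL | MO_EXCL);
    }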


* [Qemu-devel] [RFC v2 6/7] target-arm: translate: implement qemu_ldlink and qemu_stcond ops
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

Implement the ldrex and strex instructions relying on TCG's qemu_ldlink and
qemu_stcond.  For the time being only the 32-bit instructions are supported.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 target-arm/translate.c |  87 ++++++++++++++++++++++++++++++++++-
 tcg/arm/tcg-target.c   | 121 +++++++++++++++++++++++++++++++++++++------------
 2 files changed, 178 insertions(+), 30 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 39692d7..7211944 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -72,6 +72,8 @@ static TCGv_i64 cpu_exclusive_test;
 static TCGv_i32 cpu_exclusive_info;
 #endif
 
+static TCGv_i32 cpu_ll_sc_context;
+
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
 static TCGv_i64 cpu_F0d, cpu_F1d;
@@ -103,6 +105,8 @@ void arm_translate_init(void)
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
     cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_val), "exclusive_val");
+    cpu_ll_sc_context = tcg_global_mem_new_i32(TCG_AREG0,
+        offsetof(CPUARMState, ll_sc_context), "ll_sc_context");
 #ifdef CONFIG_USER_ONLY
     cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_test), "exclusive_test");
@@ -961,6 +965,18 @@ DO_GEN_ST(8, MO_UB)
 DO_GEN_ST(16, MO_TEUW)
 DO_GEN_ST(32, MO_TEUL)
 
+/* Load/Store exclusive generators (always unsigned) */
+static inline void gen_aa32_ldex32(TCGv_i32 val, TCGv_i32 addr, int index)
+{
+    tcg_gen_qemu_ldlink_i32(val, addr, index, MO_TEUL | MO_EXCL);
+}
+
+static inline void gen_aa32_stex32(TCGv_i32 is_dirty, TCGv_i32 val,
+                                   TCGv_i32 addr, int index)
+{
+    tcg_gen_qemu_stcond_i32(is_dirty, val, addr, index, MO_TEUL | MO_EXCL);
+}
+
 static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
 {
     tcg_gen_movi_i32(cpu_R[15], val);
@@ -7426,6 +7442,26 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
 }
 
+static void gen_load_exclusive_multi(DisasContext *s, int rt, int rt2,
+                                     TCGv_i32 addr, int size)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    switch (size) {
+    case 0:
+    case 1:
+        abort();
+    case 2:
+        gen_aa32_ldex32(tmp, addr, get_mem_index(s));
+        break;
+    case 3:
+    default:
+        abort();
+    }
+
+    store_reg(s, rt, tmp);
+}
+
 static void gen_clrex(DisasContext *s)
 {
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
@@ -7524,6 +7560,52 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
 }
+
+static void gen_store_exclusive_multi(DisasContext *s, int rd, int rt, int rt2,
+                                      TCGv_i32 addr, int size)
+{
+    TCGv_i32 tmp;
+    TCGv_i32 is_dirty;
+    TCGLabel *done_label;
+    TCGLabel *fail_label;
+
+    fail_label = gen_new_label();
+    done_label = gen_new_label();
+
+    is_dirty = tcg_temp_new_i32();
+    /* tmp is allocated by load_reg() below. */
+
+    /* Fail if we are not in LL/SC context. */
+    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_ll_sc_context, 1, fail_label);
+
+    tmp = load_reg(s, rt);
+    switch (size) {
+    case 0:
+    case 1:
+        abort();
+        break;
+    case 2:
+        gen_aa32_stex32(is_dirty, tmp, addr, get_mem_index(s));
+        break;
+    case 3:
+    default:
+        abort();
+    }
+
+    tcg_temp_free_i32(tmp);
+
+    /* Check if the store conditional has to fail. */
+    tcg_gen_brcondi_i32(TCG_COND_EQ, is_dirty, 1, fail_label);
+    tcg_temp_free_i32(is_dirty);
+
+    /* Success: fall through and report is_dirty = 0. */
+
+    tcg_gen_movi_i32(cpu_R[rd], 0); /* is_dirty = 0 */
+    tcg_gen_br(done_label);
+    gen_set_label(fail_label);
+    tcg_gen_movi_i32(cpu_R[rd], 1); /* is_dirty = 1 */
+    gen_set_label(done_label);
+}
 #endif
 
 /* gen_srs:
@@ -8372,7 +8454,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         } else if (insn & (1 << 20)) {
                             switch (op1) {
                             case 0: /* ldrex */
-                                gen_load_exclusive(s, rd, 15, addr, 2);
+                                gen_load_exclusive_multi(s, rd, 15, addr, 2);
                                 break;
                             case 1: /* ldrexd */
                                 gen_load_exclusive(s, rd, rd + 1, addr, 3);
@@ -8390,7 +8472,8 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                             rm = insn & 0xf;
                             switch (op1) {
                             case 0:  /*  strex */
-                                gen_store_exclusive(s, rd, rm, 15, addr, 2);
+                                gen_store_exclusive_multi(s, rd, rm, 15,
+                                                          addr, 2);
                                 break;
                             case 1: /*  strexd */
                                 gen_store_exclusive(s, rd, rm, rm + 1, addr, 3);
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index ae2ec7a..f2b69a0 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1069,6 +1069,17 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BESL] = helper_be_ldul_mmu,
 };
 
+/* LoadLink helpers, only unsigned. Use the macro below to access them. */
+static void * const qemu_ldex_helpers[16] = {
+    [MO_LEUL] = helper_le_ldlinkul_mmu,
+};
+
+#define LDEX_HELPER(mem_op)                                             \
+({                                                                      \
+    assert(mem_op & MO_EXCL);                                           \
+    qemu_ldex_helpers[((int)mem_op - MO_EXCL)];                         \
+})
+
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
@@ -1082,6 +1093,19 @@ static void * const qemu_st_helpers[16] = {
     [MO_BEQ]  = helper_be_stq_mmu,
 };
 
+/* StoreConditional helpers. Use the macro below to access them. */
+static void * const qemu_stex_helpers[16] = {
+    [MO_LEUL] = helper_le_stcondl_mmu,
+};
+
+#define STEX_HELPER(mem_op)                                             \
+({                                                                      \
+    assert(mem_op & MO_EXCL);                                           \
+    qemu_stex_helpers[(int)mem_op - MO_EXCL];                           \
+})
+
+
+
 /* Helper routines for marshalling helper function arguments into
  * the correct registers and stack.
  * argreg is where we want to put this argument, arg is the argument itself.
@@ -1222,13 +1246,14 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
    path for a load or store, so that we can later generate the correct
    helper code.  */
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
-                                TCGReg datalo, TCGReg datahi, TCGReg addrlo,
-                                TCGReg addrhi, tcg_insn_unit *raddr,
-                                tcg_insn_unit *label_ptr)
+                                TCGReg llsc_success, TCGReg datalo,
+                                TCGReg datahi, TCGReg addrlo, TCGReg addrhi,
+                                tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
 {
     TCGLabelQemuLdst *label = new_ldst_label(s);
 
     label->is_ld = is_ld;
+    label->llsc_success = llsc_success;
     label->oi = oi;
     label->datalo_reg = datalo;
     label->datahi_reg = datahi;
@@ -1259,12 +1284,16 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     /* For armv6 we can use the canonical unsigned helpers and minimize
        icache usage.  For pre-armv6, use the signed helpers since we do
        not have a single insn sign-extend.  */
-    if (use_armv6_instructions) {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)];
+    if (opc & MO_EXCL) {
+        func = LDEX_HELPER(opc);
     } else {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
-        if (opc & MO_SIGN) {
-            opc = MO_UL;
+        if (use_armv6_instructions) {
+            func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)];
+        } else {
+            func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
+            if (opc & MO_SIGN) {
+                opc = MO_UL;
+            }
         }
     }
     tcg_out_call(s, func);
@@ -1336,8 +1365,15 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     argreg = tcg_out_arg_imm32(s, argreg, oi);
     argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
 
-    /* Tail-call to the helper, which will return to the fast path.  */
-    tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    if (opc & MO_EXCL) {
+        tcg_out_call(s, STEX_HELPER(opc));
+        /* Save the output of the StoreConditional */
+        tcg_out_mov_reg(s, COND_AL, lb->llsc_success, TCG_REG_R0);
+        tcg_out_goto(s, COND_AL, lb->raddr);
+    } else {
+        /* Tail-call to the helper, which will return to the fast path.  */
+        tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    }
 }
 #endif /* SOFTMMU */
 
@@ -1461,7 +1497,8 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc,
     }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64,
+                            bool isLoadLink)
 {
     TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
     TCGMemOpIdx oi;
@@ -1484,13 +1521,20 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 1);
 
     /* This a conditional BL only to load a pointer within this opcode into LR
-       for the slow path.  We will not be using the value for a tail call.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
+       for the slow path.  We will not be using the value for a tail call.
+       In the context of a LoadLink instruction, we don't check the TLB but we
+       always follow the slow path.  */
+    if (isLoadLink) {
+        label_ptr = s->code_ptr;
+        tcg_out_bl_noaddr(s, COND_AL);
+    } else {
+        label_ptr = s->code_ptr;
+        tcg_out_bl_noaddr(s, COND_NE);
 
-    tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
+        tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
+    }
 
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+    add_qemu_ldst_label(s, true, oi, 0, datalo, datahi, addrlo, addrhi,
                         s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
     if (GUEST_BASE) {
@@ -1592,9 +1636,11 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc,
     }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64,
+                            bool isStoreCond)
 {
     TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
+    TCGReg llsc_success;
     TCGMemOpIdx oi;
     TCGMemOp opc;
 #ifdef CONFIG_SOFTMMU
@@ -1603,6 +1649,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_insn_unit *label_ptr;
 #endif
 
+    /* The stcond variant has one more param */
+    llsc_success = (isStoreCond ? *args++ : 0);
+
     datalo = *args++;
     datahi = (is64 ? *args++ : 0);
     addrlo = *args++;
@@ -1612,16 +1661,24 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE, mem_index, 0);
 
-    tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
+    if (isStoreCond) {
+        /* Always follow the slow-path for an exclusive access */
+        label_ptr = s->code_ptr;
+        tcg_out_bl_noaddr(s, COND_AL);
+    } else {
+        addend = tcg_out_tlb_read(s, addrlo, addrhi, opc & MO_SIZE,
+                                  mem_index, 0);
 
-    /* The conditional call must come last, as we're going to return here.  */
-    label_ptr = s->code_ptr;
-    tcg_out_bl_noaddr(s, COND_NE);
+        tcg_out_qemu_st_index(s, COND_EQ, opc, datalo, datahi, addrlo, addend);
 
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+        /* The conditional call must come last, as we're going to return here.*/
+        label_ptr = s->code_ptr;
+        tcg_out_bl_noaddr(s, COND_NE);
+    }
+
+    add_qemu_ldst_label(s, false, oi, llsc_success, datalo, datahi, addrlo,
+                        addrhi, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
     if (GUEST_BASE) {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, GUEST_BASE);
@@ -1864,16 +1921,22 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld_i32:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, 0, 0);
+        break;
+    case INDEX_op_qemu_ldlink_i32:
+        tcg_out_qemu_ld(s, args, 0, 1); /* LoadLink */
         break;
     case INDEX_op_qemu_ld_i64:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, 1, 0);
         break;
     case INDEX_op_qemu_st_i32:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, 0, 0);
+        break;
+    case INDEX_op_qemu_stcond_i32:
+        tcg_out_qemu_st(s, args, 0, 1); /* StoreConditional */
         break;
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, 1, 0);
         break;
 
     case INDEX_op_bswap16_i32:
@@ -1957,8 +2020,10 @@ static const TCGTargetOpDef arm_op_defs[] = {
 
 #if TARGET_LONG_BITS == 32
     { INDEX_op_qemu_ld_i32, { "r", "l" } },
+    { INDEX_op_qemu_ldlink_i32, { "r", "l" } },
     { INDEX_op_qemu_ld_i64, { "r", "r", "l" } },
     { INDEX_op_qemu_st_i32, { "s", "s" } },
+    { INDEX_op_qemu_stcond_i32, { "r", "s", "s" } },
     { INDEX_op_qemu_st_i64, { "s", "s", "s" } },
 #else
     { INDEX_op_qemu_ld_i32, { "r", "l", "l" } },
-- 
2.4.3


* [Qemu-devel] [RFC v2 7/7] target-i386: translate: implement qemu_ldlink and qemu_stcond ops
  2015-06-15 11:51 [Qemu-devel] [RFC v2 0/7] Slow-path for atomic instruction translation Alvise Rigo
                   ` (5 preceding siblings ...)
  2015-06-15 11:51 ` [Qemu-devel] [RFC v2 6/7] target-arm: translate: implement qemu_ldlink and qemu_stcond ops Alvise Rigo
@ 2015-06-15 11:51 ` Alvise Rigo
  6 siblings, 0 replies; 8+ messages in thread
From: Alvise Rigo @ 2015-06-15 11:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: mttcg, claudio.fontana, cota, jani.kokkonen, tech, alex.bennee, rth

Implement the qemu_ldlink and qemu_stcond TCG ops in the i386 backend,
routing exclusive accesses through the new LoadLink/StoreConditional
slow-path helpers.  For the time being, only 32-bit configurations are
supported.
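
As a quick illustration of the LoadLink/StoreConditional contract the
generated code relies on, here is a minimal, self-contained C sketch.
Everything in it (ll_load32, sc_store32, the single software "monitor")
is invented for the example and is not QEMU code; only the return
convention of the store-conditional (0 on success, as with ARM's strex
result) follows the series.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy, single-threaded model of an LL/SC reservation: one address can
     * be held exclusive at a time. */
    static volatile uint32_t *monitor_addr; /* address held exclusive */
    static int monitor_valid;               /* set by LL, cleared by stores */

    /* LoadLink: read the value and take the reservation on addr. */
    static uint32_t ll_load32(volatile uint32_t *addr)
    {
        monitor_addr = addr;
        monitor_valid = 1;
        return *addr;
    }

    /* Plain store: breaks any reservation on the same address. */
    static void plain_store32(volatile uint32_t *addr, uint32_t val)
    {
        if (monitor_valid && monitor_addr == addr) {
            monitor_valid = 0;
        }
        *addr = val;
    }

    /* StoreConditional: returns 0 on success, 1 on failure. */
    static int sc_store32(volatile uint32_t *addr, uint32_t val)
    {
        if (!monitor_valid || monitor_addr != addr) {
            return 1;
        }
        monitor_valid = 0;
        *addr = val;
        return 0;
    }

    /* A guest ldrex/strex read-modify-write becomes this retry loop. */
    static uint32_t atomic_add(volatile uint32_t *addr, uint32_t inc)
    {
        uint32_t old;
        do {
            old = ll_load32(addr);
        } while (sc_store32(addr, old + inc) != 0);
        return old;
    }

    int main(void)
    {
        uint32_t counter = 41;

        /* A conflicting store between LL and SC makes the SC fail. */
        uint32_t old = ll_load32(&counter);
        plain_store32(&counter, 100);                 /* "other vCPU" write */
        printf("sc: %d\n", sc_store32(&counter, old + 1));  /* prints 1 */

        atomic_add(&counter, 1);                      /* retry loop succeeds */
        printf("counter: %u\n", (unsigned)counter);   /* prints 101 */
        return 0;
    }

The sketch only captures the success/failure semantics; in the patches
the reservation is per-page state checked in the softmmu slow path, not
a single monitor variable.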

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 tcg/i386/tcg-target.c | 136 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 114 insertions(+), 22 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ff4d9cf..1ffc88a 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1137,6 +1137,17 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEQ]  = helper_be_ldq_mmu,
 };
 
+/* LoadLink helpers, only unsigned. Use the macro below to access them. */
+static void * const qemu_ldex_helpers[16] = {
+    [MO_LEUL] = helper_le_ldlinkul_mmu,
+};
+
+#define LDEX_HELPER(mem_op)                                             \
+({                                                                      \
+    assert(mem_op & MO_EXCL);                                           \
+    qemu_ldex_helpers[((int)mem_op - MO_EXCL)];                         \
+})
+
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
@@ -1150,6 +1161,17 @@ static void * const qemu_st_helpers[16] = {
     [MO_BEQ]  = helper_be_stq_mmu,
 };
 
+/* StoreConditional helpers. Use the macro below to access them. */
+static void * const qemu_stex_helpers[16] = {
+    [MO_LEUL] = helper_le_stcondl_mmu,
+};
+
+#define STEX_HELPER(mem_op)                                             \
+({                                                                      \
+    assert(mem_op & MO_EXCL);                                           \
+    qemu_stex_helpers[(int)mem_op - MO_EXCL];                           \
+})
+
 /* Perform the TLB load and compare.
 
    Inputs:
@@ -1245,6 +1267,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
  * for a load or store, so that we can later generate the correct helper code
  */
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
+                                TCGReg llsc_success,
                                 TCGReg datalo, TCGReg datahi,
                                 TCGReg addrlo, TCGReg addrhi,
                                 tcg_insn_unit *raddr,
@@ -1253,6 +1276,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
     TCGLabelQemuLdst *label = new_ldst_label(s);
 
     label->is_ld = is_ld;
+    label->llsc_success = llsc_success;
     label->oi = oi;
     label->datalo_reg = datalo;
     label->datahi_reg = datahi;
@@ -1307,7 +1331,11 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
                      (uintptr_t)l->raddr);
     }
 
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    if (opc & MO_EXCL) {
+        tcg_out_call(s, LDEX_HELPER(opc));
+    } else {
+        tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    }
 
     data_reg = l->datalo_reg;
     switch (opc & MO_SSIZE) {
@@ -1411,9 +1439,16 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         }
     }
 
-    /* "Tail call" to the helper, with the return address back inline.  */
-    tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    if (opc & MO_EXCL) {
+        tcg_out_call(s, STEX_HELPER(opc));
+        /* Save the output of the StoreConditional */
+        tcg_out_mov(s, TCG_TYPE_I32, l->llsc_success, TCG_REG_EAX);
+        tcg_out_jmp(s, l->raddr);
+    } else {
+        /* "Tail call" to the helper, with the return address back inline.  */
+        tcg_out_push(s, retaddr);
+        tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    }
 }
 #elif defined(__x86_64__) && defined(__linux__)
 # include <asm/prctl.h>
@@ -1526,7 +1561,8 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 /* XXX: qemu_ld and qemu_st could be modified to clobber only EDX and
    EAX. It will be useful once fixed registers globals are less
    common. */
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64,
+                            bool isLoadLink)
 {
     TCGReg datalo, datahi, addrlo;
     TCGReg addrhi __attribute__((unused));
@@ -1549,14 +1585,34 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     mem_index = get_mmuidx(oi);
     s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
-                     label_ptr, offsetof(CPUTLBEntry, addr_read));
+    if (isLoadLink) {
+        TCGType t = ((TCG_TARGET_REG_BITS == 64) && (TARGET_LONG_BITS == 64)) ?
+                                                   TCG_TYPE_I64 : TCG_TYPE_I32;
+        /* The JMP address will be patched afterwards,
+         * in tcg_out_qemu_ld_slow_path (two times when
+         * TARGET_LONG_BITS > TCG_TARGET_REG_BITS). */
+        tcg_out_mov(s, t, TCG_REG_L1, addrlo);
+
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            /* Store the second part of the address. */
+            tcg_out_mov(s, t, TCG_REG_L0, addrhi);
+            /* We add 4 to include the jmp that follows. */
+            label_ptr[1] = s->code_ptr + 4;
+        }
 
-    /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+        tcg_out_opc(s, OPC_JMP_long, 0, 0, 0);
+        label_ptr[0] = s->code_ptr;
+        s->code_ptr += 4;
+    } else {
+        tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
+                         label_ptr, offsetof(CPUTLBEntry, addr_read));
+
+        /* TLB Hit.  */
+        tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+    }
 
     /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+    add_qemu_ldst_label(s, true, oi, 0, datalo, datahi, addrlo, addrhi,
                         s->code_ptr, label_ptr);
 #else
     {
@@ -1659,9 +1715,10 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64,
+                            bool isStoreCond)
 {
-    TCGReg datalo, datahi, addrlo;
+    TCGReg datalo, datahi, addrlo, llsc_success;
     TCGReg addrhi __attribute__((unused));
     TCGMemOpIdx oi;
     TCGMemOp opc;
@@ -1671,6 +1728,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_insn_unit *label_ptr[2];
 #endif
 
+    /* The stcond variant has one more param */
+    llsc_success = (isStoreCond ? *args++ : 0);
+
     datalo = *args++;
     datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
     addrlo = *args++;
@@ -1682,15 +1742,35 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     mem_index = get_mmuidx(oi);
     s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
-                     label_ptr, offsetof(CPUTLBEntry, addr_write));
+    if (isStoreCond) {
+        TCGType t = ((TCG_TARGET_REG_BITS == 64) && (TARGET_LONG_BITS == 64)) ?
+                                                   TCG_TYPE_I64 : TCG_TYPE_I32;
+        /* The JMP address will be filled afterwards,
+         * in tcg_out_qemu_st_slow_path (two times when
+         * TARGET_LONG_BITS > TCG_TARGET_REG_BITS). */
+        tcg_out_mov(s, t, TCG_REG_L1, addrlo);
+
+        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+            /* Store the second part of the address. */
+            tcg_out_mov(s, t, TCG_REG_L0, addrhi);
+            /* We add 4 to include the jmp that follows. */
+            label_ptr[1] = s->code_ptr + 4;
+        }
 
-    /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+        tcg_out_opc(s, OPC_JMP_long, 0, 0, 0);
+        label_ptr[0] = s->code_ptr;
+        s->code_ptr += 4;
+    } else {
+        tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
+                         label_ptr, offsetof(CPUTLBEntry, addr_write));
+
+        /* TLB Hit.  */
+        tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
+    }
 
     /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
-                        s->code_ptr, label_ptr);
+    add_qemu_ldst_label(s, false, oi, llsc_success, datalo, datahi, addrlo,
+                        addrhi, s->code_ptr, label_ptr);
 #else
     {
         int32_t offset = GUEST_BASE;
@@ -1951,16 +2031,22 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld_i32:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, 0, 0);
+        break;
+    case INDEX_op_qemu_ldlink_i32:
+        tcg_out_qemu_ld(s, args, 0, 1);
         break;
     case INDEX_op_qemu_ld_i64:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, 1, 0);
         break;
     case INDEX_op_qemu_st_i32:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, 0, 0);
+        break;
+    case INDEX_op_qemu_stcond_i32:
+        tcg_out_qemu_st(s, args, 0, 1);
         break;
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, 1, 0);
         break;
 
     OP_32_64(mulu2):
@@ -2182,17 +2268,23 @@ static const TCGTargetOpDef x86_op_defs[] = {
 
 #if TCG_TARGET_REG_BITS == 64
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
+    { INDEX_op_qemu_ldlink_i32, { "r", "L" } },
     { INDEX_op_qemu_st_i32, { "L", "L" } },
+    { INDEX_op_qemu_stcond_i32, { "r", "L", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "L" } },
     { INDEX_op_qemu_st_i64, { "L", "L" } },
 #elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
+    { INDEX_op_qemu_ldlink_i32, { "r", "L" } },
     { INDEX_op_qemu_st_i32, { "L", "L" } },
+    { INDEX_op_qemu_stcond_i32, { "r", "L", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "r", "L" } },
     { INDEX_op_qemu_st_i64, { "L", "L", "L" } },
 #else
     { INDEX_op_qemu_ld_i32, { "r", "L", "L" } },
+    { INDEX_op_qemu_ldlink_i32, { "r", "L", "L" } },
     { INDEX_op_qemu_st_i32, { "L", "L", "L" } },
+    { INDEX_op_qemu_stcond_i32, { "r", "L", "L", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "r", "L", "L" } },
     { INDEX_op_qemu_st_i64, { "L", "L", "L", "L" } },
 #endif
-- 
2.4.3


