qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation
@ 2016-01-29  9:32 Alvise Rigo
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
                   ` (16 more replies)
  0 siblings, 17 replies; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

This is the seventh iteration of the patch series which applies to the
upstream branch of QEMU (v2.5.0-rc4).

Changes versus previous versions are at the bottom of this cover letter.

The code is also available at following repository:
https://git.virtualopensystems.com/dev/qemu-mt.git
branch:
slowpath-for-atomic-v7-no-mttcg

This patch series provides an infrastructure for atomic instruction
implementation in QEMU, thus offering a 'legacy' solution for
translating guest atomic instructions. Moreover, it can be considered as
a first step toward a multi-thread TCG.

The underlying idea is to provide new TCG helpers (sort of softmmu
helpers) that guarantee atomicity to some memory accesses or in general
a way to define memory transactions.

More specifically, the new softmmu helpers behave as LoadLink and
StoreConditional instructions, and are called from TCG code by means of
target specific helpers. This work includes the implementation for all
the ARM atomic instructions, see target-arm/op_helper.c.

The implementation heavily uses the software TLB together with a new
bitmap that has been added to the ram_list structure which flags, on a
per-CPU basis, all the memory pages that are in the middle of a LoadLink
(LL), StoreConditional (SC) operation.  Since all these pages can be
accessed directly through the fast-path and alter a vCPU's linked value,
the new bitmap has been coupled with a new TLB flag for the TLB virtual
address which forces the slow-path execution for all the accesses to a
page containing a linked address.

The new slow-path is implemented such that:
- the LL behaves as a normal load slow-path, except for clearing the
  dirty flag in the bitmap.  The cputlb.c code while generating a TLB
  entry, checks if there is at least one vCPU that has the bit cleared
  in the exclusive bitmap, it that case the TLB entry will have the EXCL
  flag set, thus forcing the slow-path.  In order to ensure that all the
  vCPUs will follow the slow-path for that page, we flush the TLB cache
  of all the other vCPUs.

  The LL will also set the linked address and size of the access in a
  vCPU's private variable. After the corresponding SC, this address will
  be set to a reset value.

- the SC can fail returning 1, or succeed, returning 0.  It has to come
  always after a LL and has to access the same address 'linked' by the
  previous LL, otherwise it will fail. If in the time window delimited
  by a legit pair of LL/SC operations another write access happens to
  the linked address, the SC will fail.

In theory, the provided implementation of TCG LoadLink/StoreConditional
can be used to properly handle atomic instructions on any architecture.

The code has been tested with bare-metal test cases and by booting Linux.

* Performance considerations
The new slow-path adds some overhead to the translation of the ARM
atomic instructions, since their emulation doesn't happen anymore only
in the guest (by means of pure TCG generated code), but requires the
execution of two helpers functions. Despite this, the additional time
required to boot an ARM Linux kernel on an i7 clocked at 2.5GHz is
negligible.
Instead, on a LL/SC bound test scenario - like:
https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git - this
solution requires 30% (1 million iterations) and 70% (10 millions
iterations) of additional time for the test to complete.

Changes from v6:
- Included aligned variants of the exclusive helpers
- Reverted to single bit per page design in DIRTY_MEMORY_EXCLUSIVE
  bitmap. The new way we restore the pages as non-exclusive (PATCH 13)
  made the per-VCPU design unnecessary.
- arm32 now uses aligned exclusive accesses
- aarch64 exclusive instructions implemented [PATCH 15-16]
- Addressed comments from Alex

Changes from v5:
- The exclusive memory region is now set through a CPUClass hook,
  allowing any architecture to decide the memory area that will be
  protected during a LL/SC operation [PATCH 3]
- The runtime helpers dropped any target dependency and are now in a
  common file [PATCH 5]
- Improved the way we restore a guest page as non-exclusive [PATCH 9]
- Included MMIO memory as possible target of LL/SC
  instructions. This also required to somehow simplify the
  helper_*_st_name helpers in softmmu_template.h [PATCH 8-14]

Changes from v4:
- Reworked the exclusive bitmap to be of fixed size (8 bits per address)
- The slow-path is now TCG backend independent, no need to touch
  tcg/* anymore as suggested by Aurelien Jarno.

Changes from v3:
- based on upstream QEMU
- addressed comments from Alex Bennée
- the slow path can be enabled by the user with:
  ./configure --enable-tcg-ldst-excl only if the backend supports it
- all the ARM ldex/stex instructions make now use of the slow path
- added aarch64 TCG backend support
- part of the code has been rewritten

Changes from v2:
- the bitmap accessors are now atomic
- a rendezvous between vCPUs and a simple callback support before executing
  a TB have been added to handle the TLB flush support
- the softmmu_template and softmmu_llsc_template have been adapted to work
  on real multi-threading

Changes from v1:
- The ram bitmap is not reversed anymore, 1 = dirty, 0 = exclusive
- The way how the offset to access the bitmap is calculated has
  been improved and fixed
- A page to be set as dirty requires a vCPU to target the protected address
  and not just an address in the page
- Addressed comments from Richard Henderson to improve the logic in
  softmmu_template.h and to simplify the methods generation through
  softmmu_llsc_template.h
- Added initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386

This work has been sponsored by Huawei Technologies Duesseldorf GmbH.



Alvise Rigo (16):
  exec.c: Add new exclusive bitmap to ram_list
  softmmu: Simplify helper_*_st_name, wrap unaligned code
  softmmu: Simplify helper_*_st_name, wrap MMIO code
  softmmu: Simplify helper_*_st_name, wrap RAM code
  softmmu: Add new TLB_EXCL flag
  qom: cpu: Add CPUClass hooks for exclusive range
  softmmu: Add helpers for a new slowpath
  softmmu: Honor the new exclusive bitmap
  softmmu: Include MMIO/invalid exclusive accesses
  softmmu: Protect MMIO exclusive range
  tcg: Create new runtime helpers for excl accesses
  configure: Use slow-path for atomic only when the softmmu is enabled
  softmmu: Add history of excl accesses
  target-arm: translate: Use ld/st excl for atomic insns
  target-arm: cpu64: use custom set_excl hook
  target-arm: aarch64: add atomic instructions

 Makefile.target             |   2 +-
 configure                   |  16 +++
 cputlb.c                    |  63 +++++++-
 exec.c                      |  26 +++-
 include/exec/cpu-all.h      |   8 ++
 include/exec/helper-gen.h   |   3 +
 include/exec/helper-proto.h |   1 +
 include/exec/helper-tcg.h   |   3 +
 include/exec/memory.h       |   4 +-
 include/exec/ram_addr.h     |  31 ++++
 include/qom/cpu.h           |  28 ++++
 qom/cpu.c                   |  20 +++
 softmmu_llsc_template.h     | 137 ++++++++++++++++++
 softmmu_template.h          | 342 ++++++++++++++++++++++++++++++++++----------
 target-arm/cpu.h            |   2 +
 target-arm/cpu64.c          |   8 ++
 target-arm/helper-a64.c     |  55 +++++++
 target-arm/helper-a64.h     |   4 +
 target-arm/helper.h         |   4 +
 target-arm/machine.c        |   2 +
 target-arm/op_helper.c      |  18 +++
 target-arm/translate-a64.c  | 134 ++++++++++++++++-
 target-arm/translate.c      | 188 +++++++++++++++++++++++-
 tcg-llsc-helper.c           | 104 ++++++++++++++
 tcg-llsc-helper.h           |  61 ++++++++
 tcg/tcg-llsc-gen-helper.h   |  67 +++++++++
 tcg/tcg.h                   |  31 ++++
 vl.c                        |   3 +
 28 files changed, 1273 insertions(+), 92 deletions(-)
 create mode 100644 softmmu_llsc_template.h
 create mode 100644 tcg-llsc-helper.c
 create mode 100644 tcg-llsc-helper.h
 create mode 100644 tcg/tcg-llsc-gen-helper.h

-- 
2.7.0

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:00   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code Alvise Rigo
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

The purpose of this new bitmap is to flag the memory pages that are in
the middle of LL/SC operations (after a LL, before a SC). For all these
pages, the corresponding TLB entries will be generated in such a way to
force the slow-path for all the VCPUs (see the following patches).

When the system starts, the whole memory is set to dirty.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 exec.c                  |  7 +++++--
 include/exec/memory.h   |  3 ++-
 include/exec/ram_addr.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index 7115403..51f366d 100644
--- a/exec.c
+++ b/exec.c
@@ -1575,11 +1575,14 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
         int i;
 
         /* ram_list.dirty_memory[] is protected by the iothread lock.  */
-        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
+        for (i = 0; i < DIRTY_MEMORY_EXCLUSIVE; i++) {
             ram_list.dirty_memory[i] =
                 bitmap_zero_extend(ram_list.dirty_memory[i],
                                    old_ram_size, new_ram_size);
-       }
+        }
+        ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] =
+            bitmap_zero_extend(ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
+                               old_ram_size, new_ram_size);
     }
     cpu_physical_memory_set_dirty_range(new_block->offset,
                                         new_block->used_length,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..71e0480 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -19,7 +19,8 @@
 #define DIRTY_MEMORY_VGA       0
 #define DIRTY_MEMORY_CODE      1
 #define DIRTY_MEMORY_MIGRATION 2
-#define DIRTY_MEMORY_NUM       3        /* num of dirty bits */
+#define DIRTY_MEMORY_EXCLUSIVE 3
+#define DIRTY_MEMORY_NUM       4        /* num of dirty bits */
 
 #include <stdint.h>
 #include <stdbool.h>
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index ef1489d..19789fc 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -21,6 +21,7 @@
 
 #ifndef CONFIG_USER_ONLY
 #include "hw/xen/xen.h"
+#include "sysemu/sysemu.h"
 
 struct RAMBlock {
     struct rcu_head rcu;
@@ -172,6 +173,9 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
     if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
         bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
     }
+    if (unlikely(mask & (1 << DIRTY_MEMORY_EXCLUSIVE))) {
+        bitmap_set_atomic(d[DIRTY_MEMORY_EXCLUSIVE], page, end - page);
+    }
     xen_modified_memory(start, length);
 }
 
@@ -287,5 +291,32 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
 }
 
 void migration_bitmap_extend(ram_addr_t old, ram_addr_t new);
+
+/* Exclusive bitmap support. */
+#define EXCL_BITMAP_GET_OFFSET(addr) (addr >> TARGET_PAGE_BITS)
+
+/* Make the page of @addr not exclusive. */
+static inline void cpu_physical_memory_unset_excl(ram_addr_t addr)
+{
+    set_bit_atomic(EXCL_BITMAP_GET_OFFSET(addr),
+                   ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
+}
+
+/* Return true if the page of @addr is exclusive, i.e. the EXCL bit is set. */
+static inline int cpu_physical_memory_is_excl(ram_addr_t addr)
+{
+    return !test_bit(EXCL_BITMAP_GET_OFFSET(addr),
+                     ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
+}
+
+/* Set the page of @addr as exclusive clearing its EXCL bit and return the
+ * previous bit's state. */
+static inline int cpu_physical_memory_set_excl(ram_addr_t addr)
+{
+    return bitmap_test_and_clear_atomic(
+                                ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
+                                EXCL_BITMAP_GET_OFFSET(addr), 1);
+}
+
 #endif
 #endif
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel]  [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:07   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code Alvise Rigo
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Attempting to simplify the helper_*_st_name, wrap the
do_unaligned_access code into an inline function.
Remove also the goto statement.

Based on this work, Alex proposed the following patch series
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
that reduces code duplication of the softmmu_helpers.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 softmmu_template.h | 96 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 60 insertions(+), 36 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 208f808..7029a03 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -370,6 +370,32 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
                                  iotlbentry->attrs);
 }
 
+static inline void glue(helper_le_st_name, _do_unl_access)(CPUArchState *env,
+                                                           DATA_TYPE val,
+                                                           target_ulong addr,
+                                                           TCGMemOpIdx oi,
+                                                           unsigned mmu_idx,
+                                                           uintptr_t retaddr)
+{
+    int i;
+
+    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+    /* XXX: not efficient, but simple */
+    /* Note: relies on the fact that tlb_fill() does not remove the
+     * previous page from the TLB cache.  */
+    for (i = DATA_SIZE - 1; i >= 0; i--) {
+        /* Little-endian extract.  */
+        uint8_t val8 = val >> (i * 8);
+        /* Note the adjustment at the beginning of the function.
+           Undo that for the recursion.  */
+        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                        oi, retaddr + GETPC_ADJ);
+    }
+}
+
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
@@ -399,7 +425,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
         CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
+            glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
+                                                    oi, retaddr);
         }
         iotlbentry = &env->iotlb[mmu_idx][index];
 
@@ -414,23 +441,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     if (DATA_SIZE > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
                      >= TARGET_PAGE_SIZE)) {
-        int i;
-    do_unaligned_access:
-        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                                 mmu_idx, retaddr);
-        }
-        /* XXX: not efficient, but simple */
-        /* Note: relies on the fact that tlb_fill() does not remove the
-         * previous page from the TLB cache.  */
-        for (i = DATA_SIZE - 1; i >= 0; i--) {
-            /* Little-endian extract.  */
-            uint8_t val8 = val >> (i * 8);
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
-        }
+        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
+                                                oi, retaddr);
         return;
     }
 
@@ -450,6 +462,32 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 }
 
 #if DATA_SIZE > 1
+static inline void glue(helper_be_st_name, _do_unl_access)(CPUArchState *env,
+                                                           DATA_TYPE val,
+                                                           target_ulong addr,
+                                                           TCGMemOpIdx oi,
+                                                           unsigned mmu_idx,
+                                                           uintptr_t retaddr)
+{
+    int i;
+
+    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+    /* XXX: not efficient, but simple */
+    /* Note: relies on the fact that tlb_fill() does not remove the
+     * previous page from the TLB cache.  */
+    for (i = DATA_SIZE - 1; i >= 0; i--) {
+        /* Big-endian extract.  */
+        uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
+        /* Note the adjustment at the beginning of the function.
+           Undo that for the recursion.  */
+        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                        oi, retaddr + GETPC_ADJ);
+    }
+}
+
 void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
@@ -479,7 +517,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
         CPUIOTLBEntry *iotlbentry;
         if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
+            glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
+                                                    oi, retaddr);
         }
         iotlbentry = &env->iotlb[mmu_idx][index];
 
@@ -494,23 +533,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     if (DATA_SIZE > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
                      >= TARGET_PAGE_SIZE)) {
-        int i;
-    do_unaligned_access:
-        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                                 mmu_idx, retaddr);
-        }
-        /* XXX: not efficient, but simple */
-        /* Note: relies on the fact that tlb_fill() does not remove the
-         * previous page from the TLB cache.  */
-        for (i = DATA_SIZE - 1; i >= 0; i--) {
-            /* Big-endian extract.  */
-            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-            /* Note the adjustment at the beginning of the function.
-               Undo that for the recursion.  */
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr + GETPC_ADJ);
-        }
+            glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
+                                                    retaddr);
         return;
     }
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel]  [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:15   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code Alvise Rigo
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Attempting to simplify the helper_*_st_name, wrap the MMIO code into an
inline function.

Based on this work, Alex proposed the following patch series
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
that reduces code duplication of the softmmu_helpers.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 softmmu_template.h | 66 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 7029a03..3d388ec 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -396,6 +396,26 @@ static inline void glue(helper_le_st_name, _do_unl_access)(CPUArchState *env,
     }
 }
 
+static inline void glue(helper_le_st_name, _do_mmio_access)(CPUArchState *env,
+                                                            DATA_TYPE val,
+                                                            target_ulong addr,
+                                                            TCGMemOpIdx oi,
+                                                            unsigned mmu_idx,
+                                                            int index,
+                                                            uintptr_t retaddr)
+{
+    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
+                                                oi, retaddr);
+    }
+    /* ??? Note that the io helpers always read data in the target
+       byte ordering.  We should push the LE/BE request down into io.  */
+    val = TGT_LE(val);
+    glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+}
+
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
@@ -423,17 +443,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
-                                                    oi, retaddr);
-        }
-        iotlbentry = &env->iotlb[mmu_idx][index];
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_LE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+        glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                 mmu_idx, index, retaddr);
         return;
     }
 
@@ -488,6 +499,26 @@ static inline void glue(helper_be_st_name, _do_unl_access)(CPUArchState *env,
     }
 }
 
+static inline void glue(helper_be_st_name, _do_mmio_access)(CPUArchState *env,
+                                                            DATA_TYPE val,
+                                                            target_ulong addr,
+                                                            TCGMemOpIdx oi,
+                                                            unsigned mmu_idx,
+                                                            int index,
+                                                            uintptr_t retaddr)
+{
+    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
+                                                oi, retaddr);
+    }
+    /* ??? Note that the io helpers always read data in the target
+       byte ordering.  We should push the LE/BE request down into io.  */
+    val = TGT_BE(val);
+    glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+}
+
 void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
@@ -515,17 +546,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
-                                                    oi, retaddr);
-        }
-        iotlbentry = &env->iotlb[mmu_idx][index];
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_BE(val);
-        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+        glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                 mmu_idx, index, retaddr);
         return;
     }
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel]  [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (2 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:18   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 05/16] softmmu: Add new TLB_EXCL flag Alvise Rigo
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Attempting to simplify the helper_*_st_name, wrap the code relative to a
RAM access into an inline function.

Based on this work, Alex proposed the following patch series
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
that reduces code duplication of the softmmu_helpers.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 softmmu_template.h | 110 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 68 insertions(+), 42 deletions(-)

diff --git a/softmmu_template.h b/softmmu_template.h
index 3d388ec..6279437 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -416,13 +416,46 @@ static inline void glue(helper_le_st_name, _do_mmio_access)(CPUArchState *env,
     glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
 }
 
+static inline void glue(helper_le_st_name, _do_ram_access)(CPUArchState *env,
+                                                           DATA_TYPE val,
+                                                           target_ulong addr,
+                                                           TCGMemOpIdx oi,
+                                                           unsigned mmu_idx,
+                                                           int index,
+                                                           uintptr_t retaddr)
+{
+    uintptr_t haddr;
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                     >= TARGET_PAGE_SIZE)) {
+        glue(helper_le_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
+                                                retaddr);
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+    if ((addr & (DATA_SIZE - 1)) != 0
+        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+#if DATA_SIZE == 1
+    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
+#else
+    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
+#endif
+}
+
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -448,28 +481,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         return;
     }
 
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
-                                                oi, retaddr);
-        return;
-    }
-
-    /* Handle aligned access or unaligned access in the same page.  */
-    if ((addr & (DATA_SIZE - 1)) != 0
-        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                             mmu_idx, retaddr);
-    }
-
-    haddr = addr + env->tlb_table[mmu_idx][index].addend;
-#if DATA_SIZE == 1
-    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
-#else
-    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
-#endif
+    glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
+                                            retaddr);
 }
 
 #if DATA_SIZE > 1
@@ -519,13 +532,42 @@ static inline void glue(helper_be_st_name, _do_mmio_access)(CPUArchState *env,
     glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
 }
 
+static inline void glue(helper_be_st_name, _do_ram_access)(CPUArchState *env,
+                                                           DATA_TYPE val,
+                                                           target_ulong addr,
+                                                           TCGMemOpIdx oi,
+                                                           unsigned mmu_idx,
+                                                           int index,
+                                                           uintptr_t retaddr)
+{
+    uintptr_t haddr;
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                     >= TARGET_PAGE_SIZE)) {
+        glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
+                                                retaddr);
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+    if ((addr & (DATA_SIZE - 1)) != 0
+        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
+}
+
 void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    uintptr_t haddr;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -551,24 +593,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         return;
     }
 
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-            glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
-                                                    retaddr);
-        return;
-    }
-
-    /* Handle aligned access or unaligned access in the same page.  */
-    if ((addr & (DATA_SIZE - 1)) != 0
-        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                             mmu_idx, retaddr);
-    }
-
-    haddr = addr + env->tlb_table[mmu_idx][index].addend;
-    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
+    glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
+                                            retaddr);
 }
 #endif /* DATA_SIZE > 1 */
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel]  [RFC v7 05/16] softmmu: Add new TLB_EXCL flag
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (3 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:18   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range Alvise Rigo
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Add a new TLB flag to force all the accesses made to a page to follow
the slow-path.

The TLB entries referring guest pages with the DIRTY_MEMORY_EXCLUSIVE
bit clean will have this flag set.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 include/exec/cpu-all.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 83b1781..f8d8feb 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -277,6 +277,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_NOTDIRTY    (1 << 4)
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO        (1 << 5)
+/* Set if TLB entry references a page that requires exclusive access.  */
+#define TLB_EXCL        (1 << 6)
+
+/* Do not allow a TARGET_PAGE_MASK which covers one or more bits defined
+ * above. */
+#if TLB_EXCL >= TARGET_PAGE_SIZE
+#error TARGET_PAGE_MASK covering the low bits of the TLB virtual address
+#endif
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (4 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 05/16] softmmu: Add new TLB_EXCL flag Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 13:22   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath Alvise Rigo
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

The excl_protected_range is a hwaddr range set by the VCPU at the
execution of a LoadLink instruction. If a normal access writes to this
range, the corresponding StoreCond will fail.

Each architecture can set the exclusive range when issuing the LoadLink
operation through a CPUClass hook. This comes in handy to emulate, for
instance, the exclusive monitor implemented in some ARM architectures
(more precisely, the Exclusive Reservation Granule).

In addition, add another CPUClass hook called to decide whether a
StoreCond has to fail or not.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 include/qom/cpu.h | 15 +++++++++++++++
 qom/cpu.c         | 20 ++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 2e5229d..682c81d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -29,6 +29,7 @@
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/typedefs.h"
+#include "qemu/range.h"
 
 typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
                                      void *opaque);
@@ -183,6 +184,12 @@ typedef struct CPUClass {
     void (*cpu_exec_exit)(CPUState *cpu);
     bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
 
+    /* Atomic instruction handling */
+    void (*cpu_set_excl_protected_range)(CPUState *cpu, hwaddr addr,
+                                         hwaddr size);
+    int (*cpu_valid_excl_access)(CPUState *cpu, hwaddr addr,
+                                 hwaddr size);
+
     void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
 } CPUClass;
 
@@ -219,6 +226,9 @@ struct kvm_run;
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
+/* Atomic insn translation TLB support. */
+#define EXCLUSIVE_RESET_ADDR ULLONG_MAX
+
 /**
  * CPUState:
  * @cpu_index: CPU index (informative).
@@ -341,6 +351,11 @@ struct CPUState {
      */
     bool throttle_thread_scheduled;
 
+    /* vCPU's exclusive addresses range.
+     * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
+     * in the middle of a LL/SC. */
+    struct Range excl_protected_range;
+
     /* Note that this is accessed at the start of every TB via a negative
        offset from AREG0.  Leave this field at the end so as to make the
        (absolute value) offset as small as possible.  This reduces code
diff --git a/qom/cpu.c b/qom/cpu.c
index 8f537a4..a5d360c 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -203,6 +203,24 @@ static bool cpu_common_exec_interrupt(CPUState *cpu, int int_req)
     return false;
 }
 
+static void cpu_common_set_excl_range(CPUState *cpu, hwaddr addr, hwaddr size)
+{
+    cpu->excl_protected_range.begin = addr;
+    cpu->excl_protected_range.end = addr + size;
+}
+
+static int cpu_common_valid_excl_access(CPUState *cpu, hwaddr addr, hwaddr size)
+{
+    /* Check if the excl range completely covers the access */
+    if (cpu->excl_protected_range.begin <= addr &&
+        cpu->excl_protected_range.end >= addr + size) {
+
+        return 1;
+    }
+
+    return 0;
+}
+
 void cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
                     int flags)
 {
@@ -355,6 +373,8 @@ static void cpu_class_init(ObjectClass *klass, void *data)
     k->cpu_exec_enter = cpu_common_noop;
     k->cpu_exec_exit = cpu_common_noop;
     k->cpu_exec_interrupt = cpu_common_exec_interrupt;
+    k->cpu_set_excl_protected_range = cpu_common_set_excl_range;
+    k->cpu_valid_excl_access = cpu_common_valid_excl_access;
     dc->realize = cpu_common_realizefn;
     /*
      * Reason: CPUs still need special care by board code: wiring up
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (5 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-11 16:33   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap Alvise Rigo
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

The new helpers rely on the legacy ones to perform the actual read/write.

The LoadLink helper (helper_ldlink_name) prepares the way for the
following StoreCond operation. It sets the linked address and the size
of the access. The LoadLink helper also updates the TLB entry of the
page involved in the LL/SC to all vCPUs by forcing a TLB flush, so that
the following accesses made by all the vCPUs will follow the slow path.

The StoreConditional helper (helper_stcond_name) returns 1 if the
store has to fail due to a concurrent access to the same page by
another vCPU. A 'concurrent access' can be a store made by *any* vCPU
(although, some implementations allow stores made by the CPU that issued
the LoadLink).

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c                |   3 ++
 include/qom/cpu.h       |   5 ++
 softmmu_llsc_template.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++
 softmmu_template.h      |  12 +++++
 tcg/tcg.h               |  31 +++++++++++
 5 files changed, 184 insertions(+)
 create mode 100644 softmmu_llsc_template.h

diff --git a/cputlb.c b/cputlb.c
index f6fb161..ce6d720 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -476,6 +476,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 
 #define MMUSUFFIX _mmu
 
+/* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
+#define GEN_EXCLUSIVE_HELPERS
 #define SHIFT 0
 #include "softmmu_template.h"
 
@@ -488,6 +490,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 #define SHIFT 3
 #include "softmmu_template.h"
 #undef MMUSUFFIX
+#undef GEN_EXCLUSIVE_HELPERS
 
 #define MMUSUFFIX _cmmu
 #undef GETPC_ADJ
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 682c81d..6f6c1c0 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -351,10 +351,15 @@ struct CPUState {
      */
     bool throttle_thread_scheduled;
 
+    /* Used by the atomic insn translation backend. */
+    bool ll_sc_context;
     /* vCPU's exclusive addresses range.
      * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
      * in the middle of a LL/SC. */
     struct Range excl_protected_range;
+    /* Used to carry the SC result but also to flag a normal store access made
+     * by a stcond (see softmmu_template.h). */
+    bool excl_succeeded;
 
     /* Note that this is accessed at the start of every TB via a negative
        offset from AREG0.  Leave this field at the end so as to make the
diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
new file mode 100644
index 0000000..101f5e8
--- /dev/null
+++ b/softmmu_llsc_template.h
@@ -0,0 +1,133 @@
+/*
+ *  Software MMU support (esclusive load/store operations)
+ *
+ * Generate helpers used by TCG for qemu_ldlink/stcond ops.
+ *
+ * Included from softmmu_template.h only.
+ *
+ * Copyright (c) 2015 Virtual Open Systems
+ *
+ * Authors:
+ *  Alvise Rigo <a.rigo@virtualopensystems.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* This template does not generate together the le and be version, but only one
+ * of the two depending on whether BIGENDIAN_EXCLUSIVE_HELPERS has been set.
+ * The same nomenclature as softmmu_template.h is used for the exclusive
+ * helpers.  */
+
+#ifdef BIGENDIAN_EXCLUSIVE_HELPERS
+
+#define helper_ldlink_name  glue(glue(helper_be_ldlink, USUFFIX), MMUSUFFIX)
+#define helper_stcond_name  glue(glue(helper_be_stcond, SUFFIX), MMUSUFFIX)
+#define helper_ld glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
+#define helper_st glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
+
+#else /* LE helpers + 8bit helpers (generated only once for both LE end BE) */
+
+#if DATA_SIZE > 1
+#define helper_ldlink_name  glue(glue(helper_le_ldlink, USUFFIX), MMUSUFFIX)
+#define helper_stcond_name  glue(glue(helper_le_stcond, SUFFIX), MMUSUFFIX)
+#define helper_ld glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
+#define helper_st glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
+#else /* DATA_SIZE <= 1 */
+#define helper_ldlink_name  glue(glue(helper_ret_ldlink, USUFFIX), MMUSUFFIX)
+#define helper_stcond_name  glue(glue(helper_ret_stcond, SUFFIX), MMUSUFFIX)
+#define helper_ld glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
+#define helper_st glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
+#endif
+
+#endif
+
+WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
+                                TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    WORD_TYPE ret;
+    int index;
+    CPUState *cpu, *this = ENV_GET_CPU(env);
+    CPUClass *cc = CPU_GET_CLASS(this);
+    hwaddr hw_addr;
+    unsigned mmu_idx = get_mmuidx(oi);
+
+    /* Use the proper load helper from cpu_ldst.h */
+    ret = helper_ld(env, addr, oi, retaddr);
+
+    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+
+    /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
+     * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
+    hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
+    if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
+        /* If all the vCPUs have the EXCL bit set for this page there is no need
+         * to request any flush. */
+        if (!cpu_physical_memory_is_excl(hw_addr)) {
+            cpu_physical_memory_set_excl(hw_addr);
+            CPU_FOREACH(cpu) {
+                if (current_cpu != cpu) {
+                    tlb_flush(cpu, 1);
+                }
+            }
+        }
+    } else {
+        hw_error("EXCL accesses to MMIO regions not supported yet.");
+    }
+
+    cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
+
+    /* For this vCPU, just update the TLB entry, no need to flush. */
+    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
+
+    /* From now on we are in LL/SC context */
+    this->ll_sc_context = true;
+
+    return ret;
+}
+
+WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
+                             DATA_TYPE val, TCGMemOpIdx oi,
+                             uintptr_t retaddr)
+{
+    WORD_TYPE ret;
+    CPUState *cpu = ENV_GET_CPU(env);
+
+    if (!cpu->ll_sc_context) {
+        ret = 1;
+    } else {
+        /* We set it preventively to true to distinguish the following legacy
+         * access as one made by the store conditional wrapper. If the store
+         * conditional does not succeed, the value will be set to 0.*/
+        cpu->excl_succeeded = true;
+        helper_st(env, addr, val, oi, retaddr);
+
+        if (cpu->excl_succeeded) {
+            ret = 0;
+        } else {
+            ret = 1;
+        }
+    }
+
+    /* Unset LL/SC context */
+    cpu->ll_sc_context = false;
+    cpu->excl_succeeded = false;
+    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+
+    return ret;
+}
+
+#undef helper_ldlink_name
+#undef helper_stcond_name
+#undef helper_ld
+#undef helper_st
diff --git a/softmmu_template.h b/softmmu_template.h
index 6279437..4332db2 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -622,6 +622,18 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
 #endif
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
+#ifdef GEN_EXCLUSIVE_HELPERS
+
+#if DATA_SIZE > 1 /* The 8-bit helpers are generate along with LE helpers */
+#define BIGENDIAN_EXCLUSIVE_HELPERS
+#include "softmmu_llsc_template.h"
+#undef BIGENDIAN_EXCLUSIVE_HELPERS
+#endif
+
+#include "softmmu_llsc_template.h"
+
+#endif /* !defined(GEN_EXCLUSIVE_HELPERS) */
+
 #undef READ_ACCESS_TYPE
 #undef SHIFT
 #undef DATA_TYPE
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a696922..3e050a4 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -968,6 +968,21 @@ tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                     TCGMemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr);
+/* Exclusive variants */
+tcg_target_ulong helper_ret_ldlinkub_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_le_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_be_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_be_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_be_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
+                                            TCGMemOpIdx oi, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
 tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
@@ -1010,6 +1025,22 @@ uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr);
 uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr);
+/* Exclusive variants */
+tcg_target_ulong helper_ret_stcondb_mmu(CPUArchState *env, target_ulong addr,
+                            uint8_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_le_stcondw_mmu(CPUArchState *env, target_ulong addr,
+                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env, target_ulong addr,
+                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_le_stcondq_mmu(CPUArchState *env, target_ulong addr,
+                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_be_stcondw_mmu(CPUArchState *env, target_ulong addr,
+                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_be_stcondl_mmu(CPUArchState *env, target_ulong addr,
+                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+uint64_t helper_be_stcondq_mmu(CPUArchState *env, target_ulong addr,
+                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
+
 
 /* Temporary aliases until backends are converted.  */
 #ifdef TARGET_WORDS_BIGENDIAN
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (6 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-16 17:39   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses Alvise Rigo
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

The pages set as exclusive (clean) in the DIRTY_MEMORY_EXCLUSIVE bitmap
have to have their TLB entries flagged with TLB_EXCL. The accesses to
pages with TLB_EXCL flag set have to be properly handled in that they
can potentially invalidate an open LL/SC transaction.

Modify the TLB entries generation to honor the new bitmap and extend
the softmmu_template to handle the accesses made to guest pages marked
as exclusive.

In the case we remove a TLB entry marked as EXCL, we unset the
corresponding exclusive bit in the bitmap.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c           | 44 ++++++++++++++++++++++++++++--
 softmmu_template.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 113 insertions(+), 11 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index ce6d720..aa9cc17 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -395,6 +395,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     env->tlb_v_table[mmu_idx][vidx] = *te;
     env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
 
+    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
+        /* We are removing an exclusive entry, set the page to dirty. This
+         * is not be necessary if the vCPU has performed both SC and LL. */
+        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
+                                          (te->addr_write & TARGET_PAGE_MASK);
+        if (!cpu->ll_sc_context) {
+            cpu_physical_memory_unset_excl(hw_addr);
+        }
+    }
+
     /* refill the tlb */
     env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
     env->iotlb[mmu_idx][index].attrs = attrs;
@@ -418,9 +428,19 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(section->mr->ram_addr
                                                    + xlat)) {
-            te->addr_write = address | TLB_NOTDIRTY;
-        } else {
-            te->addr_write = address;
+            address |= TLB_NOTDIRTY;
+        }
+
+        /* Since the MMIO accesses follow always the slow path, we do not need
+         * to set any flag to trap the access */
+        if (!(address & TLB_MMIO)) {
+            if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
+                /* There is at least one vCPU that has flagged the address as
+                 * exclusive. */
+                te->addr_write = address | TLB_EXCL;
+            } else {
+                te->addr_write = address;
+            }
         }
     } else {
         te->addr_write = -1;
@@ -474,6 +494,24 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
     return qemu_ram_addr_from_host_nofail(p);
 }
 
+/* For every vCPU compare the exclusive address and reset it in case of a
+ * match. Since only one vCPU is running at once, no lock has to be held to
+ * guard this operation. */
+static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
+            ranges_overlap(cpu->excl_protected_range.begin,
+                           cpu->excl_protected_range.end -
+                           cpu->excl_protected_range.begin,
+                           addr, size)) {
+            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+        }
+    }
+}
+
 #define MMUSUFFIX _mmu
 
 /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
diff --git a/softmmu_template.h b/softmmu_template.h
index 4332db2..267c52a 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -474,11 +474,43 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
-    /* Handle an IO access.  */
+    /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
-                                                 mmu_idx, index, retaddr);
-        return;
+        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+            CPUState *cpu = ENV_GET_CPU(env);
+            CPUClass *cc = CPU_GET_CLASS(cpu);
+            /* The slow-path has been forced since we are writing to
+             * exclusive-protected memory. */
+            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+
+            /* The function lookup_and_reset_cpus_ll_addr could have reset the
+             * exclusive address. Fail the SC in this case.
+             * N.B.: here excl_succeed == true means that the caller is
+             * helper_stcond_name in softmmu_llsc_template.
+             * On the contrary, excl_succeeded == false occurs when a VCPU is
+             * writing through normal store to a page with TLB_EXCL bit set. */
+            if (cpu->excl_succeeded) {
+                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
+                    /* The vCPU is SC-ing to an unprotected address. */
+                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+                    cpu->excl_succeeded = false;
+
+                    return;
+                }
+            }
+
+            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
+                                                    mmu_idx, index, retaddr);
+
+            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
+
+            return;
+        } else {
+            glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                     mmu_idx, index, retaddr);
+            return;
+        }
     }
 
     glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
@@ -586,11 +618,43 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
-    /* Handle an IO access.  */
+    /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
-                                                 mmu_idx, index, retaddr);
-        return;
+        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+            CPUState *cpu = ENV_GET_CPU(env);
+            CPUClass *cc = CPU_GET_CLASS(cpu);
+            /* The slow-path has been forced since we are writing to
+             * exclusive-protected memory. */
+            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+
+            /* The function lookup_and_reset_cpus_ll_addr could have reset the
+             * exclusive address. Fail the SC in this case.
+             * N.B.: here excl_succeed == true means that the caller is
+             * helper_stcond_name in softmmu_llsc_template.
+             * On the contrary, excl_succeeded == false occurs when a VCPU is
+             * writing through normal store to a page with TLB_EXCL bit set. */
+            if (cpu->excl_succeeded) {
+                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
+                    /* The vCPU is SC-ing to an unprotected address. */
+                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+                    cpu->excl_succeeded = false;
+
+                    return;
+                }
+            }
+
+            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
+                                                    mmu_idx, index, retaddr);
+
+            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
+
+            return;
+        } else {
+            glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                     mmu_idx, index, retaddr);
+            return;
+        }
     }
 
     glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (7 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-16 17:49   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range Alvise Rigo
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Enable exclusive accesses when the MMIO/invalid flag is set in the TLB
entry.

In case a LL access is done to MMIO memory, we treat it differently from
a RAM access in that we do not rely on the EXCL bitmap to flag the page
as exclusive. In fact, we don't even need the TLB_EXCL flag to force the
slow path, since it is always forced anyway.

This commit does not take care of invalidating an MMIO exclusive range from
other non-exclusive accesses i.e. CPU1 LoadLink to MMIO address X and
CPU2 writes to X. This will be addressed in the following commit.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c           |  7 +++----
 softmmu_template.h | 26 ++++++++++++++++++++------
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index aa9cc17..87d09c8 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -424,7 +424,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         if ((memory_region_is_ram(section->mr) && section->readonly)
             || memory_region_is_romd(section->mr)) {
             /* Write access calls the I/O callback.  */
-            te->addr_write = address | TLB_MMIO;
+            address |= TLB_MMIO;
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(section->mr->ram_addr
                                                    + xlat)) {
@@ -437,11 +437,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
             if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
                 /* There is at least one vCPU that has flagged the address as
                  * exclusive. */
-                te->addr_write = address | TLB_EXCL;
-            } else {
-                te->addr_write = address;
+                address |= TLB_EXCL;
             }
         }
+        te->addr_write = address;
     } else {
         te->addr_write = -1;
     }
diff --git a/softmmu_template.h b/softmmu_template.h
index 267c52a..c54bdc9 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -476,7 +476,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+        if (tlb_addr & TLB_EXCL) {
             CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
             CPUState *cpu = ENV_GET_CPU(env);
             CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -500,8 +500,15 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                 }
             }
 
-            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
-                                                    mmu_idx, index, retaddr);
+            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO access */
+                glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                         mmu_idx, index,
+                                                         retaddr);
+            } else {
+                glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
+                                                        mmu_idx, index,
+                                                        retaddr);
+            }
 
             lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
 
@@ -620,7 +627,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 
     /* Handle an IO access or exclusive access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
+        if (tlb_addr & TLB_EXCL) {
             CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
             CPUState *cpu = ENV_GET_CPU(env);
             CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -644,8 +651,15 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                 }
             }
 
-            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
-                                                    mmu_idx, index, retaddr);
+            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO access */
+                glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
+                                                         mmu_idx, index,
+                                                         retaddr);
+            } else {
+                glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
+                                                        mmu_idx, index,
+                                                        retaddr);
+            }
 
             lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (8 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-17 18:55   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses Alvise Rigo
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

As for the RAM case, also the MMIO exclusive ranges have to be protected
by other CPU's accesses. In order to do that, we flag the accessed
MemoryRegion to mark that an exclusive access has been performed and is
not concluded yet.

This flag will force the other CPUs to invalidate the exclusive range in
case of collision.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c                | 20 +++++++++++++-------
 include/exec/memory.h   |  1 +
 softmmu_llsc_template.h | 11 +++++++----
 softmmu_template.h      | 22 ++++++++++++++++++++++
 4 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 87d09c8..06ce2da 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -496,19 +496,25 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 /* For every vCPU compare the exclusive address and reset it in case of a
  * match. Since only one vCPU is running at once, no lock has to be held to
  * guard this operation. */
-static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
+static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
 {
     CPUState *cpu;
+    bool ret = false;
 
     CPU_FOREACH(cpu) {
-        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
-            ranges_overlap(cpu->excl_protected_range.begin,
-                           cpu->excl_protected_range.end -
-                           cpu->excl_protected_range.begin,
-                           addr, size)) {
-            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+        if (current_cpu != cpu) {
+            if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
+                ranges_overlap(cpu->excl_protected_range.begin,
+                               cpu->excl_protected_range.end -
+                               cpu->excl_protected_range.begin,
+                               addr, size)) {
+                cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+                ret = true;
+            }
         }
     }
+
+    return ret;
 }
 
 #define MMUSUFFIX _mmu
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 71e0480..bacb3ad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -171,6 +171,7 @@ struct MemoryRegion {
     bool rom_device;
     bool flush_coalesced_mmio;
     bool global_locking;
+    bool pending_excl_access; /* A vCPU issued an exclusive access */
     uint8_t dirty_log_mask;
     ram_addr_t ram_addr;
     Object *owner;
diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
index 101f5e8..b4712ba 100644
--- a/softmmu_llsc_template.h
+++ b/softmmu_llsc_template.h
@@ -81,15 +81,18 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
                 }
             }
         }
+        /* For this vCPU, just update the TLB entry, no need to flush. */
+        env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
     } else {
-        hw_error("EXCL accesses to MMIO regions not supported yet.");
+        /* Set a pending exclusive access in the MemoryRegion */
+        MemoryRegion *mr = iotlb_to_region(this,
+                                           env->iotlb[mmu_idx][index].addr,
+                                           env->iotlb[mmu_idx][index].attrs);
+        mr->pending_excl_access = true;
     }
 
     cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
 
-    /* For this vCPU, just update the TLB entry, no need to flush. */
-    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
-
     /* From now on we are in LL/SC context */
     this->ll_sc_context = true;
 
diff --git a/softmmu_template.h b/softmmu_template.h
index c54bdc9..71c5152 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -360,6 +360,14 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
+
+    /* Invalidate the exclusive range that overlaps this access */
+    if (mr->pending_excl_access) {
+        if (lookup_and_reset_cpus_ll_addr(physaddr, 1 << SHIFT)) {
+            mr->pending_excl_access = false;
+        }
+    }
+
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
@@ -504,6 +512,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                 glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
                                                          mmu_idx, index,
                                                          retaddr);
+                /* N.B.: Here excl_succeeded == true means that this access
+                 * comes from an exclusive instruction. */
+                if (cpu->excl_succeeded) {
+                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
+                                                       iotlbentry->attrs);
+                    mr->pending_excl_access = false;
+                }
             } else {
                 glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
                                                         mmu_idx, index,
@@ -655,6 +670,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                 glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
                                                          mmu_idx, index,
                                                          retaddr);
+                /* N.B.: Here excl_succeeded == true means that this access
+                 * comes from an exclusive instruction. */
+                if (cpu->excl_succeeded) {
+                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
+                                                       iotlbentry->attrs);
+                    mr->pending_excl_access = false;
+                }
             } else {
                 glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
                                                         mmu_idx, index,
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (9 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-18 16:16   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled Alvise Rigo
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Introduce a set of new runtime helpers to handle exclusive instructions.
These helpers are used as hooks to call the respective LL/SC helpers in
softmmu_llsc_template.h from TCG code.

The helpers ending with an "a" make an alignment check.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 Makefile.target             |   2 +-
 include/exec/helper-gen.h   |   3 ++
 include/exec/helper-proto.h |   1 +
 include/exec/helper-tcg.h   |   3 ++
 tcg-llsc-helper.c           | 104 ++++++++++++++++++++++++++++++++++++++++++++
 tcg-llsc-helper.h           |  61 ++++++++++++++++++++++++++
 tcg/tcg-llsc-gen-helper.h   |  67 ++++++++++++++++++++++++++++
 7 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 tcg-llsc-helper.c
 create mode 100644 tcg-llsc-helper.h
 create mode 100644 tcg/tcg-llsc-gen-helper.h

diff --git a/Makefile.target b/Makefile.target
index 34ddb7e..faf32a2 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -135,7 +135,7 @@ obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
-obj-y += memory.o cputlb.o
+obj-y += memory.o cputlb.o tcg-llsc-helper.o
 obj-y += memory_mapping.o
 obj-y += dump.o
 obj-y += migration/ram.o migration/savevm.o
diff --git a/include/exec/helper-gen.h b/include/exec/helper-gen.h
index 0d0da3a..f8483a9 100644
--- a/include/exec/helper-gen.h
+++ b/include/exec/helper-gen.h
@@ -60,6 +60,9 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
 #include "trace/generated-helpers.h"
 #include "trace/generated-helpers-wrappers.h"
 #include "tcg-runtime.h"
+#if defined(CONFIG_SOFTMMU)
+#include "tcg-llsc-gen-helper.h"
+#endif
 
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
diff --git a/include/exec/helper-proto.h b/include/exec/helper-proto.h
index effdd43..90be2fd 100644
--- a/include/exec/helper-proto.h
+++ b/include/exec/helper-proto.h
@@ -29,6 +29,7 @@ dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2), dh_ctype(t3), \
 #include "helper.h"
 #include "trace/generated-helpers.h"
 #include "tcg-runtime.h"
+#include "tcg/tcg-llsc-gen-helper.h"
 
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
diff --git a/include/exec/helper-tcg.h b/include/exec/helper-tcg.h
index 79fa3c8..6228a7f 100644
--- a/include/exec/helper-tcg.h
+++ b/include/exec/helper-tcg.h
@@ -38,6 +38,9 @@
 #include "helper.h"
 #include "trace/generated-helpers.h"
 #include "tcg-runtime.h"
+#ifdef CONFIG_SOFTMMU
+#include "tcg-llsc-gen-helper.h"
+#endif
 
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
diff --git a/tcg-llsc-helper.c b/tcg-llsc-helper.c
new file mode 100644
index 0000000..646b4ba
--- /dev/null
+++ b/tcg-llsc-helper.c
@@ -0,0 +1,104 @@
+/*
+ * Runtime helpers for atomic istruction emulation
+ *
+ * Copyright (c) 2015 Virtual Open Systems
+ *
+ * Authors:
+ *  Alvise Rigo <a.rigo@virtualopensystems.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "exec/cpu_ldst.h"
+#include "exec/helper-head.h"
+#include "tcg-llsc-helper.h"
+
+#define LDEX_HELPER(SUFF, OPC, FUNC)                                       \
+uint32_t HELPER(ldlink_i##SUFF)(CPUArchState *env, target_ulong addr,      \
+                                uint32_t index)                            \
+{                                                                          \
+    CPUArchState *state = env;                                             \
+    TCGMemOpIdx op;                                                        \
+                                                                           \
+    op = make_memop_idx((OPC), index);                                     \
+                                                                           \
+    return (uint32_t)FUNC(state, addr, op, GETRA());                       \
+}
+
+#define STEX_HELPER(SUFF, DATA_TYPE, OPC, FUNC)                            \
+target_ulong HELPER(stcond_i##SUFF)(CPUArchState *env, target_ulong addr,  \
+                                    uint32_t val, uint32_t index)          \
+{                                                                          \
+    CPUArchState *state = env;                                             \
+    TCGMemOpIdx op;                                                        \
+                                                                           \
+    op = make_memop_idx((OPC), index);                                     \
+                                                                           \
+    return (target_ulong)FUNC(state, addr, val, op, GETRA());              \
+}
+
+
+LDEX_HELPER(8, MO_UB, helper_ret_ldlinkub_mmu)
+LDEX_HELPER(16_be, MO_BEUW, helper_be_ldlinkuw_mmu)
+LDEX_HELPER(16_bea, MO_BEUW | MO_ALIGN, helper_be_ldlinkuw_mmu)
+LDEX_HELPER(32_be, MO_BEUL, helper_be_ldlinkul_mmu)
+LDEX_HELPER(32_bea, MO_BEUL | MO_ALIGN, helper_be_ldlinkul_mmu)
+LDEX_HELPER(16_le, MO_LEUW, helper_le_ldlinkuw_mmu)
+LDEX_HELPER(16_lea, MO_LEUW | MO_ALIGN, helper_le_ldlinkuw_mmu)
+LDEX_HELPER(32_le, MO_LEUL, helper_le_ldlinkul_mmu)
+LDEX_HELPER(32_lea, MO_LEUL | MO_ALIGN, helper_le_ldlinkul_mmu)
+
+STEX_HELPER(8, uint8_t, MO_UB, helper_ret_stcondb_mmu)
+STEX_HELPER(16_be, uint16_t, MO_BEUW, helper_be_stcondw_mmu)
+STEX_HELPER(16_bea, uint16_t, MO_BEUW | MO_ALIGN, helper_be_stcondw_mmu)
+STEX_HELPER(32_be, uint32_t, MO_BEUL, helper_be_stcondl_mmu)
+STEX_HELPER(32_bea, uint32_t, MO_BEUL | MO_ALIGN, helper_be_stcondl_mmu)
+STEX_HELPER(16_le, uint16_t, MO_LEUW, helper_le_stcondw_mmu)
+STEX_HELPER(16_lea, uint16_t, MO_LEUW | MO_ALIGN, helper_le_stcondw_mmu)
+STEX_HELPER(32_le, uint32_t, MO_LEUL, helper_le_stcondl_mmu)
+STEX_HELPER(32_lea, uint32_t, MO_LEUL | MO_ALIGN, helper_le_stcondl_mmu)
+
+#define LDEX_HELPER_64(SUFF, OPC, FUNC)                                     \
+uint64_t HELPER(ldlink_i##SUFF)(CPUArchState *env, target_ulong addr,       \
+                                uint32_t index)                             \
+{                                                                           \
+    CPUArchState *state = env;                                              \
+    TCGMemOpIdx op;                                                         \
+                                                                            \
+    op = make_memop_idx((OPC), index);                                      \
+                                                                            \
+    return FUNC(state, addr, op, GETRA());                                  \
+}
+
+#define STEX_HELPER_64(SUFF, OPC, FUNC)                                     \
+target_ulong HELPER(stcond_i##SUFF)(CPUArchState *env, target_ulong addr,   \
+                                    uint64_t val, uint32_t index)           \
+{                                                                           \
+    CPUArchState *state = env;                                              \
+    TCGMemOpIdx op;                                                         \
+                                                                            \
+    op = make_memop_idx((OPC), index);                                      \
+                                                                            \
+    return (target_ulong)FUNC(state, addr, val, op, GETRA());               \
+}
+
+LDEX_HELPER_64(64_be, MO_BEQ, helper_be_ldlinkq_mmu)
+LDEX_HELPER_64(64_bea, MO_BEQ | MO_ALIGN, helper_be_ldlinkq_mmu)
+LDEX_HELPER_64(64_le, MO_LEQ, helper_le_ldlinkq_mmu)
+LDEX_HELPER_64(64_lea, MO_LEQ | MO_ALIGN, helper_le_ldlinkq_mmu)
+
+STEX_HELPER_64(64_be, MO_BEQ, helper_be_stcondq_mmu)
+STEX_HELPER_64(64_bea, MO_BEQ | MO_ALIGN, helper_be_stcondq_mmu)
+STEX_HELPER_64(64_le, MO_LEQ, helper_le_stcondq_mmu)
+STEX_HELPER_64(64_lea, MO_LEQ | MO_ALIGN, helper_le_stcondq_mmu)
diff --git a/tcg-llsc-helper.h b/tcg-llsc-helper.h
new file mode 100644
index 0000000..8f7adf0
--- /dev/null
+++ b/tcg-llsc-helper.h
@@ -0,0 +1,61 @@
+#ifndef HELPER_LLSC_HEAD_H
+#define HELPER_LLSC_HEAD_H 1
+
+uint32_t HELPER(ldlink_i8)(CPUArchState *env, target_ulong addr,
+                           uint32_t index);
+uint32_t HELPER(ldlink_i16_be)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+uint32_t HELPER(ldlink_i32_be)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+uint64_t HELPER(ldlink_i64_be)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+uint32_t HELPER(ldlink_i16_le)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+uint32_t HELPER(ldlink_i32_le)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+uint64_t HELPER(ldlink_i64_le)(CPUArchState *env, target_ulong addr,
+                               uint32_t index);
+
+target_ulong HELPER(stcond_i8)(CPUArchState *env, target_ulong addr,
+                               uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i16_be)(CPUArchState *env, target_ulong addr,
+                                   uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i32_be)(CPUArchState *env, target_ulong addr,
+                                   uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i64_be)(CPUArchState *env, target_ulong addr,
+                                   uint64_t val, uint32_t index);
+target_ulong HELPER(stcond_i16_le)(CPUArchState *env, target_ulong addr,
+                                   uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i32_le)(CPUArchState *env, target_ulong addr,
+                                   uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i64_le)(CPUArchState *env, target_ulong addr,
+                                   uint64_t val, uint32_t index);
+
+/* Aligned versions */
+uint32_t HELPER(ldlink_i16_bea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+uint32_t HELPER(ldlink_i32_bea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+uint64_t HELPER(ldlink_i64_bea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+uint32_t HELPER(ldlink_i16_lea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+uint32_t HELPER(ldlink_i32_lea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+uint64_t HELPER(ldlink_i64_lea)(CPUArchState *env, target_ulong addr,
+                                uint32_t index);
+
+target_ulong HELPER(stcond_i16_bea)(CPUArchState *env, target_ulong addr,
+                                    uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i32_bea)(CPUArchState *env, target_ulong addr,
+                                    uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i64_bea)(CPUArchState *env, target_ulong addr,
+                                    uint64_t val, uint32_t index);
+target_ulong HELPER(stcond_i16_lea)(CPUArchState *env, target_ulong addr,
+                                    uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i32_lea)(CPUArchState *env, target_ulong addr,
+                                    uint32_t val, uint32_t index);
+target_ulong HELPER(stcond_i64_lea)(CPUArchState *env, target_ulong addr,
+                                    uint64_t val, uint32_t index);
+
+#endif
diff --git a/tcg/tcg-llsc-gen-helper.h b/tcg/tcg-llsc-gen-helper.h
new file mode 100644
index 0000000..01c0a67
--- /dev/null
+++ b/tcg/tcg-llsc-gen-helper.h
@@ -0,0 +1,67 @@
+#if TARGET_LONG_BITS == 32
+#define TYPE i32
+#else
+#define TYPE i64
+#endif
+
+DEF_HELPER_3(ldlink_i8, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i16_be, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i32_be, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i64_be, i64, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i16_le, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i32_le, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i64_le, i64, env, TYPE, i32)
+
+DEF_HELPER_4(stcond_i8, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i16_be, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i32_be, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i64_be, TYPE, env, TYPE, i64, i32)
+DEF_HELPER_4(stcond_i16_le, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i32_le, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i64_le, TYPE, env, TYPE, i64, i32)
+
+/* Aligned versions */
+DEF_HELPER_3(ldlink_i16_bea, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i32_bea, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i64_bea, i64, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i16_lea, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i32_lea, i32, env, TYPE, i32)
+DEF_HELPER_3(ldlink_i64_lea, i64, env, TYPE, i32)
+
+DEF_HELPER_4(stcond_i16_bea, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i32_bea, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i64_bea, TYPE, env, TYPE, i64, i32)
+DEF_HELPER_4(stcond_i16_lea, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i32_lea, TYPE, env, TYPE, i32, i32)
+DEF_HELPER_4(stcond_i64_lea, TYPE, env, TYPE, i64, i32)
+
+/* Convenient aliases */
+#ifdef TARGET_WORDS_BIGENDIAN
+#define gen_helper_stcond_i16 gen_helper_stcond_i16_be
+#define gen_helper_stcond_i32 gen_helper_stcond_i32_be
+#define gen_helper_stcond_i64 gen_helper_stcond_i64_be
+#define gen_helper_ldlink_i16 gen_helper_ldlink_i16_be
+#define gen_helper_ldlink_i32 gen_helper_ldlink_i32_be
+#define gen_helper_ldlink_i64 gen_helper_ldlink_i64_be
+#define gen_helper_stcond_i16a gen_helper_stcond_i16_bea
+#define gen_helper_stcond_i32a gen_helper_stcond_i32_bea
+#define gen_helper_stcond_i64a gen_helper_stcond_i64_bea
+#define gen_helper_ldlink_i16a gen_helper_ldlink_i16_bea
+#define gen_helper_ldlink_i32a gen_helper_ldlink_i32_bea
+#define gen_helper_ldlink_i64a gen_helper_ldlink_i64_bea
+#else
+#define gen_helper_stcond_i16 gen_helper_stcond_i16_le
+#define gen_helper_stcond_i32 gen_helper_stcond_i32_le
+#define gen_helper_stcond_i64 gen_helper_stcond_i64_le
+#define gen_helper_ldlink_i16 gen_helper_ldlink_i16_le
+#define gen_helper_ldlink_i32 gen_helper_ldlink_i32_le
+#define gen_helper_ldlink_i64 gen_helper_ldlink_i64_le
+#define gen_helper_stcond_i16a gen_helper_stcond_i16_lea
+#define gen_helper_stcond_i32a gen_helper_stcond_i32_lea
+#define gen_helper_stcond_i64a gen_helper_stcond_i64_lea
+#define gen_helper_ldlink_i16a gen_helper_ldlink_i16_lea
+#define gen_helper_ldlink_i32a gen_helper_ldlink_i32_lea
+#define gen_helper_ldlink_i64a gen_helper_ldlink_i64_lea
+#endif
+
+#undef TYPE
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (10 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-18 16:40   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses Alvise Rigo
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Use the new slow path for atomic instruction translation when the
softmmu is enabled.

At the moment only arm and aarch64 use the new LL/SC backend. It is
possible to disable such backed with --disable-arm-llsc-backend.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 configure | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/configure b/configure
index 44ac9ab..915efcc 100755
--- a/configure
+++ b/configure
@@ -294,6 +294,7 @@ solaris="no"
 profiler="no"
 cocoa="no"
 softmmu="yes"
+arm_tcg_use_llsc="yes"
 linux_user="no"
 bsd_user="no"
 aix="no"
@@ -880,6 +881,10 @@ for opt do
   ;;
   --disable-debug-tcg) debug_tcg="no"
   ;;
+  --enable-arm-llsc-backend) arm_tcg_use_llsc="yes"
+  ;;
+  --disable-arm-llsc-backend) arm_tcg_use_llsc="no"
+  ;;
   --enable-debug)
       # Enable debugging options that aren't excessively noisy
       debug_tcg="yes"
@@ -4751,6 +4756,7 @@ echo "host CPU          $cpu"
 echo "host big endian   $bigendian"
 echo "target list       $target_list"
 echo "tcg debug enabled $debug_tcg"
+echo "arm use llsc backend" $arm_tcg_use_llsc
 echo "gprof enabled     $gprof"
 echo "sparse enabled    $sparse"
 echo "strip binaries    $strip_opt"
@@ -4806,6 +4812,7 @@ echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
+echo "use ld/st excl    $softmmu"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -5863,6 +5870,13 @@ fi
 echo "LDFLAGS+=$ldflags" >> $config_target_mak
 echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
 
+# Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
+# softmmus targets
+if test "$arm_tcg_use_llsc" = "yes" ; then
+  if test "$target" = "arm-softmmu" ; then
+    echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
+  fi
+fi
 done # for target in $targets
 
 if [ "$pixman" = "internal" ]; then
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (11 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-16 17:07   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns Alvise Rigo
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Add a circular buffer to store the hw addresses used in the last
EXCLUSIVE_HISTORY_LEN exclusive accesses.

When an address is pop'ed from the buffer, its page will be set as not
exclusive. In this way, we avoid:
- frequent set/unset of a page (causing frequent flushes as well)
- the possibility to forget the EXCL bit set.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 cputlb.c                | 29 +++++++++++++++++++----------
 exec.c                  | 19 +++++++++++++++++++
 include/qom/cpu.h       |  8 ++++++++
 softmmu_llsc_template.h |  1 +
 vl.c                    |  3 +++
 5 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 06ce2da..f3c4d97 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -395,16 +395,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     env->tlb_v_table[mmu_idx][vidx] = *te;
     env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
 
-    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
-        /* We are removing an exclusive entry, set the page to dirty. This
-         * is not be necessary if the vCPU has performed both SC and LL. */
-        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
-                                          (te->addr_write & TARGET_PAGE_MASK);
-        if (!cpu->ll_sc_context) {
-            cpu_physical_memory_unset_excl(hw_addr);
-        }
-    }
-
     /* refill the tlb */
     env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
     env->iotlb[mmu_idx][index].attrs = attrs;
@@ -517,6 +507,25 @@ static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
     return ret;
 }
 
+extern CPUExclusiveHistory excl_history;
+static inline void excl_history_put_addr(hwaddr addr)
+{
+    hwaddr last;
+
+    /* Calculate the index of the next exclusive address */
+    excl_history.last_idx = (excl_history.last_idx + 1) % excl_history.length;
+
+    last = excl_history.c_array[excl_history.last_idx];
+
+    /* Unset EXCL bit of the oldest entry */
+    if (last != EXCLUSIVE_RESET_ADDR) {
+        cpu_physical_memory_unset_excl(last);
+    }
+
+    /* Add a new address, overwriting the oldest one */
+    excl_history.c_array[excl_history.last_idx] = addr & TARGET_PAGE_MASK;
+}
+
 #define MMUSUFFIX _mmu
 
 /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
diff --git a/exec.c b/exec.c
index 51f366d..2e123f1 100644
--- a/exec.c
+++ b/exec.c
@@ -177,6 +177,25 @@ struct CPUAddressSpace {
     MemoryListener tcg_as_listener;
 };
 
+/* Exclusive memory support */
+CPUExclusiveHistory excl_history;
+void cpu_exclusive_history_init(void)
+{
+    /* Initialize exclusive history for atomic instruction handling. */
+    if (tcg_enabled()) {
+        g_assert(EXCLUSIVE_HISTORY_CPU_LEN * max_cpus <= UINT16_MAX);
+        excl_history.length = EXCLUSIVE_HISTORY_CPU_LEN * max_cpus;
+        excl_history.c_array = g_malloc(excl_history.length * sizeof(hwaddr));
+        memset(excl_history.c_array, -1, excl_history.length * sizeof(hwaddr));
+    }
+}
+
+void cpu_exclusive_history_free(void)
+{
+    if (tcg_enabled()) {
+        g_free(excl_history.c_array);
+    }
+}
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 6f6c1c0..0452fd0 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -227,7 +227,15 @@ struct kvm_run;
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
 /* Atomic insn translation TLB support. */
+typedef struct CPUExclusiveHistory {
+    uint16_t last_idx;           /* index of last insertion */
+    uint16_t length;             /* history's length, it depends on smp_cpus */
+    hwaddr *c_array;             /* history's circular array */
+} CPUExclusiveHistory;
 #define EXCLUSIVE_RESET_ADDR ULLONG_MAX
+#define EXCLUSIVE_HISTORY_CPU_LEN 256
+void cpu_exclusive_history_init(void);
+void cpu_exclusive_history_free(void);
 
 /**
  * CPUState:
diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
index b4712ba..b4e7f9d 100644
--- a/softmmu_llsc_template.h
+++ b/softmmu_llsc_template.h
@@ -75,6 +75,7 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
          * to request any flush. */
         if (!cpu_physical_memory_is_excl(hw_addr)) {
             cpu_physical_memory_set_excl(hw_addr);
+            excl_history_put_addr(hw_addr);
             CPU_FOREACH(cpu) {
                 if (current_cpu != cpu) {
                     tlb_flush(cpu, 1);
diff --git a/vl.c b/vl.c
index f043009..b22d99b 100644
--- a/vl.c
+++ b/vl.c
@@ -547,6 +547,7 @@ static void res_free(void)
 {
     g_free(boot_splash_filedata);
     boot_splash_filedata = NULL;
+    cpu_exclusive_history_free();
 }
 
 static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
@@ -4322,6 +4323,8 @@ int main(int argc, char **argv, char **envp)
 
     configure_accelerator(current_machine);
 
+    cpu_exclusive_history_init();
+
     if (qtest_chrdev) {
         qtest_init(qtest_chrdev, qtest_log, &error_fatal);
     }
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (12 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-18 17:02   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook Alvise Rigo
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Use the new LL/SC runtime helpers to handle the ARM atomic instructions
in softmmu_llsc_template.h.

In general, the helper generator
gen_{ldrex,strex}_{8,16a,32a,64a}() calls the function
helper_{le,be}_{ldlink,stcond}{ub,uw,ul,q}_mmu() implemented in
softmmu_llsc_template.h, doing an alignment check.

In addition, add a simple helper function to emulate the CLREX instruction.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 target-arm/cpu.h       |   2 +
 target-arm/helper.h    |   4 ++
 target-arm/machine.c   |   2 +
 target-arm/op_helper.c |  10 +++
 target-arm/translate.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 202 insertions(+), 4 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index b8b3364..bb5361f 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -462,9 +462,11 @@ typedef struct CPUARMState {
         float_status fp_status;
         float_status standard_fp_status;
     } vfp;
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
     uint64_t exclusive_addr;
     uint64_t exclusive_val;
     uint64_t exclusive_high;
+#endif
 #if defined(CONFIG_USER_ONLY)
     uint64_t exclusive_test;
     uint32_t exclusive_info;
diff --git a/target-arm/helper.h b/target-arm/helper.h
index c2a85c7..6bc3c0a 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -532,6 +532,10 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+DEF_HELPER_1(atomic_clear, void, env)
+#endif
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target-arm/machine.c b/target-arm/machine.c
index ed1925a..7adfb4d 100644
--- a/target-arm/machine.c
+++ b/target-arm/machine.c
@@ -309,9 +309,11 @@ const VMStateDescription vmstate_arm_cpu = {
         VMSTATE_VARRAY_INT32(cpreg_vmstate_values, ARMCPU,
                              cpreg_vmstate_array_len,
                              0, vmstate_info_uint64, uint64_t),
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
         VMSTATE_UINT64(env.exclusive_addr, ARMCPU),
         VMSTATE_UINT64(env.exclusive_val, ARMCPU),
         VMSTATE_UINT64(env.exclusive_high, ARMCPU),
+#endif
         VMSTATE_UINT64(env.features, ARMCPU),
         VMSTATE_UINT32(env.exception.syndrome, ARMCPU),
         VMSTATE_UINT32(env.exception.fsr, ARMCPU),
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index a5ee65f..404c13b 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -51,6 +51,14 @@ static int exception_target_el(CPUARMState *env)
     return target_el;
 }
 
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+void HELPER(atomic_clear)(CPUARMState *env)
+{
+    ENV_GET_CPU(env)->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+    ENV_GET_CPU(env)->ll_sc_context = false;
+}
+#endif
+
 uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
                           uint32_t rn, uint32_t maxindex)
 {
@@ -689,7 +697,9 @@ void HELPER(exception_return)(CPUARMState *env)
 
     aarch64_save_sp(env, cur_el);
 
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
     env->exclusive_addr = -1;
+#endif
 
     /* We must squash the PSTATE.SS bit to zero unless both of the
      * following hold:
diff --git a/target-arm/translate.c b/target-arm/translate.c
index cff511b..5150841 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -60,8 +60,10 @@ TCGv_ptr cpu_env;
 static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
 static TCGv_i32 cpu_R[16];
 TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
+#endif
 #ifdef CONFIG_USER_ONLY
 TCGv_i64 cpu_exclusive_test;
 TCGv_i32 cpu_exclusive_info;
@@ -94,10 +96,12 @@ void arm_translate_init(void)
     cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
     cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
 
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
     cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
     cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_val), "exclusive_val");
+#endif
 #ifdef CONFIG_USER_ONLY
     cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_test), "exclusive_test");
@@ -7413,15 +7417,145 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
     tcg_gen_or_i32(cpu_ZF, lo, hi);
 }
 
-/* Load/Store exclusive instructions are implemented by remembering
+/* If the softmmu is enabled, the translation of Load/Store exclusive
+   instructions will rely on the gen_helper_{ldlink,stcond} helpers,
+   offloading most of the work to the softmmu_llsc_template.h functions.
+   All the accesses made by the exclusive instructions include an
+   alignment check.
+
+   Otherwise, these instructions are implemented by remembering
    the value/address loaded, and seeing if these are the same
    when the store is performed. This should be sufficient to implement
    the architecturally mandated semantics, and avoids having to monitor
    regular stores.
 
-   In system emulation mode only one CPU will be running at once, so
-   this sequence is effectively atomic.  In user emulation mode we
-   throw an exception and handle the atomic operation elsewhere.  */
+   In user emulation mode we throw an exception and handle the atomic
+   operation elsewhere.  */
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+
+#if TARGET_LONG_BITS == 32
+#define DO_GEN_LDREX(SUFF)                                             \
+static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
+                                    TCGv_i32 index)                    \
+{                                                                      \
+    gen_helper_ldlink_##SUFF(dst, cpu_env, addr, index);               \
+}
+
+#define DO_GEN_STREX(SUFF)                                             \
+static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
+                                    TCGv_i32 val, TCGv_i32 index)      \
+{                                                                      \
+    gen_helper_stcond_##SUFF(dst, cpu_env, addr, val, index);          \
+}
+
+static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
+{
+    gen_helper_ldlink_i64a(dst, cpu_env, addr, index);
+}
+
+static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
+                                  TCGv_i32 index)
+{
+
+    gen_helper_stcond_i64a(dst, cpu_env, addr, val, index);
+}
+#else
+#define DO_GEN_LDREX(SUFF)                                             \
+static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
+                                         TCGv_i32 index)               \
+{                                                                      \
+    TCGv addr64 = tcg_temp_new();                                      \
+    tcg_gen_extu_i32_i64(addr64, addr);                                \
+    gen_helper_ldlink_##SUFF(dst, cpu_env, addr64, index);             \
+    tcg_temp_free(addr64);                                             \
+}
+
+#define DO_GEN_STREX(SUFF)                                             \
+static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
+                                    TCGv_i32 val, TCGv_i32 index)      \
+{                                                                      \
+    TCGv addr64 = tcg_temp_new();                                      \
+    TCGv dst64 = tcg_temp_new();                                       \
+    tcg_gen_extu_i32_i64(addr64, addr);                                \
+    gen_helper_stcond_##SUFF(dst64, cpu_env, addr64, val, index);      \
+    tcg_gen_extrl_i64_i32(dst, dst64);                                 \
+    tcg_temp_free(dst64);                                              \
+    tcg_temp_free(addr64);                                             \
+}
+
+static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
+{
+    TCGv addr64 = tcg_temp_new();
+    tcg_gen_extu_i32_i64(addr64, addr);
+    gen_helper_ldlink_i64a(dst, cpu_env, addr64, index);
+    tcg_temp_free(addr64);
+}
+
+static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
+                                  TCGv_i32 index)
+{
+    TCGv addr64 = tcg_temp_new();
+    TCGv dst64 = tcg_temp_new();
+
+    tcg_gen_extu_i32_i64(addr64, addr);
+    gen_helper_stcond_i64a(dst64, cpu_env, addr64, val, index);
+    tcg_gen_extrl_i64_i32(dst, dst64);
+
+    tcg_temp_free(dst64);
+    tcg_temp_free(addr64);
+}
+#endif
+
+#if defined(CONFIG_ARM_USE_LDST_EXCL)
+DO_GEN_LDREX(i8)
+DO_GEN_LDREX(i16a)
+DO_GEN_LDREX(i32a)
+
+DO_GEN_STREX(i8)
+DO_GEN_STREX(i16a)
+DO_GEN_STREX(i32a)
+#endif
+
+static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
+                               TCGv_i32 addr, int size)
+ {
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 mem_idx = tcg_temp_new_i32();
+
+    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
+
+    if (size != 3) {
+        switch (size) {
+        case 0:
+            gen_ldrex_i8(tmp, addr, mem_idx);
+            break;
+        case 1:
+            gen_ldrex_i16a(tmp, addr, mem_idx);
+            break;
+        case 2:
+            gen_ldrex_i32a(tmp, addr, mem_idx);
+            break;
+        default:
+            abort();
+        }
+
+        store_reg(s, rt, tmp);
+    } else {
+        TCGv_i64 tmp64 = tcg_temp_new_i64();
+        TCGv_i32 tmph = tcg_temp_new_i32();
+
+        gen_ldrex_i64a(tmp64, addr, mem_idx);
+        tcg_gen_extr_i64_i32(tmp, tmph, tmp64);
+
+        store_reg(s, rt, tmp);
+        store_reg(s, rt2, tmph);
+
+        tcg_temp_free_i64(tmp64);
+    }
+
+    tcg_temp_free_i32(mem_idx);
+}
+#else
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i32 addr, int size)
 {
@@ -7460,10 +7594,15 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     store_reg(s, rt, tmp);
     tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
 }
+#endif
 
 static void gen_clrex(DisasContext *s)
 {
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+    gen_helper_atomic_clear(cpu_env);
+#else
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+#endif
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -7475,6 +7614,47 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                      size | (rd << 4) | (rt << 8) | (rt2 << 12));
     gen_exception_internal_insn(s, 4, EXCP_STREX);
 }
+#elif defined CONFIG_ARM_USE_LDST_EXCL
+static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
+                                TCGv_i32 addr, int size)
+{
+    TCGv_i32 tmp, mem_idx;
+
+    mem_idx = tcg_temp_new_i32();
+
+    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
+    tmp = load_reg(s, rt);
+
+    if (size != 3) {
+        switch (size) {
+        case 0:
+            gen_strex_i8(cpu_R[rd], addr, tmp, mem_idx);
+            break;
+        case 1:
+            gen_strex_i16a(cpu_R[rd], addr, tmp, mem_idx);
+            break;
+        case 2:
+            gen_strex_i32a(cpu_R[rd], addr, tmp, mem_idx);
+            break;
+        default:
+            abort();
+        }
+    } else {
+        TCGv_i64 tmp64;
+        TCGv_i32 tmp2;
+
+        tmp64 = tcg_temp_new_i64();
+        tmp2 = load_reg(s, rt2);
+        tcg_gen_concat_i32_i64(tmp64, tmp, tmp2);
+        gen_strex_i64a(cpu_R[rd], addr, tmp64, mem_idx);
+
+        tcg_temp_free_i32(tmp2);
+        tcg_temp_free_i64(tmp64);
+    }
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(mem_idx);
+}
 #else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i32 addr, int size)
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (13 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-18 18:19   ` Alex Bennée
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions Alvise Rigo
  2016-02-19 11:44 ` [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alex Bennée
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

In aarch64 the LDXP/STXP instructions allow to perform up to 128 bits
exclusive accesses. However, due to a softmmu limitation, such wide
accesses are not allowed.

To workaround this limitation, we need to support LoadLink instructions
that cover at least 128 consecutive bits (see the next patch for more
details).

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 target-arm/cpu64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
index cc177bb..1d45e66 100644
--- a/target-arm/cpu64.c
+++ b/target-arm/cpu64.c
@@ -287,6 +287,13 @@ static void aarch64_cpu_set_pc(CPUState *cs, vaddr value)
     }
 }
 
+static void aarch64_set_excl_range(CPUState *cpu, hwaddr addr, hwaddr size)
+{
+    cpu->excl_protected_range.begin = addr;
+    /* At least cover 128 bits for a STXP access (two paired doublewords case)*/
+    cpu->excl_protected_range.end = addr + 16;
+}
+
 static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
 {
     CPUClass *cc = CPU_CLASS(oc);
@@ -297,6 +304,7 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
     cc->gdb_write_register = aarch64_cpu_gdb_write_register;
     cc->gdb_num_core_regs = 34;
     cc->gdb_core_xml_file = "aarch64-core.xml";
+    cc->cpu_set_excl_protected_range = aarch64_set_excl_range;
 }
 
 static void aarch64_cpu_register(const ARMCPUInfo *info)
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (14 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook Alvise Rigo
@ 2016-01-29  9:32 ` Alvise Rigo
  2016-02-19 11:34   ` Alex Bennée
  2016-02-19 11:44 ` [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alex Bennée
  16 siblings, 1 reply; 50+ messages in thread
From: Alvise Rigo @ 2016-01-29  9:32 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: claudio.fontana, pbonzini, jani.kokkonen, tech, alex.bennee, rth

Use the new LL/SC runtime helpers to handle the aarch64 atomic instructions
in softmmu_llsc_template.h.

The STXP emulation required a dedicated helper to handle the paired
doubleword case.

Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
 configure                  |   6 +-
 target-arm/helper-a64.c    |  55 +++++++++++++++++++
 target-arm/helper-a64.h    |   4 ++
 target-arm/op_helper.c     |   8 +++
 target-arm/translate-a64.c | 134 ++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 204 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 915efcc..38121ff 100755
--- a/configure
+++ b/configure
@@ -5873,9 +5873,11 @@ echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
 # Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
 # softmmus targets
 if test "$arm_tcg_use_llsc" = "yes" ; then
-  if test "$target" = "arm-softmmu" ; then
+  case "$target" in
+    arm-softmmu | aarch64-softmmu)
     echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
-  fi
+    ;;
+  esac
 fi
 done # for target in $targets
 
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index c7bfb4d..dcee66f 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -26,6 +26,7 @@
 #include "qemu/bitops.h"
 #include "internals.h"
 #include "qemu/crc32c.h"
+#include "tcg/tcg.h"
 #include <zlib.h> /* For crc32 */
 
 /* C2.4.7 Multiply and divide */
@@ -443,3 +444,57 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
     /* Linux crc32c converts the output to one's complement.  */
     return crc32c(acc, buf, bytes) ^ 0xffffffff;
 }
+
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+/* STXP emulation for two 64 bit doublewords. We can't use directly two
+ * stcond_i64 accesses, otherwise the first will conclude the LL/SC pair.
+ * Instead, two normal 64-bit accesses are used and the CPUState is
+ * updated accordingly. */
+target_ulong HELPER(stxp_i128)(CPUArchState *env, target_ulong addr,
+                               uint64_t vall, uint64_t valh,
+                               uint32_t mmu_idx)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    TCGMemOpIdx op;
+    target_ulong ret = 0;
+
+    if (!cpu->ll_sc_context) {
+        cpu->excl_succeeded = false;
+        ret = 1;
+        goto out;
+    }
+
+    op = make_memop_idx(MO_BEQ, mmu_idx);
+
+    /* According to section C6.6.191 of ARM ARM DDI 0487A.h, the access has to
+     * be quadword aligned.  For the time being, we do not support paired STXPs
+     * to MMIO memory, this will become trivial when the softmmu will support
+     * 128bit memory accesses. */
+    if (addr & 0xf) {
+        /* TODO: Do unaligned access */
+    }
+
+    /* Setting excl_succeeded to true will make the store exclusive. */
+    cpu->excl_succeeded = true;
+    helper_ret_stq_mmu(env, addr, vall, op, GETRA());
+
+    if (!cpu->excl_succeeded) {
+        ret = 1;
+        goto out;
+    }
+
+    helper_ret_stq_mmu(env, addr + 8, valh, op, GETRA());
+    if (!cpu->excl_succeeded) {
+        ret = 1;
+    } else {
+        cpu->excl_succeeded = false;
+    }
+
+out:
+    /* Unset LL/SC context */
+    cpu->ll_sc_context = false;
+    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
+
+    return ret;
+}
+#endif
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 1d3d10f..c416a83 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -46,3 +46,7 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+/* STXP helper */
+DEF_HELPER_5(stxp_i128, i64, env, i64, i64, i64, i32)
+#endif
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 404c13b..146fc9a 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -34,6 +34,14 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
     cs->exception_index = excp;
     env->exception.syndrome = syndrome;
     env->exception.target_el = target_el;
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+    HELPER(atomic_clear)(env);
+    /* If the exception happens in the middle of a LL/SC, we need to clear
+     * excl_succeeded to avoid that the normal store following the exception is
+     * wrongly interpreted as exclusive.
+     * */
+    cs->excl_succeeded = 0;
+#endif
     cpu_loop_exit(cs);
 }
 
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 80f6c20..f34e957 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -37,8 +37,10 @@
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
 
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
 /* Load/store exclusive handling */
 static TCGv_i64 cpu_exclusive_high;
+#endif
 
 static const char *regnames[] = {
     "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
@@ -94,8 +96,10 @@ void a64_translate_init(void)
                                           regnames[i]);
     }
 
+#if !defined(CONFIG_ARM_USE_LDST_EXCL)
     cpu_exclusive_high = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_high), "exclusive_high");
+#endif
 }
 
 static inline ARMMMUIdx get_a64_user_mem_index(DisasContext *s)
@@ -1219,7 +1223,11 @@ static void handle_hint(DisasContext *s, uint32_t insn,
 
 static void gen_clrex(DisasContext *s, uint32_t insn)
 {
+#ifndef CONFIG_ARM_USE_LDST_EXCL
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+#else
+    gen_helper_atomic_clear(cpu_env);
+#endif
 }
 
 /* CLREX, DSB, DMB, ISB */
@@ -1685,7 +1693,11 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
 }
 
 /*
- * Load/Store exclusive instructions are implemented by remembering
+ * If the softmmu is enabled, the translation of Load/Store exclusive
+ * instructions will rely on the gen_helper_{ldlink,stcond} helpers,
+ * offloading most of the work to the softmmu_llsc_template.h functions.
+ *
+ * Otherwise, instructions are implemented by remembering
  * the value/address loaded, and seeing if these are the same
  * when the store is performed. This is not actually the architecturally
  * mandated semantics, but it works for typical guest code sequences
@@ -1695,6 +1707,66 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
  * this sequence is effectively atomic.  In user emulation mode we
  * throw an exception and handle the atomic operation elsewhere.
  */
+#ifdef CONFIG_ARM_USE_LDST_EXCL
+static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
+                               TCGv_i64 addr, int size, bool is_pair)
+{
+    /* In case @is_pair is set, we have to guarantee that at least the 128 bits
+     * accessed by a Load Exclusive Pair (64-bit variant) are protected. Since
+     * we do not have 128-bit helpers, we split the access in two halves, the
+     * first of them will set the exclusive region to cover at least 128 bits
+     * (this is why aarch64 has a custom cc->cpu_set_excl_protected_range which
+     * covers 128 bits).
+     * */
+    TCGv_i32 mem_idx = tcg_temp_new_i32();
+
+    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
+
+    g_assert(size <= 3);
+
+    if (size < 3) {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+
+        switch (size) {
+        case 0:
+            gen_helper_ldlink_i8(tmp, cpu_env, addr, mem_idx);
+            break;
+        case 1:
+            gen_helper_ldlink_i16(tmp, cpu_env, addr, mem_idx);
+            break;
+        case 2:
+            gen_helper_ldlink_i32(tmp, cpu_env, addr, mem_idx);
+            break;
+        default:
+            abort();
+        }
+
+        TCGv_i64 tmp64 = tcg_temp_new_i64();
+        tcg_gen_ext_i32_i64(tmp64, tmp);
+        tcg_gen_mov_i64(cpu_reg(s, rt), tmp64);
+
+        tcg_temp_free_i32(tmp);
+        tcg_temp_free_i64(tmp64);
+    } else {
+        gen_helper_ldlink_i64(cpu_reg(s, rt), cpu_env, addr, mem_idx);
+    }
+
+    if (is_pair) {
+        TCGMemOp memop = MO_TE + size;
+        TCGv_i64 addr2 = tcg_temp_new_i64();
+        TCGv_i64 hitmp = tcg_temp_new_i64();
+
+        g_assert(size >= 2);
+        tcg_gen_addi_i64(addr2, addr, 1 << size);
+        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
+        tcg_temp_free_i64(addr2);
+        tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
+        tcg_temp_free_i64(hitmp);
+    }
+
+    tcg_temp_free_i32(mem_idx);
+}
+#else
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i64 addr, int size, bool is_pair)
 {
@@ -1723,6 +1795,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     tcg_temp_free_i64(tmp);
     tcg_gen_mov_i64(cpu_exclusive_addr, addr);
 }
+#endif
 
 #ifdef CONFIG_USER_ONLY
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
@@ -1733,6 +1806,65 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                      size | is_pair << 2 | (rd << 4) | (rt << 9) | (rt2 << 14));
     gen_exception_internal_insn(s, 4, EXCP_STREX);
 }
+#elif defined(CONFIG_ARM_USE_LDST_EXCL)
+static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
+                                TCGv_i64 addr, int size, int is_pair)
+{
+    /* Don't bother to check if we are actually in exclusive context since the
+     * helpers keep care of it. */
+    TCGv_i32 mem_idx = tcg_temp_new_i32();
+
+    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
+
+    g_assert(size <= 3);
+    if (is_pair) {
+        if (size == 3) {
+            gen_helper_stxp_i128(cpu_reg(s, rd), cpu_env, addr, cpu_reg(s, rt),
+                                 cpu_reg(s, rt2), mem_idx);
+        } else if (size == 2) {
+            /* Paired single word case. After merging the two registers into
+             * one, we use one stcond_i64 to store the value to memory. */
+            TCGv_i64 val = tcg_temp_new_i64();
+            TCGv_i64 valh = tcg_temp_new_i64();
+            tcg_gen_shli_i64(valh, cpu_reg(s, rt2), 32);
+            tcg_gen_and_i64(val, valh, cpu_reg(s, rt));
+            gen_helper_stcond_i64(cpu_reg(s, rd), cpu_env, addr, val, mem_idx);
+            tcg_temp_free_i64(valh);
+            tcg_temp_free_i64(val);
+        } else {
+            abort();
+        }
+    } else {
+        if (size < 3) {
+            TCGv_i32 val = tcg_temp_new_i32();
+
+            tcg_gen_extrl_i64_i32(val, cpu_reg(s, rt));
+
+            switch (size) {
+            case 0:
+                gen_helper_stcond_i8(cpu_reg(s, rd), cpu_env, addr, val,
+                                     mem_idx);
+                break;
+            case 1:
+                gen_helper_stcond_i16(cpu_reg(s, rd), cpu_env, addr, val,
+                                      mem_idx);
+                break;
+            case 2:
+                gen_helper_stcond_i32(cpu_reg(s, rd), cpu_env, addr, val,
+                                      mem_idx);
+                break;
+            default:
+                abort();
+            }
+            tcg_temp_free_i32(val);
+        } else {
+            gen_helper_stcond_i64(cpu_reg(s, rd), cpu_env, addr, cpu_reg(s, rt),
+                                  mem_idx);
+        }
+    }
+
+    tcg_temp_free_i32(mem_idx);
+}
 #else
 static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i64 inaddr, int size, int is_pair)
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
@ 2016-02-11 13:00   ` Alex Bennée
  2016-02-11 13:21     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:00 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> The purpose of this new bitmap is to flag the memory pages that are in
> the middle of LL/SC operations (after a LL, before a SC). For all these
> pages, the corresponding TLB entries will be generated in such a way to
> force the slow-path for all the VCPUs (see the following patches).
>
> When the system starts, the whole memory is set to dirty.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  exec.c                  |  7 +++++--
>  include/exec/memory.h   |  3 ++-
>  include/exec/ram_addr.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 7115403..51f366d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1575,11 +1575,14 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>          int i;
>
>          /* ram_list.dirty_memory[] is protected by the iothread lock.  */
> -        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> +        for (i = 0; i < DIRTY_MEMORY_EXCLUSIVE; i++) {
>              ram_list.dirty_memory[i] =
>                  bitmap_zero_extend(ram_list.dirty_memory[i],
>                                     old_ram_size, new_ram_size);
> -       }
> +        }
> +        ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] =
> +            bitmap_zero_extend(ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
> +                               old_ram_size, new_ram_size);

In the previous patch you moved this out of the loop as
ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] was a different size to
the other dirty bitmaps. This no longer seems to be the case so this
seems pointless.

>      }
>      cpu_physical_memory_set_dirty_range(new_block->offset,
>                                          new_block->used_length,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c92734a..71e0480 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -19,7 +19,8 @@
>  #define DIRTY_MEMORY_VGA       0
>  #define DIRTY_MEMORY_CODE      1
>  #define DIRTY_MEMORY_MIGRATION 2
> -#define DIRTY_MEMORY_NUM       3        /* num of dirty bits */
> +#define DIRTY_MEMORY_EXCLUSIVE 3
> +#define DIRTY_MEMORY_NUM       4        /* num of dirty bits */
>
>  #include <stdint.h>
>  #include <stdbool.h>
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index ef1489d..19789fc 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -21,6 +21,7 @@
>
>  #ifndef CONFIG_USER_ONLY
>  #include "hw/xen/xen.h"
> +#include "sysemu/sysemu.h"
>
>  struct RAMBlock {
>      struct rcu_head rcu;
> @@ -172,6 +173,9 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>      if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
>          bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
>      }
> +    if (unlikely(mask & (1 << DIRTY_MEMORY_EXCLUSIVE))) {
> +        bitmap_set_atomic(d[DIRTY_MEMORY_EXCLUSIVE], page, end - page);
> +    }
>      xen_modified_memory(start, length);
>  }
>
> @@ -287,5 +291,32 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
>  }
>
>  void migration_bitmap_extend(ram_addr_t old, ram_addr_t new);
> +
> +/* Exclusive bitmap support. */
> +#define EXCL_BITMAP_GET_OFFSET(addr) (addr >> TARGET_PAGE_BITS)
> +
> +/* Make the page of @addr not exclusive. */
> +static inline void cpu_physical_memory_unset_excl(ram_addr_t addr)
> +{
> +    set_bit_atomic(EXCL_BITMAP_GET_OFFSET(addr),
> +                   ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
> +}
> +
> +/* Return true if the page of @addr is exclusive, i.e. the EXCL bit is set. */
> +static inline int cpu_physical_memory_is_excl(ram_addr_t addr)
> +{
> +    return !test_bit(EXCL_BITMAP_GET_OFFSET(addr),
> +                     ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
> +}
> +
> +/* Set the page of @addr as exclusive clearing its EXCL bit and return the
> + * previous bit's state. */
> +static inline int cpu_physical_memory_set_excl(ram_addr_t addr)
> +{
> +    return bitmap_test_and_clear_atomic(
> +                                ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
> +                                EXCL_BITMAP_GET_OFFSET(addr), 1);
> +}
> +
>  #endif
>  #endif


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code Alvise Rigo
@ 2016-02-11 13:07   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:07 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, Peter Crosthwaite, pbonzini,
	jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Attempting to simplify the helper_*_st_name, wrap the
> do_unaligned_access code into an inline function.
> Remove also the goto statement.

How are you generating your CC list? get_maintainer.pl shows Peter
Croshwaite (CC'ed) should also be CC'ed on these patches. If we want to
get any of this patch series merged before soft freeze we'll need some
signoffs from the maintainers ;-)

>
> Based on this work, Alex proposed the following patch series
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
> that reduces code duplication of the softmmu_helpers.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  softmmu_template.h | 96 ++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 60 insertions(+), 36 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 208f808..7029a03 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -370,6 +370,32 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>                                   iotlbentry->attrs);
>  }
>
> +static inline void glue(helper_le_st_name, _do_unl_access)(CPUArchState *env,
> +                                                           DATA_TYPE val,
> +                                                           target_ulong addr,
> +                                                           TCGMemOpIdx oi,
> +                                                           unsigned mmu_idx,
> +                                                           uintptr_t retaddr)
> +{
> +    int i;
> +
> +    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +    /* XXX: not efficient, but simple */
> +    /* Note: relies on the fact that tlb_fill() does not remove the
> +     * previous page from the TLB cache.  */
> +    for (i = DATA_SIZE - 1; i >= 0; i--) {
> +        /* Little-endian extract.  */
> +        uint8_t val8 = val >> (i * 8);
> +        /* Note the adjustment at the beginning of the function.
> +           Undo that for the recursion.  */
> +        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> +                                        oi, retaddr + GETPC_ADJ);
> +    }
> +}
> +

I still think there is some mileage in combining the unaligned stuff but
as no maintainer spoke out before or against last time I'll leave that
for future clean-ups.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> @@ -399,7 +425,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>          CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
> -            goto do_unaligned_access;
> +            glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> +                                                    oi, retaddr);
>          }
>          iotlbentry = &env->iotlb[mmu_idx][index];
>
> @@ -414,23 +441,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> -    do_unaligned_access:
> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                                 mmu_idx, retaddr);
> -        }
> -        /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache.  */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> -            /* Little-endian extract.  */
> -            uint8_t val8 = val >> (i * 8);
> -            /* Note the adjustment at the beginning of the function.
> -               Undo that for the recursion.  */
> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> -                                            oi, retaddr + GETPC_ADJ);
> -        }
> +        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> +                                                oi, retaddr);
>          return;
>      }
>
> @@ -450,6 +462,32 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>  }
>
>  #if DATA_SIZE > 1
> +static inline void glue(helper_be_st_name, _do_unl_access)(CPUArchState *env,
> +                                                           DATA_TYPE val,
> +                                                           target_ulong addr,
> +                                                           TCGMemOpIdx oi,
> +                                                           unsigned mmu_idx,
> +                                                           uintptr_t retaddr)
> +{
> +    int i;
> +
> +    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +    /* XXX: not efficient, but simple */
> +    /* Note: relies on the fact that tlb_fill() does not remove the
> +     * previous page from the TLB cache.  */
> +    for (i = DATA_SIZE - 1; i >= 0; i--) {
> +        /* Big-endian extract.  */
> +        uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> +        /* Note the adjustment at the beginning of the function.
> +           Undo that for the recursion.  */
> +        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> +                                        oi, retaddr + GETPC_ADJ);
> +    }
> +}
> +
>  void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> @@ -479,7 +517,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>          CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
> -            goto do_unaligned_access;
> +            glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> +                                                    oi, retaddr);
>          }
>          iotlbentry = &env->iotlb[mmu_idx][index];
>
> @@ -494,23 +533,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> -    do_unaligned_access:
> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                                 mmu_idx, retaddr);
> -        }
> -        /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache.  */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> -            /* Big-endian extract.  */
> -            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> -            /* Note the adjustment at the beginning of the function.
> -               Undo that for the recursion.  */
> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> -                                            oi, retaddr + GETPC_ADJ);
> -        }
> +            glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
> +                                                    retaddr);
>          return;
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code Alvise Rigo
@ 2016-02-11 13:15   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:15 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Attempting to simplify the helper_*_st_name, wrap the MMIO code into an
> inline function.
>
> Based on this work, Alex proposed the following patch series
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
> that reduces code duplication of the softmmu_helpers.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  softmmu_template.h | 66 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 44 insertions(+), 22 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 7029a03..3d388ec 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -396,6 +396,26 @@ static inline void glue(helper_le_st_name, _do_unl_access)(CPUArchState *env,
>      }
>  }
>
> +static inline void glue(helper_le_st_name, _do_mmio_access)(CPUArchState *env,
> +                                                            DATA_TYPE val,
> +                                                            target_ulong addr,
> +                                                            TCGMemOpIdx oi,
> +                                                            unsigned mmu_idx,
> +                                                            int index,
> +                                                            uintptr_t retaddr)
> +{
> +    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +
> +    if ((addr & (DATA_SIZE - 1)) != 0) {
> +        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> +                                                oi, retaddr);
> +    }
> +    /* ??? Note that the io helpers always read data in the target
> +       byte ordering.  We should push the LE/BE request down into io.  */
> +    val = TGT_LE(val);
> +    glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +}
> +
>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> @@ -423,17 +443,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
> -        if ((addr & (DATA_SIZE - 1)) != 0) {
> -            glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> -                                                    oi, retaddr);
> -        }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
> -
> -        /* ??? Note that the io helpers always read data in the target
> -           byte ordering.  We should push the LE/BE request down into io.  */
> -        val = TGT_LE(val);
> -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +        glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                 mmu_idx, index, retaddr);
>          return;
>      }
>
> @@ -488,6 +499,26 @@ static inline void glue(helper_be_st_name, _do_unl_access)(CPUArchState *env,
>      }
>  }
>
> +static inline void glue(helper_be_st_name, _do_mmio_access)(CPUArchState *env,
> +                                                            DATA_TYPE val,
> +                                                            target_ulong addr,
> +                                                            TCGMemOpIdx oi,
> +                                                            unsigned mmu_idx,
> +                                                            int index,
> +                                                            uintptr_t retaddr)
> +{
> +    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +
> +    if ((addr & (DATA_SIZE - 1)) != 0) {
> +        glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> +                                                oi, retaddr);
> +    }
> +    /* ??? Note that the io helpers always read data in the target
> +       byte ordering.  We should push the LE/BE request down into io.  */
> +    val = TGT_BE(val);
> +    glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +}
> +

As before I still thing there is millage in having a common helper
between LE/BE which the compiler can sort out. Having said that there is
less argument for this function as it is a bit smalled and you would
need a bit more faffing about.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>  void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> @@ -515,17 +546,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        CPUIOTLBEntry *iotlbentry;
> -        if ((addr & (DATA_SIZE - 1)) != 0) {
> -            glue(helper_be_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> -                                                    oi, retaddr);
> -        }
> -        iotlbentry = &env->iotlb[mmu_idx][index];
> -
> -        /* ??? Note that the io helpers always read data in the target
> -           byte ordering.  We should push the LE/BE request down into io.  */
> -        val = TGT_BE(val);
> -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> +        glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                 mmu_idx, index, retaddr);
>          return;
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code Alvise Rigo
@ 2016-02-11 13:18   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:18 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Attempting to simplify the helper_*_st_name, wrap the code relative to a
> RAM access into an inline function.
>
> Based on this work, Alex proposed the following patch series
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01136.html
> that reduces code duplication of the softmmu_helpers.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  softmmu_template.h | 110 +++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 68 insertions(+), 42 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 3d388ec..6279437 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -416,13 +416,46 @@ static inline void glue(helper_le_st_name, _do_mmio_access)(CPUArchState *env,
>      glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
>  }
>
> +static inline void glue(helper_le_st_name, _do_ram_access)(CPUArchState *env,
> +                                                           DATA_TYPE val,
> +                                                           target_ulong addr,
> +                                                           TCGMemOpIdx oi,
> +                                                           unsigned mmu_idx,
> +                                                           int index,
> +                                                           uintptr_t retaddr)
> +{
> +    uintptr_t haddr;
> +
> +    /* Handle slow unaligned access (it spans two pages or IO).  */
> +    if (DATA_SIZE > 1
> +        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> +                     >= TARGET_PAGE_SIZE)) {
> +        glue(helper_le_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
> +                                                retaddr);
> +        return;
> +    }
> +
> +    /* Handle aligned access or unaligned access in the same page.  */
> +    if ((addr & (DATA_SIZE - 1)) != 0
> +        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +
> +    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> +#if DATA_SIZE == 1
> +    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
> +#else
> +    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
> +#endif
> +}
> +
>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      unsigned mmu_idx = get_mmuidx(oi);
>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> -    uintptr_t haddr;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -448,28 +481,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>          return;
>      }
>
> -    /* Handle slow unaligned access (it spans two pages or IO).  */
> -    if (DATA_SIZE > 1
> -        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> -                     >= TARGET_PAGE_SIZE)) {
> -        glue(helper_le_st_name, _do_unl_access)(env, val, addr, mmu_idx,
> -                                                oi, retaddr);
> -        return;
> -    }
> -
> -    /* Handle aligned access or unaligned access in the same page.  */
> -    if ((addr & (DATA_SIZE - 1)) != 0
> -        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                             mmu_idx, retaddr);
> -    }
> -
> -    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> -#if DATA_SIZE == 1
> -    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
> -#else
> -    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
> -#endif
> +    glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
> +                                            retaddr);
>  }
>
>  #if DATA_SIZE > 1
> @@ -519,13 +532,42 @@ static inline void glue(helper_be_st_name, _do_mmio_access)(CPUArchState *env,
>      glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
>  }
>
> +static inline void glue(helper_be_st_name, _do_ram_access)(CPUArchState *env,
> +                                                           DATA_TYPE val,
> +                                                           target_ulong addr,
> +                                                           TCGMemOpIdx oi,
> +                                                           unsigned mmu_idx,
> +                                                           int index,
> +                                                           uintptr_t retaddr)
> +{
> +    uintptr_t haddr;
> +
> +    /* Handle slow unaligned access (it spans two pages or IO).  */
> +    if (DATA_SIZE > 1
> +        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> +                     >= TARGET_PAGE_SIZE)) {
> +        glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
> +                                                retaddr);
> +        return;
> +    }
> +
> +    /* Handle aligned access or unaligned access in the same page.  */
> +    if ((addr & (DATA_SIZE - 1)) != 0
> +        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +
> +    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> +    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
> +}
> +
>  void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      unsigned mmu_idx = get_mmuidx(oi);
>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> -    uintptr_t haddr;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -551,24 +593,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>          return;
>      }
>
> -    /* Handle slow unaligned access (it spans two pages or IO).  */
> -    if (DATA_SIZE > 1
> -        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> -                     >= TARGET_PAGE_SIZE)) {
> -            glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
> -                                                    retaddr);
> -        return;
> -    }
> -
> -    /* Handle aligned access or unaligned access in the same page.  */
> -    if ((addr & (DATA_SIZE - 1)) != 0
> -        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                             mmu_idx, retaddr);
> -    }
> -
> -    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> -    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
> +    glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
> +                                            retaddr);
>  }
>  #endif /* DATA_SIZE > 1 */

Same comments as before, there is more duplication that could be
removed. However:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 05/16] softmmu: Add new TLB_EXCL flag
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 05/16] softmmu: Add new TLB_EXCL flag Alvise Rigo
@ 2016-02-11 13:18   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:18 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Add a new TLB flag to force all the accesses made to a page to follow
> the slow-path.
>
> The TLB entries referring guest pages with the DIRTY_MEMORY_EXCLUSIVE
> bit clean will have this flag set.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  include/exec/cpu-all.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 83b1781..f8d8feb 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -277,6 +277,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
>  #define TLB_NOTDIRTY    (1 << 4)
>  /* Set if TLB entry is an IO callback.  */
>  #define TLB_MMIO        (1 << 5)
> +/* Set if TLB entry references a page that requires exclusive access.  */
> +#define TLB_EXCL        (1 << 6)
> +
> +/* Do not allow a TARGET_PAGE_MASK which covers one or more bits defined
> + * above. */
> +#if TLB_EXCL >= TARGET_PAGE_SIZE
> +#error TARGET_PAGE_MASK covering the low bits of the TLB virtual address
> +#endif
>
>  void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
>  void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list
  2016-02-11 13:00   ` Alex Bennée
@ 2016-02-11 13:21     ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-02-11 13:21 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

You are right, the for loop with i < DIRTY_MEMORY_NUM works just fine.

Thank you,
alvise

On Thu, Feb 11, 2016 at 2:00 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> The purpose of this new bitmap is to flag the memory pages that are in
>> the middle of LL/SC operations (after a LL, before a SC). For all these
>> pages, the corresponding TLB entries will be generated in such a way to
>> force the slow-path for all the VCPUs (see the following patches).
>>
>> When the system starts, the whole memory is set to dirty.
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  exec.c                  |  7 +++++--
>>  include/exec/memory.h   |  3 ++-
>>  include/exec/ram_addr.h | 31 +++++++++++++++++++++++++++++++
>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 7115403..51f366d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1575,11 +1575,14 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>>          int i;
>>
>>          /* ram_list.dirty_memory[] is protected by the iothread lock.  */
>> -        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> +        for (i = 0; i < DIRTY_MEMORY_EXCLUSIVE; i++) {
>>              ram_list.dirty_memory[i] =
>>                  bitmap_zero_extend(ram_list.dirty_memory[i],
>>                                     old_ram_size, new_ram_size);
>> -       }
>> +        }
>> +        ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] =
>> +            bitmap_zero_extend(ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
>> +                               old_ram_size, new_ram_size);
>
> In the previous patch you moved this out of the loop as
> ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE] was a different size to
> the other dirty bitmaps. This no longer seems to be the case so this
> seems pointless.
>
>>      }
>>      cpu_physical_memory_set_dirty_range(new_block->offset,
>>                                          new_block->used_length,
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index c92734a..71e0480 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -19,7 +19,8 @@
>>  #define DIRTY_MEMORY_VGA       0
>>  #define DIRTY_MEMORY_CODE      1
>>  #define DIRTY_MEMORY_MIGRATION 2
>> -#define DIRTY_MEMORY_NUM       3        /* num of dirty bits */
>> +#define DIRTY_MEMORY_EXCLUSIVE 3
>> +#define DIRTY_MEMORY_NUM       4        /* num of dirty bits */
>>
>>  #include <stdint.h>
>>  #include <stdbool.h>
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index ef1489d..19789fc 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -21,6 +21,7 @@
>>
>>  #ifndef CONFIG_USER_ONLY
>>  #include "hw/xen/xen.h"
>> +#include "sysemu/sysemu.h"
>>
>>  struct RAMBlock {
>>      struct rcu_head rcu;
>> @@ -172,6 +173,9 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>>      if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
>>          bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
>>      }
>> +    if (unlikely(mask & (1 << DIRTY_MEMORY_EXCLUSIVE))) {
>> +        bitmap_set_atomic(d[DIRTY_MEMORY_EXCLUSIVE], page, end - page);
>> +    }
>>      xen_modified_memory(start, length);
>>  }
>>
>> @@ -287,5 +291,32 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
>>  }
>>
>>  void migration_bitmap_extend(ram_addr_t old, ram_addr_t new);
>> +
>> +/* Exclusive bitmap support. */
>> +#define EXCL_BITMAP_GET_OFFSET(addr) (addr >> TARGET_PAGE_BITS)
>> +
>> +/* Make the page of @addr not exclusive. */
>> +static inline void cpu_physical_memory_unset_excl(ram_addr_t addr)
>> +{
>> +    set_bit_atomic(EXCL_BITMAP_GET_OFFSET(addr),
>> +                   ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
>> +}
>> +
>> +/* Return true if the page of @addr is exclusive, i.e. the EXCL bit is set. */
>> +static inline int cpu_physical_memory_is_excl(ram_addr_t addr)
>> +{
>> +    return !test_bit(EXCL_BITMAP_GET_OFFSET(addr),
>> +                     ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE]);
>> +}
>> +
>> +/* Set the page of @addr as exclusive clearing its EXCL bit and return the
>> + * previous bit's state. */
>> +static inline int cpu_physical_memory_set_excl(ram_addr_t addr)
>> +{
>> +    return bitmap_test_and_clear_atomic(
>> +                                ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE],
>> +                                EXCL_BITMAP_GET_OFFSET(addr), 1);
>> +}
>> +
>>  #endif
>>  #endif
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range Alvise Rigo
@ 2016-02-11 13:22   ` Alex Bennée
  2016-02-18 13:53     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 13:22 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> The excl_protected_range is a hwaddr range set by the VCPU at the
> execution of a LoadLink instruction. If a normal access writes to this
> range, the corresponding StoreCond will fail.
>
> Each architecture can set the exclusive range when issuing the LoadLink
> operation through a CPUClass hook. This comes in handy to emulate, for
> instance, the exclusive monitor implemented in some ARM architectures
> (more precisely, the Exclusive Reservation Granule).
>
> In addition, add another CPUClass hook called to decide whether a
> StoreCond has to fail or not.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  include/qom/cpu.h | 15 +++++++++++++++
>  qom/cpu.c         | 20 ++++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 2e5229d..682c81d 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -29,6 +29,7 @@
>  #include "qemu/queue.h"
>  #include "qemu/thread.h"
>  #include "qemu/typedefs.h"
> +#include "qemu/range.h"
>
>  typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
>                                       void *opaque);
> @@ -183,6 +184,12 @@ typedef struct CPUClass {
>      void (*cpu_exec_exit)(CPUState *cpu);
>      bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
>
> +    /* Atomic instruction handling */
> +    void (*cpu_set_excl_protected_range)(CPUState *cpu, hwaddr addr,
> +                                         hwaddr size);
> +    int (*cpu_valid_excl_access)(CPUState *cpu, hwaddr addr,
> +                                 hwaddr size);
> +
>      void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>  } CPUClass;
>
> @@ -219,6 +226,9 @@ struct kvm_run;
>  #define TB_JMP_CACHE_BITS 12
>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>
> +/* Atomic insn translation TLB support. */
> +#define EXCLUSIVE_RESET_ADDR ULLONG_MAX
> +
>  /**
>   * CPUState:
>   * @cpu_index: CPU index (informative).
> @@ -341,6 +351,11 @@ struct CPUState {
>       */
>      bool throttle_thread_scheduled;
>
> +    /* vCPU's exclusive addresses range.
> +     * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
> +     * in the middle of a LL/SC. */
> +    struct Range excl_protected_range;
> +

In which case we should probably initialise that on CPU creation as we
don't start in the middle of a LL/SC.

>      /* Note that this is accessed at the start of every TB via a negative
>         offset from AREG0.  Leave this field at the end so as to make the
>         (absolute value) offset as small as possible.  This reduces code
> diff --git a/qom/cpu.c b/qom/cpu.c
> index 8f537a4..a5d360c 100644
> --- a/qom/cpu.c
> +++ b/qom/cpu.c
> @@ -203,6 +203,24 @@ static bool cpu_common_exec_interrupt(CPUState *cpu, int int_req)
>      return false;
>  }
>
> +static void cpu_common_set_excl_range(CPUState *cpu, hwaddr addr, hwaddr size)
> +{
> +    cpu->excl_protected_range.begin = addr;
> +    cpu->excl_protected_range.end = addr + size;
> +}
> +
> +static int cpu_common_valid_excl_access(CPUState *cpu, hwaddr addr, hwaddr size)
> +{
> +    /* Check if the excl range completely covers the access */
> +    if (cpu->excl_protected_range.begin <= addr &&
> +        cpu->excl_protected_range.end >= addr + size) {
> +
> +        return 1;
> +    }
> +
> +    return 0;
> +}

This can be a bool function.

> +
>  void cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
>                      int flags)
>  {
> @@ -355,6 +373,8 @@ static void cpu_class_init(ObjectClass *klass, void *data)
>      k->cpu_exec_enter = cpu_common_noop;
>      k->cpu_exec_exit = cpu_common_noop;
>      k->cpu_exec_interrupt = cpu_common_exec_interrupt;
> +    k->cpu_set_excl_protected_range = cpu_common_set_excl_range;
> +    k->cpu_valid_excl_access = cpu_common_valid_excl_access;
>      dc->realize = cpu_common_realizefn;
>      /*
>       * Reason: CPUs still need special care by board code: wiring up


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath Alvise Rigo
@ 2016-02-11 16:33   ` Alex Bennée
  2016-02-18 13:58     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-11 16:33 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> The new helpers rely on the legacy ones to perform the actual read/write.
>
> The LoadLink helper (helper_ldlink_name) prepares the way for the
> following StoreCond operation. It sets the linked address and the size
> of the access. The LoadLink helper also updates the TLB entry of the
> page involved in the LL/SC to all vCPUs by forcing a TLB flush, so that
> the following accesses made by all the vCPUs will follow the slow path.
>
> The StoreConditional helper (helper_stcond_name) returns 1 if the
> store has to fail due to a concurrent access to the same page by
> another vCPU. A 'concurrent access' can be a store made by *any* vCPU
> (although, some implementations allow stores made by the CPU that issued
> the LoadLink).
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  cputlb.c                |   3 ++
>  include/qom/cpu.h       |   5 ++
>  softmmu_llsc_template.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++
>  softmmu_template.h      |  12 +++++
>  tcg/tcg.h               |  31 +++++++++++
>  5 files changed, 184 insertions(+)
>  create mode 100644 softmmu_llsc_template.h
>
> diff --git a/cputlb.c b/cputlb.c
> index f6fb161..ce6d720 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -476,6 +476,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>
>  #define MMUSUFFIX _mmu
>
> +/* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
> +#define GEN_EXCLUSIVE_HELPERS
>  #define SHIFT 0
>  #include "softmmu_template.h"
>
> @@ -488,6 +490,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>  #define SHIFT 3
>  #include "softmmu_template.h"
>  #undef MMUSUFFIX
> +#undef GEN_EXCLUSIVE_HELPERS
>
>  #define MMUSUFFIX _cmmu
>  #undef GETPC_ADJ
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 682c81d..6f6c1c0 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -351,10 +351,15 @@ struct CPUState {
>       */
>      bool throttle_thread_scheduled;
>
> +    /* Used by the atomic insn translation backend. */
> +    bool ll_sc_context;
>      /* vCPU's exclusive addresses range.
>       * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
>       * in the middle of a LL/SC. */
>      struct Range excl_protected_range;
> +    /* Used to carry the SC result but also to flag a normal store access made
> +     * by a stcond (see softmmu_template.h). */
> +    bool excl_succeeded;
>
>      /* Note that this is accessed at the start of every TB via a negative
>         offset from AREG0.  Leave this field at the end so as to make the
> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
> new file mode 100644
> index 0000000..101f5e8
> --- /dev/null
> +++ b/softmmu_llsc_template.h
> @@ -0,0 +1,133 @@
> +/*
> + *  Software MMU support (esclusive load/store operations)
> + *
> + * Generate helpers used by TCG for qemu_ldlink/stcond ops.
> + *
> + * Included from softmmu_template.h only.
> + *
> + * Copyright (c) 2015 Virtual Open Systems
> + *
> + * Authors:
> + *  Alvise Rigo <a.rigo@virtualopensystems.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/* This template does not generate together the le and be version, but only one
> + * of the two depending on whether BIGENDIAN_EXCLUSIVE_HELPERS has been set.
> + * The same nomenclature as softmmu_template.h is used for the exclusive
> + * helpers.  */
> +
> +#ifdef BIGENDIAN_EXCLUSIVE_HELPERS
> +
> +#define helper_ldlink_name  glue(glue(helper_be_ldlink, USUFFIX), MMUSUFFIX)
> +#define helper_stcond_name  glue(glue(helper_be_stcond, SUFFIX), MMUSUFFIX)
> +#define helper_ld glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
> +#define helper_st glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
> +
> +#else /* LE helpers + 8bit helpers (generated only once for both LE end BE) */
> +
> +#if DATA_SIZE > 1
> +#define helper_ldlink_name  glue(glue(helper_le_ldlink, USUFFIX), MMUSUFFIX)
> +#define helper_stcond_name  glue(glue(helper_le_stcond, SUFFIX), MMUSUFFIX)
> +#define helper_ld glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
> +#define helper_st glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
> +#else /* DATA_SIZE <= 1 */
> +#define helper_ldlink_name  glue(glue(helper_ret_ldlink, USUFFIX), MMUSUFFIX)
> +#define helper_stcond_name  glue(glue(helper_ret_stcond, SUFFIX), MMUSUFFIX)
> +#define helper_ld glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
> +#define helper_st glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
> +#endif
> +
> +#endif
> +
> +WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
> +                                TCGMemOpIdx oi, uintptr_t retaddr)
> +{
> +    WORD_TYPE ret;
> +    int index;
> +    CPUState *cpu, *this = ENV_GET_CPU(env);

I'd rename this to this_cpu and move the *cpu definition to inside the
if {} where it is used so no confusion occurs.

> +    CPUClass *cc = CPU_GET_CLASS(this);
> +    hwaddr hw_addr;
> +    unsigned mmu_idx = get_mmuidx(oi);
> +
> +    /* Use the proper load helper from cpu_ldst.h */
> +    ret = helper_ld(env, addr, oi, retaddr);
> +
> +    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +
> +    /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
> +     * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
> +    hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
> +    if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
> +        /* If all the vCPUs have the EXCL bit set for this page there is no need
> +         * to request any flush. */
> +        if (!cpu_physical_memory_is_excl(hw_addr)) {
> +            cpu_physical_memory_set_excl(hw_addr);
> +            CPU_FOREACH(cpu) {
> +                if (current_cpu != cpu) {

Why use current_cpu is we have this_cpu? I'd argue the check should use
what we've been passed over global (and future TLS value).

> +                    tlb_flush(cpu, 1);
> +                }
> +            }
> +        }
> +    } else {
> +        hw_error("EXCL accesses to MMIO regions not supported yet.");
> +    }
> +
> +    cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
> +
> +    /* For this vCPU, just update the TLB entry, no need to flush. */
> +    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
> +
> +    /* From now on we are in LL/SC context */
> +    this->ll_sc_context = true;
> +
> +    return ret;
> +}
> +
> +WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
> +                             DATA_TYPE val, TCGMemOpIdx oi,
> +                             uintptr_t retaddr)
> +{
> +    WORD_TYPE ret;
> +    CPUState *cpu = ENV_GET_CPU(env);
> +
> +    if (!cpu->ll_sc_context) {
> +        ret = 1;
> +    } else {
> +        /* We set it preventively to true to distinguish the following legacy
> +         * access as one made by the store conditional wrapper. If the store
> +         * conditional does not succeed, the value will be set to 0.*/
> +        cpu->excl_succeeded = true;
> +        helper_st(env, addr, val, oi, retaddr);
> +
> +        if (cpu->excl_succeeded) {
> +            ret = 0;
> +        } else {
> +            ret = 1;
> +        }
> +    }
> +
> +    /* Unset LL/SC context */
> +    cpu->ll_sc_context = false;
> +    cpu->excl_succeeded = false;
> +    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +
> +    return ret;
> +}
> +
> +#undef helper_ldlink_name
> +#undef helper_stcond_name
> +#undef helper_ld
> +#undef helper_st
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 6279437..4332db2 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -622,6 +622,18 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>  #endif
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>
> +#ifdef GEN_EXCLUSIVE_HELPERS
> +
> +#if DATA_SIZE > 1 /* The 8-bit helpers are generate along with LE helpers */
> +#define BIGENDIAN_EXCLUSIVE_HELPERS
> +#include "softmmu_llsc_template.h"
> +#undef BIGENDIAN_EXCLUSIVE_HELPERS
> +#endif
> +
> +#include "softmmu_llsc_template.h"
> +
> +#endif /* !defined(GEN_EXCLUSIVE_HELPERS) */
> +
>  #undef READ_ACCESS_TYPE
>  #undef SHIFT
>  #undef DATA_TYPE
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index a696922..3e050a4 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -968,6 +968,21 @@ tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                      TCGMemOpIdx oi, uintptr_t retaddr);
>  uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr);
> +/* Exclusive variants */
> +tcg_target_ulong helper_ret_ldlinkub_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_le_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_le_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_be_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_be_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_be_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>
>  /* Value sign-extended to tcg register size.  */
>  tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
> @@ -1010,6 +1025,22 @@ uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr);
>  uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr);
> +/* Exclusive variants */
> +tcg_target_ulong helper_ret_stcondb_mmu(CPUArchState *env, target_ulong addr,
> +                            uint8_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_le_stcondw_mmu(CPUArchState *env, target_ulong addr,
> +                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env, target_ulong addr,
> +                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_le_stcondq_mmu(CPUArchState *env, target_ulong addr,
> +                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_be_stcondw_mmu(CPUArchState *env, target_ulong addr,
> +                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +tcg_target_ulong helper_be_stcondl_mmu(CPUArchState *env, target_ulong addr,
> +                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +uint64_t helper_be_stcondq_mmu(CPUArchState *env, target_ulong addr,
> +                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
> +
>
>  /* Temporary aliases until backends are converted.  */
>  #ifdef TARGET_WORDS_BIGENDIAN


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses Alvise Rigo
@ 2016-02-16 17:07   ` Alex Bennée
  2016-02-18 14:17     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-16 17:07 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Add a circular buffer to store the hw addresses used in the last
> EXCLUSIVE_HISTORY_LEN exclusive accesses.
>
> When an address is pop'ed from the buffer, its page will be set as not
> exclusive. In this way, we avoid:
> - frequent set/unset of a page (causing frequent flushes as well)
> - the possibility to forget the EXCL bit set.

Why was this a possibility before? Shouldn't that be tackled in the
patch that introduced it?

>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  cputlb.c                | 29 +++++++++++++++++++----------
>  exec.c                  | 19 +++++++++++++++++++
>  include/qom/cpu.h       |  8 ++++++++
>  softmmu_llsc_template.h |  1 +
>  vl.c                    |  3 +++
>  5 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index 06ce2da..f3c4d97 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -395,16 +395,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>      env->tlb_v_table[mmu_idx][vidx] = *te;
>      env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
>
> -    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
> -        /* We are removing an exclusive entry, set the page to dirty. This
> -         * is not be necessary if the vCPU has performed both SC and LL. */
> -        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
> -                                          (te->addr_write & TARGET_PAGE_MASK);
> -        if (!cpu->ll_sc_context) {
> -            cpu_physical_memory_unset_excl(hw_addr);
> -        }
> -    }
> -

Errm is this right? I got confused reviewing 8/16 because my final tree
didn't have this code. I'm not sure the adding of history obviates the
need to clear the exclusive flag?

>      /* refill the tlb */
>      env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
>      env->iotlb[mmu_idx][index].attrs = attrs;
> @@ -517,6 +507,25 @@ static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>      return ret;
>  }
>
> +extern CPUExclusiveHistory excl_history;
> +static inline void excl_history_put_addr(hwaddr addr)
> +{
> +    hwaddr last;
> +
> +    /* Calculate the index of the next exclusive address */
> +    excl_history.last_idx = (excl_history.last_idx + 1) % excl_history.length;
> +
> +    last = excl_history.c_array[excl_history.last_idx];
> +
> +    /* Unset EXCL bit of the oldest entry */
> +    if (last != EXCLUSIVE_RESET_ADDR) {
> +        cpu_physical_memory_unset_excl(last);
> +    }
> +
> +    /* Add a new address, overwriting the oldest one */
> +    excl_history.c_array[excl_history.last_idx] = addr & TARGET_PAGE_MASK;
> +}
> +
>  #define MMUSUFFIX _mmu
>
>  /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
> diff --git a/exec.c b/exec.c
> index 51f366d..2e123f1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -177,6 +177,25 @@ struct CPUAddressSpace {
>      MemoryListener tcg_as_listener;
>  };
>
> +/* Exclusive memory support */
> +CPUExclusiveHistory excl_history;
> +void cpu_exclusive_history_init(void)
> +{
> +    /* Initialize exclusive history for atomic instruction handling. */
> +    if (tcg_enabled()) {
> +        g_assert(EXCLUSIVE_HISTORY_CPU_LEN * max_cpus <= UINT16_MAX);
> +        excl_history.length = EXCLUSIVE_HISTORY_CPU_LEN * max_cpus;
> +        excl_history.c_array = g_malloc(excl_history.length * sizeof(hwaddr));
> +        memset(excl_history.c_array, -1, excl_history.length * sizeof(hwaddr));
> +    }
> +}
> +
> +void cpu_exclusive_history_free(void)
> +{
> +    if (tcg_enabled()) {
> +        g_free(excl_history.c_array);
> +    }
> +}
>  #endif
>
>  #if !defined(CONFIG_USER_ONLY)
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 6f6c1c0..0452fd0 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -227,7 +227,15 @@ struct kvm_run;
>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>
>  /* Atomic insn translation TLB support. */
> +typedef struct CPUExclusiveHistory {
> +    uint16_t last_idx;           /* index of last insertion */
> +    uint16_t length;             /* history's length, it depends on smp_cpus */
> +    hwaddr *c_array;             /* history's circular array */
> +} CPUExclusiveHistory;
>  #define EXCLUSIVE_RESET_ADDR ULLONG_MAX
> +#define EXCLUSIVE_HISTORY_CPU_LEN 256
> +void cpu_exclusive_history_init(void);
> +void cpu_exclusive_history_free(void);
>
>  /**
>   * CPUState:
> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
> index b4712ba..b4e7f9d 100644
> --- a/softmmu_llsc_template.h
> +++ b/softmmu_llsc_template.h
> @@ -75,6 +75,7 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>           * to request any flush. */
>          if (!cpu_physical_memory_is_excl(hw_addr)) {
>              cpu_physical_memory_set_excl(hw_addr);
> +            excl_history_put_addr(hw_addr);
>              CPU_FOREACH(cpu) {
>                  if (current_cpu != cpu) {
>                      tlb_flush(cpu, 1);
> diff --git a/vl.c b/vl.c
> index f043009..b22d99b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -547,6 +547,7 @@ static void res_free(void)
>  {
>      g_free(boot_splash_filedata);
>      boot_splash_filedata = NULL;
> +    cpu_exclusive_history_free();
>  }
>
>  static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
> @@ -4322,6 +4323,8 @@ int main(int argc, char **argv, char **envp)
>
>      configure_accelerator(current_machine);
>
> +    cpu_exclusive_history_init();
> +
>      if (qtest_chrdev) {
>          qtest_init(qtest_chrdev, qtest_log, &error_fatal);
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap Alvise Rigo
@ 2016-02-16 17:39   ` Alex Bennée
  2016-02-18 14:18     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-16 17:39 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> The pages set as exclusive (clean) in the DIRTY_MEMORY_EXCLUSIVE bitmap
> have to have their TLB entries flagged with TLB_EXCL. The accesses to
> pages with TLB_EXCL flag set have to be properly handled in that they
> can potentially invalidate an open LL/SC transaction.
>
> Modify the TLB entries generation to honor the new bitmap and extend
> the softmmu_template to handle the accesses made to guest pages marked
> as exclusive.
>
> In the case we remove a TLB entry marked as EXCL, we unset the
> corresponding exclusive bit in the bitmap.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  cputlb.c           | 44 ++++++++++++++++++++++++++++--
>  softmmu_template.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 113 insertions(+), 11 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index ce6d720..aa9cc17 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -395,6 +395,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>      env->tlb_v_table[mmu_idx][vidx] = *te;
>      env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
>
> +    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
> +        /* We are removing an exclusive entry, set the page to dirty. This
> +         * is not be necessary if the vCPU has performed both SC and LL. */
> +        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
> +                                          (te->addr_write & TARGET_PAGE_MASK);
> +        if (!cpu->ll_sc_context) {
> +            cpu_physical_memory_unset_excl(hw_addr);
> +        }
> +    }
> +

I'm confused by the later patches removing this code and its comments
about missing the setting of flags.

>      /* refill the tlb */
>      env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
>      env->iotlb[mmu_idx][index].attrs = attrs;
> @@ -418,9 +428,19 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          } else if (memory_region_is_ram(section->mr)
>                     && cpu_physical_memory_is_clean(section->mr->ram_addr
>                                                     + xlat)) {
> -            te->addr_write = address | TLB_NOTDIRTY;
> -        } else {
> -            te->addr_write = address;
> +            address |= TLB_NOTDIRTY;
> +        }
> +
> +        /* Since the MMIO accesses follow always the slow path, we do not need
> +         * to set any flag to trap the access */
> +        if (!(address & TLB_MMIO)) {
> +            if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
> +                /* There is at least one vCPU that has flagged the address as
> +                 * exclusive. */
> +                te->addr_write = address | TLB_EXCL;
> +            } else {
> +                te->addr_write = address;
> +            }

Again this is confusing when following patches blat over the code.
Perhaps this part of the patch should be:

        /* Since the MMIO accesses follow always the slow path, we do not need
         * to set any flag to trap the access */
        if (!(address & TLB_MMIO)) {
            if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
                /* There is at least one vCPU that has flagged the address as
                 * exclusive. */
                address |= TLB_EXCL;
            }
        }
        te->addr_write = address;

So the future patch is clearer about what it does?

>          }
>      } else {
>          te->addr_write = -1;
> @@ -474,6 +494,24 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>      return qemu_ram_addr_from_host_nofail(p);
>  }
>
> +/* For every vCPU compare the exclusive address and reset it in case of a
> + * match. Since only one vCPU is running at once, no lock has to be held to
> + * guard this operation. */
> +static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
> +{
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
> +            ranges_overlap(cpu->excl_protected_range.begin,
> +                           cpu->excl_protected_range.end -
> +                           cpu->excl_protected_range.begin,
> +                           addr, size)) {
> +            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +        }
> +    }
> +}
> +
>  #define MMUSUFFIX _mmu
>
>  /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 4332db2..267c52a 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -474,11 +474,43 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      }
>
> -    /* Handle an IO access.  */
> +    /* Handle an IO access or exclusive access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> -                                                 mmu_idx, index, retaddr);
> -        return;
> +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {

>From here:

> +            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +            CPUState *cpu = ENV_GET_CPU(env);
> +            CPUClass *cc = CPU_GET_CLASS(cpu);
> +            /* The slow-path has been forced since we are writing to
> +             * exclusive-protected memory. */
> +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> +
> +            /* The function lookup_and_reset_cpus_ll_addr could have reset the
> +             * exclusive address. Fail the SC in this case.
> +             * N.B.: here excl_succeed == true means that the caller is
> +             * helper_stcond_name in softmmu_llsc_template.
> +             * On the contrary, excl_succeeded == false occurs when a VCPU is
> +             * writing through normal store to a page with TLB_EXCL bit set. */
> +            if (cpu->excl_succeeded) {
> +                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
> +                    /* The vCPU is SC-ing to an unprotected address. */
> +                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +                    cpu->excl_succeeded = false;
> +
> +                    return;
> +                }
> +            }
> +

To here is repeated code later on. It would be better to have a common
chunk of logic.

> +            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
> +                                                    mmu_idx, index, retaddr);
> +
> +            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);

In fact if the endianess is passed to the inline function you could have
a call that was:

        if (tlb_addr & TLB_EXCL) {
           glue(helper_st_name, _do_excl)(true, env, val, addr, oi, mmu_idx,
                                              index, retaddr);
        }

and

        if (tlb_addr & TLB_EXCL) {
           glue(helper_st_name, _do_excl)(false, env, val, addr, oi, mmu_idx,
                                              index, retaddr);
        }

later. Then future patches would just extend the single helper.

> +
> +            return;
> +        } else {
> +            glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                     mmu_idx, index, retaddr);
> +            return;
> +        }
>      }
>
>      glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
> @@ -586,11 +618,43 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      }
>
> -    /* Handle an IO access.  */
> +    /* Handle an IO access or exclusive access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> -                                                 mmu_idx, index, retaddr);
> -        return;
> +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> +            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> +            CPUState *cpu = ENV_GET_CPU(env);
> +            CPUClass *cc = CPU_GET_CLASS(cpu);
> +            /* The slow-path has been forced since we are writing to
> +             * exclusive-protected memory. */
> +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> +
> +            /* The function lookup_and_reset_cpus_ll_addr could have reset the
> +             * exclusive address. Fail the SC in this case.
> +             * N.B.: here excl_succeed == true means that the caller is
> +             * helper_stcond_name in softmmu_llsc_template.
> +             * On the contrary, excl_succeeded == false occurs when a VCPU is
> +             * writing through normal store to a page with TLB_EXCL bit set. */
> +            if (cpu->excl_succeeded) {
> +                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
> +                    /* The vCPU is SC-ing to an unprotected address. */
> +                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +                    cpu->excl_succeeded = false;
> +
> +                    return;
> +                }
> +            }
> +
> +            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
> +                                                    mmu_idx, index, retaddr);
> +
> +            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
> +
> +            return;
> +        } else {
> +            glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                     mmu_idx, index, retaddr);
> +            return;
> +        }
>      }
>
>      glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses Alvise Rigo
@ 2016-02-16 17:49   ` Alex Bennée
  2016-02-18 14:18     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-16 17:49 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Enable exclusive accesses when the MMIO/invalid flag is set in the TLB
> entry.
>
> In case a LL access is done to MMIO memory, we treat it differently from
> a RAM access in that we do not rely on the EXCL bitmap to flag the page
> as exclusive. In fact, we don't even need the TLB_EXCL flag to force the
> slow path, since it is always forced anyway.
>
> This commit does not take care of invalidating an MMIO exclusive range from
> other non-exclusive accesses i.e. CPU1 LoadLink to MMIO address X and
> CPU2 writes to X. This will be addressed in the following commit.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  cputlb.c           |  7 +++----
>  softmmu_template.h | 26 ++++++++++++++++++++------
>  2 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index aa9cc17..87d09c8 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -424,7 +424,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          if ((memory_region_is_ram(section->mr) && section->readonly)
>              || memory_region_is_romd(section->mr)) {
>              /* Write access calls the I/O callback.  */
> -            te->addr_write = address | TLB_MMIO;
> +            address |= TLB_MMIO;
>          } else if (memory_region_is_ram(section->mr)
>                     && cpu_physical_memory_is_clean(section->mr->ram_addr
>                                                     + xlat)) {
> @@ -437,11 +437,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>              if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
>                  /* There is at least one vCPU that has flagged the address as
>                   * exclusive. */
> -                te->addr_write = address | TLB_EXCL;
> -            } else {
> -                te->addr_write = address;
> +                address |= TLB_EXCL;
>              }
>          }
> +        te->addr_write = address;

As mentioned before I think this bit belongs in the earlier patch.

>      } else {
>          te->addr_write = -1;
>      }
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 267c52a..c54bdc9 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -476,7 +476,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access or exclusive access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> +        if (tlb_addr & TLB_EXCL) {
>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>              CPUState *cpu = ENV_GET_CPU(env);
>              CPUClass *cc = CPU_GET_CLASS(cpu);
> @@ -500,8 +500,15 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                  }
>              }
>
> -            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
> -                                                    mmu_idx, index, retaddr);
> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO
> access */

What about the other flags? Shouldn't this be tlb_addr & TLB_MMIO?

> +                glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                         mmu_idx, index,
> +                                                         retaddr);
> +            } else {
> +                glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
> +                                                        mmu_idx, index,
> +                                                        retaddr);
> +            }
>
>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>
> @@ -620,7 +627,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>
>      /* Handle an IO access or exclusive access.  */
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> +        if (tlb_addr & TLB_EXCL) {
>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>              CPUState *cpu = ENV_GET_CPU(env);
>              CPUClass *cc = CPU_GET_CLASS(cpu);
> @@ -644,8 +651,15 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                  }
>              }
>
> -            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
> -                                                    mmu_idx, index, retaddr);
> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO access */
> +                glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> +                                                         mmu_idx, index,
> +                                                         retaddr);
> +            } else {
> +                glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
> +                                                        mmu_idx, index,
> +                                                        retaddr);
> +            }
>
>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range Alvise Rigo
@ 2016-02-17 18:55   ` Alex Bennée
  2016-02-18 14:15     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-17 18:55 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> As for the RAM case, also the MMIO exclusive ranges have to be protected
> by other CPU's accesses. In order to do that, we flag the accessed
> MemoryRegion to mark that an exclusive access has been performed and is
> not concluded yet.
>
> This flag will force the other CPUs to invalidate the exclusive range in
> case of collision.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  cputlb.c                | 20 +++++++++++++-------
>  include/exec/memory.h   |  1 +
>  softmmu_llsc_template.h | 11 +++++++----
>  softmmu_template.h      | 22 ++++++++++++++++++++++
>  4 files changed, 43 insertions(+), 11 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index 87d09c8..06ce2da 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -496,19 +496,25 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>  /* For every vCPU compare the exclusive address and reset it in case of a
>   * match. Since only one vCPU is running at once, no lock has to be held to
>   * guard this operation. */
> -static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
> +static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>  {
>      CPUState *cpu;
> +    bool ret = false;
>
>      CPU_FOREACH(cpu) {
> -        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
> -            ranges_overlap(cpu->excl_protected_range.begin,
> -                           cpu->excl_protected_range.end -
> -                           cpu->excl_protected_range.begin,
> -                           addr, size)) {
> -            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +        if (current_cpu != cpu) {

I'm confused by this change. I don't see anywhere in the MMIO handling
why we would want to change skipping the CPU. Perhaps this belongs in
the previous patch? Maybe the function should really be
lookup_and_maybe_reset_other_cpu_ll_addr?

> +            if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
> +                ranges_overlap(cpu->excl_protected_range.begin,
> +                               cpu->excl_protected_range.end -
> +                               cpu->excl_protected_range.begin,
> +                               addr, size)) {
> +                cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +                ret = true;
> +            }
>          }
>      }
> +
> +    return ret;
>  }
>
>  #define MMUSUFFIX _mmu
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 71e0480..bacb3ad 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -171,6 +171,7 @@ struct MemoryRegion {
>      bool rom_device;
>      bool flush_coalesced_mmio;
>      bool global_locking;
> +    bool pending_excl_access; /* A vCPU issued an exclusive access */
>      uint8_t dirty_log_mask;
>      ram_addr_t ram_addr;
>      Object *owner;
> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
> index 101f5e8..b4712ba 100644
> --- a/softmmu_llsc_template.h
> +++ b/softmmu_llsc_template.h
> @@ -81,15 +81,18 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>                  }
>              }
>          }
> +        /* For this vCPU, just update the TLB entry, no need to flush. */
> +        env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>      } else {
> -        hw_error("EXCL accesses to MMIO regions not supported yet.");
> +        /* Set a pending exclusive access in the MemoryRegion */
> +        MemoryRegion *mr = iotlb_to_region(this,
> +                                           env->iotlb[mmu_idx][index].addr,
> +                                           env->iotlb[mmu_idx][index].attrs);
> +        mr->pending_excl_access = true;
>      }
>
>      cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
>
> -    /* For this vCPU, just update the TLB entry, no need to flush. */
> -    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
> -
>      /* From now on we are in LL/SC context */
>      this->ll_sc_context = true;
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index c54bdc9..71c5152 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -360,6 +360,14 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>      MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>
>      physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
> +
> +    /* Invalidate the exclusive range that overlaps this access */
> +    if (mr->pending_excl_access) {
> +        if (lookup_and_reset_cpus_ll_addr(physaddr, 1 << SHIFT)) {
> +            mr->pending_excl_access = false;
> +        }
> +    }
> +
>      if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
> @@ -504,6 +512,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                  glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>                                                           mmu_idx, index,
>                                                           retaddr);
> +                /* N.B.: Here excl_succeeded == true means that this access
> +                 * comes from an exclusive instruction. */
> +                if (cpu->excl_succeeded) {
> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
> +                                                       iotlbentry->attrs);
> +                    mr->pending_excl_access = false;
> +                }
>              } else {
>                  glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>                                                          mmu_idx, index,
> @@ -655,6 +670,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>                  glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>                                                           mmu_idx, index,
>                                                           retaddr);
> +                /* N.B.: Here excl_succeeded == true means that this access
> +                 * comes from an exclusive instruction. */
> +                if (cpu->excl_succeeded) {
> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
> +                                                       iotlbentry->attrs);
> +                    mr->pending_excl_access = false;
> +                }

My comments about duplication on previous patches still stand.

>              } else {
>                  glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>                                                          mmu_idx, index,


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range
  2016-02-11 13:22   ` Alex Bennée
@ 2016-02-18 13:53     ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-02-18 13:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Thu, Feb 11, 2016 at 2:22 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> The excl_protected_range is a hwaddr range set by the VCPU at the
>> execution of a LoadLink instruction. If a normal access writes to this
>> range, the corresponding StoreCond will fail.
>>
>> Each architecture can set the exclusive range when issuing the LoadLink
>> operation through a CPUClass hook. This comes in handy to emulate, for
>> instance, the exclusive monitor implemented in some ARM architectures
>> (more precisely, the Exclusive Reservation Granule).
>>
>> In addition, add another CPUClass hook called to decide whether a
>> StoreCond has to fail or not.
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  include/qom/cpu.h | 15 +++++++++++++++
>>  qom/cpu.c         | 20 ++++++++++++++++++++
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 2e5229d..682c81d 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -29,6 +29,7 @@
>>  #include "qemu/queue.h"
>>  #include "qemu/thread.h"
>>  #include "qemu/typedefs.h"
>> +#include "qemu/range.h"
>>
>>  typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
>>                                       void *opaque);
>> @@ -183,6 +184,12 @@ typedef struct CPUClass {
>>      void (*cpu_exec_exit)(CPUState *cpu);
>>      bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
>>
>> +    /* Atomic instruction handling */
>> +    void (*cpu_set_excl_protected_range)(CPUState *cpu, hwaddr addr,
>> +                                         hwaddr size);
>> +    int (*cpu_valid_excl_access)(CPUState *cpu, hwaddr addr,
>> +                                 hwaddr size);
>> +
>>      void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
>>  } CPUClass;
>>
>> @@ -219,6 +226,9 @@ struct kvm_run;
>>  #define TB_JMP_CACHE_BITS 12
>>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>>
>> +/* Atomic insn translation TLB support. */
>> +#define EXCLUSIVE_RESET_ADDR ULLONG_MAX
>> +
>>  /**
>>   * CPUState:
>>   * @cpu_index: CPU index (informative).
>> @@ -341,6 +351,11 @@ struct CPUState {
>>       */
>>      bool throttle_thread_scheduled;
>>
>> +    /* vCPU's exclusive addresses range.
>> +     * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
>> +     * in the middle of a LL/SC. */
>> +    struct Range excl_protected_range;
>> +
>
> In which case we should probably initialise that on CPU creation as we
> don't start in the middle of a LL/SC.

Agreed.

>
>>      /* Note that this is accessed at the start of every TB via a negative
>>         offset from AREG0.  Leave this field at the end so as to make the
>>         (absolute value) offset as small as possible.  This reduces code
>> diff --git a/qom/cpu.c b/qom/cpu.c
>> index 8f537a4..a5d360c 100644
>> --- a/qom/cpu.c
>> +++ b/qom/cpu.c
>> @@ -203,6 +203,24 @@ static bool cpu_common_exec_interrupt(CPUState *cpu, int int_req)
>>      return false;
>>  }
>>
>> +static void cpu_common_set_excl_range(CPUState *cpu, hwaddr addr, hwaddr size)
>> +{
>> +    cpu->excl_protected_range.begin = addr;
>> +    cpu->excl_protected_range.end = addr + size;
>> +}
>> +
>> +static int cpu_common_valid_excl_access(CPUState *cpu, hwaddr addr, hwaddr size)
>> +{
>> +    /* Check if the excl range completely covers the access */
>> +    if (cpu->excl_protected_range.begin <= addr &&
>> +        cpu->excl_protected_range.end >= addr + size) {
>> +
>> +        return 1;
>> +    }
>> +
>> +    return 0;
>> +}
>
> This can be a bool function.

OK.

Thank you,
alvise

>
>> +
>>  void cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
>>                      int flags)
>>  {
>> @@ -355,6 +373,8 @@ static void cpu_class_init(ObjectClass *klass, void *data)
>>      k->cpu_exec_enter = cpu_common_noop;
>>      k->cpu_exec_exit = cpu_common_noop;
>>      k->cpu_exec_interrupt = cpu_common_exec_interrupt;
>> +    k->cpu_set_excl_protected_range = cpu_common_set_excl_range;
>> +    k->cpu_valid_excl_access = cpu_common_valid_excl_access;
>>      dc->realize = cpu_common_realizefn;
>>      /*
>>       * Reason: CPUs still need special care by board code: wiring up
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath
  2016-02-11 16:33   ` Alex Bennée
@ 2016-02-18 13:58     ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-02-18 13:58 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Thu, Feb 11, 2016 at 5:33 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> The new helpers rely on the legacy ones to perform the actual read/write.
>>
>> The LoadLink helper (helper_ldlink_name) prepares the way for the
>> following StoreCond operation. It sets the linked address and the size
>> of the access. The LoadLink helper also updates the TLB entry of the
>> page involved in the LL/SC to all vCPUs by forcing a TLB flush, so that
>> the following accesses made by all the vCPUs will follow the slow path.
>>
>> The StoreConditional helper (helper_stcond_name) returns 1 if the
>> store has to fail due to a concurrent access to the same page by
>> another vCPU. A 'concurrent access' can be a store made by *any* vCPU
>> (although, some implementations allow stores made by the CPU that issued
>> the LoadLink).
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  cputlb.c                |   3 ++
>>  include/qom/cpu.h       |   5 ++
>>  softmmu_llsc_template.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  softmmu_template.h      |  12 +++++
>>  tcg/tcg.h               |  31 +++++++++++
>>  5 files changed, 184 insertions(+)
>>  create mode 100644 softmmu_llsc_template.h
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index f6fb161..ce6d720 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -476,6 +476,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>
>>  #define MMUSUFFIX _mmu
>>
>> +/* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
>> +#define GEN_EXCLUSIVE_HELPERS
>>  #define SHIFT 0
>>  #include "softmmu_template.h"
>>
>> @@ -488,6 +490,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>  #undef MMUSUFFIX
>> +#undef GEN_EXCLUSIVE_HELPERS
>>
>>  #define MMUSUFFIX _cmmu
>>  #undef GETPC_ADJ
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 682c81d..6f6c1c0 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -351,10 +351,15 @@ struct CPUState {
>>       */
>>      bool throttle_thread_scheduled;
>>
>> +    /* Used by the atomic insn translation backend. */
>> +    bool ll_sc_context;
>>      /* vCPU's exclusive addresses range.
>>       * The address is set to EXCLUSIVE_RESET_ADDR if the vCPU is not
>>       * in the middle of a LL/SC. */
>>      struct Range excl_protected_range;
>> +    /* Used to carry the SC result but also to flag a normal store access made
>> +     * by a stcond (see softmmu_template.h). */
>> +    bool excl_succeeded;
>>
>>      /* Note that this is accessed at the start of every TB via a negative
>>         offset from AREG0.  Leave this field at the end so as to make the
>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>> new file mode 100644
>> index 0000000..101f5e8
>> --- /dev/null
>> +++ b/softmmu_llsc_template.h
>> @@ -0,0 +1,133 @@
>> +/*
>> + *  Software MMU support (esclusive load/store operations)
>> + *
>> + * Generate helpers used by TCG for qemu_ldlink/stcond ops.
>> + *
>> + * Included from softmmu_template.h only.
>> + *
>> + * Copyright (c) 2015 Virtual Open Systems
>> + *
>> + * Authors:
>> + *  Alvise Rigo <a.rigo@virtualopensystems.com>
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +/* This template does not generate together the le and be version, but only one
>> + * of the two depending on whether BIGENDIAN_EXCLUSIVE_HELPERS has been set.
>> + * The same nomenclature as softmmu_template.h is used for the exclusive
>> + * helpers.  */
>> +
>> +#ifdef BIGENDIAN_EXCLUSIVE_HELPERS
>> +
>> +#define helper_ldlink_name  glue(glue(helper_be_ldlink, USUFFIX), MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_be_stcond, SUFFIX), MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
>> +
>> +#else /* LE helpers + 8bit helpers (generated only once for both LE end BE) */
>> +
>> +#if DATA_SIZE > 1
>> +#define helper_ldlink_name  glue(glue(helper_le_ldlink, USUFFIX), MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_le_stcond, SUFFIX), MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
>> +#else /* DATA_SIZE <= 1 */
>> +#define helper_ldlink_name  glue(glue(helper_ret_ldlink, USUFFIX), MMUSUFFIX)
>> +#define helper_stcond_name  glue(glue(helper_ret_stcond, SUFFIX), MMUSUFFIX)
>> +#define helper_ld glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
>> +#define helper_st glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
>> +#endif
>> +
>> +#endif
>> +
>> +WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>> +                                TCGMemOpIdx oi, uintptr_t retaddr)
>> +{
>> +    WORD_TYPE ret;
>> +    int index;
>> +    CPUState *cpu, *this = ENV_GET_CPU(env);
>
> I'd rename this to this_cpu and move the *cpu definition to inside the
> if {} where it is used so no confusion occurs.
>
>> +    CPUClass *cc = CPU_GET_CLASS(this);
>> +    hwaddr hw_addr;
>> +    unsigned mmu_idx = get_mmuidx(oi);
>> +
>> +    /* Use the proper load helper from cpu_ldst.h */
>> +    ret = helper_ld(env, addr, oi, retaddr);
>> +
>> +    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>> +
>> +    /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>> +     * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> +    hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>> +    if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>> +        /* If all the vCPUs have the EXCL bit set for this page there is no need
>> +         * to request any flush. */
>> +        if (!cpu_physical_memory_is_excl(hw_addr)) {
>> +            cpu_physical_memory_set_excl(hw_addr);
>> +            CPU_FOREACH(cpu) {
>> +                if (current_cpu != cpu) {
>
> Why use current_cpu is we have this_cpu? I'd argue the check should use
> what we've been passed over global (and future TLS value).

Indeed, it makes more sense to do as you suggested.

Thank you,
alvise

>
>> +                    tlb_flush(cpu, 1);
>> +                }
>> +            }
>> +        }
>> +    } else {
>> +        hw_error("EXCL accesses to MMIO regions not supported yet.");
>> +    }
>> +
>> +    cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
>> +
>> +    /* For this vCPU, just update the TLB entry, no need to flush. */
>> +    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> +
>> +    /* From now on we are in LL/SC context */
>> +    this->ll_sc_context = true;
>> +
>> +    return ret;
>> +}
>> +
>> +WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>> +                             DATA_TYPE val, TCGMemOpIdx oi,
>> +                             uintptr_t retaddr)
>> +{
>> +    WORD_TYPE ret;
>> +    CPUState *cpu = ENV_GET_CPU(env);
>> +
>> +    if (!cpu->ll_sc_context) {
>> +        ret = 1;
>> +    } else {
>> +        /* We set it preventively to true to distinguish the following legacy
>> +         * access as one made by the store conditional wrapper. If the store
>> +         * conditional does not succeed, the value will be set to 0.*/
>> +        cpu->excl_succeeded = true;
>> +        helper_st(env, addr, val, oi, retaddr);
>> +
>> +        if (cpu->excl_succeeded) {
>> +            ret = 0;
>> +        } else {
>> +            ret = 1;
>> +        }
>> +    }
>> +
>> +    /* Unset LL/SC context */
>> +    cpu->ll_sc_context = false;
>> +    cpu->excl_succeeded = false;
>> +    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>> +
>> +    return ret;
>> +}
>> +
>> +#undef helper_ldlink_name
>> +#undef helper_stcond_name
>> +#undef helper_ld
>> +#undef helper_st
>> diff --git a/softmmu_template.h b/softmmu_template.h
>> index 6279437..4332db2 100644
>> --- a/softmmu_template.h
>> +++ b/softmmu_template.h
>> @@ -622,6 +622,18 @@ void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
>>  #endif
>>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>>
>> +#ifdef GEN_EXCLUSIVE_HELPERS
>> +
>> +#if DATA_SIZE > 1 /* The 8-bit helpers are generate along with LE helpers */
>> +#define BIGENDIAN_EXCLUSIVE_HELPERS
>> +#include "softmmu_llsc_template.h"
>> +#undef BIGENDIAN_EXCLUSIVE_HELPERS
>> +#endif
>> +
>> +#include "softmmu_llsc_template.h"
>> +
>> +#endif /* !defined(GEN_EXCLUSIVE_HELPERS) */
>> +
>>  #undef READ_ACCESS_TYPE
>>  #undef SHIFT
>>  #undef DATA_TYPE
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index a696922..3e050a4 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -968,6 +968,21 @@ tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>>                                      TCGMemOpIdx oi, uintptr_t retaddr);
>>  uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
>>                             TCGMemOpIdx oi, uintptr_t retaddr);
>> +/* Exclusive variants */
>> +tcg_target_ulong helper_ret_ldlinkub_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_le_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_le_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +uint64_t helper_le_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_be_ldlinkuw_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_be_ldlinkul_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>> +uint64_t helper_be_ldlinkq_mmu(CPUArchState *env, target_ulong addr,
>> +                                            TCGMemOpIdx oi, uintptr_t retaddr);
>>
>>  /* Value sign-extended to tcg register size.  */
>>  tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
>> @@ -1010,6 +1025,22 @@ uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
>>                              TCGMemOpIdx oi, uintptr_t retaddr);
>>  uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>>                              TCGMemOpIdx oi, uintptr_t retaddr);
>> +/* Exclusive variants */
>> +tcg_target_ulong helper_ret_stcondb_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint8_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_le_stcondw_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_le_stcondl_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +uint64_t helper_le_stcondq_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_be_stcondw_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint16_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +tcg_target_ulong helper_be_stcondl_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint32_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +uint64_t helper_be_stcondq_mmu(CPUArchState *env, target_ulong addr,
>> +                            uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr);
>> +
>>
>>  /* Temporary aliases until backends are converted.  */
>>  #ifdef TARGET_WORDS_BIGENDIAN
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range
  2016-02-17 18:55   ` Alex Bennée
@ 2016-02-18 14:15     ` alvise rigo
  2016-02-18 16:25       ` Alex Bennée
  0 siblings, 1 reply; 50+ messages in thread
From: alvise rigo @ 2016-02-18 14:15 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Wed, Feb 17, 2016 at 7:55 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> As for the RAM case, also the MMIO exclusive ranges have to be protected
>> by other CPU's accesses. In order to do that, we flag the accessed
>> MemoryRegion to mark that an exclusive access has been performed and is
>> not concluded yet.
>>
>> This flag will force the other CPUs to invalidate the exclusive range in
>> case of collision.
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  cputlb.c                | 20 +++++++++++++-------
>>  include/exec/memory.h   |  1 +
>>  softmmu_llsc_template.h | 11 +++++++----
>>  softmmu_template.h      | 22 ++++++++++++++++++++++
>>  4 files changed, 43 insertions(+), 11 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index 87d09c8..06ce2da 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -496,19 +496,25 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>  /* For every vCPU compare the exclusive address and reset it in case of a
>>   * match. Since only one vCPU is running at once, no lock has to be held to
>>   * guard this operation. */
>> -static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>> +static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>  {
>>      CPUState *cpu;
>> +    bool ret = false;
>>
>>      CPU_FOREACH(cpu) {
>> -        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>> -            ranges_overlap(cpu->excl_protected_range.begin,
>> -                           cpu->excl_protected_range.end -
>> -                           cpu->excl_protected_range.begin,
>> -                           addr, size)) {
>> -            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>> +        if (current_cpu != cpu) {
>
> I'm confused by this change. I don't see anywhere in the MMIO handling
> why we would want to change skipping the CPU. Perhaps this belongs in
> the previous patch? Maybe the function should really be
> lookup_and_maybe_reset_other_cpu_ll_addr?

This is actually used later on in this patch.

>
>> +            if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>> +                ranges_overlap(cpu->excl_protected_range.begin,
>> +                               cpu->excl_protected_range.end -
>> +                               cpu->excl_protected_range.begin,
>> +                               addr, size)) {
>> +                cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>> +                ret = true;
>> +            }
>>          }
>>      }
>> +
>> +    return ret;
>>  }
>>
>>  #define MMUSUFFIX _mmu
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 71e0480..bacb3ad 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -171,6 +171,7 @@ struct MemoryRegion {
>>      bool rom_device;
>>      bool flush_coalesced_mmio;
>>      bool global_locking;
>> +    bool pending_excl_access; /* A vCPU issued an exclusive access */
>>      uint8_t dirty_log_mask;
>>      ram_addr_t ram_addr;
>>      Object *owner;
>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>> index 101f5e8..b4712ba 100644
>> --- a/softmmu_llsc_template.h
>> +++ b/softmmu_llsc_template.h
>> @@ -81,15 +81,18 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>                  }
>>              }
>>          }
>> +        /* For this vCPU, just update the TLB entry, no need to flush. */
>> +        env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>      } else {
>> -        hw_error("EXCL accesses to MMIO regions not supported yet.");
>> +        /* Set a pending exclusive access in the MemoryRegion */
>> +        MemoryRegion *mr = iotlb_to_region(this,
>> +                                           env->iotlb[mmu_idx][index].addr,
>> +                                           env->iotlb[mmu_idx][index].attrs);
>> +        mr->pending_excl_access = true;
>>      }
>>
>>      cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
>>
>> -    /* For this vCPU, just update the TLB entry, no need to flush. */
>> -    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> -
>>      /* From now on we are in LL/SC context */
>>      this->ll_sc_context = true;
>>
>> diff --git a/softmmu_template.h b/softmmu_template.h
>> index c54bdc9..71c5152 100644
>> --- a/softmmu_template.h
>> +++ b/softmmu_template.h
>> @@ -360,6 +360,14 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>      MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>>
>>      physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>> +
>> +    /* Invalidate the exclusive range that overlaps this access */
>> +    if (mr->pending_excl_access) {
>> +        if (lookup_and_reset_cpus_ll_addr(physaddr, 1 << SHIFT)) {

Here precisely. As you wrote, we can rename it to
lookup_and_maybe_reset_other_cpu_ll_addr even if this name does not
convince me. What about other_cpus_reset_colliding_ll_addr?

Thank you,
alvise

>> +            mr->pending_excl_access = false;
>> +        }
>> +    }
>> +
>>      if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>>          cpu_io_recompile(cpu, retaddr);
>>      }
>> @@ -504,6 +512,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>                  glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>>                                                           mmu_idx, index,
>>                                                           retaddr);
>> +                /* N.B.: Here excl_succeeded == true means that this access
>> +                 * comes from an exclusive instruction. */
>> +                if (cpu->excl_succeeded) {
>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>> +                                                       iotlbentry->attrs);
>> +                    mr->pending_excl_access = false;
>> +                }
>>              } else {
>>                  glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>>                                                          mmu_idx, index,
>> @@ -655,6 +670,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>                  glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>>                                                           mmu_idx, index,
>>                                                           retaddr);
>> +                /* N.B.: Here excl_succeeded == true means that this access
>> +                 * comes from an exclusive instruction. */
>> +                if (cpu->excl_succeeded) {
>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>> +                                                       iotlbentry->attrs);
>> +                    mr->pending_excl_access = false;
>> +                }
>
> My comments about duplication on previous patches still stand.

Indeed.

Thank you,
alvise

>
>>              } else {
>>                  glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>>                                                          mmu_idx, index,
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses
  2016-02-16 17:07   ` Alex Bennée
@ 2016-02-18 14:17     ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-02-18 14:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Tue, Feb 16, 2016 at 6:07 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> Add a circular buffer to store the hw addresses used in the last
>> EXCLUSIVE_HISTORY_LEN exclusive accesses.
>>
>> When an address is pop'ed from the buffer, its page will be set as not
>> exclusive. In this way, we avoid:
>> - frequent set/unset of a page (causing frequent flushes as well)
>> - the possibility to forget the EXCL bit set.
>
> Why was this a possibility before? Shouldn't that be tackled in the
> patch that introduced it?

Yes and no. The problem happens for instance when a LL will not be
followed by the SC. In this situation, the flag will be set for the
page, but might remain set for the rest of the execution (unless a
complete LL/SC in performed later on in the same guest page).

>
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  cputlb.c                | 29 +++++++++++++++++++----------
>>  exec.c                  | 19 +++++++++++++++++++
>>  include/qom/cpu.h       |  8 ++++++++
>>  softmmu_llsc_template.h |  1 +
>>  vl.c                    |  3 +++
>>  5 files changed, 50 insertions(+), 10 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index 06ce2da..f3c4d97 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -395,16 +395,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>      env->tlb_v_table[mmu_idx][vidx] = *te;
>>      env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
>>
>> -    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
>> -        /* We are removing an exclusive entry, set the page to dirty. This
>> -         * is not be necessary if the vCPU has performed both SC and LL. */
>> -        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
>> -                                          (te->addr_write & TARGET_PAGE_MASK);
>> -        if (!cpu->ll_sc_context) {
>> -            cpu_physical_memory_unset_excl(hw_addr);
>> -        }
>> -    }
>> -
>
> Errm is this right? I got confused reviewing 8/16 because my final tree
> didn't have this code. I'm not sure the adding of history obviates the
> need to clear the exclusive flag?

We will clear it adding a new item to the history. When an entry is
added, the oldest is removed and cleaned, solving the problem
mentioned above.

Thank you,
alvise

>
>>      /* refill the tlb */
>>      env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
>>      env->iotlb[mmu_idx][index].attrs = attrs;
>> @@ -517,6 +507,25 @@ static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>      return ret;
>>  }
>>
>> +extern CPUExclusiveHistory excl_history;
>> +static inline void excl_history_put_addr(hwaddr addr)
>> +{
>> +    hwaddr last;
>> +
>> +    /* Calculate the index of the next exclusive address */
>> +    excl_history.last_idx = (excl_history.last_idx + 1) % excl_history.length;
>> +
>> +    last = excl_history.c_array[excl_history.last_idx];
>> +
>> +    /* Unset EXCL bit of the oldest entry */
>> +    if (last != EXCLUSIVE_RESET_ADDR) {
>> +        cpu_physical_memory_unset_excl(last);
>> +    }
>> +
>> +    /* Add a new address, overwriting the oldest one */
>> +    excl_history.c_array[excl_history.last_idx] = addr & TARGET_PAGE_MASK;
>> +}
>> +
>>  #define MMUSUFFIX _mmu
>>
>>  /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
>> diff --git a/exec.c b/exec.c
>> index 51f366d..2e123f1 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -177,6 +177,25 @@ struct CPUAddressSpace {
>>      MemoryListener tcg_as_listener;
>>  };
>>
>> +/* Exclusive memory support */
>> +CPUExclusiveHistory excl_history;
>> +void cpu_exclusive_history_init(void)
>> +{
>> +    /* Initialize exclusive history for atomic instruction handling. */
>> +    if (tcg_enabled()) {
>> +        g_assert(EXCLUSIVE_HISTORY_CPU_LEN * max_cpus <= UINT16_MAX);
>> +        excl_history.length = EXCLUSIVE_HISTORY_CPU_LEN * max_cpus;
>> +        excl_history.c_array = g_malloc(excl_history.length * sizeof(hwaddr));
>> +        memset(excl_history.c_array, -1, excl_history.length * sizeof(hwaddr));
>> +    }
>> +}
>> +
>> +void cpu_exclusive_history_free(void)
>> +{
>> +    if (tcg_enabled()) {
>> +        g_free(excl_history.c_array);
>> +    }
>> +}
>>  #endif
>>
>>  #if !defined(CONFIG_USER_ONLY)
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 6f6c1c0..0452fd0 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -227,7 +227,15 @@ struct kvm_run;
>>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>>
>>  /* Atomic insn translation TLB support. */
>> +typedef struct CPUExclusiveHistory {
>> +    uint16_t last_idx;           /* index of last insertion */
>> +    uint16_t length;             /* history's length, it depends on smp_cpus */
>> +    hwaddr *c_array;             /* history's circular array */
>> +} CPUExclusiveHistory;
>>  #define EXCLUSIVE_RESET_ADDR ULLONG_MAX
>> +#define EXCLUSIVE_HISTORY_CPU_LEN 256
>> +void cpu_exclusive_history_init(void);
>> +void cpu_exclusive_history_free(void);
>>
>>  /**
>>   * CPUState:
>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>> index b4712ba..b4e7f9d 100644
>> --- a/softmmu_llsc_template.h
>> +++ b/softmmu_llsc_template.h
>> @@ -75,6 +75,7 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>           * to request any flush. */
>>          if (!cpu_physical_memory_is_excl(hw_addr)) {
>>              cpu_physical_memory_set_excl(hw_addr);
>> +            excl_history_put_addr(hw_addr);
>>              CPU_FOREACH(cpu) {
>>                  if (current_cpu != cpu) {
>>                      tlb_flush(cpu, 1);
>> diff --git a/vl.c b/vl.c
>> index f043009..b22d99b 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -547,6 +547,7 @@ static void res_free(void)
>>  {
>>      g_free(boot_splash_filedata);
>>      boot_splash_filedata = NULL;
>> +    cpu_exclusive_history_free();
>>  }
>>
>>  static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
>> @@ -4322,6 +4323,8 @@ int main(int argc, char **argv, char **envp)
>>
>>      configure_accelerator(current_machine);
>>
>> +    cpu_exclusive_history_init();
>> +
>>      if (qtest_chrdev) {
>>          qtest_init(qtest_chrdev, qtest_log, &error_fatal);
>>      }
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses
  2016-02-16 17:49   ` Alex Bennée
@ 2016-02-18 14:18     ` alvise rigo
  2016-02-18 16:26       ` Alex Bennée
  0 siblings, 1 reply; 50+ messages in thread
From: alvise rigo @ 2016-02-18 14:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Tue, Feb 16, 2016 at 6:49 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> Enable exclusive accesses when the MMIO/invalid flag is set in the TLB
>> entry.
>>
>> In case a LL access is done to MMIO memory, we treat it differently from
>> a RAM access in that we do not rely on the EXCL bitmap to flag the page
>> as exclusive. In fact, we don't even need the TLB_EXCL flag to force the
>> slow path, since it is always forced anyway.
>>
>> This commit does not take care of invalidating an MMIO exclusive range from
>> other non-exclusive accesses i.e. CPU1 LoadLink to MMIO address X and
>> CPU2 writes to X. This will be addressed in the following commit.
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  cputlb.c           |  7 +++----
>>  softmmu_template.h | 26 ++++++++++++++++++++------
>>  2 files changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index aa9cc17..87d09c8 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -424,7 +424,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>          if ((memory_region_is_ram(section->mr) && section->readonly)
>>              || memory_region_is_romd(section->mr)) {
>>              /* Write access calls the I/O callback.  */
>> -            te->addr_write = address | TLB_MMIO;
>> +            address |= TLB_MMIO;
>>          } else if (memory_region_is_ram(section->mr)
>>                     && cpu_physical_memory_is_clean(section->mr->ram_addr
>>                                                     + xlat)) {
>> @@ -437,11 +437,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>              if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
>>                  /* There is at least one vCPU that has flagged the address as
>>                   * exclusive. */
>> -                te->addr_write = address | TLB_EXCL;
>> -            } else {
>> -                te->addr_write = address;
>> +                address |= TLB_EXCL;
>>              }
>>          }
>> +        te->addr_write = address;
>
> As mentioned before I think this bit belongs in the earlier patch.
>
>>      } else {
>>          te->addr_write = -1;
>>      }
>> diff --git a/softmmu_template.h b/softmmu_template.h
>> index 267c52a..c54bdc9 100644
>> --- a/softmmu_template.h
>> +++ b/softmmu_template.h
>> @@ -476,7 +476,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>
>>      /* Handle an IO access or exclusive access.  */
>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
>> +        if (tlb_addr & TLB_EXCL) {
>>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>>              CPUState *cpu = ENV_GET_CPU(env);
>>              CPUClass *cc = CPU_GET_CLASS(cpu);
>> @@ -500,8 +500,15 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>                  }
>>              }
>>
>> -            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>> -                                                    mmu_idx, index, retaddr);
>> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO
>> access */
>
> What about the other flags? Shouldn't this be tlb_addr & TLB_MMIO?

The upstream QEMU's condition to follow the IO access path is:
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK))
Now, we split this in:
if (tlb_addr & TLB_EXCL)
for RAM exclusive accesses and
if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL))
for IO accesses. In this last case, we handle also the IO exclusive accesses.

>
>> +                glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>> +                                                         mmu_idx, index,
>> +                                                         retaddr);
>> +            } else {
>> +                glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>> +                                                        mmu_idx, index,
>> +                                                        retaddr);
>> +            }
>>
>>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>>
>> @@ -620,7 +627,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>
>>      /* Handle an IO access or exclusive access.  */
>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
>> +        if (tlb_addr & TLB_EXCL) {
>>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>>              CPUState *cpu = ENV_GET_CPU(env);
>>              CPUClass *cc = CPU_GET_CLASS(cpu);
>> @@ -644,8 +651,15 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>                  }
>>              }
>>
>> -            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>> -                                                    mmu_idx, index, retaddr);
>> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO access */
>> +                glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>> +                                                         mmu_idx, index,
>> +                                                         retaddr);
>> +            } else {
>> +                glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>> +                                                        mmu_idx, index,
>> +                                                        retaddr);
>> +            }
>>
>>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap
  2016-02-16 17:39   ` Alex Bennée
@ 2016-02-18 14:18     ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-02-18 14:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Tue, Feb 16, 2016 at 6:39 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
> > The pages set as exclusive (clean) in the DIRTY_MEMORY_EXCLUSIVE bitmap
> > have to have their TLB entries flagged with TLB_EXCL. The accesses to
> > pages with TLB_EXCL flag set have to be properly handled in that they
> > can potentially invalidate an open LL/SC transaction.
> >
> > Modify the TLB entries generation to honor the new bitmap and extend
> > the softmmu_template to handle the accesses made to guest pages marked
> > as exclusive.
> >
> > In the case we remove a TLB entry marked as EXCL, we unset the
> > corresponding exclusive bit in the bitmap.
> >
> > Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> > Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> > Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> > ---
> >  cputlb.c           | 44 ++++++++++++++++++++++++++++--
> >  softmmu_template.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 113 insertions(+), 11 deletions(-)
> >
> > diff --git a/cputlb.c b/cputlb.c
> > index ce6d720..aa9cc17 100644
> > --- a/cputlb.c
> > +++ b/cputlb.c
> > @@ -395,6 +395,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> >      env->tlb_v_table[mmu_idx][vidx] = *te;
> >      env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
> >
> > +    if (unlikely(!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL))) {
> > +        /* We are removing an exclusive entry, set the page to dirty. This
> > +         * is not be necessary if the vCPU has performed both SC and LL. */
> > +        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
> > +                                          (te->addr_write & TARGET_PAGE_MASK);
> > +        if (!cpu->ll_sc_context) {
> > +            cpu_physical_memory_unset_excl(hw_addr);
> > +        }
> > +    }
> > +
>
> I'm confused by the later patches removing this code and its comments
> about missing the setting of flags.


I hope I answered to this question in the other thread.

>
>
> >      /* refill the tlb */
> >      env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
> >      env->iotlb[mmu_idx][index].attrs = attrs;
> > @@ -418,9 +428,19 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> >          } else if (memory_region_is_ram(section->mr)
> >                     && cpu_physical_memory_is_clean(section->mr->ram_addr
> >                                                     + xlat)) {
> > -            te->addr_write = address | TLB_NOTDIRTY;
> > -        } else {
> > -            te->addr_write = address;
> > +            address |= TLB_NOTDIRTY;
> > +        }
> > +
> > +        /* Since the MMIO accesses follow always the slow path, we do not need
> > +         * to set any flag to trap the access */
> > +        if (!(address & TLB_MMIO)) {
> > +            if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
> > +                /* There is at least one vCPU that has flagged the address as
> > +                 * exclusive. */
> > +                te->addr_write = address | TLB_EXCL;
> > +            } else {
> > +                te->addr_write = address;
> > +            }
>
> Again this is confusing when following patches blat over the code.
> Perhaps this part of the patch should be:
>
>         /* Since the MMIO accesses follow always the slow path, we do not need
>          * to set any flag to trap the access */
>         if (!(address & TLB_MMIO)) {
>             if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
>                 /* There is at least one vCPU that has flagged the address as
>                  * exclusive. */
>                 address |= TLB_EXCL;
>             }
>         }
>         te->addr_write = address;
>
> So the future patch is clearer about what it does?


Yes, this is more clear. I will fix it.

>
>
> >          }
> >      } else {
> >          te->addr_write = -1;
> > @@ -474,6 +494,24 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
> >      return qemu_ram_addr_from_host_nofail(p);
> >  }
> >
> > +/* For every vCPU compare the exclusive address and reset it in case of a
> > + * match. Since only one vCPU is running at once, no lock has to be held to
> > + * guard this operation. */
> > +static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
> > +{
> > +    CPUState *cpu;
> > +
> > +    CPU_FOREACH(cpu) {
> > +        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
> > +            ranges_overlap(cpu->excl_protected_range.begin,
> > +                           cpu->excl_protected_range.end -
> > +                           cpu->excl_protected_range.begin,
> > +                           addr, size)) {
> > +            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> > +        }
> > +    }
> > +}
> > +
> >  #define MMUSUFFIX _mmu
> >
> >  /* Generates LoadLink/StoreConditional helpers in softmmu_template.h */
> > diff --git a/softmmu_template.h b/softmmu_template.h
> > index 4332db2..267c52a 100644
> > --- a/softmmu_template.h
> > +++ b/softmmu_template.h
> > @@ -474,11 +474,43 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> >          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> >      }
> >
> > -    /* Handle an IO access.  */
> > +    /* Handle an IO access or exclusive access.  */
> >      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> > -        glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> > -                                                 mmu_idx, index, retaddr);
> > -        return;
> > +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
>
> From here:
>
> > +            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> > +            CPUState *cpu = ENV_GET_CPU(env);
> > +            CPUClass *cc = CPU_GET_CLASS(cpu);
> > +            /* The slow-path has been forced since we are writing to
> > +             * exclusive-protected memory. */
> > +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> > +
> > +            /* The function lookup_and_reset_cpus_ll_addr could have reset the
> > +             * exclusive address. Fail the SC in this case.
> > +             * N.B.: here excl_succeed == true means that the caller is
> > +             * helper_stcond_name in softmmu_llsc_template.
> > +             * On the contrary, excl_succeeded == false occurs when a VCPU is
> > +             * writing through normal store to a page with TLB_EXCL bit set. */
> > +            if (cpu->excl_succeeded) {
> > +                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
> > +                    /* The vCPU is SC-ing to an unprotected address. */
> > +                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> > +                    cpu->excl_succeeded = false;
> > +
> > +                    return;
> > +                }
> > +            }
> > +
>
> To here is repeated code later on. It would be better to have a common
> chunk of logic.
>
> > +            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
> > +                                                    mmu_idx, index, retaddr);
> > +
> > +            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>
> In fact if the endianess is passed to the inline function you could have
> a call that was:
>
>         if (tlb_addr & TLB_EXCL) {
>            glue(helper_st_name, _do_excl)(true, env, val, addr, oi, mmu_idx,
>                                               index, retaddr);
>         }
>
> and
>
>         if (tlb_addr & TLB_EXCL) {
>            glue(helper_st_name, _do_excl)(false, env, val, addr, oi, mmu_idx,
>                                               index, retaddr);
>         }
>
> later. Then future patches would just extend the single helper.

OK, let's shirk down this file :)

Thank you,
alvise

>
>
> > +
> > +            return;
> > +        } else {
> > +            glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
> > +                                                     mmu_idx, index, retaddr);
> > +            return;
> > +        }
> >      }
> >
> >      glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
> > @@ -586,11 +618,43 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> >          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> >      }
> >
> > -    /* Handle an IO access.  */
> > +    /* Handle an IO access or exclusive access.  */
> >      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> > -        glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> > -                                                 mmu_idx, index, retaddr);
> > -        return;
> > +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> > +            CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> > +            CPUState *cpu = ENV_GET_CPU(env);
> > +            CPUClass *cc = CPU_GET_CLASS(cpu);
> > +            /* The slow-path has been forced since we are writing to
> > +             * exclusive-protected memory. */
> > +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> > +
> > +            /* The function lookup_and_reset_cpus_ll_addr could have reset the
> > +             * exclusive address. Fail the SC in this case.
> > +             * N.B.: here excl_succeed == true means that the caller is
> > +             * helper_stcond_name in softmmu_llsc_template.
> > +             * On the contrary, excl_succeeded == false occurs when a VCPU is
> > +             * writing through normal store to a page with TLB_EXCL bit set. */
> > +            if (cpu->excl_succeeded) {
> > +                if (!cc->cpu_valid_excl_access(cpu, hw_addr, DATA_SIZE)) {
> > +                    /* The vCPU is SC-ing to an unprotected address. */
> > +                    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> > +                    cpu->excl_succeeded = false;
> > +
> > +                    return;
> > +                }
> > +            }
> > +
> > +            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
> > +                                                    mmu_idx, index, retaddr);
> > +
> > +            lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
> > +
> > +            return;
> > +        } else {
> > +            glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
> > +                                                     mmu_idx, index, retaddr);
> > +            return;
> > +        }
> >      }
> >
> >      glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi, mmu_idx, index,
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses Alvise Rigo
@ 2016-02-18 16:16   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 16:16 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Introduce a set of new runtime helpers to handle exclusive instructions.
> These helpers are used as hooks to call the respective LL/SC helpers in
> softmmu_llsc_template.h from TCG code.
>
> The helpers ending with an "a" make an alignment check.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  Makefile.target             |   2 +-
>  include/exec/helper-gen.h   |   3 ++
>  include/exec/helper-proto.h |   1 +
>  include/exec/helper-tcg.h   |   3 ++
>  tcg-llsc-helper.c           | 104 ++++++++++++++++++++++++++++++++++++++++++++
>  tcg-llsc-helper.h           |  61 ++++++++++++++++++++++++++

I suspect we shouldn't be adding tcg specific stuff into the top level
of the directory. I know there is some stuff here but the general trend
is moving stuff into subdirs. I'll defer to the maintainers here.

>  tcg/tcg-llsc-gen-helper.h   |  67 ++++++++++++++++++++++++++++
>  7 files changed, 240 insertions(+), 1 deletion(-)
>  create mode 100644 tcg-llsc-helper.c
>  create mode 100644 tcg-llsc-helper.h
>  create mode 100644 tcg/tcg-llsc-gen-helper.h
>
> diff --git a/Makefile.target b/Makefile.target
> index 34ddb7e..faf32a2 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -135,7 +135,7 @@ obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
>  obj-y += qtest.o bootdevice.o
>  obj-y += hw/
>  obj-$(CONFIG_KVM) += kvm-all.o
> -obj-y += memory.o cputlb.o
> +obj-y += memory.o cputlb.o tcg-llsc-helper.o
>  obj-y += memory_mapping.o
>  obj-y += dump.o
>  obj-y += migration/ram.o migration/savevm.o
> diff --git a/include/exec/helper-gen.h b/include/exec/helper-gen.h
> index 0d0da3a..f8483a9 100644
> --- a/include/exec/helper-gen.h
> +++ b/include/exec/helper-gen.h
> @@ -60,6 +60,9 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
>  #include "trace/generated-helpers.h"
>  #include "trace/generated-helpers-wrappers.h"
>  #include "tcg-runtime.h"
> +#if defined(CONFIG_SOFTMMU)
> +#include "tcg-llsc-gen-helper.h"
> +#endif
>
>  #undef DEF_HELPER_FLAGS_0
>  #undef DEF_HELPER_FLAGS_1
> diff --git a/include/exec/helper-proto.h b/include/exec/helper-proto.h
> index effdd43..90be2fd 100644
> --- a/include/exec/helper-proto.h
> +++ b/include/exec/helper-proto.h
> @@ -29,6 +29,7 @@ dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2), dh_ctype(t3), \
>  #include "helper.h"
>  #include "trace/generated-helpers.h"
>  #include "tcg-runtime.h"
> +#include "tcg/tcg-llsc-gen-helper.h"
>
>  #undef DEF_HELPER_FLAGS_0
>  #undef DEF_HELPER_FLAGS_1
> diff --git a/include/exec/helper-tcg.h b/include/exec/helper-tcg.h
> index 79fa3c8..6228a7f 100644
> --- a/include/exec/helper-tcg.h
> +++ b/include/exec/helper-tcg.h
> @@ -38,6 +38,9 @@
>  #include "helper.h"
>  #include "trace/generated-helpers.h"
>  #include "tcg-runtime.h"
> +#ifdef CONFIG_SOFTMMU
> +#include "tcg-llsc-gen-helper.h"
> +#endif
>
>  #undef DEF_HELPER_FLAGS_0
>  #undef DEF_HELPER_FLAGS_1
> diff --git a/tcg-llsc-helper.c b/tcg-llsc-helper.c
> new file mode 100644
> index 0000000..646b4ba
> --- /dev/null
> +++ b/tcg-llsc-helper.c
> @@ -0,0 +1,104 @@
> +/*
> + * Runtime helpers for atomic istruction emulation
> + *
> + * Copyright (c) 2015 Virtual Open Systems
> + *
> + * Authors:
> + *  Alvise Rigo <a.rigo@virtualopensystems.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "exec/cpu_ldst.h"
> +#include "exec/helper-head.h"
> +#include "tcg-llsc-helper.h"
> +
> +#define LDEX_HELPER(SUFF, OPC, FUNC)                                       \
> +uint32_t HELPER(ldlink_i##SUFF)(CPUArchState *env, target_ulong addr,      \
> +                                uint32_t index)                            \
> +{                                                                          \
> +    CPUArchState *state = env;                                             \
> +    TCGMemOpIdx op;                                                        \
> +                                                                           \
> +    op = make_memop_idx((OPC), index);                                     \
> +                                                                           \
> +    return (uint32_t)FUNC(state, addr, op, GETRA());                       \
> +}
> +
> +#define STEX_HELPER(SUFF, DATA_TYPE, OPC, FUNC)                            \
> +target_ulong HELPER(stcond_i##SUFF)(CPUArchState *env, target_ulong addr,  \
> +                                    uint32_t val, uint32_t index)          \
> +{                                                                          \
> +    CPUArchState *state = env;                                             \
> +    TCGMemOpIdx op;                                                        \
> +                                                                           \
> +    op = make_memop_idx((OPC), index);                                     \
> +                                                                           \
> +    return (target_ulong)FUNC(state, addr, val, op, GETRA());              \
> +}
> +
> +
> +LDEX_HELPER(8, MO_UB, helper_ret_ldlinkub_mmu)
> +LDEX_HELPER(16_be, MO_BEUW, helper_be_ldlinkuw_mmu)
> +LDEX_HELPER(16_bea, MO_BEUW | MO_ALIGN, helper_be_ldlinkuw_mmu)
> +LDEX_HELPER(32_be, MO_BEUL, helper_be_ldlinkul_mmu)
> +LDEX_HELPER(32_bea, MO_BEUL | MO_ALIGN, helper_be_ldlinkul_mmu)
> +LDEX_HELPER(16_le, MO_LEUW, helper_le_ldlinkuw_mmu)
> +LDEX_HELPER(16_lea, MO_LEUW | MO_ALIGN, helper_le_ldlinkuw_mmu)
> +LDEX_HELPER(32_le, MO_LEUL, helper_le_ldlinkul_mmu)
> +LDEX_HELPER(32_lea, MO_LEUL | MO_ALIGN, helper_le_ldlinkul_mmu)
> +
> +STEX_HELPER(8, uint8_t, MO_UB, helper_ret_stcondb_mmu)
> +STEX_HELPER(16_be, uint16_t, MO_BEUW, helper_be_stcondw_mmu)
> +STEX_HELPER(16_bea, uint16_t, MO_BEUW | MO_ALIGN, helper_be_stcondw_mmu)
> +STEX_HELPER(32_be, uint32_t, MO_BEUL, helper_be_stcondl_mmu)
> +STEX_HELPER(32_bea, uint32_t, MO_BEUL | MO_ALIGN, helper_be_stcondl_mmu)
> +STEX_HELPER(16_le, uint16_t, MO_LEUW, helper_le_stcondw_mmu)
> +STEX_HELPER(16_lea, uint16_t, MO_LEUW | MO_ALIGN, helper_le_stcondw_mmu)
> +STEX_HELPER(32_le, uint32_t, MO_LEUL, helper_le_stcondl_mmu)
> +STEX_HELPER(32_lea, uint32_t, MO_LEUL | MO_ALIGN, helper_le_stcondl_mmu)
> +
> +#define LDEX_HELPER_64(SUFF, OPC, FUNC)                                     \
> +uint64_t HELPER(ldlink_i##SUFF)(CPUArchState *env, target_ulong addr,       \
> +                                uint32_t index)                             \
> +{                                                                           \
> +    CPUArchState *state = env;                                              \
> +    TCGMemOpIdx op;                                                         \
> +                                                                            \
> +    op = make_memop_idx((OPC), index);                                      \
> +                                                                            \
> +    return FUNC(state, addr, op, GETRA());                                  \
> +}
> +
> +#define STEX_HELPER_64(SUFF, OPC, FUNC)                                     \
> +target_ulong HELPER(stcond_i##SUFF)(CPUArchState *env, target_ulong addr,   \
> +                                    uint64_t val, uint32_t index)           \
> +{                                                                           \
> +    CPUArchState *state = env;                                              \
> +    TCGMemOpIdx op;                                                         \
> +                                                                            \
> +    op = make_memop_idx((OPC), index);                                      \
> +                                                                            \
> +    return (target_ulong)FUNC(state, addr, val, op, GETRA());               \
> +}
> +
> +LDEX_HELPER_64(64_be, MO_BEQ, helper_be_ldlinkq_mmu)
> +LDEX_HELPER_64(64_bea, MO_BEQ | MO_ALIGN, helper_be_ldlinkq_mmu)
> +LDEX_HELPER_64(64_le, MO_LEQ, helper_le_ldlinkq_mmu)
> +LDEX_HELPER_64(64_lea, MO_LEQ | MO_ALIGN, helper_le_ldlinkq_mmu)
> +
> +STEX_HELPER_64(64_be, MO_BEQ, helper_be_stcondq_mmu)
> +STEX_HELPER_64(64_bea, MO_BEQ | MO_ALIGN, helper_be_stcondq_mmu)
> +STEX_HELPER_64(64_le, MO_LEQ, helper_le_stcondq_mmu)
> +STEX_HELPER_64(64_lea, MO_LEQ | MO_ALIGN, helper_le_stcondq_mmu)
> diff --git a/tcg-llsc-helper.h b/tcg-llsc-helper.h
> new file mode 100644
> index 0000000..8f7adf0
> --- /dev/null
> +++ b/tcg-llsc-helper.h
> @@ -0,0 +1,61 @@
> +#ifndef HELPER_LLSC_HEAD_H
> +#define HELPER_LLSC_HEAD_H 1
> +
> +uint32_t HELPER(ldlink_i8)(CPUArchState *env, target_ulong addr,
> +                           uint32_t index);
> +uint32_t HELPER(ldlink_i16_be)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +uint32_t HELPER(ldlink_i32_be)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +uint64_t HELPER(ldlink_i64_be)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +uint32_t HELPER(ldlink_i16_le)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +uint32_t HELPER(ldlink_i32_le)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +uint64_t HELPER(ldlink_i64_le)(CPUArchState *env, target_ulong addr,
> +                               uint32_t index);
> +
> +target_ulong HELPER(stcond_i8)(CPUArchState *env, target_ulong addr,
> +                               uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i16_be)(CPUArchState *env, target_ulong addr,
> +                                   uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i32_be)(CPUArchState *env, target_ulong addr,
> +                                   uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i64_be)(CPUArchState *env, target_ulong addr,
> +                                   uint64_t val, uint32_t index);
> +target_ulong HELPER(stcond_i16_le)(CPUArchState *env, target_ulong addr,
> +                                   uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i32_le)(CPUArchState *env, target_ulong addr,
> +                                   uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i64_le)(CPUArchState *env, target_ulong addr,
> +                                   uint64_t val, uint32_t index);
> +
> +/* Aligned versions */
> +uint32_t HELPER(ldlink_i16_bea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +uint32_t HELPER(ldlink_i32_bea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +uint64_t HELPER(ldlink_i64_bea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +uint32_t HELPER(ldlink_i16_lea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +uint32_t HELPER(ldlink_i32_lea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +uint64_t HELPER(ldlink_i64_lea)(CPUArchState *env, target_ulong addr,
> +                                uint32_t index);
> +
> +target_ulong HELPER(stcond_i16_bea)(CPUArchState *env, target_ulong addr,
> +                                    uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i32_bea)(CPUArchState *env, target_ulong addr,
> +                                    uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i64_bea)(CPUArchState *env, target_ulong addr,
> +                                    uint64_t val, uint32_t index);
> +target_ulong HELPER(stcond_i16_lea)(CPUArchState *env, target_ulong addr,
> +                                    uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i32_lea)(CPUArchState *env, target_ulong addr,
> +                                    uint32_t val, uint32_t index);
> +target_ulong HELPER(stcond_i64_lea)(CPUArchState *env, target_ulong addr,
> +                                    uint64_t val, uint32_t index);
> +
> +#endif
> diff --git a/tcg/tcg-llsc-gen-helper.h b/tcg/tcg-llsc-gen-helper.h
> new file mode 100644
> index 0000000..01c0a67
> --- /dev/null
> +++ b/tcg/tcg-llsc-gen-helper.h
> @@ -0,0 +1,67 @@
> +#if TARGET_LONG_BITS == 32
> +#define TYPE i32
> +#else
> +#define TYPE i64
> +#endif
> +
> +DEF_HELPER_3(ldlink_i8, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i16_be, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i32_be, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i64_be, i64, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i16_le, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i32_le, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i64_le, i64, env, TYPE, i32)
> +
> +DEF_HELPER_4(stcond_i8, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i16_be, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i32_be, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i64_be, TYPE, env, TYPE, i64, i32)
> +DEF_HELPER_4(stcond_i16_le, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i32_le, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i64_le, TYPE, env, TYPE, i64, i32)
> +
> +/* Aligned versions */
> +DEF_HELPER_3(ldlink_i16_bea, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i32_bea, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i64_bea, i64, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i16_lea, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i32_lea, i32, env, TYPE, i32)
> +DEF_HELPER_3(ldlink_i64_lea, i64, env, TYPE, i32)
> +
> +DEF_HELPER_4(stcond_i16_bea, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i32_bea, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i64_bea, TYPE, env, TYPE, i64, i32)
> +DEF_HELPER_4(stcond_i16_lea, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i32_lea, TYPE, env, TYPE, i32, i32)
> +DEF_HELPER_4(stcond_i64_lea, TYPE, env, TYPE, i64, i32)
> +
> +/* Convenient aliases */
> +#ifdef TARGET_WORDS_BIGENDIAN
> +#define gen_helper_stcond_i16 gen_helper_stcond_i16_be
> +#define gen_helper_stcond_i32 gen_helper_stcond_i32_be
> +#define gen_helper_stcond_i64 gen_helper_stcond_i64_be
> +#define gen_helper_ldlink_i16 gen_helper_ldlink_i16_be
> +#define gen_helper_ldlink_i32 gen_helper_ldlink_i32_be
> +#define gen_helper_ldlink_i64 gen_helper_ldlink_i64_be
> +#define gen_helper_stcond_i16a gen_helper_stcond_i16_bea
> +#define gen_helper_stcond_i32a gen_helper_stcond_i32_bea
> +#define gen_helper_stcond_i64a gen_helper_stcond_i64_bea
> +#define gen_helper_ldlink_i16a gen_helper_ldlink_i16_bea
> +#define gen_helper_ldlink_i32a gen_helper_ldlink_i32_bea
> +#define gen_helper_ldlink_i64a gen_helper_ldlink_i64_bea
> +#else
> +#define gen_helper_stcond_i16 gen_helper_stcond_i16_le
> +#define gen_helper_stcond_i32 gen_helper_stcond_i32_le
> +#define gen_helper_stcond_i64 gen_helper_stcond_i64_le
> +#define gen_helper_ldlink_i16 gen_helper_ldlink_i16_le
> +#define gen_helper_ldlink_i32 gen_helper_ldlink_i32_le
> +#define gen_helper_ldlink_i64 gen_helper_ldlink_i64_le
> +#define gen_helper_stcond_i16a gen_helper_stcond_i16_lea
> +#define gen_helper_stcond_i32a gen_helper_stcond_i32_lea
> +#define gen_helper_stcond_i64a gen_helper_stcond_i64_lea
> +#define gen_helper_ldlink_i16a gen_helper_ldlink_i16_lea
> +#define gen_helper_ldlink_i32a gen_helper_ldlink_i32_lea
> +#define gen_helper_ldlink_i64a gen_helper_ldlink_i64_lea
> +#endif
> +
> +#undef TYPE


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range
  2016-02-18 14:15     ` alvise rigo
@ 2016-02-18 16:25       ` Alex Bennée
  2016-03-07 18:13         ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 16:25 UTC (permalink / raw)
  To: alvise rigo
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Wed, Feb 17, 2016 at 7:55 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> As for the RAM case, also the MMIO exclusive ranges have to be protected
>>> by other CPU's accesses. In order to do that, we flag the accessed
>>> MemoryRegion to mark that an exclusive access has been performed and is
>>> not concluded yet.
>>>
>>> This flag will force the other CPUs to invalidate the exclusive range in
>>> case of collision.
>>>
>>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>> ---
>>>  cputlb.c                | 20 +++++++++++++-------
>>>  include/exec/memory.h   |  1 +
>>>  softmmu_llsc_template.h | 11 +++++++----
>>>  softmmu_template.h      | 22 ++++++++++++++++++++++
>>>  4 files changed, 43 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index 87d09c8..06ce2da 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -496,19 +496,25 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>>  /* For every vCPU compare the exclusive address and reset it in case of a
>>>   * match. Since only one vCPU is running at once, no lock has to be held to
>>>   * guard this operation. */
>>> -static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>> +static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>>  {
>>>      CPUState *cpu;
>>> +    bool ret = false;
>>>
>>>      CPU_FOREACH(cpu) {
>>> -        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>>> -            ranges_overlap(cpu->excl_protected_range.begin,
>>> -                           cpu->excl_protected_range.end -
>>> -                           cpu->excl_protected_range.begin,
>>> -                           addr, size)) {
>>> -            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>>> +        if (current_cpu != cpu) {
>>
>> I'm confused by this change. I don't see anywhere in the MMIO handling
>> why we would want to change skipping the CPU. Perhaps this belongs in
>> the previous patch? Maybe the function should really be
>> lookup_and_maybe_reset_other_cpu_ll_addr?
>
> This is actually used later on in this patch.

But aren't there other users before the functional change was made to
skip the current_cpu? Where their expectations wrong or should we have
always skipped the current CPU?

The additional of the bool return I agree only needs to be brought in
now when there are functions that care.

>
>>
>>> +            if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>>> +                ranges_overlap(cpu->excl_protected_range.begin,
>>> +                               cpu->excl_protected_range.end -
>>> +                               cpu->excl_protected_range.begin,
>>> +                               addr, size)) {
>>> +                cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>>> +                ret = true;
>>> +            }
>>>          }
>>>      }
>>> +
>>> +    return ret;
>>>  }
>>>
>>>  #define MMUSUFFIX _mmu
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 71e0480..bacb3ad 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -171,6 +171,7 @@ struct MemoryRegion {
>>>      bool rom_device;
>>>      bool flush_coalesced_mmio;
>>>      bool global_locking;
>>> +    bool pending_excl_access; /* A vCPU issued an exclusive access */
>>>      uint8_t dirty_log_mask;
>>>      ram_addr_t ram_addr;
>>>      Object *owner;
>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>> index 101f5e8..b4712ba 100644
>>> --- a/softmmu_llsc_template.h
>>> +++ b/softmmu_llsc_template.h
>>> @@ -81,15 +81,18 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>                  }
>>>              }
>>>          }
>>> +        /* For this vCPU, just update the TLB entry, no need to flush. */
>>> +        env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>      } else {
>>> -        hw_error("EXCL accesses to MMIO regions not supported yet.");
>>> +        /* Set a pending exclusive access in the MemoryRegion */
>>> +        MemoryRegion *mr = iotlb_to_region(this,
>>> +                                           env->iotlb[mmu_idx][index].addr,
>>> +                                           env->iotlb[mmu_idx][index].attrs);
>>> +        mr->pending_excl_access = true;
>>>      }
>>>
>>>      cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
>>>
>>> -    /* For this vCPU, just update the TLB entry, no need to flush. */
>>> -    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>> -
>>>      /* From now on we are in LL/SC context */
>>>      this->ll_sc_context = true;
>>>
>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>> index c54bdc9..71c5152 100644
>>> --- a/softmmu_template.h
>>> +++ b/softmmu_template.h
>>> @@ -360,6 +360,14 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>>      MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>>>
>>>      physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>> +
>>> +    /* Invalidate the exclusive range that overlaps this access */
>>> +    if (mr->pending_excl_access) {
>>> +        if (lookup_and_reset_cpus_ll_addr(physaddr, 1 << SHIFT)) {
>
> Here precisely. As you wrote, we can rename it to
> lookup_and_maybe_reset_other_cpu_ll_addr even if this name does not
> convince me. What about other_cpus_reset_colliding_ll_addr?

We want as short and semantically informative as possible. Naming things is hard ;-)

 - reset_other_cpus_colliding_ll_addr
 - reset_other_cpus_overlapping_ll_addr

Any other options?

>
> Thank you,
> alvise
>
>>> +            mr->pending_excl_access = false;
>>> +        }
>>> +    }
>>> +
>>>      if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>>>          cpu_io_recompile(cpu, retaddr);
>>>      }
>>> @@ -504,6 +512,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>                  glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>>>                                                           mmu_idx, index,
>>>                                                           retaddr);
>>> +                /* N.B.: Here excl_succeeded == true means that this access
>>> +                 * comes from an exclusive instruction. */
>>> +                if (cpu->excl_succeeded) {
>>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>>> +                                                       iotlbentry->attrs);
>>> +                    mr->pending_excl_access = false;
>>> +                }
>>>              } else {
>>>                  glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>>>                                                          mmu_idx, index,
>>> @@ -655,6 +670,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>                  glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>>>                                                           mmu_idx, index,
>>>                                                           retaddr);
>>> +                /* N.B.: Here excl_succeeded == true means that this access
>>> +                 * comes from an exclusive instruction. */
>>> +                if (cpu->excl_succeeded) {
>>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>>> +                                                       iotlbentry->attrs);
>>> +                    mr->pending_excl_access = false;
>>> +                }
>>
>> My comments about duplication on previous patches still stand.
>
> Indeed.
>
> Thank you,
> alvise
>
>>
>>>              } else {
>>>                  glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>>>                                                          mmu_idx, index,
>>
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses
  2016-02-18 14:18     ` alvise rigo
@ 2016-02-18 16:26       ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 16:26 UTC (permalink / raw)
  To: alvise rigo
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Tue, Feb 16, 2016 at 6:49 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> Enable exclusive accesses when the MMIO/invalid flag is set in the TLB
>>> entry.
>>>
>>> In case a LL access is done to MMIO memory, we treat it differently from
>>> a RAM access in that we do not rely on the EXCL bitmap to flag the page
>>> as exclusive. In fact, we don't even need the TLB_EXCL flag to force the
>>> slow path, since it is always forced anyway.
>>>
>>> This commit does not take care of invalidating an MMIO exclusive range from
>>> other non-exclusive accesses i.e. CPU1 LoadLink to MMIO address X and
>>> CPU2 writes to X. This will be addressed in the following commit.
>>>
>>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>> ---
>>>  cputlb.c           |  7 +++----
>>>  softmmu_template.h | 26 ++++++++++++++++++++------
>>>  2 files changed, 23 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index aa9cc17..87d09c8 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -424,7 +424,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>>          if ((memory_region_is_ram(section->mr) && section->readonly)
>>>              || memory_region_is_romd(section->mr)) {
>>>              /* Write access calls the I/O callback.  */
>>> -            te->addr_write = address | TLB_MMIO;
>>> +            address |= TLB_MMIO;
>>>          } else if (memory_region_is_ram(section->mr)
>>>                     && cpu_physical_memory_is_clean(section->mr->ram_addr
>>>                                                     + xlat)) {
>>> @@ -437,11 +437,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>>              if (cpu_physical_memory_is_excl(section->mr->ram_addr + xlat)) {
>>>                  /* There is at least one vCPU that has flagged the address as
>>>                   * exclusive. */
>>> -                te->addr_write = address | TLB_EXCL;
>>> -            } else {
>>> -                te->addr_write = address;
>>> +                address |= TLB_EXCL;
>>>              }
>>>          }
>>> +        te->addr_write = address;
>>
>> As mentioned before I think this bit belongs in the earlier patch.
>>
>>>      } else {
>>>          te->addr_write = -1;
>>>      }
>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>> index 267c52a..c54bdc9 100644
>>> --- a/softmmu_template.h
>>> +++ b/softmmu_template.h
>>> @@ -476,7 +476,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>
>>>      /* Handle an IO access or exclusive access.  */
>>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
>>> +        if (tlb_addr & TLB_EXCL) {
>>>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>>>              CPUState *cpu = ENV_GET_CPU(env);
>>>              CPUClass *cc = CPU_GET_CLASS(cpu);
>>> @@ -500,8 +500,15 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>                  }
>>>              }
>>>
>>> -            glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>>> -                                                    mmu_idx, index, retaddr);
>>> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO
>>> access */
>>
>> What about the other flags? Shouldn't this be tlb_addr & TLB_MMIO?
>
> The upstream QEMU's condition to follow the IO access path is:
> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK))
> Now, we split this in:
> if (tlb_addr & TLB_EXCL)
> for RAM exclusive accesses and
> if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL))
> for IO accesses. In this last case, we handle also the IO exclusive
> accesses.

OK I see that now. Thanks.

>
>>
>>> +                glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>>> +                                                         mmu_idx, index,
>>> +                                                         retaddr);
>>> +            } else {
>>> +                glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>>> +                                                        mmu_idx, index,
>>> +                                                        retaddr);
>>> +            }
>>>
>>>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>>>
>>> @@ -620,7 +627,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>
>>>      /* Handle an IO access or exclusive access.  */
>>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>> -        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
>>> +        if (tlb_addr & TLB_EXCL) {
>>>              CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
>>>              CPUState *cpu = ENV_GET_CPU(env);
>>>              CPUClass *cc = CPU_GET_CLASS(cpu);
>>> @@ -644,8 +651,15 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>                  }
>>>              }
>>>
>>> -            glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>>> -                                                    mmu_idx, index, retaddr);
>>> +            if (tlb_addr & ~(TARGET_PAGE_MASK | TLB_EXCL)) { /* MMIO access */
>>> +                glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>>> +                                                         mmu_idx, index,
>>> +                                                         retaddr);
>>> +            } else {
>>> +                glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>>> +                                                        mmu_idx, index,
>>> +                                                        retaddr);
>>> +            }
>>>
>>>              lookup_and_reset_cpus_ll_addr(hw_addr, DATA_SIZE);
>>
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled Alvise Rigo
@ 2016-02-18 16:40   ` Alex Bennée
  2016-02-18 16:43     ` Alex Bennée
  2016-03-07 17:21     ` alvise rigo
  0 siblings, 2 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 16:40 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Use the new slow path for atomic instruction translation when the
> softmmu is enabled.
>
> At the moment only arm and aarch64 use the new LL/SC backend. It is
> possible to disable such backed with --disable-arm-llsc-backend.

Do we want to disable the backend once it is merged? Does it serve a
purpose other than to confuse the user?

>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  configure | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/configure b/configure
> index 44ac9ab..915efcc 100755
> --- a/configure
> +++ b/configure
> @@ -294,6 +294,7 @@ solaris="no"
>  profiler="no"
>  cocoa="no"
>  softmmu="yes"
> +arm_tcg_use_llsc="yes"
>  linux_user="no"
>  bsd_user="no"
>  aix="no"
> @@ -880,6 +881,10 @@ for opt do
>    ;;
>    --disable-debug-tcg) debug_tcg="no"
>    ;;
> +  --enable-arm-llsc-backend) arm_tcg_use_llsc="yes"
> +  ;;
> +  --disable-arm-llsc-backend) arm_tcg_use_llsc="no"
> +  ;;
>    --enable-debug)
>        # Enable debugging options that aren't excessively noisy
>        debug_tcg="yes"
> @@ -4751,6 +4756,7 @@ echo "host CPU          $cpu"
>  echo "host big endian   $bigendian"
>  echo "target list       $target_list"
>  echo "tcg debug enabled $debug_tcg"
> +echo "arm use llsc backend" $arm_tcg_use_llsc
>  echo "gprof enabled     $gprof"
>  echo "sparse enabled    $sparse"
>  echo "strip binaries    $strip_opt"
> @@ -4806,6 +4812,7 @@ echo "Install blobs     $blobs"
>  echo "KVM support       $kvm"
>  echo "RDMA support      $rdma"
>  echo "TCG interpreter   $tcg_interpreter"
> +echo "use ld/st excl    $softmmu"

I think we can drop everything above here.

>  echo "fdt support       $fdt"
>  echo "preadv support    $preadv"
>  echo "fdatasync         $fdatasync"
> @@ -5863,6 +5870,13 @@ fi
>  echo "LDFLAGS+=$ldflags" >> $config_target_mak
>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>
> +# Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
> +# softmmus targets
> +if test "$arm_tcg_use_llsc" = "yes" ; then
> +  if test "$target" = "arm-softmmu" ; then
> +    echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
> +  fi
> +fi

This isn't going to be just ARM specific and it will be progressively
turned on for other arches. So perhaps with the CONFIG_SOFTMMU section:

if test "$target_softmmu" = "yes" ; then
    echo "CONFIG_SOFTMMU=y" >> $config_target_mak

    # Use SoftMMU LL/SC primitives?
    case "$target_name" in
        arm | aarch64)
            echo "CONFIG_USE_LDST_EXCL=y" >> $config_target_mak
            ;;
    esac
fi


>  done # for target in $targets
>
>  if [ "$pixman" = "internal" ]; then


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled
  2016-02-18 16:40   ` Alex Bennée
@ 2016-02-18 16:43     ` Alex Bennée
  2016-03-07 17:21     ` alvise rigo
  1 sibling, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 16:43 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alex Bennée <alex.bennee@linaro.org> writes:

> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> Use the new slow path for atomic instruction translation when the
>> softmmu is enabled.
>>
>> At the moment only arm and aarch64 use the new LL/SC backend. It is
>> possible to disable such backed with --disable-arm-llsc-backend.
>
> Do we want to disable the backend once it is merged? Does it serve a
> purpose other than to confuse the user?

Also this needs to be after the actual implementation has been filled in
or you'll have a broken build!

>
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  configure | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/configure b/configure
>> index 44ac9ab..915efcc 100755
>> --- a/configure
>> +++ b/configure
>> @@ -294,6 +294,7 @@ solaris="no"
>>  profiler="no"
>>  cocoa="no"
>>  softmmu="yes"
>> +arm_tcg_use_llsc="yes"
>>  linux_user="no"
>>  bsd_user="no"
>>  aix="no"
>> @@ -880,6 +881,10 @@ for opt do
>>    ;;
>>    --disable-debug-tcg) debug_tcg="no"
>>    ;;
>> +  --enable-arm-llsc-backend) arm_tcg_use_llsc="yes"
>> +  ;;
>> +  --disable-arm-llsc-backend) arm_tcg_use_llsc="no"
>> +  ;;
>>    --enable-debug)
>>        # Enable debugging options that aren't excessively noisy
>>        debug_tcg="yes"
>> @@ -4751,6 +4756,7 @@ echo "host CPU          $cpu"
>>  echo "host big endian   $bigendian"
>>  echo "target list       $target_list"
>>  echo "tcg debug enabled $debug_tcg"
>> +echo "arm use llsc backend" $arm_tcg_use_llsc
>>  echo "gprof enabled     $gprof"
>>  echo "sparse enabled    $sparse"
>>  echo "strip binaries    $strip_opt"
>> @@ -4806,6 +4812,7 @@ echo "Install blobs     $blobs"
>>  echo "KVM support       $kvm"
>>  echo "RDMA support      $rdma"
>>  echo "TCG interpreter   $tcg_interpreter"
>> +echo "use ld/st excl    $softmmu"
>
> I think we can drop everything above here.
>
>>  echo "fdt support       $fdt"
>>  echo "preadv support    $preadv"
>>  echo "fdatasync         $fdatasync"
>> @@ -5863,6 +5870,13 @@ fi
>>  echo "LDFLAGS+=$ldflags" >> $config_target_mak
>>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>>
>> +# Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
>> +# softmmus targets
>> +if test "$arm_tcg_use_llsc" = "yes" ; then
>> +  if test "$target" = "arm-softmmu" ; then
>> +    echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
>> +  fi
>> +fi
>
> This isn't going to be just ARM specific and it will be progressively
> turned on for other arches. So perhaps with the CONFIG_SOFTMMU section:
>
> if test "$target_softmmu" = "yes" ; then
>     echo "CONFIG_SOFTMMU=y" >> $config_target_mak
>
>     # Use SoftMMU LL/SC primitives?
>     case "$target_name" in
>         arm | aarch64)
>             echo "CONFIG_USE_LDST_EXCL=y" >> $config_target_mak
>             ;;
>     esac
> fi
>
>
>>  done # for target in $targets
>>
>>  if [ "$pixman" = "internal" ]; then


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns Alvise Rigo
@ 2016-02-18 17:02   ` Alex Bennée
  2016-03-07 18:39     ` alvise rigo
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 17:02 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Use the new LL/SC runtime helpers to handle the ARM atomic instructions
> in softmmu_llsc_template.h.
>
> In general, the helper generator
> gen_{ldrex,strex}_{8,16a,32a,64a}() calls the function
> helper_{le,be}_{ldlink,stcond}{ub,uw,ul,q}_mmu() implemented in
> softmmu_llsc_template.h, doing an alignment check.
>
> In addition, add a simple helper function to emulate the CLREX instruction.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  target-arm/cpu.h       |   2 +
>  target-arm/helper.h    |   4 ++
>  target-arm/machine.c   |   2 +
>  target-arm/op_helper.c |  10 +++
>  target-arm/translate.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 202 insertions(+), 4 deletions(-)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index b8b3364..bb5361f 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -462,9 +462,11 @@ typedef struct CPUARMState {
>          float_status fp_status;
>          float_status standard_fp_status;
>      } vfp;
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>      uint64_t exclusive_addr;
>      uint64_t exclusive_val;
>      uint64_t exclusive_high;
> +#endif
>  #if defined(CONFIG_USER_ONLY)
>      uint64_t exclusive_test;
>      uint32_t exclusive_info;
> diff --git a/target-arm/helper.h b/target-arm/helper.h
> index c2a85c7..6bc3c0a 100644
> --- a/target-arm/helper.h
> +++ b/target-arm/helper.h
> @@ -532,6 +532,10 @@ DEF_HELPER_2(dc_zva, void, env, i64)
>  DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +DEF_HELPER_1(atomic_clear, void, env)
> +#endif
> +
>  #ifdef TARGET_AARCH64
>  #include "helper-a64.h"
>  #endif
> diff --git a/target-arm/machine.c b/target-arm/machine.c
> index ed1925a..7adfb4d 100644
> --- a/target-arm/machine.c
> +++ b/target-arm/machine.c
> @@ -309,9 +309,11 @@ const VMStateDescription vmstate_arm_cpu = {
>          VMSTATE_VARRAY_INT32(cpreg_vmstate_values, ARMCPU,
>                               cpreg_vmstate_array_len,
>                               0, vmstate_info_uint64, uint64_t),
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>          VMSTATE_UINT64(env.exclusive_addr, ARMCPU),
>          VMSTATE_UINT64(env.exclusive_val, ARMCPU),
>          VMSTATE_UINT64(env.exclusive_high, ARMCPU),
> +#endif

Hmm this does imply we either need to support migration of the LL/SC
state in the generic code or map the generic state into the ARM specific
machine state or we'll break migration.

The later if probably better so you can save machine state from a
pre-LL/SC build and migrate to a new LL/SC enabled build.

>          VMSTATE_UINT64(env.features, ARMCPU),
>          VMSTATE_UINT32(env.exception.syndrome, ARMCPU),
>          VMSTATE_UINT32(env.exception.fsr, ARMCPU),
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index a5ee65f..404c13b 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -51,6 +51,14 @@ static int exception_target_el(CPUARMState *env)
>      return target_el;
>  }
>
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +void HELPER(atomic_clear)(CPUARMState *env)
> +{
> +    ENV_GET_CPU(env)->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +    ENV_GET_CPU(env)->ll_sc_context = false;
> +}
> +#endif
> +

Given this is just touching generic CPU state this helper should probably be
part of the generic TCG runtime. I assume other arches will just call
this helper as well.


>  uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>                            uint32_t rn, uint32_t maxindex)
>  {
> @@ -689,7 +697,9 @@ void HELPER(exception_return)(CPUARMState *env)
>
>      aarch64_save_sp(env, cur_el);
>
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>      env->exclusive_addr = -1;
> +#endif
>
>      /* We must squash the PSTATE.SS bit to zero unless both of the
>       * following hold:
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index cff511b..5150841 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -60,8 +60,10 @@ TCGv_ptr cpu_env;
>  static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
>  static TCGv_i32 cpu_R[16];
>  TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>  TCGv_i64 cpu_exclusive_addr;
>  TCGv_i64 cpu_exclusive_val;
> +#endif
>  #ifdef CONFIG_USER_ONLY
>  TCGv_i64 cpu_exclusive_test;
>  TCGv_i32 cpu_exclusive_info;
> @@ -94,10 +96,12 @@ void arm_translate_init(void)
>      cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
>      cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
>
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>      cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
>          offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
>      cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
>          offsetof(CPUARMState, exclusive_val), "exclusive_val");
> +#endif
>  #ifdef CONFIG_USER_ONLY
>      cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
>          offsetof(CPUARMState, exclusive_test), "exclusive_test");
> @@ -7413,15 +7417,145 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
>      tcg_gen_or_i32(cpu_ZF, lo, hi);
>  }
>
> -/* Load/Store exclusive instructions are implemented by remembering
> +/* If the softmmu is enabled, the translation of Load/Store exclusive
> +   instructions will rely on the gen_helper_{ldlink,stcond} helpers,
> +   offloading most of the work to the softmmu_llsc_template.h functions.
> +   All the accesses made by the exclusive instructions include an
> +   alignment check.
> +
> +   Otherwise, these instructions are implemented by remembering
>     the value/address loaded, and seeing if these are the same
>     when the store is performed. This should be sufficient to implement
>     the architecturally mandated semantics, and avoids having to monitor
>     regular stores.
>
> -   In system emulation mode only one CPU will be running at once, so
> -   this sequence is effectively atomic.  In user emulation mode we
> -   throw an exception and handle the atomic operation elsewhere.  */
> +   In user emulation mode we throw an exception and handle the atomic
> +   operation elsewhere.  */
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +
> +#if TARGET_LONG_BITS == 32
> +#define DO_GEN_LDREX(SUFF)                                             \
> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
> +                                    TCGv_i32 index)                    \
> +{                                                                      \
> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr, index);               \
> +}
> +
> +#define DO_GEN_STREX(SUFF)                                             \
> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
> +                                    TCGv_i32 val, TCGv_i32 index)      \
> +{                                                                      \
> +    gen_helper_stcond_##SUFF(dst, cpu_env, addr, val, index);          \
> +}
> +
> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
> +{
> +    gen_helper_ldlink_i64a(dst, cpu_env, addr, index);
> +}
> +
> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
> +                                  TCGv_i32 index)
> +{
> +
> +    gen_helper_stcond_i64a(dst, cpu_env, addr, val, index);
> +}
> +#else
> +#define DO_GEN_LDREX(SUFF)                                             \
> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
> +                                         TCGv_i32 index)               \
> +{                                                                      \
> +    TCGv addr64 = tcg_temp_new();                                      \
> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr64, index);             \
> +    tcg_temp_free(addr64);                                             \
> +}
> +
> +#define DO_GEN_STREX(SUFF)                                             \
> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
> +                                    TCGv_i32 val, TCGv_i32 index)      \
> +{                                                                      \
> +    TCGv addr64 = tcg_temp_new();                                      \
> +    TCGv dst64 = tcg_temp_new();                                       \
> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
> +    gen_helper_stcond_##SUFF(dst64, cpu_env, addr64, val, index);      \
> +    tcg_gen_extrl_i64_i32(dst, dst64);                                 \
> +    tcg_temp_free(dst64);                                              \
> +    tcg_temp_free(addr64);                                             \
> +}
> +
> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
> +{
> +    TCGv addr64 = tcg_temp_new();
> +    tcg_gen_extu_i32_i64(addr64, addr);
> +    gen_helper_ldlink_i64a(dst, cpu_env, addr64, index);
> +    tcg_temp_free(addr64);
> +}
> +
> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
> +                                  TCGv_i32 index)
> +{
> +    TCGv addr64 = tcg_temp_new();
> +    TCGv dst64 = tcg_temp_new();
> +
> +    tcg_gen_extu_i32_i64(addr64, addr);
> +    gen_helper_stcond_i64a(dst64, cpu_env, addr64, val, index);
> +    tcg_gen_extrl_i64_i32(dst, dst64);
> +
> +    tcg_temp_free(dst64);
> +    tcg_temp_free(addr64);
> +}
> +#endif
> +
> +#if defined(CONFIG_ARM_USE_LDST_EXCL)
> +DO_GEN_LDREX(i8)
> +DO_GEN_LDREX(i16a)
> +DO_GEN_LDREX(i32a)
> +
> +DO_GEN_STREX(i8)
> +DO_GEN_STREX(i16a)
> +DO_GEN_STREX(i32a)
> +#endif
> +
> +static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
> +                               TCGv_i32 addr, int size)
> + {
> +    TCGv_i32 tmp = tcg_temp_new_i32();
> +    TCGv_i32 mem_idx = tcg_temp_new_i32();
> +
> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
> +
> +    if (size != 3) {
> +        switch (size) {
> +        case 0:
> +            gen_ldrex_i8(tmp, addr, mem_idx);
> +            break;
> +        case 1:
> +            gen_ldrex_i16a(tmp, addr, mem_idx);
> +            break;
> +        case 2:
> +            gen_ldrex_i32a(tmp, addr, mem_idx);
> +            break;
> +        default:
> +            abort();
> +        }
> +
> +        store_reg(s, rt, tmp);
> +    } else {
> +        TCGv_i64 tmp64 = tcg_temp_new_i64();
> +        TCGv_i32 tmph = tcg_temp_new_i32();
> +
> +        gen_ldrex_i64a(tmp64, addr, mem_idx);
> +        tcg_gen_extr_i64_i32(tmp, tmph, tmp64);
> +
> +        store_reg(s, rt, tmp);
> +        store_reg(s, rt2, tmph);
> +
> +        tcg_temp_free_i64(tmp64);
> +    }
> +
> +    tcg_temp_free_i32(mem_idx);
> +}
> +#else
>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>                                 TCGv_i32 addr, int size)
>  {
> @@ -7460,10 +7594,15 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>      store_reg(s, rt, tmp);
>      tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
>  }
> +#endif
>
>  static void gen_clrex(DisasContext *s)
>  {
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +    gen_helper_atomic_clear(cpu_env);
> +#else
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
> +#endif
>  }
>
>  #ifdef CONFIG_USER_ONLY
> @@ -7475,6 +7614,47 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>                       size | (rd << 4) | (rt << 8) | (rt2 << 12));
>      gen_exception_internal_insn(s, 4, EXCP_STREX);
>  }
> +#elif defined CONFIG_ARM_USE_LDST_EXCL
> +static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
> +                                TCGv_i32 addr, int size)
> +{
> +    TCGv_i32 tmp, mem_idx;
> +
> +    mem_idx = tcg_temp_new_i32();
> +
> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
> +    tmp = load_reg(s, rt);
> +
> +    if (size != 3) {
> +        switch (size) {
> +        case 0:
> +            gen_strex_i8(cpu_R[rd], addr, tmp, mem_idx);
> +            break;
> +        case 1:
> +            gen_strex_i16a(cpu_R[rd], addr, tmp, mem_idx);
> +            break;
> +        case 2:
> +            gen_strex_i32a(cpu_R[rd], addr, tmp, mem_idx);
> +            break;
> +        default:
> +            abort();
> +        }
> +    } else {
> +        TCGv_i64 tmp64;
> +        TCGv_i32 tmp2;
> +
> +        tmp64 = tcg_temp_new_i64();
> +        tmp2 = load_reg(s, rt2);
> +        tcg_gen_concat_i32_i64(tmp64, tmp, tmp2);
> +        gen_strex_i64a(cpu_R[rd], addr, tmp64, mem_idx);
> +
> +        tcg_temp_free_i32(tmp2);
> +        tcg_temp_free_i64(tmp64);
> +    }
> +
> +    tcg_temp_free_i32(tmp);
> +    tcg_temp_free_i32(mem_idx);
> +}
>  #else
>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>                                  TCGv_i32 addr, int size)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook Alvise Rigo
@ 2016-02-18 18:19   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-18 18:19 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> In aarch64 the LDXP/STXP instructions allow to perform up to 128 bits
> exclusive accesses. However, due to a softmmu limitation, such wide
> accesses are not allowed.
>
> To workaround this limitation, we need to support LoadLink instructions
> that cover at least 128 consecutive bits (see the next patch for more
> details).
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target-arm/cpu64.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
> index cc177bb..1d45e66 100644
> --- a/target-arm/cpu64.c
> +++ b/target-arm/cpu64.c
> @@ -287,6 +287,13 @@ static void aarch64_cpu_set_pc(CPUState *cs, vaddr value)
>      }
>  }
>
> +static void aarch64_set_excl_range(CPUState *cpu, hwaddr addr, hwaddr size)
> +{
> +    cpu->excl_protected_range.begin = addr;
> +    /* At least cover 128 bits for a STXP access (two paired doublewords case)*/
> +    cpu->excl_protected_range.end = addr + 16;
> +}
> +
>  static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
>  {
>      CPUClass *cc = CPU_CLASS(oc);
> @@ -297,6 +304,7 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
>      cc->gdb_write_register = aarch64_cpu_gdb_write_register;
>      cc->gdb_num_core_regs = 34;
>      cc->gdb_core_xml_file = "aarch64-core.xml";
> +    cc->cpu_set_excl_protected_range = aarch64_set_excl_range;
>  }
>
>  static void aarch64_cpu_register(const ARMCPUInfo *info)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions Alvise Rigo
@ 2016-02-19 11:34   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-19 11:34 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> Use the new LL/SC runtime helpers to handle the aarch64 atomic instructions
> in softmmu_llsc_template.h.
>
> The STXP emulation required a dedicated helper to handle the paired
> doubleword case.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
>  configure                  |   6 +-
>  target-arm/helper-a64.c    |  55 +++++++++++++++++++
>  target-arm/helper-a64.h    |   4 ++
>  target-arm/op_helper.c     |   8 +++
>  target-arm/translate-a64.c | 134 ++++++++++++++++++++++++++++++++++++++++++++-
>  5 files changed, 204 insertions(+), 3 deletions(-)
>
> diff --git a/configure b/configure
> index 915efcc..38121ff 100755
> --- a/configure
> +++ b/configure
> @@ -5873,9 +5873,11 @@ echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>  # Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
>  # softmmus targets
>  if test "$arm_tcg_use_llsc" = "yes" ; then
> -  if test "$target" = "arm-softmmu" ; then
> +  case "$target" in
> +    arm-softmmu | aarch64-softmmu)
>      echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
> -  fi
> +    ;;
> +  esac
>  fi

See previous comments about configure code.

>  done # for target in $targets
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index c7bfb4d..dcee66f 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -26,6 +26,7 @@
>  #include "qemu/bitops.h"
>  #include "internals.h"
>  #include "qemu/crc32c.h"
> +#include "tcg/tcg.h"
>  #include <zlib.h> /* For crc32 */
>
>  /* C2.4.7 Multiply and divide */
> @@ -443,3 +444,57 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
>      /* Linux crc32c converts the output to one's complement.  */
>      return crc32c(acc, buf, bytes) ^ 0xffffffff;
>  }
> +
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +/* STXP emulation for two 64 bit doublewords. We can't use directly two
> + * stcond_i64 accesses, otherwise the first will conclude the LL/SC pair.
> + * Instead, two normal 64-bit accesses are used and the CPUState is
> + * updated accordingly. */
> +target_ulong HELPER(stxp_i128)(CPUArchState *env, target_ulong addr,
> +                               uint64_t vall, uint64_t valh,
> +                               uint32_t mmu_idx)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TCGMemOpIdx op;
> +    target_ulong ret = 0;
> +
> +    if (!cpu->ll_sc_context) {
> +        cpu->excl_succeeded = false;
> +        ret = 1;
> +        goto out;
> +    }
> +
> +    op = make_memop_idx(MO_BEQ, mmu_idx);
> +
> +    /* According to section C6.6.191 of ARM ARM DDI 0487A.h, the access has to
> +     * be quadword aligned.  For the time being, we do not support paired STXPs
> +     * to MMIO memory, this will become trivial when the softmmu will support
> +     * 128bit memory accesses. */
> +    if (addr & 0xf) {
> +        /* TODO: Do unaligned access */

This should probably log UNIMP if you don't implement it now.

> +    }
> +
> +    /* Setting excl_succeeded to true will make the store exclusive. */
> +    cpu->excl_succeeded = true;
> +    helper_ret_stq_mmu(env, addr, vall, op, GETRA());
> +
> +    if (!cpu->excl_succeeded) {
> +        ret = 1;
> +        goto out;
> +    }
> +
> +    helper_ret_stq_mmu(env, addr + 8, valh, op, GETRA());
> +    if (!cpu->excl_succeeded) {
> +        ret = 1;
> +    } else {
> +        cpu->excl_succeeded = false;
> +    }
> +
> +out:
> +    /* Unset LL/SC context */
> +    cpu->ll_sc_context = false;
> +    cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
> +
> +    return ret;
> +}
> +#endif
> diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
> index 1d3d10f..c416a83 100644
> --- a/target-arm/helper-a64.h
> +++ b/target-arm/helper-a64.h
> @@ -46,3 +46,7 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
>  DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
>  DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
>  DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +/* STXP helper */
> +DEF_HELPER_5(stxp_i128, i64, env, i64, i64, i64, i32)
> +#endif
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index 404c13b..146fc9a 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -34,6 +34,14 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>      cs->exception_index = excp;
>      env->exception.syndrome = syndrome;
>      env->exception.target_el = target_el;
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +    HELPER(atomic_clear)(env);
> +    /* If the exception happens in the middle of a LL/SC, we need to clear
> +     * excl_succeeded to avoid that the normal store following the exception is
> +     * wrongly interpreted as exclusive.
> +     * */
> +    cs->excl_succeeded = 0;
> +#endif
>      cpu_loop_exit(cs);
>  }
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 80f6c20..f34e957 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -37,8 +37,10 @@
>  static TCGv_i64 cpu_X[32];
>  static TCGv_i64 cpu_pc;
>
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>  /* Load/store exclusive handling */
>  static TCGv_i64 cpu_exclusive_high;
> +#endif
>
>  static const char *regnames[] = {
>      "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
> @@ -94,8 +96,10 @@ void a64_translate_init(void)
>                                            regnames[i]);
>      }
>
> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>      cpu_exclusive_high = tcg_global_mem_new_i64(TCG_AREG0,
>          offsetof(CPUARMState, exclusive_high), "exclusive_high");
> +#endif
>  }
>
>  static inline ARMMMUIdx get_a64_user_mem_index(DisasContext *s)
> @@ -1219,7 +1223,11 @@ static void handle_hint(DisasContext *s, uint32_t insn,
>
>  static void gen_clrex(DisasContext *s, uint32_t insn)
>  {
> +#ifndef CONFIG_ARM_USE_LDST_EXCL
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
> +#else
> +    gen_helper_atomic_clear(cpu_env);
> +#endif
>  }
>
>  /* CLREX, DSB, DMB, ISB */
> @@ -1685,7 +1693,11 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
>  }
>
>  /*
> - * Load/Store exclusive instructions are implemented by remembering
> + * If the softmmu is enabled, the translation of Load/Store exclusive
> + * instructions will rely on the gen_helper_{ldlink,stcond} helpers,
> + * offloading most of the work to the softmmu_llsc_template.h functions.
> + *
> + * Otherwise, instructions are implemented by remembering
>   * the value/address loaded, and seeing if these are the same
>   * when the store is performed. This is not actually the architecturally
>   * mandated semantics, but it works for typical guest code sequences
> @@ -1695,6 +1707,66 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
>   * this sequence is effectively atomic.  In user emulation mode we
>   * throw an exception and handle the atomic operation elsewhere.
>   */
> +#ifdef CONFIG_ARM_USE_LDST_EXCL
> +static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
> +                               TCGv_i64 addr, int size, bool is_pair)
> +{
> +    /* In case @is_pair is set, we have to guarantee that at least the 128 bits
> +     * accessed by a Load Exclusive Pair (64-bit variant) are protected. Since
> +     * we do not have 128-bit helpers, we split the access in two halves, the
> +     * first of them will set the exclusive region to cover at least 128 bits
> +     * (this is why aarch64 has a custom cc->cpu_set_excl_protected_range which
> +     * covers 128 bits).
> +     * */
> +    TCGv_i32 mem_idx = tcg_temp_new_i32();
> +
> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
> +
> +    g_assert(size <= 3);
> +
> +    if (size < 3) {
> +        TCGv_i32 tmp = tcg_temp_new_i32();
> +
> +        switch (size) {
> +        case 0:
> +            gen_helper_ldlink_i8(tmp, cpu_env, addr, mem_idx);
> +            break;
> +        case 1:
> +            gen_helper_ldlink_i16(tmp, cpu_env, addr, mem_idx);
> +            break;
> +        case 2:
> +            gen_helper_ldlink_i32(tmp, cpu_env, addr, mem_idx);
> +            break;
> +        default:
> +            abort();
> +        }
> +
> +        TCGv_i64 tmp64 = tcg_temp_new_i64();
> +        tcg_gen_ext_i32_i64(tmp64, tmp);
> +        tcg_gen_mov_i64(cpu_reg(s, rt), tmp64);
> +
> +        tcg_temp_free_i32(tmp);
> +        tcg_temp_free_i64(tmp64);
> +    } else {
> +        gen_helper_ldlink_i64(cpu_reg(s, rt), cpu_env, addr, mem_idx);
> +    }
> +
> +    if (is_pair) {
> +        TCGMemOp memop = MO_TE + size;
> +        TCGv_i64 addr2 = tcg_temp_new_i64();
> +        TCGv_i64 hitmp = tcg_temp_new_i64();
> +
> +        g_assert(size >= 2);
> +        tcg_gen_addi_i64(addr2, addr, 1 << size);
> +        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
> +        tcg_temp_free_i64(addr2);
> +        tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
> +        tcg_temp_free_i64(hitmp);
> +    }
> +
> +    tcg_temp_free_i32(mem_idx);
> +}
> +#else
>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>                                 TCGv_i64 addr, int size, bool is_pair)
>  {
> @@ -1723,6 +1795,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>      tcg_temp_free_i64(tmp);
>      tcg_gen_mov_i64(cpu_exclusive_addr, addr);
>  }
> +#endif
>
>  #ifdef CONFIG_USER_ONLY
>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
> @@ -1733,6 +1806,65 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>                       size | is_pair << 2 | (rd << 4) | (rt << 9) | (rt2 << 14));
>      gen_exception_internal_insn(s, 4, EXCP_STREX);
>  }
> +#elif defined(CONFIG_ARM_USE_LDST_EXCL)
> +static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
> +                                TCGv_i64 addr, int size, int is_pair)
> +{
> +    /* Don't bother to check if we are actually in exclusive context since the
> +     * helpers keep care of it. */
> +    TCGv_i32 mem_idx = tcg_temp_new_i32();
> +
> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
> +
> +    g_assert(size <= 3);
> +    if (is_pair) {
> +        if (size == 3) {
> +            gen_helper_stxp_i128(cpu_reg(s, rd), cpu_env, addr, cpu_reg(s, rt),
> +                                 cpu_reg(s, rt2), mem_idx);
> +        } else if (size == 2) {
> +            /* Paired single word case. After merging the two registers into
> +             * one, we use one stcond_i64 to store the value to memory. */
> +            TCGv_i64 val = tcg_temp_new_i64();
> +            TCGv_i64 valh = tcg_temp_new_i64();
> +            tcg_gen_shli_i64(valh, cpu_reg(s, rt2), 32);
> +            tcg_gen_and_i64(val, valh, cpu_reg(s, rt));
> +            gen_helper_stcond_i64(cpu_reg(s, rd), cpu_env, addr, val, mem_idx);
> +            tcg_temp_free_i64(valh);
> +            tcg_temp_free_i64(val);
> +        } else {
> +            abort();
> +        }
> +    } else {
> +        if (size < 3) {
> +            TCGv_i32 val = tcg_temp_new_i32();
> +
> +            tcg_gen_extrl_i64_i32(val, cpu_reg(s, rt));
> +
> +            switch (size) {
> +            case 0:
> +                gen_helper_stcond_i8(cpu_reg(s, rd), cpu_env, addr, val,
> +                                     mem_idx);
> +                break;
> +            case 1:
> +                gen_helper_stcond_i16(cpu_reg(s, rd), cpu_env, addr, val,
> +                                      mem_idx);
> +                break;
> +            case 2:
> +                gen_helper_stcond_i32(cpu_reg(s, rd), cpu_env, addr, val,
> +                                      mem_idx);
> +                break;
> +            default:
> +                abort();
> +            }
> +            tcg_temp_free_i32(val);
> +        } else {
> +            gen_helper_stcond_i64(cpu_reg(s, rd), cpu_env, addr, cpu_reg(s, rt),
> +                                  mem_idx);
> +        }
> +    }
> +
> +    tcg_temp_free_i32(mem_idx);
> +}
>  #else
>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>                                  TCGv_i64 inaddr, int size, int is_pair)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation
  2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
                   ` (15 preceding siblings ...)
  2016-01-29  9:32 ` [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions Alvise Rigo
@ 2016-02-19 11:44 ` Alex Bennée
  2016-02-19 12:01   ` alvise rigo
  16 siblings, 1 reply; 50+ messages in thread
From: Alex Bennée @ 2016-02-19 11:44 UTC (permalink / raw)
  To: Alvise Rigo
  Cc: mttcg, claudio.fontana, qemu-devel, pbonzini, jani.kokkonen, tech, rth


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> This is the seventh iteration of the patch series which applies to the
> upstream branch of QEMU (v2.5.0-rc4).
>
> Changes versus previous versions are at the bottom of this cover letter.
>
> The code is also available at following repository:
> https://git.virtualopensystems.com/dev/qemu-mt.git
> branch:
> slowpath-for-atomic-v7-no-mttcg

OK I'm done on this review pass. I think generally we are in pretty good
shape although I await to see what extra needs to be done for the MTTCG
case.

We are coming up to soft-freeze on 1/3/16 and it would be nice to get
this merged by then. As it is a fairly major chunk of work it would need
to get the initial commit by that date.

However before we can get to that stage we need some review from the
maintainers. For your next version can you please:

  - Drop the RFC tag, I think we have had enough comment ;-)
  - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard Henderson)
  - Also CC the ARM maintainers (Peter M)
  - Be ready for a fast turnaround

Paolo/Richard,

Do you have any comments on this iteration?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation
  2016-02-19 11:44 ` [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alex Bennée
@ 2016-02-19 12:01   ` alvise rigo
  2016-02-19 12:19     ` Alex Bennée
  0 siblings, 1 reply; 50+ messages in thread
From: alvise rigo @ 2016-02-19 12:01 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Fri, Feb 19, 2016 at 12:44 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> This is the seventh iteration of the patch series which applies to the
>> upstream branch of QEMU (v2.5.0-rc4).
>>
>> Changes versus previous versions are at the bottom of this cover letter.
>>
>> The code is also available at following repository:
>> https://git.virtualopensystems.com/dev/qemu-mt.git
>> branch:
>> slowpath-for-atomic-v7-no-mttcg
>
> OK I'm done on this review pass. I think generally we are in pretty good
> shape although I await to see what extra needs to be done for the MTTCG
> case.

Hi Alex,

Thank you for this review. Regarding the extra needs and integration
with the MTTCG code, I've made available at this address [1] a working
branch with the two patch series merged together. The branch boots
fine Linux on both aarch64 and arm architectures. There is still that
known issue with virtio, that Fred should fix soon. Let me know your
first impressions.

[1] https://git.virtualopensystems.com/dev/qemu-mt.git (branch
"merging-slowpath-v7-mttcg-v8-wip")

Thank you,
alvise

>
> We are coming up to soft-freeze on 1/3/16 and it would be nice to get
> this merged by then. As it is a fairly major chunk of work it would need
> to get the initial commit by that date.
>
> However before we can get to that stage we need some review from the
> maintainers. For your next version can you please:
>
>   - Drop the RFC tag, I think we have had enough comment ;-)
>   - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard Henderson)
>   - Also CC the ARM maintainers (Peter M)
>   - Be ready for a fast turnaround
>
> Paolo/Richard,
>
> Do you have any comments on this iteration?
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation
  2016-02-19 12:01   ` alvise rigo
@ 2016-02-19 12:19     ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-02-19 12:19 UTC (permalink / raw)
  To: alvise rigo
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Fri, Feb 19, 2016 at 12:44 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> This is the seventh iteration of the patch series which applies to the
>>> upstream branch of QEMU (v2.5.0-rc4).
>>>
>>> Changes versus previous versions are at the bottom of this cover letter.
>>>
>>> The code is also available at following repository:
>>> https://git.virtualopensystems.com/dev/qemu-mt.git
>>> branch:
>>> slowpath-for-atomic-v7-no-mttcg
>>
>> OK I'm done on this review pass. I think generally we are in pretty good
>> shape although I await to see what extra needs to be done for the MTTCG
>> case.
>
> Hi Alex,
>
> Thank you for this review. Regarding the extra needs and integration
> with the MTTCG code, I've made available at this address [1] a working
> branch with the two patch series merged together. The branch boots
> fine Linux on both aarch64 and arm architectures. There is still that
> known issue with virtio, that Fred should fix soon. Let me know your
> first impressions.

Does the virtio problem it go away if you drop the top patch (as per my last email)?

>
> [1] https://git.virtualopensystems.com/dev/qemu-mt.git (branch
> "merging-slowpath-v7-mttcg-v8-wip")

Thanks I'll have a look next week.

>
> Thank you,
> alvise
>
>>
>> We are coming up to soft-freeze on 1/3/16 and it would be nice to get
>> this merged by then. As it is a fairly major chunk of work it would need
>> to get the initial commit by that date.
>>
>> However before we can get to that stage we need some review from the
>> maintainers. For your next version can you please:
>>
>>   - Drop the RFC tag, I think we have had enough comment ;-)
>>   - Make sure you CC the TCG maintainers (Paolo, Peter C and Richard Henderson)
>>   - Also CC the ARM maintainers (Peter M)
>>   - Be ready for a fast turnaround
>>
>> Paolo/Richard,
>>
>> Do you have any comments on this iteration?
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled
  2016-02-18 16:40   ` Alex Bennée
  2016-02-18 16:43     ` Alex Bennée
@ 2016-03-07 17:21     ` alvise rigo
  1 sibling, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-03-07 17:21 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Thu, Feb 18, 2016 at 5:40 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> Use the new slow path for atomic instruction translation when the
>> softmmu is enabled.
>>
>> At the moment only arm and aarch64 use the new LL/SC backend. It is
>> possible to disable such backed with --disable-arm-llsc-backend.
>
> Do we want to disable the backend once it is merged? Does it serve a
> purpose other than to confuse the user?

I added this option in order to have a quick way to compile a binary
with an without backend, it has been useful in the development
process. Now it's probably time to drop it.

Thank you,
alvise

>
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  configure | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/configure b/configure
>> index 44ac9ab..915efcc 100755
>> --- a/configure
>> +++ b/configure
>> @@ -294,6 +294,7 @@ solaris="no"
>>  profiler="no"
>>  cocoa="no"
>>  softmmu="yes"
>> +arm_tcg_use_llsc="yes"
>>  linux_user="no"
>>  bsd_user="no"
>>  aix="no"
>> @@ -880,6 +881,10 @@ for opt do
>>    ;;
>>    --disable-debug-tcg) debug_tcg="no"
>>    ;;
>> +  --enable-arm-llsc-backend) arm_tcg_use_llsc="yes"
>> +  ;;
>> +  --disable-arm-llsc-backend) arm_tcg_use_llsc="no"
>> +  ;;
>>    --enable-debug)
>>        # Enable debugging options that aren't excessively noisy
>>        debug_tcg="yes"
>> @@ -4751,6 +4756,7 @@ echo "host CPU          $cpu"
>>  echo "host big endian   $bigendian"
>>  echo "target list       $target_list"
>>  echo "tcg debug enabled $debug_tcg"
>> +echo "arm use llsc backend" $arm_tcg_use_llsc
>>  echo "gprof enabled     $gprof"
>>  echo "sparse enabled    $sparse"
>>  echo "strip binaries    $strip_opt"
>> @@ -4806,6 +4812,7 @@ echo "Install blobs     $blobs"
>>  echo "KVM support       $kvm"
>>  echo "RDMA support      $rdma"
>>  echo "TCG interpreter   $tcg_interpreter"
>> +echo "use ld/st excl    $softmmu"
>
> I think we can drop everything above here.
>
>>  echo "fdt support       $fdt"
>>  echo "preadv support    $preadv"
>>  echo "fdatasync         $fdatasync"
>> @@ -5863,6 +5870,13 @@ fi
>>  echo "LDFLAGS+=$ldflags" >> $config_target_mak
>>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
>>
>> +# Use tcg LL/SC tcg backend for exclusive instruction is arm/aarch64
>> +# softmmus targets
>> +if test "$arm_tcg_use_llsc" = "yes" ; then
>> +  if test "$target" = "arm-softmmu" ; then
>> +    echo "CONFIG_ARM_USE_LDST_EXCL=y" >> $config_target_mak
>> +  fi
>> +fi
>
> This isn't going to be just ARM specific and it will be progressively
> turned on for other arches. So perhaps with the CONFIG_SOFTMMU section:
>
> if test "$target_softmmu" = "yes" ; then
>     echo "CONFIG_SOFTMMU=y" >> $config_target_mak
>
>     # Use SoftMMU LL/SC primitives?
>     case "$target_name" in
>         arm | aarch64)
>             echo "CONFIG_USE_LDST_EXCL=y" >> $config_target_mak
>             ;;
>     esac
> fi
>
>
>>  done # for target in $targets
>>
>>  if [ "$pixman" = "internal" ]; then
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range
  2016-02-18 16:25       ` Alex Bennée
@ 2016-03-07 18:13         ` alvise rigo
  0 siblings, 0 replies; 50+ messages in thread
From: alvise rigo @ 2016-03-07 18:13 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Thu, Feb 18, 2016 at 5:25 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> alvise rigo <a.rigo@virtualopensystems.com> writes:
>
>> On Wed, Feb 17, 2016 at 7:55 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>>
>>>> As for the RAM case, also the MMIO exclusive ranges have to be protected
>>>> by other CPU's accesses. In order to do that, we flag the accessed
>>>> MemoryRegion to mark that an exclusive access has been performed and is
>>>> not concluded yet.
>>>>
>>>> This flag will force the other CPUs to invalidate the exclusive range in
>>>> case of collision.
>>>>
>>>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>>>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>>> ---
>>>>  cputlb.c                | 20 +++++++++++++-------
>>>>  include/exec/memory.h   |  1 +
>>>>  softmmu_llsc_template.h | 11 +++++++----
>>>>  softmmu_template.h      | 22 ++++++++++++++++++++++
>>>>  4 files changed, 43 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/cputlb.c b/cputlb.c
>>>> index 87d09c8..06ce2da 100644
>>>> --- a/cputlb.c
>>>> +++ b/cputlb.c
>>>> @@ -496,19 +496,25 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>>>  /* For every vCPU compare the exclusive address and reset it in case of a
>>>>   * match. Since only one vCPU is running at once, no lock has to be held to
>>>>   * guard this operation. */
>>>> -static inline void lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>>> +static inline bool lookup_and_reset_cpus_ll_addr(hwaddr addr, hwaddr size)
>>>>  {
>>>>      CPUState *cpu;
>>>> +    bool ret = false;
>>>>
>>>>      CPU_FOREACH(cpu) {
>>>> -        if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>>>> -            ranges_overlap(cpu->excl_protected_range.begin,
>>>> -                           cpu->excl_protected_range.end -
>>>> -                           cpu->excl_protected_range.begin,
>>>> -                           addr, size)) {
>>>> -            cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>>>> +        if (current_cpu != cpu) {
>>>
>>> I'm confused by this change. I don't see anywhere in the MMIO handling
>>> why we would want to change skipping the CPU. Perhaps this belongs in
>>> the previous patch? Maybe the function should really be
>>> lookup_and_maybe_reset_other_cpu_ll_addr?
>>
>> This is actually used later on in this patch.
>
> But aren't there other users before the functional change was made to
> skip the current_cpu? Where their expectations wrong or should we have
> always skipped the current CPU?

I see your point now. When current_cpu was skipped, there was no need
of the line
cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
in helper_stcond_name() when we return back from softmmu_template.h.

The error is that that line should have been added in this patch, not
in PATCH 07/16. Fixing it for the next version.

>
> The additional of the bool return I agree only needs to be brought in
> now when there are functions that care.
>
>>
>>>
>>>> +            if (cpu->excl_protected_range.begin != EXCLUSIVE_RESET_ADDR &&
>>>> +                ranges_overlap(cpu->excl_protected_range.begin,
>>>> +                               cpu->excl_protected_range.end -
>>>> +                               cpu->excl_protected_range.begin,
>>>> +                               addr, size)) {
>>>> +                cpu->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>>>> +                ret = true;
>>>> +            }
>>>>          }
>>>>      }
>>>> +
>>>> +    return ret;
>>>>  }
>>>>
>>>>  #define MMUSUFFIX _mmu
>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>>> index 71e0480..bacb3ad 100644
>>>> --- a/include/exec/memory.h
>>>> +++ b/include/exec/memory.h
>>>> @@ -171,6 +171,7 @@ struct MemoryRegion {
>>>>      bool rom_device;
>>>>      bool flush_coalesced_mmio;
>>>>      bool global_locking;
>>>> +    bool pending_excl_access; /* A vCPU issued an exclusive access */
>>>>      uint8_t dirty_log_mask;
>>>>      ram_addr_t ram_addr;
>>>>      Object *owner;
>>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>>> index 101f5e8..b4712ba 100644
>>>> --- a/softmmu_llsc_template.h
>>>> +++ b/softmmu_llsc_template.h
>>>> @@ -81,15 +81,18 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>                  }
>>>>              }
>>>>          }
>>>> +        /* For this vCPU, just update the TLB entry, no need to flush. */
>>>> +        env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>>      } else {
>>>> -        hw_error("EXCL accesses to MMIO regions not supported yet.");
>>>> +        /* Set a pending exclusive access in the MemoryRegion */
>>>> +        MemoryRegion *mr = iotlb_to_region(this,
>>>> +                                           env->iotlb[mmu_idx][index].addr,
>>>> +                                           env->iotlb[mmu_idx][index].attrs);
>>>> +        mr->pending_excl_access = true;
>>>>      }
>>>>
>>>>      cc->cpu_set_excl_protected_range(this, hw_addr, DATA_SIZE);
>>>>
>>>> -    /* For this vCPU, just update the TLB entry, no need to flush. */
>>>> -    env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>> -
>>>>      /* From now on we are in LL/SC context */
>>>>      this->ll_sc_context = true;
>>>>
>>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>>> index c54bdc9..71c5152 100644
>>>> --- a/softmmu_template.h
>>>> +++ b/softmmu_template.h
>>>> @@ -360,6 +360,14 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>>>      MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>>>>
>>>>      physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>>> +
>>>> +    /* Invalidate the exclusive range that overlaps this access */
>>>> +    if (mr->pending_excl_access) {
>>>> +        if (lookup_and_reset_cpus_ll_addr(physaddr, 1 << SHIFT)) {
>>
>> Here precisely. As you wrote, we can rename it to
>> lookup_and_maybe_reset_other_cpu_ll_addr even if this name does not
>> convince me. What about other_cpus_reset_colliding_ll_addr?
>
> We want as short and semantically informative as possible. Naming things is hard ;-)
>
>  - reset_other_cpus_colliding_ll_addr
>  - reset_other_cpus_overlapping_ll_addr
>
> Any other options?

Umm, these sound fine to me. Probably the first one since shorter.

Thank you,
alvise

>
>>
>> Thank you,
>> alvise
>>
>>>> +            mr->pending_excl_access = false;
>>>> +        }
>>>> +    }
>>>> +
>>>>      if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>>>>          cpu_io_recompile(cpu, retaddr);
>>>>      }
>>>> @@ -504,6 +512,13 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>                  glue(helper_le_st_name, _do_mmio_access)(env, val, addr, oi,
>>>>                                                           mmu_idx, index,
>>>>                                                           retaddr);
>>>> +                /* N.B.: Here excl_succeeded == true means that this access
>>>> +                 * comes from an exclusive instruction. */
>>>> +                if (cpu->excl_succeeded) {
>>>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>>>> +                                                       iotlbentry->attrs);
>>>> +                    mr->pending_excl_access = false;
>>>> +                }
>>>>              } else {
>>>>                  glue(helper_le_st_name, _do_ram_access)(env, val, addr, oi,
>>>>                                                          mmu_idx, index,
>>>> @@ -655,6 +670,13 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>                  glue(helper_be_st_name, _do_mmio_access)(env, val, addr, oi,
>>>>                                                           mmu_idx, index,
>>>>                                                           retaddr);
>>>> +                /* N.B.: Here excl_succeeded == true means that this access
>>>> +                 * comes from an exclusive instruction. */
>>>> +                if (cpu->excl_succeeded) {
>>>> +                    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
>>>> +                                                       iotlbentry->attrs);
>>>> +                    mr->pending_excl_access = false;
>>>> +                }
>>>
>>> My comments about duplication on previous patches still stand.
>>
>> Indeed.
>>
>> Thank you,
>> alvise
>>
>>>
>>>>              } else {
>>>>                  glue(helper_be_st_name, _do_ram_access)(env, val, addr, oi,
>>>>                                                          mmu_idx, index,
>>>
>>>
>>> --
>>> Alex Bennée
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns
  2016-02-18 17:02   ` Alex Bennée
@ 2016-03-07 18:39     ` alvise rigo
  2016-03-07 20:06       ` Alex Bennée
  0 siblings, 1 reply; 50+ messages in thread
From: alvise rigo @ 2016-03-07 18:39 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson

On Thu, Feb 18, 2016 at 6:02 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
>> Use the new LL/SC runtime helpers to handle the ARM atomic instructions
>> in softmmu_llsc_template.h.
>>
>> In general, the helper generator
>> gen_{ldrex,strex}_{8,16a,32a,64a}() calls the function
>> helper_{le,be}_{ldlink,stcond}{ub,uw,ul,q}_mmu() implemented in
>> softmmu_llsc_template.h, doing an alignment check.
>>
>> In addition, add a simple helper function to emulate the CLREX instruction.
>>
>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>> ---
>>  target-arm/cpu.h       |   2 +
>>  target-arm/helper.h    |   4 ++
>>  target-arm/machine.c   |   2 +
>>  target-arm/op_helper.c |  10 +++
>>  target-arm/translate.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  5 files changed, 202 insertions(+), 4 deletions(-)
>>
>> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
>> index b8b3364..bb5361f 100644
>> --- a/target-arm/cpu.h
>> +++ b/target-arm/cpu.h
>> @@ -462,9 +462,11 @@ typedef struct CPUARMState {
>>          float_status fp_status;
>>          float_status standard_fp_status;
>>      } vfp;
>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>      uint64_t exclusive_addr;
>>      uint64_t exclusive_val;
>>      uint64_t exclusive_high;
>> +#endif
>>  #if defined(CONFIG_USER_ONLY)
>>      uint64_t exclusive_test;
>>      uint32_t exclusive_info;
>> diff --git a/target-arm/helper.h b/target-arm/helper.h
>> index c2a85c7..6bc3c0a 100644
>> --- a/target-arm/helper.h
>> +++ b/target-arm/helper.h
>> @@ -532,6 +532,10 @@ DEF_HELPER_2(dc_zva, void, env, i64)
>>  DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>  DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>
>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>> +DEF_HELPER_1(atomic_clear, void, env)
>> +#endif
>> +
>>  #ifdef TARGET_AARCH64
>>  #include "helper-a64.h"
>>  #endif
>> diff --git a/target-arm/machine.c b/target-arm/machine.c
>> index ed1925a..7adfb4d 100644
>> --- a/target-arm/machine.c
>> +++ b/target-arm/machine.c
>> @@ -309,9 +309,11 @@ const VMStateDescription vmstate_arm_cpu = {
>>          VMSTATE_VARRAY_INT32(cpreg_vmstate_values, ARMCPU,
>>                               cpreg_vmstate_array_len,
>>                               0, vmstate_info_uint64, uint64_t),
>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>          VMSTATE_UINT64(env.exclusive_addr, ARMCPU),
>>          VMSTATE_UINT64(env.exclusive_val, ARMCPU),
>>          VMSTATE_UINT64(env.exclusive_high, ARMCPU),
>> +#endif
>
> Hmm this does imply we either need to support migration of the LL/SC
> state in the generic code or map the generic state into the ARM specific
> machine state or we'll break migration.
>
> The later if probably better so you can save machine state from a
> pre-LL/SC build and migrate to a new LL/SC enabled build.

This basically would require to add in cpu_pre_save some code to copy
env.exclusive_* to the new structures. As a consequence, this will not
get rid of the variables pre-LL/SC.

>
>>          VMSTATE_UINT64(env.features, ARMCPU),
>>          VMSTATE_UINT32(env.exception.syndrome, ARMCPU),
>>          VMSTATE_UINT32(env.exception.fsr, ARMCPU),
>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>> index a5ee65f..404c13b 100644
>> --- a/target-arm/op_helper.c
>> +++ b/target-arm/op_helper.c
>> @@ -51,6 +51,14 @@ static int exception_target_el(CPUARMState *env)
>>      return target_el;
>>  }
>>
>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>> +void HELPER(atomic_clear)(CPUARMState *env)
>> +{
>> +    ENV_GET_CPU(env)->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>> +    ENV_GET_CPU(env)->ll_sc_context = false;
>> +}
>> +#endif
>> +
>
> Given this is just touching generic CPU state this helper should probably be
> part of the generic TCG runtime. I assume other arches will just call
> this helper as well.

Would it make sense instead to add a new CPUClass hook for this? Other
architectures might want a different behaviour (or add something
else).

Thank you,
alvise

>
>
>>  uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>                            uint32_t rn, uint32_t maxindex)
>>  {
>> @@ -689,7 +697,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>
>>      aarch64_save_sp(env, cur_el);
>>
>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>      env->exclusive_addr = -1;
>> +#endif
>>
>>      /* We must squash the PSTATE.SS bit to zero unless both of the
>>       * following hold:
>> diff --git a/target-arm/translate.c b/target-arm/translate.c
>> index cff511b..5150841 100644
>> --- a/target-arm/translate.c
>> +++ b/target-arm/translate.c
>> @@ -60,8 +60,10 @@ TCGv_ptr cpu_env;
>>  static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
>>  static TCGv_i32 cpu_R[16];
>>  TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>  TCGv_i64 cpu_exclusive_addr;
>>  TCGv_i64 cpu_exclusive_val;
>> +#endif
>>  #ifdef CONFIG_USER_ONLY
>>  TCGv_i64 cpu_exclusive_test;
>>  TCGv_i32 cpu_exclusive_info;
>> @@ -94,10 +96,12 @@ void arm_translate_init(void)
>>      cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
>>      cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
>>
>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>      cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
>>          offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
>>      cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
>>          offsetof(CPUARMState, exclusive_val), "exclusive_val");
>> +#endif
>>  #ifdef CONFIG_USER_ONLY
>>      cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
>>          offsetof(CPUARMState, exclusive_test), "exclusive_test");
>> @@ -7413,15 +7417,145 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
>>      tcg_gen_or_i32(cpu_ZF, lo, hi);
>>  }
>>
>> -/* Load/Store exclusive instructions are implemented by remembering
>> +/* If the softmmu is enabled, the translation of Load/Store exclusive
>> +   instructions will rely on the gen_helper_{ldlink,stcond} helpers,
>> +   offloading most of the work to the softmmu_llsc_template.h functions.
>> +   All the accesses made by the exclusive instructions include an
>> +   alignment check.
>> +
>> +   Otherwise, these instructions are implemented by remembering
>>     the value/address loaded, and seeing if these are the same
>>     when the store is performed. This should be sufficient to implement
>>     the architecturally mandated semantics, and avoids having to monitor
>>     regular stores.
>>
>> -   In system emulation mode only one CPU will be running at once, so
>> -   this sequence is effectively atomic.  In user emulation mode we
>> -   throw an exception and handle the atomic operation elsewhere.  */
>> +   In user emulation mode we throw an exception and handle the atomic
>> +   operation elsewhere.  */
>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>> +
>> +#if TARGET_LONG_BITS == 32
>> +#define DO_GEN_LDREX(SUFF)                                             \
>> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>> +                                    TCGv_i32 index)                    \
>> +{                                                                      \
>> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr, index);               \
>> +}
>> +
>> +#define DO_GEN_STREX(SUFF)                                             \
>> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>> +                                    TCGv_i32 val, TCGv_i32 index)      \
>> +{                                                                      \
>> +    gen_helper_stcond_##SUFF(dst, cpu_env, addr, val, index);          \
>> +}
>> +
>> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
>> +{
>> +    gen_helper_ldlink_i64a(dst, cpu_env, addr, index);
>> +}
>> +
>> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
>> +                                  TCGv_i32 index)
>> +{
>> +
>> +    gen_helper_stcond_i64a(dst, cpu_env, addr, val, index);
>> +}
>> +#else
>> +#define DO_GEN_LDREX(SUFF)                                             \
>> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>> +                                         TCGv_i32 index)               \
>> +{                                                                      \
>> +    TCGv addr64 = tcg_temp_new();                                      \
>> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
>> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr64, index);             \
>> +    tcg_temp_free(addr64);                                             \
>> +}
>> +
>> +#define DO_GEN_STREX(SUFF)                                             \
>> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>> +                                    TCGv_i32 val, TCGv_i32 index)      \
>> +{                                                                      \
>> +    TCGv addr64 = tcg_temp_new();                                      \
>> +    TCGv dst64 = tcg_temp_new();                                       \
>> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
>> +    gen_helper_stcond_##SUFF(dst64, cpu_env, addr64, val, index);      \
>> +    tcg_gen_extrl_i64_i32(dst, dst64);                                 \
>> +    tcg_temp_free(dst64);                                              \
>> +    tcg_temp_free(addr64);                                             \
>> +}
>> +
>> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
>> +{
>> +    TCGv addr64 = tcg_temp_new();
>> +    tcg_gen_extu_i32_i64(addr64, addr);
>> +    gen_helper_ldlink_i64a(dst, cpu_env, addr64, index);
>> +    tcg_temp_free(addr64);
>> +}
>> +
>> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
>> +                                  TCGv_i32 index)
>> +{
>> +    TCGv addr64 = tcg_temp_new();
>> +    TCGv dst64 = tcg_temp_new();
>> +
>> +    tcg_gen_extu_i32_i64(addr64, addr);
>> +    gen_helper_stcond_i64a(dst64, cpu_env, addr64, val, index);
>> +    tcg_gen_extrl_i64_i32(dst, dst64);
>> +
>> +    tcg_temp_free(dst64);
>> +    tcg_temp_free(addr64);
>> +}
>> +#endif
>> +
>> +#if defined(CONFIG_ARM_USE_LDST_EXCL)
>> +DO_GEN_LDREX(i8)
>> +DO_GEN_LDREX(i16a)
>> +DO_GEN_LDREX(i32a)
>> +
>> +DO_GEN_STREX(i8)
>> +DO_GEN_STREX(i16a)
>> +DO_GEN_STREX(i32a)
>> +#endif
>> +
>> +static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>> +                               TCGv_i32 addr, int size)
>> + {
>> +    TCGv_i32 tmp = tcg_temp_new_i32();
>> +    TCGv_i32 mem_idx = tcg_temp_new_i32();
>> +
>> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
>> +
>> +    if (size != 3) {
>> +        switch (size) {
>> +        case 0:
>> +            gen_ldrex_i8(tmp, addr, mem_idx);
>> +            break;
>> +        case 1:
>> +            gen_ldrex_i16a(tmp, addr, mem_idx);
>> +            break;
>> +        case 2:
>> +            gen_ldrex_i32a(tmp, addr, mem_idx);
>> +            break;
>> +        default:
>> +            abort();
>> +        }
>> +
>> +        store_reg(s, rt, tmp);
>> +    } else {
>> +        TCGv_i64 tmp64 = tcg_temp_new_i64();
>> +        TCGv_i32 tmph = tcg_temp_new_i32();
>> +
>> +        gen_ldrex_i64a(tmp64, addr, mem_idx);
>> +        tcg_gen_extr_i64_i32(tmp, tmph, tmp64);
>> +
>> +        store_reg(s, rt, tmp);
>> +        store_reg(s, rt2, tmph);
>> +
>> +        tcg_temp_free_i64(tmp64);
>> +    }
>> +
>> +    tcg_temp_free_i32(mem_idx);
>> +}
>> +#else
>>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>                                 TCGv_i32 addr, int size)
>>  {
>> @@ -7460,10 +7594,15 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>      store_reg(s, rt, tmp);
>>      tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
>>  }
>> +#endif
>>
>>  static void gen_clrex(DisasContext *s)
>>  {
>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>> +    gen_helper_atomic_clear(cpu_env);
>> +#else
>>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
>> +#endif
>>  }
>>
>>  #ifdef CONFIG_USER_ONLY
>> @@ -7475,6 +7614,47 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>>                       size | (rd << 4) | (rt << 8) | (rt2 << 12));
>>      gen_exception_internal_insn(s, 4, EXCP_STREX);
>>  }
>> +#elif defined CONFIG_ARM_USE_LDST_EXCL
>> +static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>> +                                TCGv_i32 addr, int size)
>> +{
>> +    TCGv_i32 tmp, mem_idx;
>> +
>> +    mem_idx = tcg_temp_new_i32();
>> +
>> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
>> +    tmp = load_reg(s, rt);
>> +
>> +    if (size != 3) {
>> +        switch (size) {
>> +        case 0:
>> +            gen_strex_i8(cpu_R[rd], addr, tmp, mem_idx);
>> +            break;
>> +        case 1:
>> +            gen_strex_i16a(cpu_R[rd], addr, tmp, mem_idx);
>> +            break;
>> +        case 2:
>> +            gen_strex_i32a(cpu_R[rd], addr, tmp, mem_idx);
>> +            break;
>> +        default:
>> +            abort();
>> +        }
>> +    } else {
>> +        TCGv_i64 tmp64;
>> +        TCGv_i32 tmp2;
>> +
>> +        tmp64 = tcg_temp_new_i64();
>> +        tmp2 = load_reg(s, rt2);
>> +        tcg_gen_concat_i32_i64(tmp64, tmp, tmp2);
>> +        gen_strex_i64a(cpu_R[rd], addr, tmp64, mem_idx);
>> +
>> +        tcg_temp_free_i32(tmp2);
>> +        tcg_temp_free_i64(tmp64);
>> +    }
>> +
>> +    tcg_temp_free_i32(tmp);
>> +    tcg_temp_free_i32(mem_idx);
>> +}
>>  #else
>>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>>                                  TCGv_i32 addr, int size)
>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns
  2016-03-07 18:39     ` alvise rigo
@ 2016-03-07 20:06       ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2016-03-07 20:06 UTC (permalink / raw)
  To: alvise rigo
  Cc: MTTCG Devel, Claudio Fontana, QEMU Developers, Paolo Bonzini,
	Jani Kokkonen, VirtualOpenSystems Technical Team,
	Richard Henderson


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Thu, Feb 18, 2016 at 6:02 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> Use the new LL/SC runtime helpers to handle the ARM atomic instructions
>>> in softmmu_llsc_template.h.
>>>
>>> In general, the helper generator
>>> gen_{ldrex,strex}_{8,16a,32a,64a}() calls the function
>>> helper_{le,be}_{ldlink,stcond}{ub,uw,ul,q}_mmu() implemented in
>>> softmmu_llsc_template.h, doing an alignment check.
>>>
>>> In addition, add a simple helper function to emulate the CLREX instruction.
>>>
>>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>> ---
>>>  target-arm/cpu.h       |   2 +
>>>  target-arm/helper.h    |   4 ++
>>>  target-arm/machine.c   |   2 +
>>>  target-arm/op_helper.c |  10 +++
>>>  target-arm/translate.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++--
>>>  5 files changed, 202 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
>>> index b8b3364..bb5361f 100644
>>> --- a/target-arm/cpu.h
>>> +++ b/target-arm/cpu.h
>>> @@ -462,9 +462,11 @@ typedef struct CPUARMState {
>>>          float_status fp_status;
>>>          float_status standard_fp_status;
>>>      } vfp;
>>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>>      uint64_t exclusive_addr;
>>>      uint64_t exclusive_val;
>>>      uint64_t exclusive_high;
>>> +#endif
>>>  #if defined(CONFIG_USER_ONLY)
>>>      uint64_t exclusive_test;
>>>      uint32_t exclusive_info;
>>> diff --git a/target-arm/helper.h b/target-arm/helper.h
>>> index c2a85c7..6bc3c0a 100644
>>> --- a/target-arm/helper.h
>>> +++ b/target-arm/helper.h
>>> @@ -532,6 +532,10 @@ DEF_HELPER_2(dc_zva, void, env, i64)
>>>  DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>>  DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>>
>>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>>> +DEF_HELPER_1(atomic_clear, void, env)
>>> +#endif
>>> +
>>>  #ifdef TARGET_AARCH64
>>>  #include "helper-a64.h"
>>>  #endif
>>> diff --git a/target-arm/machine.c b/target-arm/machine.c
>>> index ed1925a..7adfb4d 100644
>>> --- a/target-arm/machine.c
>>> +++ b/target-arm/machine.c
>>> @@ -309,9 +309,11 @@ const VMStateDescription vmstate_arm_cpu = {
>>>          VMSTATE_VARRAY_INT32(cpreg_vmstate_values, ARMCPU,
>>>                               cpreg_vmstate_array_len,
>>>                               0, vmstate_info_uint64, uint64_t),
>>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>>          VMSTATE_UINT64(env.exclusive_addr, ARMCPU),
>>>          VMSTATE_UINT64(env.exclusive_val, ARMCPU),
>>>          VMSTATE_UINT64(env.exclusive_high, ARMCPU),
>>> +#endif
>>
>> Hmm this does imply we either need to support migration of the LL/SC
>> state in the generic code or map the generic state into the ARM specific
>> machine state or we'll break migration.
>>
>> The later if probably better so you can save machine state from a
>> pre-LL/SC build and migrate to a new LL/SC enabled build.
>
> This basically would require to add in cpu_pre_save some code to copy
> env.exclusive_* to the new structures. As a consequence, this will not
> get rid of the variables pre-LL/SC.

I wonder what the others think but I think breaking any existing TCG
state on migration would be against the spirit of the thing, but I could
be wrong. If we do break migration we'll need to at least bump the
version tag.

>
>>
>>>          VMSTATE_UINT64(env.features, ARMCPU),
>>>          VMSTATE_UINT32(env.exception.syndrome, ARMCPU),
>>>          VMSTATE_UINT32(env.exception.fsr, ARMCPU),
>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>> index a5ee65f..404c13b 100644
>>> --- a/target-arm/op_helper.c
>>> +++ b/target-arm/op_helper.c
>>> @@ -51,6 +51,14 @@ static int exception_target_el(CPUARMState *env)
>>>      return target_el;
>>>  }
>>>
>>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>>> +void HELPER(atomic_clear)(CPUARMState *env)
>>> +{
>>> +    ENV_GET_CPU(env)->excl_protected_range.begin = EXCLUSIVE_RESET_ADDR;
>>> +    ENV_GET_CPU(env)->ll_sc_context = false;
>>> +}
>>> +#endif
>>> +
>>
>> Given this is just touching generic CPU state this helper should probably be
>> part of the generic TCG runtime. I assume other arches will just call
>> this helper as well.
>
> Would it make sense instead to add a new CPUClass hook for this? Other
> architectures might want a different behaviour (or add something
> else).

Yes. Best of both worlds a generic helper with an option to vary it if needed.

>
> Thank you,
> alvise
>
>>
>>
>>>  uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>>                            uint32_t rn, uint32_t maxindex)
>>>  {
>>> @@ -689,7 +697,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>
>>>      aarch64_save_sp(env, cur_el);
>>>
>>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>>      env->exclusive_addr = -1;
>>> +#endif
>>>
>>>      /* We must squash the PSTATE.SS bit to zero unless both of the
>>>       * following hold:
>>> diff --git a/target-arm/translate.c b/target-arm/translate.c
>>> index cff511b..5150841 100644
>>> --- a/target-arm/translate.c
>>> +++ b/target-arm/translate.c
>>> @@ -60,8 +60,10 @@ TCGv_ptr cpu_env;
>>>  static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
>>>  static TCGv_i32 cpu_R[16];
>>>  TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
>>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>>  TCGv_i64 cpu_exclusive_addr;
>>>  TCGv_i64 cpu_exclusive_val;
>>> +#endif
>>>  #ifdef CONFIG_USER_ONLY
>>>  TCGv_i64 cpu_exclusive_test;
>>>  TCGv_i32 cpu_exclusive_info;
>>> @@ -94,10 +96,12 @@ void arm_translate_init(void)
>>>      cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
>>>      cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
>>>
>>> +#if !defined(CONFIG_ARM_USE_LDST_EXCL)
>>>      cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
>>>          offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
>>>      cpu_exclusive_val = tcg_global_mem_new_i64(TCG_AREG0,
>>>          offsetof(CPUARMState, exclusive_val), "exclusive_val");
>>> +#endif
>>>  #ifdef CONFIG_USER_ONLY
>>>      cpu_exclusive_test = tcg_global_mem_new_i64(TCG_AREG0,
>>>          offsetof(CPUARMState, exclusive_test), "exclusive_test");
>>> @@ -7413,15 +7417,145 @@ static void gen_logicq_cc(TCGv_i32 lo, TCGv_i32 hi)
>>>      tcg_gen_or_i32(cpu_ZF, lo, hi);
>>>  }
>>>
>>> -/* Load/Store exclusive instructions are implemented by remembering
>>> +/* If the softmmu is enabled, the translation of Load/Store exclusive
>>> +   instructions will rely on the gen_helper_{ldlink,stcond} helpers,
>>> +   offloading most of the work to the softmmu_llsc_template.h functions.
>>> +   All the accesses made by the exclusive instructions include an
>>> +   alignment check.
>>> +
>>> +   Otherwise, these instructions are implemented by remembering
>>>     the value/address loaded, and seeing if these are the same
>>>     when the store is performed. This should be sufficient to implement
>>>     the architecturally mandated semantics, and avoids having to monitor
>>>     regular stores.
>>>
>>> -   In system emulation mode only one CPU will be running at once, so
>>> -   this sequence is effectively atomic.  In user emulation mode we
>>> -   throw an exception and handle the atomic operation elsewhere.  */
>>> +   In user emulation mode we throw an exception and handle the atomic
>>> +   operation elsewhere.  */
>>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>>> +
>>> +#if TARGET_LONG_BITS == 32
>>> +#define DO_GEN_LDREX(SUFF)                                             \
>>> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>>> +                                    TCGv_i32 index)                    \
>>> +{                                                                      \
>>> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr, index);               \
>>> +}
>>> +
>>> +#define DO_GEN_STREX(SUFF)                                             \
>>> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>>> +                                    TCGv_i32 val, TCGv_i32 index)      \
>>> +{                                                                      \
>>> +    gen_helper_stcond_##SUFF(dst, cpu_env, addr, val, index);          \
>>> +}
>>> +
>>> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
>>> +{
>>> +    gen_helper_ldlink_i64a(dst, cpu_env, addr, index);
>>> +}
>>> +
>>> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
>>> +                                  TCGv_i32 index)
>>> +{
>>> +
>>> +    gen_helper_stcond_i64a(dst, cpu_env, addr, val, index);
>>> +}
>>> +#else
>>> +#define DO_GEN_LDREX(SUFF)                                             \
>>> +static inline void gen_ldrex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>>> +                                         TCGv_i32 index)               \
>>> +{                                                                      \
>>> +    TCGv addr64 = tcg_temp_new();                                      \
>>> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
>>> +    gen_helper_ldlink_##SUFF(dst, cpu_env, addr64, index);             \
>>> +    tcg_temp_free(addr64);                                             \
>>> +}
>>> +
>>> +#define DO_GEN_STREX(SUFF)                                             \
>>> +static inline void gen_strex_##SUFF(TCGv_i32 dst, TCGv_i32 addr,       \
>>> +                                    TCGv_i32 val, TCGv_i32 index)      \
>>> +{                                                                      \
>>> +    TCGv addr64 = tcg_temp_new();                                      \
>>> +    TCGv dst64 = tcg_temp_new();                                       \
>>> +    tcg_gen_extu_i32_i64(addr64, addr);                                \
>>> +    gen_helper_stcond_##SUFF(dst64, cpu_env, addr64, val, index);      \
>>> +    tcg_gen_extrl_i64_i32(dst, dst64);                                 \
>>> +    tcg_temp_free(dst64);                                              \
>>> +    tcg_temp_free(addr64);                                             \
>>> +}
>>> +
>>> +static inline void gen_ldrex_i64a(TCGv_i64 dst, TCGv_i32 addr, TCGv_i32 index)
>>> +{
>>> +    TCGv addr64 = tcg_temp_new();
>>> +    tcg_gen_extu_i32_i64(addr64, addr);
>>> +    gen_helper_ldlink_i64a(dst, cpu_env, addr64, index);
>>> +    tcg_temp_free(addr64);
>>> +}
>>> +
>>> +static inline void gen_strex_i64a(TCGv_i32 dst, TCGv_i32 addr, TCGv_i64 val,
>>> +                                  TCGv_i32 index)
>>> +{
>>> +    TCGv addr64 = tcg_temp_new();
>>> +    TCGv dst64 = tcg_temp_new();
>>> +
>>> +    tcg_gen_extu_i32_i64(addr64, addr);
>>> +    gen_helper_stcond_i64a(dst64, cpu_env, addr64, val, index);
>>> +    tcg_gen_extrl_i64_i32(dst, dst64);
>>> +
>>> +    tcg_temp_free(dst64);
>>> +    tcg_temp_free(addr64);
>>> +}
>>> +#endif
>>> +
>>> +#if defined(CONFIG_ARM_USE_LDST_EXCL)
>>> +DO_GEN_LDREX(i8)
>>> +DO_GEN_LDREX(i16a)
>>> +DO_GEN_LDREX(i32a)
>>> +
>>> +DO_GEN_STREX(i8)
>>> +DO_GEN_STREX(i16a)
>>> +DO_GEN_STREX(i32a)
>>> +#endif
>>> +
>>> +static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>> +                               TCGv_i32 addr, int size)
>>> + {
>>> +    TCGv_i32 tmp = tcg_temp_new_i32();
>>> +    TCGv_i32 mem_idx = tcg_temp_new_i32();
>>> +
>>> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
>>> +
>>> +    if (size != 3) {
>>> +        switch (size) {
>>> +        case 0:
>>> +            gen_ldrex_i8(tmp, addr, mem_idx);
>>> +            break;
>>> +        case 1:
>>> +            gen_ldrex_i16a(tmp, addr, mem_idx);
>>> +            break;
>>> +        case 2:
>>> +            gen_ldrex_i32a(tmp, addr, mem_idx);
>>> +            break;
>>> +        default:
>>> +            abort();
>>> +        }
>>> +
>>> +        store_reg(s, rt, tmp);
>>> +    } else {
>>> +        TCGv_i64 tmp64 = tcg_temp_new_i64();
>>> +        TCGv_i32 tmph = tcg_temp_new_i32();
>>> +
>>> +        gen_ldrex_i64a(tmp64, addr, mem_idx);
>>> +        tcg_gen_extr_i64_i32(tmp, tmph, tmp64);
>>> +
>>> +        store_reg(s, rt, tmp);
>>> +        store_reg(s, rt2, tmph);
>>> +
>>> +        tcg_temp_free_i64(tmp64);
>>> +    }
>>> +
>>> +    tcg_temp_free_i32(mem_idx);
>>> +}
>>> +#else
>>>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>>                                 TCGv_i32 addr, int size)
>>>  {
>>> @@ -7460,10 +7594,15 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>>      store_reg(s, rt, tmp);
>>>      tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
>>>  }
>>> +#endif
>>>
>>>  static void gen_clrex(DisasContext *s)
>>>  {
>>> +#ifdef CONFIG_ARM_USE_LDST_EXCL
>>> +    gen_helper_atomic_clear(cpu_env);
>>> +#else
>>>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
>>> +#endif
>>>  }
>>>
>>>  #ifdef CONFIG_USER_ONLY
>>> @@ -7475,6 +7614,47 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>>>                       size | (rd << 4) | (rt << 8) | (rt2 << 12));
>>>      gen_exception_internal_insn(s, 4, EXCP_STREX);
>>>  }
>>> +#elif defined CONFIG_ARM_USE_LDST_EXCL
>>> +static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>>> +                                TCGv_i32 addr, int size)
>>> +{
>>> +    TCGv_i32 tmp, mem_idx;
>>> +
>>> +    mem_idx = tcg_temp_new_i32();
>>> +
>>> +    tcg_gen_movi_i32(mem_idx, get_mem_index(s));
>>> +    tmp = load_reg(s, rt);
>>> +
>>> +    if (size != 3) {
>>> +        switch (size) {
>>> +        case 0:
>>> +            gen_strex_i8(cpu_R[rd], addr, tmp, mem_idx);
>>> +            break;
>>> +        case 1:
>>> +            gen_strex_i16a(cpu_R[rd], addr, tmp, mem_idx);
>>> +            break;
>>> +        case 2:
>>> +            gen_strex_i32a(cpu_R[rd], addr, tmp, mem_idx);
>>> +            break;
>>> +        default:
>>> +            abort();
>>> +        }
>>> +    } else {
>>> +        TCGv_i64 tmp64;
>>> +        TCGv_i32 tmp2;
>>> +
>>> +        tmp64 = tcg_temp_new_i64();
>>> +        tmp2 = load_reg(s, rt2);
>>> +        tcg_gen_concat_i32_i64(tmp64, tmp, tmp2);
>>> +        gen_strex_i64a(cpu_R[rd], addr, tmp64, mem_idx);
>>> +
>>> +        tcg_temp_free_i32(tmp2);
>>> +        tcg_temp_free_i64(tmp64);
>>> +    }
>>> +
>>> +    tcg_temp_free_i32(tmp);
>>> +    tcg_temp_free_i32(mem_idx);
>>> +}
>>>  #else
>>>  static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>>>                                  TCGv_i32 addr, int size)
>>
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2016-03-07 20:06 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-29  9:32 [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alvise Rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 01/16] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
2016-02-11 13:00   ` Alex Bennée
2016-02-11 13:21     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 02/16] softmmu: Simplify helper_*_st_name, wrap unaligned code Alvise Rigo
2016-02-11 13:07   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 03/16] softmmu: Simplify helper_*_st_name, wrap MMIO code Alvise Rigo
2016-02-11 13:15   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 04/16] softmmu: Simplify helper_*_st_name, wrap RAM code Alvise Rigo
2016-02-11 13:18   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 05/16] softmmu: Add new TLB_EXCL flag Alvise Rigo
2016-02-11 13:18   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 06/16] qom: cpu: Add CPUClass hooks for exclusive range Alvise Rigo
2016-02-11 13:22   ` Alex Bennée
2016-02-18 13:53     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 07/16] softmmu: Add helpers for a new slowpath Alvise Rigo
2016-02-11 16:33   ` Alex Bennée
2016-02-18 13:58     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 08/16] softmmu: Honor the new exclusive bitmap Alvise Rigo
2016-02-16 17:39   ` Alex Bennée
2016-02-18 14:18     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 09/16] softmmu: Include MMIO/invalid exclusive accesses Alvise Rigo
2016-02-16 17:49   ` Alex Bennée
2016-02-18 14:18     ` alvise rigo
2016-02-18 16:26       ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 10/16] softmmu: Protect MMIO exclusive range Alvise Rigo
2016-02-17 18:55   ` Alex Bennée
2016-02-18 14:15     ` alvise rigo
2016-02-18 16:25       ` Alex Bennée
2016-03-07 18:13         ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 11/16] tcg: Create new runtime helpers for excl accesses Alvise Rigo
2016-02-18 16:16   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 12/16] configure: Use slow-path for atomic only when the softmmu is enabled Alvise Rigo
2016-02-18 16:40   ` Alex Bennée
2016-02-18 16:43     ` Alex Bennée
2016-03-07 17:21     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 13/16] softmmu: Add history of excl accesses Alvise Rigo
2016-02-16 17:07   ` Alex Bennée
2016-02-18 14:17     ` alvise rigo
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 14/16] target-arm: translate: Use ld/st excl for atomic insns Alvise Rigo
2016-02-18 17:02   ` Alex Bennée
2016-03-07 18:39     ` alvise rigo
2016-03-07 20:06       ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 15/16] target-arm: cpu64: use custom set_excl hook Alvise Rigo
2016-02-18 18:19   ` Alex Bennée
2016-01-29  9:32 ` [Qemu-devel] [RFC v7 16/16] target-arm: aarch64: add atomic instructions Alvise Rigo
2016-02-19 11:34   ` Alex Bennée
2016-02-19 11:44 ` [Qemu-devel] [RFC v7 00/16] Slow-path for atomic instruction translation Alex Bennée
2016-02-19 12:01   ` alvise rigo
2016-02-19 12:19     ` Alex Bennée

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).