* [Qemu-devel] [PATCH v3 0/4] per-TLB lock
@ 2018-10-05 21:14 Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 1/4] exec: introduce tlb_init Emilio G. Cota
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-05 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

v2: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00617.html

Changes since v2:

- Add R-b's

- Include "qemu/thread.h" from cpu-defs.h to fix the build error
  reported by Alex.

- Eliminate (on average) the performance slowdown. This is achieved by:
  + Switching from a mutex to a spinlock
  + Acquiring the lock only once in tlb_set_page_with_attrs,
    instead of acquiring/releasing it several times.
    This function is called often enough for memory-hungry
    workloads that amortizing the lock acquisition cost
    makes sense. I've added a comment about this; a brief sketch of
    the pattern follows this change list.

- Add SPEC06int perf numbers to patch 4's commit log.
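
For illustration, here is a minimal, self-contained sketch of the
"lock once, update many entries" pattern mentioned above; it uses a
plain POSIX spinlock rather than QEMU's qemu_spin_* wrappers and is
not the actual patch code (see patch 3 for that):

#include <pthread.h>

struct table {
    pthread_spinlock_t lock;
    unsigned long slot[8];
};

static void table_init(struct table *t)
{
    pthread_spin_init(&t->lock, PTHREAD_PROCESS_PRIVATE);
}

/* One acquisition amortized over all the updates, instead of a
 * lock/unlock pair around each individual slot update. */
static void table_update_all(struct table *t, unsigned long val)
{
    int i;

    pthread_spin_lock(&t->lock);
    for (i = 0; i < 8; i++) {
        t->slot[i] = val;
    }
    pthread_spin_unlock(&t->lock);
}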

The series is checkpatch-clean. You can fetch the code from:
  https://github.com/cota/qemu/tree/tlb-lock-v3

Thanks,

		Emilio


* [Qemu-devel] [PATCH v3 1/4] exec: introduce tlb_init
  2018-10-05 21:14 [Qemu-devel] [PATCH v3 0/4] per-TLB lock Emilio G. Cota
@ 2018-10-05 21:14 ` Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 2/4] cputlb: fix assert_cpu_is_self macro Emilio G. Cota
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-05 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

Paves the way for the addition of a per-TLB lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h | 8 ++++++++
 accel/tcg/cputlb.c      | 4 ++++
 exec.c                  | 1 +
 3 files changed, 13 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5f78125582..815e5b1e83 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -99,6 +99,11 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
 
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
 /* cputlb.c */
+/**
+ * tlb_init - initialize a CPU's TLB
+ * @cpu: CPU whose TLB should be initialized
+ */
+void tlb_init(CPUState *cpu);
 /**
  * tlb_flush_page:
  * @cpu: CPU whose TLB should be flushed
@@ -258,6 +263,9 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx,
                  uintptr_t retaddr);
 #else
+static inline void tlb_init(CPUState *cpu)
+{
+}
 static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
 }
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f4702ce91f..502eea2850 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -73,6 +73,10 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
+void tlb_init(CPUState *cpu)
+{
+}
+
 /* flush_all_helper: run fn across all cpus
  *
  * If the wait flag is set then the src cpu's helper will be queued as
diff --git a/exec.c b/exec.c
index d0821e69aa..4fd831ef06 100644
--- a/exec.c
+++ b/exec.c
@@ -965,6 +965,7 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
         tcg_target_initialized = true;
         cc->tcg_initialize();
     }
+    tlb_init(cpu);
 
 #ifndef CONFIG_USER_ONLY
     if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
-- 
2.17.1


* [Qemu-devel] [PATCH v3 2/4] cputlb: fix assert_cpu_is_self macro
  2018-10-05 21:14 [Qemu-devel] [PATCH v3 0/4] per-TLB lock Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 1/4] exec: introduce tlb_init Emilio G. Cota
@ 2018-10-05 21:14 ` Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically Emilio G. Cota
  3 siblings, 0 replies; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-05 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/cputlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 502eea2850..f6b388c961 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -58,9 +58,9 @@
     } \
 } while (0)
 
-#define assert_cpu_is_self(this_cpu) do {                         \
+#define assert_cpu_is_self(cpu) do {                              \
         if (DEBUG_TLB_GATE) {                                     \
-            g_assert(!cpu->created || qemu_cpu_is_self(cpu));     \
+            g_assert(!(cpu)->created || qemu_cpu_is_self(cpu));   \
         }                                                         \
     } while (0)
 
-- 
2.17.1


* [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock
  2018-10-05 21:14 [Qemu-devel] [PATCH v3 0/4] per-TLB lock Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 1/4] exec: introduce tlb_init Emilio G. Cota
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 2/4] cputlb: fix assert_cpu_is_self macro Emilio G. Cota
@ 2018-10-05 21:14 ` Emilio G. Cota
  2018-10-08 13:57   ` Alex Bennée
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically Emilio G. Cota
  3 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-05 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.

Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.

Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.
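
As a rough, self-contained sketch of the resulting discipline (not the
actual patch code; see the diff below), written with a POSIX spinlock
and the __atomic builtins that QEMU's atomic_read/atomic_set roughly
expand to:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    pthread_spinlock_t lock;     /* plays the role of env->tlb_lock   */
    uintptr_t addr_write[64];    /* plays the role of the TLB entries */
} FakeTLB;

static void fake_tlb_init(FakeTLB *t)
{
    pthread_spin_init(&t->lock, PTHREAD_PROCESS_PRIVATE);
    memset(t->addr_write, -1, sizeof(t->addr_write));
}

/* Owner-vCPU flush: take the lock; a plain memset is then fine because
 * the only lock-free readers run on this same (owner) thread. */
static void fake_tlb_flush(FakeTLB *t)
{
    pthread_spin_lock(&t->lock);
    memset(t->addr_write, -1, sizeof(t->addr_write));
    pthread_spin_unlock(&t->lock);
}

/* Cross-thread update (the tlb_reset_dirty case): take the lock and use
 * an atomic store, so lock-free readers never observe a torn value. */
static void fake_tlb_set_notdirty(FakeTLB *t, int i)
{
    pthread_spin_lock(&t->lock);
    __atomic_store_n(&t->addr_write[i], t->addr_write[i] | 1,
                     __ATOMIC_RELAXED);
    pthread_spin_unlock(&t->lock);
}

/* Lock-free fast-path read: must be an atomic load (next patch). */
static uintptr_t fake_tlb_read(FakeTLB *t, int i)
{
    return __atomic_load_n(&t->addr_write[i], __ATOMIC_RELAXED);
}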

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/cpu-defs.h |   3 +
 accel/tcg/cputlb.c      | 152 +++++++++++++++++++++-------------------
 2 files changed, 84 insertions(+), 71 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index a171ffc1a4..4ff62f32bf 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -24,6 +24,7 @@
 #endif
 
 #include "qemu/host-utils.h"
+#include "qemu/thread.h"
 #include "qemu/queue.h"
 #ifdef CONFIG_TCG
 #include "tcg-target.h"
@@ -142,6 +143,8 @@ typedef struct CPUIOTLBEntry {
 
 #define CPU_COMMON_TLB \
     /* The meaning of the MMU modes is defined in the target code. */   \
+    /* tlb_lock serializes updates to tlb_table and tlb_v_table */      \
+    QemuSpin tlb_lock;                                                  \
     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f6b388c961..2b0ff47fdf 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -75,6 +75,9 @@ QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 
 void tlb_init(CPUState *cpu)
 {
+    CPUArchState *env = cpu->env_ptr;
+
+    qemu_spin_init(&env->tlb_lock);
 }
 
 /* flush_all_helper: run fn across all cpus
@@ -129,8 +132,17 @@ static void tlb_flush_nocheck(CPUState *cpu)
     atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
     tlb_debug("(count: %zu)\n", tlb_flush_count());
 
+    /*
+     * tlb_table/tlb_v_table updates from any thread must hold tlb_lock.
+     * However, updates from the owner thread (as is the case here; see the
+     * above assert_cpu_is_self) do not need atomic_set because all reads
+     * that do not hold the lock are performed by the same owner thread.
+     */
+    qemu_spin_lock(&env->tlb_lock);
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
+    qemu_spin_unlock(&env->tlb_lock);
+
     cpu_tb_jmp_cache_clear(cpu);
 
     env->vtlb_index = 0;
@@ -182,6 +194,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 
     tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
+    qemu_spin_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
         if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
@@ -191,6 +204,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
             memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
         }
     }
+    qemu_spin_unlock(&env->tlb_lock);
 
     cpu_tb_jmp_cache_clear(cpu);
 
@@ -247,19 +261,24 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
            tlb_hit_page(tlb_entry->addr_code, page);
 }
 
-static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong page)
+/* Called with tlb_lock held */
+static inline void tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
+                                          target_ulong page)
 {
     if (tlb_hit_page_anyprot(tlb_entry, page)) {
         memset(tlb_entry, -1, sizeof(*tlb_entry));
     }
 }
 
-static inline void tlb_flush_vtlb_page(CPUArchState *env, int mmu_idx,
-                                       target_ulong page)
+/* Called with tlb_lock held */
+static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
+                                              target_ulong page)
 {
     int k;
+
+    assert_cpu_is_self(ENV_GET_CPU(env));
     for (k = 0; k < CPU_VTLB_SIZE; k++) {
-        tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], page);
+        tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page);
     }
 }
 
@@ -286,10 +305,12 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 
     addr &= TARGET_PAGE_MASK;
     i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    qemu_spin_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
-        tlb_flush_vtlb_page(env, mmu_idx, addr);
+        tlb_flush_entry_locked(&env->tlb_table[mmu_idx][i], addr);
+        tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
     }
+    qemu_spin_unlock(&env->tlb_lock);
 
     tb_flush_jmp_cache(cpu, addr);
 }
@@ -326,12 +347,14 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
     tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
               page, addr, mmu_idx_bitmap);
 
+    qemu_spin_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
-            tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
-            tlb_flush_vtlb_page(env, mmu_idx, addr);
+            tlb_flush_entry_locked(&env->tlb_table[mmu_idx][page], addr);
+            tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
         }
     }
+    qemu_spin_unlock(&env->tlb_lock);
 
     tb_flush_jmp_cache(cpu, addr);
 }
@@ -454,72 +477,41 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
  * most usual is detecting writes to code regions which may invalidate
  * generated code.
  *
- * Because we want other vCPUs to respond to changes straight away we
- * update the te->addr_write field atomically. If the TLB entry has
- * been changed by the vCPU in the mean time we skip the update.
+ * Other vCPUs might be reading their TLBs during guest execution, so we update
+ * te->addr_write with atomic_set. We don't need to worry about this for
+ * oversized guests as MTTCG is disabled for them.
  *
- * As this function uses atomic accesses we also need to ensure
- * updates to tlb_entries follow the same access rules. We don't need
- * to worry about this for oversized guests as MTTCG is disabled for
- * them.
+ * Called with tlb_lock held.
  */
-
-static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
-                           uintptr_t length)
+static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
+                                         uintptr_t start, uintptr_t length)
 {
-#if TCG_OVERSIZED_GUEST
     uintptr_t addr = tlb_entry->addr_write;
 
     if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
         addr &= TARGET_PAGE_MASK;
         addr += tlb_entry->addend;
         if ((addr - start) < length) {
+#if TCG_OVERSIZED_GUEST
             tlb_entry->addr_write |= TLB_NOTDIRTY;
-        }
-    }
 #else
-    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
-    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
-    uintptr_t addr = orig_addr;
-
-    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
-        addr &= TARGET_PAGE_MASK;
-        addr += atomic_read(&tlb_entry->addend);
-        if ((addr - start) < length) {
-            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
-            atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
+            atomic_set(&tlb_entry->addr_write,
+                       tlb_entry->addr_write | TLB_NOTDIRTY);
+#endif
         }
     }
-#endif
 }
 
-/* For atomic correctness when running MTTCG we need to use the right
- * primitives when copying entries */
-static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
-                                   bool atomic_set)
+/* Called with tlb_lock held */
+static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
 {
-#if TCG_OVERSIZED_GUEST
     *d = *s;
-#else
-    if (atomic_set) {
-        d->addr_read = s->addr_read;
-        d->addr_code = s->addr_code;
-        atomic_set(&d->addend, atomic_read(&s->addend));
-        /* Pairs with flag setting in tlb_reset_dirty_range */
-        atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
-    } else {
-        d->addr_read = s->addr_read;
-        d->addr_write = atomic_read(&s->addr_write);
-        d->addr_code = s->addr_code;
-        d->addend = atomic_read(&s->addend);
-    }
-#endif
 }
 
 /* This is a cross vCPU call (i.e. another vCPU resetting the flags of
- * the target vCPU). As such care needs to be taken that we don't
- * dangerously race with another vCPU update. The only thing actually
- * updated is the target TLB entry ->addr_write flags.
+ * the target vCPU).
+ * We must take tlb_lock to avoid racing with another vCPU update. The only
+ * thing actually updated is the target TLB entry ->addr_write flags.
  */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
@@ -528,22 +520,26 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
     int mmu_idx;
 
     env = cpu->env_ptr;
+    qemu_spin_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
 
         for (i = 0; i < CPU_TLB_SIZE; i++) {
-            tlb_reset_dirty_range(&env->tlb_table[mmu_idx][i],
-                                  start1, length);
+            tlb_reset_dirty_range_locked(&env->tlb_table[mmu_idx][i], start1,
+                                         length);
         }
 
         for (i = 0; i < CPU_VTLB_SIZE; i++) {
-            tlb_reset_dirty_range(&env->tlb_v_table[mmu_idx][i],
-                                  start1, length);
+            tlb_reset_dirty_range_locked(&env->tlb_v_table[mmu_idx][i], start1,
+                                         length);
         }
     }
+    qemu_spin_unlock(&env->tlb_lock);
 }
 
-static inline void tlb_set_dirty1(CPUTLBEntry *tlb_entry, target_ulong vaddr)
+/* Called with tlb_lock held */
+static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry,
+                                         target_ulong vaddr)
 {
     if (tlb_entry->addr_write == (vaddr | TLB_NOTDIRTY)) {
         tlb_entry->addr_write = vaddr;
@@ -562,16 +558,18 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 
     vaddr &= TARGET_PAGE_MASK;
     i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    qemu_spin_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_set_dirty1(&env->tlb_table[mmu_idx][i], vaddr);
+        tlb_set_dirty1_locked(&env->tlb_table[mmu_idx][i], vaddr);
     }
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         int k;
         for (k = 0; k < CPU_VTLB_SIZE; k++) {
-            tlb_set_dirty1(&env->tlb_v_table[mmu_idx][k], vaddr);
+            tlb_set_dirty1_locked(&env->tlb_v_table[mmu_idx][k], vaddr);
         }
     }
+    qemu_spin_unlock(&env->tlb_lock);
 }
 
 /* Our TLB does not support large pages, so remember the area covered by
@@ -658,9 +656,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
     }
 
-    /* Make sure there's no cached translation for the new page.  */
-    tlb_flush_vtlb_page(env, mmu_idx, vaddr_page);
-
     code_address = address;
     iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
                                             paddr_page, xlat, prot, &address);
@@ -668,6 +663,18 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     index = (vaddr_page >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     te = &env->tlb_table[mmu_idx][index];
 
+    /*
+     * Hold the TLB lock for the rest of the function. We could acquire/release
+     * the lock several times in the function, but it is faster to amortize the
+     * acquisition cost by acquiring it just once. Note that this leads to
+     * a longer critical section, but this is not a concern since the TLB lock
+     * is unlikely to be contended.
+     */
+    qemu_spin_lock(&env->tlb_lock);
+
+    /* Make sure there's no cached translation for the new page.  */
+    tlb_flush_vtlb_page_locked(env, mmu_idx, vaddr_page);
+
     /*
      * Only evict the old entry to the victim tlb if it's for a
      * different page; otherwise just overwrite the stale data.
@@ -677,7 +684,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         CPUTLBEntry *tv = &env->tlb_v_table[mmu_idx][vidx];
 
         /* Evict the old entry into the victim tlb.  */
-        copy_tlb_helper(tv, te, true);
+        copy_tlb_helper_locked(tv, te);
         env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
     }
 
@@ -729,9 +736,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         }
     }
 
-    /* Pairs with flag setting in tlb_reset_dirty_range */
-    copy_tlb_helper(te, &tn, true);
-    /* atomic_mb_set(&te->addr_write, write_address); */
+    copy_tlb_helper_locked(te, &tn);
+    qemu_spin_unlock(&env->tlb_lock);
 }
 
 /* Add a new TLB entry, but without specifying the memory
@@ -895,6 +901,8 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
                            size_t elt_ofs, target_ulong page)
 {
     size_t vidx;
+
+    assert_cpu_is_self(ENV_GET_CPU(env));
     for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
         CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
         target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
@@ -903,9 +911,11 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
             /* Found entry in victim tlb, swap tlb and iotlb.  */
             CPUTLBEntry tmptlb, *tlb = &env->tlb_table[mmu_idx][index];
 
-            copy_tlb_helper(&tmptlb, tlb, false);
-            copy_tlb_helper(tlb, vtlb, true);
-            copy_tlb_helper(vtlb, &tmptlb, true);
+            qemu_spin_lock(&env->tlb_lock);
+            copy_tlb_helper_locked(&tmptlb, tlb);
+            copy_tlb_helper_locked(tlb, vtlb);
+            copy_tlb_helper_locked(vtlb, &tmptlb);
+            qemu_spin_unlock(&env->tlb_lock);
 
             CPUIOTLBEntry tmpio, *io = &env->iotlb[mmu_idx][index];
             CPUIOTLBEntry *vio = &env->iotlb_v[mmu_idx][vidx];
-- 
2.17.1


* [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically
  2018-10-05 21:14 [Qemu-devel] [PATCH v3 0/4] per-TLB lock Emilio G. Cota
                   ` (2 preceding siblings ...)
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock Emilio G. Cota
@ 2018-10-05 21:14 ` Emilio G. Cota
  2018-10-16  2:52   ` Richard Henderson
  3 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-05 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This and the previous commit result, on average, in no performance loss,
as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz)
show.

1. aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.12% )
    31,574,905,303      cycles                    #    4.217 GHz                      ( +-  0.12% )
    57,097,908,812      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,255,415,367      branches                  # 1369.747 M/sec                    ( +-  0.08% )
       173,278,962      branch-misses             #    1.69% of all branches          ( +-  0.18% )

       7.504481349 seconds time elapsed                                          ( +-  0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,478,476,520      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,017,330,084      instructions              #    1.81  insns per cycle          ( +-  0.05% )
    10,251,929,667      branches                  # 1373.804 M/sec                    ( +-  0.05% )
       173,023,787      branch-misses             #    1.69% of all branches          ( +-  0.11% )

       7.474970463 seconds time elapsed                                          ( +-  0.07% )

2. SPEC06int:
                                              SPEC06int (test set)
                                           [Y axis: Speedup over master]
  [Bar chart (ASCII gnuplot rendering): speedup over master for
   tlb-lock-v2 (mutex) and tlb-lock-v3 (spinlock) across 400.perlbench,
   401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng,
   462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk
   and the geomean; results cluster around 1.0. See the png link below
   for a readable rendering.]

  png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 is the current patch series, i.e. with a spinlock
  and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/softmmu_template.h     | 16 ++++++++++------
 include/exec/cpu_ldst.h          |  2 +-
 include/exec/cpu_ldst_template.h |  2 +-
 accel/tcg/cputlb.c               | 15 +++++++++------
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/softmmu_template.h b/accel/tcg/softmmu_template.h
index f060a693d4..1e50263871 100644
--- a/accel/tcg/softmmu_template.h
+++ b/accel/tcg/softmmu_template.h
@@ -277,7 +277,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 {
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    target_ulong tlb_addr =
+        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
@@ -292,7 +293,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
             tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
                      mmu_idx, retaddr);
         }
-        tlb_addr = env->tlb_table[mmu_idx][index].addr_write & ~TLB_INVALID_MASK;
+        tlb_addr = atomic_read(&env->tlb_table[mmu_idx][index].addr_write) &
+            ~TLB_INVALID_MASK;
     }
 
     /* Handle an IO access.  */
@@ -321,7 +323,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
            cannot evict the first.  */
         page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
         index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-        tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
+        tlb_addr2 = atomic_read(&env->tlb_table[mmu_idx][index2].addr_write);
         if (!tlb_hit_page(tlb_addr2, page2)
             && !VICTIM_TLB_HIT(addr_write, page2)) {
             tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
@@ -354,7 +356,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
 {
     unsigned mmu_idx = get_mmuidx(oi);
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    target_ulong tlb_addr =
+        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;
 
@@ -369,7 +372,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
             tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
                      mmu_idx, retaddr);
         }
-        tlb_addr = env->tlb_table[mmu_idx][index].addr_write & ~TLB_INVALID_MASK;
+        tlb_addr = atomic_read(&env->tlb_table[mmu_idx][index].addr_write) &
+            ~TLB_INVALID_MASK;
     }
 
     /* Handle an IO access.  */
@@ -398,7 +402,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
            cannot evict the first.  */
         page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
         index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-        tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
+        tlb_addr2 = atomic_read(&env->tlb_table[mmu_idx][index2].addr_write);
         if (!tlb_hit_page(tlb_addr2, page2)
             && !VICTIM_TLB_HIT(addr_write, page2)) {
             tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 41ed0526e2..9581587ce1 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -426,7 +426,7 @@ static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
         tlb_addr = tlbentry->addr_read;
         break;
     case 1:
-        tlb_addr = tlbentry->addr_write;
+        tlb_addr = atomic_read(&tlbentry->addr_write);
         break;
     case 2:
         tlb_addr = tlbentry->addr_code;
diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index 4db2302962..ba7a11123c 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -176,7 +176,7 @@ glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
     addr = ptr;
     page_index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     mmu_idx = CPU_MMU_INDEX;
-    if (unlikely(env->tlb_table[mmu_idx][page_index].addr_write !=
+    if (unlikely(atomic_read(&env->tlb_table[mmu_idx][page_index].addr_write) !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
         oi = make_memop_idx(SHIFT, mmu_idx);
         glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, v, oi,
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2b0ff47fdf..c7608ccdf8 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -257,7 +257,7 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
                                         target_ulong page)
 {
     return tlb_hit_page(tlb_entry->addr_read, page) ||
-           tlb_hit_page(tlb_entry->addr_write, page) ||
+           tlb_hit_page(atomic_read(&tlb_entry->addr_write), page) ||
            tlb_hit_page(tlb_entry->addr_code, page);
 }
 
@@ -856,7 +856,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
         tlb_fill(cpu, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
 
         index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+        tlb_addr = atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
         if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
             /* RAM access */
             uintptr_t haddr = addr + env->tlb_table[mmu_idx][index].addend;
@@ -905,7 +905,9 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
     assert_cpu_is_self(ENV_GET_CPU(env));
     for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
         CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
-        target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
+        /* elt_ofs might correspond to .addr_write, so use atomic_read */
+        target_ulong cmp =
+            atomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs));
 
         if (cmp == page) {
             /* Found entry in victim tlb, swap tlb and iotlb.  */
@@ -977,7 +979,8 @@ void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx,
                  uintptr_t retaddr)
 {
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    target_ulong tlb_addr =
+        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
 
     if (!tlb_hit(tlb_addr, addr)) {
         /* TLB entry is for a different page */
@@ -997,7 +1000,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     size_t mmu_idx = get_mmuidx(oi);
     size_t index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     CPUTLBEntry *tlbe = &env->tlb_table[mmu_idx][index];
-    target_ulong tlb_addr = tlbe->addr_write;
+    target_ulong tlb_addr = atomic_read(&tlbe->addr_write);
     TCGMemOp mop = get_memop(oi);
     int a_bits = get_alignment_bits(mop);
     int s_bits = mop & MO_SIZE;
@@ -1028,7 +1031,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
             tlb_fill(ENV_GET_CPU(env), addr, 1 << s_bits, MMU_DATA_STORE,
                      mmu_idx, retaddr);
         }
-        tlb_addr = tlbe->addr_write & ~TLB_INVALID_MASK;
+        tlb_addr = atomic_read(&tlbe->addr_write) & ~TLB_INVALID_MASK;
     }
 
     /* Notice an IO access or a needs-MMU-lookup access */
-- 
2.17.1


* Re: [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock Emilio G. Cota
@ 2018-10-08 13:57   ` Alex Bennée
  2018-10-08 14:12     ` Emilio G. Cota
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2018-10-08 13:57 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Paolo Bonzini, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Currently we rely on atomic operations for cross-CPU invalidations.
> There are two cases that these atomics miss: cross-CPU invalidations
> can race with either (1) vCPU threads flushing their TLB, which
> happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
> which updates .addr_write with a regular store. This results in
> undefined behaviour, since we're mixing regular and atomic ops
> on concurrent accesses.
>
> Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
> and the corresponding victim cache now hold the lock.
> The readers that do not hold tlb_lock must use atomic reads when
> reading .addr_write, since this field can be updated by other threads;
> the conversion to atomic reads is done in the next patch.

We don't enforce this for the TCG code - but rely on the backend ISAs
to avoid torn reads from updates from cputlb that could invalidate an
entry.

>
> Note that an alternative fix would be to expand the use of atomic ops.
> However, in the case of TLB flushes this would have a huge performance
> impact, since (1) TLB flushes can happen very frequently and (2) we
> currently use a full memory barrier to flush each TLB entry, and a TLB
> has many entries. Instead, acquiring the lock is barely slower than a
> full memory barrier since it is uncontended, and with a single lock
> acquisition we can flush the entire TLB.
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  include/exec/cpu-defs.h |   3 +
>  accel/tcg/cputlb.c      | 152 +++++++++++++++++++++-------------------
>  2 files changed, 84 insertions(+), 71 deletions(-)
>
> diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
> index a171ffc1a4..4ff62f32bf 100644
> --- a/include/exec/cpu-defs.h
> +++ b/include/exec/cpu-defs.h
> @@ -24,6 +24,7 @@
>  #endif
>
>  #include "qemu/host-utils.h"
> +#include "qemu/thread.h"
>  #include "qemu/queue.h"
>  #ifdef CONFIG_TCG
>  #include "tcg-target.h"
> @@ -142,6 +143,8 @@ typedef struct CPUIOTLBEntry {
>
>  #define CPU_COMMON_TLB \
>      /* The meaning of the MMU modes is defined in the target code. */   \
> +    /* tlb_lock serializes updates to tlb_table and tlb_v_table */      \
> +    QemuSpin tlb_lock;                                                  \
>      CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
>      CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
>      CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index f6b388c961..2b0ff47fdf 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -75,6 +75,9 @@ QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
>
>  void tlb_init(CPUState *cpu)
>  {
> +    CPUArchState *env = cpu->env_ptr;
> +
> +    qemu_spin_init(&env->tlb_lock);
>  }
>
>  /* flush_all_helper: run fn across all cpus
> @@ -129,8 +132,17 @@ static void tlb_flush_nocheck(CPUState *cpu)
>      atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
>      tlb_debug("(count: %zu)\n", tlb_flush_count());
>
> +    /*
> +     * tlb_table/tlb_v_table updates from any thread must hold tlb_lock.
> +     * However, updates from the owner thread (as is the case here; see the
> +     * above assert_cpu_is_self) do not need atomic_set because all reads
> +     * that do not hold the lock are performed by the same owner thread.
> +     */
> +    qemu_spin_lock(&env->tlb_lock);
>      memset(env->tlb_table, -1, sizeof(env->tlb_table));
>      memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
> +    qemu_spin_unlock(&env->tlb_lock);
> +
>      cpu_tb_jmp_cache_clear(cpu);
>
>      env->vtlb_index = 0;
> @@ -182,6 +194,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
>
>      tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
>
> +    qemu_spin_lock(&env->tlb_lock);
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>
>          if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
> @@ -191,6 +204,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
>              memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
>          }
>      }
> +    qemu_spin_unlock(&env->tlb_lock);
>
>      cpu_tb_jmp_cache_clear(cpu);
>
> @@ -247,19 +261,24 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
>             tlb_hit_page(tlb_entry->addr_code, page);
>  }
>
> -static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong page)
> +/* Called with tlb_lock held */
> +static inline void tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
> +                                          target_ulong page)
>  {
>      if (tlb_hit_page_anyprot(tlb_entry, page)) {
>          memset(tlb_entry, -1, sizeof(*tlb_entry));
>      }
>  }
>
> -static inline void tlb_flush_vtlb_page(CPUArchState *env, int mmu_idx,
> -                                       target_ulong page)
> +/* Called with tlb_lock held */
> +static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
> +                                              target_ulong page)
>  {
>      int k;
> +
> +    assert_cpu_is_self(ENV_GET_CPU(env));
>      for (k = 0; k < CPU_VTLB_SIZE; k++) {
> -        tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], page);
> +        tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page);
>      }
>  }
>
> @@ -286,10 +305,12 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
>
>      addr &= TARGET_PAGE_MASK;
>      i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    qemu_spin_lock(&env->tlb_lock);
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
> -        tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
> -        tlb_flush_vtlb_page(env, mmu_idx, addr);
> +        tlb_flush_entry_locked(&env->tlb_table[mmu_idx][i], addr);
> +        tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
>      }
> +    qemu_spin_unlock(&env->tlb_lock);
>
>      tb_flush_jmp_cache(cpu, addr);
>  }
> @@ -326,12 +347,14 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
>      tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
>                page, addr, mmu_idx_bitmap);
>
> +    qemu_spin_lock(&env->tlb_lock);
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>          if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
> -            tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
> -            tlb_flush_vtlb_page(env, mmu_idx, addr);
> +            tlb_flush_entry_locked(&env->tlb_table[mmu_idx][page], addr);
> +            tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
>          }
>      }
> +    qemu_spin_unlock(&env->tlb_lock);
>
>      tb_flush_jmp_cache(cpu, addr);
>  }
> @@ -454,72 +477,41 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
>   * most usual is detecting writes to code regions which may invalidate
>   * generated code.
>   *
> - * Because we want other vCPUs to respond to changes straight away we
> - * update the te->addr_write field atomically. If the TLB entry has
> - * been changed by the vCPU in the mean time we skip the update.
> + * Other vCPUs might be reading their TLBs during guest execution, so we update
> + * te->addr_write with atomic_set. We don't need to worry about this for
> + * oversized guests as MTTCG is disabled for them.
>   *
> - * As this function uses atomic accesses we also need to ensure
> - * updates to tlb_entries follow the same access rules. We don't need
> - * to worry about this for oversized guests as MTTCG is disabled for
> - * them.
> + * Called with tlb_lock held.
>   */
> -
> -static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
> -                           uintptr_t length)
> +static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
> +                                         uintptr_t start, uintptr_t length)
>  {
> -#if TCG_OVERSIZED_GUEST
>      uintptr_t addr = tlb_entry->addr_write;
>
>      if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
>          addr &= TARGET_PAGE_MASK;
>          addr += tlb_entry->addend;
>          if ((addr - start) < length) {
> +#if TCG_OVERSIZED_GUEST
>              tlb_entry->addr_write |= TLB_NOTDIRTY;
> -        }
> -    }
>  #else
> -    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
> -    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
> -    uintptr_t addr = orig_addr;
> -
> -    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
> -        addr &= TARGET_PAGE_MASK;
> -        addr += atomic_read(&tlb_entry->addend);
> -        if ((addr - start) < length) {
> -            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
> -            atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
> +            atomic_set(&tlb_entry->addr_write,
> +                       tlb_entry->addr_write | TLB_NOTDIRTY);
> +#endif
>          }
>      }
> -#endif
>  }
>
> -/* For atomic correctness when running MTTCG we need to use the right
> - * primitives when copying entries */
> -static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
> -                                   bool atomic_set)
> +/* Called with tlb_lock held */
> +static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
>  {
> -#if TCG_OVERSIZED_GUEST
>      *d = *s;

In general I'm happy with the patch set, but what ensures that this
always does the right thing (DRT) with respect to the TCG code reads
that race with it?

> -#else
> -    if (atomic_set) {
> -        d->addr_read = s->addr_read;
> -        d->addr_code = s->addr_code;
> -        atomic_set(&d->addend, atomic_read(&s->addend));
> -        /* Pairs with flag setting in tlb_reset_dirty_range */
> -        atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
> -    } else {
> -        d->addr_read = s->addr_read;
> -        d->addr_write = atomic_read(&s->addr_write);
> -        d->addr_code = s->addr_code;
> -        d->addend = atomic_read(&s->addend);
> -    }
> -#endif
>  }
>
>  /* This is a cross vCPU call (i.e. another vCPU resetting the flags of
> - * the target vCPU). As such care needs to be taken that we don't
> - * dangerously race with another vCPU update. The only thing actually
> - * updated is the target TLB entry ->addr_write flags.
> + * the target vCPU).
> + * We must take tlb_lock to avoid racing with another vCPU update. The only
> + * thing actually updated is the target TLB entry ->addr_write flags.
>   */
>  void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
>  {
> @@ -528,22 +520,26 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
>      int mmu_idx;
>
>      env = cpu->env_ptr;
> +    qemu_spin_lock(&env->tlb_lock);
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>          unsigned int i;
>
>          for (i = 0; i < CPU_TLB_SIZE; i++) {
> -            tlb_reset_dirty_range(&env->tlb_table[mmu_idx][i],
> -                                  start1, length);
> +            tlb_reset_dirty_range_locked(&env->tlb_table[mmu_idx][i], start1,
> +                                         length);
>          }
>
>          for (i = 0; i < CPU_VTLB_SIZE; i++) {
> -            tlb_reset_dirty_range(&env->tlb_v_table[mmu_idx][i],
> -                                  start1, length);
> +            tlb_reset_dirty_range_locked(&env->tlb_v_table[mmu_idx][i], start1,
> +                                         length);
>          }
>      }
> +    qemu_spin_unlock(&env->tlb_lock);
>  }
>
> -static inline void tlb_set_dirty1(CPUTLBEntry *tlb_entry, target_ulong vaddr)
> +/* Called with tlb_lock held */
> +static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry,
> +                                         target_ulong vaddr)
>  {
>      if (tlb_entry->addr_write == (vaddr | TLB_NOTDIRTY)) {
>          tlb_entry->addr_write = vaddr;
> @@ -562,16 +558,18 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
>
>      vaddr &= TARGET_PAGE_MASK;
>      i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    qemu_spin_lock(&env->tlb_lock);
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
> -        tlb_set_dirty1(&env->tlb_table[mmu_idx][i], vaddr);
> +        tlb_set_dirty1_locked(&env->tlb_table[mmu_idx][i], vaddr);
>      }
>
>      for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>          int k;
>          for (k = 0; k < CPU_VTLB_SIZE; k++) {
> -            tlb_set_dirty1(&env->tlb_v_table[mmu_idx][k], vaddr);
> +            tlb_set_dirty1_locked(&env->tlb_v_table[mmu_idx][k], vaddr);
>          }
>      }
> +    qemu_spin_unlock(&env->tlb_lock);
>  }
>
>  /* Our TLB does not support large pages, so remember the area covered by
> @@ -658,9 +656,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
>      }
>
> -    /* Make sure there's no cached translation for the new page.  */
> -    tlb_flush_vtlb_page(env, mmu_idx, vaddr_page);
> -
>      code_address = address;
>      iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
>                                              paddr_page, xlat, prot, &address);
> @@ -668,6 +663,18 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>      index = (vaddr_page >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      te = &env->tlb_table[mmu_idx][index];
>
> +    /*
> +     * Hold the TLB lock for the rest of the function. We could acquire/release
> +     * the lock several times in the function, but it is faster to amortize the
> +     * acquisition cost by acquiring it just once. Note that this leads to
> +     * a longer critical section, but this is not a concern since the TLB lock
> +     * is unlikely to be contended.
> +     */
> +    qemu_spin_lock(&env->tlb_lock);
> +
> +    /* Make sure there's no cached translation for the new page.  */
> +    tlb_flush_vtlb_page_locked(env, mmu_idx, vaddr_page);
> +
>      /*
>       * Only evict the old entry to the victim tlb if it's for a
>       * different page; otherwise just overwrite the stale data.
> @@ -677,7 +684,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          CPUTLBEntry *tv = &env->tlb_v_table[mmu_idx][vidx];
>
>          /* Evict the old entry into the victim tlb.  */
> -        copy_tlb_helper(tv, te, true);
> +        copy_tlb_helper_locked(tv, te);
>          env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
>      }
>
> @@ -729,9 +736,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          }
>      }
>
> -    /* Pairs with flag setting in tlb_reset_dirty_range */
> -    copy_tlb_helper(te, &tn, true);
> -    /* atomic_mb_set(&te->addr_write, write_address); */
> +    copy_tlb_helper_locked(te, &tn);
> +    qemu_spin_unlock(&env->tlb_lock);
>  }
>
>  /* Add a new TLB entry, but without specifying the memory
> @@ -895,6 +901,8 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
>                             size_t elt_ofs, target_ulong page)
>  {
>      size_t vidx;
> +
> +    assert_cpu_is_self(ENV_GET_CPU(env));
>      for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
>          CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
>          target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
> @@ -903,9 +911,11 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
>              /* Found entry in victim tlb, swap tlb and iotlb.  */
>              CPUTLBEntry tmptlb, *tlb = &env->tlb_table[mmu_idx][index];
>
> -            copy_tlb_helper(&tmptlb, tlb, false);
> -            copy_tlb_helper(tlb, vtlb, true);
> -            copy_tlb_helper(vtlb, &tmptlb, true);
> +            qemu_spin_lock(&env->tlb_lock);
> +            copy_tlb_helper_locked(&tmptlb, tlb);
> +            copy_tlb_helper_locked(tlb, vtlb);
> +            copy_tlb_helper_locked(vtlb, &tmptlb);
> +            qemu_spin_unlock(&env->tlb_lock);
>
>              CPUIOTLBEntry tmpio, *io = &env->iotlb[mmu_idx][index];
>              CPUIOTLBEntry *vio = &env->iotlb_v[mmu_idx][vidx];


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock
  2018-10-08 13:57   ` Alex Bennée
@ 2018-10-08 14:12     ` Emilio G. Cota
  2018-10-08 14:30       ` Alex Bennée
  0 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-08 14:12 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Paolo Bonzini, Richard Henderson

On Mon, Oct 08, 2018 at 14:57:18 +0100, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
> > The readers that do not hold tlb_lock must use atomic reads when
> > reading .addr_write, since this field can be updated by other threads;
> > the conversion to atomic reads is done in the next patch.
> 
> We don't enforce this for the TCG code - but rely on the backend ISAs
> to avoid torn reads from updates from cputlb that could invalidate an
> entry.

We do enforce it though; the TLB reads we emit in TCG backend
code are appropriately sized to guarantee atomic reads.

> > -/* For atomic correctness when running MTTCG we need to use the right
> > - * primitives when copying entries */
> > -static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
> > -                                   bool atomic_set)
> > +/* Called with tlb_lock held */
> > +static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
> >  {
> > -#if TCG_OVERSIZED_GUEST
> >      *d = *s;
> 
> In general I'm happy with the patch set, but what ensures that this
> always does the right thing (DRT) with respect to the TCG code reads
> that race with it?

copy_tlb_helper is only called by the "owner" CPU, so it cannot
race with TCG code (i.e. the owner thread cannot race with itself).

I wanted to add an assert_cpu_is_self(cpu) here, but that needs
a CPUState pointer. Maybe I should just get rid of the function?
All the callers have the assert, so that might make the code
clearer.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock
  2018-10-08 14:12     ` Emilio G. Cota
@ 2018-10-08 14:30       ` Alex Bennée
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2018-10-08 14:30 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Paolo Bonzini, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Oct 08, 2018 at 14:57:18 +0100, Alex Bennée wrote:
>> Emilio G. Cota <cota@braap.org> writes:
>> > The readers that do not hold tlb_lock must use atomic reads when
>> > reading .addr_write, since this field can be updated by other threads;
>> > the conversion to atomic reads is done in the next patch.
>>
>> We don't enforce this for the TCG code - but rely on the backend ISAs
>> to avoid torn reads from updates from cputlb that could invalidate an
>> entry.
>
> We do enforce it though; the TLB reads we emit in TCG backend
> code are appropriately sized to guarantee atomic reads.
>
>> > -/* For atomic correctness when running MTTCG we need to use the right
>> > - * primitives when copying entries */
>> > -static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
>> > -                                   bool atomic_set)
>> > +/* Called with tlb_lock held */
>> > +static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
>> >  {
>> > -#if TCG_OVERSIZED_GUEST
>> >      *d = *s;
>>
>> In general I'm happy with the patch set, but what ensures that this
>> always does the right thing (DRT) with respect to the TCG code reads
>> that race with it?
>
> copy_tlb_helper is only called by the "owner" CPU, so it cannot
> race with TCG code (i.e. the owner thread cannot race with itself).
>
> I wanted to add an assert_cpu_is_self(cpu) here, but that needs
> a CPUState pointer. Maybe I should just get rid of the function?
> All the callers have the assert, so that might make the code
> clearer.

I'm happy keeping the function and just expanding the comment:

/* Called with tlb_lock held and only ever from the vCPU context */

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> Thanks,
>
> 		Emilio


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically
  2018-10-05 21:14 ` [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically Emilio G. Cota
@ 2018-10-16  2:52   ` Richard Henderson
  2018-10-16  6:03     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2018-10-16  2:52 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel; +Cc: Paolo Bonzini, Alex Bennée

On 10/5/18 2:14 PM, Emilio G. Cota wrote:
> -    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +    target_ulong tlb_addr =
> +        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);

This fails for 32-bit hosts emulating 64-bit guests.
I think you need a separate helper function.  Perhaps

static inline target_ulong tlb_addr_write(CPUTLBEntry *ent)
{
#if TCG_OVERSIZED_GUEST
    return ent->addr_write;
#else
    return atomic_read(&ent->addr_write);
#endif
}
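
Call sites would then go through the helper, e.g. (sketch):

    target_ulong tlb_addr = tlb_addr_write(&env->tlb_table[mmu_idx][index]);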

I'm going to drop this patch from my queue for now.
We can fix it up this week some time.


r~


* Re: [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically
  2018-10-16  2:52   ` Richard Henderson
@ 2018-10-16  6:03     ` Paolo Bonzini
  2018-10-16 15:12       ` Emilio G. Cota
  2018-10-16 15:56       ` Richard Henderson
  0 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2018-10-16  6:03 UTC (permalink / raw)
  To: Richard Henderson, Emilio G. Cota, qemu-devel; +Cc: Alex Bennée

On 16/10/2018 04:52, Richard Henderson wrote:
> On 10/5/18 2:14 PM, Emilio G. Cota wrote:
>> -    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>> +    target_ulong tlb_addr =
>> +        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
> 
> This fails for 32-bit hosts emulating 64-bit guests.
> I think you need a separate helper function.  Perhaps
> 
> static inline target_ulong tlb_addr_write(CPUTLBEntry *ent)
> {
> #if TCG_OVERSIZED_GUEST
>     return ent->addr_write;
> #else
>     return atomic_read(&ent->addr_write);
> #endif
> }
> 

Or just atomic_read__nocheck.

Paolo

> I'm going to drop this patch from my queue for now.
> We can fix it up this week some time.
> 
> 
> r~
> 


* Re: [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically
  2018-10-16  6:03     ` Paolo Bonzini
@ 2018-10-16 15:12       ` Emilio G. Cota
  2018-10-16 15:56       ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Emilio G. Cota @ 2018-10-16 15:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Henderson, qemu-devel, Alex Bennée

On Tue, Oct 16, 2018 at 08:03:03 +0200, Paolo Bonzini wrote:
> On 16/10/2018 04:52, Richard Henderson wrote:
> > On 10/5/18 2:14 PM, Emilio G. Cota wrote:
> >> -    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> >> +    target_ulong tlb_addr =
> >> +        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
> > 
> > This fails for 32-bit hosts emulating 64-bit guests.
> > I think you need a separate helper function.  Perhaps
> > 
> > static inline target_ulong tlb_addr_write(CPUTLBEntry *ent)
> > {
> > #if TCG_OVERSIZED_GUEST
> >     return ent->addr_write;
> > #else
> >     return atomic_read(&ent->addr_write);
> > #endif
> > }

Ouch, yes.

> Or just atomic_read__nocheck.

I prefer special-casing oversized guests, otherwise we risk having
compile-time errors -- like the ones we got on mingw 32-bit when
not checking for CONFIG_ATOMIC64.

> > I'm going to drop this patch from my queue for now.
> > We can fix it up this week some time.

Sounds good. I'll send a patch on top of your current tcg-next
to avoid conflicts.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically
  2018-10-16  6:03     ` Paolo Bonzini
  2018-10-16 15:12       ` Emilio G. Cota
@ 2018-10-16 15:56       ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2018-10-16 15:56 UTC (permalink / raw)
  To: Paolo Bonzini, Emilio G. Cota, qemu-devel; +Cc: Alex Bennée

On 10/15/18 11:03 PM, Paolo Bonzini wrote:
> On 16/10/2018 04:52, Richard Henderson wrote:
>> On 10/5/18 2:14 PM, Emilio G. Cota wrote:
>>> -    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>>> +    target_ulong tlb_addr =
>>> +        atomic_read(&env->tlb_table[mmu_idx][index].addr_write);
>>
>> This fails for 32-bit hosts emulating 64-bit guests.
>> I think you need a separate helper function.  Perhaps
>>
>> static inline target_ulong tlb_addr_write(CPUTLBEntry *ent)
>> {
>> #if TCG_OVERSIZED_GUEST
>>     return ent->addr_write;
>> #else
>>     return atomic_read(&ent->addr_write);
>> #endif
>> }
>>
> 
> Or just atomic_read__nocheck.

No, it won't be present, necessarily.


r~

