* [PATCH 0/3] target/ppc: fix tlb flushing race
@ 2024-03-28  5:31 Nicholas Piggin
  2024-03-28  5:31 ` [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28  5:31 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, Richard Henderson, Paolo Bonzini, qemu-devel

ppc broadcast tlb flushes should be synchronised with other vCPUs,
like all other architectures that support such operations seem to
be doing.

Fixing ppc removes the last caller of the non-synced TLB flush
variants, so we can remove some dead code. I'd like to merge patch 1
for 9.0, and hold patches 2 and 3 until 9.1 to avoid churn (unless
someone prefers to remove the dead code asap).

Thanks,
Nick

Nicholas Piggin (3):
  target/ppc: Fix broadcast tlbie synchronisation
  tcg/cputlb: Remove non-synced variants of global TLB flushes
  tcg/cputlb: remove other-cpu capability from TLB flushing

 docs/devel/multi-thread-tcg.rst |  13 +--
 include/exec/exec-all.h         |  97 ++++-----------------
 accel/tcg/cputlb.c              | 145 ++------------------------------
 target/ppc/helper_regs.c        |   2 +-
 target/ppc/mmu_helper.c         |   2 +-
 5 files changed, 30 insertions(+), 229 deletions(-)

-- 
2.42.0




* [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation
  2024-03-28  5:31 [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
@ 2024-03-28  5:31 ` Nicholas Piggin
  2024-03-28 13:18   ` Philippe Mathieu-Daudé
  2024-03-28  5:31 ` [PATCH 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes Nicholas Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28  5:31 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, Richard Henderson, Paolo Bonzini, qemu-devel

With mttcg, broadcast tlbie instructions do not wait until other vCPUs
have been kicked out of TCG execution before they complete (including
necessary subsequent tlbsync, etc., instructions). This is contrary to
the ISA, and it permits other vCPUs to use translations after the TLB
flush. For example:

   CPU0
   // *memP is initially 0, memV maps to memP with *pte
   *pte = 0;
   ptesync ; tlbie ; eieio ; tlbsync ; ptesync
   *memP = 1;

   CPU1
   assert(*memV == 0);

It is possible for the assertion to fail because CPU1 translates memV
using the TLB after CPU0 has stored 1 to the underlying memory. This
race was observed with a careful test case in which the CPU1 check runs in
a very large, expensive TB, so that it spans the entire window between CPU0
clearing the pte and storing to the memory. It's normally very difficult to
hit, but preemption of host vCPU threads could trigger the race
anywhere.
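
For illustration, here is a rough host-side analogue of the race using
plain pthreads. This is a sketch only, not QEMU code: the "TLB entry"
is modelled as a cached pointer, and all of the names are made up.

    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>

    static atomic_int memP;                  /* *memP is initially 0 */
    static atomic_int *_Atomic pte = &memP;  /* the "page table entry" */
    static atomic_int cached;                /* CPU1 has filled its "TLB" */

    static void *cpu1(void *arg)
    {
        /* translate once and cache the result, like a TLB entry */
        atomic_int *memV = atomic_load(&pte);
        atomic_store(&cached, 1);
        /* a very large TB: keep using the cached translation */
        for (long i = 0; i < 100000000; i++) {
            assert(atomic_load(memV) == 0);  /* fails once CPU0 stores */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, cpu1, NULL);
        while (!atomic_load(&cached)) {
            /* wait until CPU1 has cached the translation */
        }
        atomic_store(&pte, NULL);            /* *pte = 0 */
        /* a non-synced broadcast "tlbie" returns here without waiting
         * for CPU1, whose cached memV stays live, hence: */
        atomic_store(&memP, 1);              /* *memP = 1 */
        pthread_join(t, NULL);
        return 0;
    }

A synced flush corresponds to CPU0 waiting, between the two stores,
until CPU1 has discarded its cached memV.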

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/helper_regs.c | 2 +-
 target/ppc/mmu_helper.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 25258986e3..9094ae5004 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -334,7 +334,7 @@ void check_tlb_flush(CPUPPCState *env, bool global)
     if (global && (env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH)) {
         env->tlb_need_flush &= ~TLB_NEED_GLOBAL_FLUSH;
         env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
-        tlb_flush_all_cpus(cs);
+        tlb_flush_all_cpus_synced(cs);
         return;
     }
 
diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index c071b4d5e2..aaa5bfc62a 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -533,7 +533,7 @@ void helper_tlbie_isa300(CPUPPCState *env, target_ulong rb, target_ulong rs,
     if (local) {
         tlb_flush_page(env_cpu(env), addr);
     } else {
-        tlb_flush_page_all_cpus(env_cpu(env), addr);
+        tlb_flush_page_all_cpus_synced(env_cpu(env), addr);
     }
     return;
 
-- 
2.42.0




* [PATCH 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes
  2024-03-28  5:31 [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
  2024-03-28  5:31 ` [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
@ 2024-03-28  5:31 ` Nicholas Piggin
  2024-03-28 13:18   ` Philippe Mathieu-Daudé
  2024-03-28  5:31 ` [PATCH 3/3] tcg/cputlb: remove other-cpu capability from TLB flushing Nicholas Piggin
  2024-03-28  8:12 ` [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
  3 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28  5:31 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, Richard Henderson, Paolo Bonzini, qemu-devel

These are no longer used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 docs/devel/multi-thread-tcg.rst |  13 ++--
 include/exec/exec-all.h         |  97 +++++-------------------------
 accel/tcg/cputlb.c              | 103 --------------------------------
 3 files changed, 19 insertions(+), 194 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index 1420789fff..d706c27ea7 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -205,15 +205,10 @@ DESIGN REQUIREMENTS:
 
 (Current solution)
 
-We have updated cputlb.c to defer operations when a cross-vCPU
-operation with async_run_on_cpu() which ensures each vCPU sees a
-coherent state when it next runs its work (in a few instructions
-time).
-
-A new set up operations (tlb_flush_*_all_cpus) take an additional flag
-which when set will force synchronisation by setting the source vCPUs
-work as "safe work" and exiting the cpu run loop. This ensure by the
-time execution restarts all flush operations have completed.
+A new set of TLB flush operations (tlb_flush_*_all_cpus_synced) forces
+synchronisation by scheduling the source vCPU's work as "safe work" and
+exiting the cpu run loop. This ensures that by the time execution
+restarts, all flush operations have completed.
 
 TLB flag updates are all done atomically and are also protected by the
 corresponding page lock.
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3e53501691..7cf9faa63f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -66,24 +66,15 @@ void tlb_destroy(CPUState *cpu);
  */
 void tlb_flush_page(CPUState *cpu, vaddr addr);
 /**
- * tlb_flush_page_all_cpus:
+ * tlb_flush_page_all_cpus_synced:
  * @cpu: src CPU of the flush
  * @addr: virtual address of page to be flushed
  *
- * Flush one page from the TLB of the specified CPU, for all
+ * Flush one page from the TLB of all CPUs, for all
  * MMU indexes.
- */
-void tlb_flush_page_all_cpus(CPUState *src, vaddr addr);
-/**
- * tlb_flush_page_all_cpus_synced:
- * @cpu: src CPU of the flush
- * @addr: virtual address of page to be flushed
  *
- * Flush one page from the TLB of the specified CPU, for all MMU
- * indexes like tlb_flush_page_all_cpus except the source vCPUs work
- * is scheduled as safe work meaning all flushes will be complete once
- * the source vCPUs safe work is complete. This will depend on when
- * the guests translation ends the TB.
+ * When this function returns, no CPUs will subsequently perform
+ * translations using the flushed TLBs.
  */
 void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr);
 /**
@@ -96,19 +87,14 @@ void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr);
  * use one of the other functions for efficiency.
  */
 void tlb_flush(CPUState *cpu);
-/**
- * tlb_flush_all_cpus:
- * @cpu: src CPU of the flush
- */
-void tlb_flush_all_cpus(CPUState *src_cpu);
 /**
  * tlb_flush_all_cpus_synced:
  * @cpu: src CPU of the flush
  *
- * Like tlb_flush_all_cpus except this except the source vCPUs work is
- * scheduled as safe work meaning all flushes will be complete once
- * the source vCPUs safe work is complete. This will depend on when
- * the guests translation ends the TB.
+ * Flush the entire TLB for all CPUs, for all MMU indexes.
+ *
+ * When this function returns, no CPUs will subsequently perform
+ * translations using the flushed TLBs.
  */
 void tlb_flush_all_cpus_synced(CPUState *src_cpu);
 /**
@@ -123,27 +109,16 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu);
 void tlb_flush_page_by_mmuidx(CPUState *cpu, vaddr addr,
                               uint16_t idxmap);
 /**
- * tlb_flush_page_by_mmuidx_all_cpus:
+ * tlb_flush_page_by_mmuidx_all_cpus_synced:
  * @cpu: Originating CPU of the flush
  * @addr: virtual address of page to be flushed
  * @idxmap: bitmap of MMU indexes to flush
  *
  * Flush one page from the TLB of all CPUs, for the specified
  * MMU indexes.
- */
-void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
-                                       uint16_t idxmap);
-/**
- * tlb_flush_page_by_mmuidx_all_cpus_synced:
- * @cpu: Originating CPU of the flush
- * @addr: virtual address of page to be flushed
- * @idxmap: bitmap of MMU indexes to flush
  *
- * Flush one page from the TLB of all CPUs, for the specified MMU
- * indexes like tlb_flush_page_by_mmuidx_all_cpus except the source
- * vCPUs work is scheduled as safe work meaning all flushes will be
- * complete once  the source vCPUs safe work is complete. This will
- * depend on when the guests translation ends the TB.
+ * When this function returns, no CPUs will subsequently perform
+ * translations using the flushed TLBs.
  */
 void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu, vaddr addr,
                                               uint16_t idxmap);
@@ -158,24 +133,15 @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu, vaddr addr,
  */
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap);
 /**
- * tlb_flush_by_mmuidx_all_cpus:
+ * tlb_flush_by_mmuidx_all_cpus_synced:
  * @cpu: Originating CPU of the flush
  * @idxmap: bitmap of MMU indexes to flush
  *
- * Flush all entries from all TLBs of all CPUs, for the specified
+ * Flush all entries from the TLB of all CPUs, for the specified
  * MMU indexes.
- */
-void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap);
-/**
- * tlb_flush_by_mmuidx_all_cpus_synced:
- * @cpu: Originating CPU of the flush
- * @idxmap: bitmap of MMU indexes to flush
  *
- * Flush all entries from all TLBs of all CPUs, for the specified
- * MMU indexes like tlb_flush_by_mmuidx_all_cpus except except the source
- * vCPUs work is scheduled as safe work meaning all flushes will be
- * complete once  the source vCPUs safe work is complete. This will
- * depend on when the guests translation ends the TB.
+ * When this function returns, no CPUs will subsequently perform
+ * translations using the flushed TLBs.
  */
 void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap);
 
@@ -192,8 +158,6 @@ void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, vaddr addr,
                                    uint16_t idxmap, unsigned bits);
 
 /* Similarly, with broadcast and syncing. */
-void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
-                                            uint16_t idxmap, unsigned bits);
 void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
     (CPUState *cpu, vaddr addr, uint16_t idxmap, unsigned bits);
 
@@ -213,9 +177,6 @@ void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
                                unsigned bits);
 
 /* Similarly, with broadcast and syncing. */
-void tlb_flush_range_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
-                                        vaddr len, uint16_t idxmap,
-                                        unsigned bits);
 void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                vaddr addr,
                                                vaddr len,
@@ -288,18 +249,12 @@ static inline void tlb_destroy(CPUState *cpu)
 static inline void tlb_flush_page(CPUState *cpu, vaddr addr)
 {
 }
-static inline void tlb_flush_page_all_cpus(CPUState *src, vaddr addr)
-{
-}
 static inline void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr)
 {
 }
 static inline void tlb_flush(CPUState *cpu)
 {
 }
-static inline void tlb_flush_all_cpus(CPUState *src_cpu)
-{
-}
 static inline void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 {
 }
@@ -311,20 +266,11 @@ static inline void tlb_flush_page_by_mmuidx(CPUState *cpu,
 static inline void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
 }
-static inline void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu,
-                                                     vaddr addr,
-                                                     uint16_t idxmap)
-{
-}
 static inline void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                             vaddr addr,
                                                             uint16_t idxmap)
 {
 }
-static inline void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap)
-{
-}
-
 static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                        uint16_t idxmap)
 {
@@ -335,12 +281,6 @@ static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu,
                                                  unsigned bits)
 {
 }
-static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu,
-                                                          vaddr addr,
-                                                          uint16_t idxmap,
-                                                          unsigned bits)
-{
-}
 static inline void
 tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, vaddr addr,
                                               uint16_t idxmap, unsigned bits)
@@ -351,13 +291,6 @@ static inline void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
                                              unsigned bits)
 {
 }
-static inline void tlb_flush_range_by_mmuidx_all_cpus(CPUState *cpu,
-                                                      vaddr addr,
-                                                      vaddr len,
-                                                      uint16_t idxmap,
-                                                      unsigned bits)
-{
-}
 static inline void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                              vaddr addr,
                                                              vaddr len,
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 93b1ca810b..8ff3aa5e50 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -379,21 +379,6 @@ void tlb_flush(CPUState *cpu)
     tlb_flush_by_mmuidx(cpu, ALL_MMUIDX_BITS);
 }
 
-void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
-{
-    const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
-
-    tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
-
-    flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
-    fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
-}
-
-void tlb_flush_all_cpus(CPUState *src_cpu)
-{
-    tlb_flush_by_mmuidx_all_cpus(src_cpu, ALL_MMUIDX_BITS);
-}
-
 void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu, uint16_t idxmap)
 {
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
@@ -604,46 +589,6 @@ void tlb_flush_page(CPUState *cpu, vaddr addr)
     tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
 }
 
-void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, vaddr addr,
-                                       uint16_t idxmap)
-{
-    tlb_debug("addr: %016" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);
-
-    /* This should already be page aligned */
-    addr &= TARGET_PAGE_MASK;
-
-    /*
-     * Allocate memory to hold addr+idxmap only when needed.
-     * See tlb_flush_page_by_mmuidx for details.
-     */
-    if (idxmap < TARGET_PAGE_SIZE) {
-        flush_all_helper(src_cpu, tlb_flush_page_by_mmuidx_async_1,
-                         RUN_ON_CPU_TARGET_PTR(addr | idxmap));
-    } else {
-        CPUState *dst_cpu;
-
-        /* Allocate a separate data block for each destination cpu.  */
-        CPU_FOREACH(dst_cpu) {
-            if (dst_cpu != src_cpu) {
-                TLBFlushPageByMMUIdxData *d
-                    = g_new(TLBFlushPageByMMUIdxData, 1);
-
-                d->addr = addr;
-                d->idxmap = idxmap;
-                async_run_on_cpu(dst_cpu, tlb_flush_page_by_mmuidx_async_2,
-                                 RUN_ON_CPU_HOST_PTR(d));
-            }
-        }
-    }
-
-    tlb_flush_page_by_mmuidx_async_0(src_cpu, addr, idxmap);
-}
-
-void tlb_flush_page_all_cpus(CPUState *src, vaddr addr)
-{
-    tlb_flush_page_by_mmuidx_all_cpus(src, addr, ALL_MMUIDX_BITS);
-}
-
 void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                               vaddr addr,
                                               uint16_t idxmap)
@@ -835,54 +780,6 @@ void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, vaddr addr,
     tlb_flush_range_by_mmuidx(cpu, addr, TARGET_PAGE_SIZE, idxmap, bits);
 }
 
-void tlb_flush_range_by_mmuidx_all_cpus(CPUState *src_cpu,
-                                        vaddr addr, vaddr len,
-                                        uint16_t idxmap, unsigned bits)
-{
-    TLBFlushRangeData d;
-    CPUState *dst_cpu;
-
-    /*
-     * If all bits are significant, and len is small,
-     * this devolves to tlb_flush_page.
-     */
-    if (bits >= TARGET_LONG_BITS && len <= TARGET_PAGE_SIZE) {
-        tlb_flush_page_by_mmuidx_all_cpus(src_cpu, addr, idxmap);
-        return;
-    }
-    /* If no page bits are significant, this devolves to tlb_flush. */
-    if (bits < TARGET_PAGE_BITS) {
-        tlb_flush_by_mmuidx_all_cpus(src_cpu, idxmap);
-        return;
-    }
-
-    /* This should already be page aligned */
-    d.addr = addr & TARGET_PAGE_MASK;
-    d.len = len;
-    d.idxmap = idxmap;
-    d.bits = bits;
-
-    /* Allocate a separate data block for each destination cpu.  */
-    CPU_FOREACH(dst_cpu) {
-        if (dst_cpu != src_cpu) {
-            TLBFlushRangeData *p = g_memdup(&d, sizeof(d));
-            async_run_on_cpu(dst_cpu,
-                             tlb_flush_range_by_mmuidx_async_1,
-                             RUN_ON_CPU_HOST_PTR(p));
-        }
-    }
-
-    tlb_flush_range_by_mmuidx_async_0(src_cpu, d);
-}
-
-void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *src_cpu,
-                                            vaddr addr, uint16_t idxmap,
-                                            unsigned bits)
-{
-    tlb_flush_range_by_mmuidx_all_cpus(src_cpu, addr, TARGET_PAGE_SIZE,
-                                       idxmap, bits);
-}
-
 void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                                vaddr addr,
                                                vaddr len,
-- 
2.42.0




* [PATCH 3/3] tcg/cputlb: remove other-cpu capability from TLB flushing
  2024-03-28  5:31 [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
  2024-03-28  5:31 ` [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
  2024-03-28  5:31 ` [PATCH 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes Nicholas Piggin
@ 2024-03-28  5:31 ` Nicholas Piggin
  2024-03-28  8:12 ` [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
  3 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28  5:31 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, Richard Henderson, Paolo Bonzini, qemu-devel

Some TLB flush operations can flush other CPUs. The problem with this
is that they use the non-synced variants of the flushes (i.e., ones
that return before the destination CPU has completed the flush). Since
all TLB flush users need the synced variants, and the last user of the
non-synced flushes was buggy, this is a footgun waiting to go off.
There do not seem to be any remaining callers that flush other CPUs, so
remove the capability.
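
After this patch the intended usage looks like this (a sketch using the
declarations from exec-all.h above; the current-CPU requirement is what
the new assert_cpu_is_self() calls enforce, and `cs' and `addr' are
placeholders):

    /* flush our own TLB: must now be called on cs's own thread */
    tlb_flush_page(cs, addr);

    /* flush all CPUs' TLBs: only the _synced variant remains, and it
     * takes effect before execution restarts on the source vCPU */
    tlb_flush_page_all_cpus_synced(cs, addr);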

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 accel/tcg/cputlb.c | 42 +++++++++---------------------------------
 1 file changed, 9 insertions(+), 33 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8ff3aa5e50..1fe6def280 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -366,12 +366,9 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
     tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap);
 
-    if (cpu->created && !qemu_cpu_is_self(cpu)) {
-        async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
-                         RUN_ON_CPU_HOST_INT(idxmap));
-    } else {
-        tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap));
-    }
+    assert_cpu_is_self(cpu);
+
+    tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap));
 }
 
 void tlb_flush(CPUState *cpu)
@@ -560,28 +557,12 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, vaddr addr, uint16_t idxmap)
 {
     tlb_debug("addr: %016" VADDR_PRIx " mmu_idx:%" PRIx16 "\n", addr, idxmap);
 
+    assert_cpu_is_self(cpu);
+
     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
 
-    if (qemu_cpu_is_self(cpu)) {
-        tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap);
-    } else if (idxmap < TARGET_PAGE_SIZE) {
-        /*
-         * Most targets have only a few mmu_idx.  In the case where
-         * we can stuff idxmap into the low TARGET_PAGE_BITS, avoid
-         * allocating memory for this operation.
-         */
-        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_1,
-                         RUN_ON_CPU_TARGET_PTR(addr | idxmap));
-    } else {
-        TLBFlushPageByMMUIdxData *d = g_new(TLBFlushPageByMMUIdxData, 1);
-
-        /* Otherwise allocate a structure, freed by the worker.  */
-        d->addr = addr;
-        d->idxmap = idxmap;
-        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_2,
-                         RUN_ON_CPU_HOST_PTR(d));
-    }
+    tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap);
 }
 
 void tlb_flush_page(CPUState *cpu, vaddr addr)
@@ -744,6 +725,8 @@ void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
 {
     TLBFlushRangeData d;
 
+    assert_cpu_is_self(cpu);
+
     /*
      * If all bits are significant, and len is small,
      * this devolves to tlb_flush_page.
@@ -764,14 +747,7 @@ void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
     d.idxmap = idxmap;
     d.bits = bits;
 
-    if (qemu_cpu_is_self(cpu)) {
-        tlb_flush_range_by_mmuidx_async_0(cpu, d);
-    } else {
-        /* Otherwise allocate a structure, freed by the worker.  */
-        TLBFlushRangeData *p = g_memdup(&d, sizeof(d));
-        async_run_on_cpu(cpu, tlb_flush_range_by_mmuidx_async_1,
-                         RUN_ON_CPU_HOST_PTR(p));
-    }
+    tlb_flush_range_by_mmuidx_async_0(cpu, d);
 }
 
 void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, vaddr addr,
-- 
2.42.0




* Re: [PATCH 0/3] target/ppc: fix tlb flushing race
  2024-03-28  5:31 [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
                   ` (2 preceding siblings ...)
  2024-03-28  5:31 ` [PATCH 3/3] tcg/cputlb: remove other-cpu capability from TLB flushing Nicholas Piggin
@ 2024-03-28  8:12 ` Nicholas Piggin
  2024-03-28 10:15   ` Nicholas Piggin
  3 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28  8:12 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: Richard Henderson, Paolo Bonzini, qemu-devel

On Thu Mar 28, 2024 at 3:31 PM AEST, Nicholas Piggin wrote:
> ppc broadcast tlb flushes should be synchronised with other vCPUs,
> like all other architectures that support such operations seem to
> be doing.
>
> Fixing ppc removes the last caller of the non-synced TLB flush
> variants, so we can remove some dead code. I'd like to merge patch 1
> for 9.0, and hold patches 2 and 3 until 9.1 to avoid churn (unless
> someone prefers to remove the dead code asap).

Hmm, it turns out not to be so simple; this partially reverts
the fix in commit 4ddc104689b. Do other architectures
that use the _synced TLB flush variants have that same problem
with the TLB flush not actually flushing until the TB ends,
I wonder?

AFAIKS it seems like the right fix would be to use _synced, but
force a new TB at the end of the TLB flush instruction so the
flush will take effect on all CPUs before the next instruction?

In any case this is tricky enough and I only hit it with a
test program, so I'll leave it out of 9.0.

Thanks,
Nick



* Re: [PATCH 0/3] target/ppc: fix tlb flushing race
  2024-03-28  8:12 ` [PATCH 0/3] target/ppc: fix tlb flushing race Nicholas Piggin
@ 2024-03-28 10:15   ` Nicholas Piggin
  2024-03-28 10:37     ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28 10:15 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: Richard Henderson, Paolo Bonzini, qemu-devel

On Thu Mar 28, 2024 at 6:12 PM AEST, Nicholas Piggin wrote:
> On Thu Mar 28, 2024 at 3:31 PM AEST, Nicholas Piggin wrote:
> > ppc broadcast tlb flushes should be synchronised with other vCPUs,
> > like all other architectures that support such operations seem to
> > be doing.
> >
> > Fixing ppc removes the last caller of the non-synced TLB flush
> > variants, so we can remove some dead code. I'd like to merge patch 1
> > for 9.0, and hold patches 2 and 3 until 9.1 to avoid churn (unless
> > someone prefers to remove the dead code asap).
>
> Hmm, it turns out not to be so simple; this partially reverts
> the fix in commit 4ddc104689b. Do other architectures
> that use the _synced TLB flush variants have that same problem
> with the TLB flush not actually flushing until the TB ends,
> I wonder?

Huh, I can reproduce that original problem with a little test
case (which I will upstream into kvm-unit-tests).

async_run_on_cpu(this_cpu) seems to flush before the next TB, but
async_safe_run_on_cpu(this_cpu) does not? How does it execute it
without exiting from the TB?

In any case, patch 1 to make it _synced, plus the following,
seems to close both races.

Thanks,
Nick

---

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 93ffec787c..c44e0ce687 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3495,6 +3495,7 @@ static inline void gen_check_tlb_flush(DisasContext *ctx, bool global)
         gen_helper_check_tlb_flush_local(tcg_env);
     }
     gen_set_label(l);
+    ctx->base.is_jmp = DISAS_EXIT_UPDATE;
 }
 #else
 static inline void gen_check_tlb_flush(DisasContext *ctx, bool global) { }
diff --git a/target/ppc/translate/storage-ctrl-impl.c.inc b/target/ppc/translate/storage-ctrl-impl.c.inc
index 74c23a4191..673e754404 100644
--- a/target/ppc/translate/storage-ctrl-impl.c.inc
+++ b/target/ppc/translate/storage-ctrl-impl.c.inc
@@ -224,6 +224,9 @@ static bool do_tlbie(DisasContext *ctx, arg_X_tlbie *a, bool local)
                                  a->prs << TLBIE_F_PRS_SHIFT |
                                  a->r << TLBIE_F_R_SHIFT |
                                  local << TLBIE_F_LOCAL_SHIFT));
+        if (!local) {
+            ctx->base.is_jmp = DISAS_EXIT_UPDATE;
+        }
         return true;
 #endif



* Re: [PATCH 0/3] target/ppc: fix tlb flushing race
  2024-03-28 10:15   ` Nicholas Piggin
@ 2024-03-28 10:37     ` Nicholas Piggin
  2024-03-28 13:20       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2024-03-28 10:37 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: Richard Henderson, Paolo Bonzini, qemu-devel

On Thu Mar 28, 2024 at 8:15 PM AEST, Nicholas Piggin wrote:
> On Thu Mar 28, 2024 at 6:12 PM AEST, Nicholas Piggin wrote:
> > On Thu Mar 28, 2024 at 3:31 PM AEST, Nicholas Piggin wrote:
> > > ppc broadcast tlb flushes should be synchronised with other vCPUs,
> > > like all other architectures that support such operations seem to
> > > be doing.
> > >
> > > Fixing ppc removes the last caller of the non-synced TLB flush
> > > variants, so we can remove some dead code. I'd like to merge patch 1
> > > for 9.0, and hold patches 2 and 3 until 9.1 to avoid churn (unless
> > > someone prefers to remove the dead code asap).
> >
> > Hmm, it turns out not to be so simple; this partially reverts
> > the fix in commit 4ddc104689b. Do other architectures
> > that use the _synced TLB flush variants have that same problem
> > with the TLB flush not actually flushing until the TB ends,
> > I wonder?
>
> Huh, I can reproduce that original problem with a little test
> case (which I will upstream into kvm-unit-tests).
>
> async_run_on_cpu(this_cpu) seems to flush before the next TB, but
> async_safe_run_on_cpu(this_cpu) does not? How does it execute it
> without exiting from the TB?

Duh, it's because the non-_synced tlb flush variants don't go through
async_run_on_cpu() at all for the current CPU; they just call the flush
directly.
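
That is, condensed from the cputlb.c hunks quoted in patches 2 and 3
(an excerpt, not a complete function):

    /* non-synced variant: immediate for the current CPU */
    if (qemu_cpu_is_self(cpu)) {
        tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap); /* direct */
    } else {
        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_1,
                         RUN_ON_CPU_TARGET_PTR(addr | idxmap));
    }

    /* _synced variant: even the source vCPU's flush is queued as safe
     * work, which only runs once the current TB has exited */
    async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));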

Okay, that all makes sense now. I think this series plus the
below is good then. Also it's possible that some other archs that
use _all_cpus_synced() (arm, riscv, s390x) _may_ be racy. I
had a quick look at sfence.vma and ipte, and AFAIKS they're
supposed to take immediate effect after they execute.

Thanks,
Nick



* Re: [PATCH 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes
  2024-03-28  5:31 ` [PATCH 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes Nicholas Piggin
@ 2024-03-28 13:18   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-28 13:18 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc
  Cc: Richard Henderson, Paolo Bonzini, qemu-devel, Alex Bennée

On 28/3/24 06:31, Nicholas Piggin wrote:
> These are no longer used.

tlb_flush_all_cpus: last caller removed in previous patch
tlb_flush_page_all_cpus: last caller removed in previous patch

tlb_flush_page_by_mmuidx_all_cpus: never used
tlb_flush_page_bits_by_mmuidx_all_cpus: never used, thus:
  tlb_flush_range_by_mmuidx_all_cpus: never used
  tlb_flush_by_mmuidx_all_cpus: never used

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   docs/devel/multi-thread-tcg.rst |  13 ++--
>   include/exec/exec-all.h         |  97 +++++-------------------------
>   accel/tcg/cputlb.c              | 103 --------------------------------
>   3 files changed, 19 insertions(+), 194 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation
  2024-03-28  5:31 ` [PATCH 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
@ 2024-03-28 13:18   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-28 13:18 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: Richard Henderson, Paolo Bonzini, qemu-devel

On 28/3/24 06:31, Nicholas Piggin wrote:
> With mttcg, broadcast tlbie instructions do not wait until other vCPUs
> have been kicked out of TCG execution before they complete (including
> necessary subsequent tlbsync, etc., instructions). This is contrary to
> the ISA, and it permits other vCPUs to use translations after the TLB
> flush. For example:
> 
>     CPU0
>     // *memP is initially 0, memV maps to memP with *pte
>     *pte = 0;
>     ptesync ; tlbie ; eieio ; tlbsync ; ptesync
>     *memP = 1;
> 
>     CPU1
>     assert(*memV == 0);
> 
> It is possible for the assertion to fail because CPU1 translates memV
> using the TLB after CPU0 has stored 1 to the underlying memory. This
> race was observed with a careful test case in which the CPU1 check runs in
> a very large, expensive TB, so that it spans the entire window between CPU0
> clearing the pte and storing to the memory. It's normally very difficult to
> hit, but preemption of host vCPU threads could trigger the race
> anywhere.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/helper_regs.c | 2 +-
>   target/ppc/mmu_helper.c  | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)

To the best of my knowledge,
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH 0/3] target/ppc: fix tlb flushing race
  2024-03-28 10:37     ` Nicholas Piggin
@ 2024-03-28 13:20       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-28 13:20 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: Richard Henderson, Paolo Bonzini, qemu-devel

On 28/3/24 11:37, Nicholas Piggin wrote:
> On Thu Mar 28, 2024 at 8:15 PM AEST, Nicholas Piggin wrote:
>> On Thu Mar 28, 2024 at 6:12 PM AEST, Nicholas Piggin wrote:
>>> On Thu Mar 28, 2024 at 3:31 PM AEST, Nicholas Piggin wrote:
>>>> ppc broadcast tlb flushes should be synchronised with other vCPUs,
>>>> like all other architectures that support such operations seem to
>>>> be doing.
>>>>
>>>> Fixing ppc removes the last caller of the non-synced TLB flush
>>>> variants, so we can remove some dead code. I'd like to merge patch 1
>>>> for 9.0, and hold patches 2 and 3 until 9.1 to avoid churn (unless
>>>> someone prefers to remove the dead code asap).
>>>
>>> Hmm, it turns out not to be so simple; this partially reverts
>>> the fix in commit 4ddc104689b.

Please mention that in the patch.

>>> Do other architectures
>>> that use the _synced TLB flush variants have that same problem
>>> with the TLB flush not actually flushing until the TB ends,
>>> I wonder?
>>
>> Huh, I can reproduce that original problem with a little test
>> case (which I will upstream into kvm-unit-tests).
>>
>> async_run_on_cpu(this_cpu) seems to flush before the next TB, but
>> async_safe_run_on_cpu(this_cpu) does not? How does it execute it
>> without exiting from the TB?
> 
> Duh, it's because the non-_synced tlb flush variants don't go through
> async_run_on_cpu() at all for the current CPU; they just call the flush
> directly.
> 
> Okay, that all makes sense now. I think this series plus the
> below is good then. Also it's possible that some other archs that
> use _all_cpus_synced() (arm, riscv, s390x) _may_ be racy. I
> had a quick look at sfence.vma and ipte, and AFAIKS they're
> supposed to take immediate effect after they execute.
> 
> Thanks,
> Nick
> 



