* [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi
@ 2020-10-01 17:07 Richard Henderson
  2020-10-01 17:07 ` [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx* Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Richard Henderson @ 2020-10-01 17:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm, Jordan Frank

Since the FAR_ELx fix at 38d931687fa1, it is reported that
page granularity flushing is broken.

This makes sense, since TCG records the entire virtual
address in its TLB, not just the 56 significant bits.
Without additional TCG support, the target/arm code would
need 256 separate page flushes to clear a given virtual
address under every possible tag.

So I added a new TCG interface that allows passing the number
of significant bits in the virtual address.  A simple bit count
seemed a cleaner interface than passing in a mask, since it means
we cannot be handed a nonsensical mask like 0xdeadbeef.  It also
makes it easy to redirect the special cases.
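
As a sketch of the intended usage (the wrapper below is purely
illustrative and not part of the patches; only
tlb_flush_page_bits_by_mmuidx itself comes from patch 1):

static void flush_page_for_tbi(CPUState *cs, target_ulong pageaddr,
                               uint16_t idxmap, bool tbi_enabled)
{
    /* With TBI, only the low 56 bits of the address are significant. */
    unsigned bits = tbi_enabled ? 56 : 64;

    /* One call covers the page under every possible top-byte tag. */
    tlb_flush_page_bits_by_mmuidx(cs, pageaddr, idxmap, bits);
}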

I don't have a test case that triggers the bug.  All I can say
is that (1) this still boots a normal kernel and (2) the new code
paths are exercised, since the kernel enables TBI for EL0.

Jordan, can you test this please?


r~


Richard Henderson (2):
  accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
  target/arm: Use tlb_flush_page_bits_by_mmuidx*

 include/exec/exec-all.h |  36 ++++++
 accel/tcg/cputlb.c      | 259 ++++++++++++++++++++++++++++++++++++++--
 target/arm/helper.c     |  46 +++++--
 3 files changed, 325 insertions(+), 16 deletions(-)

-- 
2.25.1




* [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
  2020-10-01 17:07 [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi Richard Henderson
@ 2020-10-01 17:07 ` Richard Henderson
  2020-10-08 12:53   ` Peter Maydell
  2020-10-01 17:07 ` [RFC PATCH 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx* Richard Henderson
  2020-10-02 18:19 ` [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi Jordan Frank
  2 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2020-10-01 17:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm, Jordan Frank

On ARM, the Top Byte Ignore feature means that only 56 bits of
the virtual address are significant.  We are required to give
the entire 64-bit address to FAR_ELx on a fault, which means
that we do not "clean" the top byte early in TCG.

This new interface allows us to flush all 256 possible aliases
for a given page, currently missed by tlb_flush_page*.
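
For contrast, here is a sketch of what a target would otherwise
have to do; flush_all_tag_aliases is a hypothetical helper shown
only to illustrate the cost this interface avoids:

static void flush_all_tag_aliases(CPUState *cs, target_ulong pageaddr,
                                  uint16_t idxmap)
{
    /* Flush the page once for each of the 256 possible top-byte tags. */
    for (unsigned tag = 0; tag < 256; tag++) {
        tlb_flush_page_by_mmuidx(cs, deposit64(pageaddr, 56, 8, tag), idxmap);
    }
}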

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h |  36 ++++++
 accel/tcg/cputlb.c      | 259 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 286 insertions(+), 9 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 1fe28d574f..7b3b6bf57d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -251,6 +251,25 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap);
  * depend on when the guests translation ends the TB.
  */
 void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap);
+
+/**
+ * tlb_flush_page_bits_by_mmuidx
+ * @cpu: CPU whose TLB should be flushed
+ * @addr: virtual address of page to be flushed
+ * @idxmap: bitmap of mmu indexes to flush
+ * @bits: number of significant bits in address
+ *
+ * Similar to tlb_flush_page_by_mmuidx, but the number of significant
+ * address bits is given by @bits.
+ */
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+                                   uint16_t idxmap, unsigned bits);
+
+/* Similarly, with broadcast and syncing. */
+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
+                                            uint16_t idxmap, unsigned bits);
+void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
+    (CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits);
+
 /**
  * tlb_set_page_with_attrs:
  * @cpu: CPU to add this TLB entry for
@@ -337,6 +356,23 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                        uint16_t idxmap)
 {
 }
+static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu,
+                                                 target_ulong addr,
+                                                 uint16_t idxmap,
+                                                 unsigned bits)
+{
+}
+static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu,
+                                                          target_ulong addr,
+                                                          uint16_t idxmap,
+                                                          unsigned bits)
+{
+}
+static inline void
+tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
+                                              uint16_t idxmap, unsigned bits)
+{
+}
 #endif
 /**
  * probe_access:
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index aaf8e46ae5..d74032a898 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -409,12 +409,21 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
     tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
 }
 
+static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
+                                      target_ulong page, target_ulong mask)
+{
+    page &= mask;
+    mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK;
+
+    return (page == (tlb_entry->addr_read & mask) ||
+            page == (tlb_addr_write(tlb_entry) & mask) ||
+            page == (tlb_entry->addr_code & mask));
+}
+
 static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
                                         target_ulong page)
 {
-    return tlb_hit_page(tlb_entry->addr_read, page) ||
-           tlb_hit_page(tlb_addr_write(tlb_entry), page) ||
-           tlb_hit_page(tlb_entry->addr_code, page);
+    return tlb_hit_page_mask_anyprot(tlb_entry, page, -1);
 }
 
 /**
@@ -427,31 +436,45 @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)
 }
 
 /* Called with tlb_c.lock held */
-static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
-                                          target_ulong page)
+static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry,
+                                        target_ulong page,
+                                        target_ulong mask)
 {
-    if (tlb_hit_page_anyprot(tlb_entry, page)) {
+    if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) {
         memset(tlb_entry, -1, sizeof(*tlb_entry));
         return true;
     }
     return false;
 }
 
+static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
+                                          target_ulong page)
+{
+    return tlb_flush_entry_mask_locked(tlb_entry, page, -1);
+}
+
 /* Called with tlb_c.lock held */
-static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
-                                              target_ulong page)
+static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx,
+                                            target_ulong page,
+                                            target_ulong mask)
 {
     CPUTLBDesc *d = &env_tlb(env)->d[mmu_idx];
     int k;
 
     assert_cpu_is_self(env_cpu(env));
     for (k = 0; k < CPU_VTLB_SIZE; k++) {
-        if (tlb_flush_entry_locked(&d->vtable[k], page)) {
+        if (tlb_flush_entry_mask_locked(&d->vtable[k], page, mask)) {
             tlb_n_used_entries_dec(env, mmu_idx);
         }
     }
 }
 
+static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
+                                              target_ulong page)
+{
+    tlb_flush_vtlb_page_mask_locked(env, mmu_idx, page, -1);
+}
+
 static void tlb_flush_page_locked(CPUArchState *env, int midx,
                                   target_ulong page)
 {
@@ -666,6 +689,224 @@ void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr)
     tlb_flush_page_by_mmuidx_all_cpus_synced(src, addr, ALL_MMUIDX_BITS);
 }
 
+static void tlb_flush_page_bits_locked(CPUArchState *env, int midx,
+                                       target_ulong page, unsigned bits)
+{
+    CPUTLBDesc *d = &env_tlb(env)->d[midx];
+    CPUTLBDescFast *f = &env_tlb(env)->f[midx];
+    target_ulong mask = MAKE_64BIT_MASK(0, bits);
+
+    /*
+     * If @bits is smaller than the tlb size, there may be multiple entries
+     * within the TLB; otherwise all addresses that match under @mask hit
+     * the same TLB entry.
+     *
+     * TODO: Perhaps allow bits to be a few bits less than the size.
+     * For now, just flush the entire TLB.
+     */
+    if (mask < f->mask) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, page, mask);
+        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
+        return;
+    }
+
+    /* Check if we need to flush due to large pages.  */
+    if ((page & d->large_page_mask) == d->large_page_addr) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, d->large_page_addr, d->large_page_mask);
+        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
+        return;
+    }
+
+    if (tlb_flush_entry_mask_locked(tlb_entry(env, midx, page), page, mask)) {
+        tlb_n_used_entries_dec(env, midx);
+    }
+    tlb_flush_vtlb_page_mask_locked(env, midx, page, mask);
+}
+
+static void tlb_flush_page_bits_by_mmuidx_async_0(CPUState *cpu,
+                                                  target_ulong addr,
+                                                  uint16_t idxmap,
+                                                  unsigned bits)
+{
+    CPUArchState *env = cpu->env_ptr;
+    int mmu_idx;
+
+    assert_cpu_is_self(cpu);
+
+    tlb_debug("page addr:" TARGET_FMT_lx "/%u mmu_map:0x%x\n",
+              addr, bits, idxmap);
+
+    qemu_spin_lock(&env_tlb(env)->c.lock);
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        if ((idxmap >> mmu_idx) & 1) {
+            tlb_flush_page_bits_locked(env, mmu_idx, addr, bits);
+        }
+    }
+    qemu_spin_unlock(&env_tlb(env)->c.lock);
+
+    tb_flush_jmp_cache(cpu, addr);
+}
+
+static void tlb_flush_page_bits_by_mmuidx_async_1(CPUState *cpu,
+                                                  run_on_cpu_data data)
+{
+    target_ulong addr_map_bits = (target_ulong) data.target_ptr;
+    target_ulong addr = addr_map_bits & TARGET_PAGE_MASK;
+    uint16_t idxmap = (addr_map_bits & ~TARGET_PAGE_MASK) >> 6;
+    unsigned bits = addr_map_bits & 0x3f;
+
+    tlb_flush_page_bits_by_mmuidx_async_0(cpu, addr, idxmap, bits);
+}
+
+typedef struct {
+    target_ulong addr;
+    uint16_t idxmap;
+    uint16_t bits;
+} TLBFlushPageBitsByMMUIdxData;
+
+static void tlb_flush_page_bits_by_mmuidx_async_2(CPUState *cpu,
+                                                  run_on_cpu_data data)
+{
+    TLBFlushPageBitsByMMUIdxData *d = data.host_ptr;
+
+    tlb_flush_page_bits_by_mmuidx_async_0(cpu, d->addr, d->idxmap, d->bits);
+    g_free(d);
+}
+
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+                                   uint16_t idxmap, unsigned bits)
+{
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx(cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx(cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    addr &= TARGET_PAGE_MASK;
+
+    if (qemu_cpu_is_self(cpu)) {
+        tlb_flush_page_bits_by_mmuidx_async_0(cpu, addr, idxmap, bits);
+    } else if (idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
+        run_on_cpu_data data
+            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);
+        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_1, data);
+    } else {
+        TLBFlushPageBitsByMMUIdxData *d
+            = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+
+        /* Otherwise allocate a structure, freed by the worker.  */
+        d->addr = addr;
+        d->idxmap = idxmap;
+        d->bits = bits;
+        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                         RUN_ON_CPU_HOST_PTR(d));
+    }
+}
+
+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *src_cpu,
+                                            target_ulong addr,
+                                            uint16_t idxmap,
+                                            unsigned bits)
+{
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx_all_cpus(src_cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx_all_cpus(src_cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    addr &= TARGET_PAGE_MASK;
+
+    if (idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
+        run_on_cpu_data data
+            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);
+        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, data);
+    } else {
+        CPUState *dst_cpu;
+        TLBFlushPageBitsByMMUIdxData *d;
+
+        /* Allocate a separate data block for each destination cpu.  */
+        CPU_FOREACH(dst_cpu) {
+            if (dst_cpu != src_cpu) {
+                d = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+                d->addr = addr;
+                d->idxmap = idxmap;
+                d->bits = bits;
+                async_run_on_cpu(dst_cpu,
+                                 tlb_flush_page_bits_by_mmuidx_async_2,
+                                 RUN_ON_CPU_HOST_PTR(d));
+            }
+        }
+    }
+
+    tlb_flush_page_bits_by_mmuidx_async_0(src_cpu, addr, idxmap, bits);
+}
+
+void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+                                                   target_ulong addr,
+                                                   uint16_t idxmap,
+                                                   unsigned bits)
+{
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    addr &= TARGET_PAGE_MASK;
+
+    if (idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
+        run_on_cpu_data data
+            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);
+        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, data);
+        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1,
+                              data);
+    } else {
+        CPUState *dst_cpu;
+        TLBFlushPageBitsByMMUIdxData *d;
+
+        /* Allocate a separate data block for each destination cpu.  */
+        CPU_FOREACH(dst_cpu) {
+            if (dst_cpu != src_cpu) {
+                d = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+                d->addr = addr;
+                d->idxmap = idxmap;
+                d->bits = bits;
+                async_run_on_cpu(dst_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                                 RUN_ON_CPU_HOST_PTR(d));
+            }
+        }
+
+        d = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+        d->addr = addr;
+        d->idxmap = idxmap;
+        d->bits = bits;
+        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                              RUN_ON_CPU_HOST_PTR(d));
+    }
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
-- 
2.25.1




* [RFC PATCH 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx*
  2020-10-01 17:07 [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi Richard Henderson
  2020-10-01 17:07 ` [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx* Richard Henderson
@ 2020-10-01 17:07 ` Richard Henderson
  2020-10-08 12:59   ` Peter Maydell
  2020-10-02 18:19 ` [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi Jordan Frank
  2 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2020-10-01 17:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm, Jordan Frank

When TBI is enabled in a given regime, 56 bits of the address
are significant and we need to clear out any other matching
virtual addresses with differing tags.
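
As a concrete illustration (addresses invented for the example):
with TBI enabled, 0x0000_ffff_1234_5000 and 0xab00_ffff_1234_5000
differ only in bits 63:56, so a TLBI by VA for that page must
remove both forms from QEMU's TLB.  Comparing only the low 56 bits
lets a single flush do that instead of 256 per-tag flushes.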

The other uses of tlb_flush_page (without mmuidx) in this file
are reachable only from aarch32 mode.

Fixes: 38d931687fa1
Reported-by: Jordan Frank <jordanfrank@fb.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 46 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 88bd9dd35d..7e9095d687 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -49,6 +49,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
 #endif
 
 static void switch_mode(CPUARMState *env, int mode);
+static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
 
 static int vfp_gdb_get_reg(CPUARMState *env, GByteArray *buf, int reg)
 {
@@ -4456,6 +4457,33 @@ static int vae1_tlbmask(CPUARMState *env)
     }
 }
 
+/* Return 56 if TBI is enabled, 64 otherwise. */
+static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx,
+                              uint64_t addr)
+{
+    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+    int tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+    int select = extract64(addr, 55, 1);
+
+    return (tbi >> select) & 1 ? 56 : 64;
+}
+
+static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
+{
+    ARMMMUIdx mmu_idx;
+
+    /* Only the regime of the mmu_idx below is significant. */
+    if (arm_is_secure_below_el3(env)) {
+        mmu_idx = ARMMMUIdx_SE10_0;
+    } else if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE))
+               == (HCR_E2H | HCR_TGE)) {
+        mmu_idx = ARMMMUIdx_E20_0;
+    } else {
+        mmu_idx = ARMMMUIdx_E10_0;
+    }
+    return tlbbits_for_regime(env, mmu_idx, addr);
+}
+
 static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       uint64_t value)
 {
@@ -4592,8 +4620,9 @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
     CPUState *cs = env_cpu(env);
     int mask = vae1_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = vae1_tlbbits(env, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
 }
 
 static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -4607,11 +4636,12 @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
     CPUState *cs = env_cpu(env);
     int mask = vae1_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = vae1_tlbbits(env, pageaddr);
 
     if (tlb_force_broadcast(env)) {
-        tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+        tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
     } else {
-        tlb_flush_page_by_mmuidx(cs, pageaddr, mask);
+        tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits);
     }
 }
 
@@ -4620,9 +4650,10 @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     CPUState *cs = env_cpu(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_E2);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+                                                  ARMMMUIdxBit_E2, bits);
 }
 
 static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -4630,9 +4661,10 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     CPUState *cs = env_cpu(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = tlbbits_for_regime(env, ARMMMUIdx_SE3, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_SE3);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+                                                  ARMMMUIdxBit_SE3, bits);
 }
 
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.25.1




* Re: [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi
  2020-10-01 17:07 [RFC PATCH 0/2] target/arm: Fix tlb flush page vs tbi Richard Henderson
  2020-10-01 17:07 ` [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx* Richard Henderson
  2020-10-01 17:07 ` [RFC PATCH 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx* Richard Henderson
@ 2020-10-02 18:19 ` Jordan Frank
  2 siblings, 0 replies; 6+ messages in thread
From: Jordan Frank @ 2020-10-02 18:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-arm, peter.maydell

I can confirm this resolves the issue we were seeing. Thank you Richard!

Best,
Jordan

On 10/1/20, 10:08 AM, "Richard Henderson" <richard.henderson@linaro.org> wrote:

    Jordan, can you test this please?




* Re: [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
  2020-10-01 17:07 ` [RFC PATCH 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx* Richard Henderson
@ 2020-10-08 12:53   ` Peter Maydell
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2020-10-08 12:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jordan Frank, qemu-arm, QEMU Developers

On Thu, 1 Oct 2020 at 18:07, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On ARM, the Top Byte Ignore feature means that only 56 bits of
> the address are significant in the virtual address.  We are
> required to give the entire 64-bit address to FAR_ELx on fault,
> which means that we do not "clean" the top byte early in TCG.
>
> This new interface allows us to flush all 256 possible aliases
> for a given page, currently missed by tlb_flush_page*.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> +static void tlb_flush_page_bits_by_mmuidx_async_1(CPUState *cpu,
> +                                                  run_on_cpu_data data)
> +{
> +    target_ulong addr_map_bits = (target_ulong) data.target_ptr;
> +    target_ulong addr = addr_map_bits & TARGET_PAGE_MASK;
> +    uint16_t idxmap = (addr_map_bits & ~TARGET_PAGE_MASK) >> 6;
> +    unsigned bits = addr_map_bits & 0x3f;

So this is unpacking...

> +    } else if (idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
> +        run_on_cpu_data data
> +            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);

...the value that we packed into an integer here...

> +        run_on_cpu_data data
> +            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);

...here...

> +    if (idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
> +        run_on_cpu_data data
> +            = RUN_ON_CPU_TARGET_PTR(addr | (idxmap << 6) | bits);

...and here.

Could we do something to avoid all these hard-coded 6s and
maybe make it a bit clearer that these two operations
are the inverse of each other?
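
Something along these lines, perhaps (constant and helper names
invented here, purely to sketch the shape):

#define PAGE_BITS_IDXMAP_SHIFT  6
#define PAGE_BITS_BITS_MASK     MAKE_64BIT_MASK(0, PAGE_BITS_IDXMAP_SHIFT)

static run_on_cpu_data encode_pbm(target_ulong addr, uint16_t idxmap,
                                  unsigned bits)
{
    /* Pack page address, mmu index bitmap and bit count into one word. */
    return RUN_ON_CPU_TARGET_PTR(addr | (idxmap << PAGE_BITS_IDXMAP_SHIFT)
                                 | bits);
}

static void decode_pbm(run_on_cpu_data data, target_ulong *addr,
                       uint16_t *idxmap, unsigned *bits)
{
    target_ulong v = (target_ulong)data.target_ptr;

    *addr = v & TARGET_PAGE_MASK;
    *idxmap = (v & ~TARGET_PAGE_MASK) >> PAGE_BITS_IDXMAP_SHIFT;
    *bits = v & PAGE_BITS_BITS_MASK;
}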

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [RFC PATCH 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx*
  2020-10-01 17:07 ` [RFC PATCH 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx* Richard Henderson
@ 2020-10-08 12:59   ` Peter Maydell
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2020-10-08 12:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jordan Frank, qemu-arm, QEMU Developers

On Thu, 1 Oct 2020 at 18:08, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> When TBI is enabled in a given regime, 56 bits of the address
> are significant and we need to clear out any other matching
> virtual addresses with differing tags.
>
> The other uses of tlb_flush_page (without mmuidx) in this file
> are only used by aarch32 mode.
>
> Fixes: 38d931687fa1
> Reported-by: Jordan Frank <jordanfrank@fb.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



