* [PATCH v10 0/3] x86/guest: use assisted TLB flush in guest mode
@ 2020-04-16 13:59 Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag Roger Pau Monne
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Roger Pau Monne @ 2020-04-16 13:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, Jan Beulich,
	Roger Pau Monne

Hello,

This is the remainder of the assisted TLB flush series. This last set
of patches enables the use of the Xen assisted flush when running
nested on Xen.

Thanks, Roger.

Roger Pau Monne (3):
  x86/tlb: introduce a flush HVM ASIDs flag
  x86/tlb: allow disabling the TLB clock
  x86/tlb: use Xen L0 assisted TLB flush when available

 xen/arch/x86/flushtlb.c                | 37 ++++++++++++++++++++------
 xen/arch/x86/guest/hypervisor.c        | 14 ++++++++++
 xen/arch/x86/guest/xen/xen.c           |  6 +++++
 xen/arch/x86/mm/hap/hap.c              |  8 +++---
 xen/arch/x86/mm/hap/nested_hap.c       |  2 +-
 xen/arch/x86/mm/p2m-pt.c               |  5 ++--
 xen/arch/x86/mm/paging.c               |  2 +-
 xen/arch/x86/mm/shadow/common.c        | 18 ++++++-------
 xen/arch/x86/mm/shadow/hvm.c           |  2 +-
 xen/arch/x86/mm/shadow/multi.c         | 16 +++++------
 xen/arch/x86/mm/shadow/private.h       |  6 +++++
 xen/arch/x86/smp.c                     |  7 +++++
 xen/include/asm-x86/flushtlb.h         | 25 ++++++++++++++++-
 xen/include/asm-x86/guest/hypervisor.h | 17 ++++++++++++
 14 files changed, 130 insertions(+), 35 deletions(-)

-- 
2.26.0




* [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-16 13:59 [PATCH v10 0/3] x86/guest: use assisted TLB flush in guest mode Roger Pau Monne
@ 2020-04-16 13:59 ` Roger Pau Monne
  2020-04-21 10:21   ` Jan Beulich
  2020-04-22 16:33   ` Roger Pau Monné
  2020-04-16 13:59 ` [PATCH v10 2/3] x86/tlb: allow disabling the TLB clock Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 3/3] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  2 siblings, 2 replies; 16+ messages in thread
From: Roger Pau Monne @ 2020-04-16 13:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, Jan Beulich,
	Roger Pau Monne

Introduce a specific flag to request a HVM guest linear TLB flush,
which is an ASID/VPID tickle that forces a guest linear to guest
physical TLB flush for all HVM guests.
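
As a rough illustration of what such a tickle amounts to (the names
below are made up for this sketch, it's not the literal Xen
implementation): the tickle itself doesn't flush anything, it bumps
the core's ASID generation, so the next VM entry on that core
allocates a fresh ASID/VPID and the stale guest linear to guest
physical translations are implicitly dropped:

    struct asid_core_data {
        uint64_t core_asid_generation;
    };

    /* Sketch: invalidate every ASID/VPID handed out on this core. */
    static void asid_tickle(struct asid_core_data *core)
    {
        if ( ++core->core_asid_generation == 0 )
            core->core_asid_generation = 1; /* 0 is reserved as "invalid" */
    }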

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP). Note that shadow paging code already takes care
of issuing the necessary flushes when the shadow page tables are
modified.

In order to keep the previous behavior, modify all shadow code TLB
flushes to also flush the guest linear to physical TLB if the guest is
HVM. I haven't looked at each specific shadow code TLB flush in order
to figure out whether it actually requires a guest TLB flush or not,
so there might be room for improvement in that regard.

Also perform ASID/VPID flushes when modifying the p2m tables, as it's
a requirement for AMD hardware. Finally, keep the flush in
switch_cr3_cr4, as it's not clear whether code could rely on
switch_cr3_cr4 also performing a guest linear TLB flush. A follow-up
patch can remove the ASID/VPID tickle from switch_cr3_cr4 if it is
found not to be necessary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v9:
 - Introduce and use guest_flush_tlb_mask and sh_flush_local.
 - Add a local domain variable to p2m_pt_change_entry_type_global.

Changes since v8:
 - Don't flush host TLB on HAP changes.
 - Introduce a helper for shadow changes that only flushes ASIDs/VPIDs
   when the guest is HVM.
 - Introduce a helper for HAP that only flushes ASIDs/VPIDs.

Changes since v7:
 - Do not perform an ASID flush in filtered_flush_tlb_mask: the
   requested flush is related to the page need_tlbflush field and not
   to p2m changes (applies to both callers).

Changes since v6:
 - Add ASID/VPID flushes when modifying the p2m.
 - Keep the ASID/VPID flush in switch_cr3_cr4.

Changes since v5:
 - Rename FLUSH_GUESTS_TLB to FLUSH_HVM_ASID_CORE.
 - Clarify commit message.
 - Define FLUSH_HVM_ASID_CORE to 0 when !CONFIG_HVM.
---
 xen/arch/x86/flushtlb.c          | 18 ++++++++++++++++--
 xen/arch/x86/mm/hap/hap.c        |  8 ++++----
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/p2m-pt.c         |  5 +++--
 xen/arch/x86/mm/paging.c         |  2 +-
 xen/arch/x86/mm/shadow/common.c  | 18 +++++++++---------
 xen/arch/x86/mm/shadow/hvm.c     |  2 +-
 xen/arch/x86/mm/shadow/multi.c   | 16 ++++++++--------
 xen/arch/x86/mm/shadow/private.h |  6 ++++++
 xen/include/asm-x86/flushtlb.h   |  8 ++++++++
 10 files changed, 57 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..7d261aef32 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -7,6 +7,7 @@
  * Copyright (c) 2003-2006, K A Fraser
  */
 
+#include <xen/paging.h>
 #include <xen/sched.h>
 #include <xen/smp.h>
 #include <xen/softirq.h>
@@ -59,8 +60,6 @@ static u32 pre_flush(void)
         raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-    hvm_flush_guest_tlbs();
-
     return t2;
 }
 
@@ -118,6 +117,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     local_irq_save(flags);
 
     t = pre_flush();
+    hvm_flush_guest_tlbs();
 
     old_cr4 = read_cr4();
     ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
@@ -221,6 +221,9 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
             do_tlb_flush();
     }
 
+    if ( flags & FLUSH_HVM_ASID_CORE )
+        hvm_flush_guest_tlbs();
+
     if ( flags & FLUSH_CACHE )
     {
         const struct cpuinfo_x86 *c = &current_cpu_data;
@@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
 
     return flags;
 }
+
+void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
+{
+    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
+                                                                   : 0) |
+                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
+                                                          : 0);
+
+    if ( flags )
+        flush_mask(mask, flags);
+}
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 052ae35c6f..f7218a86d6 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
             p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
                                   p2m_ram_rw, p2m_ram_logdirty);
 
-            flush_tlb_mask(d->dirty_cpumask);
+            guest_flush_tlb_mask(d, d->dirty_cpumask);
 
             memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
         }
@@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
          * to be read-only, or via hardware-assisted log-dirty.
          */
         p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     }
     return 0;
 }
@@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
      * be read-only, or via hardware-assisted log-dirty.
      */
     p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-    flush_tlb_mask(d->dirty_cpumask);
+    guest_flush_tlb_mask(d, d->dirty_cpumask);
 }
 
 /************************************************/
@@ -812,7 +812,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
 
     safe_write_pte(p, new);
     if ( old_flags & _PAGE_PRESENT )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
 
     paging_unlock(d);
 
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index abe5958a52..f92ddc5206 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -84,7 +84,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
     safe_write_pte(p, new);
 
     if (old_flags & _PAGE_PRESENT)
-        flush_tlb_mask(p2m->dirty_cpumask);
+        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
 
     paging_unlock(d);
 
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index eb66077496..5c0501794e 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -866,11 +866,12 @@ static void p2m_pt_change_entry_type_global(struct p2m_domain *p2m,
     l1_pgentry_t *tab;
     unsigned long gfn = 0;
     unsigned int i, changed;
+    const struct domain *d = p2m->domain;
 
     if ( pagetable_get_pfn(p2m_get_pagetable(p2m)) == 0 )
         return;
 
-    ASSERT(hap_enabled(p2m->domain));
+    ASSERT(hap_enabled(d));
 
     tab = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
     for ( changed = i = 0; i < (1 << PAGETABLE_ORDER); ++i )
@@ -896,7 +897,7 @@ static void p2m_pt_change_entry_type_global(struct p2m_domain *p2m,
     unmap_domain_page(tab);
 
     if ( changed )
-         flush_tlb_mask(p2m->domain->dirty_cpumask);
+         guest_flush_tlb_mask(d, d->dirty_cpumask);
 }
 
 static int p2m_pt_change_entry_type_range(struct p2m_domain *p2m,
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 469bb76429..fd3175bd3e 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -613,7 +613,7 @@ void paging_log_dirty_range(struct domain *d,
 
     p2m_unlock(p2m);
 
-    flush_tlb_mask(d->dirty_cpumask);
+    guest_flush_tlb_mask(d, d->dirty_cpumask);
 }
 
 /*
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 75dd414a6e..f8168b210a 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -363,7 +363,7 @@ static int oos_remove_write_access(struct vcpu *v, mfn_t gmfn,
     }
 
     if ( ftlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
 
     return 0;
 }
@@ -939,7 +939,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
                 /* See if that freed up enough space */
                 if ( d->arch.paging.shadow.free_pages >= pages )
                 {
-                    flush_tlb_mask(d->dirty_cpumask);
+                    guest_flush_tlb_mask(d, d->dirty_cpumask);
                     return;
                 }
             }
@@ -993,7 +993,7 @@ static void shadow_blow_tables(struct domain *d)
                                pagetable_get_mfn(v->arch.shadow_table[i]), 0);
 
     /* Make sure everyone sees the unshadowings */
-    flush_tlb_mask(d->dirty_cpumask);
+    guest_flush_tlb_mask(d, d->dirty_cpumask);
 }
 
 void shadow_blow_tables_per_domain(struct domain *d)
@@ -1102,7 +1102,7 @@ mfn_t shadow_alloc(struct domain *d,
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
-            flush_tlb_mask(&mask);
+            guest_flush_tlb_mask(d, &mask);
         }
         /* Now safe to clear the page for reuse */
         clear_domain_page(page_to_mfn(sp));
@@ -2293,7 +2293,7 @@ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all)
 
     /* Need to flush TLBs now, so that linear maps are safe next time we
      * take a fault. */
-    flush_tlb_mask(d->dirty_cpumask);
+    guest_flush_tlb_mask(d, d->dirty_cpumask);
 
     paging_unlock(d);
 }
@@ -3008,7 +3008,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
         {
             sh_remove_all_shadows_and_parents(d, mfn);
             if ( sh_remove_all_mappings(d, mfn, _gfn(gfn)) )
-                flush_tlb_mask(d->dirty_cpumask);
+                guest_flush_tlb_mask(d, d->dirty_cpumask);
         }
     }
 
@@ -3048,7 +3048,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
                 }
                 omfn = mfn_add(omfn, 1);
             }
-            flush_tlb_mask(&flushmask);
+            guest_flush_tlb_mask(d, &flushmask);
 
             if ( npte )
                 unmap_domain_page(npte);
@@ -3335,7 +3335,7 @@ int shadow_track_dirty_vram(struct domain *d,
         }
     }
     if ( flush_tlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     goto out;
 
 out_sl1ma:
@@ -3405,7 +3405,7 @@ bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     }
 
     /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
+    guest_flush_tlb_mask(d, mask);
 
     /* Done. */
     for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index 1e6024c71f..608360daec 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -591,7 +591,7 @@ static void validate_guest_pt_write(struct vcpu *v, mfn_t gmfn,
 
     if ( rc & SHADOW_SET_FLUSH )
         /* Need to flush TLBs to pick up shadow PT changes */
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
 
     if ( rc & SHADOW_SET_ERROR )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f6b1628742..d3f6a9216f 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3067,7 +3067,7 @@ static int sh_page_fault(struct vcpu *v,
         perfc_incr(shadow_rm_write_flush_tlb);
         smp_wmb();
         atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     }
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3576,7 +3576,7 @@ static bool sh_invlpg(struct vcpu *v, unsigned long linear)
     if ( mfn_to_page(sl1mfn)->u.sh.type
          == SH_type_fl1_shadow )
     {
-        flush_tlb_local();
+        sh_flush_local(v->domain);
         return false;
     }
 
@@ -3811,7 +3811,7 @@ sh_update_linear_entries(struct vcpu *v)
          * table entry. But, without this change, it would fetch the wrong
          * value due to a stale TLB.
          */
-        flush_tlb_local();
+        sh_flush_local(d);
     }
 }
 
@@ -4012,7 +4012,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
      * (old) shadow linear maps in the writeable mapping heuristics. */
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
@@ -4036,7 +4036,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            guest_flush_tlb_mask(d, d->dirty_cpumask);
         /* Now install the new shadows. */
         for ( i = 0; i < 4; i++ )
         {
@@ -4057,7 +4057,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
@@ -4503,7 +4503,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         }
     }
     if ( flush )
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
 
     /* Remember that we've seen the guest use this interface, so we
      * can rely on it using it in future, instead of guessing at
@@ -4540,7 +4540,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
-        flush_tlb_mask(d->dirty_cpumask);
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
     }
 
     /* Remember that we've seen the guest use this interface, so we
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index e8b028a365..82735e35ef 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -818,6 +818,12 @@ static inline int sh_check_page_has_no_refs(struct page_info *page)
 bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                       void *ctxt);
 
+static inline void sh_flush_local(const struct domain *d)
+{
+    flush_local(FLUSH_TLB |
+                (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE : 0));
+}
+
 #endif /* _XEN_SHADOW_PRIVATE_H */
 
 /*
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cfe4e6e97..50df468c16 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -105,6 +105,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #define FLUSH_VCPU_STATE 0x1000
  /* Flush the per-cpu root page table */
 #define FLUSH_ROOT_PGTBL 0x2000
+#if CONFIG_HVM
+ /* Flush all HVM guests linear TLB (using ASID/VPID) */
+#define FLUSH_HVM_ASID_CORE 0x4000
+#else
+#define FLUSH_HVM_ASID_CORE 0
+#endif
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
@@ -159,4 +165,6 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
     return clean_and_invalidate_dcache_va_range(p, size);
 }
 
+void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask);
+
 #endif /* __FLUSHTLB_H__ */
-- 
2.26.0




* [PATCH v10 2/3] x86/tlb: allow disabling the TLB clock
  2020-04-16 13:59 [PATCH v10 0/3] x86/guest: use assisted TLB flush in guest mode Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag Roger Pau Monne
@ 2020-04-16 13:59 ` Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 3/3] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  2 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monne @ 2020-04-16 13:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case however when Xen is running virtualized and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself, for example, offers an HVMOP_flush_tlbs hypercall
in order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB clock
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.
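
For context, the check this short-circuits looks roughly as follows
(an abridged sketch of the NEED_FLUSH logic in flushtlb.h, not the
verbatim source):

    static inline bool NEED_FLUSH(u32 cpu_stamp, u32 lastuse_stamp)
    {
        u32 curr_time = tlbflush_current_time();

        /*
         * Flush when the clock reads 0 (which a disabled clock now
         * always returns), or when this CPU's last flush predates the
         * page's last use.
         */
        return curr_time == 0 ||
               (cpu_stamp <= lastuse_stamp && lastuse_stamp <= curr_time);
    }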

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB clock.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/flushtlb.c        | 19 +++++++++++++------
 xen/include/asm-x86/flushtlb.h | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 7d261aef32..e1b1e98c23 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -33,6 +33,9 @@
 u32 tlbflush_clock = 1U;
 DEFINE_PER_CPU(u32, tlbflush_time);
 
+/* Signals whether the TLB flush clock is in use. */
+bool __read_mostly tlb_clk_enabled = true;
+
 /*
  * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
  * 
@@ -83,12 +86,13 @@ static void post_flush(u32 t)
 static void do_tlb_flush(void)
 {
     unsigned long flags, cr4;
-    u32 t;
+    u32 t = 0;
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
 
     if ( use_invpcid )
         invpcid_flush_all();
@@ -100,7 +104,8 @@ static void do_tlb_flush(void)
     else
         write_cr3(read_cr3());
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
@@ -108,7 +113,7 @@ static void do_tlb_flush(void)
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
-    u32 t;
+    u32 t = 0;
 
     /* Throughout this function we make this assumption: */
     ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
@@ -116,7 +121,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
     hvm_flush_guest_tlbs();
 
     old_cr4 = read_cr4();
@@ -169,7 +175,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     if ( cr4 & X86_CR4_PCIDE )
         invpcid_flush_all_nonglobals();
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 50df468c16..d5ca4bad57 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -21,10 +21,21 @@ extern u32 tlbflush_clock;
 /* Time at which each CPU's TLB was last flushed. */
 DECLARE_PER_CPU(u32, tlbflush_time);
 
-#define tlbflush_current_time() tlbflush_clock
+/* TLB clock is in use. */
+extern bool tlb_clk_enabled;
+
+static inline uint32_t tlbflush_current_time(void)
+{
+    /* Returning 0 from tlbflush_current_time will always force a flush. */
+    return tlb_clk_enabled ? tlbflush_clock : 0;
+}
 
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
+    /* Avoid the write if the TLB clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     /*
      * Prevent storing a stale time stamp, which could happen if an update
      * to tlbflush_clock plus a subsequent flush IPI happen between the
@@ -67,6 +78,10 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp)
 {
     unsigned int cpu;
 
+    /* Short-circuit: there's no need to iterate if the clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     for_each_cpu ( cpu, mask )
         if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )
             __cpumask_clear_cpu(cpu, mask);
-- 
2.26.0




* [PATCH v10 3/3] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-04-16 13:59 [PATCH v10 0/3] x86/guest: use assisted TLB flush in guest mode Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag Roger Pau Monne
  2020-04-16 13:59 ` [PATCH v10 2/3] x86/tlb: allow disabling the TLB clock Roger Pau Monne
@ 2020-04-16 13:59 ` Roger Pau Monne
  2 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monne @ 2020-04-16 13:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high number of vCPUs as a Xen guest, and is especially
important when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real	4m35.973s
user	4m35.110s
sys	36m24.117s

Using L0 assisted flush:
real    1m2.596s
user    4m34.818s
sys     5m16.374s

The implementation adds a new hook to hypervisor_ops so that other
enlightenments can also implement such an assisted flush just by
filling in the hook.

Note that the Xen implementation completely ignores the dirty CPU mask
and the linear address passed in, and always performs a global TLB
flush on all vCPUs. This is a limitation of the hypercall provided by
Xen. Also note that local TLB flushes are not performed using the
assisted TLB flush, only remote ones.
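
As an illustration of the new hook's extensibility, a second
enlightenment would only have to fill in .flush_tlb; something along
these lines (hypothetical names, not part of this series):

    /*
     * Illustrative only: returning 0 reports success, while an -errno
     * value makes flush_area_mask() fall back to the IPI-based flush.
     */
    static int example_flush_tlb(const cpumask_t *mask, const void *va,
                                 unsigned int order)
    {
        return example_hypercall_flush(mask, va, order);
    }

    static const struct hypervisor_ops __initconstrel ops = {
        .name = "Example",
        .flush_tlb = example_flush_tlb,
    };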

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v5:
 - Clarify commit message.
 - Test for assisted flush at setup, do this for all hypervisors.
 - Return EOPNOTSUPP if assisted flush is not available.

Changes since v4:
 - Adjust order calculation.

Changes since v3:
 - Use an alternative call for the flush hook.

Changes since v1:
 - Add a L0 assisted hook to hypervisor ops.
---
 xen/arch/x86/guest/hypervisor.c        | 14 ++++++++++++++
 xen/arch/x86/guest/xen/xen.c           |  6 ++++++
 xen/arch/x86/smp.c                     |  7 +++++++
 xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 647cdb1367..e46de42ded 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -18,6 +18,7 @@
  *
  * Copyright (c) 2019 Microsoft.
  */
+#include <xen/cpumask.h>
 #include <xen/init.h>
 #include <xen/types.h>
 
@@ -51,6 +52,10 @@ void __init hypervisor_setup(void)
 {
     if ( ops.setup )
         ops.setup();
+
+    /* Check if assisted flush is available and disable the TLB clock if so. */
+    if ( !hypervisor_flush_tlb(cpumask_of(smp_processor_id()), NULL, 0) )
+        tlb_clk_enabled = false;
 }
 
 int hypervisor_ap_setup(void)
@@ -73,6 +78,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
         ops.e820_fixup(e820);
 }
 
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order)
+{
+    if ( ops.flush_tlb )
+        return alternative_call(ops.flush_tlb, mask, va, order);
+
+    return -EOPNOTSUPP;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index e74fd1e995..3bc01c8723 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,12 +324,18 @@ static void __init e820_fixup(struct e820map *e820)
         pv_shim_fixup_e820(e820);
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+{
+    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
+}
+
 static const struct hypervisor_ops __initconstrel ops = {
     .name = "Xen",
     .setup = setup,
     .ap_setup = ap_setup,
     .resume = resume,
     .e820_fixup = e820_fixup,
+    .flush_tlb = flush_tlb,
 };
 
 const struct hypervisor_ops *__init xg_probe(void)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index bcead5d01b..1d9fec65de 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -15,6 +15,7 @@
 #include <xen/perfc.h>
 #include <xen/spinlock.h>
 #include <asm/current.h>
+#include <asm/guest.h>
 #include <asm/smp.h>
 #include <asm/mc146818rtc.h>
 #include <asm/flushtlb.h>
@@ -268,6 +269,12 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
     if ( (flags & ~FLUSH_ORDER_MASK) &&
          !cpumask_subset(mask, cpumask_of(cpu)) )
     {
+        if ( cpu_has_hypervisor &&
+             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
+                         FLUSH_ORDER_MASK)) &&
+             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
+            return;
+
         spin_lock(&flush_lock);
         cpumask_and(&flush_cpumask, mask, &cpu_online_map);
         cpumask_clear_cpu(cpu, &flush_cpumask);
diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-x86/guest/hypervisor.h
index ade10e74ea..77a1d21824 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -19,6 +19,8 @@
 #ifndef __X86_HYPERVISOR_H__
 #define __X86_HYPERVISOR_H__
 
+#include <xen/cpumask.h>
+
 #include <asm/e820.h>
 
 struct hypervisor_ops {
@@ -32,6 +34,8 @@ struct hypervisor_ops {
     void (*resume)(void);
     /* Fix up e820 map */
     void (*e820_fixup)(struct e820map *e820);
+    /* L0 assisted TLB flush */
+    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int order);
 };
 
 #ifdef CONFIG_GUEST
@@ -41,6 +45,14 @@ void hypervisor_setup(void);
 int hypervisor_ap_setup(void);
 void hypervisor_resume(void);
 void hypervisor_e820_fixup(struct e820map *e820);
+/*
+ * L0 assisted TLB flush.
+ * mask: cpumask of the dirty vCPUs that should be flushed.
+ * va: linear address to flush, or NULL for global flushes.
+ * order: order of the linear address pointed by va.
+ */
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order);
 
 #else
 
@@ -52,6 +64,11 @@ static inline void hypervisor_setup(void) { ASSERT_UNREACHABLE(); }
 static inline int hypervisor_ap_setup(void) { return 0; }
 static inline void hypervisor_resume(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_e820_fixup(struct e820map *e820) {}
+static inline int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                                       unsigned int order)
+{
+    return -EOPNOTSUPP;
+}
 
 #endif  /* CONFIG_GUEST */
 
-- 
2.26.0




* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-16 13:59 ` [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag Roger Pau Monne
@ 2020-04-21 10:21   ` Jan Beulich
  2020-04-21 10:43     ` Roger Pau Monné
  2020-04-22 16:33   ` Roger Pau Monné
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-04-21 10:21 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On 16.04.2020 15:59, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest linear TLB flush,
> which is an ASID/VPID tickle that forces a guest linear to guest
> physical TLB flush for all HVM guests.
> 
> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP).

I'm afraid I'm being confused by this: Even in shadow mode Xen
doesn't modify guest page tables, does it?

> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>  
>      return flags;
>  }
> +
> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> +{
> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> +                                                                   : 0) |
> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> +                                                          : 0);

Why the is_pv_domain() part of the condition? Afaict for PV
domains you can get here only if they have shadow mode enabled.

> --- a/xen/arch/x86/mm/shadow/private.h
> +++ b/xen/arch/x86/mm/shadow/private.h
> @@ -818,6 +818,12 @@ static inline int sh_check_page_has_no_refs(struct page_info *page)
>  bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>                        void *ctxt);
>  
> +static inline void sh_flush_local(const struct domain *d)
> +{
> +    flush_local(FLUSH_TLB |
> +                (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE : 0));
> +}

I think the right side of | wants folding with its counterpart in
guest_flush_tlb_mask(). Doing so would avoid guest_flush_tlb_mask()
getting updated but this one forgotten. Perhaps split out
guest_flush_tlb_flags() from guest_flush_tlb_mask()?
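
E.g. (merely a sketch of the suggested split, not something I've
tested):

    static inline unsigned int guest_flush_tlb_flags(const struct domain *d)
    {
        return (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB : 0) |
               (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE : 0);
    }

    void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
    {
        unsigned int flags = guest_flush_tlb_flags(d);

        if ( flags )
            flush_mask(mask, flags);
    }

sh_flush_local() could then simply be
flush_local(guest_flush_tlb_flags(d)).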

I also think this function should move into multi.c as long as it's
needed only there.

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-21 10:21   ` Jan Beulich
@ 2020-04-21 10:43     ` Roger Pau Monné
  2020-04-21 12:59       ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2020-04-21 10:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On Tue, Apr 21, 2020 at 12:21:12PM +0200, Jan Beulich wrote:
> On 16.04.2020 15:59, Roger Pau Monne wrote:
> > Introduce a specific flag to request a HVM guest linear TLB flush,
> > which is an ASID/VPID tickle that forces a guest linear to guest
> > physical TLB flush for all HVM guests.
> > 
> > This was previously unconditionally done in each pre_flush call, but
> > that's not required: HVM guests not using shadow don't require linear
> > TLB flushes as Xen doesn't modify the guest page tables in that case
> > (ie: when using HAP).
> 
> I'm afraid I'm being confused by this: Even in shadow mode Xen
> doesn't modify guest page tables, does it?

I'm also confused now. It's my understanding that when running in
shadow mode the guest page tables are not actually used, and the guest
runs on Xen's crafted shadow tables instead, which are based on the
original guest page tables, suitably adjusted by Xen in order to do
the p2m translation in the HVM case, or the needed PTE adjustments in
the PV case.

So guest page tables are not modified, but are also not used as they
are never loaded into cr3.

> > @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
> >  
> >      return flags;
> >  }
> > +
> > +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> > +{
> > +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> > +                                                                   : 0) |
> > +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> > +                                                          : 0);
> 
> Why the is_pv_domain() part of the condition? Afaict for PV
> domains you can get here only if they have shadow mode enabled.

Right now yes, the only way to get here for PV domains is when using
shadow, but if this helper gets used in other non-shadow PV paths then
Xen needs to do a TLB flush.

> > --- a/xen/arch/x86/mm/shadow/private.h
> > +++ b/xen/arch/x86/mm/shadow/private.h
> > @@ -818,6 +818,12 @@ static inline int sh_check_page_has_no_refs(struct page_info *page)
> >  bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >                        void *ctxt);
> >  
> > +static inline void sh_flush_local(const struct domain *d)
> > +{
> > +    flush_local(FLUSH_TLB |
> > +                (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE : 0));
> > +}
> 
> I think the right side of | wants folding with its counterpart in
> guest_flush_tlb_mask(). Doing so would avoid guest_flush_tlb_mask()
> getting updated but this one forgotten. Perhaps split out
> guest_flush_tlb_flags() from guest_flush_tlb_mask()?

Can do.

> I also think this function should move into multi.c as long as it's
> needed only there.

Ack.

Thanks, Roger.



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-21 10:43     ` Roger Pau Monné
@ 2020-04-21 12:59       ` Jan Beulich
  2020-04-21 13:51         ` Roger Pau Monné
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-04-21 12:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On 21.04.2020 12:43, Roger Pau Monné wrote:
> On Tue, Apr 21, 2020 at 12:21:12PM +0200, Jan Beulich wrote:
>> On 16.04.2020 15:59, Roger Pau Monne wrote:
>>> Introduce a specific flag to request a HVM guest linear TLB flush,
>>> which is an ASID/VPID tickle that forces a guest linear to guest
>>> physical TLB flush for all HVM guests.
>>>
>>> This was previously unconditionally done in each pre_flush call, but
>>> that's not required: HVM guests not using shadow don't require linear
>>> TLB flushes as Xen doesn't modify the guest page tables in that case
>>> (ie: when using HAP).
>>
>> I'm afraid I'm being confused by this: Even in shadow mode Xen
>> doesn't modify guest page tables, does it?
> 
> I'm also confused now. It's my understand that when running in shadow
> mode guest page tables are not actually used, and the guest uses Xen's
> crafted shadow tables instead, which are based on the original guest
> page tables suitably adjusted by Xen in order to do the p2m
> translation in the HVM case, or the needed PTE adjustments in the PV
> case.
> 
> So guest page tables are not modified, but are also not used as they
> are never loaded into cr3.

This matches my understanding.

>>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>>>  
>>>      return flags;
>>>  }
>>> +
>>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
>>> +{
>>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
>>> +                                                                   : 0) |
>>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>> +                                                          : 0);
>>
>> Why the is_pv_domain() part of the condition? Afaict for PV
>> domains you can get here only if they have shadow mode enabled.
> 
> Right now yes, the only way to get here for PV domains is when using
> shadow, but if this helper gets used in other non-shadow PV paths then
> Xen needs to do a TLB flush.

Why would a non-shadow PV path find a need to call this function?

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-21 12:59       ` Jan Beulich
@ 2020-04-21 13:51         ` Roger Pau Monné
  2020-04-21 14:09           ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2020-04-21 13:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On Tue, Apr 21, 2020 at 02:59:10PM +0200, Jan Beulich wrote:
> On 21.04.2020 12:43, Roger Pau Monné wrote:
> > On Tue, Apr 21, 2020 at 12:21:12PM +0200, Jan Beulich wrote:
> >> On 16.04.2020 15:59, Roger Pau Monne wrote:
> >>> Introduce a specific flag to request a HVM guest linear TLB flush,
> >>> which is an ASID/VPID tickle that forces a guest linear to guest
> >>> physical TLB flush for all HVM guests.
> >>>
> >>> This was previously unconditionally done in each pre_flush call, but
> >>> that's not required: HVM guests not using shadow don't require linear
> >>> TLB flushes as Xen doesn't modify the guest page tables in that case
> >>> (ie: when using HAP).
> >>
> >> I'm afraid I'm being confused by this: Even in shadow mode Xen
> >> doesn't modify guest page tables, does it?
> > 
> > I'm also confused now. It's my understanding that when running in shadow
> > mode guest page tables are not actually used, and the guest uses Xen's
> > crafted shadow tables instead, which are based on the original guest
> > page tables suitably adjusted by Xen in order to do the p2m
> > translation in the HVM case, or the needed PTE adjustments in the PV
> > case.
> > 
> > So guest page tables are not modified, but are also not used as they
> > are never loaded into cr3.
> 
> This matches my understanding.

Please bear with me, as I'm not sure if your question was because you
think the paragraph is not clear and/or should be expanded.

The point of the paragraph you mention was to have a difference
between guests running in shadow mode vs guests running in HAP mode.
Maybe I should use "guest loaded page tables", to differentiate
between guest created page tables and the page tables actually loaded
into cr3 in guest mode?

> >>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
> >>>  
> >>>      return flags;
> >>>  }
> >>> +
> >>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> >>> +{
> >>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> >>> +                                                                   : 0) |
> >>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >>> +                                                          : 0);
> >>
> >> Why the is_pv_domain() part of the condition? Afaict for PV
> >> domains you can get here only if they have shadow mode enabled.
> > 
> > Right now yes, the only way to get here for PV domains is when using
> > shadow, but if this helper gets used in other non-shadow PV paths then
> > Xen needs to do a TLB flush.
> 
> Why would a non-shadow PV path find a need to call this function?

The memory management code in PV guests also needs to perform TLB
flushes, so I wasn't sure whether the aim was to also switch it to use
guest_flush_tlb_mask.

I guess this doesn't make a lot of sense since PV guests just need a
plain TLB flush, and there's not a lot of benefit from having a helper
around for that. Maybe for PV guests running in XPTI mode, where the
flush can be avoided as the return to guest path already flushes the
TLB? Anyway, will remove the is_pv_domain check.

Thanks, Roger.



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-21 13:51         ` Roger Pau Monné
@ 2020-04-21 14:09           ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2020-04-21 14:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On 21.04.2020 15:51, Roger Pau Monné wrote:
> On Tue, Apr 21, 2020 at 02:59:10PM +0200, Jan Beulich wrote:
>> On 21.04.2020 12:43, Roger Pau Monné wrote:
>>> On Tue, Apr 21, 2020 at 12:21:12PM +0200, Jan Beulich wrote:
>>>> On 16.04.2020 15:59, Roger Pau Monne wrote:
>>>>> Introduce a specific flag to request a HVM guest linear TLB flush,
>>>>> which is an ASID/VPID tickle that forces a guest linear to guest
>>>>> physical TLB flush for all HVM guests.
>>>>>
>>>>> This was previously unconditionally done in each pre_flush call, but
>>>>> that's not required: HVM guests not using shadow don't require linear
>>>>> TLB flushes as Xen doesn't modify the guest page tables in that case
>>>>> (ie: when using HAP).
>>>>
>>>> I'm afraid I'm being confused by this: Even in shadow mode Xen
>>>> doesn't modify guest page tables, does it?
>>>
> >> I'm also confused now. It's my understanding that when running in shadow
>>> mode guest page tables are not actually used, and the guest uses Xen's
>>> crafted shadow tables instead, which are based on the original guest
>>> page tables suitably adjusted by Xen in order to do the p2m
>>> translation in the HVM case, or the needed PTE adjustments in the PV
>>> case.
>>>
>>> So guest page tables are not modified, but are also not used as they
>>> are never loaded into cr3.
>>
>> This matches my understanding.
> 
> Please bear with me, as I'm not sure if your question was because you
> think the paragraph is not clear and/or should be expanded.
> 
> The point of the paragraph you mention was to have a difference
> between guests running in shadow mode vs guests running in HAP mode.
> Maybe I should use "guest loaded page tables", to differentiate
> between guest created page tables and the page tables actually loaded
> into cr3 in guest mode?

How about using "the page tables the guest runs on"?

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-16 13:59 ` [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag Roger Pau Monne
  2020-04-21 10:21   ` Jan Beulich
@ 2020-04-22 16:33   ` Roger Pau Monné
  2020-04-23  7:40     ` Jan Beulich
  2020-04-23 10:30     ` Wei Liu
  1 sibling, 2 replies; 16+ messages in thread
From: Roger Pau Monné @ 2020-04-22 16:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich, George Dunlap

On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>  
>      return flags;
>  }
> +
> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> +{
> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> +                                                                   : 0) |
> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> +                                                          : 0);

Maybe I'm getting confused, but I think the above is wrong and ASID
should _always_ be flushed when running a HVM domain in shadow mode
regardless of whether the underlying hw is Intel or AMD, ie:

bool shadow = paging_mode_shadow(d);
unsigned int flags = (shadow ? FLUSH_TLB : 0) |
                     (is_hvm_domain(d) &&
                      (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);

Is the correct version.

Roger.



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-22 16:33   ` Roger Pau Monné
@ 2020-04-23  7:40     ` Jan Beulich
  2020-04-23 10:30     ` Wei Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2020-04-23  7:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On 22.04.2020 18:33, Roger Pau Monné wrote:
> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>>  
>>      return flags;
>>  }
>> +
>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
>> +{
>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
>> +                                                                   : 0) |
>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>> +                                                          : 0);
> 
> Maybe I'm getting confused, but I think the above is wrong and ASID
> should _always_ be flushed when running a HVM domain in shadow mode
> regardless of whether the underlying hw is Intel or AMD, ie:
> 
> bool shadow = paging_mode_shadow(d);
> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
>                      (is_hvm_domain(d) &&
>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);
> 
> Is the correct version.

Oh, indeed.

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-22 16:33   ` Roger Pau Monné
  2020-04-23  7:40     ` Jan Beulich
@ 2020-04-23 10:30     ` Wei Liu
  2020-04-23 10:41       ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Liu @ 2020-04-23 10:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, Jan Beulich,
	xen-devel

On Wed, Apr 22, 2020 at 06:33:38PM +0200, Roger Pau Monné wrote:
> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
> > @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
> >  
> >      return flags;
> >  }
> > +
> > +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> > +{
> > +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> > +                                                                   : 0) |
> > +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> > +                                                          : 0);
> 
> Maybe I'm getting confused, but I think the above is wrong and ASID
> should _always_ be flushed when running a HVM domain in shadow mode
> regardless of whether the underlying hw is Intel or AMD, ie:
> 
> bool shadow = paging_mode_shadow(d);
> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
>                      (is_hvm_domain(d) &&
>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);

This sort of long expression is prone to error. See XSA-316.

Can I request it to be broken down a bit?

Wei.



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-23 10:30     ` Wei Liu
@ 2020-04-23 10:41       ` Jan Beulich
  2020-04-23 10:57         ` Roger Pau Monné
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-04-23 10:41 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, Andrew Cooper, Tim Deegan, George Dunlap,
	Roger Pau Monné

On 23.04.2020 12:30, Wei Liu wrote:
> On Wed, Apr 22, 2020 at 06:33:38PM +0200, Roger Pau Monné wrote:
>> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
>>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>>>  
>>>      return flags;
>>>  }
>>> +
>>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
>>> +{
>>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
>>> +                                                                   : 0) |
>>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>> +                                                          : 0);
>>
>> Maybe I'm getting confused, but I think the above is wrong and ASID
>> should _always_ be flushed when running a HVM domain in shadow mode
>> regardless of whether the underlying hw is Intel or AMD, ie:
>>
>> bool shadow = paging_mode_shadow(d);
>> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
>>                      (is_hvm_domain(d) &&
>>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);
> 
> This sort of long expression is prone to error. See XSA-316.

To be honest I consider it quite fine. XSA-316 was in particular
because of successive closing parentheses, of which there are
none here. (This isn't to say I would strictly mind splitting,
but I fear this would result in (multiple?) single use local
variables.)

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-23 10:41       ` Jan Beulich
@ 2020-04-23 10:57         ` Roger Pau Monné
  2020-04-23 11:10           ` Jan Beulich
  2020-04-23 13:10           ` Wei Liu
  0 siblings, 2 replies; 16+ messages in thread
From: Roger Pau Monné @ 2020-04-23 10:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On Thu, Apr 23, 2020 at 12:41:49PM +0200, Jan Beulich wrote:
> On 23.04.2020 12:30, Wei Liu wrote:
> > On Wed, Apr 22, 2020 at 06:33:38PM +0200, Roger Pau Monné wrote:
> >> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
> >>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
> >>>  
> >>>      return flags;
> >>>  }
> >>> +
> >>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> >>> +{
> >>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> >>> +                                                                   : 0) |
> >>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> >>> +                                                          : 0);
> >>
> >> Maybe I'm getting confused, but I think the above is wrong and ASID
> >> should _always_ be flushed when running a HVM domain in shadow mode
> >> regardless of whether the underlying hw is Intel or AMD, ie:
> >>
> >> bool shadow = paging_mode_shadow(d);
> >> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
> >>                      (is_hvm_domain(d) &&
> >>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);
> > 
> > This sort of long expression is prone to error. See XSA-316.
> 
> To be honest I consider it quite fine. XSA-316 was in particular
> because of successive closing parentheses, of which there are
> none here. (This isn't to say I would strictly mind splitting,
> but I fear this would result in (multiple?) single use local
> variables.)

Right now it's exactly (including the indentation):

    bool shadow = paging_mode_shadow(d);

    return (shadow ? FLUSH_TLB : 0) |
           (is_hvm_domain(d) && (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE
                                                        : 0);

I could change it to:

    bool shadow = paging_mode_shadow(d);
    bool asid = is_hvm_domain(d) && (cpu_has_svm || shadow);

    return (shadow ? FLUSH_TLB : 0) | (asid ? FLUSH_HVM_ASID_CORE : 0);

But would result in a single use asid local variable.

I think XSA-316 was slightly different because the issue arose from
assigning and comparing a variable inside of an if condition, which is
not the case here. I however don't mind changing it if it's regarded
as potentially insecure, or hard to read.

Thanks, Roger.



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-23 10:57         ` Roger Pau Monné
@ 2020-04-23 11:10           ` Jan Beulich
  2020-04-23 13:10           ` Wei Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2020-04-23 11:10 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Tim Deegan, George Dunlap, Wei Liu, Andrew Cooper

On 23.04.2020 12:57, Roger Pau Monné wrote:
> On Thu, Apr 23, 2020 at 12:41:49PM +0200, Jan Beulich wrote:
>> On 23.04.2020 12:30, Wei Liu wrote:
>>> On Wed, Apr 22, 2020 at 06:33:38PM +0200, Roger Pau Monné wrote:
>>>> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
>>>>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
>>>>>  
>>>>>      return flags;
>>>>>  }
>>>>> +
>>>>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
>>>>> +{
>>>>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
>>>>> +                                                                   : 0) |
>>>>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
>>>>> +                                                          : 0);
>>>>
>>>> Maybe I'm getting confused, but I think the above is wrong and ASID
>>>> should _always_ be flushed when running a HVM domain in shadow mode
>>>> regardless of whether the underlying hw is Intel or AMD, ie:
>>>>
>>>> bool shadow = paging_mode_shadow(d);
>>>> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
>>>>                      (is_hvm_domain(d) &&
>>>>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);
>>>
>>> This sort of long expression is prone to error. See XSA-316.
>>
>> To be honest I consider it quite fine. XSA-316 was in particular
>> because of successive closing parentheses, of which there are
>> none here. (This isn't to say I would strictly mind splitting,
>> but I fear this would result in (multiple?) single use local
>> variables.)
> 
> Right now it's exactly (including the indentation):
> 
>     bool shadow = paging_mode_shadow(d);
> 
>     return (shadow ? FLUSH_TLB : 0) |
>            (is_hvm_domain(d) && (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE
>                                                         : 0);
> 
> I could change it to:
> 
>     bool shadow = paging_mode_shadow(d);
>     bool asid = is_hvm_domain(d) && (cpu_has_svm || shadow);
> 
>     return (shadow ? FLUSH_TLB : 0) | (asid ? FLUSH_HVM_ASID_CORE : 0);
> 
> But would result in a single use asid local variable.
> 
> I think XSA-316 was slightly different because the issue arose from
> assigning and comparing a variable inside of an if condition, which is
> not the case here. I however don't mind changing it if it's regarded
> as potentially insecure, or hard to read.

I'd be okay to ack either, but would still somewhat prefer the original
variant.

Jan



* Re: [PATCH v10 1/3] x86/tlb: introduce a flush HVM ASIDs flag
  2020-04-23 10:57         ` Roger Pau Monné
  2020-04-23 11:10           ` Jan Beulich
@ 2020-04-23 13:10           ` Wei Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Wei Liu @ 2020-04-23 13:10 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, Jan Beulich,
	xen-devel

On Thu, Apr 23, 2020 at 12:57:44PM +0200, Roger Pau Monné wrote:
> On Thu, Apr 23, 2020 at 12:41:49PM +0200, Jan Beulich wrote:
> > On 23.04.2020 12:30, Wei Liu wrote:
> > > On Wed, Apr 22, 2020 at 06:33:38PM +0200, Roger Pau Monné wrote:
> > >> On Thu, Apr 16, 2020 at 03:59:07PM +0200, Roger Pau Monne wrote:
> > >>> @@ -254,3 +257,14 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
> > >>>  
> > >>>      return flags;
> > >>>  }
> > >>> +
> > >>> +void guest_flush_tlb_mask(const struct domain *d, const cpumask_t *mask)
> > >>> +{
> > >>> +    unsigned int flags = (is_pv_domain(d) || paging_mode_shadow(d) ? FLUSH_TLB
> > >>> +                                                                   : 0) |
> > >>> +                         (is_hvm_domain(d) && cpu_has_svm ? FLUSH_HVM_ASID_CORE
> > >>> +                                                          : 0);
> > >>
> > >> Maybe I'm getting confused, but I think the above is wrong and ASID
> > >> should _always_ be flushed when running a HVM domain in shadow mode
> > >> regardless of whether the underlying hw is Intel or AMD, ie:
> > >>
> > >> bool shadow = paging_mode_shadow(d);
> > >> unsigned int flags = (shadow ? FLUSH_TLB : 0) |
> > >>                      (is_hvm_domain(d) &&
> > >>                       (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE : 0);
> > > 
> > > This sort of long expression is prone to error. See XSA-316.
> > 
> > To be honest I consider it quite fine. XSA-316 was in particular
> > because of successive closing parentheses, of which there are
> > none here. (This isn't to say I would strictly mind splitting,
> > but I fear this would result in (multiple?) single use local
> > variables.)
> 
> Right now it's exactly (including the indentation):
> 
>     bool shadow = paging_mode_shadow(d);
> 
>     return (shadow ? FLUSH_TLB : 0) |
>            (is_hvm_domain(d) && (cpu_has_svm || shadow) ? FLUSH_HVM_ASID_CORE
>                                                         : 0);
> 
> I could change it to:
> 
>     bool shadow = paging_mode_shadow(d);
>     bool asid = is_hvm_domain(d) && (cpu_has_svm || shadow);
> 
>     return (shadow ? FLUSH_TLB : 0) | (asid ? FLUSH_HVM_ASID_CORE : 0);

IMHO this is much clearer. I merely made a suggestion and it is up to
you and Jan to decide. :-)

Wei.


