* [PATCHv5 0/2] x86/ept: reduce translation invalidation impact
@ 2015-12-17 15:17 David Vrabel
  2015-12-17 15:17 ` [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-17 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich

This series improves the performance of EPT by further reducing the
impact of the translation invalidations (ept_sync_domain()). By:

a) Deferring invalidations until the p2m write lock is released.

Prior to this change a 16 VCPU guest could not be successfully
migrated on an (admittedly slow) 160 PCPU box because the p2m write
lock was held for such extended periods of time.  This starved the
read lock needed (by the toolstack) to map the domain's memory,
triggering the watchdog.

After this change a 64 VCPU guest could be successfully migrated.

ept_sync_domain() is very expensive because:

a) it uses on_selected_cpus() and the IPI cost can be particularly
   high for a multi-socket machine.

b) on_selected_cpus() is serialized by its own spin lock.

On this particular box, ept_sync_domain() could take ~3-5 ms.

Simply using a fair rw lock was not sufficient to resolve this (but it
was an improvement) as the cost of the ept_sync_domain() calls was
still delaying the read locks enough for the watchdog to trigger (the
toolstack maps a batch of 1024 GFNs at a time, which means trying to
acquire the p2m read lock 1024 times).
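
As a rough, self-contained sketch of the deferral pattern this series
implements (simplified, hypothetical names -- not the actual Xen code):
writers bump a defer count, invalidation requests made while it is held
only set a flag, and a single flush is issued when the last writer
unlocks:

    #include <stdbool.h>
    #include <stdio.h>

    struct p2m {
        unsigned int defer_flush;  /* nesting count of write-lock holders */
        bool need_flush;           /* an invalidation was requested */
    };

    static void do_invalidate(struct p2m *p2m)
    {
        (void)p2m;
        /* Stand-in for the expensive IPI + INVEPT broadcast. */
        printf("invalidate issued\n");
    }

    static void p2m_write_lock(struct p2m *p2m)
    {
        /* The real code takes an rwlock here. */
        p2m->defer_flush++;
    }

    static void request_invalidate(struct p2m *p2m)
    {
        if ( p2m->defer_flush )
            p2m->need_flush = true;   /* defer until unlock */
        else
            do_invalidate(p2m);       /* no writer held: flush now */
    }

    static void p2m_write_unlock(struct p2m *p2m)
    {
        if ( --p2m->defer_flush == 0 && p2m->need_flush )
        {
            p2m->need_flush = false;
            do_invalidate(p2m);       /* one flush for the whole section */
        }
        /* The real code releases the rwlock here. */
    }

    int main(void)
    {
        struct p2m p2m = { 0, false };

        p2m_write_lock(&p2m);
        request_invalidate(&p2m);     /* deferred */
        request_invalidate(&p2m);     /* coalesced with the first */
        p2m_write_unlock(&p2m);       /* the single flush happens here */
        return 0;
    }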

Changes in v5:

- Fix PoD by explicitly doing an invalidation before reclaiming zero
  pages.
- Use the same mechanism for dealing with freeing page table pages.
  This isn't a common path and it's simpler than the deferred list.

Changes in v4:

- __ept_sync_domain() is a no-op -- invalidates are done before VMENTER.
- initialize ept->invalidate to all ones so the initial invalidate is
  always done.

Changes in v3:

- Drop already applied "x86/ept: remove unnecessary sync after
  resolving misconfigured entries".
- Replaced "mm: don't free pages until mm locks are released" with
  "x86/ept: invalidate guest physical mappings on VMENTER".

Changes in v2:

- Use a per-p2m (not per-CPU) list for page table pages to be freed.
- Hold the write lock while updating the synced_mask.

David


* [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-17 15:17 [PATCHv5 0/2] x86/ept: reduce translation invalidation impact David Vrabel
@ 2015-12-17 15:17 ` David Vrabel
  2015-12-17 15:56   ` Andrew Cooper
  2015-12-18  7:53   ` Tian, Kevin
  2015-12-17 15:17 ` [PATCHv5 2/2] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  2015-12-18  7:46 ` [PATCHv5 0/2] x86/ept: reduce translation invalidation impact Tian, Kevin
  2 siblings, 2 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-17 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich

If a guest allocates a page and the tlbflush_timestamp on the page
indicates that a TLB flush of the previous owner is required, only the
linear and combined mappings are invalidated.  The guest-physical
mappings are not invalidated.

This is currently safe because the EPT code ensures that the
guest-physical and combined mappings are invalidated /before/ the page
is freed.  However, this prevents us from deferring the EPT invalidate
until after the page is freed (e.g., to defer the invalidate until the
p2m locks are released).

The TLB flush that may be done after allocating a page already causes
the original guest to VMEXIT; thus, on VMENTER we can do an INVEPT if
one is pending.

This means __ept_sync_domain() need not do anything, and thus the
on_selected_cpus() call does not need to wait as long.

ept_sync_domain() now marks all PCPUs as needing to be invalidated,
including PCPUs that the domain has not run on.  We still only IPI
those PCPUs that are active so this does not result in any more INVEPT
calls.

We do not attempt to track when PCPUs may have cached translations
because the only safe way to clear this per-CPU state is if
immediately after an invalidate the PCPU is not active (i.e., the PCPU
is not in d->domain_dirty_cpumask).  Since we only invalidate on
VMENTER or by IPIing active PCPUs this can never happen.
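
As a minimal illustration of the lifecycle described above (simplified,
hypothetical names; the real code uses Xen's cpumask API and appears in
the diff below): ept_sync_domain() marks every PCPU, and each PCPU
clears only its own flag just before VMENTER:

    #include <stdbool.h>

    #define NR_PCPUS 8

    static bool invalidate[NR_PCPUS];   /* stands in for ept->invalidate */

    /* ept_sync_domain(): mark every PCPU as needing an INVEPT. */
    static void sync_domain(void)
    {
        for ( unsigned int cpu = 0; cpu < NR_PCPUS; cpu++ )
            invalidate[cpu] = true;
        /* Only the currently-active PCPUs are IPI'd in the real code. */
    }

    /* vmx_vmenter_helper(): invalidate lazily, before entering the guest. */
    static void vmenter_helper(unsigned int cpu)
    {
        if ( invalidate[cpu] )
        {
            invalidate[cpu] = false;
            /* __invept(INVEPT_SINGLE_CONTEXT, eptp, 0) goes here. */
        }
    }

    int main(void)
    {
        sync_domain();
        vmenter_helper(0);   /* CPU 0 flushes on its next VMENTER */
        return 0;
    }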

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v4:
- __ept_sync_domain() is a no-op -- invalidates are done before VMENTER.
- initialize ept->invalidate to all ones so the initial invalidate is
  always done.
---
 xen/arch/x86/hvm/vmx/vmx.c         | 22 ++++++++++------------
 xen/arch/x86/mm/p2m-ept.c          | 29 ++++++++++++++---------------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +--
 3 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f7c5e4f..cca35f2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -744,24 +744,12 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
-    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
 
     /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
     if ( old_cr4 != new_cr4 )
         write_cr4(new_cr4);
 
-    if ( paging_mode_hap(d) )
-    {
-        unsigned int cpu = smp_processor_id();
-        /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
-        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
-             !cpumask_test_and_set_cpu(cpu,
-                                       ept_get_synced_mask(ept_data)) )
-            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
-    }
-
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
 }
@@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(need_flush) )
         vpid_sync_all();
 
+    if ( paging_mode_hap(curr->domain) )
+    {
+        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
+        unsigned int cpu = smp_processor_id();
+
+        if ( cpumask_test_cpu(cpu, ept->invalidate)
+             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
+            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
+    }
+
  out:
     HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index eef0372..6e0cf89 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1089,9 +1089,10 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
 
 static void __ept_sync_domain(void *info)
 {
-    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
-
-    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
+    /*
+     * The invalidate will be done before VMENTER (see
+     * vmx_vmenter_helper()).
+     */
 }
 
 void ept_sync_domain(struct p2m_domain *p2m)
@@ -1107,16 +1108,10 @@ void ept_sync_domain(struct p2m_domain *p2m)
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
 
-    /*
-     * Flush active cpus synchronously. Flush others the next time this domain
-     * is scheduled onto them. We accept the race of other CPUs adding to
-     * the ept_synced mask before on_selected_cpus() reads it, resulting in
-     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
-     */
-    cpumask_and(ept_get_synced_mask(ept),
-                d->domain_dirty_cpumask, &cpu_online_map);
+    /* May need to invalidate on all PCPUs. */
+    cpumask_setall(ept->invalidate);
 
-    on_selected_cpus(ept_get_synced_mask(ept),
+    on_selected_cpus(d->domain_dirty_cpumask,
                      __ept_sync_domain, p2m, 1);
 }
 
@@ -1182,10 +1177,14 @@ int ept_p2m_init(struct p2m_domain *p2m)
         p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
     }
 
-    if ( !zalloc_cpumask_var(&ept->synced_mask) )
+    if ( !zalloc_cpumask_var(&ept->invalidate) )
         return -ENOMEM;
 
-    on_each_cpu(__ept_sync_domain, p2m, 1);
+    /*
+     * Assume an initial invalidation is required, in case an EP4TA is
+     * reused.
+     */
+    cpumask_setall(ept->invalidate);
 
     return 0;
 }
@@ -1193,7 +1192,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
 void ept_p2m_uninit(struct p2m_domain *p2m)
 {
     struct ept_data *ept = &p2m->ept;
-    free_cpumask_var(ept->synced_mask);
+    free_cpumask_var(ept->invalidate);
 }
 
 static void ept_dump_p2m_table(unsigned char key)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index a8d4d5b..e778d86 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -67,7 +67,7 @@ struct ept_data {
         };
         u64 eptp;
     };
-    cpumask_var_t synced_mask;
+    cpumask_var_t invalidate;
 };
 
 #define _VMX_DOMAIN_PML_ENABLED    0
@@ -97,7 +97,6 @@ struct pi_desc {
 #define ept_get_wl(ept)   ((ept)->ept_wl)
 #define ept_get_asr(ept)  ((ept)->asr)
 #define ept_get_eptp(ept) ((ept)->eptp)
-#define ept_get_synced_mask(ept) ((ept)->synced_mask)
 
 #define NR_PML_ENTRIES   512
 
-- 
2.1.4


* [PATCHv5 2/2] x86/ept: defer the invalidation until the p2m lock is released
  2015-12-17 15:17 [PATCHv5 0/2] x86/ept: reduce translation invalidation impact David Vrabel
  2015-12-17 15:17 ` [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
@ 2015-12-17 15:17 ` David Vrabel
  2015-12-18  7:46 ` [PATCHv5 0/2] x86/ept: reduce translation invalidation impact Tian, Kevin
  2 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-17 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich

Holding the p2m lock while calling ept_sync_domain() is very expensive
since it does an on_selected_cpus() call.  IPIs on many-socket machines
can be very slow, and on_selected_cpus() is serialized.

It is safe to defer the invalidate until the p2m lock is released
except for two cases:

1. When freeing a page table page (since partial translations may be
   cached).
2. When reclaiming a zero page as part of PoD.

For these cases, add p2m_tlb_flush_sync() calls which will immediately
perform the invalidate before the page is freed or reclaimed.
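
A rough sketch of the ordering constraint behind these two exceptions
(hypothetical, simplified names; the real call sites are in the diff
below): the flush must complete before the page can be reused,
otherwise another PCPU could still walk the stale translation:

    #include <stdbool.h>
    #include <stdlib.h>

    struct p2m { bool need_flush; };

    static void flush_all_pcpus(void)
    {
        /* IPI + INVEPT on the dirty PCPUs in the real code. */
    }

    static void p2m_tlb_flush_sync(struct p2m *p2m)
    {
        if ( p2m->need_flush )
        {
            p2m->need_flush = false;
            flush_all_pcpus();
        }
    }

    static void free_page_table_page(struct p2m *p2m, void *page)
    {
        /* Cannot defer: partial translations may still be cached. */
        p2m_tlb_flush_sync(p2m);
        free(page);            /* only now is it safe to reuse the page */
    }

    int main(void)
    {
        struct p2m p2m = { true };      /* a flush is pending */

        free_page_table_page(&p2m, malloc(4096));
        return 0;
    }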

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v5:
- add p2m_tlb_flush_sync() and call it before freeing page table pages
  and reclaiming zeroed PoD pages.

v2:
- use per-p2m list for deferred pages.
- update synced_mask while holding write lock.
---
 xen/arch/x86/mm/mm-locks.h | 23 +++++++++++++++--------
 xen/arch/x86/mm/p2m-ept.c  | 44 ++++++++++++++++++++++++++++++++++++--------
 xen/arch/x86/mm/p2m-pod.c  |  2 ++
 xen/arch/x86/mm/p2m.c      | 14 ++++++++++++++
 xen/include/asm-x86/p2m.h  | 10 ++++++++++
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 76c7217..b1a92e2 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -263,14 +263,21 @@ declare_mm_lock(altp2mlist)
  */
 
 declare_mm_rwlock(altp2m);
-#define p2m_lock(p)                         \
-{                                           \
-    if ( p2m_is_altp2m(p) )                 \
-        mm_write_lock(altp2m, &(p)->lock);  \
-    else                                    \
-        mm_write_lock(p2m, &(p)->lock);     \
-}
-#define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
+#define p2m_lock(p)                             \
+    do {                                        \
+        if ( p2m_is_altp2m(p) )                 \
+            mm_write_lock(altp2m, &(p)->lock);  \
+        else                                    \
+            mm_write_lock(p2m, &(p)->lock);     \
+        (p)->defer_flush++;                     \
+    } while (0)
+#define p2m_unlock(p)                           \
+    do {                                        \
+        if ( --(p)->defer_flush == 0 )          \
+            p2m_tlb_flush_and_unlock(p);        \
+        else                                    \
+            mm_write_unlock(&(p)->lock);        \
+    } while (0)
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
 #define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 6e0cf89..69fff61 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -843,7 +843,10 @@ out:
        from the iommu tables, so as to avoid a potential
        use-after-free. */
     if ( is_epte_present(&old_entry) )
+    {
+        p2m_tlb_flush_sync(p2m);
         ept_free_entry(p2m, &old_entry, target);
+    }
 
     if ( rc == 0 && p2m_is_hostp2m(p2m) )
         p2m_altp2m_propagate_change(d, _gfn(gfn), mfn, order, p2mt, p2ma);
@@ -1095,24 +1098,48 @@ static void __ept_sync_domain(void *info)
      */
 }
 
-void ept_sync_domain(struct p2m_domain *p2m)
+static void ept_sync_domain_prepare(struct p2m_domain *p2m)
 {
     struct domain *d = p2m->domain;
     struct ept_data *ept = &p2m->ept;
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
 
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
 
     /* May need to invalidate on all PCPUs. */
     cpumask_setall(ept->invalidate);
+}
+
+static void ept_sync_domain_mask(struct p2m_domain *p2m, const cpumask_t *mask)
+{
+    on_selected_cpus(mask, __ept_sync_domain, p2m, 1);
+}
 
-    on_selected_cpus(d->domain_dirty_cpumask,
-                     __ept_sync_domain, p2m, 1);
+void ept_sync_domain(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
+
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ept_sync_domain_prepare(p2m);
+
+    if ( p2m->defer_flush )
+    {
+        p2m->need_flush = 1;
+        return;
+    }
+
+    ept_sync_domain_mask(p2m, d->domain_dirty_cpumask);
+}
+
+static void ept_flush_and_unlock(struct p2m_domain *p2m, bool_t unlock)
+{
+    p2m->need_flush = 0;
+    if ( unlock )
+        mm_write_unlock(&p2m->lock);
+    ept_sync_domain_mask(p2m, p2m->domain->domain_dirty_cpumask);
 }
 
 static void ept_enable_pml(struct p2m_domain *p2m)
@@ -1163,6 +1190,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
     p2m->audit_p2m = NULL;
+    p2m->flush_and_unlock = ept_flush_and_unlock;
 
     /* Set the memory type used when accessing EPT paging structures. */
     ept->ept_mt = EPT_DEFAULT_MT;
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index ea16d3e..a5d672e 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -886,6 +886,8 @@ p2m_pod_zero_check(struct p2m_domain *p2m, unsigned long *gfns, int count)
         }
     }
 
+    p2m_tlb_flush_sync(p2m);
+
     /* Now check each page for real */
     for ( i=0; i < count; i++ )
     {
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ed0bbd7..efb15cd 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -324,6 +324,20 @@ void p2m_flush_hardware_cached_dirty(struct domain *d)
     }
 }
 
+void p2m_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush )
+        p2m->flush_and_unlock(p2m, 0);
+}
+
+void p2m_tlb_flush_and_unlock(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush )
+        p2m->flush_and_unlock(p2m, 1);
+    else
+        mm_write_unlock(&p2m->lock);
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                     p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                     unsigned int *page_order, bool_t locked)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index fa46dd9..9c394c2 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -261,6 +261,10 @@ struct p2m_domain {
                                           unsigned long gfn, l1_pgentry_t *p,
                                           l1_pgentry_t new, unsigned int level);
     long               (*audit_p2m)(struct p2m_domain *p2m);
+    void               (*flush_and_unlock)(struct p2m_domain *p2m, bool_t unlock);
+
+    unsigned int defer_flush;
+    bool_t need_flush;
 
     /* Default P2M access type for each page in the the domain: new pages,
      * swapped in pages, cleared pages, and pages that are ambiguously
@@ -353,6 +357,12 @@ static inline bool_t p2m_is_altp2m(const struct p2m_domain *p2m)
 
 #define p2m_get_pagetable(p2m)  ((p2m)->phys_table)
 
+/*
+ * Ensure any deferred p2m TLB flush has been completed on all VCPUs.
+ */
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+void p2m_tlb_flush_and_unlock(struct p2m_domain *p2m);
+
 /**** p2m query accessors. They lock p2m_lock, and thus serialize
  * lookups wrt modifications. They _do not_ release the lock on exit.
  * After calling any of the variants below, caller needs to use
-- 
2.1.4


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-17 15:17 ` [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
@ 2015-12-17 15:56   ` Andrew Cooper
  2015-12-18  7:53   ` Tian, Kevin
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2015-12-17 15:56 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Kevin Tian, Jan Beulich, Jun Nakajima

On 17/12/15 15:17, David Vrabel wrote:
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index a8d4d5b..e778d86 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -67,7 +67,7 @@ struct ept_data {
>          };
>          u64 eptp;
>      };
> -    cpumask_var_t synced_mask;
> +    cpumask_var_t invalidate;

Comment? :(

~Andrew


* Re: [PATCHv5 0/2] x86/ept: reduce translation invalidation impact
  2015-12-17 15:17 [PATCHv5 0/2] x86/ept: reduce translation invalidation impact David Vrabel
  2015-12-17 15:17 ` [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
  2015-12-17 15:17 ` [PATCHv5 2/2] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
@ 2015-12-18  7:46 ` Tian, Kevin
  2015-12-18 10:12   ` David Vrabel
  2 siblings, 1 reply; 12+ messages in thread
From: Tian, Kevin @ 2015-12-18  7:46 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Thursday, December 17, 2015 11:17 PM
> 
> This series improves the performance of EPT by further reducing the
> impact of the translation invalidations (ept_sync_domain()). By:
> 
> a) Deferring invalidations until the p2m write lock is released.

This is only for PATCH 2/2 right? No description of PATCH 1/2 in this summary.

> 
> Prior to this change a 16 VCPU guest could not be successfully
> migrated on an (admittedly slow) 160 PCPU box because the p2m write
> lock was held for such extended periods of time.  This starved the
> read lock needed (by the toolstack) to map the domain's memory,
> triggering the watchdog.
> 
> After this change a 64 VCPU guest could be successfully migrated.
> 
> ept_sync_domain() is very expensive because:
> 
> a) it uses on_selected_cpus() and the IPI cost can be particularly
>    high for a multi-socket machine.
> 
> b) on_selected_cpus() is serialized by its own spin lock.
> 
> On this particular box, ept_sync_domain() could take ~3-5 ms.
> 
> Simply using a fair rw lock was not sufficient to resolve this (but it
> was an improvement) as the cost of the ept_sync_domain() calls was
> still delaying the read locks enough for the watchdog to trigger (the
> toolstack maps a batch of 1024 GFNs at a time, which means trying to
> acquire the p2m read lock 1024 times).
> 
> Changes in v5:
> 
> - Fix PoD by explicitly doing an invalidation before reclaiming zero
>   pages.
> - Use the same mechanism for dealing with freeing page table pages.
>   This isn't a common path and it's simpler than the deferred list.
> 
> Changes in v4:
> 
> - __ept_sync_domain() is a no-op -- invalidates are done before VMENTER.
> - initialize ept->invalidate to all ones so the initial invalidate is
>   always done.
> 
> Changes in v3:
> 
> - Drop already applied "x86/ept: remove unnecessary sync after
>   resolving misconfigured entries".
> - Replaced "mm: don't free pages until mm locks are released" with
>   "x86/ept: invalidate guest physical mappings on VMENTER".
> 
> Changes in v2:
> 
> - Use a per-p2m (not per-CPU) list for page table pages to be freed.
> - Hold the write lock while updating the synced_mask.
> 
> David


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-17 15:17 ` [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
  2015-12-17 15:56   ` Andrew Cooper
@ 2015-12-18  7:53   ` Tian, Kevin
  2015-12-18 10:18     ` David Vrabel
  1 sibling, 1 reply; 12+ messages in thread
From: Tian, Kevin @ 2015-12-18  7:53 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Thursday, December 17, 2015 11:17 PM

[...]

> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index f7c5e4f..cca35f2 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c

[...]

> @@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      if ( unlikely(need_flush) )
>          vpid_sync_all();
> 
> +    if ( paging_mode_hap(curr->domain) )
> +    {
> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
> +        unsigned int cpu = smp_processor_id();
> +
> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )

Just test_and_clear should be enough.

> +            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    }
> +
>   out:
>      HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index eef0372..6e0cf89 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1089,9 +1089,10 @@ static void ept_memory_type_changed(struct p2m_domain
> *p2m)
> 
>  static void __ept_sync_domain(void *info)
>  {
> -    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
> -
> -    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    /*
> +     * The invalidate will be done before VMENTER (see

invalidate -> invalidation?

> +     * vmx_vmenter_helper()).
> +     */
>  }
> 
>  void ept_sync_domain(struct p2m_domain *p2m)
> @@ -1107,16 +1108,10 @@ void ept_sync_domain(struct p2m_domain *p2m)
>      if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
>          p2m_flush_nestedp2m(d);
> 
> -    /*
> -     * Flush active cpus synchronously. Flush others the next time this domain
> -     * is scheduled onto them. We accept the race of other CPUs adding to
> -     * the ept_synced mask before on_selected_cpus() reads it, resulting in
> -     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
> -     */
> -    cpumask_and(ept_get_synced_mask(ept),
> -                d->domain_dirty_cpumask, &cpu_online_map);
> +    /* May need to invalidate on all PCPUs. */

It'd be good to add your earlier description of why invalidating all PCPUs is
OK here, to help others better understand the logic later.

Thanks
Kevin


* Re: [PATCHv5 0/2] x86/ept: reduce translation invalidation impact
  2015-12-18  7:46 ` [PATCHv5 0/2] x86/ept: reduce translation invalidation impact Tian, Kevin
@ 2015-12-18 10:12   ` David Vrabel
  0 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-18 10:12 UTC (permalink / raw)
  To: Tian, Kevin, David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Nakajima, Jun, Jan Beulich

On 18/12/15 07:46, Tian, Kevin wrote:
>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>> Sent: Thursday, December 17, 2015 11:17 PM
>>
>> This series improves the performance of EPT by further reducing the
>> impact of the translation invalidations (ept_sync_domain()). By:
>>
>> a) Deferring invalidations until the p2m write lock is released.
> 
> This is only for PATCH 2/2 right? No description of PATCH 1/2 in this summary.

Yes.  Patch 1/2 is a prerequisite.  I think the description makes this
clear.

David


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-18  7:53   ` Tian, Kevin
@ 2015-12-18 10:18     ` David Vrabel
  2015-12-18 10:24       ` George Dunlap
  2015-12-18 10:32       ` Jan Beulich
  0 siblings, 2 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-18 10:18 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

On 18/12/15 07:53, Tian, Kevin wrote:
>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>> Sent: Thursday, December 17, 2015 11:17 PM
> 
> [...]
> 
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index f7c5e4f..cca35f2 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> 
> [...]
> 
>> @@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>>      if ( unlikely(need_flush) )
>>          vpid_sync_all();
>>
>> +    if ( paging_mode_hap(curr->domain) )
>> +    {
>> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
>> +        unsigned int cpu = smp_processor_id();
>> +
>> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
>> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
> 
> Just test_and_clear should be enough.

The first test is to avoid the locked test and clear in the common case.
 But this is probably better written as

  if ( cpumask_test_cpu(cpu, ept->invalidate) )
  {
      cpumask_clear_cpu(cpu, ept->invalidate);
      __invept(...);
  }

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -1089,9 +1089,10 @@ static void ept_memory_type_changed(struct p2m_domain
>> *p2m)
>>
>>  static void __ept_sync_domain(void *info)
>>  {
>> -    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
>> -
>> -    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
>> +    /*
>> +     * The invalidate will be done before VMENTER (see
> 
> invalidate -> invalidation?

That would be the correct English grammar, yes.

David


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-18 10:18     ` David Vrabel
@ 2015-12-18 10:24       ` George Dunlap
  2015-12-18 10:30         ` David Vrabel
  2015-12-18 10:32       ` Jan Beulich
  1 sibling, 1 reply; 12+ messages in thread
From: George Dunlap @ 2015-12-18 10:24 UTC (permalink / raw)
  To: David Vrabel, Tian, Kevin, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

On 18/12/15 10:18, David Vrabel wrote:
> On 18/12/15 07:53, Tian, Kevin wrote:
>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>> Sent: Thursday, December 17, 2015 11:17 PM
>>
>> [...]
>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index f7c5e4f..cca35f2 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>
>> [...]
>>
>>> @@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>>>      if ( unlikely(need_flush) )
>>>          vpid_sync_all();
>>>
>>> +    if ( paging_mode_hap(curr->domain) )
>>> +    {
>>> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
>>> +        unsigned int cpu = smp_processor_id();
>>> +
>>> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
>>> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
>>
>> Just test_and_clear should be enough.
> 
> The first test is to avoid the locked test and clear in the common case.
>  But this is probably better written as
> 
>   if ( cpumask_test_cpu(cpu, ept->invalidate) )
>   {
>       cpumask_clear_cpu(cpu, ept->invalidate);
>       __invept(...);
>    }
> 
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -1089,9 +1089,10 @@ static void ept_memory_type_changed(struct p2m_domain
>>> *p2m)
>>>
>>>  static void __ept_sync_domain(void *info)
>>>  {
>>> -    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
>>> -
>>> -    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
>>> +    /*
>>> +     * The invalidate will be done before VMENTER (see
>>
>> invalidate -> invalidation?
> 
> That would be the correct English grammer, yes.

FWIW I think either one is acceptable English grammar.

 -George


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-18 10:24       ` George Dunlap
@ 2015-12-18 10:30         ` David Vrabel
  0 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-18 10:30 UTC (permalink / raw)
  To: George Dunlap, Tian, Kevin, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

On 18/12/15 10:24, George Dunlap wrote:
>>>> +     * The invalidate will be done before VMENTER (see
>>>
>>> invalidate -> invalidation?
>>
>> That would be the correct English grammer, yes.
> 
> FWIW I think either one is acceptable English grammar.

invalidate is a verb, invalidation is the corresponding noun.  Although
I grant you that nouning verbs is quite common in informal English.  But
I think it best to use more formal English in comments etc. to benefit
non-native speakers.

David


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-18 10:18     ` David Vrabel
  2015-12-18 10:24       ` George Dunlap
@ 2015-12-18 10:32       ` Jan Beulich
  2015-12-18 11:37         ` David Vrabel
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2015-12-18 10:32 UTC (permalink / raw)
  To: David Vrabel
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 18.12.15 at 11:18, <david.vrabel@citrix.com> wrote:
> On 18/12/15 07:53, Tian, Kevin wrote:
>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>> Sent: Thursday, December 17, 2015 11:17 PM
>> 
>> [...]
>> 
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index f7c5e4f..cca35f2 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> 
>> [...]
>> 
>>> @@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs 
> *regs)
>>>      if ( unlikely(need_flush) )
>>>          vpid_sync_all();
>>>
>>> +    if ( paging_mode_hap(curr->domain) )
>>> +    {
>>> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
>>> +        unsigned int cpu = smp_processor_id();
>>> +
>>> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
>>> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
>> 
>> Just test_and_clear should be enough.
> 
> The first test is to avoid the locked test and clear in the common case.
>  But this is probably better written as
> 
>   if ( cpumask_test_cpu(cpu, ept->invalidate) )
>   {
>       cpumask_clear_cpu(cpu, ept->invalidate);
>       __invept(...);
>    }

The question is what you care for more: Avoiding the invalidation if
you lose the race here, or avoiding the extra conditional. (Either is
correct in this context afaict.)

Jan


* Re: [PATCHv5 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-18 10:32       ` Jan Beulich
@ 2015-12-18 11:37         ` David Vrabel
  0 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2015-12-18 11:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

On 18/12/15 10:32, Jan Beulich wrote:
>>>> On 18.12.15 at 11:18, <david.vrabel@citrix.com> wrote:
>> On 18/12/15 07:53, Tian, Kevin wrote:
>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>> Sent: Thursday, December 17, 2015 11:17 PM
>>>
>>> [...]
>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>>> index f7c5e4f..cca35f2 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>
>>> [...]
>>>
>>>> @@ -3507,6 +3495,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs 
>> *regs)
>>>>      if ( unlikely(need_flush) )
>>>>          vpid_sync_all();
>>>>
>>>> +    if ( paging_mode_hap(curr->domain) )
>>>> +    {
>>>> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
>>>> +        unsigned int cpu = smp_processor_id();
>>>> +
>>>> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
>>>> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
>>>
>>> Just test_and_clear should be enough.
>>
>> The first test is to avoid the locked test and clear in the common case.
>>  But this is probably better written as
>>
>>   if ( cpumask_test_cpu(cpu, ept->invalidate) )
>>   {
>>       cpumask_clear_cpu(cpu, ept->invalidate);
>>       __invept(...);
>>    }
> 
> The question is what you care for more: Avoiding the invalidation if
> you lose the race here, or avoiding the extra conditional. (Either is
> correct in this context afaict.)

There is no race since each pcpu clears its own bit.

David

