xen-devel.lists.xenproject.org archive mirror
* [PATCHv8 0/1] x86/ept: reduce translation invalidation impact
@ 2016-04-12 16:19 David Vrabel
  2016-04-12 16:19 ` [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  0 siblings, 1 reply; 8+ messages in thread
From: David Vrabel @ 2016-04-12 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich

This series improves the performance of EPT by further reducing the
impact of the translation invalidations (ept_sync_domain()), by:

a) Deferring invalidations until the p2m write lock is released.

Prior to this change a 16 VCPU guest could not be successfully
migrated on an (admittedly slow) 160 PCPU box because the p2m write
lock was held for such extended periods of time.  This starved the
read lock needed (by the toolstack) to map the domain's memory,
triggering the watchdog.

After this change a 64 VCPU guest could be successfully migrated.

ept_sync_domain() is very expensive because:

a) it uses on_selected_cpus() and the IPI cost can be particularly
   high for a multi-socket machine.

b) on_selected_cpus() is serialized by its own spin lock.

On this particular box, ept_sync_domain() could take ~3-5 ms.

Simply using a fair rw lock was not sufficient to resolve this (though
it was an improvement), as the cost of the ept_sync_domain() calls was
still delaying the read locks enough for the watchdog to trigger (the
toolstack maps a batch of 1024 GFNs at a time, which means trying to
acquire the p2m read lock 1024 times).
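
The fix is therefore to defer the flush until the write lock is
dropped.  Condensed from the patch (an illustrative sketch, not the
literal code; the patch itself is authoritative):

    #define p2m_lock(p)                             \
        do {                                        \
            mm_write_lock(p2m, &(p)->lock);         \
            (p)->defer_flush++;                     \
        } while (0)

    #define p2m_unlock(p)                           \
        do {                                        \
            if ( --(p)->defer_flush == 0 )          \
                p2m_unlock_and_tlb_flush(p);        \
            else                                    \
                mm_write_unlock(&(p)->lock);        \
        } while (0)

    void ept_sync_domain(struct p2m_domain *p2m)
    {
        ...
        if ( p2m->defer_flush )
        {
            /* Record that a flush is needed; it is issued when the
             * p2m write lock is released. */
            p2m->need_flush = 1;
            return;
        }
        ept_sync_domain_mask(p2m, d->domain_dirty_cpumask);
    }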

Changes in v8:

- p2m_tlb_flush_and_unlock() -> p2m_unlock_and_tlb_flush().
- p2m_unlock_and_tlb_flush() now does the unlock and the p2m
  implementation need only provide a tlb_flush() op.

Changes in v7:

- Add some more p2m_tlb_flush_sync() calls to PoD.
- More comments.

Changes in v6:

- Fix performance bug in patch #2.
- Improve comments.

Changes in v5:

- Fix PoD by explicitly doing an invalidation before reclaiming zero
  pages.
- Use the same mechanism for dealing with freeing page table pages.
  This isn't a common path, and it's simpler than the deferred list.

Changes in v4:

- __ept_sync_domain() is a no-op -- invalidates are done before VMENTER.
- initialize ept->invalidate to all ones so the initial invalidate is
  always done.

Changes in v3:

- Drop already applied "x86/ept: remove unnecessary sync after
  resolving misconfigured entries".
- Replaced "mm: don't free pages until mm locks are released" with
  "x86/ept: invalidate guest physical mappings on VMENTER".

Changes in v2:

- Use a per-p2m (not per-CPU) list for page table pages to be freed.
- Hold the write lock while updating the synced_mask.

David



* [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-12 16:19 [PATCHv8 0/1] x86/ept: reduce translation invalidation impact David Vrabel
@ 2016-04-12 16:19 ` David Vrabel
  2016-04-15 16:18   ` George Dunlap
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Vrabel @ 2016-04-12 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich

Holding the p2m lock while calling ept_sync_domain() is very expensive
since it does an on_selected_cpus() call.  IPIs on many socket
machines can be very slow and on_selected_cpus() is serialized.

It is safe to defer the invalidate until the p2m lock is released
except for two cases:

1. When freeing a page table page (since partial translations may be
   cached).
2. When reclaiming a zero page as part of PoD.

For these cases, add p2m_tlb_flush_sync() calls which will immediately
perform the invalidate before the page is freed or reclaimed.
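
From a caller's point of view the pattern is roughly as follows (an
illustrative sketch only; the condition name below is a placeholder
for the two cases above, not a real predicate):

    p2m_lock(p2m);                  /* increments p2m->defer_flush */

    p2m_set_entry(p2m, gfn, mfn, order, p2mt, access);
    /* ept_sync_domain() only sets p2m->need_flush here, because
     * defer_flush is non-zero. */

    if ( about_to_free_or_reclaim_page )    /* placeholder condition */
        p2m_tlb_flush_sync(p2m);    /* invalidate now, before the page
                                     * can be reused */

    p2m_unlock(p2m);                /* p2m_unlock_and_tlb_flush(): drop
                                     * the lock, then flush if needed */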

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v8:
- p2m_tlb_flush_and_unlock() -> p2m_unlock_and_tlb_flush().
- p2m_unlock_and_tlb_flush() now does the unlock and the p2m
  implementation need only provide a tlb_flush() op.

v7:
- Add some more p2m_tlb_flush_sync() calls to PoD.
- More comments.

v6:
- Move p2m_tlb_flush_sync() to immediately before p2m_free_ptp().  It was
  called all the time otherwise.

v5:
- Add p2m_tlb_flush_sync() and call it before freeing page table pages
  and reclaiming zeroed PoD pages.

v2:
- use per-p2m list for deferred pages.
- update synced_mask while holding write lock.
---
 xen/arch/x86/mm/mm-locks.h | 23 +++++++++++++++--------
 xen/arch/x86/mm/p2m-ept.c  | 39 +++++++++++++++++++++++++++++++--------
 xen/arch/x86/mm/p2m-pod.c  |  4 ++++
 xen/arch/x86/mm/p2m.c      | 26 ++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h  | 22 ++++++++++++++++++++++
 5 files changed, 98 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 8a40986..086c8bb 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -265,14 +265,21 @@ declare_mm_lock(altp2mlist)
  */
 
 declare_mm_rwlock(altp2m);
-#define p2m_lock(p)                         \
-{                                           \
-    if ( p2m_is_altp2m(p) )                 \
-        mm_write_lock(altp2m, &(p)->lock);  \
-    else                                    \
-        mm_write_lock(p2m, &(p)->lock);     \
-}
-#define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
+#define p2m_lock(p)                             \
+    do {                                        \
+        if ( p2m_is_altp2m(p) )                 \
+            mm_write_lock(altp2m, &(p)->lock);  \
+        else                                    \
+            mm_write_lock(p2m, &(p)->lock);     \
+        (p)->defer_flush++;                     \
+    } while (0)
+#define p2m_unlock(p)                           \
+    do {                                        \
+        if ( --(p)->defer_flush == 0 )          \
+            p2m_unlock_and_tlb_flush(p);        \
+        else                                    \
+            mm_write_unlock(&(p)->lock);        \
+    } while (0)
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
 #define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 3cb6868..1ed5b47 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -264,6 +264,7 @@ static void ept_free_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry, int l
         unmap_domain_page(epte);
     }
     
+    p2m_tlb_flush_sync(p2m);
     p2m_free_ptp(p2m, mfn_to_page(ept_entry->mfn));
 }
 
@@ -1096,15 +1097,10 @@ static void __ept_sync_domain(void *info)
      */
 }
 
-void ept_sync_domain(struct p2m_domain *p2m)
+static void ept_sync_domain_prepare(struct p2m_domain *p2m)
 {
     struct domain *d = p2m->domain;
     struct ept_data *ept = &p2m->ept;
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
 
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
@@ -1117,9 +1113,35 @@ void ept_sync_domain(struct p2m_domain *p2m)
      *    of an EP4TA reuse is still needed.
      */
     cpumask_setall(ept->invalidate);
+}
+
+static void ept_sync_domain_mask(struct p2m_domain *p2m, const cpumask_t *mask)
+{
+    on_selected_cpus(mask, __ept_sync_domain, p2m, 1);
+}
+
+void ept_sync_domain(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
 
-    on_selected_cpus(d->domain_dirty_cpumask,
-                     __ept_sync_domain, p2m, 1);
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ept_sync_domain_prepare(p2m);
+
+    if ( p2m->defer_flush )
+    {
+        p2m->need_flush = 1;
+        return;
+    }
+
+    ept_sync_domain_mask(p2m, d->domain_dirty_cpumask);
+}
+
+static void ept_tlb_flush(struct p2m_domain *p2m)
+{
+    ept_sync_domain_mask(p2m, p2m->domain->domain_dirty_cpumask);
 }
 
 static void ept_enable_pml(struct p2m_domain *p2m)
@@ -1170,6 +1192,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
     p2m->audit_p2m = NULL;
+    p2m->tlb_flush = ept_tlb_flush;
 
     /* Set the memory type used when accessing EPT paging structures. */
     ept->ept_mt = EPT_DEFAULT_MT;
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index ea16d3e..35835d1 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -626,6 +626,7 @@ p2m_pod_decrease_reservation(struct domain *d,
 
             p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
                           p2m_invalid, p2m->default_access);
+            p2m_tlb_flush_sync(p2m);
             for ( j = 0; j < n; ++j )
                 set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
             p2m_pod_cache_add(p2m, page, cur_order);
@@ -755,6 +756,7 @@ p2m_pod_zero_check_superpage(struct p2m_domain *p2m, unsigned long gfn)
     /* Try to remove the page, restoring old mapping if it fails. */
     p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), PAGE_ORDER_2M,
                   p2m_populate_on_demand, p2m->default_access);
+    p2m_tlb_flush_sync(p2m);
 
     /* Make none of the MFNs are used elsewhere... for example, mapped
      * via the grant table interface, or by qemu.  Allow one refcount for
@@ -886,6 +888,8 @@ p2m_pod_zero_check(struct p2m_domain *p2m, unsigned long *gfns, int count)
         }
     }
 
+    p2m_tlb_flush_sync(p2m);
+
     /* Now check each page for real */
     for ( i=0; i < count; i++ )
     {
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b3fce1b..491deac 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -325,6 +325,32 @@ void p2m_flush_hardware_cached_dirty(struct domain *d)
     }
 }
 
+/*
+ * Force a synchronous P2M TLB flush if a deferred flush is pending.
+ *
+ * Must be called with the p2m lock held.
+ */
+void p2m_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush ) {
+        p2m->need_flush = 0;
+        p2m->tlb_flush(p2m);
+    }
+}
+
+/*
+ * Unlock the p2m lock and do a P2M TLB flush if needed.
+ */
+void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
+{
+    if ( p2m->need_flush ) {
+        p2m->need_flush = 0;
+        mm_write_unlock(&p2m->lock);
+        p2m->tlb_flush(p2m);
+    } else
+        mm_write_unlock(&p2m->lock);
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                     p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                     unsigned int *page_order, bool_t locked)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 5392eb0..65675a2 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -262,6 +262,22 @@ struct p2m_domain {
                                           l1_pgentry_t new, unsigned int level);
     long               (*audit_p2m)(struct p2m_domain *p2m);
 
+    /*
+     * P2M updates may require TLBs to be flushed (invalidated).
+     *
+     * If 'defer_flush' is set, flushes may be deferred by setting
+     * 'need_flush' and then flushing in 'tlb_flush()'.
+     *
+     * 'tlb_flush()' is only called if 'need_flush' was set.
+     *
+     * If a flush may be being deferred but an immediate flush is
+     * required (e.g., if a page is being freed to pool other than the
+     * domheap), call p2m_tlb_flush_sync().
+     */
+    void (*tlb_flush)(struct p2m_domain *p2m);
+    unsigned int defer_flush;
+    bool_t need_flush;
+
     /* Default P2M access type for each page in the the domain: new pages,
      * swapped in pages, cleared pages, and pages that are ambiguously
      * retyped get this access type.  See definition of p2m_access_t. */
@@ -353,6 +369,12 @@ static inline bool_t p2m_is_altp2m(const struct p2m_domain *p2m)
 
 #define p2m_get_pagetable(p2m)  ((p2m)->phys_table)
 
+/*
+ * Ensure any deferred p2m TLB flush has been completed on all VCPUs.
+ */
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m);
+
 /**** p2m query accessors. They lock p2m_lock, and thus serialize
  * lookups wrt modifications. They _do not_ release the lock on exit.
  * After calling any of the variants below, caller needs to use
-- 
2.1.4



* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-12 16:19 ` [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
@ 2016-04-15 16:18   ` George Dunlap
  2016-04-19  3:10   ` Tian, Kevin
  2016-04-19  7:18   ` David Vrabel
  2 siblings, 0 replies; 8+ messages in thread
From: George Dunlap @ 2016-04-15 16:18 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Kevin Tian, Jan Beulich, Jun Nakajima, Andrew Cooper

On Tue, Apr 12, 2016 at 5:19 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> Holding the p2m lock while calling ept_sync_domain() is very expensive
> since it does an on_selected_cpus() call.  IPIs on many socket
> machines can be very slow and on_selected_cpus() is serialized.
>
> It is safe to defer the invalidate until the p2m lock is released
> except for two cases:
>
> 1. When freeing a page table page (since partial translations may be
>    cached).
> 2. When reclaiming a zero page as part of PoD.
>
> For these cases, add p2m_tlb_flush_sync() calls which will immediately
> perform the invalidate before the page is freed or reclaimed.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Looks good, thanks:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

 -George


* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-12 16:19 ` [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  2016-04-15 16:18   ` George Dunlap
@ 2016-04-19  3:10   ` Tian, Kevin
  2016-04-19  7:18   ` David Vrabel
  2 siblings, 0 replies; 8+ messages in thread
From: Tian, Kevin @ 2016-04-19  3:10 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Wednesday, April 13, 2016 12:20 AM
> 
> Holding the p2m lock while calling ept_sync_domain() is very expensive
> since it does an on_selected_cpus() call.  IPIs on many socket
> machines can be very slow and on_selected_cpus() is serialized.
> 
> It is safe to defer the invalidate until the p2m lock is released
> except for two cases:
> 
> 1. When freeing a page table page (since partial translations may be
>    cached).
> 2. When reclaiming a zero page as part of PoD.
> 
> For these cases, add p2m_tlb_flush_sync() calls which will immediately
> perform the invalidate before the page is freed or reclaimed.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-12 16:19 ` [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  2016-04-15 16:18   ` George Dunlap
  2016-04-19  3:10   ` Tian, Kevin
@ 2016-04-19  7:18   ` David Vrabel
  2016-04-22 10:49     ` Wei Liu
  2 siblings, 1 reply; 8+ messages in thread
From: David Vrabel @ 2016-04-19  7:18 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Jan Beulich

Hi Wei,

This patch has all the required acks now.  Can you consider it for 4.7?

It's a significant scalability improvement (see the cover letter for
details).

v7 has been in XenServer's upcoming release for a while now so it has
been tested with many guests and many life cycle operations, including
plenty of uses of PoD.

Thanks.

David

On 12/04/2016 17:19, David Vrabel wrote:
> Holding the p2m lock while calling ept_sync_domain() is very expensive
> since it does an on_selected_cpus() call.  IPIs on many socket
> machines can be very slow and on_selected_cpus() is serialized.
> 
> It is safe to defer the invalidate until the p2m lock is released
> except for two cases:
> 
> 1. When freeing a page table page (since partial translations may be
>    cached).
> 2. When reclaiming a zero page as part of PoD.
> 
> For these cases, add p2m_tlb_flush_sync() calls which will immediately
> perform the invalidate before the page is freed or reclaimed.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-19  7:18   ` David Vrabel
@ 2016-04-22 10:49     ` Wei Liu
  2016-04-22 10:52       ` George Dunlap
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Liu @ 2016-04-22 10:49 UTC (permalink / raw)
  To: David Vrabel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	David Vrabel, Jun Nakajima, xen-devel

On Tue, Apr 19, 2016 at 08:18:14AM +0100, David Vrabel wrote:
> Hi Wei,
> 
> This patch has all the required acks now.  Can you consider it for 4.7?
> 
> It's a significant scalability improvement (see the cover letter for
> details).
> 
> v7 has been in XenServer's upcoming release for a while now so it has
> been tested with many guests and many life cycle operations, including
> plenty of uses of PoD.
> 

Thanks for prodding. I've gone through the thread.

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

Jan / Konrad / Andrew, please could you apply this patch as soon as
possible? We're close to cutting RC1.

Wei.


* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-22 10:49     ` Wei Liu
@ 2016-04-22 10:52       ` George Dunlap
  2016-04-22 10:55         ` Wei Liu
  0 siblings, 1 reply; 8+ messages in thread
From: George Dunlap @ 2016-04-22 10:52 UTC (permalink / raw)
  To: Wei Liu, David Vrabel
  Cc: Kevin Tian, Jan Beulich, George Dunlap, Andrew Cooper,
	David Vrabel, Jun Nakajima, xen-devel

On 22/04/16 11:49, Wei Liu wrote:
> On Tue, Apr 19, 2016 at 08:18:14AM +0100, David Vrabel wrote:
>> Hi Wei,
>>
>> This patch has all the required acks now.  Can you consider it for 4.7?
>>
>> It's a significant scalability improvement (see the cover letter for
>> details).
>>
>> v7 has been in XenServer's upcoming release for a while now so it has
>> been tested with many guests and many life cycle operations, including
>> plenty of uses of PoD.
>>
> 
> Thanks for prodding. I've gone through the thread.
> 
> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
> 
> Jan / Konrad / Andrew, please could you apply this patch as soon as
> possible? We're close to cutting RC1.

Thanks Wei -- I'll queue it and push it.

 -George



* Re: [PATCHv8] x86/ept: defer the invalidation until the p2m lock is released
  2016-04-22 10:52       ` George Dunlap
@ 2016-04-22 10:55         ` Wei Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Wei Liu @ 2016-04-22 10:55 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	David Vrabel, David Vrabel, Jun Nakajima, xen-devel

On Fri, Apr 22, 2016 at 11:52:07AM +0100, George Dunlap wrote:
> On 22/04/16 11:49, Wei Liu wrote:
> > On Tue, Apr 19, 2016 at 08:18:14AM +0100, David Vrabel wrote:
> >> Hi Wei,
> >>
> >> This patch has all the required acks now.  Can you consider it for 4.7?
> >>
>> It's a significant scalability improvement (see the cover letter for
> >> details).
> >>
> >> v7 has been in XenServer's upcoming release for a while now so it has
> >> been tested with many guests and many life cycle operations, including
> >> plenty of uses of PoD.
> >>
> > 
> > Thanks for prodding. I've gone through the thread.
> > 
> > Release-acked-by: Wei Liu <wei.liu2@citrix.com>
> > 
> > Jan / Konrad / Andrew, please could you apply this patch as soon as
> > possible? We're close to cutting RC1.
> 
> Thanks Wei -- I'll queue it and push it.
> 

Thank you! :-)

>  -George
> 

