All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact
@ 2015-11-13 18:49 David Vrabel
  2015-11-13 18:49 ` [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: David Vrabel @ 2015-11-13 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

This RFC series improves the performance of EPT by reducing the impact
of the translation invalidations (ept_sync_domain()).  Two approaches
are used:

a) Removing unnecessary invalidations after fixing misconfigured
   entries (after a type change).

b) Deferring invalidations until the p2m write lock is released.

Prior to this change a 16 VCPU guest could not be successfully
migrated on an (admittedly slow) 160 PCPU box because the p2m write
lock was held for such extended periods of time.  This starved the
read lock needed (by the toolstack) to map the domain's memory,
triggering the watchdog.

After this change a 64 VCPU guest could be successfully migrated.

ept_sync_domain() is very expensive because:

a) it uses on_selected_cpus() and the IPI cost can be particularly
   high for a multi-socket machine.

b) on_selected_cpus() is serialized by its own spin lock.

On this particular box, ept_sync_domain() could take ~3-5 ms.

Simply using a fair rw lock was not sufficient to resolve this (but it
was an improvement) as the cost of the ept_sync_domain calls() was
still delaying the read locks enough for the watchdog to trigger (the
toolstack maps a batch of 1024 GFNs at a time, which means trying to
acquire the p2m read lock 1024 times).

Changes in v2:

- Use a per-p2m (not per-CPU) list for page table pages to be freed.
- Hold the write lock while updating the synced_mask.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries
  2015-11-13 18:49 [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact David Vrabel
@ 2015-11-13 18:49 ` David Vrabel
  2015-12-02  7:17   ` Tian, Kevin
  2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-11-13 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

When using EPT, type changes are done with the following steps:

1. Set entry as invalid (misconfigured) by settings a reserved memory
type.

2. Flush all EPT and combined translations (ept_sync_domain()).

3. Fixup misconfigured entries as required (on EPT_MISCONFIG vmexits or
when explicitly setting an entry.

Since resolve_misconfig() only updates entries that were misconfigured,
there is no need to invalidate any translations since the hardware
does not cache misconfigured translations (vol 3, section 28.3.2).

Remove the unnecessary (and very expensive) ept_sync_domain() calls).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2:
- remove needs_sync enum
---
 xen/arch/x86/mm/p2m-ept.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 86440fc..548f2f2 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -644,7 +644,6 @@ bool_t ept_handle_misconfig(uint64_t gpa)
     spurious = curr->arch.hvm_vmx.ept_spurious_misconfig;
     rc = resolve_misconfig(p2m, PFN_DOWN(gpa));
     curr->arch.hvm_vmx.ept_spurious_misconfig = 0;
-    ept_sync_domain(p2m);
 
     p2m_unlock(p2m);
 
@@ -671,7 +670,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     bool_t need_modify_vtd_table = 1;
     bool_t vtd_pte_present = 0;
     unsigned int iommu_flags = p2m_get_iommu_flags(p2mt);
-    enum { sync_off, sync_on, sync_check } needs_sync = sync_check;
+    bool_t needs_sync = 1;
     ept_entry_t old_entry = { .epte = 0 };
     ept_entry_t new_entry = { .epte = 0 };
     struct ept_data *ept = &p2m->ept;
@@ -692,12 +691,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     /* Carry out any eventually pending earlier changes first. */
     ret = resolve_misconfig(p2m, gfn);
     if ( ret < 0 )
-    {
-        ept_sync_domain(p2m);
         return ret;
-    }
-    if ( ret > 0 )
-        needs_sync = sync_on;
 
     ASSERT((target == 2 && hap_has_1gb) ||
            (target == 1 && hap_has_2mb) ||
@@ -740,8 +734,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         /* We reached the target level. */
 
         /* No need to flush if the old entry wasn't valid */
-        if ( needs_sync == sync_check && !is_epte_present(ept_entry) )
-            needs_sync = sync_off;
+        if ( !is_epte_present(ept_entry) )
+            needs_sync = 0;
 
         /* If we're replacing a non-leaf entry with a leaf entry (1GiB or 2MiB),
          * the intermediate tables will be freed below after the ept flush
@@ -823,7 +817,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         p2m->max_mapped_pfn = gfn + (1UL << order) - 1;
 
 out:
-    if ( needs_sync != sync_off )
+    if ( needs_sync )
         ept_sync_domain(p2m);
 
     /* For host p2m, may need to change VT-d page table.*/
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-13 18:49 [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact David Vrabel
  2015-11-13 18:49 ` [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries David Vrabel
@ 2015-11-13 18:49 ` David Vrabel
  2015-11-16 11:52   ` Jan Beulich
                     ` (2 more replies)
  2015-11-13 18:49 ` [PATCHv2 3/3] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  2015-12-02  7:14 ` [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact Tian, Kevin
  3 siblings, 3 replies; 22+ messages in thread
From: David Vrabel @ 2015-11-13 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

If a page is freed without translations being invalidated, and the page is
subsequently allocated to another domain, a guest with a cached
translation will still be able to access the page.

Currently translations are invalidated before releasing the page ref, but
while still holding the mm locks.  To allow translations to be invalidated
without holding the mm locks, we need to keep a reference to the page
for a bit longer in some cases.

[ This seems difficult to a) verify as correct; and b) difficult to get
correct in the future.  A better suggestion would be useful.  Perhaps
using something like pg->tlbflush_needed mechanism that already exists
for pages from PV guests? ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 9 +++++++--
 xen/common/memory.c   | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ed0bbd7..e2c82b1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2758,6 +2758,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
     p2m_type_t p2mt, p2mt_prev;
     unsigned long prev_mfn, mfn;
     struct page_info *page;
+    struct page_info *prev_page = NULL;
     int rc;
     struct domain *fdom;
 
@@ -2805,6 +2806,9 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
     prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
     if ( mfn_valid(_mfn(prev_mfn)) )
     {
+        prev_page = mfn_to_page(_mfn(prev_mfn));
+        get_page(prev_page, tdom);
+
         if ( is_xen_heap_mfn(prev_mfn) )
             /* Xen heap frames are simply unhooked from this phys slot */
             guest_physmap_remove_page(tdom, gpfn, prev_mfn, 0);
@@ -2823,14 +2827,15 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                  "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
                  gpfn, mfn, fgfn, tdom->domain_id, fdom->domain_id);
 
-    put_page(page);
-
     /*
      * This put_gfn for the above get_gfn for prev_mfn.  We must do this
      * after set_foreign_p2m_entry so another cpu doesn't populate the gpfn
      * before us.
      */
     put_gfn(tdom, gpfn);
+    if ( prev_page )
+        put_page(prev_page);
+    put_page(page);
 
 out:
     if ( fdom )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index a3bffb7..571c754 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -272,8 +272,8 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
 
     guest_physmap_remove_page(d, gmfn, mfn, 0);
 
-    put_page(page);
     put_gfn(d, gmfn);
+    put_page(page);
 
     return 1;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCHv2 3/3] x86/ept: defer the invalidation until the p2m lock is released
  2015-11-13 18:49 [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact David Vrabel
  2015-11-13 18:49 ` [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries David Vrabel
  2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
@ 2015-11-13 18:49 ` David Vrabel
  2015-12-02  7:14 ` [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact Tian, Kevin
  3 siblings, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-11-13 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, David Vrabel, Jan Beulich

From: David Vrabel <dvrabel@cantab.net>

Holding the p2m lock while calling ept_sync_domain() is very expensive
since it does a on_selected_cpus() call.  IPIs on many socket machines
can be very slows and on_selected_cpus() is serialized.

Defer the invalidate until the p2m lock is released.  Since the processor
may cache partial translations, we also need to make sure any page table
pages to be freed are not freed until the invalidate is complete.  Such
pages are temporarily stored in a list.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v2:
- use per-p2m list for deferred pages.
- update synced_mask while holding write lock.
---
 xen/arch/x86/mm/mm-locks.h | 23 +++++++++++++-------
 xen/arch/x86/mm/p2m-ept.c  | 53 ++++++++++++++++++++++++++++++++++++++--------
 xen/arch/x86/mm/p2m.c      | 18 ++++++++++++++++
 xen/include/asm-x86/p2m.h  |  7 ++++++
 4 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 76c7217..b5eb560 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -263,14 +263,21 @@ declare_mm_lock(altp2mlist)
  */
 
 declare_mm_rwlock(altp2m);
-#define p2m_lock(p)                         \
-{                                           \
-    if ( p2m_is_altp2m(p) )                 \
-        mm_write_lock(altp2m, &(p)->lock);  \
-    else                                    \
-        mm_write_lock(p2m, &(p)->lock);     \
-}
-#define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
+#define p2m_lock(p)                             \
+    do {                                        \
+        if ( p2m_is_altp2m(p) )                 \
+            mm_write_lock(altp2m, &(p)->lock);  \
+        else                                    \
+            mm_write_lock(p2m, &(p)->lock);     \
+        (p)->defer_flush++;                     \
+    } while (0)
+#define p2m_unlock(p)                                                   \
+    do {                                                                \
+        if ( --(p)->defer_flush == 0 && (p)->need_flush )               \
+            (p)->flush_and_unlock(p);                                   \
+        else                                                            \
+            mm_write_unlock(&(p)->lock);                                \
+    } while (0)
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
 #define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 548f2f2..9f4d6e6 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -263,7 +263,7 @@ static void ept_free_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry, int l
         unmap_domain_page(epte);
     }
     
-    p2m_free_ptp(p2m, mfn_to_page(ept_entry->mfn));
+    p2m_free_ptp_defer(p2m, mfn_to_page(ept_entry->mfn));
 }
 
 static bool_t ept_split_super_page(struct p2m_domain *p2m,
@@ -1095,15 +1095,10 @@ static void __ept_sync_domain(void *info)
     __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
-void ept_sync_domain(struct p2m_domain *p2m)
+static void ept_sync_domain_prepare(struct p2m_domain *p2m)
 {
     struct domain *d = p2m->domain;
     struct ept_data *ept = &p2m->ept;
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
 
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
@@ -1116,9 +1111,48 @@ void ept_sync_domain(struct p2m_domain *p2m)
      */
     cpumask_and(ept_get_synced_mask(ept),
                 d->domain_dirty_cpumask, &cpu_online_map);
+}
+
+static void ept_sync_domain_mask(struct p2m_domain *p2m, const cpumask_t *mask)
+{
+    on_selected_cpus(mask, __ept_sync_domain, p2m, 1);
+}
+
+void ept_sync_domain(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
+    struct ept_data *ept = &p2m->ept;
+
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    if ( p2m->defer_flush )
+    {
+        p2m->need_flush = 1;
+        return;
+    }
+    else
+        p2m->need_flush = 0;
+
+    ept_sync_domain_prepare(p2m);
+    ept_sync_domain_mask(p2m, ept_get_synced_mask(ept));
+}
+
+static void ept_flush_and_unlock(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+    PAGE_LIST_HEAD(deferred_pages);
+    cpumask_t flush_mask;
+
+    ept_sync_domain_prepare(p2m);
+    cpumask_copy(&flush_mask, ept_get_synced_mask(ept));
+    page_list_move(&deferred_pages, &p2m->deferred_pages);
+
+    mm_write_unlock(&p2m->lock);
 
-    on_selected_cpus(ept_get_synced_mask(ept),
-                     __ept_sync_domain, p2m, 1);
+    ept_sync_domain_mask(p2m, &flush_mask);
+    p2m_free_ptp_list(p2m, &deferred_pages);
 }
 
 static void ept_enable_pml(struct p2m_domain *p2m)
@@ -1169,6 +1203,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
     p2m->audit_p2m = NULL;
+    p2m->flush_and_unlock = ept_flush_and_unlock;
 
     /* Set the memory type used when accessing EPT paging structures. */
     ept->ept_mt = EPT_DEFAULT_MT;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e2c82b1..4080410 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -65,6 +65,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     mm_lock_init(&p2m->pod.lock);
     INIT_LIST_HEAD(&p2m->np2m_list);
     INIT_PAGE_LIST_HEAD(&p2m->pages);
+    INIT_PAGE_LIST_HEAD(&p2m->deferred_pages);
     INIT_PAGE_LIST_HEAD(&p2m->pod.super);
     INIT_PAGE_LIST_HEAD(&p2m->pod.single);
 
@@ -504,6 +505,23 @@ void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg)
     return;
 }
 
+void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg)
+{
+    page_list_del(pg, &p2m->pages);
+    page_list_add(pg, &p2m->deferred_pages);
+}
+
+void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list)
+{
+    struct page_info *pg, *tmp;
+
+    page_list_for_each_safe(pg, tmp, list)
+    {
+        page_list_del(pg, list);
+        p2m->domain->arch.paging.free_page(p2m->domain, pg);
+    }
+}
+
 /*
  * Allocate a new p2m table for a domain.
  *
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index d748557..89b7bd1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -254,6 +254,11 @@ struct p2m_domain {
                                           unsigned long gfn, l1_pgentry_t *p,
                                           l1_pgentry_t new, unsigned int level);
     long               (*audit_p2m)(struct p2m_domain *p2m);
+    void               (*flush_and_unlock)(struct p2m_domain *p2m);
+
+    struct page_list_head deferred_pages;
+    unsigned int defer_flush;
+    bool_t need_flush;
 
     /* Default P2M access type for each page in the the domain: new pages,
      * swapped in pages, cleared pages, and pages that are ambiguously
@@ -681,6 +686,8 @@ static inline bool_t p2m_mem_access_sanity_check(struct domain *d)
 
 struct page_info *p2m_alloc_ptp(struct p2m_domain *p2m, unsigned long type);
 void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg);
+void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg);
+void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list);
 
 /* Directly set a p2m entry: only for use by p2m code. Does not need
  * a call to put_gfn afterwards/ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
@ 2015-11-16 11:52   ` Jan Beulich
  2015-11-16 12:02     ` David Vrabel
  2015-11-30 14:10   ` David Vrabel
  2015-12-02  7:25   ` Tian, Kevin
  2 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-11-16 11:52 UTC (permalink / raw)
  To: David Vrabel
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

>>> On 13.11.15 at 19:49, <david.vrabel@citrix.com> wrote:
> @@ -2805,6 +2806,9 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>      prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
>      if ( mfn_valid(_mfn(prev_mfn)) )
>      {
> +        prev_page = mfn_to_page(_mfn(prev_mfn));
> +        get_page(prev_page, tdom);

If you're absolutely sure that this can never fail, then still at the very
least this imo should be documented by a respective ASSERT() (or
ASSERT_UNREACHABLE()).

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-16 11:52   ` Jan Beulich
@ 2015-11-16 12:02     ` David Vrabel
  2015-11-16 12:04       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-11-16 12:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On 16/11/15 11:52, Jan Beulich wrote:
>>>> On 13.11.15 at 19:49, <david.vrabel@citrix.com> wrote:
>> @@ -2805,6 +2806,9 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>>      prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
>>      if ( mfn_valid(_mfn(prev_mfn)) )
>>      {
>> +        prev_page = mfn_to_page(_mfn(prev_mfn));
>> +        get_page(prev_page, tdom);
> 
> If you're absolutely sure that this can never fail, then still at the very
> least this imo should be documented by a respective ASSERT() (or
> ASSERT_UNREACHABLE()).

What are you suggesting may fail?

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-16 12:02     ` David Vrabel
@ 2015-11-16 12:04       ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2015-11-16 12:04 UTC (permalink / raw)
  To: David Vrabel
  Cc: KevinTian, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

>>> On 16.11.15 at 13:02, <david.vrabel@citrix.com> wrote:
> On 16/11/15 11:52, Jan Beulich wrote:
>>>>> On 13.11.15 at 19:49, <david.vrabel@citrix.com> wrote:
>>> @@ -2805,6 +2806,9 @@ int p2m_add_foreign(struct domain *tdom, unsigned long 
> fgfn,
>>>      prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
>>>      if ( mfn_valid(_mfn(prev_mfn)) )
>>>      {
>>> +        prev_page = mfn_to_page(_mfn(prev_mfn));
>>> +        get_page(prev_page, tdom);
>> 
>> If you're absolutely sure that this can never fail, then still at the very
>> least this imo should be documented by a respective ASSERT() (or
>> ASSERT_UNREACHABLE()).
> 
> What are you suggesting may fail?

get_page()

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
  2015-11-16 11:52   ` Jan Beulich
@ 2015-11-30 14:10   ` David Vrabel
  2015-12-02  7:25   ` Tian, Kevin
  2 siblings, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-11-30 14:10 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Jan Beulich

On 13/11/15 18:49, David Vrabel wrote:
> If a page is freed without translations being invalidated, and the page is
> subsequently allocated to another domain, a guest with a cached
> translation will still be able to access the page.
> 
> Currently translations are invalidated before releasing the page ref, but
> while still holding the mm locks.  To allow translations to be invalidated
> without holding the mm locks, we need to keep a reference to the page
> for a bit longer in some cases.
> 
> [ This seems difficult to a) verify as correct; and b) difficult to get
> correct in the future.  A better suggestion would be useful.  Perhaps
> using something like pg->tlbflush_needed mechanism that already exists
> for pages from PV guests? ]

ping?

This question is quite important and hasn't been answered yet.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact
  2015-11-13 18:49 [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact David Vrabel
                   ` (2 preceding siblings ...)
  2015-11-13 18:49 ` [PATCHv2 3/3] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
@ 2015-12-02  7:14 ` Tian, Kevin
  2015-12-03 15:59   ` David Vrabel
  3 siblings, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2015-12-02  7:14 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Saturday, November 14, 2015 2:50 AM
> 
> This RFC series improves the performance of EPT by reducing the impact
> of the translation invalidations (ept_sync_domain()).  Two approaches
> are used:
> 
> a) Removing unnecessary invalidations after fixing misconfigured
>    entries (after a type change).
> 
> b) Deferring invalidations until the p2m write lock is released.

Do you have a sense which one above incurs more overhead?

> 
> Prior to this change a 16 VCPU guest could not be successfully
> migrated on an (admittedly slow) 160 PCPU box because the p2m write
> lock was held for such extended periods of time.  This starved the
> read lock needed (by the toolstack) to map the domain's memory,
> triggering the watchdog.
> 
> After this change a 64 VCPU guest could be successfully migrated.
> 
> ept_sync_domain() is very expensive because:
> 
> a) it uses on_selected_cpus() and the IPI cost can be particularly
>    high for a multi-socket machine.
> 
> b) on_selected_cpus() is serialized by its own spin lock.
> 
> On this particular box, ept_sync_domain() could take ~3-5 ms.
> 
> Simply using a fair rw lock was not sufficient to resolve this (but it
> was an improvement) as the cost of the ept_sync_domain calls() was
> still delaying the read locks enough for the watchdog to trigger (the
> toolstack maps a batch of 1024 GFNs at a time, which means trying to
> acquire the p2m read lock 1024 times).
> 
> Changes in v2:
> 
> - Use a per-p2m (not per-CPU) list for page table pages to be freed.
> - Hold the write lock while updating the synced_mask.
> 
> David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries
  2015-11-13 18:49 ` [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries David Vrabel
@ 2015-12-02  7:17   ` Tian, Kevin
  0 siblings, 0 replies; 22+ messages in thread
From: Tian, Kevin @ 2015-12-02  7:17 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Saturday, November 14, 2015 2:50 AM
> 
> When using EPT, type changes are done with the following steps:
> 
> 1. Set entry as invalid (misconfigured) by settings a reserved memory
> type.
> 
> 2. Flush all EPT and combined translations (ept_sync_domain()).
> 
> 3. Fixup misconfigured entries as required (on EPT_MISCONFIG vmexits or
> when explicitly setting an entry.
> 
> Since resolve_misconfig() only updates entries that were misconfigured,
> there is no need to invalidate any translations since the hardware
> does not cache misconfigured translations (vol 3, section 28.3.2).
> 
> Remove the unnecessary (and very expensive) ept_sync_domain() calls).
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
  2015-11-16 11:52   ` Jan Beulich
  2015-11-30 14:10   ` David Vrabel
@ 2015-12-02  7:25   ` Tian, Kevin
  2015-12-02 16:23     ` Tim Deegan
  2 siblings, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2015-12-02  7:25 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich, Nakajima, Jun

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Saturday, November 14, 2015 2:50 AM
> 
> If a page is freed without translations being invalidated, and the page is
> subsequently allocated to another domain, a guest with a cached
> translation will still be able to access the page.
> 
> Currently translations are invalidated before releasing the page ref, but
> while still holding the mm locks.  To allow translations to be invalidated
> without holding the mm locks, we need to keep a reference to the page
> for a bit longer in some cases.
> 
> [ This seems difficult to a) verify as correct; and b) difficult to get
> correct in the future.  A better suggestion would be useful.  Perhaps
> using something like pg->tlbflush_needed mechanism that already exists
> for pages from PV guests? ]

Per-page flag looks clean in general, but not an expert here. Tim might
have a better idea.

> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/arch/x86/mm/p2m.c | 9 +++++++--
>  xen/common/memory.c   | 2 +-
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index ed0bbd7..e2c82b1 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2758,6 +2758,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>      p2m_type_t p2mt, p2mt_prev;
>      unsigned long prev_mfn, mfn;
>      struct page_info *page;
> +    struct page_info *prev_page = NULL;
>      int rc;
>      struct domain *fdom;
> 
> @@ -2805,6 +2806,9 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>      prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
>      if ( mfn_valid(_mfn(prev_mfn)) )
>      {
> +        prev_page = mfn_to_page(_mfn(prev_mfn));
> +        get_page(prev_page, tdom);
> +
>          if ( is_xen_heap_mfn(prev_mfn) )
>              /* Xen heap frames are simply unhooked from this phys slot */
>              guest_physmap_remove_page(tdom, gpfn, prev_mfn, 0);
> @@ -2823,14 +2827,15 @@ int p2m_add_foreign(struct domain *tdom, unsigned long
> fgfn,
>                   "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
>                   gpfn, mfn, fgfn, tdom->domain_id, fdom->domain_id);
> 
> -    put_page(page);
> -
>      /*
>       * This put_gfn for the above get_gfn for prev_mfn.  We must do this
>       * after set_foreign_p2m_entry so another cpu doesn't populate the gpfn
>       * before us.
>       */
>      put_gfn(tdom, gpfn);
> +    if ( prev_page )
> +        put_page(prev_page);
> +    put_page(page);
> 
>  out:
>      if ( fdom )
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index a3bffb7..571c754 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -272,8 +272,8 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> 
>      guest_physmap_remove_page(d, gmfn, mfn, 0);
> 
> -    put_page(page);
>      put_gfn(d, gmfn);
> +    put_page(page);
> 
>      return 1;
>  }
> --
> 2.1.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02  7:25   ` Tian, Kevin
@ 2015-12-02 16:23     ` Tim Deegan
  2015-12-02 16:30       ` George Dunlap
  2015-12-02 16:45       ` David Vrabel
  0 siblings, 2 replies; 22+ messages in thread
From: Tim Deegan @ 2015-12-02 16:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, David Vrabel,
	Jan Beulich, xen-devel

At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
> > From: David Vrabel [mailto:david.vrabel@citrix.com]
> > Sent: Saturday, November 14, 2015 2:50 AM
> > 
> > If a page is freed without translations being invalidated, and the page is
> > subsequently allocated to another domain, a guest with a cached
> > translation will still be able to access the page.
> > 
> > Currently translations are invalidated before releasing the page ref, but
> > while still holding the mm locks.  To allow translations to be invalidated
> > without holding the mm locks, we need to keep a reference to the page
> > for a bit longer in some cases.
> > 
> > [ This seems difficult to a) verify as correct; and b) difficult to get
> > correct in the future.  A better suggestion would be useful.  Perhaps
> > using something like pg->tlbflush_needed mechanism that already exists
> > for pages from PV guests? ]
> 
> Per-page flag looks clean in general, but not an expert here. Tim might
> have a better idea.

I think you can probably use the tlbflush_timestamp stuff as-is for
EPT flushes -- the existing TLB shootdowns already drop all EPT
translations.

Holding the ref until the flush finishes is a common enough idiom but
I can see that having to track lists of them does make it a little clunky. 

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:23     ` Tim Deegan
@ 2015-12-02 16:30       ` George Dunlap
  2015-12-02 16:46         ` Jan Beulich
  2015-12-02 16:46         ` Tim Deegan
  2015-12-02 16:45       ` David Vrabel
  1 sibling, 2 replies; 22+ messages in thread
From: George Dunlap @ 2015-12-02 16:30 UTC (permalink / raw)
  To: Tim Deegan, Tian, Kevin
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, David Vrabel,
	Jan Beulich, xen-devel

On 02/12/15 16:23, Tim Deegan wrote:
> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>
>>> If a page is freed without translations being invalidated, and the page is
>>> subsequently allocated to another domain, a guest with a cached
>>> translation will still be able to access the page.
>>>
>>> Currently translations are invalidated before releasing the page ref, but
>>> while still holding the mm locks.  To allow translations to be invalidated
>>> without holding the mm locks, we need to keep a reference to the page
>>> for a bit longer in some cases.
>>>
>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>> using something like pg->tlbflush_needed mechanism that already exists
>>> for pages from PV guests? ]
>>
>> Per-page flag looks clean in general, but not an expert here. Tim might
>> have a better idea.
> 
> I think you can probably use the tlbflush_timestamp stuff as-is for
> EPT flushes -- the existing TLB shootdowns already drop all EPT
> translations.

Are you saying that if you do a TLB shootdown you don't need to do an
invept command?  That doesn't sound right.

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:23     ` Tim Deegan
  2015-12-02 16:30       ` George Dunlap
@ 2015-12-02 16:45       ` David Vrabel
  2015-12-02 17:17         ` George Dunlap
  1 sibling, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-12-02 16:45 UTC (permalink / raw)
  To: Tim Deegan, Tian, Kevin
  Cc: George Dunlap, xen-devel, Jan Beulich, Nakajima, Jun, Andrew Cooper

On 02/12/15 16:23, Tim Deegan wrote:
> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>
>>> If a page is freed without translations being invalidated, and the page is
>>> subsequently allocated to another domain, a guest with a cached
>>> translation will still be able to access the page.
>>>
>>> Currently translations are invalidated before releasing the page ref, but
>>> while still holding the mm locks.  To allow translations to be invalidated
>>> without holding the mm locks, we need to keep a reference to the page
>>> for a bit longer in some cases.
>>>
>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>> using something like pg->tlbflush_needed mechanism that already exists
>>> for pages from PV guests? ]
>>
>> Per-page flag looks clean in general, but not an expert here. Tim might
>> have a better idea.
> 
> I think you can probably use the tlbflush_timestamp stuff as-is for
> EPT flushes -- the existing TLB shootdowns already drop all EPT
> translations.

Regular TLB invalidate instructions (e.g., INVLPG) only invalidate
linear and combined mappings.  guest-physical mappings are not affected
and require an INVEPT.

Section 2.3.3.2 of the Intel SDM vol 3 gives a useful summary.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:30       ` George Dunlap
@ 2015-12-02 16:46         ` Jan Beulich
  2015-12-02 16:46         ` Tim Deegan
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2015-12-02 16:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	David Vrabel, Jun Nakajima, xen-devel

>>> On 02.12.15 at 17:30, <george.dunlap@citrix.com> wrote:
> On 02/12/15 16:23, Tim Deegan wrote:
>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>
>>>> If a page is freed without translations being invalidated, and the page is
>>>> subsequently allocated to another domain, a guest with a cached
>>>> translation will still be able to access the page.
>>>>
>>>> Currently translations are invalidated before releasing the page ref, but
>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>> without holding the mm locks, we need to keep a reference to the page
>>>> for a bit longer in some cases.
>>>>
>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>> for pages from PV guests? ]
>>>
>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>> have a better idea.
>> 
>> I think you can probably use the tlbflush_timestamp stuff as-is for
>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>> translations.
> 
> Are you saying that if you do a TLB shootdown you don't need to do an
> invept command?  That doesn't sound right.

No. flush_area_local() (i.e. the actual worker function of TLB
shootdown) calls hvm_flush_guest_tlbs() (in turn leading to
hvm_asid_flush_core()).

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:30       ` George Dunlap
  2015-12-02 16:46         ` Jan Beulich
@ 2015-12-02 16:46         ` Tim Deegan
  2015-12-02 17:01           ` George Dunlap
  2015-12-02 17:14           ` David Vrabel
  1 sibling, 2 replies; 22+ messages in thread
From: Tim Deegan @ 2015-12-02 16:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Nakajima, Jun, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich, xen-devel

At 16:30 +0000 on 02 Dec (1449073841), George Dunlap wrote:
> On 02/12/15 16:23, Tim Deegan wrote:
> > At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
> >>> From: David Vrabel [mailto:david.vrabel@citrix.com]
> >>> Sent: Saturday, November 14, 2015 2:50 AM
> >>>
> >>> If a page is freed without translations being invalidated, and the page is
> >>> subsequently allocated to another domain, a guest with a cached
> >>> translation will still be able to access the page.
> >>>
> >>> Currently translations are invalidated before releasing the page ref, but
> >>> while still holding the mm locks.  To allow translations to be invalidated
> >>> without holding the mm locks, we need to keep a reference to the page
> >>> for a bit longer in some cases.
> >>>
> >>> [ This seems difficult to a) verify as correct; and b) difficult to get
> >>> correct in the future.  A better suggestion would be useful.  Perhaps
> >>> using something like pg->tlbflush_needed mechanism that already exists
> >>> for pages from PV guests? ]
> >>
> >> Per-page flag looks clean in general, but not an expert here. Tim might
> >> have a better idea.
> > 
> > I think you can probably use the tlbflush_timestamp stuff as-is for
> > EPT flushes -- the existing TLB shootdowns already drop all EPT
> > translations.
> 
> Are you saying that if you do a TLB shootdown you don't need to do an
> invept command?

Yes, I think so.  flush_area_local() -> hvm_flush_guest_tlbs() ->
hvm_asid_flush_core() should DTRT.

Tim.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:46         ` Tim Deegan
@ 2015-12-02 17:01           ` George Dunlap
  2015-12-02 17:14           ` David Vrabel
  1 sibling, 0 replies; 22+ messages in thread
From: George Dunlap @ 2015-12-02 17:01 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, Nakajima, Jun, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich, xen-devel

On 02/12/15 16:46, Tim Deegan wrote:
> At 16:30 +0000 on 02 Dec (1449073841), George Dunlap wrote:
>> On 02/12/15 16:23, Tim Deegan wrote:
>>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>>
>>>>> If a page is freed without translations being invalidated, and the page is
>>>>> subsequently allocated to another domain, a guest with a cached
>>>>> translation will still be able to access the page.
>>>>>
>>>>> Currently translations are invalidated before releasing the page ref, but
>>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>>> without holding the mm locks, we need to keep a reference to the page
>>>>> for a bit longer in some cases.
>>>>>
>>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>>> for pages from PV guests? ]
>>>>
>>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>>> have a better idea.
>>>
>>> I think you can probably use the tlbflush_timestamp stuff as-is for
>>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>>> translations.
>>
>> Are you saying that if you do a TLB shootdown you don't need to do an
>> invept command?
> 
> Yes, I think so.  flush_area_local() -> hvm_flush_guest_tlbs() ->
> hvm_asid_flush_core() should DTRT.

Looks like that ends up causing vpid_sync_all(), which executes invvpid.
 I presume that's different than invept.  But perhaps we could extend
the basic functionality to call invept when we need it.

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:46         ` Tim Deegan
  2015-12-02 17:01           ` George Dunlap
@ 2015-12-02 17:14           ` David Vrabel
  2015-12-03 10:22             ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-12-02 17:14 UTC (permalink / raw)
  To: Tim Deegan, George Dunlap
  Cc: Tian, Kevin, Nakajima, Jun, George Dunlap, Andrew Cooper,
	David Vrabel, Jan Beulich, xen-devel

On 02/12/15 16:46, Tim Deegan wrote:
> At 16:30 +0000 on 02 Dec (1449073841), George Dunlap wrote:
>> On 02/12/15 16:23, Tim Deegan wrote:
>>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>>
>>>>> If a page is freed without translations being invalidated, and the page is
>>>>> subsequently allocated to another domain, a guest with a cached
>>>>> translation will still be able to access the page.
>>>>>
>>>>> Currently translations are invalidated before releasing the page ref, but
>>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>>> without holding the mm locks, we need to keep a reference to the page
>>>>> for a bit longer in some cases.
>>>>>
>>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>>> for pages from PV guests? ]
>>>>
>>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>>> have a better idea.
>>>
>>> I think you can probably use the tlbflush_timestamp stuff as-is for
>>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>>> translations.
>>
>> Are you saying that if you do a TLB shootdown you don't need to do an
>> invept command?
> 
> Yes, I think so.  flush_area_local() -> hvm_flush_guest_tlbs() ->
> hvm_asid_flush_core() should DTRT.

I don't think so. Guest-physical mappings are only tagged with the EP4TA
and not the VPID, thus changing the VPID does nothing.

Thus we would need to extend the current mechanism to determine that a
invept is also needed.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 16:45       ` David Vrabel
@ 2015-12-02 17:17         ` George Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2015-12-02 17:17 UTC (permalink / raw)
  To: David Vrabel, Tim Deegan, Tian, Kevin
  Cc: George Dunlap, xen-devel, Jan Beulich, Nakajima, Jun, Andrew Cooper

On 02/12/15 16:45, David Vrabel wrote:
> On 02/12/15 16:23, Tim Deegan wrote:
>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>
>>>> If a page is freed without translations being invalidated, and the page is
>>>> subsequently allocated to another domain, a guest with a cached
>>>> translation will still be able to access the page.
>>>>
>>>> Currently translations are invalidated before releasing the page ref, but
>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>> without holding the mm locks, we need to keep a reference to the page
>>>> for a bit longer in some cases.
>>>>
>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>> for pages from PV guests? ]
>>>
>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>> have a better idea.
>>
>> I think you can probably use the tlbflush_timestamp stuff as-is for
>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>> translations.
> 
> Regular TLB invalidate instructions (e.g., INVLPG) only invalidate
> linear and combined mappings.  guest-physical mappings are not affected
> and require an INVEPT.
> 
> Section 2.3.3.2 of the Intel SDM vol 3 gives a useful summary.

That's 28.3.3.2, in case anyone else wants to look for it...

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-02 17:14           ` David Vrabel
@ 2015-12-03 10:22             ` Jan Beulich
  2015-12-03 10:56               ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-12-03 10:22 UTC (permalink / raw)
  To: David Vrabel
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan,
	George Dunlap, Jun Nakajima, xen-devel

>>> On 02.12.15 at 18:14, <david.vrabel@citrix.com> wrote:
> On 02/12/15 16:46, Tim Deegan wrote:
>> At 16:30 +0000 on 02 Dec (1449073841), George Dunlap wrote:
>>> On 02/12/15 16:23, Tim Deegan wrote:
>>>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>>>
>>>>>> If a page is freed without translations being invalidated, and the page is
>>>>>> subsequently allocated to another domain, a guest with a cached
>>>>>> translation will still be able to access the page.
>>>>>>
>>>>>> Currently translations are invalidated before releasing the page ref, but
>>>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>>>> without holding the mm locks, we need to keep a reference to the page
>>>>>> for a bit longer in some cases.
>>>>>>
>>>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>>>> for pages from PV guests? ]
>>>>>
>>>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>>>> have a better idea.
>>>>
>>>> I think you can probably use the tlbflush_timestamp stuff as-is for
>>>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>>>> translations.
>>>
>>> Are you saying that if you do a TLB shootdown you don't need to do an
>>> invept command?
>> 
>> Yes, I think so.  flush_area_local() -> hvm_flush_guest_tlbs() ->
>> hvm_asid_flush_core() should DTRT.
> 
> I don't think so. Guest-physical mappings are only tagged with the EP4TA
> and not the VPID, thus changing the VPID does nothing.

Indeed. And in fact they tell us that using VPID together with EPT is
kind of pointless when different guests use different EPTPs.

> Thus we would need to extend the current mechanism to determine that a
> invept is also needed.

Agreed.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/3] mm: don't free pages until mm locks are released
  2015-12-03 10:22             ` Jan Beulich
@ 2015-12-03 10:56               ` George Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2015-12-03 10:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim Deegan, George Dunlap,
	David Vrabel, Jun Nakajima, xen-devel

On Thu, Dec 3, 2015 at 10:22 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 02.12.15 at 18:14, <david.vrabel@citrix.com> wrote:
>> On 02/12/15 16:46, Tim Deegan wrote:
>>> At 16:30 +0000 on 02 Dec (1449073841), George Dunlap wrote:
>>>> On 02/12/15 16:23, Tim Deegan wrote:
>>>>> At 07:25 +0000 on 02 Dec (1449041100), Tian, Kevin wrote:
>>>>>>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>>>>>>> Sent: Saturday, November 14, 2015 2:50 AM
>>>>>>>
>>>>>>> If a page is freed without translations being invalidated, and the page is
>>>>>>> subsequently allocated to another domain, a guest with a cached
>>>>>>> translation will still be able to access the page.
>>>>>>>
>>>>>>> Currently translations are invalidated before releasing the page ref, but
>>>>>>> while still holding the mm locks.  To allow translations to be invalidated
>>>>>>> without holding the mm locks, we need to keep a reference to the page
>>>>>>> for a bit longer in some cases.
>>>>>>>
>>>>>>> [ This seems difficult to a) verify as correct; and b) difficult to get
>>>>>>> correct in the future.  A better suggestion would be useful.  Perhaps
>>>>>>> using something like pg->tlbflush_needed mechanism that already exists
>>>>>>> for pages from PV guests? ]
>>>>>>
>>>>>> Per-page flag looks clean in general, but not an expert here. Tim might
>>>>>> have a better idea.
>>>>>
>>>>> I think you can probably use the tlbflush_timestamp stuff as-is for
>>>>> EPT flushes -- the existing TLB shootdowns already drop all EPT
>>>>> translations.
>>>>
>>>> Are you saying that if you do a TLB shootdown you don't need to do an
>>>> invept command?
>>>
>>> Yes, I think so.  flush_area_local() -> hvm_flush_guest_tlbs() ->
>>> hvm_asid_flush_core() should DTRT.
>>
>> I don't think so. Guest-physical mappings are only tagged with the EP4TA
>> and not the VPID, thus changing the VPID does nothing.
>
> Indeed. And in fact they tell us that using VPID together with EPT is
> kind of pointless when different guests use different EPTPs.

Except that of course, that they allow us to isolate the flushing of
combined mappings from between guests...

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact
  2015-12-02  7:14 ` [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact Tian, Kevin
@ 2015-12-03 15:59   ` David Vrabel
  0 siblings, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-12-03 15:59 UTC (permalink / raw)
  To: Tian, Kevin, David Vrabel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Nakajima, Jun, Jan Beulich

On 02/12/15 07:14, Tian, Kevin wrote:
>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>> Sent: Saturday, November 14, 2015 2:50 AM
>>
>> This RFC series improves the performance of EPT by reducing the impact
>> of the translation invalidations (ept_sync_domain()).  Two approaches
>> are used:
>>
>> a) Removing unnecessary invalidations after fixing misconfigured
>>    entries (after a type change).
>>
>> b) Deferring invalidations until the p2m write lock is released.
> 
> Do you have a sense which one above incurs more overhead?

No, but fixing (a) probably gives us the biggest win since it removes
the sync entirely.

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-12-03 15:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-13 18:49 [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact David Vrabel
2015-11-13 18:49 ` [PATCHv2 1/3] x86/ept: remove unnecessary sync after resolving misconfigured entries David Vrabel
2015-12-02  7:17   ` Tian, Kevin
2015-11-13 18:49 ` [PATCHv2 2/3] mm: don't free pages until mm locks are released David Vrabel
2015-11-16 11:52   ` Jan Beulich
2015-11-16 12:02     ` David Vrabel
2015-11-16 12:04       ` Jan Beulich
2015-11-30 14:10   ` David Vrabel
2015-12-02  7:25   ` Tian, Kevin
2015-12-02 16:23     ` Tim Deegan
2015-12-02 16:30       ` George Dunlap
2015-12-02 16:46         ` Jan Beulich
2015-12-02 16:46         ` Tim Deegan
2015-12-02 17:01           ` George Dunlap
2015-12-02 17:14           ` David Vrabel
2015-12-03 10:22             ` Jan Beulich
2015-12-03 10:56               ` George Dunlap
2015-12-02 16:45       ` David Vrabel
2015-12-02 17:17         ` George Dunlap
2015-11-13 18:49 ` [PATCHv2 3/3] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
2015-12-02  7:14 ` [RFC PATCHv2 0/3]: x86/ept: reduce translation invalidation impact Tian, Kevin
2015-12-03 15:59   ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.