All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: xen-devel@lists.xen.org
Cc: sstabellini@kernel.org, wei.liu2@citrix.com,
	George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	dario.faggioli@citrix.com, ian.jackson@eu.citrix.com,
	tim@xen.org, david.vrabel@citrix.com, jbeulich@suse.com
Subject: [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation
Date: Mon, 12 Sep 2016 16:16:15 +0800	[thread overview]
Message-ID: <1473668175-3088-2-git-send-email-dongli.zhang@oracle.com> (raw)
In-Reply-To: <1473668175-3088-1-git-send-email-dongli.zhang@oracle.com>

This patch implemented parts of TODO left in commit id
a902c12ee45fc9389eb8fe54eeddaf267a555c58. It moved TLB-flush filtering out
into populate_physmap. Because of TLB-flush in alloc_heap_pages, it's very
slow to create a guest with memory size of more than 100GB on host with
100+ cpus.

This patch introduced a "MEMF_no_tlbflush" bit to memflags to indicate
whether TLB-flush should be done in alloc_heap_pages or its caller
populate_physmap. Once this bit is set in memflags, alloc_heap_pages will
ignore TLB-flush. To use this bit after vm is created might lead to
security issue, that is, this would make pages accessible to the guest B,
when guest A may still have a cached mapping to them.

Therefore, this patch also introduced a "is_ever_unpaused" field to struct
domain to indicate whether this domain has ever got unpaused by hypervisor.
MEMF_no_tlbflush can be set only during vm creation phase when
is_ever_unpaused is still false before this domain gets unpaused for the
first time.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v3:
  * Set the flag to true in domain_unpause_by_systemcontroller when
    unpausing the guest domain for the first time.
  * Use true/false for all boot_t variables.
  * Add unlikely to optimize "if statement".
  * Correct comment style.

Changed since v2:
  * Limit this optimization to domain creation time.

---
 xen/common/domain.c     | 11 +++++++++++
 xen/common/memory.c     | 34 ++++++++++++++++++++++++++++++++++
 xen/common/page_alloc.c |  3 ++-
 xen/include/xen/mm.h    |  2 ++
 xen/include/xen/sched.h |  3 +++
 5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a8804e4..7be1bee 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -303,6 +303,8 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     if ( !zalloc_cpumask_var(&d->domain_dirty_cpumask) )
         goto fail;
 
+    d->is_ever_unpaused = false;
+
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
     else if ( domcr_flags & DOMCRF_pvh )
@@ -1004,6 +1006,15 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 {
     int old, new, prev = d->controller_pause_count;
 
+    /*
+     * Set is_ever_unpaused to true when this domain gets unpaused for the
+     * first time. We record this information here to help populate_physmap
+     * verify whether the domain has ever been unpaused. MEMF_no_tlbflush
+     * is allowed to be set by populate_physmap only during vm creation.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        d->is_ever_unpaused = true;
+
     do
     {
         old = prev;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index cc0f69e..f3a733b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -141,6 +141,8 @@ static void populate_physmap(struct memop_args *a)
     unsigned int i, j;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain, *curr_d = current->domain;
+    bool_t need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
 
     if ( !guest_handle_subrange_okay(a->extent_list, a->nr_done,
                                      a->nr_extents-1) )
@@ -150,6 +152,14 @@ static void populate_physmap(struct memop_args *a)
                             max_order(curr_d)) )
         return;
 
+    /*
+     * MEMF_no_tlbflush can be set only during vm creation phase when
+     * is_ever_unpaused is still false before this domain gets unpaused for
+     * the first time.
+     */
+    if ( unlikely(!d->is_ever_unpaused) )
+        a->memflags |= MEMF_no_tlbflush;
+
     for ( i = a->nr_done; i < a->nr_extents; i++ )
     {
         if ( i != a->nr_done && hypercall_preempt_check() )
@@ -214,6 +224,20 @@ static void populate_physmap(struct memop_args *a)
                     goto out;
                 }
 
+                if ( unlikely(!d->is_ever_unpaused) )
+                {
+                    for ( j = 0; j < (1U << a->extent_order); j++ )
+                    {
+                        if ( page_needs_tlbflush(&page[j], need_tlbflush,
+                                                 tlbflush_timestamp,
+                                                 tlbflush_current_time()) )
+                        {
+                            need_tlbflush = true;
+                            tlbflush_timestamp = page[j].tlbflush_timestamp;
+                        }
+                    }
+                }
+
                 mfn = page_to_mfn(page);
             }
 
@@ -232,6 +256,16 @@ static void populate_physmap(struct memop_args *a)
     }
 
 out:
+    if ( need_tlbflush )
+    {
+        cpumask_t mask = cpu_online_map;
+        tlbflush_filter(mask, tlbflush_timestamp);
+        if ( !cpumask_empty(&mask) )
+        {
+            perfc_incr(need_flush_tlb_flush);
+            flush_tlb_mask(&mask);
+        }
+    }
     a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5b93a01..04ca26a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -827,7 +827,8 @@ static struct page_info *alloc_heap_pages(
         BUG_ON(pg[i].count_info != PGC_state_free);
         pg[i].count_info = PGC_state_inuse;
 
-        if ( page_needs_tlbflush(&pg[i], need_tlbflush,
+        if ( !(memflags & MEMF_no_tlbflush) &&
+             page_needs_tlbflush(&pg[i], need_tlbflush,
                                  tlbflush_timestamp,
                                  tlbflush_current_time()) )
         {
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 766559d..04b10e9 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -221,6 +221,8 @@ struct npfec {
 #define  MEMF_exact_node  (1U<<_MEMF_exact_node)
 #define _MEMF_no_owner    5
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
+#define _MEMF_no_tlbflush 6
+#define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2f9c15f..7fe8841 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -474,6 +474,9 @@ struct domain
         unsigned int guest_request_enabled       : 1;
         unsigned int guest_request_sync          : 1;
     } monitor;
+
+    /* set to true the first time this domain gets unpaused. */
+    bool_t is_ever_unpaused;
 };
 
 /* Protect updates/reads (resp.) of domain_list and domain_hash. */
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2016-09-12  8:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-12  8:16 [PATCH v4 1/2] xen: replace complicated tlbflush check with an inline function Dongli Zhang
2016-09-12  8:16 ` Dongli Zhang [this message]
2016-09-14 16:52   ` [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation Dario Faggioli
2016-09-15  8:39   ` Jan Beulich
2016-09-14 16:16 ` [PATCH v4 1/2] xen: replace complicated tlbflush check with an inline function Jan Beulich
2016-09-16 10:47 [PATCH v4 2/2] xen: move TLB-flush filtering out into populate_physmap during vm creation Dongli Zhang
2016-09-16 10:55 ` Wei Liu
2016-09-16 11:34 Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1473668175-3088-2-git-send-email-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=sstabellini@kernel.org \
    --cc=tim@xen.org \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.