All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>,
	andrew.cooper3@citrix.com, tim@xen.org, jbeulich@suse.com
Subject: [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible
Date: Wed, 18 Apr 2018 10:30:43 +0200	[thread overview]
Message-ID: <20180418083051.8334-2-jgross@suse.com> (raw)
In-Reply-To: <20180418083051.8334-1-jgross@suse.com>

For mitigation of Meltdown the current L4 page table is copied to the
cpu local root page table each time a 64 bit pv guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag indicating whether the copying should be
performed and set that flag only when loading a new %cr3 or modifying
the L4 page table.  This includes synchronization of the cpu local
root page table with other cpus, so add a special synchronization flag
for that case.

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V8:
- more shadow code: set flag on all affected cpus (Tim Deegan)

V7:
- add missing flag setting in shadow code

V6:
- correct an error from rebasing to staging in assembly part

V4:
- move setting of root_pgt_changed flag in flush_area_local() out of
  irq disabled section (Jan Beulich)
- move setting of root_pgt_changed in make_cr3() to _toggle_guest_pt()
  (Jan Beulich)
- remove most conditionals in write_ptbase() (Jan Beulich)
- don't set root_pgt_changed in do_mmu_update() for modification of
  the user page table (Jan Beulich)

V3:
- set flag locally only if affected L4 is active (Jan Beulich)
- add setting flag to flush_area_mask() (Jan Beulich)
- set flag in make_cr3() only if called for current active vcpu

To be applied on top of Jan's "Meltdown band-aid overhead reduction"
series
---
 xen/arch/x86/flushtlb.c           | 16 ++++++++++++++++
 xen/arch/x86/mm.c                 | 36 +++++++++++++++++++++++-------------
 xen/arch/x86/mm/shadow/multi.c    | 10 ++++++++++
 xen/arch/x86/pv/domain.c          |  2 ++
 xen/arch/x86/smp.c                |  2 +-
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 xen/arch/x86/x86_64/entry.S       |  9 +++++++--
 xen/include/asm-x86/current.h     |  8 ++++++++
 xen/include/asm-x86/flushtlb.h    |  5 +++++
 9 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 8a7a76b8ff..ad19ba0f5d 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -8,6 +8,7 @@
  */
 
 #include <xen/sched.h>
+#include <xen/smp.h>
 #include <xen/softirq.h>
 #include <asm/flushtlb.h>
 #include <asm/page.h>
@@ -160,5 +161,20 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
 
     local_irq_restore(irqfl);
 
+    if ( flags & FLUSH_ROOT_PGTBL )
+        get_cpu_info()->root_pgt_changed = true;
+
     return flags;
 }
+
+void flush_root_pgt_mask(cpumask_t *mask)
+{
+    int cpu;
+    struct cpu_info *cpu_info;
+
+    for_each_cpu(cpu, mask)
+    {
+        cpu_info = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1;
+        cpu_info->root_pgt_changed = true;
+    }
+}
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9fe5583fc3..7d960c742e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -502,6 +502,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
+    get_cpu_info()->root_pgt_changed = true;
     write_cr3(v->arch.cr3);
 }
 
@@ -3699,18 +3700,27 @@ long do_mmu_update(
                         break;
                     rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    /*
-                     * No need to sync if all uses of the page can be accounted
-                     * to the page lock we hold, its pinned status, and uses on
-                     * this (v)CPU.
-                     */
-                    if ( !rc && !cpu_has_no_xpti &&
-                         ((page->u.inuse.type_info & PGT_count_mask) >
-                          (1 + !!(page->u.inuse.type_info & PGT_pinned) +
-                           (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
-                           (pagetable_get_pfn(curr->arch.guest_table_user) ==
-                            mfn))) )
-                        sync_guest = true;
+                    if ( !rc && !cpu_has_no_xpti )
+                    {
+                        bool local_in_use = false;
+
+                        if ( pagetable_get_pfn(curr->arch.guest_table) == mfn )
+                        {
+                            local_in_use = true;
+                            get_cpu_info()->root_pgt_changed = true;
+                        }
+
+                        /*
+                         * No need to sync if all uses of the page can be
+                         * accounted to the page lock we hold, its pinned
+                         * status, and uses on this (v)CPU.
+                         */
+                        if ( (page->u.inuse.type_info & PGT_count_mask) >
+                             (1 + !!(page->u.inuse.type_info & PGT_pinned) +
+                              (pagetable_get_pfn(curr->arch.guest_table_user) ==
+                               mfn) + local_in_use) )
+                            sync_guest = true;
+                    }
                     break;
 
                 case PGT_writable_page:
@@ -3825,7 +3835,7 @@ long do_mmu_update(
 
         cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
         if ( !cpumask_empty(mask) )
-            flush_mask(mask, FLUSH_TLB_GLOBAL);
+            flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
     }
 
     perfc_add(num_page_updates, i);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 9c3af330ec..5e779baa53 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -966,6 +966,16 @@ static int shadow_set_l4e(struct domain *d,
         }
         sh_put_ref(d, osl3mfn, paddr);
     }
+
+    if ( !cpu_has_no_xpti )
+        /*
+         * Lazy flushing is enough: either we do a TLB flush right afterwards
+         * which will pick up the new root page table on all affected cpus
+         * or a TLB flush won't be needed and the same reasons for that being
+         * not necessary apply to the root page table, too.
+         */
+        flush_root_pgt_mask(d->dirty_cpumask);
+
     return flags;
 }
 
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index ac65ba4609..b1c40373fa 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -217,6 +217,8 @@ static void _toggle_guest_pt(struct vcpu *v)
 {
     v->arch.flags ^= TF_kernel_mode;
     update_cr3(v);
+    get_cpu_info()->root_pgt_changed = true;
+
     /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
     asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
 
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 033dd05958..63e819ca38 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -208,7 +208,7 @@ void invalidate_interrupt(struct cpu_user_regs *regs)
     ack_APIC_irq();
     perfc_incr(ipis);
     if ( (flags & FLUSH_VCPU_STATE) && __sync_local_execstate() )
-        flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL);
+        flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
     if ( flags & ~(FLUSH_VCPU_STATE | FLUSH_ORDER_MASK) )
         flush_area_local(flush_va, flags);
     cpumask_clear_cpu(smp_processor_id(), &flush_cpumask);
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index a2fea94f4c..9e2aefb00f 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -143,6 +143,7 @@ void __dummy__(void)
     OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
     OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
     OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
+    OFFSET(CPUINFO_root_pgt_changed, struct cpu_info, root_pgt_changed);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 45d9842d09..dd42223b20 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -129,11 +129,15 @@ restore_all_guest:
         mov   VCPU_cr3(%rbx), %r9
         GET_STACK_END(dx)
         mov   STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
+        test  %rdi, %rdi
+        jz    .Lrag_keep_cr3
+        mov   %rdi, %rax
+        cmpb  $0, STACK_CPUINFO_FIELD(root_pgt_changed)(%rdx)
+        je    .Lrag_copy_done
+        movb  $0, STACK_CPUINFO_FIELD(root_pgt_changed)(%rdx)
         movabs $PADDR_MASK & PAGE_MASK, %rsi
         movabs $DIRECTMAP_VIRT_START, %rcx
-        mov   %rdi, %rax
         and   %rsi, %rdi
-        jz    .Lrag_keep_cr3
         and   %r9, %rsi
         add   %rcx, %rdi
         add   %rcx, %rsi
@@ -148,6 +152,7 @@ restore_all_guest:
         sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
                 ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
         rep movsq
+.Lrag_copy_done:
         mov   STACK_CPUINFO_FIELD(cr4)(%rdx), %rdi
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
         mov   %rdi, %rsi
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 3a0e1eef36..f2491b4423 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -59,6 +59,14 @@ struct cpu_info {
     bool         use_shadow_spec_ctrl;
     uint8_t      bti_ist_info;
 
+    /*
+     * The following field controls copying of the L4 page table of 64-bit
+     * PV guests to the per-cpu root page table on entering the guest context.
+     * If set the L4 page table is being copied to the root page table and
+     * the field will be reset.
+     */
+    bool         root_pgt_changed;
+
     unsigned long __pad;
     /* get_stack_bottom() must be 16-byte aligned */
 };
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cade9cbfb..6bf33a6f5e 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -103,6 +103,8 @@ void write_cr3(unsigned long cr3);
 #define FLUSH_VA_VALID   0x800
  /* Flush CPU state */
 #define FLUSH_VCPU_STATE 0x1000
+ /* Flush the per-cpu root page table */
+#define FLUSH_ROOT_PGTBL 0x2000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
@@ -134,6 +136,9 @@ void flush_area_mask(const cpumask_t *, const void *va, unsigned int flags);
 #define flush_tlb_one_all(v)                    \
     flush_tlb_one_mask(&cpu_online_map, v)
 
+/* Lazy flush of the per-cpu root page table. */
+void flush_root_pgt_mask(cpumask_t *mask);
+
 static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) {}
 static inline int invalidate_dcache_va_range(const void *p,
                                              unsigned long size)
-- 
2.13.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2018-04-18  8:31 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-18  8:30 [PATCH v8 0/9] xen/x86: various XPTI speedups Juergen Gross
2018-04-18  8:30 ` Juergen Gross [this message]
2018-04-18 16:12   ` [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible Jan Beulich
     [not found]   ` <5AD76E5A02000078001BC770@suse.com>
2018-04-19  6:19     ` Juergen Gross
2018-04-19  7:39       ` Jan Beulich
     [not found]       ` <5AD847BD02000078001BC8D9@suse.com>
2018-04-19  7:44         ` Juergen Gross
2018-04-21 13:32           ` Tim Deegan
2018-04-21 17:11             ` Juergen Gross
2018-04-22 16:39               ` Tim Deegan
2018-04-23  5:45                 ` Juergen Gross
2018-04-24 10:31                   ` Tim Deegan
2018-04-24 11:45                     ` Juergen Gross
2018-04-18  8:30 ` [PATCH v8 2/9] xen/x86: add a function for modifying cr3 Juergen Gross
2018-04-18  8:30 ` [PATCH v8 3/9] xen/x86: support per-domain flag for xpti Juergen Gross
2018-04-18  9:42   ` Sergey Dyasli
2018-04-18  9:49     ` Jan Beulich
2018-04-18  9:52       ` Juergen Gross
2018-04-18  9:54     ` Juergen Gross
2018-04-18 15:29   ` Jan Beulich
     [not found]   ` <5AD7647502000078001BC6C8@suse.com>
2018-04-18 15:33     ` Juergen Gross
2018-04-18 15:45       ` Jan Beulich
     [not found]       ` <5AD7680102000078001BC725@suse.com>
2018-04-18 15:54         ` Juergen Gross
2018-04-18 16:06           ` Jan Beulich
2018-04-18  8:30 ` [PATCH v8 4/9] xen/x86: use invpcid for flushing the TLB Juergen Gross
2018-04-18  8:30 ` [PATCH v8 5/9] xen/x86: disable global pages for domains with XPTI active Juergen Gross
2018-04-18  8:30 ` [PATCH v8 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid Juergen Gross
2018-04-18  8:30 ` [PATCH v8 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function Juergen Gross
2018-04-18  8:30 ` [PATCH v8 8/9] xen/x86: add some cr3 helpers Juergen Gross
2018-04-18  8:30 ` [PATCH v8 9/9] xen/x86: use PCID feature Juergen Gross
2018-04-18  9:13   ` Jan Beulich
2018-04-18  9:37     ` Juergen Gross
2018-04-18 15:32       ` Jan Beulich
     [not found]       ` <5AD7652902000078001BC6F4@suse.com>
2018-04-18 15:36         ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180418083051.8334-2-jgross@suse.com \
    --to=jgross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.