* [PATCH v7 0/9] xen/x86: various XPTI speedups
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

This patch series aims at reducing the overhead of the XPTI Meltdown
mitigation.

Patch 1 has been posted before; the main changes in this version are
due to addressing Jan's comments on my first version. The main
objective of that patch is to avoid copying the L4 page table each
time the guest is activated, as its contents often haven't changed
while the hypervisor was active.

Patch 2 adds a new helper for writing cr3 instead of open coding the
inline assembly in multiple places.

Patch 3 sets the stage for being able to activate XPTI per domain. As a
first step it is now possible to switch XPTI off for dom0 via the xpti
boot parameter.

Patch 4 adds support for using the INVPCID instruction for flushing
the TLB.

Patch 5 reduces the cost of TLB flushes even further: as we don't make
any use of global TLB entries while XPTI is active, we can avoid
removing all global TLB entries on TLB flushes by simply deactivating
global pages in CR4.

Patch 6 prepares for the use of PCIDs in patch 9.
For that purpose it was necessary to allow CR3 values with bit 63 set
in order to avoid flushing TLB entries when writing CR3. This requires
replacing Jan's rather clever state machine of positive and negative
CR3 values for the hypervisor with a dedicated flag byte.

Patch 7 converts pv_guest_cr4_to_real_cr4() from a macro to a function
as it was becoming more and more complex.
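
As an illustrative sketch only (patch 7 itself is not quoted in this
excerpt), a function version mirroring the macro as it looks after
patch 5 might be:

    unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
    {
        const struct domain *d = v->domain;
        /* Start from the guest's view of cr4 ... */
        unsigned long cr4 = v->arch.pv_vcpu.ctrlreg[4];

        /* ... force the features Xen relies on ... */
        cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP |
                                   X86_CR4_SMAP | X86_CR4_OSXSAVE |
                                   X86_CR4_FSGSBASE);
        /* ... no global pages for XPTI domains, TSD for vtsc ones. */
        cr4 |= d->arch.pv_domain.xpti ? 0 : X86_CR4_PGE;
        cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0;

        return cr4 & ~X86_CR4_DE;
    }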

Patch 8 adds some PCID helper functions for accessing the different
parts of cr3 (address and pcid part).
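
A minimal sketch of such helpers, assuming the architectural CR3
layout (PCID in bits 11:0, page table address above); the mask names
are illustrative, not necessarily those of patch 8:

    #define X86_CR3_PCID_MASK 0xfffUL
    #define X86_CR3_ADDR_MASK (PAGE_MASK & PADDR_MASK)

    static inline unsigned long cr3_pa(unsigned long cr3)
    {
        return cr3 & X86_CR3_ADDR_MASK;   /* page table base address */
    }

    static inline unsigned int cr3_pcid(unsigned long cr3)
    {
        return cr3 & X86_CR3_PCID_MASK;   /* PCID portion */
    }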

Patch 9 is the main performance contributor: by making use of the PCID
feature (if available) TLB entries can survive CR3 switches. The TLB
needs to be flushed on context switches only and not when switching
between guest and hypervisor or guest kernel and user mode.
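
The mechanism, sketched with illustrative names (this is not the
patch's code): each address space is tagged with a PCID, and setting
bit 63 ("no flush") when writing CR3 preserves TLB entries belonging
to other PCIDs:

    #define X86_CR3_NOFLUSH (1UL << 63)

    /* Requires CR4.PCIDE; pg_base is the page-aligned table address. */
    static void switch_address_space(unsigned long pg_base,
                                     unsigned int pcid)
    {
        unsigned long cr3 = pg_base | pcid | X86_CR3_NOFLUSH;

        asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
    }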

On my machine (Intel i7-4600M) using the PCID feature in the non-XPTI
case showed slightly worse performance than using global pages instead
(using PCID and global pages together is a bad idea, as invalidating
global pages in that case would require a complete TLB flush). For
this reason I've decided to use PCID only for XPTI by default. That
can easily be changed via the command line parameter "pcid=true".

The complete series has been verified to still mitigate Meltdown
attacks. A simple performance test (make -j 4 in the Xen hypervisor
directory) showed significant improvements compared to the state
without this series.
Numbers are seconds; standard deviations are given in parentheses.

xpti=false  elapsed         system         user
unpatched:  88.42 ( 2.01)   94.49 ( 1.38)  180.40 ( 1.41)
patched  :  89.45 ( 3.10)   96.47 ( 3.22)  181.34 ( 1.98)

xpti=true   elapsed         system         user
unpatched: 113.43 ( 3.68)  165.44 ( 4.41)  183.30 ( 1.72)
patched  :  92.76 ( 2.11)  103.39 ( 1.13)  184.86 ( 0.12)


Juergen Gross (9):
  x86/xpti: avoid copying L4 page table contents when possible
  xen/x86: add a function for modifying cr3
  xen/x86: support per-domain flag for xpti
  xen/x86: use invpcid for flushing the TLB
  xen/x86: disable global pages for domains with XPTI active
  xen/x86: use flag byte for decision whether xen_cr3 is valid
  xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
  xen/x86: add some cr3 helpers
  xen/x86: use PCID feature

 docs/misc/xen-command-line.markdown | 37 +++++++++++++-
 xen/arch/x86/cpu/mtrr/generic.c     | 37 +++++++++-----
 xen/arch/x86/debug.c                |  2 +-
 xen/arch/x86/domain.c               |  6 +--
 xen/arch/x86/domain_page.c          |  2 +-
 xen/arch/x86/flushtlb.c             | 98 ++++++++++++++++++++++++++++++-------
 xen/arch/x86/mm.c                   | 86 ++++++++++++++++++++++++++------
 xen/arch/x86/mm/shadow/multi.c      |  4 ++
 xen/arch/x86/pv/dom0_build.c        |  8 +--
 xen/arch/x86/pv/domain.c            | 89 ++++++++++++++++++++++++++++++++-
 xen/arch/x86/setup.c                | 27 +++-------
 xen/arch/x86/smp.c                  |  2 +-
 xen/arch/x86/smpboot.c              |  6 ++-
 xen/arch/x86/spec_ctrl.c            | 70 ++++++++++++++++++++++++++
 xen/arch/x86/x86_64/asm-offsets.c   |  2 +
 xen/arch/x86/x86_64/compat/entry.S  |  5 +-
 xen/arch/x86/x86_64/entry.S         | 78 ++++++++++++-----------------
 xen/common/efi/runtime.c            |  4 +-
 xen/include/asm-x86/current.h       | 23 +++++++--
 xen/include/asm-x86/domain.h        | 17 +++----
 xen/include/asm-x86/flushtlb.h      |  4 +-
 xen/include/asm-x86/invpcid.h       |  2 +
 xen/include/asm-x86/processor.h     | 18 +++++++
 xen/include/asm-x86/pv/domain.h     | 31 ++++++++++++
 xen/include/asm-x86/spec_ctrl.h     |  4 ++
 xen/include/asm-x86/x86-defns.h     |  4 +-
 26 files changed, 521 insertions(+), 145 deletions(-)

-- 
2.13.6



* [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

For mitigation of Meltdown the current L4 page table is copied to the
cpu-local root page table each time a 64-bit PV guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag indicating whether the copying should be
performed, and set that flag only when loading a new %cr3 or modifying
the L4 page table.  This includes synchronization of the cpu-local
root page table with other cpus, so add a special synchronization flag
for that case.
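
In C terms the guest-entry path then behaves roughly as follows (a
paraphrase of the entry.S hunk below; copy_l4_to_root() is a made-up
name for the existing rep movsq copy loop):

    struct cpu_info *ci = get_cpu_info();

    if ( ci->pv_cr3 )                  /* XPTI in use for this guest? */
    {
        if ( ci->root_pgt_changed )    /* L4 possibly modified? */
        {
            copy_l4_to_root();
            ci->root_pgt_changed = false;
        }
        /* else: enter the guest on the unmodified root page table. */
    }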

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V7:
- add missing flag setting in shadow code

V6:
- correct an error from rebasing to staging in assembly part

V4:
- move setting of root_pgt_changed flag in flush_area_local() out of
  irq disabled section (Jan Beulich)
- move setting of root_pgt_changed in make_cr3() to _toggle_guest_pt()
  (Jan Beulich)
- remove most conditionals in write_ptbase() (Jan Beulich)
- don't set root_pgt_changed in do_mmu_update() for modification of
  the user page table (Jan Beulich)

V3:
- set flag locally only if affected L4 is active (Jan Beulich)
- add setting flag to flush_area_mask() (Jan Beulich)
- set flag in make_cr3() only if called for current active vcpu

To be applied on top of Jan's "Meltdown band-aid overhead reduction"
series
---
 xen/arch/x86/flushtlb.c           |  3 +++
 xen/arch/x86/mm.c                 | 36 +++++++++++++++++++++++-------------
 xen/arch/x86/mm/shadow/multi.c    |  4 ++++
 xen/arch/x86/pv/domain.c          |  2 ++
 xen/arch/x86/smp.c                |  2 +-
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 xen/arch/x86/x86_64/entry.S       |  9 +++++++--
 xen/include/asm-x86/current.h     |  8 ++++++++
 xen/include/asm-x86/flushtlb.h    |  2 ++
 9 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 8a7a76b8ff..38cedf3b22 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -160,5 +160,8 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
 
     local_irq_restore(irqfl);
 
+    if ( flags & FLUSH_ROOT_PGTBL )
+        get_cpu_info()->root_pgt_changed = true;
+
     return flags;
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9fe5583fc3..7d960c742e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -502,6 +502,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
+    get_cpu_info()->root_pgt_changed = true;
     write_cr3(v->arch.cr3);
 }
 
@@ -3699,18 +3700,27 @@ long do_mmu_update(
                         break;
                     rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    /*
-                     * No need to sync if all uses of the page can be accounted
-                     * to the page lock we hold, its pinned status, and uses on
-                     * this (v)CPU.
-                     */
-                    if ( !rc && !cpu_has_no_xpti &&
-                         ((page->u.inuse.type_info & PGT_count_mask) >
-                          (1 + !!(page->u.inuse.type_info & PGT_pinned) +
-                           (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
-                           (pagetable_get_pfn(curr->arch.guest_table_user) ==
-                            mfn))) )
-                        sync_guest = true;
+                    if ( !rc && !cpu_has_no_xpti )
+                    {
+                        bool local_in_use = false;
+
+                        if ( pagetable_get_pfn(curr->arch.guest_table) == mfn )
+                        {
+                            local_in_use = true;
+                            get_cpu_info()->root_pgt_changed = true;
+                        }
+
+                        /*
+                         * No need to sync if all uses of the page can be
+                         * accounted to the page lock we hold, its pinned
+                         * status, and uses on this (v)CPU.
+                         */
+                        if ( (page->u.inuse.type_info & PGT_count_mask) >
+                             (1 + !!(page->u.inuse.type_info & PGT_pinned) +
+                              (pagetable_get_pfn(curr->arch.guest_table_user) ==
+                               mfn) + local_in_use) )
+                            sync_guest = true;
+                    }
                     break;
 
                 case PGT_writable_page:
@@ -3825,7 +3835,7 @@ long do_mmu_update(
 
         cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
         if ( !cpumask_empty(mask) )
-            flush_mask(mask, FLUSH_TLB_GLOBAL);
+            flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
     }
 
     perfc_add(num_page_updates, i);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 9c3af330ec..5f964505d1 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -751,6 +751,7 @@ l4e_propagate_from_guest(struct vcpu *v,
         ASSERT(!guest_l4e_rsvd_bits(v, gl4e));
 
     _sh_propagate(v, gl4e.l4, sl3mfn, sl4e, 4, ft, p2m_ram_rw);
+    get_cpu_info()->root_pgt_changed = true;
 }
 
 static void
@@ -966,6 +967,9 @@ static int shadow_set_l4e(struct domain *d,
         }
         sh_put_ref(d, osl3mfn, paddr);
     }
+
+    get_cpu_info()->root_pgt_changed = true;
+
     return flags;
 }
 
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index ac65ba4609..b1c40373fa 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -217,6 +217,8 @@ static void _toggle_guest_pt(struct vcpu *v)
 {
     v->arch.flags ^= TF_kernel_mode;
     update_cr3(v);
+    get_cpu_info()->root_pgt_changed = true;
+
     /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
     asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
 
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 033dd05958..63e819ca38 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -208,7 +208,7 @@ void invalidate_interrupt(struct cpu_user_regs *regs)
     ack_APIC_irq();
     perfc_incr(ipis);
     if ( (flags & FLUSH_VCPU_STATE) && __sync_local_execstate() )
-        flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL);
+        flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
     if ( flags & ~(FLUSH_VCPU_STATE | FLUSH_ORDER_MASK) )
         flush_area_local(flush_va, flags);
     cpumask_clear_cpu(smp_processor_id(), &flush_cpumask);
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index a2fea94f4c..9e2aefb00f 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -143,6 +143,7 @@ void __dummy__(void)
     OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
     OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
     OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
+    OFFSET(CPUINFO_root_pgt_changed, struct cpu_info, root_pgt_changed);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 45d9842d09..dd42223b20 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -129,11 +129,15 @@ restore_all_guest:
         mov   VCPU_cr3(%rbx), %r9
         GET_STACK_END(dx)
         mov   STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
+        test  %rdi, %rdi
+        jz    .Lrag_keep_cr3
+        mov   %rdi, %rax
+        cmpb  $0, STACK_CPUINFO_FIELD(root_pgt_changed)(%rdx)
+        je    .Lrag_copy_done
+        movb  $0, STACK_CPUINFO_FIELD(root_pgt_changed)(%rdx)
         movabs $PADDR_MASK & PAGE_MASK, %rsi
         movabs $DIRECTMAP_VIRT_START, %rcx
-        mov   %rdi, %rax
         and   %rsi, %rdi
-        jz    .Lrag_keep_cr3
         and   %r9, %rsi
         add   %rcx, %rdi
         add   %rcx, %rsi
@@ -148,6 +152,7 @@ restore_all_guest:
         sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
                 ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
         rep movsq
+.Lrag_copy_done:
         mov   STACK_CPUINFO_FIELD(cr4)(%rdx), %rdi
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
         mov   %rdi, %rsi
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 3a0e1eef36..f2491b4423 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -59,6 +59,14 @@ struct cpu_info {
     bool         use_shadow_spec_ctrl;
     uint8_t      bti_ist_info;
 
+    /*
+     * The following field controls copying of the L4 page table of 64-bit
+     * PV guests to the per-cpu root page table on entering the guest context.
+     * If set the L4 page table is being copied to the root page table and
+     * the field will be reset.
+     */
+    bool         root_pgt_changed;
+
     unsigned long __pad;
     /* get_stack_bottom() must be 16-byte aligned */
 };
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cade9cbfb..052f0fa403 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -103,6 +103,8 @@ void write_cr3(unsigned long cr3);
 #define FLUSH_VA_VALID   0x800
  /* Flush CPU state */
 #define FLUSH_VCPU_STATE 0x1000
+ /* Flush the per-cpu root page table */
+#define FLUSH_ROOT_PGTBL 0x2000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
-- 
2.13.6



* [PATCH v7 2/9] xen/x86: add a function for modifying cr3
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Instead of having multiple places with more or less identical asm
statements, have just one function doing a write to cr3.

As this function should be named write_cr3(), rename the current
write_cr3() function to switch_cr3().
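
The resulting split of responsibilities (both definitions appear in
the hunks below):

    write_cr3(cr3);   /* raw register write; no tlbflush clock tick,
                       * callers handle any flushing themselves */
    switch_cr3(cr3);  /* full switch: toggles CR4.PGE around the write
                       * and ticks the tlbflush clock */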

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6:
- new patch
---
 xen/arch/x86/flushtlb.c         | 4 ++--
 xen/arch/x86/mm.c               | 2 +-
 xen/arch/x86/pv/domain.c        | 2 +-
 xen/common/efi/runtime.c        | 4 ++--
 xen/include/asm-x86/flushtlb.h  | 2 +-
 xen/include/asm-x86/processor.h | 5 +++++
 6 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 38cedf3b22..788c61d81a 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -71,7 +71,7 @@ static void post_flush(u32 t)
     this_cpu(tlbflush_time) = t;
 }
 
-void write_cr3(unsigned long cr3)
+void switch_cr3(unsigned long cr3)
 {
     unsigned long flags, cr4;
     u32 t;
@@ -83,7 +83,7 @@ void write_cr3(unsigned long cr3)
     cr4 = read_cr4();
 
     write_cr4(cr4 & ~X86_CR4_PGE);
-    asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
+    write_cr3(cr3);
     write_cr4(cr4);
 
     post_flush(t);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d960c742e..e245d96a97 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -503,7 +503,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 void write_ptbase(struct vcpu *v)
 {
     get_cpu_info()->root_pgt_changed = true;
-    write_cr3(v->arch.cr3);
+    switch_cr3(v->arch.cr3);
 }
 
 /*
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index b1c40373fa..be40843b05 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -220,7 +220,7 @@ static void _toggle_guest_pt(struct vcpu *v)
     get_cpu_info()->root_pgt_changed = true;
 
     /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
-    asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
+    write_cr3(v->arch.cr3);
 
     if ( !(v->arch.flags & TF_kernel_mode) )
         return;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 3dbc2e8ee5..4e5ddfef4f 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void)
         lgdt(&gdt_desc);
     }
 
-    write_cr3(virt_to_maddr(efi_l4_pgtable));
+    switch_cr3(virt_to_maddr(efi_l4_pgtable));
 
     return state;
 }
@@ -120,7 +120,7 @@ void efi_rs_leave(struct efi_rs_state *state)
 {
     if ( !state->cr3 )
         return;
-    write_cr3(state->cr3);
+    switch_cr3(state->cr3);
     if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 052f0fa403..5515f73b7f 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -84,7 +84,7 @@ static inline unsigned long read_cr3(void)
 }
 
 /* Write pagetable base and implicitly tick the tlbflush clock. */
-void write_cr3(unsigned long cr3);
+void switch_cr3(unsigned long cr3);
 
 /* flush_* flag fields: */
  /*
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index db9988ab33..71d32c0333 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -283,6 +283,11 @@ static inline unsigned long read_cr2(void)
     return cr2;
 }
 
+static inline void write_cr3(unsigned long val)
+{
+    asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" );
+}
+
 static inline unsigned long read_cr4(void)
 {
     return get_cpu_info()->cr4;
-- 
2.13.6



* [PATCH v7 3/9] xen/x86: support per-domain flag for xpti
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Instead of switching XPTI globally on or off, add a per-domain flag
for that purpose. This allows modifying the xpti boot parameter to
support running dom0 without Meltdown mitigations. Using
"xpti=no-dom0" as boot parameter will achieve that.

Move the xpti boot parameter handling from xen/arch/x86/setup.c to
xen/arch/x86/spec_ctrl.c, next to the other speculation mitigation
handling.
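
Example uses of the extended parameter (derived from the parsing code
below; the "no-dom0" spelling relies on the generic negation prefix of
Xen's boolean parsing):

    xpti=false                  # no page table isolation at all
    xpti=no-dom0                # isolate guests, but not dom0
    xpti=dom0=false,domu=true   # the same, spelled out explicitly
    xpti=default                # revert to the hardware-based default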

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6.1:
- address some minor comments (Jan Beulich)

V6:
- modify xpti boot parameter options (Andrew Cooper)
- move xpti_init() code to spec_ctrl.c (Andrew Cooper)
- rework init of per-domain xpti flag (Andrew Cooper)

V3:
- latch get_cpu_info() return value in variable (Jan Beulich)
- call always xpti_domain_init() for pv dom0 (Jan Beulich)
- add __init annotations (Jan Beulich)
- drop per domain XPTI message (Jan Beulich)
- document xpti=default support (Jan Beulich)
- move domain xpti flag into a padding hole (Jan Beulich)
---
 docs/misc/xen-command-line.markdown | 14 ++++++--
 xen/arch/x86/mm.c                   | 17 +++++++--
 xen/arch/x86/pv/dom0_build.c        |  1 +
 xen/arch/x86/pv/domain.c            |  6 ++++
 xen/arch/x86/setup.c                | 19 ----------
 xen/arch/x86/smpboot.c              |  4 +--
 xen/arch/x86/spec_ctrl.c            | 70 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/current.h       |  3 +-
 xen/include/asm-x86/domain.h        |  3 ++
 xen/include/asm-x86/spec_ctrl.h     |  4 +++
 10 files changed, 115 insertions(+), 26 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index b353352adf..d4f758487a 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the **FADT**, is cluster
 mode.
 
 ### xpti
-> `= <boolean>`
+> `= List of [ default | <boolean> | dom0=<bool> | domu=<bool> ]`
 
-> Default: `false` on AMD hardware
+> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD)
 > Default: `true` everywhere else
 
 Override default selection of whether to isolate 64-bit PV guest page
 tables.
 
+`true` activates page table isolation for all domains, even on hardware
+not vulnerable to Meltdown.
+
+`false` deactivates page table isolation on all systems for all domains.
+
+`default` sets the default behaviour.
+
+With `dom0` and `domu` it is possible to control page table isolation
+for dom0 and guest domains separately.
+
 ### xsave
 > `= <boolean>`
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e245d96a97..9c36614099 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -502,8 +502,21 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
-    get_cpu_info()->root_pgt_changed = true;
-    switch_cr3(v->arch.cr3);
+    struct cpu_info *cpu_info = get_cpu_info();
+
+    if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
+    {
+        cpu_info->root_pgt_changed = true;
+        cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
+        switch_cr3(v->arch.cr3);
+    }
+    else
+    {
+        /* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. */
+        cpu_info->xen_cr3 = 0;
+        switch_cr3(v->arch.cr3);
+        cpu_info->pv_cr3 = 0;
+    }
 }
 
 /*
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5b4325b87f..d148395919 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -387,6 +387,7 @@ int __init dom0_construct_pv(struct domain *d,
     if ( compat32 )
     {
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1;
+        d->arch.pv_domain.xpti = false;
         v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
         if ( setup_compat_arg_xlat(v) != 0 )
             BUG();
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index be40843b05..ce1a1a9d35 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -9,6 +9,7 @@
 #include <xen/lib.h>
 #include <xen/sched.h>
 
+#include <asm/spec_ctrl.h>
 #include <asm/pv/domain.h>
 
 static void noreturn continue_nonidle_domain(struct vcpu *v)
@@ -75,6 +76,8 @@ int switch_compat(struct domain *d)
 
     d->arch.x87_fip_width = 4;
 
+    d->arch.pv_domain.xpti = false;
+
     return 0;
 
  undo_and_fail:
@@ -205,6 +208,9 @@ int pv_domain_initialise(struct domain *d)
     /* 64-bit PV guest by default. */
     d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
+    d->arch.pv_domain.xpti = opt_xpti & (is_hardware_domain(d)
+                                         ? OPT_XPTI_DOM0 : OPT_XPTI_DOMU);
+
     return 0;
 
   fail:
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b2baee3d2c..f803980b97 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -169,9 +169,6 @@ static int __init parse_smap_param(const char *s)
 }
 custom_param("smap", parse_smap_param);
 
-static int8_t __initdata opt_xpti = -1;
-boolean_param("xpti", opt_xpti);
-
 bool __read_mostly acpi_disabled;
 bool __initdata acpi_force;
 static char __initdata acpi_param[10] = "";
@@ -1546,22 +1543,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     cr4_pv32_mask = mmu_cr4_features & XEN_CR4_PV32_BITS;
 
-    if ( opt_xpti < 0 )
-    {
-        uint64_t caps = 0;
-
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-            caps = ARCH_CAPABILITIES_RDCL_NO;
-        else if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
-            rdmsrl(MSR_ARCH_CAPABILITIES, caps);
-
-        opt_xpti = !(caps & ARCH_CAPABILITIES_RDCL_NO);
-    }
-    if ( opt_xpti )
-        setup_clear_cpu_cap(X86_FEATURE_NO_XPTI);
-    else
-        setup_force_cpu_cap(X86_FEATURE_NO_XPTI);
-
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 103d8f7142..980192e71f 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -325,7 +325,7 @@ void start_secondary(void *unused)
     spin_debug_disable();
 
     get_cpu_info()->xen_cr3 = 0;
-    get_cpu_info()->pv_cr3 = this_cpu(root_pgt) ? __pa(this_cpu(root_pgt)) : 0;
+    get_cpu_info()->pv_cr3 = 0;
 
     load_system_tables();
 
@@ -1044,7 +1044,7 @@ void __init smp_prepare_cpus(void)
         panic("Error %d setting up PV root page table\n", rc);
     if ( per_cpu(root_pgt, 0) )
     {
-        get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
+        get_cpu_info()->pv_cr3 = 0;
 
         /*
          * All entry points which may need to switch page tables have to start
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 5b5ec90fd8..2300e9eba9 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -193,6 +193,70 @@ static bool __init retpoline_safe(void)
     }
 }
 
+#define OPT_XPTI_DEFAULT  0xff
+uint8_t __read_mostly opt_xpti = OPT_XPTI_DEFAULT;
+
+static __init void xpti_init_default(bool force)
+{
+    uint64_t caps = 0;
+
+    if ( !force && (opt_xpti != OPT_XPTI_DEFAULT) )
+        return;
+
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPABILITIES_RDCL_NO;
+    else if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+        rdmsrl(MSR_ARCH_CAPABILITIES, caps);
+
+    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
+        opt_xpti = 0;
+    else
+        opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+}
+
+static __init int parse_xpti(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    xpti_init_default(false);
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        switch ( parse_bool(s, ss) )
+        {
+        case 0:
+            opt_xpti = 0;
+            break;
+
+        case 1:
+            opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                xpti_init_default(true);
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti = (opt_xpti & ~OPT_XPTI_DOM0) |
+                           (val ? OPT_XPTI_DOM0 : 0);
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti = (opt_xpti & ~OPT_XPTI_DOMU) |
+                           (val ? OPT_XPTI_DOMU : 0);
+            else
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -310,6 +374,12 @@ void __init init_speculation_mitigations(void)
     /* (Re)init BSP state now that default_bti_ist_info has been calculated. */
     init_shadow_spec_ctrl_state();
 
+    xpti_init_default(false);
+    if ( opt_xpti == 0 )
+        setup_force_cpu_cap(X86_FEATURE_NO_XPTI);
+    else
+        setup_clear_cpu_cap(X86_FEATURE_NO_XPTI);
+
     print_details(thunk);
 }
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index f2491b4423..b2475783f8 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -44,7 +44,8 @@ struct cpu_info {
     /*
      * Of the two following fields the latter is being set to the CR3 value
      * to be used on the given pCPU for loading whenever 64-bit PV guest
-     * context is being entered. The value never changes once set.
+     * context is being entered. A value of zero indicates no setting of CR3
+     * is to be performed.
      * The former is the value to restore when re-entering Xen, if any. IOW
      * its value being zero means there's nothing to restore. However, its
      * value can also be negative, indicating to the exit-to-Xen code that
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a12ae47f1b..ed4199931a 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -253,6 +253,9 @@ struct pv_domain
 
     atomic_t nr_l4_pages;
 
+    /* XPTI active? */
+    bool xpti;
+
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
 
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 5ab4ff3f68..b4fa43269e 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -29,6 +29,10 @@ void init_speculation_mitigations(void);
 extern bool opt_ibpb;
 extern uint8_t default_bti_ist_info;
 
+extern uint8_t opt_xpti;
+#define OPT_XPTI_DOM0  0x01
+#define OPT_XPTI_DOMU  0x02
+
 static inline void init_shadow_spec_ctrl_state(void)
 {
     struct cpu_info *info = get_cpu_info();
-- 
2.13.6



* [PATCH v7 4/9] xen/x86: use invpcid for flushing the TLB
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

If possible use the INVPCID instruction for flushing the TLB instead of
toggling cr4.pge for that purpose.

While at it remove the assumption that cr4.pge is always set during
MTRR loading, as this will no longer be true later in this series.

Add a command line option "invpcid" for controlling the use of
INVPCID (defaulting to true).
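
For reference, an INVPCID-based full flush boils down to something
like the following sketch (the descriptor layout is architectural;
INVPCID_TYPE_ALL_INCL_GLOBAL is from the invpcid.h hunk below, the
rest is illustrative):

    static inline void invpcid_flush_all(void)
    {
        /* 128-bit memory operand: PCID in bits 11:0, then a linear
         * address; both are ignored for the "all incl. global" type. */
        struct {
            uint64_t pcid:12, reserved:52;
            uint64_t addr;
        } desc = { 0 };

        asm volatile ( "invpcid %0, %1"
                       : : "m" (desc),
                           "r" ((unsigned long)INVPCID_TYPE_ALL_INCL_GLOBAL)
                       : "memory" );
    }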

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6:
- reword invpcid parameter description (Andrew Cooper)
- add __read_mostly to use_invpcid definition (Andrew Cooper)

V5:
- use pre_flush() as an initializer in do_tlb_flush() (Jan Beulich)
- introduce boolean use_invpcid instead of clearing X86_FEATURE_INVPCID
  (Jan Beulich)

V4:
- option "invpcid" instead of "noinvpcid" (Jan Beulich)

V3:
- new patch
---
 docs/misc/xen-command-line.markdown |  9 +++++++++
 xen/arch/x86/cpu/mtrr/generic.c     | 37 ++++++++++++++++++++++++++-----------
 xen/arch/x86/flushtlb.c             | 29 +++++++++++++++++++----------
 xen/arch/x86/setup.c                |  8 ++++++++
 xen/include/asm-x86/invpcid.h       |  2 ++
 5 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index d4f758487a..451a4fa566 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1380,6 +1380,15 @@ Because responsibility for APIC setup is shared between Xen and the
 domain 0 kernel this option is automatically propagated to the domain
 0 command line.
 
+### invpcid (x86)
+> `= <boolean>`
+
+> Default: `true`
+
+By default, Xen will use the INVPCID instruction for TLB management if
+it is available.  This option can be used to cause Xen to fall back to
+older mechanisms, which are generally slower.
+
 ### noirqbalance
 > `= <boolean>`
 
diff --git a/xen/arch/x86/cpu/mtrr/generic.c b/xen/arch/x86/cpu/mtrr/generic.c
index e9c0e5e059..7ba0c3f0fe 100644
--- a/xen/arch/x86/cpu/mtrr/generic.c
+++ b/xen/arch/x86/cpu/mtrr/generic.c
@@ -5,6 +5,7 @@
 #include <xen/mm.h>
 #include <xen/stdbool.h>
 #include <asm/flushtlb.h>
+#include <asm/invpcid.h>
 #include <asm/io.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
@@ -400,8 +401,10 @@ static DEFINE_SPINLOCK(set_atomicity_lock);
  * has been called.
  */
 
-static void prepare_set(void)
+static bool prepare_set(void)
 {
+	unsigned long cr4;
+
 	/*  Note that this is not ideal, since the cache is only flushed/disabled
 	   for this CPU while the MTRRs are changed, but changing this requires
 	   more invasive changes to the way the kernel boots  */
@@ -412,18 +415,24 @@ static void prepare_set(void)
 	write_cr0(read_cr0() | X86_CR0_CD);
 	wbinvd();
 
-	/*  TLB flushing here relies on Xen always using CR4.PGE. */
-	BUILD_BUG_ON(!(XEN_MINIMAL_CR4 & X86_CR4_PGE));
-	write_cr4(read_cr4() & ~X86_CR4_PGE);
+	cr4 = read_cr4();
+	if (cr4 & X86_CR4_PGE)
+		write_cr4(cr4 & ~X86_CR4_PGE);
+	else if (use_invpcid)
+		invpcid_flush_all();
+	else
+		write_cr3(read_cr3());
 
 	/*  Save MTRR state */
 	rdmsrl(MSR_MTRRdefType, deftype);
 
 	/*  Disable MTRRs, and set the default type to uncached  */
 	mtrr_wrmsr(MSR_MTRRdefType, deftype & ~0xcff);
+
+	return cr4 & X86_CR4_PGE;
 }
 
-static void post_set(void)
+static void post_set(bool pge)
 {
 	/* Intel (P6) standard MTRRs */
 	mtrr_wrmsr(MSR_MTRRdefType, deftype);
@@ -432,7 +441,12 @@ static void post_set(void)
 	write_cr0(read_cr0() & ~X86_CR0_CD);
 
 	/*  Reenable CR4.PGE (also flushes the TLB) */
-	write_cr4(read_cr4() | X86_CR4_PGE);
+	if (pge)
+		write_cr4(read_cr4() | X86_CR4_PGE);
+	else if (use_invpcid)
+		invpcid_flush_all();
+	else
+		write_cr3(read_cr3());
 
 	spin_unlock(&set_atomicity_lock);
 }
@@ -441,14 +455,15 @@ static void generic_set_all(void)
 {
 	unsigned long mask, count;
 	unsigned long flags;
+	bool pge;
 
 	local_irq_save(flags);
-	prepare_set();
+	pge = prepare_set();
 
 	/* Actually set the state */
 	mask = set_mtrr_state();
 
-	post_set();
+	post_set(pge);
 	local_irq_restore(flags);
 
 	/*  Use the atomic bitops to update the global mask  */
@@ -457,7 +472,6 @@ static void generic_set_all(void)
 			set_bit(count, &smp_changes_mask);
 		mask >>= 1;
 	}
-	
 }
 
 static void generic_set_mtrr(unsigned int reg, unsigned long base,
@@ -474,11 +488,12 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
 {
 	unsigned long flags;
 	struct mtrr_var_range *vr;
+	bool pge;
 
 	vr = &mtrr_state.var_ranges[reg];
 
 	local_irq_save(flags);
-	prepare_set();
+	pge = prepare_set();
 
 	if (size == 0) {
 		/* The invalid bit is kept in the mask, so we simply clear the
@@ -499,7 +514,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
 		mtrr_wrmsr(MSR_IA32_MTRR_PHYSMASK(reg), vr->mask);
 	}
 
-	post_set();
+	post_set(pge);
 	local_irq_restore(flags);
 }
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 788c61d81a..fc3b0a3268 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -10,6 +10,7 @@
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <asm/flushtlb.h>
+#include <asm/invpcid.h>
 #include <asm/page.h>
 
 /* Debug builds: Wrap frequently to stress-test the wrap logic. */
@@ -71,6 +72,23 @@ static void post_flush(u32 t)
     this_cpu(tlbflush_time) = t;
 }
 
+static void do_tlb_flush(void)
+{
+    u32 t = pre_flush();
+
+    if ( use_invpcid )
+        invpcid_flush_all();
+    else
+    {
+        unsigned long cr4 = read_cr4();
+
+        write_cr4(cr4 ^ X86_CR4_PGE);
+        write_cr4(cr4);
+    }
+
+    post_flush(t);
+}
+
 void switch_cr3(unsigned long cr3)
 {
     unsigned long flags, cr4;
@@ -118,16 +136,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
                            : : "m" (*(const char *)(va)) : "memory" );
         }
         else
-        {
-            u32 t = pre_flush();
-            unsigned long cr4 = read_cr4();
-
-            write_cr4(cr4 & ~X86_CR4_PGE);
-            barrier();
-            write_cr4(cr4);
-
-            post_flush(t);
-        }
+            do_tlb_flush();
     }
 
     if ( flags & FLUSH_CACHE )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f803980b97..164c42cbf1 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -62,6 +62,11 @@ boolean_param("nosmp", opt_nosmp);
 static unsigned int __initdata max_cpus;
 integer_param("maxcpus", max_cpus);
 
+/* opt_invpcid: If false, don't use INVPCID instruction even if available. */
+static bool __initdata opt_invpcid = true;
+boolean_param("invpcid", opt_invpcid);
+bool __read_mostly use_invpcid;
+
 unsigned long __read_mostly cr4_pv32_mask;
 
 /* **** Linux config option: propagated to domain0. */
@@ -1546,6 +1551,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( cpu_has_fsgsbase )
         set_in_cr4(X86_CR4_FSGSBASE);
 
+    if ( opt_invpcid && cpu_has_invpcid )
+        use_invpcid = true;
+
     init_speculation_mitigations();
 
     init_idle_domain();
diff --git a/xen/include/asm-x86/invpcid.h b/xen/include/asm-x86/invpcid.h
index b46624a865..edd8b68706 100644
--- a/xen/include/asm-x86/invpcid.h
+++ b/xen/include/asm-x86/invpcid.h
@@ -3,6 +3,8 @@
 
 #include <xen/types.h>
 
+extern bool use_invpcid;
+
 #define INVPCID_TYPE_INDIV_ADDR      0
 #define INVPCID_TYPE_SINGLE_CTXT     1
 #define INVPCID_TYPE_ALL_INCL_GLOBAL 2
-- 
2.13.6



* [PATCH v7 5/9] xen/x86: disable global pages for domains with XPTI active
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Instead of flushing global pages from the TLB when switching address
spaces with XPTI active, just disable global pages completely via %cr4
while a domain subject to XPTI is active. This avoids the need for
extra TLB flushes, as loading %cr3 will then remove all TLB entries.

In order to avoid states with cr3/cr4 having inconsistent values
(e.g. global pages being activated while cr3 already specifies an XPTI
address space), move loading of the new cr4 value to write_ptbase()
(actually to switch_cr3_cr4(), called by write_ptbase()).

This requires using switch_cr3_cr4() instead of write_ptbase() when
building dom0, in order to avoid setting cr4 with cr4.smap set.
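
The ordering enforced by switch_cr3_cr4() below can be condensed to
(a paraphrase, not the literal code):

    /* Never have CR4.PGE set while %cr3 points at an XPTI address
     * space: drop PGE first (flushing global TLB entries), then load
     * the new %cr3, then establish the final cr4 value. */
    if ( read_cr4() & X86_CR4_PGE )
        write_cr4(new_cr4 & ~X86_CR4_PGE);
    write_cr3(new_cr3);
    if ( read_cr4() != new_cr4 )
        write_cr4(new_cr4);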

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V7:
- use switch_cr3_cr4() in dom0_build.c

V6:
- don't call read_cr4() multiple times in switch_cr3_cr4()
  (Andrew Cooper)

V4:
- don't use mmu_cr4_features for setting new cr4 value (Jan Beulich)
- use simpler scheme for setting X86_CR4_PGE in
  pv_guest_cr4_to_real_cr4() (Jan Beulich)

V3:
- move cr4 loading for all domains from *_ctxt_switch_to() to
  write_cr3_cr4() called by write_ptbase() (Jan Beulich)
- rebase
---
 xen/arch/x86/domain.c          |  5 -----
 xen/arch/x86/flushtlb.c        | 17 ++++++++++++-----
 xen/arch/x86/mm.c              | 14 +++++++++++---
 xen/arch/x86/pv/dom0_build.c   |  6 +++---
 xen/arch/x86/x86_64/entry.S    | 10 ----------
 xen/common/efi/runtime.c       |  4 ++--
 xen/include/asm-x86/domain.h   |  3 ++-
 xen/include/asm-x86/flushtlb.h |  2 +-
 8 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 3d9c19d055..9b001a03ec 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1523,17 +1523,12 @@ void paravirt_ctxt_switch_from(struct vcpu *v)
 void paravirt_ctxt_switch_to(struct vcpu *v)
 {
     root_pgentry_t *root_pgt = this_cpu(root_pgt);
-    unsigned long cr4;
 
     if ( root_pgt )
         root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] =
             l4e_from_page(v->domain->arch.perdomain_l3_pg,
                           __PAGE_HYPERVISOR_RW);
 
-    cr4 = pv_guest_cr4_to_real_cr4(v);
-    if ( unlikely(cr4 != read_cr4()) )
-        write_cr4(cr4);
-
     if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
         activate_debugregs(v);
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index fc3b0a3268..e28bf04a37 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -89,20 +89,27 @@ static void do_tlb_flush(void)
     post_flush(t);
 }
 
-void switch_cr3(unsigned long cr3)
+void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
-    unsigned long flags, cr4;
+    unsigned long flags, old_cr4;
     u32 t;
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
     t = pre_flush();
-    cr4 = read_cr4();
 
-    write_cr4(cr4 & ~X86_CR4_PGE);
+    old_cr4 = read_cr4();
+    if ( old_cr4 & X86_CR4_PGE )
+    {
+        old_cr4 = cr4 & ~X86_CR4_PGE;
+        write_cr4(old_cr4);
+    }
+
     write_cr3(cr3);
-    write_cr4(cr4);
+
+    if ( old_cr4 != cr4 )
+        write_cr4(cr4);
 
     post_flush(t);
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9c36614099..73a38e8715 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -503,20 +503,28 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 void write_ptbase(struct vcpu *v)
 {
     struct cpu_info *cpu_info = get_cpu_info();
+    unsigned long new_cr4;
+
+    new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v))
+              ? pv_guest_cr4_to_real_cr4(v)
+              : ((read_cr4() & ~X86_CR4_TSD) | X86_CR4_PGE);
 
     if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
     {
         cpu_info->root_pgt_changed = true;
         cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
-        switch_cr3(v->arch.cr3);
+        switch_cr3_cr4(v->arch.cr3, new_cr4);
     }
     else
     {
-        /* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. */
+        /* Make sure to clear xen_cr3 before pv_cr3. */
         cpu_info->xen_cr3 = 0;
-        switch_cr3(v->arch.cr3);
+        /* switch_cr3_cr4() serializes. */
+        switch_cr3_cr4(v->arch.cr3, new_cr4);
         cpu_info->pv_cr3 = 0;
     }
+
+    ASSERT(is_pv_vcpu(v) || read_cr4() == mmu_cr4_features);
 }
 
 /*
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d148395919..4465a059a8 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -717,7 +717,7 @@ int __init dom0_construct_pv(struct domain *d,
         update_cr3(v);
 
     /* We run on dom0's page tables for the final part of the build process. */
-    write_ptbase(v);
+    switch_cr3_cr4(v->arch.cr3, read_cr4());
     mapcache_override_current(v);
 
     /* Copy the OS image and free temporary buffer. */
@@ -738,7 +738,7 @@ int __init dom0_construct_pv(struct domain *d,
              (parms.virt_hypercall >= v_end) )
         {
             mapcache_override_current(NULL);
-            write_ptbase(current);
+            switch_cr3_cr4(current->arch.cr3, read_cr4());
             printk("Invalid HYPERCALL_PAGE field in ELF notes.\n");
             rc = -1;
             goto out;
@@ -871,7 +871,7 @@ int __init dom0_construct_pv(struct domain *d,
 
     /* Return to idle domain's page tables. */
     mapcache_override_current(NULL);
-    write_ptbase(current);
+    switch_cr3_cr4(current->arch.cr3, read_cr4());
 
     update_domain_wallclock_time(d);
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index dd42223b20..5f0758d64f 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -153,13 +153,8 @@ restore_all_guest:
                 ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
         rep movsq
 .Lrag_copy_done:
-        mov   STACK_CPUINFO_FIELD(cr4)(%rdx), %rdi
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
-        mov   %rdi, %rsi
-        and   $~X86_CR4_PGE, %rdi
-        mov   %rdi, %cr4
         mov   %rax, %cr3
-        mov   %rsi, %cr4
 .Lrag_keep_cr3:
 
         /* Restore stashed SPEC_CTRL value. */
@@ -215,12 +210,7 @@ restore_all_xen:
          * so "g" will have to do.
          */
 UNLIKELY_START(g, exit_cr3)
-        mov   %cr4, %rdi
-        mov   %rdi, %rsi
-        and   $~X86_CR4_PGE, %rdi
-        mov   %rdi, %cr4
         mov   %rax, %cr3
-        mov   %rsi, %cr4
 UNLIKELY_END(exit_cr3)
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 4e5ddfef4f..070a70d784 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void)
         lgdt(&gdt_desc);
     }
 
-    switch_cr3(virt_to_maddr(efi_l4_pgtable));
+    switch_cr3_cr4(virt_to_maddr(efi_l4_pgtable), read_cr4());
 
     return state;
 }
@@ -120,7 +120,7 @@ void efi_rs_leave(struct efi_rs_state *state)
 {
     if ( !state->cr3 )
         return;
-    switch_cr3(state->cr3);
+    switch_cr3_cr4(state->cr3, read_cr4());
     if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index ed4199931a..b7894dc8c8 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -618,9 +618,10 @@ unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
 #define pv_guest_cr4_to_real_cr4(v)                         \
     (((v)->arch.pv_vcpu.ctrlreg[4]                          \
       | (mmu_cr4_features                                   \
-         & (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_SMEP |      \
+         & (X86_CR4_PSE | X86_CR4_SMEP |                    \
             X86_CR4_SMAP | X86_CR4_OSXSAVE |                \
             X86_CR4_FSGSBASE))                              \
+      | ((v)->domain->arch.pv_domain.xpti ? 0 : X86_CR4_PGE) \
       | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0))         \
      & ~X86_CR4_DE)
 #define real_cr4_to_pv_guest_cr4(c)                         \
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 5515f73b7f..73321f948a 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -84,7 +84,7 @@ static inline unsigned long read_cr3(void)
 }
 
 /* Write pagetable base and implicitly tick the tlbflush clock. */
-void switch_cr3(unsigned long cr3);
+void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 
 /* flush_* flag fields: */
  /*
-- 
2.13.6



* [PATCH v7 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid
From: Juergen Gross @ 2018-04-12 18:09 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Today cpu_info->xen_cr3 is either 0 to indicate %cr3 doesn't need to
be switched on entry to Xen, negative for keeping the value while
indicating not to restore %cr3, or positive in case %cr3 is to be
restored.

Switch to using a flag byte instead of a negative xen_cr3 value in
order to allow %cr3 values with the high bit set, needed once we want
to keep TLB entries when using the PCID feature.

This reduces the number of branches in interrupt handling and results
in better performance (e.g. a parallel make of the Xen hypervisor on
my system used about 3% less system time).
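
In C-like terms the entry-path logic changes roughly as follows (a
paraphrase of the assembly hunks below):

    /* Old: state was encoded in the sign of xen_cr3:
     *   > 0: load the value into %cr3, then zero the field;
     *   < 0: negate and keep the value, but don't load %cr3;
     *  == 0: nothing to do.
     * New: xen_cr3 only ever holds 0 or the value to load, and the
     * separate use_pv_cr3 byte tells the exit path whether pv_cr3 has
     * to be restored.  This frees bit 63 of the cr3 fields for the
     * PCID no-flush bit used later in the series. */
    if ( ci->xen_cr3 )
    {
        ci->use_pv_cr3 = false;
        write_cr3(ci->xen_cr3);
        ci->xen_cr3 = 0;
    }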

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V3:
- renamed use_xen_cr3 to better fitting use_pv_cr3
- corrected comment regarding semantics of use_pv_cr3 (Jan Beulich)
- prefer 32-bit operations over 8- or 16-bit ones (Jan Beulich)
---
 xen/arch/x86/domain.c              |  1 +
 xen/arch/x86/mm.c                  |  3 +-
 xen/arch/x86/smpboot.c             |  2 ++
 xen/arch/x86/x86_64/asm-offsets.c  |  1 +
 xen/arch/x86/x86_64/compat/entry.S |  5 ++--
 xen/arch/x86/x86_64/entry.S        | 59 ++++++++++++++++----------------------
 xen/include/asm-x86/current.h      | 12 +++++---
 7 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9b001a03ec..801ac33810 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1696,6 +1696,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     ASSERT(local_irq_is_enabled());
 
+    get_cpu_info()->use_pv_cr3 = false;
     get_cpu_info()->xen_cr3 = 0;
 
     if ( unlikely(dirty_cpu != cpu) && dirty_cpu != VCPU_CPU_CLEAN )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 73a38e8715..4997047edf 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -517,7 +517,8 @@ void write_ptbase(struct vcpu *v)
     }
     else
     {
-        /* Make sure to clear xen_cr3 before pv_cr3. */
+        /* Make sure to clear use_pv_cr3 and xen_cr3 before pv_cr3. */
+        cpu_info->use_pv_cr3 = false;
         cpu_info->xen_cr3 = 0;
         /* switch_cr3_cr4() serializes. */
         switch_cr3_cr4(v->arch.cr3, new_cr4);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 980192e71f..d21020c510 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -324,6 +324,7 @@ void start_secondary(void *unused)
      */
     spin_debug_disable();
 
+    get_cpu_info()->use_pv_cr3 = false;
     get_cpu_info()->xen_cr3 = 0;
     get_cpu_info()->pv_cr3 = 0;
 
@@ -1123,6 +1124,7 @@ void __init smp_prepare_boot_cpu(void)
     per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
 #endif
 
+    get_cpu_info()->use_pv_cr3 = false;
     get_cpu_info()->xen_cr3 = 0;
     get_cpu_info()->pv_cr3 = 0;
 }
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 9e2aefb00f..7ad024cf37 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -144,6 +144,7 @@ void __dummy__(void)
     OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
     OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
     OFFSET(CPUINFO_root_pgt_changed, struct cpu_info, root_pgt_changed);
+    OFFSET(CPUINFO_use_pv_cr3, struct cpu_info, use_pv_cr3);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index ae2bb4bf1e..b909977e33 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -210,10 +210,9 @@ ENTRY(cstar_enter)
 
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Lcstar_cr3_okay
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-        neg   %rcx
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
         mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lcstar_cr3_okay:
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 5f0758d64f..8b7d1c48ad 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -154,6 +154,7 @@ restore_all_guest:
         rep movsq
 .Lrag_copy_done:
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
+        movb  $1, STACK_CPUINFO_FIELD(use_pv_cr3)(%rdx)
         mov   %rax, %cr3
 .Lrag_keep_cr3:
 
@@ -202,14 +203,9 @@ restore_all_xen:
          * case we return to late PV exit code (from an NMI or #MC).
          */
         GET_STACK_END(bx)
-        mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rdx
+        cmpb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
+UNLIKELY_START(ne, exit_cr3)
         mov   STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
-        test  %rdx, %rdx
-        /*
-         * Ideally the condition would be "nsz", but such doesn't exist,
-         * so "g" will have to do.
-         */
-UNLIKELY_START(g, exit_cr3)
         mov   %rax, %cr3
 UNLIKELY_END(exit_cr3)
 
@@ -251,10 +247,9 @@ ENTRY(lstar_enter)
 
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Llstar_cr3_okay
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-        neg   %rcx
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
         mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Llstar_cr3_okay:
@@ -288,10 +283,9 @@ GLOBAL(sysenter_eflags_saved)
         /* PUSHF above has saved EFLAGS.IF clear (the caller had it set). */
         orl   $X86_EFLAGS_IF, UREGS_eflags(%rsp)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Lsyse_cr3_okay
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-        neg   %rcx
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
         mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lsyse_cr3_okay:
@@ -338,10 +332,9 @@ ENTRY(int80_direct_trap)
 
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Lint80_cr3_okay
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-        neg   %rcx
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
         mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lint80_cr3_okay:
@@ -546,24 +539,24 @@ ENTRY(common_interrupt)
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+        mov   STACK_CPUINFO_FIELD(use_pv_cr3)(%r14), %bl
         mov   %rcx, %r15
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Lintr_cr3_okay
-        jns   .Lintr_cr3_load
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
-        neg   %rcx
-.Lintr_cr3_load:
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         mov   %rcx, %cr3
         xor   %ecx, %ecx
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3, UREGS_cs(%rsp)
         cmovnz %rcx, %r15
+        cmovnz %rcx, %rbx
 .Lintr_cr3_okay:
 
         CR4_PV32_RESTORE
         movq %rsp,%rdi
         callq do_IRQ
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
+        mov   %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         jmp ret_from_intr
 
 ENTRY(page_fault)
@@ -578,18 +571,17 @@ GLOBAL(handle_exception)
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+        mov   STACK_CPUINFO_FIELD(use_pv_cr3)(%r14), %r13b
         mov   %rcx, %r15
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .Lxcpt_cr3_okay
-        jns   .Lxcpt_cr3_load
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
-        neg   %rcx
-.Lxcpt_cr3_load:
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         mov   %rcx, %cr3
         xor   %ecx, %ecx
         mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3, UREGS_cs(%rsp)
         cmovnz %rcx, %r15
+        cmovnz %rcx, %r13
 .Lxcpt_cr3_okay:
 
 handle_exception_saved:
@@ -644,6 +636,7 @@ handle_exception_saved:
         mov   (%rdx, %rax, 8), %rdx
         INDIRECT_CALL %rdx
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
+        mov   %r13b, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
         leaq  VCPU_trap_bounce(%rbx),%rdx
@@ -677,6 +670,7 @@ exception_with_ints_disabled:
 1:      movq  UREGS_error_code(%rsp),%rax # ec/ev
         movq  %rax,UREGS_kernel_sizeof(%rsp)
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
+        mov   %r13b, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         jmp   restore_all_xen           # return to fixup code
 
 /* No special register assumptions. */
@@ -764,9 +758,6 @@ ENTRY(double_fault)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
         test  %rbx, %rbx
         jz    .Ldblf_cr3_okay
-        jns   .Ldblf_cr3_load
-        neg   %rbx
-.Ldblf_cr3_load:
         mov   %rbx, %cr3
 .Ldblf_cr3_okay:
 
@@ -795,13 +786,11 @@ handle_ist_exception:
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+        mov   STACK_CPUINFO_FIELD(use_pv_cr3)(%r14), %bl
         mov   %rcx, %r15
-        neg   %rcx
+        test  %rcx, %rcx
         jz    .List_cr3_okay
-        jns   .List_cr3_load
-        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
-        neg   %rcx
-.List_cr3_load:
+        movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         mov   %rcx, %cr3
         movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
 .List_cr3_okay:
@@ -814,6 +803,7 @@ handle_ist_exception:
          * and copy the context to stack bottom.
          */
         xor   %r15, %r15
+        xor   %ebx, %ebx
         GET_CPUINFO_FIELD(guest_cpu_user_regs,di)
         movq  %rsp,%rsi
         movl  $UREGS_kernel_sizeof/8,%ecx
@@ -825,6 +815,7 @@ handle_ist_exception:
         mov   (%rdx, %rax, 8), %rdx
         INDIRECT_CALL %rdx
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
+        mov   %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
         cmpb  $TRAP_nmi,UREGS_entry_vector(%rsp)
         jne   ret_from_intr
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index b2475783f8..43bdec1f49 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -47,10 +47,7 @@ struct cpu_info {
      * context is being entered. A value of zero indicates no setting of CR3
      * is to be performed.
      * The former is the value to restore when re-entering Xen, if any. IOW
-     * its value being zero means there's nothing to restore. However, its
-     * value can also be negative, indicating to the exit-to-Xen code that
-     * restoring is not necessary, but allowing any nested entry code paths
-     * to still know the value to put back into CR3.
+     * its value being zero means there's nothing to restore.
      */
     unsigned long xen_cr3;
     unsigned long pv_cr3;
@@ -68,6 +65,13 @@ struct cpu_info {
      */
     bool         root_pgt_changed;
 
+    /*
+     * use_pv_cr3 is set in case the value of pv_cr3 is to be written into
+     * CR3 when returning from an interrupt. The main use is when returning
+     * from an NMI or MCE to hypervisor code where pv_cr3 was active.
+     */
+    bool         use_pv_cr3;
+
     unsigned long __pad;
     /* get_stack_bottom() must be 16-byte aligned */
 };
-- 
2.13.6



* [PATCH v7 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
  2018-04-12 18:09 [PATCH v7 0/9] xen/x86: various XPTI speedups Juergen Gross
                   ` (5 preceding siblings ...)
  2018-04-12 18:09 ` [PATCH v7 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid Juergen Gross
@ 2018-04-12 18:09 ` Juergen Gross
  2018-04-12 18:09 ` [PATCH v7 8/9] xen/x86: add some cr3 helpers Juergen Gross
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-04-12 18:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

pv_guest_cr4_to_real_cr4() is becoming more and more complex. Convert
it from a macro to an ordinary function.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6:
- new patch, split off from (old) patch 7 (Andrew Cooper)
---
 xen/arch/x86/mm.c            | 14 ++++++++++++++
 xen/include/asm-x86/domain.h | 11 ++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4997047edf..6aa0c34bae 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -500,6 +500,20 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
     v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
 }
 
+unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
+{
+    const struct domain *d = v->domain;
+    unsigned long cr4;
+
+    cr4 = v->arch.pv_vcpu.ctrlreg[4] & ~X86_CR4_DE;
+    cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP | X86_CR4_SMAP |
+                               X86_CR4_OSXSAVE | X86_CR4_FSGSBASE);
+    cr4 |= d->arch.pv_domain.xpti  ? 0 : X86_CR4_PGE;
+    cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0;
+
+    return cr4;
+}
+
 void write_ptbase(struct vcpu *v)
 {
     struct cpu_info *cpu_info = get_cpu_info();
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index b7894dc8c8..9627058cd0 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -615,15 +615,8 @@ void vcpu_show_registers(const struct vcpu *);
 unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
 
 /* Convert between guest-visible and real CR4 values. */
-#define pv_guest_cr4_to_real_cr4(v)                         \
-    (((v)->arch.pv_vcpu.ctrlreg[4]                          \
-      | (mmu_cr4_features                                   \
-         & (X86_CR4_PSE | X86_CR4_SMEP |                    \
-            X86_CR4_SMAP | X86_CR4_OSXSAVE |                \
-            X86_CR4_FSGSBASE))                              \
-      | ((v)->domain->arch.pv_domain.xpti ? 0 : X86_CR4_PGE) \
-      | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0))         \
-     & ~X86_CR4_DE)
+unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v);
+
 #define real_cr4_to_pv_guest_cr4(c)                         \
     ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD |      \
              X86_CR4_OSXSAVE | X86_CR4_SMEP |               \
-- 
2.13.6



* [PATCH v7 8/9] xen/x86: add some cr3 helpers
  2018-04-12 18:09 [PATCH v7 0/9] xen/x86: various XPTI speedups Juergen Gross
                   ` (6 preceding siblings ...)
  2018-04-12 18:09 ` [PATCH v7 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function Juergen Gross
@ 2018-04-12 18:09 ` Juergen Gross
  2018-04-12 18:09 ` [PATCH v7 9/9] xen/x86: use PCID feature Juergen Gross
  2018-04-13  9:59 ` [PATCH v7 0/9] xen/x86: various XPTI speedups Andrew Cooper
  9 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-04-12 18:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Add some helpers to access the address and pcid parts of cr3.

Use those helpers where appropriate.
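
As a quick illustration (the value is made up), the helpers split a cr3
value into its two parts:

    unsigned long cr3 = 0x1071d4a003UL;   /* hypothetical value */

    cr3_pa(cr3);     /* 0x1071d4a000: page table address bits */
    cr3_pcid(cr3);   /* 0x003: PCID in the low 12 bits */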

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6:
- new patch (Andrew Cooper)
---
 xen/arch/x86/debug.c            |  2 +-
 xen/arch/x86/domain_page.c      |  2 +-
 xen/include/asm-x86/processor.h | 10 ++++++++++
 xen/include/asm-x86/x86-defns.h |  4 +++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index 9159f32db4..a500df01ac 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -98,7 +98,7 @@ dbg_pv_va2mfn(dbgva_t vaddr, struct domain *dp, uint64_t pgd3val)
     l2_pgentry_t l2e, *l2t;
     l1_pgentry_t l1e, *l1t;
     unsigned long cr3 = (pgd3val ? pgd3val : dp->vcpu[0]->arch.cr3);
-    mfn_t mfn = maddr_to_mfn(cr3);
+    mfn_t mfn = maddr_to_mfn(cr3_pa(cr3));
 
     DBGP2("vaddr:%lx domid:%d cr3:%lx pgd3:%lx\n", vaddr, dp->domain_id, 
           cr3, pgd3val);
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 11b6a5421a..0c24530ed9 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -51,7 +51,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
             sync_local_execstate();
         /* We must now be running on the idle page table. */
-        ASSERT(read_cr3() == __pa(idle_pg_table));
+        ASSERT(cr3_pa(read_cr3()) == __pa(idle_pg_table));
     }
 
     return v;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 71d32c0333..36628459dc 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -288,6 +288,16 @@ static inline void write_cr3(unsigned long val)
     asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" );
 }
 
+static inline unsigned long cr3_pa(unsigned long cr3)
+{
+    return cr3 & X86_CR3_ADDR_MASK;
+}
+
+static inline unsigned long cr3_pcid(unsigned long cr3)
+{
+    return cr3 & X86_CR3_PCID_MASK;
+}
+
 static inline unsigned long read_cr4(void)
 {
     return get_cpu_info()->cr4;
diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h
index ff8d66be3c..904041e1ab 100644
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -45,7 +45,9 @@
 /*
  * Intel CPU flags in CR3
  */
-#define X86_CR3_NOFLUSH (_AC(1, ULL) << 63)
+#define X86_CR3_NOFLUSH    (_AC(1, ULL) << 63)
+#define X86_CR3_ADDR_MASK  (PAGE_MASK & PADDR_MASK)
+#define X86_CR3_PCID_MASK  _AC(0x0fff, ULL) /* Mask for PCID */
 
 /*
  * Intel CPU features in CR4
-- 
2.13.6



* [PATCH v7 9/9] xen/x86: use PCID feature
  2018-04-12 18:09 [PATCH v7 0/9] xen/x86: various XPTI speedups Juergen Gross
                   ` (7 preceding siblings ...)
  2018-04-12 18:09 ` [PATCH v7 8/9] xen/x86: add some cr3 helpers Juergen Gross
@ 2018-04-12 18:09 ` Juergen Gross
  2018-04-13  9:59 ` [PATCH v7 0/9] xen/x86: various XPTI speedups Andrew Cooper
  9 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-04-12 18:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, jbeulich

Avoid flushing the complete TLB when switching %cr3 for mitigation of
Meltdown by using the PCID feature if available.

We are using 4 PCID values for a 64-bit pv domain subject to XPTI and
2 values for the non-XPTI case:

- guest active and in kernel mode
- guest active and in user mode
- hypervisor active and guest in user mode (XPTI only)
- hypervisor active and guest in kernel mode (XPTI only)

We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
we disable global pages in cr4. A command line parameter controls in
which cases PCID is being used.

As the non-XPTI case has been shown not to perform better with PCID on
at least some machines, the default is to use PCID only for domains
subject to XPTI.

With PCID enabled we always disable global pages. This avoids having to
either flush the complete TLB or do a cycle through all PCID values
when invalidating a single global page.
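
As a sketch pieced together from the hunks below (not stated in this
form in the patch): with XPTI and PCID active, the cr3 used while the
guest runs in user mode is composed as

    pv_cr3 = __pa(this_cpu(root_pgt))         /* per-cpu shadow L4 */
             | X86_CR3_NOFLUSH                /* bit 63: skip TLB flush */
             | PCID_PV_XPTI | PCID_PV_USER;   /* PCID 3 */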

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6.1:
- address some minor comments (Jan Beulich)

V6:
- split off pv_guest_cr4_to_real_cr4() conversion to function into new
  patch (Andrew Cooper)
- changed some comments (Jan Beulich, Andrew Cooper)

V5:
- use X86_CR3_ADDR_MASK instead of ~X86_CR3_PCID_MASK (Jan Beulich)
- add some const qualifiers (Jan Beulich)
- mask X86_CR3_ADDR_MASK with PADDR_MASK (Jan Beulich)
- add flushing the TLB from old PCID related entries in write_cr3_cr4()
  (Jan Beulich)

V4:
- add cr3 mask for page table address and use that in dbg_pv_va2mfn()
  (Jan Beulich)
- use invpcid_flush_all_nonglobals() instead of invpcid_flush_all()
  (Jan Beulich)
- use PCIDs 0/1 when running in Xen or without XPTI, 2/3 with XPTI in
  guest (Jan Beulich)
- ASSERT cr4.pge and cr4.pcide are never active at the same time
  (Jan Beulich)
- make pv_guest_cr4_to_real_cr4() a real function

V3:
- support PCID for non-XPTI case, too
- add command line parameter for controlling usage of PCID
- check PCID active by using cr4.pcide (Jan Beulich)
---
 docs/misc/xen-command-line.markdown | 14 +++++++
 xen/arch/x86/flushtlb.c             | 47 ++++++++++++++++++++-
 xen/arch/x86/mm.c                   | 16 +++++++-
 xen/arch/x86/pv/dom0_build.c        |  1 +
 xen/arch/x86/pv/domain.c            | 81 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/domain.h        |  4 +-
 xen/include/asm-x86/processor.h     |  3 ++
 xen/include/asm-x86/pv/domain.h     | 31 ++++++++++++++
 8 files changed, 191 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 451a4fa566..f8950a3bb2 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1451,6 +1451,20 @@ All numbers specified must be hexadecimal ones.
 
 This option can be specified more than once (up to 8 times at present).
 
+### pcid (x86)
+> `= <boolean> | xpti=<bool>`
+
+> Default: `xpti`
+
+> Can be modified at runtime (change takes effect only for domains created
+  afterwards)
+
+If available, control usage of the PCID feature of the processor for
+64-bit pv-domains. PCID can be used either for no domain at all (`false`),
+for all of them (`true`), only for those subject to XPTI (`xpti`) or for
+those not subject to XPTI (`no-xpti`). The feature is used only in case
+INVPCID is supported and not disabled via `invpcid=false`.
+
 ### ple\_gap
 > `= <integer>`
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index e28bf04a37..8dd184d8be 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -12,6 +12,7 @@
 #include <asm/flushtlb.h>
 #include <asm/invpcid.h>
 #include <asm/page.h>
+#include <asm/pv/domain.h>
 
 /* Debug builds: Wrap frequently to stress-test the wrap logic. */
 #ifdef NDEBUG
@@ -93,6 +94,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
     u32 t;
+    unsigned long old_pcid = cr3_pcid(read_cr3());
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
@@ -102,14 +104,34 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     old_cr4 = read_cr4();
     if ( old_cr4 & X86_CR4_PGE )
     {
+        /*
+         * X86_CR4_PGE set means PCID is inactive.
+         * We have to purge the TLB via flipping cr4.pge.
+         */
         old_cr4 = cr4 & ~X86_CR4_PGE;
         write_cr4(old_cr4);
     }
+    else if ( use_invpcid )
+        /*
+         * Flushing the TLB via INVPCID is necessary only in case PCIDs are
+         * in use, which is true only with INVPCID being available.
+         * Without PCID usage the following write_cr3() will purge the TLB
+         * (we are in the cr4.pge off path) of all entries.
+         * Using invpcid_flush_all_nonglobals() seems to be faster than
+         * invpcid_flush_all(), so use that.
+         */
+        invpcid_flush_all_nonglobals();
 
     write_cr3(cr3);
 
     if ( old_cr4 != cr4 )
         write_cr4(cr4);
+    else if ( old_pcid != cr3_pcid(cr3) )
+        /*
+         * Make sure no TLB entries related to the old PCID created between
+         * flushing the TLB and writing the new %cr3 value remain in the TLB.
+         */
+        invpcid_flush_single_context(old_pcid);
 
     post_flush(t);
 
@@ -139,8 +161,29 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
              * are various errata surrounding INVLPG usage on superpages, and
              * a full flush is in any case not *that* expensive.
              */
-            asm volatile ( "invlpg %0"
-                           : : "m" (*(const char *)(va)) : "memory" );
+            if ( read_cr4() & X86_CR4_PCIDE )
+            {
+                unsigned long addr = (unsigned long)va;
+
+                /*
+                 * Flush the addresses for all potential address spaces.
+                 * We can't check the current domain for being subject to
+                 * XPTI as current might be the idle vcpu while we still have
+                 * some XPTI domain TLB entries.
+                 * Using invpcid is okay here, as with PCID enabled we always
+                 * have global pages disabled.
+                 */
+                invpcid_flush_one(PCID_PV_PRIV, addr);
+                invpcid_flush_one(PCID_PV_USER, addr);
+                if ( !cpu_has_no_xpti )
+                {
+                    invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XPTI, addr);
+                    invpcid_flush_one(PCID_PV_USER | PCID_PV_XPTI, addr);
+                }
+            }
+            else
+                asm volatile ( "invlpg %0"
+                               : : "m" (*(const char *)(va)) : "memory" );
         }
         else
             do_tlb_flush();
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6aa0c34bae..d981ce0b19 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -126,6 +126,7 @@
 #include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
+#include <asm/pv/domain.h>
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
@@ -497,7 +498,11 @@ void free_shared_domheap_page(struct page_info *page)
 
 void make_cr3(struct vcpu *v, mfn_t mfn)
 {
+    struct domain *d = v->domain;
+
     v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
+    if ( is_pv_domain(d) && d->arch.pv_domain.pcid )
+        v->arch.cr3 |= get_pcid_bits(v, false);
 }
 
 unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
@@ -508,7 +513,12 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
     cr4 = v->arch.pv_vcpu.ctrlreg[4] & ~X86_CR4_DE;
     cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP | X86_CR4_SMAP |
                                X86_CR4_OSXSAVE | X86_CR4_FSGSBASE);
-    cr4 |= d->arch.pv_domain.xpti  ? 0 : X86_CR4_PGE;
+
+    if ( d->arch.pv_domain.pcid )
+        cr4 |= X86_CR4_PCIDE;
+    else if ( !d->arch.pv_domain.xpti )
+        cr4 |= X86_CR4_PGE;
+
     cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0;
 
     return cr4;
@@ -521,12 +531,14 @@ void write_ptbase(struct vcpu *v)
 
     new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v))
               ? pv_guest_cr4_to_real_cr4(v)
-              : ((read_cr4() & ~X86_CR4_TSD) | X86_CR4_PGE);
+              : ((read_cr4() & ~(X86_CR4_PCIDE | X86_CR4_TSD)) | X86_CR4_PGE);
 
     if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
     {
         cpu_info->root_pgt_changed = true;
         cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
+        if ( new_cr4 & X86_CR4_PCIDE )
+            cpu_info->pv_cr3 |= get_pcid_bits(v, true);
         switch_cr3_cr4(v->arch.cr3, new_cr4);
     }
     else
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 4465a059a8..22c5150444 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -388,6 +388,7 @@ int __init dom0_construct_pv(struct domain *d,
     {
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1;
         d->arch.pv_domain.xpti = false;
+        d->arch.pv_domain.pcid = false;
         v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
         if ( setup_compat_arg_xlat(v) != 0 )
             BUG();
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index ce1a1a9d35..a4f0bd239d 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -9,9 +9,54 @@
 #include <xen/lib.h>
 #include <xen/sched.h>
 
+#include <asm/cpufeature.h>
+#include <asm/invpcid.h>
 #include <asm/spec_ctrl.h>
 #include <asm/pv/domain.h>
 
+static __read_mostly enum {
+    PCID_OFF,
+    PCID_ALL,
+    PCID_XPTI,
+    PCID_NOXPTI
+} opt_pcid = PCID_XPTI;
+
+static __init int parse_pcid(const char *s)
+{
+    int rc = 0;
+
+    switch ( parse_bool(s, NULL) )
+    {
+    case 0:
+        opt_pcid = PCID_OFF;
+        break;
+
+    case 1:
+        opt_pcid = PCID_ALL;
+        break;
+
+    default:
+        switch ( parse_boolean("xpti", s, NULL) )
+        {
+        case 0:
+            opt_pcid = PCID_NOXPTI;
+            break;
+
+        case 1:
+            opt_pcid = PCID_XPTI;
+            break;
+
+        default:
+            rc = -EINVAL;
+            break;
+        }
+        break;
+    }
+
+    return rc;
+}
+custom_runtime_param("pcid", parse_pcid);
+
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
@@ -77,6 +122,7 @@ int switch_compat(struct domain *d)
     d->arch.x87_fip_width = 4;
 
     d->arch.pv_domain.xpti = false;
+    d->arch.pv_domain.pcid = false;
 
     return 0;
 
@@ -211,6 +257,29 @@ int pv_domain_initialise(struct domain *d)
     d->arch.pv_domain.xpti = opt_xpti & (is_hardware_domain(d)
                                          ? OPT_XPTI_DOM0 : OPT_XPTI_DOMU);
 
+    if ( !is_pv_32bit_domain(d) && use_invpcid && cpu_has_pcid )
+        switch ( opt_pcid )
+        {
+        case PCID_OFF:
+            break;
+
+        case PCID_ALL:
+            d->arch.pv_domain.pcid = true;
+            break;
+
+        case PCID_XPTI:
+            d->arch.pv_domain.pcid = d->arch.pv_domain.xpti;
+            break;
+
+        case PCID_NOXPTI:
+            d->arch.pv_domain.pcid = !d->arch.pv_domain.xpti;
+            break;
+
+        default:
+            ASSERT_UNREACHABLE();
+            break;
+        }
+
     return 0;
 
   fail:
@@ -221,9 +290,19 @@ int pv_domain_initialise(struct domain *d)
 
 static void _toggle_guest_pt(struct vcpu *v)
 {
+    const struct domain *d = v->domain;
+
     v->arch.flags ^= TF_kernel_mode;
     update_cr3(v);
-    get_cpu_info()->root_pgt_changed = true;
+    if ( d->arch.pv_domain.xpti )
+    {
+        struct cpu_info *cpu_info = get_cpu_info();
+
+        cpu_info->root_pgt_changed = true;
+        cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)) |
+                           (d->arch.pv_domain.pcid
+                            ? get_pcid_bits(v, true) : 0);
+    }
 
     /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
     write_cr3(v->arch.cr3);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 9627058cd0..8b66096e7f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -255,6 +255,8 @@ struct pv_domain
 
     /* XPTI active? */
     bool xpti;
+    /* Use PCID feature? */
+    bool pcid;
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
@@ -620,7 +622,7 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v);
 #define real_cr4_to_pv_guest_cr4(c)                         \
     ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD |      \
              X86_CR4_OSXSAVE | X86_CR4_SMEP |               \
-             X86_CR4_FSGSBASE | X86_CR4_SMAP))
+             X86_CR4_FSGSBASE | X86_CR4_SMAP | X86_CR4_PCIDE))
 
 #define domain_max_vcpus(d) (is_hvm_domain(d) ? HVM_MAX_VCPUS : MAX_VIRT_CPUS)
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 36628459dc..c4aa385a6f 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -305,6 +305,9 @@ static inline unsigned long read_cr4(void)
 
 static inline void write_cr4(unsigned long val)
 {
+    /* No global pages in case of PCIDs enabled! */
+    ASSERT(!(val & X86_CR4_PGE) || !(val & X86_CR4_PCIDE));
+
     get_cpu_info()->cr4 = val;
     asm volatile ( "mov %0,%%cr4" : : "r" (val) );
 }
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index 5e34176939..4fea76444a 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -21,6 +21,37 @@
 #ifndef __X86_PV_DOMAIN_H__
 #define __X86_PV_DOMAIN_H__
 
+/*
+ * PCID values for the address spaces of 64-bit pv domains:
+ *
+ * We are using 4 PCID values for a 64-bit pv domain subject to XPTI:
+ * - hypervisor active and guest in kernel mode   PCID 0
+ * - hypervisor active and guest in user mode     PCID 1
+ * - guest active and in kernel mode              PCID 2
+ * - guest active and in user mode                PCID 3
+ *
+ * Without XPTI only 2 values are used:
+ * - guest in kernel mode                         PCID 0
+ * - guest in user mode                           PCID 1
+ */
+
+#define PCID_PV_PRIV      0x0000    /* Used for other domains, too. */
+#define PCID_PV_USER      0x0001
+#define PCID_PV_XPTI      0x0002    /* To be ORed to above values. */
+
+/*
+ * Return additional PCID specific cr3 bits.
+ *
+ * Note that X86_CR3_NOFLUSH will not be readable in cr3. Anyone consuming
+ * v->arch.cr3 should mask away X86_CR3_NOFLUSH and X86_CR3_PCID_MASK in case
+ * the value is used to address the root page table.
+ */
+static inline unsigned long get_pcid_bits(const struct vcpu *v, bool is_xpti)
+{
+    return X86_CR3_NOFLUSH | (is_xpti ? PCID_PV_XPTI : 0) |
+           ((v->arch.flags & TF_kernel_mode) ? PCID_PV_PRIV : PCID_PV_USER);
+}
+
 #ifdef CONFIG_PV
 
 void pv_vcpu_destroy(struct vcpu *v);
-- 
2.13.6
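
A usage sketch for the `pcid` option documented above (the boot line is
hypothetical):

    pcid=no-xpti

This would enable PCID only for 64-bit pv domains running without XPTI
and, per pv_domain_initialise() above, quietly fall back to not using
PCID when PCID or INVPCID support is missing.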



* Re: [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
  2018-04-12 18:09 ` [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible Juergen Gross
@ 2018-04-13  8:43   ` Jan Beulich
  2018-04-16 15:34     ` Tim Deegan
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2018-04-13  8:43 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, Tim Deegan, xen-devel

>>> On 12.04.18 at 20:09, <jgross@suse.com> wrote:
> For mitigation of Meltdown the current L4 page table is copied to the
> cpu local root page table each time a 64 bit pv guest is entered.
> 
> Copying can be avoided in cases where the guest L4 page table hasn't
> been modified while running the hypervisor, e.g. when handling
> interrupts or any hypercall not modifying the L4 page table or %cr3.
> 
> So add a per-cpu flag indicating whether the copying should be
> performed and set that flag only when loading a new %cr3 or modifying
> the L4 page table.  This includes synchronization of the cpu local
> root page table with other cpus, so add a special synchronization flag
> for that case.
> 
> A simple performance check (compiling the hypervisor via "make -j 4")
> in dom0 with 4 vcpus shows a significant improvement:
> 
> - real time drops from 112 seconds to 103 seconds
> - system time drops from 142 seconds to 131 seconds
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> V7:
> - add missing flag setting in shadow code

This now needs an ack from Tim (now Cc-ed).

Jan




* Re: [PATCH v7 0/9] xen/x86: various XPTI speedups
  2018-04-12 18:09 [PATCH v7 0/9] xen/x86: various XPTI speedups Juergen Gross
                   ` (8 preceding siblings ...)
  2018-04-12 18:09 ` [PATCH v7 9/9] xen/x86: use PCID feature Juergen Gross
@ 2018-04-13  9:59 ` Andrew Cooper
  2018-04-13 10:29   ` Juergen Gross
  9 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2018-04-13  9:59 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: jbeulich

On 12/04/18 19:09, Juergen Gross wrote:
> This patch series aims at reducing the overhead of the XPTI Meltdown
> mitigation.

Sadly, there are still problems. 

(XEN) [   13.486805] Dom0 has maximum 2 VCPUs
(XEN) [   13.486824] ----[ Xen-4.11.0-5.0.3-d  x86_64  debug=y   Not tainted ]----
(XEN) [   13.486826] CPU:    0
(XEN) [   13.486828] RIP:    e008:[<ffff82d0802885f4>] switch_cr3_cr4+0x58/0x116
(XEN) [   13.486833] RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) [   13.486836] rax: 00000000000000df   rbx: 0000000000000282   rcx: ffff82d0804b7fff
(XEN) [   13.486839] rdx: 0000000000152660   rsi: 00000000001526e0   rdi: 8000001071d4a000
(XEN) [   13.486841] rbp: ffff82d0804b78d8   rsp: ffff82d0804b78a8   r8:  0000000000000000
(XEN) [   13.486844] r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) [   13.486847] r12: 8000001071d4a000   r13: 0000000057ea8000   r14: 00000000001526e0
(XEN) [   13.486849] r15: ffff83107326f000   cr0: 000000008005003b   cr4: 0000000000152660
(XEN) [   13.486851] cr3: 0000000057ea8000   cr2: 0000000000000000
(XEN) [   13.486853] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) [   13.486855] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [   13.486859] Xen code around <ffff82d0802885f4> (switch_cr3_cr4+0x58/0x116):
(XEN) [   13.486860]  00 00 66 0f 38 82 4d d0 <41> 0f 22 dc 4c 39 f2 75 56 4c 89 ea 81 e2 ff 0f
(XEN) [   13.486869] Xen stack trace from rsp=ffff82d0804b78a8:
(XEN) [   13.486870]    ffff82d0804b78d8 ffff82d0804466a2 ffff83005a1f1000 0000000000000002
(XEN) [   13.486874]    ffffffff82000000 ffff830000060fa0 ffff82d0804b7d68 ffff82d08044349e
(XEN) [   13.486878]    0000000000000000 ffff830000060fa0 ffffffff82000000 0000000000000ff0
(XEN) [   13.486881]    0000000000000000 0000001071d4c000 ffff831071d4b000 ffff831071d4c000
(XEN) [   13.486884]    ffffffff81d49000 0000000000000000 0000000000000013 ffff831071d4dff8
(XEN) [   13.486887]    0000001071d5c000 ffff831071d4d000 ffffffff81d5e000 ffffffff81000000
(XEN) [   13.486891]    0000000001072000 0000001071d5d000 ffffffff81d4a000 ffffffff81d49000
(XEN) [   13.486894]    ffff831071d4c080 0000000000002000 0000000001070000 ffffffff81d49000
(XEN) [   13.486897]    ffffffff82000000 ffff831071d4aff8 0000000000002000 0000000000000001
(XEN) [   13.486900]    0000008000200000 0000008000000000 000000000000570a 0000000000040000
(XEN) [   13.486903]    0000000000000000 ffffffff80000000 ffff831071d4dff0 ffff82d080485580
(XEN) [   13.486907]    ffff83005a1f1000 0000000005709ac2 0000000000000000 ffff832079bd182c
(XEN) [   13.486910]    ffff832079bd19e8 0000000000000000 0000000000000000 0000000000000000
(XEN) [   13.486913]    0000000000000001 ffff82d0803fd5e8 ffffffff81b051f0 0000000000000001
(XEN) [   13.486916]    ffff82d0803fd436 ffffffff81001000 0000000000000001 ffff82d0803fd410
(XEN) [   13.486919]    ffffffff80000000 0000000000000001 ffff82d0803fd429 0000000000000000
(XEN) [   13.486923]    0000000000000002 ffff82d0803fd578 ffff832079bd1868 0000000000000002
(XEN) [   13.486926]    ffff82d0803fd3d4 ffff832079bd183c 0000000000000002 ffff82d0803fd584
(XEN) [   13.486929]    ffff832079bd1854 0000000000000002 ffff82d0803fd3cd ffff832079bd1944
(XEN) [   13.486933]    0000000000000002 ffff82d0803fd592 ffff832079bd1930 0000000000000002
(XEN) [   13.486936] Xen call trace:
(XEN) [   13.486938]    [<ffff82d0802885f4>] switch_cr3_cr4+0x58/0x116
(XEN) [   13.486942]    [<ffff82d08044349e>] dom0_construct_pv+0x1bb1/0x29e3
(XEN) [   13.486945]    [<ffff82d0804470ba>] construct_dom0+0x8c/0xb86
(XEN) [   13.486949]    [<ffff82d080437fd3>] __start_xen+0x23c4/0x2629
(XEN) [   13.486952]    [<ffff82d0802000f3>] __high_start+0x53/0x58
(XEN) [   13.486954]
(XEN) [   14.047278]
(XEN) [   14.049274] ****************************************
(XEN) [   14.054734] Panic on CPU 0:
(XEN) [   14.058026] GENERAL PROTECTION FAULT
(XEN) [   14.062099] [error_code=0000]
(XEN) [   14.065565] ****************************************
(XEN) [   14.071024]
(XEN) [   14.073018] Reboot in five seconds...

The faulting instruction is `mov %r12, %cr3` which is trying to use
noflush while %cr4.pcide is clear.
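
A minimal sketch of the rule at work: CR3 bit 63 (X86_CR3_NOFLUSH) is
only defined while CR4.PCIDE is set; with PCIDE clear the bit is
reserved, so the mov faults. A hypothetical guard (not part of the
series) would be:

    static void write_cr3_maybe_noflush(unsigned long val)
    {
        /* Bit 63 is reserved unless CR4.PCIDE is enabled. */
        if ( !(read_cr4() & X86_CR4_PCIDE) )
            val &= ~X86_CR3_NOFLUSH;
        asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" );
    }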

~Andrew


* Re: [PATCH v7 0/9] xen/x86: various XPTI speedups
  2018-04-13  9:59 ` [PATCH v7 0/9] xen/x86: various XPTI speedups Andrew Cooper
@ 2018-04-13 10:29   ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-04-13 10:29 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: jbeulich

On 13/04/18 11:59, Andrew Cooper wrote:
> On 12/04/18 19:09, Juergen Gross wrote:
>> This patch series aims at reducing the overhead of the XPTI Meltdown
>> mitigation.
> 
> Sadly, there are still problems. 
> 
> (XEN) [   13.486805] Dom0 has maximum 2 VCPUs
> (XEN) [   13.486824] ----[ Xen-4.11.0-5.0.3-d  x86_64  debug=y   Not tainted ]----
> (XEN) [   13.486826] CPU:    0
> (XEN) [   13.486828] RIP:    e008:[<ffff82d0802885f4>] switch_cr3_cr4+0x58/0x116
> (XEN) [   13.486833] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [   13.486836] rax: 00000000000000df   rbx: 0000000000000282   rcx: ffff82d0804b7fff
> (XEN) [   13.486839] rdx: 0000000000152660   rsi: 00000000001526e0   rdi: 8000001071d4a000
> (XEN) [   13.486841] rbp: ffff82d0804b78d8   rsp: ffff82d0804b78a8   r8:  0000000000000000
> (XEN) [   13.486844] r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> (XEN) [   13.486847] r12: 8000001071d4a000   r13: 0000000057ea8000   r14: 00000000001526e0
> (XEN) [   13.486849] r15: ffff83107326f000   cr0: 000000008005003b   cr4: 0000000000152660
> (XEN) [   13.486851] cr3: 0000000057ea8000   cr2: 0000000000000000
> (XEN) [   13.486853] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) [   13.486855] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [   13.486859] Xen code around <ffff82d0802885f4> (switch_cr3_cr4+0x58/0x116):
> (XEN) [   13.486860]  00 00 66 0f 38 82 4d d0 <41> 0f 22 dc 4c 39 f2 75 56 4c 89 ea 81 e2 ff 0f
> (XEN) [   13.486869] Xen stack trace from rsp=ffff82d0804b78a8:
> (XEN) [   13.486870]    ffff82d0804b78d8 ffff82d0804466a2 ffff83005a1f1000 0000000000000002
> (XEN) [   13.486874]    ffffffff82000000 ffff830000060fa0 ffff82d0804b7d68 ffff82d08044349e
> (XEN) [   13.486878]    0000000000000000 ffff830000060fa0 ffffffff82000000 0000000000000ff0
> (XEN) [   13.486881]    0000000000000000 0000001071d4c000 ffff831071d4b000 ffff831071d4c000
> (XEN) [   13.486884]    ffffffff81d49000 0000000000000000 0000000000000013 ffff831071d4dff8
> (XEN) [   13.486887]    0000001071d5c000 ffff831071d4d000 ffffffff81d5e000 ffffffff81000000
> (XEN) [   13.486891]    0000000001072000 0000001071d5d000 ffffffff81d4a000 ffffffff81d49000
> (XEN) [   13.486894]    ffff831071d4c080 0000000000002000 0000000001070000 ffffffff81d49000
> (XEN) [   13.486897]    ffffffff82000000 ffff831071d4aff8 0000000000002000 0000000000000001
> (XEN) [   13.486900]    0000008000200000 0000008000000000 000000000000570a 0000000000040000
> (XEN) [   13.486903]    0000000000000000 ffffffff80000000 ffff831071d4dff0 ffff82d080485580
> (XEN) [   13.486907]    ffff83005a1f1000 0000000005709ac2 0000000000000000 ffff832079bd182c
> (XEN) [   13.486910]    ffff832079bd19e8 0000000000000000 0000000000000000 0000000000000000
> (XEN) [   13.486913]    0000000000000001 ffff82d0803fd5e8 ffffffff81b051f0 0000000000000001
> (XEN) [   13.486916]    ffff82d0803fd436 ffffffff81001000 0000000000000001 ffff82d0803fd410
> (XEN) [   13.486919]    ffffffff80000000 0000000000000001 ffff82d0803fd429 0000000000000000
> (XEN) [   13.486923]    0000000000000002 ffff82d0803fd578 ffff832079bd1868 0000000000000002
> (XEN) [   13.486926]    ffff82d0803fd3d4 ffff832079bd183c 0000000000000002 ffff82d0803fd584
> (XEN) [   13.486929]    ffff832079bd1854 0000000000000002 ffff82d0803fd3cd ffff832079bd1944
> (XEN) [   13.486933]    0000000000000002 ffff82d0803fd592 ffff832079bd1930 0000000000000002
> (XEN) [   13.486936] Xen call trace:
> (XEN) [   13.486938]    [<ffff82d0802885f4>] switch_cr3_cr4+0x58/0x116
> (XEN) [   13.486942]    [<ffff82d08044349e>] dom0_construct_pv+0x1bb1/0x29e3
> (XEN) [   13.486945]    [<ffff82d0804470ba>] construct_dom0+0x8c/0xb86
> (XEN) [   13.486949]    [<ffff82d080437fd3>] __start_xen+0x23c4/0x2629
> (XEN) [   13.486952]    [<ffff82d0802000f3>] __high_start+0x53/0x58
> (XEN) [   13.486954]
> (XEN) [   14.047278]
> (XEN) [   14.049274] ****************************************
> (XEN) [   14.054734] Panic on CPU 0:
> (XEN) [   14.058026] GENERAL PROTECTION FAULT
> (XEN) [   14.062099] [error_code=0000]
> (XEN) [   14.065565] ****************************************
> (XEN) [   14.071024]
> (XEN) [   14.073018] Reboot in five seconds...
> 
> The faulting instruction is `mov %r12, %cr3` which is trying to use
> noflush while %cr4.pcide is clear.

While I can see how that happened, I'm not sure why I didn't hit this
when testing my series. Could it be that some cpus won't GP in this case?

Could you try the series without the last patch? Maybe it would be
possible to commit at least some of the patches.

I'm just about to leave for the Linux root conference in Kiev, so the
attached patch is only compile-tested. You might want to try it.


Juergen


[-- Attachment #2: fixup.patch --]
[-- Type: text/x-patch, Size: 546 bytes --]

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 22c5150444..34c77bcbe4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -718,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
         update_cr3(v);
 
     /* We run on dom0's page tables for the final part of the build process. */
-    switch_cr3_cr4(v->arch.cr3, read_cr4());
+    switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());
     mapcache_override_current(v);
 
     /* Copy the OS image and free temporary buffer. */
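
(A note on the fix, with hypothetical values: cr3_pa() masks with
X86_CR3_ADDR_MASK, stripping the X86_CR3_NOFLUSH and PCID bits that
make_cr3() may now set, before this early %cr3 load while CR4.PCIDE is
still clear:

    v->arch.cr3           /* X86_CR3_NOFLUSH | 0x1071d4a000 */
    cr3_pa(v->arch.cr3)   /* 0x1071d4a000, valid with CR4.PCIDE=0 */
)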


* Re: [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
  2018-04-13  8:43   ` Jan Beulich
@ 2018-04-16 15:34     ` Tim Deegan
  2018-04-16 18:22       ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Tim Deegan @ 2018-04-16 15:34 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, Jan Beulich, xen-devel

Hi,

At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote:
> >>> On 12.04.18 at 20:09, <jgross@suse.com> wrote:
> > For mitigation of Meltdown the current L4 page table is copied to the
> > cpu local root page table each time a 64 bit pv guest is entered.
> > 
> > Copying can be avoided in cases where the guest L4 page table hasn't
> > been modified while running the hypervisor, e.g. when handling
> > interrupts or any hypercall not modifying the L4 page table or %cr3.
> > 
> > So add a per-cpu flag indicating whether the copying should be
> > performed and set that flag only when loading a new %cr3 or modifying
> > the L4 page table.  This includes synchronization of the cpu local
> > root page table with other cpus, so add a special synchronization flag
> > for that case.
> > 
> > A simple performance check (compiling the hypervisor via "make -j 4")
> > in dom0 with 4 vcpus shows a significant improvement:
> > 
> > - real time drops from 112 seconds to 103 seconds
> > - system time drops from 142 seconds to 131 seconds
> > 
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > ---
> > V7:
> > - add missing flag setting in shadow code
> 
> This now needs an ack from Tim (now Cc-ed).

I may be misunderstanding how this flag is supposed to work, but this
seems at first glance to do both too much and too little.  The sl4
table that's being modified may be in use on any number of other
pcpus, and might not be in use on the current pcpu.

It looks like the do_mmu_update path issues a flush IPI; I think that
the equivalent IPI would be a better place to hook if you can.
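
A minimal sketch of that idea, assuming the synchronization flag added
by patch 1 is named FLUSH_ROOT_PGTBL and handled in the flush IPI path:

    /* in flush_area_local(), sketch only */
    if ( flags & FLUSH_ROOT_PGTBL )
        get_cpu_info()->root_pgt_changed = true;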

Also I'm not sure why the flag needs to be set in
l4e_propagate_from_guest() as well as shadow_set_l4e().  Can you
elaborate?

Cheers,

Tim.


* Re: [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
  2018-04-16 15:34     ` Tim Deegan
@ 2018-04-16 18:22       ` Juergen Gross
  2018-04-18  6:55         ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-04-16 18:22 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On 16/04/18 17:34, Tim Deegan wrote:
> Hi,
> 
> At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote:
>>>>> On 12.04.18 at 20:09, <jgross@suse.com> wrote:
>>> For mitigation of Meltdown the current L4 page table is copied to the
>>> cpu local root page table each time a 64 bit pv guest is entered.
>>>
>>> Copying can be avoided in cases where the guest L4 page table hasn't
>>> been modified while running the hypervisor, e.g. when handling
>>> interrupts or any hypercall not modifying the L4 page table or %cr3.
>>>
>>> So add a per-cpu flag indicating whether the copying should be
>>> performed and set that flag only when loading a new %cr3 or modifying
>>> the L4 page table.  This includes synchronization of the cpu local
>>> root page table with other cpus, so add a special synchronization flag
>>> for that case.
>>>
>>> A simple performance check (compiling the hypervisor via "make -j 4")
>>> in dom0 with 4 vcpus shows a significant improvement:
>>>
>>> - real time drops from 112 seconds to 103 seconds
>>> - system time drops from 142 seconds to 131 seconds
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> V7:
>>> - add missing flag setting in shadow code
>>
>> This now needs an ack from Tim (now Cc-ed).
> 
> I may be misunderstanding how this flag is supposed to work, but this
> seems at first glance to do both too much and too little.  The sl4
> table that's being modified may be in use on any number of other
> pcpus, and might not be in use on the current pcpu.

Okay.

> It looks like the do_mmu_update path issues a flush IPI; I think that
> the equivalent IPI would be a better place to hook if you can.

I want to catch only cases where the L4 table is being modified.

> Also I'm not sure why the flag needs to be set in
> l4e_propagate_from_guest() as well as shadow_set_l4e().  Can you
> elaborate?

I narrowed down iteratively where the setting of the flag is mandatory.
Doing it in shadow_set_l4e() seemed to look right, but it wasn't enough.
Setting it also in l4e_propagate_from_guest() showed no further problems
migrating the guest.

So the latter is strictly required, while the former might not be
necessary.

What you are telling me is that I need to set the flag on the other
cpus, too. I didn't experience any problems omitting that, but maybe
this was only because the flag would be set eventually some time later
(e.g. on a vcpu scheduling event, or on scheduling in the guest leading
to the flag being set), resulting in a short hang instead of a crash.


Juergen


* Re: [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
  2018-04-16 18:22       ` Juergen Gross
@ 2018-04-18  6:55         ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-04-18  6:55 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On 16/04/18 20:22, Juergen Gross wrote:
> On 16/04/18 17:34, Tim Deegan wrote:
>> Hi,
>>
>> At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote:
>>>>>> On 12.04.18 at 20:09, <jgross@suse.com> wrote:
>>>> For mitigation of Meltdown the current L4 page table is copied to the
>>>> cpu local root page table each time a 64 bit pv guest is entered.
>>>>
>>>> Copying can be avoided in cases where the guest L4 page table hasn't
>>>> been modified while running the hypervisor, e.g. when handling
>>>> interrupts or any hypercall not modifying the L4 page table or %cr3.
>>>>
>>>> So add a per-cpu flag indicating whether the copying should be
>>>> performed and set that flag only when loading a new %cr3 or modifying
>>>> the L4 page table.  This includes synchronization of the cpu local
>>>> root page table with other cpus, so add a special synchronization flag
>>>> for that case.
>>>>
>>>> A simple performance check (compiling the hypervisor via "make -j 4")
>>>> in dom0 with 4 vcpus shows a significant improvement:
>>>>
>>>> - real time drops from 112 seconds to 103 seconds
>>>> - system time drops from 142 seconds to 131 seconds
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> V7:
>>>> - add missing flag setting in shadow code
>>>
>>> This now needs an ack from Tim (now Cc-ed).
>>
>> I may be misunderstanding how this flag is supposed to work, but this
>> seems at first glance to do both too much and too little.  The sl4
>> table that's being modified may be in use on any number of other
>> pcpus, and might not be in use on the current pcpu.
> 
> Okay.
> 
>> It looks like the do_mmu_update path issues a flush IPI; I think that
>> the equivalent IPI would be a better place to hook if you can.
> 
> I want to catch only cases where the L4 table is being modified.
> 
>> Also I'm not sure why the flag needs to be set in
>> l4e_propagate_from_guest() as well as shadow_set_l4e().  Can you
>> elaborate?
> 
> I narrowed down iteratively where the setting of the flag is mandatory.
> Doing it in shadow_set_l4e() seemed to look right, but it wasn't enough.
> Setting it also in l4e_propagate_from_guest() showed no further problems
> migrating the guest.

I just revisited this after looking into the code. It seems I remembered
the narrowing-down sequence wrongly: l4e_propagate_from_guest() doesn't
need to set the flag.


Juergen


Thread overview: 16 messages
2018-04-12 18:09 [PATCH v7 0/9] xen/x86: various XPTI speedups Juergen Gross
2018-04-12 18:09 ` [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible Juergen Gross
2018-04-13  8:43   ` Jan Beulich
2018-04-16 15:34     ` Tim Deegan
2018-04-16 18:22       ` Juergen Gross
2018-04-18  6:55         ` Juergen Gross
2018-04-12 18:09 ` [PATCH v7 2/9] xen/x86: add a function for modifying cr3 Juergen Gross
2018-04-12 18:09 ` [PATCH v7 3/9] xen/x86: support per-domain flag for xpti Juergen Gross
2018-04-12 18:09 ` [PATCH v7 4/9] xen/x86: use invpcid for flushing the TLB Juergen Gross
2018-04-12 18:09 ` [PATCH v7 5/9] xen/x86: disable global pages for domains with XPTI active Juergen Gross
2018-04-12 18:09 ` [PATCH v7 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid Juergen Gross
2018-04-12 18:09 ` [PATCH v7 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function Juergen Gross
2018-04-12 18:09 ` [PATCH v7 8/9] xen/x86: add some cr3 helpers Juergen Gross
2018-04-12 18:09 ` [PATCH v7 9/9] xen/x86: use PCID feature Juergen Gross
2018-04-13  9:59 ` [PATCH v7 0/9] xen/x86: various XPTI speedups Andrew Cooper
2018-04-13 10:29   ` Juergen Gross
