* [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context
@ 2018-06-08 18:48 Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

The purpose of this patchset is to fix a longstanding bug whereby Xen's
EFER.NXE setting leaks into HVM guest context, and has a visible effect on the
guest's pagewalks.

To allow patch 9 to function, there are 8 patches of improvements to the MSR
load/save infrastructure, which is needed purely to support first-generation
VT-x hardware lacking dedicated EFER load/save controls.

This series is based on the x86-next branch, which has the prerequisite
content to split out MSR_DEBUGCTL handling.

Major changes from v1:
  * MSR_DEBUGCTL handling fixes disentangled, and already completed.
  * Improvements to LBR handling to fix a logic bug with the behavioural
    changes to vmx_add_msr().
  * Fix a regression introduced on certain Nehalem/Westmere hardware.  See
    patch 9 for more details.

Andrew Cooper (9):
  x86/vmx: API improvements for MSR load/save infrastructure
  x86/vmx: Internal cleanup for MSR load/save infrastructure
  x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  x86/vmx: Support remote access to the MSR lists
  x86/vmx: Improvements to LBR MSR handling
  x86/vmx: Pass an MSR value into vmx_add_msr()
  x86/vmx: Support load-only guest MSR list entries
  x86/vmx: Support removing MSRs from the host/guest load/save lists
  x86/vmx: Don't leak EFER.NXE into guest context

 xen/arch/x86/cpu/vpmu_intel.c      |  14 +-
 xen/arch/x86/domain.c              |  10 --
 xen/arch/x86/hvm/vmx/vmcs.c        | 313 ++++++++++++++++++++++++++-----------
 xen/arch/x86/hvm/vmx/vmx.c         | 127 +++++++++------
 xen/include/asm-x86/hvm/hvm.h      |   4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 118 +++++++++++---
 xen/include/xen/sched.h            |   2 +-
 7 files changed, 414 insertions(+), 174 deletions(-)

-- 
2.1.4



* [PATCH v2 1/9] x86/vmx: API improvements for MSR load/save infrastructure
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 2/9] x86/vmx: Internal cleanup " Andrew Cooper
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Collect together related infrastructure in vmcs.h, rather than having it
spread out.  Turn vmx_{read,write}_guest_msr() into static inlines, as they
are simple enough.

Replace 'int type' with 'enum vmx_msr_list_type', and use switch statements
internally.  Later changes are going to introduce a new type.

Rename the type identifiers for consistency with the other VMX_MSR_*
constants.
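
For illustration only (not part of the patch, and with msr/rc as placeholder
variables), a caller now names the target list via the enum rather than a
bare int:

    struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);

    if ( !ent )
        rc = vmx_add_msr(msr, VMX_MSR_GUEST);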

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * Extra const
 * Document the behaviour of the MSR list types
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 93 +++++++++++++++++---------------------
 xen/arch/x86/hvm/vmx/vmx.c         |  8 ++--
 xen/include/asm-x86/hvm/vmx/vmcs.h | 62 ++++++++++++++++++-------
 3 files changed, 91 insertions(+), 72 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 884c672..7f9dd48 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1283,22 +1283,26 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
     return 0;
 }
 
-struct vmx_msr_entry *vmx_find_msr(u32 msr, int type)
+struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     unsigned int msr_count;
-    struct vmx_msr_entry *msr_area;
+    struct vmx_msr_entry *msr_area = NULL;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
-        msr_count = curr->arch.hvm_vmx.msr_count;
-        msr_area = curr->arch.hvm_vmx.msr_area;
-    }
-    else
-    {
-        ASSERT(type == VMX_HOST_MSR);
+    case VMX_MSR_HOST:
         msr_count = curr->arch.hvm_vmx.host_msr_count;
         msr_area = curr->arch.hvm_vmx.host_msr_area;
+        break;
+
+    case VMX_MSR_GUEST:
+        msr_count = curr->arch.hvm_vmx.msr_count;
+        msr_area = curr->arch.hvm_vmx.msr_area;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
     }
 
     if ( msr_area == NULL )
@@ -1308,48 +1312,27 @@ struct vmx_msr_entry *vmx_find_msr(u32 msr, int type)
                    vmx_msr_entry_key_cmp);
 }
 
-int vmx_read_guest_msr(u32 msr, u64 *val)
-{
-    struct vmx_msr_entry *ent;
-
-    if ( (ent = vmx_find_msr(msr, VMX_GUEST_MSR)) != NULL )
-    {
-        *val = ent->data;
-        return 0;
-    }
-
-    return -ESRCH;
-}
-
-int vmx_write_guest_msr(u32 msr, u64 val)
-{
-    struct vmx_msr_entry *ent;
-
-    if ( (ent = vmx_find_msr(msr, VMX_GUEST_MSR)) != NULL )
-    {
-        ent->data = val;
-        return 0;
-    }
-
-    return -ESRCH;
-}
-
-int vmx_add_msr(u32 msr, int type)
+int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     unsigned int idx, *msr_count;
     struct vmx_msr_entry **msr_area, *msr_area_elem;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
-        msr_count = &curr->arch.hvm_vmx.msr_count;
-        msr_area = &curr->arch.hvm_vmx.msr_area;
-    }
-    else
-    {
-        ASSERT(type == VMX_HOST_MSR);
+    case VMX_MSR_HOST:
         msr_count = &curr->arch.hvm_vmx.host_msr_count;
         msr_area = &curr->arch.hvm_vmx.host_msr_area;
+        break;
+
+    case VMX_MSR_GUEST:
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
     }
 
     if ( *msr_area == NULL )
@@ -1357,13 +1340,17 @@ int vmx_add_msr(u32 msr, int type)
         if ( (*msr_area = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
 
-        if ( type == VMX_GUEST_MSR )
+        switch ( type )
         {
+        case VMX_MSR_HOST:
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            break;
+
+        case VMX_MSR_GUEST:
             __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
             __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            break;
         }
-        else
-            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
     for ( idx = 0; idx < *msr_count && (*msr_area)[idx].index <= msr; idx++ )
@@ -1382,16 +1369,18 @@ int vmx_add_msr(u32 msr, int type)
 
     ++*msr_count;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
+    case VMX_MSR_HOST:
+        rdmsrl(msr, msr_area_elem->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        break;
+
+    case VMX_MSR_GUEST:
         msr_area_elem->data = 0;
         __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
         __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
-    }
-    else
-    {
-        rdmsrl(msr, msr_area_elem->data);
-        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        break;
     }
 
     return 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4318b15..bfabcab 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4160,7 +4160,7 @@ static void lbr_tsx_fixup(void)
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
-    if ( (msr = vmx_find_msr(lbr_from_start, VMX_GUEST_MSR)) != NULL )
+    if ( (msr = vmx_find_msr(lbr_from_start, VMX_MSR_GUEST)) != NULL )
     {
         /*
          * Sign extend into bits 61:62 while preserving bit 63
@@ -4170,7 +4170,7 @@ static void lbr_tsx_fixup(void)
             msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
     }
 
-    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_GUEST_MSR)) != NULL )
+    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
         msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
 }
 
@@ -4193,8 +4193,8 @@ static void bdw_erratum_bdf14_fixup(void)
      * erratum BDF14. Fix up MSR_IA32_LASTINT{FROM,TO}IP by
      * sign-extending into bits 48:63.
      */
-    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_GUEST_MSR);
-    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_GUEST_MSR);
+    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
+    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
 }
 
 static void lbr_fixup(void)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 06c3179..20882d1 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -514,9 +514,6 @@ enum vmcs_field {
 
 #define VMCS_VPID_WIDTH 16
 
-#define VMX_GUEST_MSR 0
-#define VMX_HOST_MSR  1
-
 /* VM Instruction error numbers */
 enum vmx_insn_errno
 {
@@ -534,6 +531,52 @@ enum vmx_insn_errno
     VMX_INSN_FAIL_INVALID                  = ~0,
 };
 
+/* MSR load/save list infrastructure. */
+enum vmx_msr_list_type {
+    VMX_MSR_HOST,           /* MSRs loaded on VMExit.                   */
+    VMX_MSR_GUEST,          /* MSRs saved on VMExit, loaded on VMEntry. */
+};
+
+int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
+
+static inline int vmx_add_host_load_msr(uint32_t msr)
+{
+    return vmx_add_msr(msr, VMX_MSR_HOST);
+}
+
+static inline int vmx_add_guest_msr(uint32_t msr)
+{
+    return vmx_add_msr(msr, VMX_MSR_GUEST);
+}
+
+struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
+
+static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
+{
+    const struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);
+
+    if ( !ent )
+        return -ESRCH;
+
+    *val = ent->data;
+
+    return 0;
+}
+
+static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
+{
+    struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);
+
+    if ( !ent )
+        return -ESRCH;
+
+    ent->data = val;
+
+    return 0;
+}
+
+
+/* MSR intercept bitmap infrastructure. */
 enum vmx_msr_intercept_type {
     VMX_MSR_R  = 1,
     VMX_MSR_W  = 2,
@@ -544,10 +587,6 @@ void vmx_clear_msr_intercept(struct vcpu *v, unsigned int msr,
                              enum vmx_msr_intercept_type type);
 void vmx_set_msr_intercept(struct vcpu *v, unsigned int msr,
                            enum vmx_msr_intercept_type type);
-int vmx_read_guest_msr(u32 msr, u64 *val);
-int vmx_write_guest_msr(u32 msr, u64 val);
-struct vmx_msr_entry *vmx_find_msr(u32 msr, int type);
-int vmx_add_msr(u32 msr, int type);
 void vmx_vmcs_switch(paddr_t from, paddr_t to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
@@ -562,15 +601,6 @@ void virtual_vmcs_vmwrite(const struct vcpu *, u32 encoding, u64 val);
 enum vmx_insn_errno virtual_vmcs_vmwrite_safe(const struct vcpu *v,
                                               u32 vmcs_encoding, u64 val);
 
-static inline int vmx_add_guest_msr(u32 msr)
-{
-    return vmx_add_msr(msr, VMX_GUEST_MSR);
-}
-static inline int vmx_add_host_load_msr(u32 msr)
-{
-    return vmx_add_msr(msr, VMX_HOST_MSR);
-}
-
 DECLARE_PER_CPU(bool_t, vmxon);
 
 bool_t vmx_vcpu_pml_enabled(const struct vcpu *v);
-- 
2.1.4



* [PATCH v2 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

 * Use an arch_vmx_struct local variable to reduce later code volume.
 * Use start/total instead of msr_area/msr_count.  This is in preparation for
   more fine-grained handling in later changes.
 * Use ent/end pointers (again for preparation), and to make the vmx_add_msr()
   logic easier to follow.
 * Make the memory allocation block of vmx_add_msr() unlikely, and calculate
   virt_to_maddr() just once.

No practical change to functionality.
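
The resulting ent/end scan in vmx_add_msr() (lifted from the hunk below) reads:

    start = *ptr;
    end   = start + total;

    for ( ent = start; ent < end && ent->index <= msr; ++ent )
        if ( ent->index == msr )
            return 0;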

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Drop arch_ prefix from arch_vmx.
---
 xen/arch/x86/hvm/vmx/vmcs.c | 75 ++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7f9dd48..0b06b02 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1286,48 +1286,49 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
 struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
-    unsigned int msr_count;
-    struct vmx_msr_entry *msr_area = NULL;
+    struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx;
+    struct vmx_msr_entry *start = NULL;
+    unsigned int total;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        msr_count = curr->arch.hvm_vmx.host_msr_count;
-        msr_area = curr->arch.hvm_vmx.host_msr_area;
+        start    = vmx->host_msr_area;
+        total    = vmx->host_msr_count;
         break;
 
     case VMX_MSR_GUEST:
-        msr_count = curr->arch.hvm_vmx.msr_count;
-        msr_area = curr->arch.hvm_vmx.msr_area;
+        start    = vmx->msr_area;
+        total    = vmx->msr_count;
         break;
 
     default:
         ASSERT_UNREACHABLE();
     }
 
-    if ( msr_area == NULL )
+    if ( !start )
         return NULL;
 
-    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
-                   vmx_msr_entry_key_cmp);
+    return bsearch(&msr, start, total, sizeof(*start), vmx_msr_entry_key_cmp);
 }
 
 int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
-    unsigned int idx, *msr_count;
-    struct vmx_msr_entry **msr_area, *msr_area_elem;
+    struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx;
+    struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
+    unsigned int total;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        msr_count = &curr->arch.hvm_vmx.host_msr_count;
-        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+        ptr      = &vmx->host_msr_area;
+        total    = vmx->host_msr_count;
         break;
 
     case VMX_MSR_GUEST:
-        msr_count = &curr->arch.hvm_vmx.msr_count;
-        msr_area = &curr->arch.hvm_vmx.msr_area;
+        ptr      = &vmx->msr_area;
+        total    = vmx->msr_count;
         break;
 
     default:
@@ -1335,51 +1336,55 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         return -EINVAL;
     }
 
-    if ( *msr_area == NULL )
+    /* Allocate memory on first use. */
+    if ( unlikely(!*ptr) )
     {
-        if ( (*msr_area = alloc_xenheap_page()) == NULL )
+        paddr_t addr;
+
+        if ( (*ptr = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
 
+        addr = virt_to_maddr(*ptr);
+
         switch ( type )
         {
         case VMX_MSR_HOST:
-            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, addr);
             break;
 
         case VMX_MSR_GUEST:
-            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
-            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_EXIT_MSR_STORE_ADDR, addr);
+            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, addr);
             break;
         }
     }
 
-    for ( idx = 0; idx < *msr_count && (*msr_area)[idx].index <= msr; idx++ )
-        if ( (*msr_area)[idx].index == msr )
+    start = *ptr;
+    end   = start + total;
+
+    for ( ent = start; ent < end && ent->index <= msr; ++ent )
+        if ( ent->index == msr )
             return 0;
 
-    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
+    if ( total == (PAGE_SIZE / sizeof(*ent)) )
         return -ENOSPC;
 
-    memmove(*msr_area + idx + 1, *msr_area + idx,
-            sizeof(*msr_area_elem) * (*msr_count - idx));
-
-    msr_area_elem = *msr_area + idx;
-    msr_area_elem->index = msr;
-    msr_area_elem->mbz = 0;
+    memmove(ent + 1, ent, sizeof(*ent) * (end - ent));
 
-    ++*msr_count;
+    ent->index = msr;
+    ent->mbz = 0;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        rdmsrl(msr, msr_area_elem->data);
-        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        rdmsrl(msr, ent->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, ++vmx->host_msr_count);
         break;
 
     case VMX_MSR_GUEST:
-        msr_area_elem->data = 0;
-        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
-        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+        ent->data = 0;
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++vmx->msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_count);
         break;
     }
 
-- 
2.1.4



* [PATCH v2 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 2/9] x86/vmx: Internal cleanup " Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Instead of having multiple algorithms searching the MSR lists, implement a
single one.  It has the semantics required by vmx_add_msr(), to identify the
position in which an MSR should live, if it isn't already present.

There will be a marginal improvement for vmx_find_msr() by avoiding the
function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
vmx_add_msr() by using a binary search instead of a linear search.
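
As a sketch, the resulting caller contract is:

    ent = locate_msr_entry(start, end, msr);

    if ( (ent < end) && (ent->index == msr) )
        return ent;    /* An entry for msr already exists. */

    /* Otherwise, ent is the sorted insertion point, and may equal end. */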

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * Fix typo
---
 xen/arch/x86/hvm/vmx/vmcs.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0b06b02..d6ebcd6 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1270,24 +1270,36 @@ static int construct_vmcs(struct vcpu *v)
     return rc;
 }
 
-static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
+/*
+ * Search an MSR list looking for an MSR entry, or the slot in which it should
+ * live (to keep the data sorted) if an entry is not found.
+ *
+ * The return pointer is guaranteed to be bounded by start and end.  However,
+ * it may point at end, and may be invalid for the caller to dereference.
+ */
+static struct vmx_msr_entry *locate_msr_entry(
+    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
 {
-    const u32 *msr = key;
-    const struct vmx_msr_entry *entry = elt;
+    while ( start < end )
+    {
+        struct vmx_msr_entry *mid = start + (end - start) / 2;
 
-    if ( *msr > entry->index )
-        return 1;
-    if ( *msr < entry->index )
-        return -1;
+        if ( msr < mid->index )
+            end = mid;
+        else if ( msr > mid->index )
+            start = mid + 1;
+        else
+            return mid;
+    }
 
-    return 0;
+    return start;
 }
 
 struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx;
-    struct vmx_msr_entry *start = NULL;
+    struct vmx_msr_entry *start = NULL, *ent, *end;
     unsigned int total;
 
     switch ( type )
@@ -1309,7 +1321,10 @@ struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
     if ( !start )
         return NULL;
 
-    return bsearch(&msr, start, total, sizeof(*start), vmx_msr_entry_key_cmp);
+    end = start + total;
+    ent = locate_msr_entry(start, end, msr);
+
+    return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
 int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
@@ -1361,10 +1376,10 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 
     start = *ptr;
     end   = start + total;
+    ent   = locate_msr_entry(start, end, msr);
 
-    for ( ent = start; ent < end && ent->index <= msr; ++ent )
-        if ( ent->index == msr )
-            return 0;
+    if ( (ent < end) && (ent->index == msr) )
+        return 0;
 
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
         return -ENOSPC;
-- 
2.1.4



* [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  2:49   ` Tian, Kevin
  2018-06-12  8:06   ` Jan Beulich
  2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

At the moment, all modifications of the MSR lists are in current context.
However, future changes may need to put MSR_EFER into the lists from domctl
hypercall context.

Plumb a struct vcpu parameter down through the infrastructure, and use
vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().  Use
assertions to ensure that access is either in current context, or while the
vcpu is paused.

Note these expectations beside the fields in arch_vmx_struct, and reorder the
fields to avoid unnecessary padding.
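
A hypothetical remote caller (vcpu_pause()/vcpu_unpause() and MSR_EFER used
purely for illustration) would look like:

    /* domctl context: v is a paused vcpu, not current. */
    vcpu_pause(v);
    rc = vmx_add_guest_msr(v, MSR_EFER);
    vcpu_unpause(v);

which satisfies the ASSERT(v == current || !vcpu_runnable(v)) checks added
below.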

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

To preempt any questions about spinlocks: the use of the MSR lists on the
return-to-guest path causes check_lock() failures for plain spinlocks (despite
the usage technically being safe), and the call to alloc_xenheap_page() makes
it impossible to use the irqsave/restore variants, due to the nested
acquisition of the heap lock.

v2:
 * Remove paragraph from the commit message.
 * Constify vmx_find_msr().  This in turn requires correcting the prototype
   for vcpu_runnable()
---
 xen/arch/x86/cpu/vpmu_intel.c      | 14 ++++++-------
 xen/arch/x86/hvm/vmx/vmcs.c        | 40 ++++++++++++++++++++++++++++----------
 xen/arch/x86/hvm/vmx/vmx.c         | 22 +++++++++++----------
 xen/include/asm-x86/hvm/vmx/vmcs.h | 34 ++++++++++++++++++++------------
 xen/include/xen/sched.h            |  2 +-
 5 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index 207e2e7..c499e69 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -455,12 +455,12 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     if ( is_hvm_vcpu(v) )
     {
         wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
             goto out_err;
 
-        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
             goto out_err;
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
     }
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
@@ -613,7 +613,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             return -EINVAL;
 
         if ( is_hvm_vcpu(v) )
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+            vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
                                &core2_vpmu_cxt->global_ctrl);
         else
             rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
@@ -682,7 +682,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                 return -EINVAL;
 
             if ( is_hvm_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
                                    &core2_vpmu_cxt->global_ctrl);
             else
                 rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
@@ -701,7 +701,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     else
     {
         if ( is_hvm_vcpu(v) )
-            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
         else
             wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
     }
@@ -735,7 +735,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             if ( is_hvm_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
             else
                 rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index d6ebcd6..2541ee0 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1295,13 +1295,15 @@ static struct vmx_msr_entry *locate_msr_entry(
     return start;
 }
 
-struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
+struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
+                                   enum vmx_msr_list_type type)
 {
-    struct vcpu *curr = current;
-    struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx;
+    const struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry *start = NULL, *ent, *end;
     unsigned int total;
 
+    ASSERT(v == current || !vcpu_runnable(v));
+
     switch ( type )
     {
     case VMX_MSR_HOST:
@@ -1327,12 +1329,14 @@ struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
-int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
+int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
 {
-    struct vcpu *curr = current;
-    struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx;
+    struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
     unsigned int total;
+    int rc;
+
+    ASSERT(v == current || !vcpu_runnable(v));
 
     switch ( type )
     {
@@ -1351,13 +1355,18 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         return -EINVAL;
     }
 
+    vmx_vmcs_enter(v);
+
     /* Allocate memory on first use. */
     if ( unlikely(!*ptr) )
     {
         paddr_t addr;
 
         if ( (*ptr = alloc_xenheap_page()) == NULL )
-            return -ENOMEM;
+        {
+            rc = -ENOMEM;
+            goto out;
+        }
 
         addr = virt_to_maddr(*ptr);
 
@@ -1379,10 +1388,16 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
     ent   = locate_msr_entry(start, end, msr);
 
     if ( (ent < end) && (ent->index == msr) )
-        return 0;
+    {
+        rc = 0;
+        goto out;
+    }
 
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
-        return -ENOSPC;
+    {
+        rc = -ENOSPC;
+        goto out;
+    }
 
     memmove(ent + 1, ent, sizeof(*ent) * (end - ent));
 
@@ -1403,7 +1418,12 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         break;
     }
 
-    return 0;
+    rc = 0;
+
+ out:
+    vmx_vmcs_exit(v);
+
+    return rc;
 }
 
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bfabcab..25dd204 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2814,7 +2814,7 @@ static int is_last_branch_msr(u32 ecx)
 
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
-    const struct vcpu *curr = current;
+    struct vcpu *curr = current;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
@@ -2893,7 +2893,7 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
 
-        if ( vmx_read_guest_msr(msr, msr_content) == 0 )
+        if ( vmx_read_guest_msr(curr, msr, msr_content) == 0 )
             break;
 
         if ( is_last_branch_msr(msr) )
@@ -3105,7 +3105,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
                     {
                         vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
                         if ( lbr_tsx_fixup_needed )
@@ -3145,7 +3145,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( wrmsr_viridian_regs(msr, msr_content) ) 
             break;
 
-        if ( vmx_write_guest_msr(msr, msr_content) == 0 ||
+        if ( vmx_write_guest_msr(v, msr, msr_content) == 0 ||
              is_last_branch_msr(msr) )
             break;
 
@@ -4160,7 +4160,7 @@ static void lbr_tsx_fixup(void)
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
-    if ( (msr = vmx_find_msr(lbr_from_start, VMX_MSR_GUEST)) != NULL )
+    if ( (msr = vmx_find_msr(curr, lbr_from_start, VMX_MSR_GUEST)) != NULL )
     {
         /*
          * Sign extend into bits 61:62 while preserving bit 63
@@ -4170,20 +4170,22 @@ static void lbr_tsx_fixup(void)
             msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
     }
 
-    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
+    if ( (msr = vmx_find_msr(curr, lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
         msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
 }
 
-static void sign_extend_msr(u32 msr, int type)
+static void sign_extend_msr(struct vcpu *v, u32 msr, int type)
 {
     struct vmx_msr_entry *entry;
 
-    if ( (entry = vmx_find_msr(msr, type)) != NULL )
+    if ( (entry = vmx_find_msr(v, msr, type)) != NULL )
         entry->data = canonicalise_addr(entry->data);
 }
 
 static void bdw_erratum_bdf14_fixup(void)
 {
+    struct vcpu *curr = current;
+
     /*
      * Occasionally, on certain Broadwell CPUs MSR_IA32_LASTINTTOIP has
      * been observed to have the top three bits corrupted as though the
@@ -4193,8 +4195,8 @@ static void bdw_erratum_bdf14_fixup(void)
      * erratum BDF14. Fix up MSR_IA32_LASTINT{FROM,TO}IP by
      * sign-extending into bits 48:63.
      */
-    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
-    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
+    sign_extend_msr(curr, MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
+    sign_extend_msr(curr, MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
 }
 
 static void lbr_fixup(void)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 20882d1..62afebe 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -130,10 +130,17 @@ struct arch_vmx_struct {
     uint64_t             sfmask;
 
     struct vmx_msr_bitmap *msr_bitmap;
-    unsigned int         msr_count;
+
+    /*
+     * Most accesses to the MSR host/guest load/save lists are in current
+     * context.  However, the data can be modified by toolstack/migration
+     * actions.  Remote access is only permitted for paused vcpus, and is
+     * protected under the domctl lock.
+     */
     struct vmx_msr_entry *msr_area;
-    unsigned int         host_msr_count;
     struct vmx_msr_entry *host_msr_area;
+    unsigned int         msr_count;
+    unsigned int         host_msr_count;
 
     unsigned long        eoi_exitmap_changed;
     DECLARE_BITMAP(eoi_exit_bitmap, NR_VECTORS);
@@ -537,23 +544,25 @@ enum vmx_msr_list_type {
     VMX_MSR_GUEST,          /* MSRs saved on VMExit, loaded on VMEntry. */
 };
 
-int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
+int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
 
-static inline int vmx_add_host_load_msr(uint32_t msr)
+static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
 {
-    return vmx_add_msr(msr, VMX_MSR_HOST);
+    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
 }
 
-static inline int vmx_add_guest_msr(uint32_t msr)
+static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
 {
-    return vmx_add_msr(msr, VMX_MSR_GUEST);
+    return vmx_add_msr(v, msr, VMX_MSR_HOST);
 }
 
-struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
+struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
+                                   enum vmx_msr_list_type type);
 
-static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
+static inline int vmx_read_guest_msr(const struct vcpu *v, uint32_t msr,
+                                     uint64_t *val)
 {
-    const struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);
+    const struct vmx_msr_entry *ent = vmx_find_msr(v, msr, VMX_MSR_GUEST);
 
     if ( !ent )
         return -ESRCH;
@@ -563,9 +572,10 @@ static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
     return 0;
 }
 
-static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
+static inline int vmx_write_guest_msr(struct vcpu *v, uint32_t msr,
+                                      uint64_t val)
 {
-    struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);
+    struct vmx_msr_entry *ent = vmx_find_msr(v, msr, VMX_MSR_GUEST);
 
     if ( !ent )
         return -ESRCH;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 99d2af2..e79d5a3 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -788,7 +788,7 @@ static inline struct domain *next_domain_in_cpupool(
 #define _VPF_parked          8
 #define VPF_parked           (1UL<<_VPF_parked)
 
-static inline int vcpu_runnable(struct vcpu *v)
+static inline bool vcpu_runnable(const struct vcpu *v)
 {
     return !(v->pause_flags |
              atomic_read(&v->pause_count) |
-- 
2.1.4



* [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  2:58   ` Tian, Kevin
                     ` (2 more replies)
  2018-06-08 18:48 ` [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_add_msr() Andrew Cooper
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

The main purpose of this patch is to only ever insert the LBR MSRs into the
guest load/save list once, as a future patch wants to change the behaviour of
vmx_add_guest_msr().

The repeated processing of lbr_info and the guest's MSR load/save list is
redundant, and a guest using LBR itself will have to re-enable
MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
redundant processing every time the guest takes a debug exception.

Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
bit to indicate that the MSRs have been inserted into the load/save list.
Shorten the existing FIXUP* identifiers to reduce code volume.

Finally, the enablement of the fixups needs to be calculated only once, rather
than inside a doubly-nested loop.
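
For illustration, after the first guest write setting MSR_DEBUGCTL.LBR on a
TSX-affected part, the vcpu is left with:

    v->arch.hvm_vmx.lbr_flags == (LBR_MSRS_INSERTED | LBR_FIXUP_TSX)

and subsequent DEBUGCTL writes skip the list insertion entirely, because
LBR_MSRS_INSERTED is already set.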

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * New
---
 xen/arch/x86/hvm/vmx/vmx.c         | 25 +++++++++++--------------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  7 ++++++-
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 25dd204..35f0e90 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2750,9 +2750,6 @@ enum
 
 #define LBR_FROM_SIGNEXT_2MSB  ((1ULL << 59) | (1ULL << 60))
 
-#define FIXUP_LBR_TSX            (1u << 0)
-#define FIXUP_BDW_ERRATUM_BDF14  (1u << 1)
-
 static bool __read_mostly lbr_tsx_fixup_needed;
 static bool __read_mostly bdw_erratum_bdf14_fixup_needed;
 static uint32_t __read_mostly lbr_from_start;
@@ -3097,7 +3094,8 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
             if ( vpmu_do_wrmsr(msr, msr_content, supported) )
                 break;
         }
-        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
+        if ( (msr_content & IA32_DEBUGCTLMSR_LBR) &&
+             !(v->arch.hvm_vmx.lbr_flags & LBR_MSRS_INSERTED) )
         {
             const struct lbr_info *lbr = last_branch_msr_get();
             if ( lbr == NULL )
@@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
                     if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
-                    {
                         vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
-                        if ( lbr_tsx_fixup_needed )
-                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
-                        if ( bdw_erratum_bdf14_fixup_needed )
-                            v->arch.hvm_vmx.lbr_fixup_enabled |=
-                                FIXUP_BDW_ERRATUM_BDF14;
-                    }
+
+            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
+            if ( lbr_tsx_fixup_needed )
+                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
+            if ( bdw_erratum_bdf14_fixup_needed )
+                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
         }
 
         if ( rc < 0 )
@@ -4203,9 +4200,9 @@ static void lbr_fixup(void)
 {
     struct vcpu *curr = current;
 
-    if ( curr->arch.hvm_vmx.lbr_fixup_enabled & FIXUP_LBR_TSX )
+    if ( curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_TSX )
         lbr_tsx_fixup();
-    if ( curr->arch.hvm_vmx.lbr_fixup_enabled & FIXUP_BDW_ERRATUM_BDF14 )
+    if ( curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_BDF14 )
         bdw_erratum_bdf14_fixup();
 }
 
@@ -4273,7 +4270,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     }
 
  out:
-    if ( unlikely(curr->arch.hvm_vmx.lbr_fixup_enabled) )
+    if ( unlikely(curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();
 
     HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 62afebe..37825ad 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -156,7 +156,12 @@ struct arch_vmx_struct {
     /* Are we emulating rather than VMENTERing? */
     uint8_t              vmx_emulate;
 
-    uint8_t              lbr_fixup_enabled;
+    /* Flags for LBR MSRs in the load/save lists. */
+#define LBR_MSRS_INSERTED  (1u << 0)
+#define LBR_FIXUP_TSX      (1u << 1)
+#define LBR_FIXUP_BDF14    (1u << 2)
+#define LBR_FIXUP_MASK     (LBR_FIXUP_TSX | LBR_FIXUP_BDF14)
+    uint8_t              lbr_flags;
 
     /* Bitmask of segments that we can't safely use in virtual 8086 mode */
     uint16_t             vm86_segment_mask;
-- 
2.1.4



* [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_add_msr()
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (4 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  3:11   ` Tian, Kevin
  2018-06-12  8:19   ` Jan Beulich
  2018-06-08 18:48 ` [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

The main purpose of this change is to allow us to set a specific MSR value,
without needing to know whether there is already a load/save list slot for it.

Previously, callers wanting this property needed to call both vmx_add_*_msr()
and vmx_write_*_msr() to cover both cases, and there are no callers which want
the old behaviour of being a no-op if an entry already existed for the MSR.

As a result of this API improvement, the default value for guest MSRs need not
be 0, and the default value for host MSRs need not be read back from the
hardware register.
In practice, this cleans up the VPMU allocation logic, and avoids an MSR read
as part of vcpu construction.
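
As a sketch (error handling elided), the vPMU allocation logic in the hunk
below changes from:

    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
    vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL);
    vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL);
    vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);

to simply:

    vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
    vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);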

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Document vmx_add_msr()'s behaviour.
 * Update commit message to indicate that noone wants the old behaviour.
---
 xen/arch/x86/cpu/vpmu_intel.c      |  6 ++----
 xen/arch/x86/hvm/vmx/vmcs.c        | 14 +++++++-------
 xen/arch/x86/hvm/vmx/vmx.c         |  2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 20 ++++++++++++++------
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index c499e69..1fc79c9 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -454,13 +454,11 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( is_hvm_vcpu(v) )
     {
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0) )
             goto out_err;
 
-        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0) )
             goto out_err;
-        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
     }
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 2541ee0..c43b578 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1329,7 +1329,8 @@ struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
-int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
+int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
+                enum vmx_msr_list_type type)
 {
     struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
@@ -1388,11 +1389,9 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
     ent   = locate_msr_entry(start, end, msr);
 
     if ( (ent < end) && (ent->index == msr) )
-    {
-        rc = 0;
-        goto out;
-    }
+        goto found;
 
+    /* If there isn't an existing entry for msr, insert room for one. */
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
     {
         rc = -ENOSPC;
@@ -1407,17 +1406,18 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
     switch ( type )
     {
     case VMX_MSR_HOST:
-        rdmsrl(msr, ent->data);
         __vmwrite(VM_EXIT_MSR_LOAD_COUNT, ++vmx->host_msr_count);
         break;
 
     case VMX_MSR_GUEST:
-        ent->data = 0;
         __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++vmx->msr_count);
         __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_count);
         break;
     }
 
+    /* Set the msr's value. */
+ found:
+    ent->data = val;
     rc = 0;
 
  out:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 35f0e90..5ca0658 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3103,7 +3103,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i, 0)) == 0 )
                         vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
 
             v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 37825ad..b4d23a1 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -549,16 +549,24 @@ enum vmx_msr_list_type {
     VMX_MSR_GUEST,          /* MSRs saved on VMExit, loaded on VMEntry. */
 };
 
-int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
+/**
+ * Add an MSR to an MSR list (inserting space for the entry if necessary), and
+ * set the MSRs value.
+ *
+ * May fail if unable to allocate memory for the list, or the total number of
+ * entries exceeds the memory allocated.
+ */
+int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
+                enum vmx_msr_list_type type);
 
-static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
+static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
-    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
+    return vmx_add_msr(v, msr, val, VMX_MSR_GUEST);
 }
-
-static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
+static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr,
+                                        uint64_t val)
 {
-    return vmx_add_msr(v, msr, VMX_MSR_HOST);
+    return vmx_add_msr(v, msr, val, VMX_MSR_HOST);
 }
 
 struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
-- 
2.1.4



* [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (5 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_add_msr() Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  3:22   ` Tian, Kevin
  2018-06-08 18:48 ` [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
  2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  8 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Kevin Tian, Wei Liu, Jun Nakajima

Currently, the VMX_MSR_GUEST type maintains completely symmetric guest load
and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and VM_ENTRY_MSR_LOAD_ADDR
at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
VM_ENTRY_MSR_LOAD_COUNT to the same value.

However, for MSRs which we won't let the guest have direct access to, having
hardware save the current value on VMExit is unnecessary overhead.

To avoid this overhead, we must make the load and save lists asymmetric.  By
making the entry load count greater than the exit store count, we can maintain
two adjacent lists of MSRs, the first of which is saved and restored, and the
second of which is only restored on VMEntry.

For simplicity:
 * Both adjacent lists are still sorted by MSR index.
 * It is undefined behaviour to insert the same MSR into both lists.
 * The total size of both lists is still limited at 256 entries (one 4k page).

Split the current msr_count field into msr_{load,save}_count, introduce a new
VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr() to calculate
which sublist to search, based on type.  VMX_MSR_HOST has no logical sublist,
whereas VMX_MSR_GUEST has a sublist between 0 and the save count, while
VMX_MSR_GUEST_LOADONLY has a sublist between the save count and the load
count.

One subtle point is that inserting an MSR into the load-save list involves
moving the entire load-only list, and updating both counts.
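
For illustration, the guest MSR page is laid out as two adjacent sorted
sublists:

    msr_area[0 .. msr_save_count)                VMX_MSR_GUEST
                                                 (saved on VMExit, loaded on VMEntry)
    msr_area[msr_save_count .. msr_load_count)   VMX_MSR_GUEST_LOADONLY
                                                 (loaded on VMEntry only)

    VM_EXIT_MSR_STORE_COUNT  = msr_save_count
    VM_ENTRY_MSR_LOAD_COUNT  = msr_load_count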

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 46 +++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/vmx/vmx.c         |  2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  7 +++++-
 3 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index c43b578..4fb3043 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1300,7 +1300,7 @@ struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
 {
     const struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry *start = NULL, *ent, *end;
-    unsigned int total;
+    unsigned int substart, subend, total;
 
     ASSERT(v == current || !vcpu_runnable(v));
 
@@ -1308,12 +1308,23 @@ struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
     {
     case VMX_MSR_HOST:
         start    = vmx->host_msr_area;
-        total    = vmx->host_msr_count;
+        substart = 0;
+        subend   = vmx->host_msr_count;
+        total    = subend;
         break;
 
     case VMX_MSR_GUEST:
         start    = vmx->msr_area;
-        total    = vmx->msr_count;
+        substart = 0;
+        subend   = vmx->msr_save_count;
+        total    = vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        start    = vmx->msr_area;
+        substart = vmx->msr_save_count;
+        subend   = vmx->msr_load_count;
+        total    = subend;
         break;
 
     default:
@@ -1324,7 +1335,7 @@ struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
         return NULL;
 
     end = start + total;
-    ent = locate_msr_entry(start, end, msr);
+    ent = locate_msr_entry(start + substart, start + subend, msr);
 
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
@@ -1334,7 +1345,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
 {
     struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
-    unsigned int total;
+    unsigned int substart, subend, total;
     int rc;
 
     ASSERT(v == current || !vcpu_runnable(v));
@@ -1343,12 +1354,23 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
     {
     case VMX_MSR_HOST:
         ptr      = &vmx->host_msr_area;
-        total    = vmx->host_msr_count;
+        substart = 0;
+        subend   = vmx->host_msr_count;
+        total    = subend;
         break;
 
     case VMX_MSR_GUEST:
         ptr      = &vmx->msr_area;
-        total    = vmx->msr_count;
+        substart = 0;
+        subend   = vmx->msr_save_count;
+        total    = vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        ptr      = &vmx->msr_area;
+        substart = vmx->msr_save_count;
+        subend   = vmx->msr_load_count;
+        total    = subend;
         break;
 
     default:
@@ -1378,6 +1400,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
             break;
 
         case VMX_MSR_GUEST:
+        case VMX_MSR_GUEST_LOADONLY:
             __vmwrite(VM_EXIT_MSR_STORE_ADDR, addr);
             __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, addr);
             break;
@@ -1386,7 +1409,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
 
     start = *ptr;
     end   = start + total;
-    ent   = locate_msr_entry(start, end, msr);
+    ent   = locate_msr_entry(start + substart, start + subend, msr);
 
     if ( (ent < end) && (ent->index == msr) )
         goto found;
@@ -1410,8 +1433,11 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
         break;
 
     case VMX_MSR_GUEST:
-        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++vmx->msr_count);
-        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_count);
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++vmx->msr_save_count);
+
+        /* Fallthrough */
+    case VMX_MSR_GUEST_LOADONLY:
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, ++vmx->msr_load_count);
         break;
     }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5ca0658..b167fde 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4153,7 +4153,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 static void lbr_tsx_fixup(void)
 {
     struct vcpu *curr = current;
-    unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
+    unsigned int msr_count = curr->arch.hvm_vmx.msr_save_count;
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b4d23a1..3764982 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -139,7 +139,8 @@ struct arch_vmx_struct {
      */
     struct vmx_msr_entry *msr_area;
     struct vmx_msr_entry *host_msr_area;
-    unsigned int         msr_count;
+    unsigned int         msr_load_count;
+    unsigned int         msr_save_count;
     unsigned int         host_msr_count;
 
     unsigned long        eoi_exitmap_changed;
@@ -547,12 +548,16 @@ enum vmx_insn_errno
 enum vmx_msr_list_type {
     VMX_MSR_HOST,           /* MSRs loaded on VMExit.                   */
     VMX_MSR_GUEST,          /* MSRs saved on VMExit, loaded on VMEntry. */
+    VMX_MSR_GUEST_LOADONLY, /* MSRs loaded on VMEntry only.             */
 };
 
 /**
  * Add an MSR to an MSR list (inserting space for the entry if necessary), and
  * set the MSRs value.
  *
+ * It is undefined behaviour to try to insert the same MSR into both the
+ * GUEST and GUEST_LOADONLY lists.
+ *
  * May fail if unable to allocate memory for the list, or the total number of
  * entries exceeds the memory allocated.
  */
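
For clarity, the resulting layout of the guest MSR page looks something like
this (the counts below are illustrative assumptions for the example, not
values taken from the code):

    /*
     * Example: msr_save_count == 2, msr_load_count == 4.
     *
     *   msr_area[0..1]  VMX_MSR_GUEST           saved on VMExit, loaded on VMEntry
     *   msr_area[2..3]  VMX_MSR_GUEST_LOADONLY  loaded on VMEntry only
     *
     *   VM_EXIT_MSR_STORE_COUNT  = msr_save_count = 2
     *   VM_ENTRY_MSR_LOAD_COUNT  = msr_load_count = 4
     *
     * Each sublist is kept sorted by MSR index, and inserting into the GUEST
     * sublist shifts the whole LOADONLY sublist up by one entry.
     */
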
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (6 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  3:24   ` Tian, Kevin
  2018-06-12  8:27   ` Jan Beulich
  2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  8 siblings, 2 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Up until this point, the MSR load/save lists have only ever accumulated
content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * s/arch_vmx/vmx/ as per patch 2 review
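
As an illustration of the intended pairing (a minimal sketch modelled on how
patch 9 later uses this for MSR_EFER; the helper name is invented for the
example, and error handling is elided):

    /* Keep MSR_EFER in the lists only while the guest's value differs. */
    static void efer_list_example(struct vcpu *v, uint64_t guest_efer,
                                  uint64_t xen_efer)
    {
        if ( guest_efer == xen_efer )
        {
            /* Identical values: drop both entries to save vmentry/exit work. */
            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
        }
        else
        {
            /* Differing values: insert (or update) both entries. */
            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
        }
    }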
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 68 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  6 ++++
 2 files changed, 74 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4fb3043..0ebccc4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1452,6 +1452,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
     return rc;
 }
 
+int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
+{
+    struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
+    struct vmx_msr_entry *start = NULL, *ent, *end;
+    unsigned int substart, subend, total;
+
+    ASSERT(v == current || !vcpu_runnable(v));
+
+    switch ( type )
+    {
+    case VMX_MSR_HOST:
+        start    = vmx->host_msr_area;
+        substart = 0;
+        subend   = vmx->host_msr_count;
+        total    = subend;
+        break;
+
+    case VMX_MSR_GUEST:
+        start    = vmx->msr_area;
+        substart = 0;
+        subend   = vmx->msr_save_count;
+        total    = vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        start    = vmx->msr_area;
+        substart = vmx->msr_save_count;
+        subend   = vmx->msr_load_count;
+        total    = subend;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
+
+    if ( !start )
+        return -ESRCH;
+
+    end = start + total;
+    ent = locate_msr_entry(start + substart, start + subend, msr);
+
+    if ( (ent == end) || (ent->index != msr) )
+        return -ESRCH;
+
+    memmove(ent, ent + 1, sizeof(*ent) * (end - ent));
+
+    vmx_vmcs_enter(v);
+
+    switch ( type )
+    {
+    case VMX_MSR_HOST:
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, vmx->host_msr_count--);
+        break;
+
+    case VMX_MSR_GUEST:
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, vmx->msr_save_count--);
+
+        /* Fallthrough */
+    case VMX_MSR_GUEST_LOADONLY:
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_load_count--);
+        break;
+    }
+
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
 {
     if ( !test_and_set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap) )
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 3764982..6629b4f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -564,6 +564,12 @@ enum vmx_msr_list_type {
 int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
                 enum vmx_msr_list_type type);
 
+/**
+ * Remove an MSR entry from an MSR list.  Returns -ESRCH if the MSR was not
+ * found in the list.
+ */
+int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
+
 static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
     return vmx_add_msr(v, msr, val, VMX_MSR_GUEST);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (7 preceding siblings ...)
  2018-06-08 18:48 ` [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
@ 2018-06-08 18:48 ` Andrew Cooper
  2018-06-12  6:04   ` Tian, Kevin
                     ` (2 more replies)
  8 siblings, 3 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-08 18:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Tim Deegan,
	Jun Nakajima, Roger Pau Monné

Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.

SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
and vmx_update_guest_efer(), and works by altering the host SCE value to match
the setting the guest wants.  This works because, in HVM vcpu context, Xen
never needs to execute a SYSCALL or SYSRET instruction.

However, NXE has never been context switched.  Unlike SCE, NXE cannot be
context switched at vcpu boundaries, because disabling NXE makes PTE.NX bits
reserved, causing a pagefault when they are encountered.  This means that the
guest always has Xen's setting in effect, irrespective of the bit it can see
and modify in its virtualised view of MSR_EFER.

This isn't a major problem for production operating systems because they, like
Xen, always turn NXE on when it is available.  However, it does have an
observable effect on which guest PTE bits are valid, and whether
PFEC_insn_fetch is visible in a #PF error code.

Second generation VT-x hardware has host and guest EFER fields in the VMCS,
and support for loading and saving them automatically.  First generation VT-x
hardware needs to use MSR load/save lists to cause an atomic switch of
MSR_EFER on vmentry/exit.

Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
support when available (and MSR load/save lists on older hardware) and drop
all ad-hoc alteration of SCE.

There are two minor complications when selecting the EFER setting:
 * For shadow guests, NXE is a paging setting and must remain under host
   control, but this is fine as Xen also handles the pagefaults.
 * When the Unrestricted Guest control is clear, hardware doesn't tolerate LME
   and LMA being different.  This doesn't matter in practice as we intercept
   all writes to CR0 and reads from MSR_EFER, so can provide architecturally
   consistent behaviour from the guest's point of view.

With the change to how EFER is loaded, vmcs_dump_vcpu() needs adjusting.  Read
EFER from the appropriate information source, and identify, when dumping the
guest EFER value, which source was used.

As a result of fixing EFER context switching, we can remove the Intel-special
case from hvm_nx_enabled() and let guest_walk_tables() work with the real
guest paging settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Fix a vmentry failure on Nehalem/Westmere hardware.  LME != LMA is actually
   dependent on Unrestricted Guest (which is clear for Shadow guests as a side
   effect of clearing EPT).
 * Fix vmcs_dump_vcpu() to cope with the new EFER behaviour.
---
 xen/arch/x86/domain.c              | 10 -----
 xen/arch/x86/hvm/vmx/vmcs.c        | 26 +++++++++----
 xen/arch/x86/hvm/vmx/vmx.c         | 78 ++++++++++++++++++++++++++++----------
 xen/include/asm-x86/hvm/hvm.h      |  4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 16 ++++++++
 5 files changed, 95 insertions(+), 39 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0ca820a..e817795 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1723,16 +1723,6 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     {
         __context_switch();
 
-        if ( is_pv_domain(nextd) &&
-             (is_idle_domain(prevd) ||
-              is_hvm_domain(prevd) ||
-              is_pv_32bit_domain(prevd) != is_pv_32bit_domain(nextd)) )
-        {
-            uint64_t efer = read_efer();
-            if ( !(efer & EFER_SCE) )
-                write_efer(efer | EFER_SCE);
-        }
-
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0ebccc4..9e4e29d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -342,8 +342,8 @@ static int vmx_init_vmcs_config(void)
     }
 
     min = VM_EXIT_ACK_INTR_ON_EXIT;
-    opt = VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
-          VM_EXIT_CLEAR_BNDCFGS;
+    opt = (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
+           VM_EXIT_LOAD_HOST_EFER | VM_EXIT_CLEAR_BNDCFGS);
     min |= VM_EXIT_IA32E_MODE;
     _vmx_vmexit_control = adjust_vmx_controls(
         "VMExit Control", min, opt, MSR_IA32_VMX_EXIT_CTLS, &mismatch);
@@ -383,7 +383,8 @@ static int vmx_init_vmcs_config(void)
         _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
 
     min = 0;
-    opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
+    opt = (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER |
+           VM_ENTRY_LOAD_BNDCFGS);
     _vmx_vmentry_control = adjust_vmx_controls(
         "VMEntry Control", min, opt, MSR_IA32_VMX_ENTRY_CTLS, &mismatch);
 
@@ -1148,6 +1149,8 @@ static int construct_vmcs(struct vcpu *v)
     v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
     __vmwrite(HOST_CR4, mmu_cr4_features);
+    if ( cpu_has_vmx_efer )
+        __vmwrite(HOST_EFER, read_efer());
 
     /* Host CS:RIP. */
     __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
@@ -1905,7 +1908,17 @@ void vmcs_dump_vcpu(struct vcpu *v)
     vmentry_ctl = vmr32(VM_ENTRY_CONTROLS),
     vmexit_ctl = vmr32(VM_EXIT_CONTROLS);
     cr4 = vmr(GUEST_CR4);
-    efer = vmr(GUEST_EFER);
+
+    /*
+     * The guest's EFER setting comes from the GUEST_EFER VMCS field whenever
+     * available, or the guest load-only MSR list on Gen1 hardware, the entry
+     * for which may be elided for performance reasons if identical to Xen's
+     * setting.
+     */
+    if ( cpu_has_vmx_efer )
+        efer = vmr(GUEST_EFER);
+    else if ( vmx_read_guest_loadonly_msr(v, MSR_EFER, &efer) )
+        efer = read_efer();
 
     printk("*** Guest State ***\n");
     printk("CR0: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n",
@@ -1942,9 +1955,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
     vmx_dump_sel("LDTR", GUEST_LDTR_SELECTOR);
     vmx_dump_sel2("IDTR", GUEST_IDTR_LIMIT);
     vmx_dump_sel("  TR", GUEST_TR_SELECTOR);
-    if ( (vmexit_ctl & (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_SAVE_GUEST_EFER)) ||
-         (vmentry_ctl & (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER)) )
-        printk("EFER = 0x%016lx  PAT = 0x%016lx\n", efer, vmr(GUEST_PAT));
+    printk("EFER(%s) = 0x%016lx  PAT = 0x%016lx\n",
+           cpu_has_vmx_efer ? "VMCS" : "MSR LL", efer, vmr(GUEST_PAT));
     printk("PreemptionTimer = 0x%08x  SM Base = 0x%08x\n",
            vmr32(GUEST_PREEMPTION_TIMER), vmr32(GUEST_SMBASE));
     printk("DebugCtl = 0x%016lx  DebugExceptions = 0x%016lx\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b167fde..5d04e45 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -509,15 +509,6 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     wrmsrl(MSR_LSTAR,          v->arch.hvm_vmx.lstar);
     wrmsrl(MSR_SYSCALL_MASK,   v->arch.hvm_vmx.sfmask);
 
-    if ( (v->arch.hvm_vcpu.guest_efer ^ read_efer()) & EFER_SCE )
-    {
-        HVM_DBG_LOG(DBG_LEVEL_2,
-                    "restore guest's EFER with value %lx",
-                    v->arch.hvm_vcpu.guest_efer);
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
-    }
-
     if ( cpu_has_rdtscp )
         wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
@@ -1646,22 +1637,71 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
 
 static void vmx_update_guest_efer(struct vcpu *v)
 {
-    unsigned long vm_entry_value;
+    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
+        xen_efer = read_efer();
+
+    if ( paging_mode_shadow(v->domain) )
+    {
+        /*
+         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
+         * under guest control.
+         */
+        guest_efer &= ~EFER_NX;
+        guest_efer |= xen_efer & EFER_NX;
+    }
+
+    if ( !(v->arch.hvm_vmx.secondary_exec_control &
+           SECONDARY_EXEC_UNRESTRICTED_GUEST) )
+    {
+        /*
+         * When Unrestricted Guest is not enabled in the VMCS, hardware does
+         * not tolerate the LME and LMA settings being different.  As writes
+         * to CR0 are intercepted, it is safe to leave LME clear at this
+         * point, and fix up both LME and LMA when CR0.PG is set.
+         */
+        if ( !(guest_efer & EFER_LMA) )
+            guest_efer &= ~EFER_LME;
+    }
 
     vmx_vmcs_enter(v);
 
-    __vmread(VM_ENTRY_CONTROLS, &vm_entry_value);
-    if ( v->arch.hvm_vcpu.guest_efer & EFER_LMA )
-        vm_entry_value |= VM_ENTRY_IA32E_MODE;
+    /*
+     * The intended guest running mode is derived from VM_ENTRY_IA32E_MODE,
+     * which (architecturally) is the guest's LMA setting.
+     */
+    __vmread(VM_ENTRY_CONTROLS, &entry_ctls);
+
+    entry_ctls &= ~VM_ENTRY_IA32E_MODE;
+    if ( guest_efer & EFER_LMA )
+        entry_ctls |= VM_ENTRY_IA32E_MODE;
+
+    __vmwrite(VM_ENTRY_CONTROLS, entry_ctls);
+
+    /* We expect to use EFER loading in the common case, but... */
+    if ( likely(cpu_has_vmx_efer) )
+        __vmwrite(GUEST_EFER, guest_efer);
+
+    /* ... on Gen1 VT-x hardware, we have to use MSR load/save lists instead. */
     else
-        vm_entry_value &= ~VM_ENTRY_IA32E_MODE;
-    __vmwrite(VM_ENTRY_CONTROLS, vm_entry_value);
+    {
+        /*
+         * When the guest's choice of EFER matches Xen's, remove the load/save
+         * list entries.  It is unnecessary overhead, especially as this is
+         * expected to be the common case for 64bit guests.
+         */
+        if ( guest_efer == xen_efer )
+        {
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
+        }
+        else
+        {
+            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
+            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
+        }
+    }
 
     vmx_vmcs_exit(v);
-
-    if ( v == current )
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
 }
 
 void nvmx_enqueue_n2_exceptions(struct vcpu *v, 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index ef5e198..fcfc5cf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -296,10 +296,8 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMEP))
 #define hvm_smap_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
-/* HVM guests on Intel hardware leak Xen's NX settings into guest context. */
 #define hvm_nx_enabled(v) \
-    ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && cpu_has_nx) ||    \
-     ((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
+    ((v)->arch.hvm_vcpu.guest_efer & EFER_NX)
 #define hvm_pku_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6629b4f..d4b59a0 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -311,6 +311,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
 #define cpu_has_vmx_pat \
     (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
+#define cpu_has_vmx_efer \
+    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
 #define cpu_has_vmx_unrestricted_guest \
     (vmx_secondary_exec_control & SECONDARY_EXEC_UNRESTRICTED_GUEST)
 #define vmx_unrestricted_guest(v)               \
@@ -596,6 +598,20 @@ static inline int vmx_read_guest_msr(const struct vcpu *v, uint32_t msr,
     return 0;
 }
 
+static inline int vmx_read_guest_loadonly_msr(
+    const struct vcpu *v, uint32_t msr, uint64_t *val)
+{
+    const struct vmx_msr_entry *ent =
+        vmx_find_msr(v, msr, VMX_MSR_GUEST_LOADONLY);
+
+    if ( !ent )
+        return -ESRCH;
+
+    *val = ent->data;
+
+    return 0;
+}
+
 static inline int vmx_write_guest_msr(struct vcpu *v, uint32_t msr,
                                       uint64_t val)
 {
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists
  2018-06-08 18:48 ` [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
@ 2018-06-12  2:49   ` Tian, Kevin
  2018-06-12  8:06   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  2:49 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> At the moment, all modifications of the MSR lists are in current context.
> However, future changes may need to put MSR_EFER into the lists from
> domctl
> hypercall context.
> 
> Plumb a struct vcpu parameter down through the infrastructure, and use
> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().
> Use
> assertions to ensure that access is either in current context, or while the
> vcpu is paused.
> 
> Note these expectations beside the fields in arch_vmx_struct, and reorder
> the
> fields to avoid unnecessary padding.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
@ 2018-06-12  2:58   ` Tian, Kevin
  2018-06-12  8:15   ` Jan Beulich
  2018-06-27  8:43   ` [PATCH v3 " Andrew Cooper
  2 siblings, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  2:58 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> The main purpose of this patch is to only ever insert the LBR MSRs into the
> guest load/save list once, as a future patch wants to change the behaviour
> of
> vmx_add_guest_msr().
> 
> The repeated processing of lbr_info and the guests MSR load/save list is
> redundant, and a guest using LBR itself will have to re-enable
> MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
> redundant processing every time the guest gets a debug exception.
> 
> Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use
> one
> bit to indicate that the MSRs have been inserted into the load/save list.
> Shorten the existing FIXUP* identifiers to reduce code volume.
> 
> Finally, the enablement of the fixups only need to be calculated once,
> rather
> than inside a doubly nested loop.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_msr_add()
  2018-06-08 18:48 ` [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_msr_add() Andrew Cooper
@ 2018-06-12  3:11   ` Tian, Kevin
  2018-06-12  8:19   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  3:11 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> The main purpose of this change is to allow us to set a specific MSR value,
> without needing to know whether there is already a load/save list slot for it.
> 
> Previously, callers wanting this property needed to call both
> vmx_add_*_msr()
> and vmx_write_*_msr() to cover both cases, and there are no callers which
> want
> the old behaviour of being a no-op if an entry already existed for the MSR.
> 
> As a result of this API improvement, the default value for guest MSRs need
> not
> be 0, and the default for host MSRs need not be passed via hardware
> register.
> In practice, this cleans up the VPMU allocation logic, and avoids an MSR
> read
> as part of vcpu construction.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries
  2018-06-08 18:48 ` [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
@ 2018-06-12  3:22   ` Tian, Kevin
  0 siblings, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  3:22 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Wei Liu, Nakajima, Jun

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> Currently, the VMX_MSR_GUEST type maintains completely symmetric
> guest load
> and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and
> VM_ENTRY_MSR_LOAD_ADDR
> at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
> VM_ENTRY_MSR_LOAD_COUNT to the same value.
> 
> However, for MSRs which we won't let the guest have direct access to,
> having
> hardware save the current value on VMExit is unnecessary overhead.
> 
> To avoid this overhead, we must make the load and save lists asymmetric.
> By
> making the entry load count greater than the exit store count, we can
> maintain
> two adjacent lists of MSRs, the first of which is saved and restored, and the
> second of which is only restored on VMEntry.
> 
> For simplicity:
>  * Both adjacent lists are still sorted by MSR index.
>  * It undefined behaviour to insert the same MSR into both lists.
>  * The total size of both lists is still limited at 256 entries (one 4k page).
> 
> Split the current msr_count field into msr_{load,save}_count, and introduce
> a
> new VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr()
> to calculate
> which sublist to search, based on type.  VMX_MSR_HOST has no logical
> sublist,
> whereas VMX_MSR_GUEST has a sublist between 0 and the save count,
> while
> VMX_MSR_GUEST_LOADONLY has a sublist between the save count and
> the load
> count.
> 
> One subtle point is that inserting an MSR into the load-save list involves
> moving the entire load-only list, and updating both counts.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-08 18:48 ` [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
@ 2018-06-12  3:24   ` Tian, Kevin
  2018-06-12  8:27   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  3:24 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> Up until this point, the MSR load/save lists have only ever accumulated
> content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
@ 2018-06-12  6:04   ` Tian, Kevin
  2018-06-12  8:54   ` Jan Beulich
  2018-06-13 11:19   ` [PATCH v3 " Andrew Cooper
  2 siblings, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2018-06-12  6:04 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Tim Deegan, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, June 9, 2018 2:49 AM
> 
> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
> 
> SCE is handled by ad-hoc logic in context_switch(),
> vmx_restore_guest_msrs()
> and vmx_update_guest_efer(), and works by altering the host SCE value to
> match
> the setting the guest wants.  This works because, in HVM vcpu context, Xen
> never needs to execute a SYSCALL or SYSRET instruction.
> 
> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
> context switched at vcpu boundaries because disabling NXE makes PTE.NX
> bits
> reserved and cause a pagefault when encountered.  This means that the
> guest
> always has Xen's setting in effect, irrespective of the bit it can see and
> modify in its virtualised view of MSR_EFER.
> 
> This isn't a major problem for production operating systems because they,
> like
> Xen, always turn the NXE on when it is available.  However, it does have an
> observable effect on which guest PTE bits are valid, and whether
> PFEC_insn_fetch is visible in a #PF error code.
> 
> Second generation VT-x hardware has host and guest EFER fields in the
> VMCS,
> and support for loading and saving them automatically.  First generation
> VT-x
> hardware needs to use MSR load/save lists to cause an atomic switch of
> MSR_EFER on vmentry/exit.
> 
> Therefore we update vmx_init_vmcs_config() to find and use guest/host
> EFER
> support when available (and MSR load/save lists on older hardware) and
> drop
> all ad-hoc alteration of SCE.
> 
> There are two minor complications when selecting the EFER setting:
>  * For shadow guests, NXE is a paging setting and must remain under host
>    control, but this is fine as Xen also handles the pagefaults.
>  * When the Unrestricted Guest control is clear, hardware doesn't tolerate
> LME
>    and LMA being different.  This doesn't matter in practice as we intercept
>    all writes to CR0 and reads from MSR_EFER, so can provide architecturally
>    consistent behaviour from the guests point of view.
> 
> With changing how EFER is loaded, vmcs_dump_vcpu() needs adjusting.
> Read EFER
> from the appropriate information source, and identify when dumping the
> guest
> EFER value which source was used.
> 
> As a result of fixing EFER context switching, we can remove the Intel-special
> case from hvm_nx_enabled() and let guest_walk_tables() work with the
> real
> guest paging settings.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Tim Deegan <tim@xen.org>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists
  2018-06-08 18:48 ` [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
  2018-06-12  2:49   ` Tian, Kevin
@ 2018-06-12  8:06   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  8:06 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
> At the moment, all modifications of the MSR lists are in current context.
> However, future changes may need to put MSR_EFER into the lists from domctl
> hypercall context.
> 
> Plumb a struct vcpu parameter down through the infrastructure, and use
> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().  Use
> assertions to ensure that access is either in current context, or while the
> vcpu is paused.
> 
> Note these expectations beside the fields in arch_vmx_struct, and reorder the
> fields to avoid unnecessary padding.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
  2018-06-12  2:58   ` Tian, Kevin
@ 2018-06-12  8:15   ` Jan Beulich
  2018-06-12  8:51     ` Andrew Cooper
  2018-06-27  8:43   ` [PATCH v3 " Andrew Cooper
  2 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  8:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>              for ( ; (rc == 0) && lbr->count; lbr++ )
>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
> -                    {
>                          vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
> -                        if ( lbr_tsx_fixup_needed )
> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
> -                        if ( bdw_erratum_bdf14_fixup_needed )
> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
> -                                FIXUP_BDW_ERRATUM_BDF14;
> -                    }
> +
> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
> +            if ( lbr_tsx_fixup_needed )
> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
> +            if ( bdw_erratum_bdf14_fixup_needed )
> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;

Note how the setting of the flags previously depended on
vmx_add_guest_msr() having returned success at least once.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -156,7 +156,12 @@ struct arch_vmx_struct {
>      /* Are we emulating rather than VMENTERing? */
>      uint8_t              vmx_emulate;
>  
> -    uint8_t              lbr_fixup_enabled;
> +    /* Flags for LBR MSRs in the load/save lists. */
> +#define LBR_MSRS_INSERTED  (1u << 0)
> +#define LBR_FIXUP_TSX      (1u << 1)
> +#define LBR_FIXUP_BDF14    (1u << 2)
> +#define LBR_FIXUP_MASK     (LBR_FIXUP_TSX | LBR_FIXUP_BDF14)
> +    uint8_t              lbr_flags;

I'm not overly happy with these getting moved to a non-private header,
but I assume you need to use the new flag in vmcs.c in a later patch.
Let's hope no other LBR_-prefixed names appear elsewhere in the code.


Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_msr_add()
  2018-06-08 18:48 ` [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_msr_add() Andrew Cooper
  2018-06-12  3:11   ` Tian, Kevin
@ 2018-06-12  8:19   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  8:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
> The main purpose of this change is to allow us to set a specific MSR value,
> without needing to know whether there is already a load/save list slot for it.
> 
> Previously, callers wanting this property needed to call both vmx_add_*_msr()
> and vmx_write_*_msr() to cover both cases, and there are no callers which want
> the old behaviour of being a no-op if an entry already existed for the MSR.
> 
> As a result of this API improvement, the default value for guest MSRs need not
> be 0, and the default for host MSRs need not be passed via hardware register.
> In practice, this cleans up the VPMU allocation logic, and avoids an MSR read
> as part of vcpu construction.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-08 18:48 ` [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
  2018-06-12  3:24   ` Tian, Kevin
@ 2018-06-12  8:27   ` Jan Beulich
  2018-06-12 17:40     ` Andrew Cooper
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  8:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1452,6 +1452,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>      return rc;
>  }
>  
> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
> +{
> +    struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
> +    struct vmx_msr_entry *start = NULL, *ent, *end;
> +    unsigned int substart, subend, total;
> +
> +    ASSERT(v == current || !vcpu_runnable(v));
> +
> +    switch ( type )
> +    {
> +    case VMX_MSR_HOST:
> +        start    = vmx->host_msr_area;
> +        substart = 0;
> +        subend   = vmx->host_msr_count;
> +        total    = subend;
> +        break;
> +
> +    case VMX_MSR_GUEST:
> +        start    = vmx->msr_area;
> +        substart = 0;
> +        subend   = vmx->msr_save_count;
> +        total    = vmx->msr_load_count;
> +        break;
> +
> +    case VMX_MSR_GUEST_LOADONLY:
> +        start    = vmx->msr_area;
> +        substart = vmx->msr_save_count;
> +        subend   = vmx->msr_load_count;
> +        total    = subend;
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
> +
> +    if ( !start )
> +        return -ESRCH;

I'm pretty sure not all gcc versions we support are capable of recognizing
that substart, subend, and total can't be used uninitialized due to this
return path, without there also being a return after default: - I'm not
sure though whether adding that return or initializers might be the
better approach towards addressing this. At least for substart an
initializer (of zero) would allow dropping two other lines of code.
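
For illustration, the two options might look like this (just a sketch; the
error value in the first variant is picked arbitrarily for the example):

    /* Variant 1: give the default: case a return, making the switch exhaustive. */
    default:
        ASSERT_UNREACHABLE();
        return -EINVAL;

    /*
     * Variant 2: initialise at declaration; the substart = 0 assignments in
     * the HOST and GUEST cases can then be dropped.
     */
    unsigned int substart = 0, subend = 0, total = 0;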

> +    end = start + total;
> +    ent = locate_msr_entry(start + substart, start + subend, msr);
> +
> +    if ( (ent == end) || (ent->index != msr) )
> +        return -ESRCH;
> +
> +    memmove(ent, ent + 1, sizeof(*ent) * (end - ent));

Aren't you running over the end of the array by 1 entry here?

> +    vmx_vmcs_enter(v);
> +
> +    switch ( type )
> +    {
> +    case VMX_MSR_HOST:
> +        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, vmx->host_msr_count--);
> +        break;
> +
> +    case VMX_MSR_GUEST:
> +        __vmwrite(VM_EXIT_MSR_STORE_COUNT, vmx->msr_save_count--);
> +
> +        /* Fallthrough */
> +    case VMX_MSR_GUEST_LOADONLY:
> +        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_load_count--);
> +        break;
> +    }

Don't you want pre-decrements in all of these?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-12  8:15   ` Jan Beulich
@ 2018-06-12  8:51     ` Andrew Cooper
  2018-06-12  9:00       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2018-06-12  8:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 12/06/2018 09:15, Jan Beulich wrote:
>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>              for ( ; (rc == 0) && lbr->count; lbr++ )
>>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>> -                    {
>>                          vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
>> -                        if ( lbr_tsx_fixup_needed )
>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
>> -                        if ( bdw_erratum_bdf14_fixup_needed )
>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
>> -                                FIXUP_BDW_ERRATUM_BDF14;
>> -                    }
>> +
>> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
>> +            if ( lbr_tsx_fixup_needed )
>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
>> +            if ( bdw_erratum_bdf14_fixup_needed )
>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
> Note how the setting of the flags previously depended on
> vmx_add_guest_msr() having returned success at least once.

And?

Unless this sequence returns fully successfully, we throw #MC into the
guest without setting any kind of vMCE state.  It might be the least bad
option we have available, but it's also not reasonable to expect the
guest to survive.

The two ways to fail are ENOMEM and E2BIG.  The former is going to be
causing other forms of chaos, and the latter isn't going to occur in
practice because current codepaths in Xen use a maximum of ~40 of the
256 available slots.  In the unlikely case that we fail with ENOMEM
on the first entry, all the fixup logic gets short circuited due to the
missing memory allocation (so practically 0 extra overhead), and the
guest will still malfunction.

The error handling here is sufficiently poor that I'm not worried about
changing one minor corner case.  I'm actually debating whether it would
be better to make the allocation at vmcs construction time, to avoid
runtime out-of-memory issues.

>
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -156,7 +156,12 @@ struct arch_vmx_struct {
>>      /* Are we emulating rather than VMENTERing? */
>>      uint8_t              vmx_emulate;
>>  
>> -    uint8_t              lbr_fixup_enabled;
>> +    /* Flags for LBR MSRs in the load/save lists. */
>> +#define LBR_MSRS_INSERTED  (1u << 0)
>> +#define LBR_FIXUP_TSX      (1u << 1)
>> +#define LBR_FIXUP_BDF14    (1u << 2)
>> +#define LBR_FIXUP_MASK     (LBR_FIXUP_TSX | LBR_FIXUP_BDF14)
>> +    uint8_t              lbr_flags;
> I'm not overly happy with these getting moved to a non-private header,
> but I assume you need to use the new flag in vmcs.c in a later patch.
> Let's hope no other LBR_-prefixed names appear elsewhere in the code.

No - no use in a later patch.  They are moved here so they are next to
the data field they are used for.  The previous code having random
defines remote from, and not obviously linked with, this data field is
detrimental to code quality and clarity.

As for namespacing, we could go with a VMX_ prefix, but there is no
equivalent needed elsewhere, so the chances of having problems are very low.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-06-12  6:04   ` Tian, Kevin
@ 2018-06-12  8:54   ` Jan Beulich
  2018-06-13 10:19     ` Andrew Cooper
  2018-06-13 11:19   ` [PATCH v3 " Andrew Cooper
  2 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  8:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
> @@ -1646,22 +1637,71 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>  
>  static void vmx_update_guest_efer(struct vcpu *v)
>  {
> -    unsigned long vm_entry_value;
> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
> +        xen_efer = read_efer();
> +
> +    if ( paging_mode_shadow(v->domain) )
> +    {
> +        /*
> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
> +         * under guest control.
> +         */
> +        guest_efer &= ~EFER_NX;
> +        guest_efer |= xen_efer & EFER_NX;
> +    }
> +
> +    if ( !(v->arch.hvm_vmx.secondary_exec_control &
> +           SECONDARY_EXEC_UNRESTRICTED_GUEST) )

!vmx_unrestricted_guest(v)

> +    {
> +        /*
> +         * When Unrestricted Guest is not enabled in the VMCS, hardware does
> +         * not tolerate the LME and LMA settings being different.  As writes
> +         * to CR0 are intercepted, it is safe to leave LME clear at this
> +         * point, and fix up both LME and LMA when CR0.PG is set.
> +         */
> +        if ( !(guest_efer & EFER_LMA) )
> +            guest_efer &= ~EFER_LME;
> +    }
>  
>      vmx_vmcs_enter(v);
>  
> -    __vmread(VM_ENTRY_CONTROLS, &vm_entry_value);
> -    if ( v->arch.hvm_vcpu.guest_efer & EFER_LMA )
> -        vm_entry_value |= VM_ENTRY_IA32E_MODE;
> +    /*
> +     * The intended guest running mode is derived from VM_ENTRY_IA32E_MODE,
> +     * which (architecturally) is the guest's LMA setting.
> +     */
> +    __vmread(VM_ENTRY_CONTROLS, &entry_ctls);
> +
> +    entry_ctls &= ~VM_ENTRY_IA32E_MODE;
> +    if ( guest_efer & EFER_LMA )
> +        entry_ctls |= VM_ENTRY_IA32E_MODE;
> +
> +    __vmwrite(VM_ENTRY_CONTROLS, entry_ctls);
> +
> +    /* We expect to use EFER loading in the common case, but... */
> +    if ( likely(cpu_has_vmx_efer) )
> +        __vmwrite(GUEST_EFER, guest_efer);
> +
> +    /* ... on Gen1 VT-x hardware, we have to use MSR load/save lists instead. */
>      else
> -        vm_entry_value &= ~VM_ENTRY_IA32E_MODE;
> -    __vmwrite(VM_ENTRY_CONTROLS, vm_entry_value);
> +    {
> +        /*
> +         * When the guests choice of EFER matches Xen's, remove the load/save
> +         * list entries.  It is unnecessary overhead, especially as this is
> +         * expected to be the common case for 64bit guests.
> +         */
> +        if ( guest_efer == xen_efer )
> +        {
> +            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
> +            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
> +        }
> +        else
> +        {
> +            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
> +            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
> +        }
> +    }
>  
>      vmx_vmcs_exit(v);
> -
> -    if ( v == current )
> -        write_efer((read_efer() & ~EFER_SCE) |
> -                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
>  }

As mentioned before, overall this would allow for disabling read intercepts in
certain cases. If you don't want to do this right away that's certainly fine, but
could I talk you into at least adding a comment to this effect?

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -311,6 +311,8 @@ extern u64 vmx_ept_vpid_cap;
>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>  #define cpu_has_vmx_pat \
>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
> +#define cpu_has_vmx_efer \
> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)

I think this was asked before, but I'm concerned (about the inconsistency, at least)
anyway: cpu_has_vmx_mpx, for example, checks both flags. Of course there's
unlikely to be any hardware with just one of the two features, but what about
buggy virtual environments we might run in?

IOW - if you want to check just one of the two flags here, I think you want to
enforce the dependency in vmx_init_vmcs_config(), clearing the entry control
bit if the exit control one comes out clear from adjust_vmx_controls().
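
Something along these lines, perhaps (a sketch only, reusing the local
variable names visible in the patch context above):

    /* Don't use guest EFER loading unless host EFER loading is also available. */
    if ( !(_vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER) )
        _vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_EFER;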

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-12  8:51     ` Andrew Cooper
@ 2018-06-12  9:00       ` Jan Beulich
  2018-06-12 16:33         ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-12  9:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 12.06.18 at 10:51, <andrew.cooper3@citrix.com> wrote:
> On 12/06/2018 09:15, Jan Beulich wrote:
>>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>>> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, 
> uint64_t msr_content)
>>>              for ( ; (rc == 0) && lbr->count; lbr++ )
>>>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>>>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>>> -                    {
>>>                          vmx_clear_msr_intercept(v, lbr->base + i, 
> VMX_MSR_RW);
>>> -                        if ( lbr_tsx_fixup_needed )
>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= 
> FIXUP_LBR_TSX;
>>> -                        if ( bdw_erratum_bdf14_fixup_needed )
>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
>>> -                                FIXUP_BDW_ERRATUM_BDF14;
>>> -                    }
>>> +
>>> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
>>> +            if ( lbr_tsx_fixup_needed )
>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
>>> +            if ( bdw_erratum_bdf14_fixup_needed )
>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
>> Note how the setting of the flags previously depended on
>> vmx_add_guest_msr() having returned success at least once.
> 
> And?
> 
> Unless this sequence returns fully successfully, we throw #MC into the
> guest without setting any kind of vMCE state.  It might be the least bad
> option we have available, but its also not reasonable to expect the
> guest to survive.
> 
> The two ways to fail are ENOMEM which E2BIG.  The former is going to be
> causing other forms of chaos, and the latter isn't going to occur in
> practice because current codepaths in Xen use a maximum of ~40 or the
> 256 available slots.  If in the unlikely case that we fail with ENOMEM
> on the first entry, all the fixup logic gets short circuited due to the
> missing memory allocation (so practically 0 extra overhead), and the
> guest will still malfunction.
> 
> The error handling here is sufficiently poor that I'm not worried about
> changing one minor corner case.  I'm actually debating whether it would
> be better to make the allocation at vmcs construction time, to avoid
> runtime out-of-memory issues.

With further improved MSR handling down the road, I assume we'll
have some entries in the list in almost all cases, so yes, I think that
would be desirable.

>>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> @@ -156,7 +156,12 @@ struct arch_vmx_struct {
>>>      /* Are we emulating rather than VMENTERing? */
>>>      uint8_t              vmx_emulate;
>>>  
>>> -    uint8_t              lbr_fixup_enabled;
>>> +    /* Flags for LBR MSRs in the load/save lists. */
>>> +#define LBR_MSRS_INSERTED  (1u << 0)
>>> +#define LBR_FIXUP_TSX      (1u << 1)
>>> +#define LBR_FIXUP_BDF14    (1u << 2)
>>> +#define LBR_FIXUP_MASK     (LBR_FIXUP_TSX | LBR_FIXUP_BDF14)
>>> +    uint8_t              lbr_flags;
>> I'm not overly happy with these getting moved to a non-private header,
>> but I assume you need to use the new flag in vmcs.c in a later patch.
>> Let's hope no other LBR_-prefixed names appear elsewhere in the code.
> 
> No - no use in a later patch.  They are moved here so they are next to
> the data field they are used for.  The previous code having random
> defines remote from, and not obviously linked with, this data field is
> detrimental to code quality and clarity.

A comment at both sites providing the link would already help. Just
like I would always advocate for struct-s to be fully declared locally only
when they're used in just a single source file, I'd also prefer #define-s
to have as little visibility as possible. Anyway - that's something to be
decided by the VMX maintainers in this specific case.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-12  9:00       ` Jan Beulich
@ 2018-06-12 16:33         ` Andrew Cooper
  2018-06-13  6:30           ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2018-06-12 16:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 12/06/18 10:00, Jan Beulich wrote:
>>>> On 12.06.18 at 10:51, <andrew.cooper3@citrix.com> wrote:
>> On 12/06/2018 09:15, Jan Beulich wrote:
>>>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, 
>> uint64_t msr_content)
>>>>              for ( ; (rc == 0) && lbr->count; lbr++ )
>>>>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>>>>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>>>> -                    {
>>>>                          vmx_clear_msr_intercept(v, lbr->base + i, 
>> VMX_MSR_RW);
>>>> -                        if ( lbr_tsx_fixup_needed )
>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= 
>> FIXUP_LBR_TSX;
>>>> -                        if ( bdw_erratum_bdf14_fixup_needed )
>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
>>>> -                                FIXUP_BDW_ERRATUM_BDF14;
>>>> -                    }
>>>> +
>>>> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
>>>> +            if ( lbr_tsx_fixup_needed )
>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
>>>> +            if ( bdw_erratum_bdf14_fixup_needed )
>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
>>> Note how the setting of the flags previously depended on
>>> vmx_add_guest_msr() having returned success at least once.
>> And?
>>
>> Unless this sequence returns fully successfully, we throw #MC into the
>> guest without setting any kind of vMCE state.  It might be the least bad
>> option we have available, but its also not reasonable to expect the
>> guest to survive.
>>
>> The two ways to fail are ENOMEM which E2BIG.  The former is going to be
>> causing other forms of chaos, and the latter isn't going to occur in
>> practice because current codepaths in Xen use a maximum of ~40 or the
>> 256 available slots.  If in the unlikely case that we fail with ENOMEM
>> on the first entry, all the fixup logic gets short circuited due to the
>> missing memory allocation (so practically 0 extra overhead), and the
>> guest will still malfunction.
>>
>> The error handling here is sufficiently poor that I'm not worried about
>> changing one minor corner case.  I'm actually debating whether it would
>> be better to make the allocation at vmcs construction time, to avoid
>> runtime out-of-memory issues.
> With further improved MSR handling down the road, I assume we'll
> have some entries in the list in almost all cases, so yes, I think that
> would be desirable.

For performance reasons, we'll want to keep the size of the lists to an
absolute minimum.

On closer inspection, the only uses we currently have for the
load/save lists are this new EFER case (on Gen1 hardware), the Global
Perf Ctl (for vPMU, and we really should be using the load/save support
like EFER), and the LBR MSRs.

Therefore, on non-ancient hardware, a guest which doesn't touch
MSR_DEBUGCTL is not going to need the memory allocation, so perhaps an
up-front allocation isn't the wisest of options.  I'll keep this in mind
during the MSR work.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-12  8:27   ` Jan Beulich
@ 2018-06-12 17:40     ` Andrew Cooper
  2018-06-12 18:23       ` Andrew Cooper
  2018-06-13  6:33       ` Jan Beulich
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-12 17:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 12/06/18 09:27, Jan Beulich wrote:
>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1452,6 +1452,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>>      return rc;
>>  }
>>  
>> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
>> +{
>> +    struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
>> +    struct vmx_msr_entry *start = NULL, *ent, *end;
>> +    unsigned int substart, subend, total;
>> +
>> +    ASSERT(v == current || !vcpu_runnable(v));
>> +
>> +    switch ( type )
>> +    {
>> +    case VMX_MSR_HOST:
>> +        start    = vmx->host_msr_area;
>> +        substart = 0;
>> +        subend   = vmx->host_msr_count;
>> +        total    = subend;
>> +        break;
>> +
>> +    case VMX_MSR_GUEST:
>> +        start    = vmx->msr_area;
>> +        substart = 0;
>> +        subend   = vmx->msr_save_count;
>> +        total    = vmx->msr_load_count;
>> +        break;
>> +
>> +    case VMX_MSR_GUEST_LOADONLY:
>> +        start    = vmx->msr_area;
>> +        substart = vmx->msr_save_count;
>> +        subend   = vmx->msr_load_count;
>> +        total    = subend;
>> +        break;
>> +
>> +    default:
>> +        ASSERT_UNREACHABLE();
>> +    }
>> +
>> +    if ( !start )
>> +        return -ESRCH;
> I'm pretty sure not all gcc versions we support are capable of recognizing
> that substart, subend, and total can't be used uninitialized due to this
> return path, without there also being a return after default: - I'm not
> sure though whether adding that return or initializers might be the
> better approach towards addressing this. At least for substart an
> initializer (of zero) would allow dropping two other lines of code.

The oldest compiler I can easily put my hands on:

x86_64-linux-gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)

is fine with this.
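
For reference, the two alternatives mentioned above would look roughly like
this (illustrative sketch only, not part of the posted patch, and the error
value in the default case is just a placeholder):

    /* Either initialise at declaration, dropping the "substart = 0" lines: */
    unsigned int substart = 0, subend, total;

    /* ... or give the default case an explicit return: */
    default:
        ASSERT_UNREACHABLE();
        return -EINVAL;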

>
>> +    end = start + total;
>> +    ent = locate_msr_entry(start + substart, start + subend, msr);
>> +
>> +    if ( (ent == end) || (ent->index != msr) )
>> +        return -ESRCH;
>> +
>> +    memmove(ent, ent + 1, sizeof(*ent) * (end - ent));
> Aren't you running over the end of the array by 1 entry here?

ent == end is an error condition above.  By this point, ent is
guaranteed to be < end.

>
>> +    vmx_vmcs_enter(v);
>> +
>> +    switch ( type )
>> +    {
>> +    case VMX_MSR_HOST:
>> +        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, vmx->host_msr_count--);
>> +        break;
>> +
>> +    case VMX_MSR_GUEST:
>> +        __vmwrite(VM_EXIT_MSR_STORE_COUNT, vmx->msr_save_count--);
>> +
>> +        /* Fallthrough */
>> +    case VMX_MSR_GUEST_LOADONLY:
>> +        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_load_count--);
>> +        break;
>> +    }
> Don't you want pre-decrements in all of these?

Using pre-decrements would end up with the value in struct vcpu being
correct, but the value in the VMCS being one-too-large.

I could alternatively move the subtraction to an earlier statement to
avoid any pre/post confusion?
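
i.e. something along these lines for the first case (sketch only, not what
is in the posted patch):

    case VMX_MSR_HOST:
        vmx->host_msr_count--;
        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, vmx->host_msr_count);
        break;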

~Andrew


* Re: [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-12 17:40     ` Andrew Cooper
@ 2018-06-12 18:23       ` Andrew Cooper
  2018-06-13  6:33       ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-12 18:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Roger Pau Monne, Wei Liu, Jun Nakajima, Xen-devel

On 12/06/18 18:40, Andrew Cooper wrote:
> On 12/06/18 09:27, Jan Beulich wrote:
>>> +    end = start + total;
>>> +    ent = locate_msr_entry(start + substart, start + subend, msr);
>>> +
>>> +    if ( (ent == end) || (ent->index != msr) )
>>> +        return -ESRCH;
>>> +
>>> +    memmove(ent, ent + 1, sizeof(*ent) * (end - ent));
>> Aren't you running over the end of the array by 1 entry here?
> ent == end is an error condition above.  By this point, ent is
> guaranteed to be < end.

Although on further consideration, the size parameter should be (end -
ent - 1) to avoid moving the entry beyond the end of the array.
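
i.e. a minimal sketch of the corrected call, reusing the names from the
patch context:

    /* Entries in [ent + 1, end) are shuffled down one slot, which is
     * (end - ent - 1) entries; (end - ent) would read one entry past
     * the end of the array. */
    memmove(ent, ent + 1, sizeof(*ent) * (end - ent - 1));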

~Andrew


* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-12 16:33         ` Andrew Cooper
@ 2018-06-13  6:30           ` Jan Beulich
  2018-06-13 10:37             ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-13  6:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 12.06.18 at 18:33, <andrew.cooper3@citrix.com> wrote:
> On 12/06/18 10:00, Jan Beulich wrote:
>>>>> On 12.06.18 at 10:51, <andrew.cooper3@citrix.com> wrote:
>>> On 12/06/2018 09:15, Jan Beulich wrote:
>>>>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>>>>              for ( ; (rc == 0) && lbr->count; lbr++ )
>>>>>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>>>>>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>>>>> -                    {
>>>>>                          vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
>>>>> -                        if ( lbr_tsx_fixup_needed )
>>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
>>>>> -                        if ( bdw_erratum_bdf14_fixup_needed )
>>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
>>>>> -                                FIXUP_BDW_ERRATUM_BDF14;
>>>>> -                    }
>>>>> +
>>>>> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
>>>>> +            if ( lbr_tsx_fixup_needed )
>>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
>>>>> +            if ( bdw_erratum_bdf14_fixup_needed )
>>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
>>>> Note how the setting of the flags previously depended on
>>>> vmx_add_guest_msr() having returned success at least once.
>>> And?
>>>
>>> Unless this sequence returns fully successfully, we throw #MC into the
>>> guest without setting any kind of vMCE state.  It might be the least bad
>>> option we have available, but it's also not reasonable to expect the
>>> guest to survive.
>>>
>>> The two ways to fail are ENOMEM and E2BIG.  The former is going to be
>>> causing other forms of chaos, and the latter isn't going to occur in
>>> practice because current codepaths in Xen use a maximum of ~40 of the
>>> 256 available slots.  In the unlikely case that we fail with ENOMEM
>>> on the first entry, all the fixup logic gets short circuited due to the
>>> missing memory allocation (so practically 0 extra overhead), and the
>>> guest will still malfunction.
>>>
>>> The error handling here is sufficiently poor that I'm not worried about
>>> changing one minor corner case.  I'm actually debating whether it would
>>> be better to make the allocation at vmcs construction time, to avoid
>>> runtime out-of-memory issues.
>> With further improved MSR handling down the road, I assume we'll
>> have some entries in the list in almost all cases, so yes, I think that
>> would be desirable.
> 
> For performance reasons, we'll want to keep the size of the lists to an
> absolute minimum.
> 
> On a closer inspection, the only uses we currently have for the
> load/save lists are this new EFER case (on Gen1 hardware), the Global
> Perf Ctl (for vPMU, and we really should be using the load/save support
> like EFER), and the LBR MSRs.
> 
> Therefore, on non-ancient hardware, a guest which doesn't touch
> MSR_DEBUGCTL is not going to need the memory allocation, so perhaps an
> up-front allocation isn't the wisest of options.  I'll keep this in mind
> during the MSR work.

Hmm, okay. In which case, if we anyway don't expect the guest to
survive here in case of some failure, why don't we crash it right away
instead of injecting #MC? Would make for a much easier to recognize
cause of the guest crash.

Jan




* Re: [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-06-12 17:40     ` Andrew Cooper
  2018-06-12 18:23       ` Andrew Cooper
@ 2018-06-13  6:33       ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2018-06-13  6:33 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 12.06.18 at 19:40, <andrew.cooper3@citrix.com> wrote:
> On 12/06/18 09:27, Jan Beulich wrote:
>>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -1452,6 +1452,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>>>      return rc;
>>>  }
>>>  
>>> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
>>> +{
>>> +    struct arch_vmx_struct *vmx = &v->arch.hvm_vmx;
>>> +    struct vmx_msr_entry *start = NULL, *ent, *end;
>>> +    unsigned int substart, subend, total;
>>> +
>>> +    ASSERT(v == current || !vcpu_runnable(v));
>>> +
>>> +    switch ( type )
>>> +    {
>>> +    case VMX_MSR_HOST:
>>> +        start    = vmx->host_msr_area;
>>> +        substart = 0;
>>> +        subend   = vmx->host_msr_count;
>>> +        total    = subend;
>>> +        break;
>>> +
>>> +    case VMX_MSR_GUEST:
>>> +        start    = vmx->msr_area;
>>> +        substart = 0;
>>> +        subend   = vmx->msr_save_count;
>>> +        total    = vmx->msr_load_count;
>>> +        break;
>>> +
>>> +    case VMX_MSR_GUEST_LOADONLY:
>>> +        start    = vmx->msr_area;
>>> +        substart = vmx->msr_save_count;
>>> +        subend   = vmx->msr_load_count;
>>> +        total    = subend;
>>> +        break;
>>> +
>>> +    default:
>>> +        ASSERT_UNREACHABLE();
>>> +    }
>>> +
>>> +    if ( !start )
>>> +        return -ESRCH;
>> I'm pretty sure not all gcc versions we support are capable of recognizing
>> that substart, subend, and total can't be used uninitialized due to this
>> return path, without there also being a return after default: - I'm not
>> sure though whether adding that return or initializers might be the
>> better approach towards addressing this. At least for substart an
>> initializer (of zero) would allow dropping two other lines of code.
> 
> The oldest compiler I can easily put my hands on:
> 
> x86_64-linux-gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
> 
> is fine with this.

Well, okay - let's chance it then, and fix it up later if needed.

Jan




* Re: [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-06-12  8:54   ` Jan Beulich
@ 2018-06-13 10:19     ` Andrew Cooper
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-13 10:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

On 12/06/18 09:54, Jan Beulich wrote:
>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>> @@ -1646,22 +1637,71 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>>  
>>  static void vmx_update_guest_efer(struct vcpu *v)
>>  {
>> -    unsigned long vm_entry_value;
>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>> +        xen_efer = read_efer();
>> +
>> +    if ( paging_mode_shadow(v->domain) )
>> +    {
>> +        /*
>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>> +         * under guest control.
>> +         */
>> +        guest_efer &= ~EFER_NX;
>> +        guest_efer |= xen_efer & EFER_NX;
>> +    }
>> +
>> +    if ( !(v->arch.hvm_vmx.secondary_exec_control &
>> +           SECONDARY_EXEC_UNRESTRICTED_GUEST) )
> !vmx_unrestricted_guest(v)
>
>> +    {
>> +        /*
>> +         * When Unrestricted Guest is not enabled in the VMCS, hardware does
>> +         * not tolerate the LME and LMA settings being different.  As writes
>> +         * to CR0 are intercepted, it is safe to leave LME clear at this
>> +         * point, and fix up both LME and LMA when CR0.PG is set.
>> +         */
>> +        if ( !(guest_efer & EFER_LMA) )
>> +            guest_efer &= ~EFER_LME;
>> +    }
>>  
>>      vmx_vmcs_enter(v);
>>  
>> -    __vmread(VM_ENTRY_CONTROLS, &vm_entry_value);
>> -    if ( v->arch.hvm_vcpu.guest_efer & EFER_LMA )
>> -        vm_entry_value |= VM_ENTRY_IA32E_MODE;
>> +    /*
>> +     * The intended guest running mode is derived from VM_ENTRY_IA32E_MODE,
>> +     * which (architecturally) is the guest's LMA setting.
>> +     */
>> +    __vmread(VM_ENTRY_CONTROLS, &entry_ctls);
>> +
>> +    entry_ctls &= ~VM_ENTRY_IA32E_MODE;
>> +    if ( guest_efer & EFER_LMA )
>> +        entry_ctls |= VM_ENTRY_IA32E_MODE;
>> +
>> +    __vmwrite(VM_ENTRY_CONTROLS, entry_ctls);
>> +
>> +    /* We expect to use EFER loading in the common case, but... */
>> +    if ( likely(cpu_has_vmx_efer) )
>> +        __vmwrite(GUEST_EFER, guest_efer);
>> +
>> +    /* ... on Gen1 VT-x hardware, we have to use MSR load/save lists instead. */
>>      else
>> -        vm_entry_value &= ~VM_ENTRY_IA32E_MODE;
>> -    __vmwrite(VM_ENTRY_CONTROLS, vm_entry_value);
>> +    {
>> +        /*
>> +         * When the guests choice of EFER matches Xen's, remove the load/save
>> +         * list entries.  It is unnecessary overhead, especially as this is
>> +         * expected to be the common case for 64bit guests.
>> +         */
>> +        if ( guest_efer == xen_efer )
>> +        {
>> +            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
>> +            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
>> +        }
>> +        else
>> +        {
>> +            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
>> +            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
>> +        }
>> +    }
>>  
>>      vmx_vmcs_exit(v);
>> -
>> -    if ( v == current )
>> -        write_efer((read_efer() & ~EFER_SCE) |
>> -                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
>>  }
> As mentioned before, overall this would allow for disabling read intercepts in
> certain cases. If you don't want to do this right away that's certainly fine, but
> could I talk you into at least adding a comment to this effect?

Apologies - that was a straight oversight.  Razvan thinks the monitor
side of things is actually fine, which was my concern with doing it
originally.

I've inserted the following fragment in the tail of this function, after
the vmx_vmcs_exit(v);

    /*
     * If the guests virtualised view of MSR_EFER matches the value loaded
     * into hardware, clear the read intercept to avoid unnecessary VMExits.
     */
    if ( guest_efer == v->arch.hvm_vcpu.guest_efer )
        vmx_clear_msr_intercept(v, MSR_EFER, VMX_MSR_R);
    else
        vmx_set_msr_intercept(v, MSR_EFER, VMX_MSR_R);

and will quickly whip up an XTF test for some confirmation.
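
Roughly what the guest side of such a test needs (a hand-rolled sketch, not
the actual XTF code):

    /* Read the virtualised MSR_EFER (0xc0000080) from inside the guest, to
     * confirm that bits such as SCE/NXE read back as the guest configured
     * them, even once the read intercept has been cleared. */
    static uint64_t guest_rdmsr_efer(void)
    {
        uint32_t lo, hi;

        asm volatile ( "rdmsr" : "=a" (lo), "=d" (hi) : "c" (0xc0000080) );

        return (uint64_t)lo | ((uint64_t)hi << 32);
    }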

>
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -311,6 +311,8 @@ extern u64 vmx_ept_vpid_cap;
>>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>>  #define cpu_has_vmx_pat \
>>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
>> +#define cpu_has_vmx_efer \
>> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
> I think this was asked before, but I'm concerned (about the inconsistency at
> least) anyway: cpu_has_vmx_mpx, for example, checks both flags. Of course there's
> unlikely to be any hardware with just one of the two features, but what about
> buggy virtual environments we might run in?

I'm not worried about buggy virtual environments.  For one, it's not
really our bug to care about, but irrespective, if an environment is
this buggy, it won't notice the setting we've made, and the vmentry will
be fine.

This, FYI, is exactly what happens with the Virtual NMI feature when
nested under Xen atm.  Some hypervisors fail to check for it, and
blindly use it, and they mostly function when nested under Xen.  The
hypervisors which check for it as a prerequisite fail to start.

> IOW - if you want to check just one of the two flags here, I think you want to
> enforce the dependency in vmx_init_vmcs_config(), clearing the entry control
> bit if the exit control one comes out clear from adjust_vmx_controls().

As I said before, work along this line is coming as part of the Nested
Virt work.  The current logic here is already inconsistent, and is fine
in this case.
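
For completeness, the enforcement suggested above would amount to something
like this in vmx_init_vmcs_config() (sketch only, not part of this series):

    /* Only keep guest EFER loading if host EFER loading is also available. */
    if ( !(_vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER) )
        _vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_EFER;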

~Andrew


* Re: [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-13  6:30           ` Jan Beulich
@ 2018-06-13 10:37             ` Andrew Cooper
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-13 10:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 13/06/18 07:30, Jan Beulich wrote:
>>>> On 12.06.18 at 18:33, <andrew.cooper3@citrix.com> wrote:
>> On 12/06/18 10:00, Jan Beulich wrote:
>>>>>> On 12.06.18 at 10:51, <andrew.cooper3@citrix.com> wrote:
>>>> On 12/06/2018 09:15, Jan Beulich wrote:
>>>>>>>> On 08.06.18 at 20:48, <andrew.cooper3@citrix.com> wrote:
>>>>>> @@ -3106,14 +3104,13 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>>>>>              for ( ; (rc == 0) && lbr->count; lbr++ )
>>>>>>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>>>>>>                      if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>>>>>> -                    {
>>>>>>                          vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
>>>>>> -                        if ( lbr_tsx_fixup_needed )
>>>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
>>>>>> -                        if ( bdw_erratum_bdf14_fixup_needed )
>>>>>> -                            v->arch.hvm_vmx.lbr_fixup_enabled |=
>>>>>> -                                FIXUP_BDW_ERRATUM_BDF14;
>>>>>> -                    }
>>>>>> +
>>>>>> +            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
>>>>>> +            if ( lbr_tsx_fixup_needed )
>>>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
>>>>>> +            if ( bdw_erratum_bdf14_fixup_needed )
>>>>>> +                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
>>>>> Note how the setting of the flags previously depended on
>>>>> vmx_add_guest_msr() having returned success at least once.
>>>> And?
>>>>
>>>> Unless this sequence returns fully successfully, we throw #MC into the
>>>> guest without setting any kind of vMCE state.  It might be the least bad
>>>> option we have available, but it's also not reasonable to expect the
>>>> guest to survive.
>>>>
>>>> The two ways to fail are ENOMEM and E2BIG.  The former is going to be
>>>> causing other forms of chaos, and the latter isn't going to occur in
>>>> practice because current codepaths in Xen use a maximum of ~40 of the
>>>> 256 available slots.  In the unlikely case that we fail with ENOMEM
>>>> on the first entry, all the fixup logic gets short circuited due to the
>>>> missing memory allocation (so practically 0 extra overhead), and the
>>>> guest will still malfunction.
>>>>
>>>> The error handling here is sufficiently poor that I'm not worried about
>>>> changing one minor corner case.  I'm actually debating whether it would
>>>> be better to make the allocation at vmcs construction time, to avoid
>>>> runtime out-of-memory issues.
>>> With further improved MSR handling down the road, I assume we'll
>>> have some entries in the list in almost all cases, so yes, I think that
>>> would be desirable.
>> For performance reasons, we'll want to keep the size of the lists to an
>> absolute minimum.
>>
>> On a closer inspection, the only uses we currently have for the
>> load/save lists are this new EFER case (on Gen1 hardware), the Global
>> Perf Ctl (for vPMU, and we really should be using the load/save support
>> like EFER), and the LBR MSRs.
>>
>> Therefore, on non-ancient hardware, a guest which doesn't touch
>> MSR_DEBUGCTL is not going to need the memory allocation, so perhaps an
>> up-front allocation isn't the wisest of options.  I'll keep this in mind
>> during the MSR work.
> Hmm, okay. In which case, if we anyway don't expect the guest to
> survive here in case of some failure, why don't we crash it right away
> instead of injecting #MC? Would make for a much easier to recognize
> cause of the guest crash.

Yeah - I was considering that as an option.  -ENOSPC is definitely a
hypervisor coding issue, and can't be remedied in place.  -ENOMEM is
unfortunate if a guest happens to hit it, but cleanly crashing is better
behaviour than the current #MC with no information.

~Andrew


* [PATCH v3 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-06-12  6:04   ` Tian, Kevin
  2018-06-12  8:54   ` Jan Beulich
@ 2018-06-13 11:19   ` Andrew Cooper
  2 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-13 11:19 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.

SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
and vmx_update_guest_efer(), and works by altering the host SCE value to match
the setting the guest wants.  This works because, in HVM vcpu context, Xen
never needs to execute a SYSCALL or SYSRET instruction.

However, NXE has never been context switched.  Unlike SCE, NXE cannot be
context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
reserved, causing a pagefault when encountered.  This means that the guest
always has Xen's setting in effect, irrespective of the bit it can see and
modify in its virtualised view of MSR_EFER.

This isn't a major problem for production operating systems because they, like
Xen, always turn the NXE on when it is available.  However, it does have an
observable effect on which guest PTE bits are valid, and whether
PFEC_insn_fetch is visible in a #PF error code.

Second generation VT-x hardware has host and guest EFER fields in the VMCS,
and support for loading and saving them automatically.  First generation VT-x
hardware needs to use MSR load/save lists to cause an atomic switch of
MSR_EFER on vmentry/exit.

Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
support when available (and MSR load/save lists on older hardware) and drop
all ad-hoc alteration of SCE.

There are two minor complications when selecting the EFER setting:
 * For shadow guests, NXE is a paging setting and must remain under host
   control, but this is fine as Xen also handles the pagefaults.
 * When the Unrestricted Guest control is clear, hardware doesn't tolerate LME
   and LMA being different.  This doesn't matter in practice as we intercept
   all writes to CR0 and reads from MSR_EFER, so can provide architecturally
   consistent behaviour from the guests point of view.

With changing how EFER is loaded, vmcs_dump_vcpu() needs adjusting.  Read EFER
from the appropriate information source, and identify when dumping the guest
EFER value which source was used.

As a result of fixing EFER context switching, we can remove the Intel-special
case from hvm_nx_enabled() and let guest_walk_tables() work with the real
guest paging settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * Fix a vmentry failure on Nehalem/Westmere hardware.  LME != LMA is actually
   dependent on Unrestricted Guest (which is clear for Shadow guests as a side
   effect of clearing EPT).
 * Fix vmcs_dump_vcpu() to cope with the new EFER behaviour.
v3:
 * Don't opencode the existing vmx_unrestricted_guest() predicate.
 * Clear the MSR_EFER read intercept when possible.
---
 xen/arch/x86/domain.c              | 10 -----
 xen/arch/x86/hvm/vmx/vmcs.c        | 26 ++++++++----
 xen/arch/x86/hvm/vmx/vmx.c         | 84 ++++++++++++++++++++++++++++++--------
 xen/include/asm-x86/hvm/hvm.h      |  4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 16 ++++++++
 5 files changed, 102 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0ca820a..e817795 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1723,16 +1723,6 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     {
         __context_switch();
 
-        if ( is_pv_domain(nextd) &&
-             (is_idle_domain(prevd) ||
-              is_hvm_domain(prevd) ||
-              is_pv_32bit_domain(prevd) != is_pv_32bit_domain(nextd)) )
-        {
-            uint64_t efer = read_efer();
-            if ( !(efer & EFER_SCE) )
-                write_efer(efer | EFER_SCE);
-        }
-
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index cb14b43..cdc1e77 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -342,8 +342,8 @@ static int vmx_init_vmcs_config(void)
     }
 
     min = VM_EXIT_ACK_INTR_ON_EXIT;
-    opt = VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
-          VM_EXIT_CLEAR_BNDCFGS;
+    opt = (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
+           VM_EXIT_LOAD_HOST_EFER | VM_EXIT_CLEAR_BNDCFGS);
     min |= VM_EXIT_IA32E_MODE;
     _vmx_vmexit_control = adjust_vmx_controls(
         "VMExit Control", min, opt, MSR_IA32_VMX_EXIT_CTLS, &mismatch);
@@ -383,7 +383,8 @@ static int vmx_init_vmcs_config(void)
         _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
 
     min = 0;
-    opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
+    opt = (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER |
+           VM_ENTRY_LOAD_BNDCFGS);
     _vmx_vmentry_control = adjust_vmx_controls(
         "VMEntry Control", min, opt, MSR_IA32_VMX_ENTRY_CTLS, &mismatch);
 
@@ -1148,6 +1149,8 @@ static int construct_vmcs(struct vcpu *v)
     v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
     __vmwrite(HOST_CR4, mmu_cr4_features);
+    if ( cpu_has_vmx_efer )
+        __vmwrite(HOST_EFER, read_efer());
 
     /* Host CS:RIP. */
     __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
@@ -1905,7 +1908,17 @@ void vmcs_dump_vcpu(struct vcpu *v)
     vmentry_ctl = vmr32(VM_ENTRY_CONTROLS),
     vmexit_ctl = vmr32(VM_EXIT_CONTROLS);
     cr4 = vmr(GUEST_CR4);
-    efer = vmr(GUEST_EFER);
+
+    /*
+     * The guests EFER setting comes from the GUEST_EFER VMCS field whenever
+     * available, or the guest load-only MSR list on Gen1 hardware, the entry
+     * for which may be elided for performance reasons if identical to Xen's
+     * setting.
+     */
+    if ( cpu_has_vmx_efer )
+        efer = vmr(GUEST_EFER);
+    else if ( vmx_read_guest_loadonly_msr(v, MSR_EFER, &efer) )
+        efer = read_efer();
 
     printk("*** Guest State ***\n");
     printk("CR0: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n",
@@ -1942,9 +1955,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
     vmx_dump_sel("LDTR", GUEST_LDTR_SELECTOR);
     vmx_dump_sel2("IDTR", GUEST_IDTR_LIMIT);
     vmx_dump_sel("  TR", GUEST_TR_SELECTOR);
-    if ( (vmexit_ctl & (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_SAVE_GUEST_EFER)) ||
-         (vmentry_ctl & (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER)) )
-        printk("EFER = 0x%016lx  PAT = 0x%016lx\n", efer, vmr(GUEST_PAT));
+    printk("EFER(%s) = 0x%016lx  PAT = 0x%016lx\n",
+           cpu_has_vmx_efer ? "VMCS" : "MSR LL", efer, vmr(GUEST_PAT));
     printk("PreemptionTimer = 0x%08x  SM Base = 0x%08x\n",
            vmr32(GUEST_PREEMPTION_TIMER), vmr32(GUEST_SMBASE));
     printk("DebugCtl = 0x%016lx  DebugExceptions = 0x%016lx\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b167fde..51cd85f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -509,15 +509,6 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     wrmsrl(MSR_LSTAR,          v->arch.hvm_vmx.lstar);
     wrmsrl(MSR_SYSCALL_MASK,   v->arch.hvm_vmx.sfmask);
 
-    if ( (v->arch.hvm_vcpu.guest_efer ^ read_efer()) & EFER_SCE )
-    {
-        HVM_DBG_LOG(DBG_LEVEL_2,
-                    "restore guest's EFER with value %lx",
-                    v->arch.hvm_vcpu.guest_efer);
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
-    }
-
     if ( cpu_has_rdtscp )
         wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
@@ -1646,22 +1637,79 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
 
 static void vmx_update_guest_efer(struct vcpu *v)
 {
-    unsigned long vm_entry_value;
+    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
+        xen_efer = read_efer();
+
+    if ( paging_mode_shadow(v->domain) )
+    {
+        /*
+         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
+         * under guest control.
+         */
+        guest_efer &= ~EFER_NX;
+        guest_efer |= xen_efer & EFER_NX;
+    }
+
+    if ( !vmx_unrestricted_guest(v) )
+    {
+        /*
+         * When Unrestricted Guest is not enabled in the VMCS, hardware does
+         * not tolerate the LME and LMA settings being different.  As writes
+         * to CR0 are intercepted, it is safe to leave LME clear at this
+         * point, and fix up both LME and LMA when CR0.PG is set.
+         */
+        if ( !(guest_efer & EFER_LMA) )
+            guest_efer &= ~EFER_LME;
+    }
 
     vmx_vmcs_enter(v);
 
-    __vmread(VM_ENTRY_CONTROLS, &vm_entry_value);
-    if ( v->arch.hvm_vcpu.guest_efer & EFER_LMA )
-        vm_entry_value |= VM_ENTRY_IA32E_MODE;
+    /*
+     * The intended guest running mode is derived from VM_ENTRY_IA32E_MODE,
+     * which (architecturally) is the guest's LMA setting.
+     */
+    __vmread(VM_ENTRY_CONTROLS, &entry_ctls);
+
+    entry_ctls &= ~VM_ENTRY_IA32E_MODE;
+    if ( guest_efer & EFER_LMA )
+        entry_ctls |= VM_ENTRY_IA32E_MODE;
+
+    __vmwrite(VM_ENTRY_CONTROLS, entry_ctls);
+
+    /* We expect to use EFER loading in the common case, but... */
+    if ( likely(cpu_has_vmx_efer) )
+        __vmwrite(GUEST_EFER, guest_efer);
+
+    /* ... on Gen1 VT-x hardware, we have to use MSR load/save lists instead. */
     else
-        vm_entry_value &= ~VM_ENTRY_IA32E_MODE;
-    __vmwrite(VM_ENTRY_CONTROLS, vm_entry_value);
+    {
+        /*
+         * When the guests choice of EFER matches Xen's, remove the load/save
+         * list entries.  It is unnecessary overhead, especially as this is
+         * expected to be the common case for 64bit guests.
+         */
+        if ( guest_efer == xen_efer )
+        {
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
+        }
+        else
+        {
+            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
+            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
+        }
+    }
 
     vmx_vmcs_exit(v);
 
-    if ( v == current )
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
+    /*
+     * If the guests virtualised view of MSR_EFER matches the value loaded
+     * into hardware, clear the read intercept to avoid unnecessary VMExits.
+     */
+    if ( guest_efer == v->arch.hvm_vcpu.guest_efer )
+        vmx_clear_msr_intercept(v, MSR_EFER, VMX_MSR_R);
+    else
+        vmx_set_msr_intercept(v, MSR_EFER, VMX_MSR_R);
 }
 
 void nvmx_enqueue_n2_exceptions(struct vcpu *v, 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index ef5e198..fcfc5cf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -296,10 +296,8 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMEP))
 #define hvm_smap_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
-/* HVM guests on Intel hardware leak Xen's NX settings into guest context. */
 #define hvm_nx_enabled(v) \
-    ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && cpu_has_nx) ||    \
-     ((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
+    ((v)->arch.hvm_vcpu.guest_efer & EFER_NX)
 #define hvm_pku_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6629b4f..d4b59a0 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -311,6 +311,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
 #define cpu_has_vmx_pat \
     (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
+#define cpu_has_vmx_efer \
+    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
 #define cpu_has_vmx_unrestricted_guest \
     (vmx_secondary_exec_control & SECONDARY_EXEC_UNRESTRICTED_GUEST)
 #define vmx_unrestricted_guest(v)               \
@@ -596,6 +598,20 @@ static inline int vmx_read_guest_msr(const struct vcpu *v, uint32_t msr,
     return 0;
 }
 
+static inline int vmx_read_guest_loadonly_msr(
+    const struct vcpu *v, uint32_t msr, uint64_t *val)
+{
+    const struct vmx_msr_entry *ent =
+        vmx_find_msr(v, msr, VMX_MSR_GUEST_LOADONLY);
+
+    if ( !ent )
+        return -ESRCH;
+
+    *val = ent->data;
+
+    return 0;
+}
+
 static inline int vmx_write_guest_msr(struct vcpu *v, uint32_t msr,
                                       uint64_t val)
 {
-- 
2.1.4



* [PATCH v3 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
  2018-06-12  2:58   ` Tian, Kevin
  2018-06-12  8:15   ` Jan Beulich
@ 2018-06-27  8:43   ` Andrew Cooper
  2018-06-27  9:12     ` Jan Beulich
  2 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2018-06-27  8:43 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

The main purpose of this patch is to only ever insert the LBR MSRs into the
guest load/save list once, as a future patch wants to change the behaviour of
vmx_add_guest_msr().

The repeated processing of lbr_info and the guests MSR load/save list is
redundant, and a guest using LBR itself will have to re-enable
MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
redundant processing every time the guest gets a debug exception.

Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
bit to indicate that the MSRs have been inserted into the load/save list.
Shorten the existing FIXUP* identifiers to reduce code volume.

Furthermore, handing the guest #MC on an error isn't a legitimate action.  Two
of the three failure cases are definitely hypervisor bugs, and the third is a
boundary case which shouldn't occur in practice.  The guest also won't execute
correctly, so handle errors by cleanly crashing the guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>

v2:
 * New
v3:
 * domain_crash() error handling.  Add a comment explaining why this is the
   best (least bad?) option available.
---
 xen/arch/x86/hvm/vmx/vmx.c         | 81 +++++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  2 +-
 2 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 25dd204..d4696e8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2750,8 +2750,10 @@ enum
 
 #define LBR_FROM_SIGNEXT_2MSB  ((1ULL << 59) | (1ULL << 60))
 
-#define FIXUP_LBR_TSX            (1u << 0)
-#define FIXUP_BDW_ERRATUM_BDF14  (1u << 1)
+#define LBR_MSRS_INSERTED      (1u << 0)
+#define LBR_FIXUP_TSX          (1u << 1)
+#define LBR_FIXUP_BDF14        (1u << 2)
+#define LBR_FIXUP_MASK         (LBR_FIXUP_TSX | LBR_FIXUP_BDF14)
 
 static bool __read_mostly lbr_tsx_fixup_needed;
 static bool __read_mostly bdw_erratum_bdf14_fixup_needed;
@@ -3086,7 +3088,6 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         break;
 
     case MSR_IA32_DEBUGCTLMSR: {
-        int i, rc = 0;
         uint64_t supported = IA32_DEBUGCTLMSR_LBR | IA32_DEBUGCTLMSR_BTF;
 
         if ( boot_cpu_has(X86_FEATURE_RTM) )
@@ -3097,30 +3098,64 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
             if ( vpmu_do_wrmsr(msr, msr_content, supported) )
                 break;
         }
-        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
+
+        /*
+         * When a guest first enables LBR, arrange to save and restore the LBR
+         * MSRs and allow the guest direct access.
+         *
+         * MSR_DEBUGCTL and LBR has existed almost long as MSRs have existed,
+         * and there is no architectural way to hide the feature, or fail the
+         * attempt to enable LBR.
+         *
+         * Unknown host LBR MSRs or hitting -ENOSPC with the guest load/save
+         * list are definitely hypervisor bugs, whereas -ENOMEM for allocating
+         * the load/save list is simply unlucky (and shouldn't occur with
+         * sensible management by the toolstack).
+         *
+         * Either way, there is nothing we can do right now to recover, and
+         * the guest won't execute correctly either.  Simply crash the domain
+         * to make the failure obvious.
+         */
+        if ( !(v->arch.hvm_vmx.lbr_flags & LBR_MSRS_INSERTED) &&
+             (msr_content & IA32_DEBUGCTLMSR_LBR) )
         {
             const struct lbr_info *lbr = last_branch_msr_get();
-            if ( lbr == NULL )
-                break;
 
-            for ( ; (rc == 0) && lbr->count; lbr++ )
-                for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
+            if ( unlikely(!lbr) )
+            {
+                gprintk(XENLOG_ERR, "Unknown Host LBR MSRs\n");
+                domain_crash(v->domain);
+                return X86EMUL_OKAY;
+            }
+
+            for ( ; lbr->count; lbr++ )
+            {
+                unsigned int i;
+
+                for ( i = 0; i < lbr->count; i++ )
+                {
+                    int rc = vmx_add_guest_msr(v, lbr->base + i);
+
+                    if ( unlikely(rc) )
                     {
-                        vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
-                        if ( lbr_tsx_fixup_needed )
-                            v->arch.hvm_vmx.lbr_fixup_enabled |= FIXUP_LBR_TSX;
-                        if ( bdw_erratum_bdf14_fixup_needed )
-                            v->arch.hvm_vmx.lbr_fixup_enabled |=
-                                FIXUP_BDW_ERRATUM_BDF14;
+                        gprintk(XENLOG_ERR,
+                                "Guest load/save list error %d\n", rc);
+                        domain_crash(v->domain);
+                        return X86EMUL_OKAY;
                     }
-        }
 
-        if ( rc < 0 )
-            hvm_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
-        else
-            __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
+                    vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
+                }
+            }
+
+            v->arch.hvm_vmx.lbr_flags |= LBR_MSRS_INSERTED;
+            if ( lbr_tsx_fixup_needed )
+                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_TSX;
+            if ( bdw_erratum_bdf14_fixup_needed )
+                v->arch.hvm_vmx.lbr_flags |= LBR_FIXUP_BDF14;
+        }
 
+        __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
         break;
     }
     case MSR_IA32_FEATURE_CONTROL:
@@ -4203,9 +4238,9 @@ static void lbr_fixup(void)
 {
     struct vcpu *curr = current;
 
-    if ( curr->arch.hvm_vmx.lbr_fixup_enabled & FIXUP_LBR_TSX )
+    if ( curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_TSX )
         lbr_tsx_fixup();
-    if ( curr->arch.hvm_vmx.lbr_fixup_enabled & FIXUP_BDW_ERRATUM_BDF14 )
+    if ( curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_BDF14 )
         bdw_erratum_bdf14_fixup();
 }
 
@@ -4273,7 +4308,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     }
 
  out:
-    if ( unlikely(curr->arch.hvm_vmx.lbr_fixup_enabled) )
+    if ( unlikely(curr->arch.hvm_vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();
 
     HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 62afebe..2c9e291 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -156,7 +156,7 @@ struct arch_vmx_struct {
     /* Are we emulating rather than VMENTERing? */
     uint8_t              vmx_emulate;
 
-    uint8_t              lbr_fixup_enabled;
+    uint8_t              lbr_flags;
 
     /* Bitmask of segments that we can't safely use in virtual 8086 mode */
     uint16_t             vm86_segment_mask;
-- 
2.1.4



* Re: [PATCH v3 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-27  8:43   ` [PATCH v3 " Andrew Cooper
@ 2018-06-27  9:12     ` Jan Beulich
  2018-06-27  9:50       ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-27  9:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 27.06.18 at 10:43, <andrew.cooper3@citrix.com> wrote:
> The main purpose of this patch is to only ever insert the LBR MSRs into the
> guest load/save list once, as a future patch wants to change the behaviour of
> vmx_add_guest_msr().
> 
> The repeated processing of lbr_info and the guests MSR load/save list is
> redundant, and a guest using LBR itself will have to re-enable
> MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
> redundant processing every time the guest gets a debug exception.
> 
> Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
> bit to indicate that the MSRs have been inserted into the load/save list.
> Shorten the existing FIXUP* identifiers to reduce code volume.
> 
> Furthermore, handing the guest #MC on an error isn't a legitimate action.  Two
> of the three failure cases are definitely hypervisor bugs, and the third is a
> boundary case which shouldn't occur in practice.  The guest also won't execute
> correctly, so handle errors by cleanly crashing the guest.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one minor remark:

> @@ -3097,30 +3098,64 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>              if ( vpmu_do_wrmsr(msr, msr_content, supported) )
>                  break;
>          }
> -        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
> +
> +        /*
> +         * When a guest first enables LBR, arrange to save and restore the LBR
> +         * MSRs and allow the guest direct access.
> +         *
> +         * MSR_DEBUGCTL and LBR has existed almost long as MSRs have existed,

... as long as ...

Jan




* Re: [PATCH v3 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-27  9:12     ` Jan Beulich
@ 2018-06-27  9:50       ` Andrew Cooper
  2018-06-27  9:58         ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2018-06-27  9:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 27/06/18 10:12, Jan Beulich wrote:
>>>> On 27.06.18 at 10:43, <andrew.cooper3@citrix.com> wrote:
>> The main purpose of this patch is to only ever insert the LBR MSRs into the
>> guest load/save list once, as a future patch wants to change the behaviour of
>> vmx_add_guest_msr().
>>
>> The repeated processing of lbr_info and the guests MSR load/save list is
>> redundant, and a guest using LBR itself will have to re-enable
>> MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
>> redundant processing every time the guest gets a debug exception.
>>
>> Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
>> bit to indicate that the MSRs have been inserted into the load/save list.
>> Shorten the existing FIXUP* identifiers to reduce code volume.
>>
>> Furthermore, handing the guest #MC on an error isn't a legitimate action.  Two
>> of the three failure cases are definitely hypervisor bugs, and the third is a
>> boundary case which shouldn't occur in practice.  The guest also won't execute
>> correctly, so handle errors by cleanly crashing the guest.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one minor remark:
>
>> @@ -3097,30 +3098,64 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>              if ( vpmu_do_wrmsr(msr, msr_content, supported) )
>>                  break;
>>          }
>> -        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
>> +
>> +        /*
>> +         * When a guest first enables LBR, arrange to save and restore the LBR
>> +         * MSRs and allow the guest direct access.
>> +         *
>> +         * MSR_DEBUGCTL and LBR has existed almost long as MSRs have existed,
> ... as long as ...

Not quite.  MSRs have existed since the P5, whereas LBR was introduced
in the P6.  The MSR wasn't architectural at the time, hence no CPUID
bit, but the newer architectural declaration is compatible with the P6
(as far as LBR is concerned).

~Andrew


* Re: [PATCH v3 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-27  9:50       ` Andrew Cooper
@ 2018-06-27  9:58         ` Jan Beulich
  2018-06-27  9:59           ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2018-06-27  9:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 27.06.18 at 11:50, <andrew.cooper3@citrix.com> wrote:
> On 27/06/18 10:12, Jan Beulich wrote:
>>>>> On 27.06.18 at 10:43, <andrew.cooper3@citrix.com> wrote:
>>> The main purpose of this patch is to only ever insert the LBR MSRs into the
>>> guest load/save list once, as a future patch wants to change the behaviour of
>>> vmx_add_guest_msr().
>>>
>>> The repeated processing of lbr_info and the guests MSR load/save list is
>>> redundant, and a guest using LBR itself will have to re-enable
>>> MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
>>> redundant processing every time the guest gets a debug exception.
>>>
>>> Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
>>> bit to indicate that the MSRs have been inserted into the load/save list.
>>> Shorten the existing FIXUP* identifiers to reduce code volume.
>>>
>>> Furthermore, handing the guest #MC on an error isn't a legitimate action.  Two
>>> of the three failure cases are definitely hypervisor bugs, and the third is a
>>> boundary case which shouldn't occur in practice.  The guest also won't execute
>>> correctly, so handle errors by cleanly crashing the guest.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> with one minor remark:
>>
>>> @@ -3097,30 +3098,64 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>>              if ( vpmu_do_wrmsr(msr, msr_content, supported) )
>>>                  break;
>>>          }
>>> -        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
>>> +
>>> +        /*
>>> +         * When a guest first enables LBR, arrange to save and restore the LBR
>>> +         * MSRs and allow the guest direct access.
>>> +         *
>>> +         * MSR_DEBUGCTL and LBR has existed almost long as MSRs have existed,
>> ... as long as ...
> 
> Not quite.  MSRs have existed since the P5, whereas LBR was introduced
> in the P6.  The MSR wasn't architectural at the time, hence no CPUID
> bit, but the newer architectural declaration is compatible with the P6
> (as far as LBR is concerned).

Perhaps a misunderstanding resulting from an ambiguity in my reply? I
should have said "... almost as long as ..."; I didn't mean for the "almost"
to be dropped.

Jan




* Re: [PATCH v3 5/9] x86/vmx: Improvements to LBR MSR handling
  2018-06-27  9:58         ` Jan Beulich
@ 2018-06-27  9:59           ` Andrew Cooper
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2018-06-27  9:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 27/06/18 10:58, Jan Beulich wrote:
>>>> On 27.06.18 at 11:50, <andrew.cooper3@citrix.com> wrote:
>> On 27/06/18 10:12, Jan Beulich wrote:
>>>>>> On 27.06.18 at 10:43, <andrew.cooper3@citrix.com> wrote:
>>>> The main purpose of this patch is to only ever insert the LBR MSRs into the
>>>> guest load/save list once, as a future patch wants to change the behaviour of
>>>> vmx_add_guest_msr().
>>>>
>>>> The repeated processing of lbr_info and the guests MSR load/save list is
>>>> redundant, and a guest using LBR itself will have to re-enable
>>>> MSR_DEBUGCTL.LBR in its #DB handler, meaning that Xen will repeat this
>>>> redundant processing every time the guest gets a debug exception.
>>>>
>>>> Rename lbr_fixup_enabled to lbr_flags to be a little more generic, and use one
>>>> bit to indicate that the MSRs have been inserted into the load/save list.
>>>> Shorten the existing FIXUP* identifiers to reduce code volume.
>>>>
>>>> Furthermore, handing the guest #MC on an error isn't a legitimate action.  Two
>>>> of the three failure cases are definitely hypervisor bugs, and the third is a
>>>> boundary case which shouldn't occur in practice.  The guest also won't execute
>>>> correctly, so handle errors by cleanly crashing the guest.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> with one minor remark:
>>>
>>>> @@ -3097,30 +3098,64 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>>>              if ( vpmu_do_wrmsr(msr, msr_content, supported) )
>>>>                  break;
>>>>          }
>>>> -        if ( msr_content & IA32_DEBUGCTLMSR_LBR )
>>>> +
>>>> +        /*
>>>> +         * When a guest first enables LBR, arrange to save and restore the LBR
>>>> +         * MSRs and allow the guest direct access.
>>>> +         *
>>>> +         * MSR_DEBUGCTL and LBR has existed almost long as MSRs have existed,
>>> ... as long as ...
>> Not quite.  MSRs have existed since the P5, whereas LBR was introduced
>> in the P6.  The MSR wasn't architectural at the time, hence no CPUID
>> bit, but the newer architectural declaration is compatible with the P6
>> (as far as LBR is concerned).
> Perhaps a misunderstanding resulting from an ambiguity in my reply? I
> should have said "... almost as long as ..."; I didn't mean for the "almost"
> to be dropped.

Oops yes - you're right.  I'll fix that.

~Andrew


Thread overview: 36+ messages
2018-06-08 18:48 [PATCH v2 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
2018-06-08 18:48 ` [PATCH v2 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
2018-06-08 18:48 ` [PATCH v2 2/9] x86/vmx: Internal cleanup " Andrew Cooper
2018-06-08 18:48 ` [PATCH v2 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
2018-06-08 18:48 ` [PATCH v2 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
2018-06-12  2:49   ` Tian, Kevin
2018-06-12  8:06   ` Jan Beulich
2018-06-08 18:48 ` [PATCH v2 5/9] x86/vmx: Improvements to LBR MSR handling Andrew Cooper
2018-06-12  2:58   ` Tian, Kevin
2018-06-12  8:15   ` Jan Beulich
2018-06-12  8:51     ` Andrew Cooper
2018-06-12  9:00       ` Jan Beulich
2018-06-12 16:33         ` Andrew Cooper
2018-06-13  6:30           ` Jan Beulich
2018-06-13 10:37             ` Andrew Cooper
2018-06-27  8:43   ` [PATCH v3 " Andrew Cooper
2018-06-27  9:12     ` Jan Beulich
2018-06-27  9:50       ` Andrew Cooper
2018-06-27  9:58         ` Jan Beulich
2018-06-27  9:59           ` Andrew Cooper
2018-06-08 18:48 ` [PATCH v2 6/9] x86/vmx: Pass an MSR value into vmx_msr_add() Andrew Cooper
2018-06-12  3:11   ` Tian, Kevin
2018-06-12  8:19   ` Jan Beulich
2018-06-08 18:48 ` [PATCH v2 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
2018-06-12  3:22   ` Tian, Kevin
2018-06-08 18:48 ` [PATCH v2 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
2018-06-12  3:24   ` Tian, Kevin
2018-06-12  8:27   ` Jan Beulich
2018-06-12 17:40     ` Andrew Cooper
2018-06-12 18:23       ` Andrew Cooper
2018-06-13  6:33       ` Jan Beulich
2018-06-08 18:48 ` [PATCH v2 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
2018-06-12  6:04   ` Tian, Kevin
2018-06-12  8:54   ` Jan Beulich
2018-06-13 10:19     ` Andrew Cooper
2018-06-13 11:19   ` [PATCH v3 " Andrew Cooper
