* [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context
@ 2018-05-22 11:20 Andrew Cooper
  2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
                   ` (8 more replies)
  0 siblings, 9 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Tim Deegan,
	Jun Nakajima, Roger Pau Monné

The purpose of this patchset is to fix a longstanding bug whereby Xen's NXE
setting leaks into HVM context, and has a visible effect on the guest's
pagewalk.

To allow patch 9 to function, there are 7 patches of improvements to the MSR
load/save infrastructure, purely to support first-gen VT-x hardware.

Along the way, there is a bugfix to how MSR_DEBUGCTL is handled, noticed while
doing the load/save list improvements.  It really does want backporting to the
stable releases, but I can't find any clean way of disentangling it from the
series.

Andrew Cooper (9):
  x86/vmx: API improvements for MSR load/save infrastructure
  x86/vmx: Internal cleanup for MSR load/save infrastructure
  x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  x86/vmx: Support remote access to the MSR lists
  x86/vmx: Fix handling of MSR_DEBUGCTL on VMExit
  x86/vmx: Pass an MSR value into vmx_add_msr()
  x86/vmx: Support load-only guest MSR list entries
  x86/vmx: Support removing MSRs from the host/guest load/save lists
  x86/vmx: Don't leak EFER.NXE into guest context

 xen/arch/x86/cpu/vpmu_intel.c      |  14 +-
 xen/arch/x86/domain.c              |  10 --
 xen/arch/x86/hvm/vmx/vmcs.c        | 303 ++++++++++++++++++++++++++-----------
 xen/arch/x86/hvm/vmx/vmx.c         | 115 ++++++++++----
 xen/include/asm-x86/hvm/hvm.h      |   4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  84 +++++++---
 6 files changed, 374 insertions(+), 156 deletions(-)

-- 
2.1.4


* [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-23 16:01   ` Roger Pau Monné
  2018-05-27  3:26   ` Tian, Kevin
  2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Collect together related infrastructure in vmcs.h, rather than having it
spread out.  Turn vmx_{read,write}_guest_msr() into static inlines, as they
are simple enough.

Replace 'int type' with 'enum vmx_msr_list_type', and use switch statements
internally.  Later changes are going to introduce a new type.

Rename the type identifiers for consistency with the other VMX_MSR_*
constants.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
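
As a usage sketch only (not part of the patch), a caller of the reworked API
looks roughly as follows; MSR_IA32_DEBUGCTLMSR is used purely as an example:

    /* Illustrative only: caller pattern with the new inline helpers. */
    struct vmx_msr_entry *ent;
    uint64_t val;

    if ( vmx_read_guest_msr(MSR_IA32_DEBUGCTLMSR, &val) == 0 )
        /* A guest load/save list slot already exists; update it in place. */
        vmx_write_guest_msr(MSR_IA32_DEBUGCTLMSR, val | IA32_DEBUGCTLMSR_LBR);

    /* Raw list searches now take the enumeration rather than a plain int. */
    ent = vmx_find_msr(MSR_IA32_DEBUGCTLMSR, VMX_MSR_GUEST);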
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 93 +++++++++++++++++---------------------
 xen/arch/x86/hvm/vmx/vmx.c         |  8 ++--
 xen/include/asm-x86/hvm/vmx/vmcs.h | 64 +++++++++++++++++++-------
 3 files changed, 93 insertions(+), 72 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 70c2fb7..a5dcf5c 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1289,22 +1289,26 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
     return 0;
 }
 
-struct vmx_msr_entry *vmx_find_msr(u32 msr, int type)
+struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     unsigned int msr_count;
-    struct vmx_msr_entry *msr_area;
+    struct vmx_msr_entry *msr_area = NULL;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
-        msr_count = curr->arch.hvm_vmx.msr_count;
-        msr_area = curr->arch.hvm_vmx.msr_area;
-    }
-    else
-    {
-        ASSERT(type == VMX_HOST_MSR);
+    case VMX_MSR_HOST:
         msr_count = curr->arch.hvm_vmx.host_msr_count;
         msr_area = curr->arch.hvm_vmx.host_msr_area;
+        break;
+
+    case VMX_MSR_GUEST:
+        msr_count = curr->arch.hvm_vmx.msr_count;
+        msr_area = curr->arch.hvm_vmx.msr_area;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
     }
 
     if ( msr_area == NULL )
@@ -1314,48 +1318,27 @@ struct vmx_msr_entry *vmx_find_msr(u32 msr, int type)
                    vmx_msr_entry_key_cmp);
 }
 
-int vmx_read_guest_msr(u32 msr, u64 *val)
-{
-    struct vmx_msr_entry *ent;
-
-    if ( (ent = vmx_find_msr(msr, VMX_GUEST_MSR)) != NULL )
-    {
-        *val = ent->data;
-        return 0;
-    }
-
-    return -ESRCH;
-}
-
-int vmx_write_guest_msr(u32 msr, u64 val)
-{
-    struct vmx_msr_entry *ent;
-
-    if ( (ent = vmx_find_msr(msr, VMX_GUEST_MSR)) != NULL )
-    {
-        ent->data = val;
-        return 0;
-    }
-
-    return -ESRCH;
-}
-
-int vmx_add_msr(u32 msr, int type)
+int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     unsigned int idx, *msr_count;
     struct vmx_msr_entry **msr_area, *msr_area_elem;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
-        msr_count = &curr->arch.hvm_vmx.msr_count;
-        msr_area = &curr->arch.hvm_vmx.msr_area;
-    }
-    else
-    {
-        ASSERT(type == VMX_HOST_MSR);
+    case VMX_MSR_HOST:
         msr_count = &curr->arch.hvm_vmx.host_msr_count;
         msr_area = &curr->arch.hvm_vmx.host_msr_area;
+        break;
+
+    case VMX_MSR_GUEST:
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
     }
 
     if ( *msr_area == NULL )
@@ -1363,13 +1346,17 @@ int vmx_add_msr(u32 msr, int type)
         if ( (*msr_area = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
 
-        if ( type == VMX_GUEST_MSR )
+        switch ( type )
         {
+        case VMX_MSR_HOST:
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            break;
+
+        case VMX_MSR_GUEST:
             __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
             __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            break;
         }
-        else
-            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
     for ( idx = 0; idx < *msr_count && (*msr_area)[idx].index <= msr; idx++ )
@@ -1388,16 +1375,18 @@ int vmx_add_msr(u32 msr, int type)
 
     ++*msr_count;
 
-    if ( type == VMX_GUEST_MSR )
+    switch ( type )
     {
+    case VMX_MSR_HOST:
+        rdmsrl(msr, msr_area_elem->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        break;
+
+    case VMX_MSR_GUEST:
         msr_area_elem->data = 0;
         __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
         __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
-    }
-    else
-    {
-        rdmsrl(msr, msr_area_elem->data);
-        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        break;
     }
 
     return 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9707514..123dccb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4165,7 +4165,7 @@ static void lbr_tsx_fixup(void)
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
-    if ( (msr = vmx_find_msr(lbr_from_start, VMX_GUEST_MSR)) != NULL )
+    if ( (msr = vmx_find_msr(lbr_from_start, VMX_MSR_GUEST)) != NULL )
     {
         /*
          * Sign extend into bits 61:62 while preserving bit 63
@@ -4175,7 +4175,7 @@ static void lbr_tsx_fixup(void)
             msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
     }
 
-    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_GUEST_MSR)) != NULL )
+    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
         msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
 }
 
@@ -4203,8 +4203,8 @@ static void bdw_erratum_bdf14_fixup(void)
      * erratum BDF14. Fix up MSR_IA32_LASTINT{FROM,TO}IP by
      * sign-extending into bits 48:63.
      */
-    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_GUEST_MSR);
-    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_GUEST_MSR);
+    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
+    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
 }
 
 static void lbr_fixup(void)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 06c3179..c8a1f89 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -514,9 +514,6 @@ enum vmcs_field {
 
 #define VMCS_VPID_WIDTH 16
 
-#define VMX_GUEST_MSR 0
-#define VMX_HOST_MSR  1
-
 /* VM Instruction error numbers */
 enum vmx_insn_errno
 {
@@ -534,6 +531,54 @@ enum vmx_insn_errno
     VMX_INSN_FAIL_INVALID                  = ~0,
 };
 
+/* MSR load/save list infrastructure. */
+enum vmx_msr_list_type {
+    VMX_MSR_HOST,
+    VMX_MSR_GUEST,
+};
+
+int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
+
+static inline int vmx_add_host_load_msr(uint32_t msr)
+{
+    return vmx_add_msr(msr, VMX_MSR_HOST);
+}
+
+static inline int vmx_add_guest_msr(uint32_t msr)
+{
+    return vmx_add_msr(msr, VMX_MSR_GUEST);
+}
+
+struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
+
+static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
+{
+    struct vmx_msr_entry *ent;
+
+    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
+    {
+        *val = ent->data;
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
+static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
+{
+    struct vmx_msr_entry *ent;
+
+    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
+    {
+        ent->data = val;
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
+
+/* MSR intercept bitmap infrastructure. */
 enum vmx_msr_intercept_type {
     VMX_MSR_R  = 1,
     VMX_MSR_W  = 2,
@@ -544,10 +589,6 @@ void vmx_clear_msr_intercept(struct vcpu *v, unsigned int msr,
                              enum vmx_msr_intercept_type type);
 void vmx_set_msr_intercept(struct vcpu *v, unsigned int msr,
                            enum vmx_msr_intercept_type type);
-int vmx_read_guest_msr(u32 msr, u64 *val);
-int vmx_write_guest_msr(u32 msr, u64 val);
-struct vmx_msr_entry *vmx_find_msr(u32 msr, int type);
-int vmx_add_msr(u32 msr, int type);
 void vmx_vmcs_switch(paddr_t from, paddr_t to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
@@ -562,15 +603,6 @@ void virtual_vmcs_vmwrite(const struct vcpu *, u32 encoding, u64 val);
 enum vmx_insn_errno virtual_vmcs_vmwrite_safe(const struct vcpu *v,
                                               u32 vmcs_encoding, u64 val);
 
-static inline int vmx_add_guest_msr(u32 msr)
-{
-    return vmx_add_msr(msr, VMX_GUEST_MSR);
-}
-static inline int vmx_add_host_load_msr(u32 msr)
-{
-    return vmx_add_msr(msr, VMX_HOST_MSR);
-}
-
 DECLARE_PER_CPU(bool_t, vmxon);
 
 bool_t vmx_vcpu_pml_enabled(const struct vcpu *v);
-- 
2.1.4


* [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-23 16:28   ` Roger Pau Monné
                     ` (2 more replies)
  2018-05-22 11:20 ` [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

 * Use an arch_vmx_struct local variable to reduce later code volume.
 * Use start/total instead of msr_area/msr_count.  This is in preparation for
   more fine-grained handling with later changes.
 * Use ent/end pointers (again for preparation), and to make the vmx_add_msr()
   logic easier to follow.
 * Make the memory allocation block of vmx_add_msr() unlikely, and calculate
   virt_to_maddr() just once.

No practical change to functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c | 74 ++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a5dcf5c..f557857 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1292,48 +1292,50 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
 struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
-    unsigned int msr_count;
-    struct vmx_msr_entry *msr_area = NULL;
+    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
+    struct vmx_msr_entry *start = NULL;
+    unsigned int total;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        msr_count = curr->arch.hvm_vmx.host_msr_count;
-        msr_area = curr->arch.hvm_vmx.host_msr_area;
+        start    = arch_vmx->host_msr_area;
+        total    = arch_vmx->host_msr_count;
         break;
 
     case VMX_MSR_GUEST:
-        msr_count = curr->arch.hvm_vmx.msr_count;
-        msr_area = curr->arch.hvm_vmx.msr_area;
+        start    = arch_vmx->msr_area;
+        total    = arch_vmx->msr_count;
         break;
 
     default:
         ASSERT_UNREACHABLE();
     }
 
-    if ( msr_area == NULL )
+    if ( !start )
         return NULL;
 
-    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
+    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),
                    vmx_msr_entry_key_cmp);
 }
 
 int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
-    unsigned int idx, *msr_count;
-    struct vmx_msr_entry **msr_area, *msr_area_elem;
+    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
+    struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
+    unsigned int total;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        msr_count = &curr->arch.hvm_vmx.host_msr_count;
-        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+        ptr      = &arch_vmx->host_msr_area;
+        total    = arch_vmx->host_msr_count;
         break;
 
     case VMX_MSR_GUEST:
-        msr_count = &curr->arch.hvm_vmx.msr_count;
-        msr_area = &curr->arch.hvm_vmx.msr_area;
+        ptr      = &arch_vmx->msr_area;
+        total    = arch_vmx->msr_count;
         break;
 
     default:
@@ -1341,51 +1343,55 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         return -EINVAL;
     }
 
-    if ( *msr_area == NULL )
+    /* Allocate memory on first use. */
+    if ( unlikely(!*ptr) )
     {
-        if ( (*msr_area = alloc_xenheap_page()) == NULL )
+        paddr_t addr;
+
+        if ( (*ptr = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
 
+        addr = virt_to_maddr(*ptr);
+
         switch ( type )
         {
         case VMX_MSR_HOST:
-            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, addr);
             break;
 
         case VMX_MSR_GUEST:
-            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
-            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_EXIT_MSR_STORE_ADDR, addr);
+            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, addr);
             break;
         }
     }
 
-    for ( idx = 0; idx < *msr_count && (*msr_area)[idx].index <= msr; idx++ )
-        if ( (*msr_area)[idx].index == msr )
+    start = *ptr;
+    end   = start + total;
+
+    for ( ent = start; ent < end && ent->index <= msr; ++ent )
+        if ( ent->index == msr )
             return 0;
 
-    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
+    if ( total == (PAGE_SIZE / sizeof(*ent)) )
         return -ENOSPC;
 
-    memmove(*msr_area + idx + 1, *msr_area + idx,
-            sizeof(*msr_area_elem) * (*msr_count - idx));
-
-    msr_area_elem = *msr_area + idx;
-    msr_area_elem->index = msr;
-    msr_area_elem->mbz = 0;
+    memmove(ent + 1, ent, sizeof(*ent) * (end - ent));
 
-    ++*msr_count;
+    ent->index = msr;
+    ent->mbz = 0;
 
     switch ( type )
     {
     case VMX_MSR_HOST:
-        rdmsrl(msr, msr_area_elem->data);
-        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+        rdmsrl(msr, ent->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, ++arch_vmx->host_msr_count);
         break;
 
     case VMX_MSR_GUEST:
-        msr_area_elem->data = 0;
-        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
-        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+        ent->data = 0;
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_count);
         break;
     }
 
-- 
2.1.4


* [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
  2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-23 16:39   ` Roger Pau Monné
  2018-05-27  3:38   ` Tian, Kevin
  2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Instead of having multiple algorithms searching the MSR lists, implement a
single one.  It has the semantics required by vmx_add_msr(), to identify the
position in which an MSR should live, if it isn't already present.

There will be a marginal improvement for vmx_find_msr() by avoiding the
function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
vmx_add_msr() by using a binary search instead of a linear search.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
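
As an illustrative sketch of the intended semantics (not part of the patch),
given a sorted list of three entries, locate_msr_entry() behaves as follows:

    /* Illustrative only: insertion-point semantics of locate_msr_entry(). */
    struct vmx_msr_entry list[] = {
        { .index = 0x10 }, { .index = 0x20 }, { .index = 0x40 },
    };
    struct vmx_msr_entry *end = list + ARRAY_SIZE(list), *ent;

    ent = locate_msr_entry(list, end, 0x20); /* &list[1] - exact match */
    ent = locate_msr_entry(list, end, 0x30); /* &list[2] - insertion slot */
    ent = locate_msr_entry(list, end, 0x50); /* end - must not be dereferenced */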
---
 xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f557857..e4acdc1 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
     return 0;
 }
 
-static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
+/*
+ * Search an MSR list looking for an MSR entry, or the slot in which it should
+ * live (to keep the data sorted) if an entry is not found.
+ *
+ * The return pointer is guaranteed to be bounded by start and end.  However,
+ * it may point at end, and may be invalid for the caller to dereference.
+ */
+static struct vmx_msr_entry *locate_msr_entry(
+    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
 {
-    const u32 *msr = key;
-    const struct vmx_msr_entry *entry = elt;
+    while ( start < end )
+    {
+        struct vmx_msr_entry *mid = start + (end - start) / 2;
 
-    if ( *msr > entry->index )
-        return 1;
-    if ( *msr < entry->index )
-        return -1;
+        if ( msr < mid->index )
+            end = mid;
+        else if ( msr > mid->index )
+            start = mid + 1;
+        else
+            return mid;
+    }
 
-    return 0;
+    return start;
 }
 
 struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
 {
     struct vcpu *curr = current;
     struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
-    struct vmx_msr_entry *start = NULL;
+    struct vmx_msr_entry *start = NULL, *ent, *end;
     unsigned int total;
 
     switch ( type )
@@ -1315,8 +1327,10 @@ struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
     if ( !start )
         return NULL;
 
-    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),
-                   vmx_msr_entry_key_cmp);
+    end = start + total;
+    ent = locate_msr_entry(start, end, msr);
+
+    return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
 int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
@@ -1368,10 +1382,10 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
 
     start = *ptr;
     end   = start + total;
+    ent   = locate_msr_entry(start, end, msr);
 
-    for ( ent = start; ent < end && ent->index <= msr; ++ent )
-        if ( ent->index == msr )
-            return 0;
+    if ( (ent < end) && (ent->index == msr) )
+        return 0;
 
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
         return -ENOSPC;
-- 
2.1.4


* [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-24 11:50   ` Roger Pau Monné
                     ` (2 more replies)
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handling of MSR_DEBUGCTL on VMExit Andrew Cooper
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

At the moment, all modifications of the MSR lists are in current context.
However, future changes may need to put MSR_EFER into the lists from domctl
hypercall context.

Plumb a struct vcpu parameter down through the infrastructure, and use
vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().  Use
assertions to ensure that access is either in current context, or while the
vcpu is paused.

For now it is safe to require that remote accesses are under the domctl lock.
This will remain safe if/when the global domctl lock becomes per-domain.

Note these expectations beside the fields in arch_vmx_struct, and reorder the
fields to avoid unnecessary padding.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

To preempt any questions about spinlocks, the use of the MSR lists in the
return-to-guest path causes check_lock() failures for plain spinlocks (despite
it technically being safe in this context), and the call to alloc_xenheap_page()
makes it impossible to use irqsave/restore variants, due to the nested
acquisition of the heap lock.
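
As an illustrative sketch only (no such caller exists yet in this series), a
remote user of the interface is expected to look like this, with vcpu_pause()
satisfying the new assertions and the domctl lock held by the caller:

    /* Illustrative only: hypothetical remote (toolstack/domctl) caller. */
    int rc;

    vcpu_pause(v);                        /* v is now !vcpu_runnable() */
    rc = vmx_add_guest_msr(v, MSR_EFER);  /* e.g. ahead of the EFER work */
    vcpu_unpause(v);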
---
 xen/arch/x86/cpu/vpmu_intel.c      | 14 ++++++-------
 xen/arch/x86/hvm/vmx/vmcs.c        | 40 ++++++++++++++++++++++++++++----------
 xen/arch/x86/hvm/vmx/vmx.c         | 24 ++++++++++++-----------
 xen/include/asm-x86/hvm/vmx/vmcs.h | 34 ++++++++++++++++++++------------
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index 207e2e7..c499e69 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -455,12 +455,12 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     if ( is_hvm_vcpu(v) )
     {
         wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
             goto out_err;
 
-        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
             goto out_err;
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
     }
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
@@ -613,7 +613,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             return -EINVAL;
 
         if ( is_hvm_vcpu(v) )
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+            vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
                                &core2_vpmu_cxt->global_ctrl);
         else
             rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
@@ -682,7 +682,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                 return -EINVAL;
 
             if ( is_hvm_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
                                    &core2_vpmu_cxt->global_ctrl);
             else
                 rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
@@ -701,7 +701,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     else
     {
         if ( is_hvm_vcpu(v) )
-            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
         else
             wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
     }
@@ -735,7 +735,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             if ( is_hvm_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
             else
                 rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index e4acdc1..8bf54c4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1301,13 +1301,15 @@ static struct vmx_msr_entry *locate_msr_entry(
     return start;
 }
 
-struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
+struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
+                                   enum vmx_msr_list_type type)
 {
-    struct vcpu *curr = current;
-    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
+    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry *start = NULL, *ent, *end;
     unsigned int total;
 
+    ASSERT(v == current || !vcpu_runnable(v));
+
     switch ( type )
     {
     case VMX_MSR_HOST:
@@ -1333,12 +1335,14 @@ struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
-int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
+int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
 {
-    struct vcpu *curr = current;
-    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
+    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
     unsigned int total;
+    int rc;
+
+    ASSERT(v == current || !vcpu_runnable(v));
 
     switch ( type )
     {
@@ -1357,13 +1361,18 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         return -EINVAL;
     }
 
+    vmx_vmcs_enter(v);
+
     /* Allocate memory on first use. */
     if ( unlikely(!*ptr) )
     {
         paddr_t addr;
 
         if ( (*ptr = alloc_xenheap_page()) == NULL )
-            return -ENOMEM;
+        {
+            rc = -ENOMEM;
+            goto out;
+        }
 
         addr = virt_to_maddr(*ptr);
 
@@ -1385,10 +1394,16 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
     ent   = locate_msr_entry(start, end, msr);
 
     if ( (ent < end) && (ent->index == msr) )
-        return 0;
+    {
+        rc = 0;
+        goto out;
+    }
 
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
-        return -ENOSPC;
+    {
+        rc = -ENOSPC;
+        goto out;
+    }
 
     memmove(ent + 1, ent, sizeof(*ent) * (end - ent));
 
@@ -1409,7 +1424,12 @@ int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
         break;
     }
 
-    return 0;
+    rc = 0;
+
+ out:
+    vmx_vmcs_exit(v);
+
+    return rc;
 }
 
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 123dccb..3950b12 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2818,7 +2818,7 @@ static int is_last_branch_msr(u32 ecx)
 
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
-    const struct vcpu *curr = current;
+    struct vcpu *curr = current;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
@@ -2897,7 +2897,7 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
 
-        if ( vmx_read_guest_msr(msr, msr_content) == 0 )
+        if ( vmx_read_guest_msr(curr, msr, msr_content) == 0 )
             break;
 
         if ( is_last_branch_msr(msr) )
@@ -3109,7 +3109,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
                     {
                         vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
                         if ( lbr_tsx_fixup_needed )
@@ -3121,7 +3121,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         }
 
         if ( (rc < 0) ||
-             (msr_content && (vmx_add_host_load_msr(msr) < 0)) )
+             (msr_content && (vmx_add_host_load_msr(v, msr) < 0)) )
             hvm_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
         else
             __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
@@ -3150,7 +3150,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( wrmsr_viridian_regs(msr, msr_content) ) 
             break;
 
-        if ( vmx_write_guest_msr(msr, msr_content) == 0 ||
+        if ( vmx_write_guest_msr(v, msr, msr_content) == 0 ||
              is_last_branch_msr(msr) )
             break;
 
@@ -4165,7 +4165,7 @@ static void lbr_tsx_fixup(void)
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
-    if ( (msr = vmx_find_msr(lbr_from_start, VMX_MSR_GUEST)) != NULL )
+    if ( (msr = vmx_find_msr(curr, lbr_from_start, VMX_MSR_GUEST)) != NULL )
     {
         /*
          * Sign extend into bits 61:62 while preserving bit 63
@@ -4175,15 +4175,15 @@ static void lbr_tsx_fixup(void)
             msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
     }
 
-    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
+    if ( (msr = vmx_find_msr(curr, lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
         msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
 }
 
-static void sign_extend_msr(u32 msr, int type)
+static void sign_extend_msr(struct vcpu *v, u32 msr, int type)
 {
     struct vmx_msr_entry *entry;
 
-    if ( (entry = vmx_find_msr(msr, type)) != NULL )
+    if ( (entry = vmx_find_msr(v, msr, type)) != NULL )
     {
         if ( entry->data & VADDR_TOP_BIT )
             entry->data |= CANONICAL_MASK;
@@ -4194,6 +4194,8 @@ static void sign_extend_msr(u32 msr, int type)
 
 static void bdw_erratum_bdf14_fixup(void)
 {
+    struct vcpu *curr = current;
+
     /*
      * Occasionally, on certain Broadwell CPUs MSR_IA32_LASTINTTOIP has
      * been observed to have the top three bits corrupted as though the
@@ -4203,8 +4205,8 @@ static void bdw_erratum_bdf14_fixup(void)
      * erratum BDF14. Fix up MSR_IA32_LASTINT{FROM,TO}IP by
      * sign-extending into bits 48:63.
      */
-    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
-    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
+    sign_extend_msr(curr, MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
+    sign_extend_msr(curr, MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
 }
 
 static void lbr_fixup(void)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index c8a1f89..f66f121 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -130,10 +130,17 @@ struct arch_vmx_struct {
     uint64_t             sfmask;
 
     struct vmx_msr_bitmap *msr_bitmap;
-    unsigned int         msr_count;
+
+    /*
+     * Most accesses to the MSR host/guest load/save lists are in current
+     * context.  However, the data can be modified by toolstack/migration
+     * actions.  Remote access is only permitted for paused vcpus, and is
+     * protected under the domctl lock.
+     */
     struct vmx_msr_entry *msr_area;
-    unsigned int         host_msr_count;
     struct vmx_msr_entry *host_msr_area;
+    unsigned int         msr_count;
+    unsigned int         host_msr_count;
 
     unsigned long        eoi_exitmap_changed;
     DECLARE_BITMAP(eoi_exit_bitmap, NR_VECTORS);
@@ -537,25 +544,27 @@ enum vmx_msr_list_type {
     VMX_MSR_GUEST,
 };
 
-int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
+int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
 
-static inline int vmx_add_host_load_msr(uint32_t msr)
+static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
 {
-    return vmx_add_msr(msr, VMX_MSR_HOST);
+    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
 }
 
-static inline int vmx_add_guest_msr(uint32_t msr)
+static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
 {
-    return vmx_add_msr(msr, VMX_MSR_GUEST);
+    return vmx_add_msr(v, msr, VMX_MSR_HOST);
 }
 
-struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
+struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
+                                   enum vmx_msr_list_type type);
 
-static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
+static inline int vmx_read_guest_msr(struct vcpu *v, uint32_t msr,
+                                     uint64_t *val)
 {
     struct vmx_msr_entry *ent;
 
-    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
+    if ( (ent = vmx_find_msr(v, msr, VMX_MSR_GUEST)) )
     {
         *val = ent->data;
         return 0;
@@ -564,11 +573,12 @@ static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
     return -ESRCH;
 }
 
-static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
+static inline int vmx_write_guest_msr(struct vcpu *v, uint32_t msr,
+                                      uint64_t val)
 {
     struct vmx_msr_entry *ent;
 
-    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
+    if ( (ent = vmx_find_msr(v, msr, VMX_MSR_GUEST)) )
     {
         ent->data = val;
         return 0;
-- 
2.1.4


* [PATCH 5/9] x86/vmx: Fix handling of MSR_DEBUGCTL on VMExit
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-22 12:53   ` Andrew Cooper
                     ` (3 more replies)
  2018-05-22 11:20 ` [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_add_msr() Andrew Cooper
                   ` (3 subsequent siblings)
  8 siblings, 4 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
updates a host MSR load list entry with the current hardware value of
MSR_DEBUGCTL.  This is wrong.

On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
where different behaviour is needed is if Xen is debugging itself, and this
needs setting up unconditionally for the lifetime of the VM.

The `ler` command line boolean is the only way to configure any use of
MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
more complicated synchronisation across all the running VMs.

In the exceedingly common case, this avoids the unnecessary overhead of having
a host load entry performing the same zeroing operation that hardware has
already performed as part of the VMExit.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Notes for backporting: This change probably does want backporting, but depends
on the previous patch "Support remote access to the MSR lists", and adds an
extra rdmsr to the vcpu construction path (resolved in a later patch).
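
Condensed for illustration (summarising the hunks below), the change in
approach amounts to:

    /*
     * Old behaviour: each guest write of a nonzero MSR_DEBUGCTL added a
     * host load list entry containing whatever value happened to be in
     * hardware at that point.
     *
     * New behaviour: the host load list entry is set up once, at VMCS
     * construction time, and only when Xen itself uses MSR_DEBUGCTL via
     * the `ler` command line option.
     */
    if ( this_cpu(ler_msr) &&
         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR)) )
        return rc;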
---
 xen/arch/x86/hvm/vmx/vmcs.c | 6 ++++++
 xen/arch/x86/hvm/vmx/vmx.c  | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8bf54c4..2035a6d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -996,6 +996,7 @@ static int construct_vmcs(struct vcpu *v)
     struct domain *d = v->domain;
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
+    int rc;
 
     vmx_vmcs_enter(v);
 
@@ -1266,6 +1267,11 @@ static int construct_vmcs(struct vcpu *v)
     if ( cpu_has_vmx_tsc_scaling )
         __vmwrite(TSC_MULTIPLIER, d->arch.hvm_domain.tsc_scaling_ratio);
 
+    /* If using host debugging, restore Xen's setting on vmexit. */
+    if ( this_cpu(ler_msr) &&
+         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR))  )
+        return rc;
+
     vmx_vmcs_exit(v);
 
     /* will update HOST & GUEST_CR3 as reqd */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 3950b12..f9cfb6d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3120,8 +3120,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
                     }
         }
 
-        if ( (rc < 0) ||
-             (msr_content && (vmx_add_host_load_msr(v, msr) < 0)) )
+        if ( rc < 0 )
             hvm_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
         else
             __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
-- 
2.1.4


* [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_add_msr()
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (4 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handling of MSR_DEBUGCTL on VMExit Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-24 15:12   ` Jan Beulich
  2018-05-22 11:20 ` [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

The main purpose of this change is to allow us to set a specific MSR value,
without needing to know whether there is already a load/save list slot for it.
Previously, callers wanting this property needed to call both vmx_add_*_msr()
and vmx_write_*_msr() to cover both cases.

As a result of this API improvement, the default value for guest MSRs need not
be 0, and the default for host MSRs need not be read back from hardware.
In practice, this cleans up the VPMU allocation logic, and avoids an MSR read
as part of vcpu construction.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
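
As an illustrative before/after sketch of the caller pattern (mirroring the
vPMU hunk below, with MSR_CORE_PERF_GLOBAL_CTRL as the example MSR):

    int rc;

    /* Before: two calls were needed to guarantee a slot with a known value. */
    rc = vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL);
    if ( rc == 0 )
        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);

    /* After: a single call inserts the slot if necessary and sets the value. */
    rc = vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);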
---
 xen/arch/x86/cpu/vpmu_intel.c      |  6 ++----
 xen/arch/x86/hvm/vmx/vmcs.c        | 17 +++++++++--------
 xen/arch/x86/hvm/vmx/vmx.c         |  2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h | 13 +++++++------
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index c499e69..1fc79c9 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -454,13 +454,11 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( is_hvm_vcpu(v) )
     {
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0) )
             goto out_err;
 
-        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
+        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0) )
             goto out_err;
-        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
     }
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 2035a6d..b75cc90 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1269,7 +1269,8 @@ static int construct_vmcs(struct vcpu *v)
 
     /* If using host debugging, restore Xen's setting on vmexit. */
     if ( this_cpu(ler_msr) &&
-         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR))  )
+         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR,
+                                     IA32_DEBUGCTLMSR_LBR))  )
         return rc;
 
     vmx_vmcs_exit(v);
@@ -1341,7 +1342,8 @@ struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
 
-int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
+int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
+                enum vmx_msr_list_type type)
 {
     struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
@@ -1400,11 +1402,9 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
     ent   = locate_msr_entry(start, end, msr);
 
     if ( (ent < end) && (ent->index == msr) )
-    {
-        rc = 0;
-        goto out;
-    }
+        goto found;
 
+    /* If there isn't an existing entry for msr, insert room for one. */
     if ( total == (PAGE_SIZE / sizeof(*ent)) )
     {
         rc = -ENOSPC;
@@ -1419,17 +1419,18 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
     switch ( type )
     {
     case VMX_MSR_HOST:
-        rdmsrl(msr, ent->data);
         __vmwrite(VM_EXIT_MSR_LOAD_COUNT, ++arch_vmx->host_msr_count);
         break;
 
     case VMX_MSR_GUEST:
-        ent->data = 0;
         __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_count);
         __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_count);
         break;
     }
 
+    /* Set the msr's value. */
+ found:
+    ent->data = val;
     rc = 0;
 
  out:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f9cfb6d..1783cd8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3109,7 +3109,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i, 0)) == 0 )
                     {
                         vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
                         if ( lbr_tsx_fixup_needed )
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f66f121..accd6fb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -544,16 +544,17 @@ enum vmx_msr_list_type {
     VMX_MSR_GUEST,
 };
 
-int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
+int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
+                enum vmx_msr_list_type type);
 
-static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
+static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
-    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
+    return vmx_add_msr(v, msr, val, VMX_MSR_GUEST);
 }
-
-static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
+static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr,
+                                        uint64_t val)
 {
-    return vmx_add_msr(v, msr, VMX_MSR_HOST);
+    return vmx_add_msr(v, msr, val, VMX_MSR_HOST);
 }
 
 struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
-- 
2.1.4


* [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (5 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_add_msr() Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-24 15:19   ` Jan Beulich
  2018-05-24 15:37   ` Roger Pau Monné
  2018-05-22 11:20 ` [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
  2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  8 siblings, 2 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Currently, the VMX_MSR_GUEST type maintains completely symmetric guest load
and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and VM_ENTRY_MSR_LOAD_ADDR
at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
VM_ENTRY_MSR_LOAD_COUNT to the same value.

However, for MSRs which we won't let the guest have direct access to, having
hardware save the current value on VMExit is unnecessary overhead.

To avoid this overhead, we must make the load and save lists asymmetric.  By
making the entry load count greater than the exit store count, we can maintain
two adjacent lists of MSRs, the first of which is saved and restored, and the
second of which is only restored on VMEntry.

For simplicity:
 * Both adjacent lists are still sorted by MSR index.
 * It is undefined behaviour to insert the same MSR into both lists.
 * The total size of both lists is still limited to 256 entries (one 4k page).

Split the current msr_count field into msr_{load,save}_count, introduce a new
VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr() to calculate
which sublist to search based on type.  VMX_MSR_HOST has no logical sublist,
whereas VMX_MSR_GUEST has a sublist between 0 and the save count, while
VMX_MSR_GUEST_LOADONLY has a sublist between the save count and the load
count.

One subtle point is that inserting an MSR into the load-save list involves
moving the entire load-only list, and updating both counts.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
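
As an illustrative sketch only (guest_efer is a placeholder, anticipating the
EFER work later in the series), the single msr_area page and the two guest
sublists end up being used as follows:

    /*
     * msr_area layout after this patch:
     *   [0, msr_save_count)              - saved on VMExit, loaded on VMEntry
     *   [msr_save_count, msr_load_count) - loaded on VMEntry only
     * with VM_EXIT_MSR_STORE_COUNT = msr_save_count and
     * VM_ENTRY_MSR_LOAD_COUNT = msr_load_count.
     */
    int rc;
    uint64_t guest_efer = 0;    /* placeholder value, for illustration */

    /* An MSR whose guest value must be preserved across VMExit: */
    rc = vmx_add_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0, VMX_MSR_GUEST);

    /* An MSR Xen fully controls, which only needs loading on VMEntry: */
    rc = vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);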
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 46 +++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/vmx/vmx.c         |  2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  4 +++-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index b75cc90..7bf19a0 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1313,7 +1313,7 @@ struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
 {
     struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry *start = NULL, *ent, *end;
-    unsigned int total;
+    unsigned int substart, subend, total;
 
     ASSERT(v == current || !vcpu_runnable(v));
 
@@ -1321,12 +1321,23 @@ struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
     {
     case VMX_MSR_HOST:
         start    = arch_vmx->host_msr_area;
-        total    = arch_vmx->host_msr_count;
+        substart = 0;
+        subend   = arch_vmx->host_msr_count;
+        total    = subend;
         break;
 
     case VMX_MSR_GUEST:
         start    = arch_vmx->msr_area;
-        total    = arch_vmx->msr_count;
+        substart = 0;
+        subend   = arch_vmx->msr_save_count;
+        total    = arch_vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        start    = arch_vmx->msr_area;
+        substart = arch_vmx->msr_save_count;
+        subend   = arch_vmx->msr_load_count;
+        total    = subend;
         break;
 
     default:
@@ -1337,7 +1348,7 @@ struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
         return NULL;
 
     end = start + total;
-    ent = locate_msr_entry(start, end, msr);
+    ent = locate_msr_entry(start + substart, start + subend, msr);
 
     return ((ent < end) && (ent->index == msr)) ? ent : NULL;
 }
@@ -1347,7 +1358,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
 {
     struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
     struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
-    unsigned int total;
+    unsigned int substart, subend, total;
     int rc;
 
     ASSERT(v == current || !vcpu_runnable(v));
@@ -1356,12 +1367,23 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
     {
     case VMX_MSR_HOST:
         ptr      = &arch_vmx->host_msr_area;
-        total    = arch_vmx->host_msr_count;
+        substart = 0;
+        subend   = arch_vmx->host_msr_count;
+        total    = subend;
         break;
 
     case VMX_MSR_GUEST:
         ptr      = &arch_vmx->msr_area;
-        total    = arch_vmx->msr_count;
+        substart = 0;
+        subend   = arch_vmx->msr_save_count;
+        total    = arch_vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        ptr      = &arch_vmx->msr_area;
+        substart = arch_vmx->msr_save_count;
+        subend   = arch_vmx->msr_load_count;
+        total    = subend;
         break;
 
     default:
@@ -1391,6 +1413,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
             break;
 
         case VMX_MSR_GUEST:
+        case VMX_MSR_GUEST_LOADONLY:
             __vmwrite(VM_EXIT_MSR_STORE_ADDR, addr);
             __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, addr);
             break;
@@ -1399,7 +1422,7 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
 
     start = *ptr;
     end   = start + total;
-    ent   = locate_msr_entry(start, end, msr);
+    ent   = locate_msr_entry(start + substart, start + subend, msr);
 
     if ( (ent < end) && (ent->index == msr) )
         goto found;
@@ -1423,8 +1446,11 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
         break;
 
     case VMX_MSR_GUEST:
-        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_count);
-        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_count);
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_save_count);
+
+        /* Fallthrough */
+    case VMX_MSR_GUEST_LOADONLY:
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, ++arch_vmx->msr_load_count);
         break;
     }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1783cd8..26e4206 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4160,7 +4160,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 static void lbr_tsx_fixup(void)
 {
     struct vcpu *curr = current;
-    unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
+    unsigned int msr_count = curr->arch.hvm_vmx.msr_save_count;
     struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
     struct vmx_msr_entry *msr;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index accd6fb..b0fccd2 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -139,7 +139,8 @@ struct arch_vmx_struct {
      */
     struct vmx_msr_entry *msr_area;
     struct vmx_msr_entry *host_msr_area;
-    unsigned int         msr_count;
+    unsigned int         msr_load_count;
+    unsigned int         msr_save_count;
     unsigned int         host_msr_count;
 
     unsigned long        eoi_exitmap_changed;
@@ -542,6 +543,7 @@ enum vmx_insn_errno
 enum vmx_msr_list_type {
     VMX_MSR_HOST,
     VMX_MSR_GUEST,
+    VMX_MSR_GUEST_LOADONLY,
 };
 
 int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
-- 
2.1.4


* [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (6 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-24 15:42   ` Roger Pau Monné
  2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  8 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Roger Pau Monné

Up until this point, the MSR load/save lists have only ever accumulated
content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
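
As a usage sketch only (not part of the patch), removing an entry which was
previously inserted with vmx_add_msr() looks like:

    /* Illustrative only: tear down a guest load/save list entry. */
    int rc = vmx_del_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, VMX_MSR_GUEST);

    /* rc == 0 on success, or -ESRCH if the MSR wasn't in that (sub)list. */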
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 68 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 69 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7bf19a0..e1a8f95 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1465,6 +1465,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
     return rc;
 }
 
+int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
+{
+    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
+    struct vmx_msr_entry *start = NULL, *ent, *end;
+    unsigned int substart, subend, total;
+
+    ASSERT(v == current || !vcpu_runnable(v));
+
+    switch ( type )
+    {
+    case VMX_MSR_HOST:
+        start    = arch_vmx->host_msr_area;
+        substart = 0;
+        subend   = arch_vmx->host_msr_count;
+        total    = subend;
+        break;
+
+    case VMX_MSR_GUEST:
+        start    = arch_vmx->msr_area;
+        substart = 0;
+        subend   = arch_vmx->msr_save_count;
+        total    = arch_vmx->msr_load_count;
+        break;
+
+    case VMX_MSR_GUEST_LOADONLY:
+        start    = arch_vmx->msr_area;
+        substart = arch_vmx->msr_save_count;
+        subend   = arch_vmx->msr_load_count;
+        total    = subend;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
+
+    if ( !start )
+        return -ESRCH;
+
+    end = start + total;
+    ent = locate_msr_entry(start + substart, start + subend, msr);
+
+    if ( (ent == end) || (ent->index != msr) )
+        return -ESRCH;
+
+    memmove(ent, ent + 1, sizeof(*ent) * (end - ent));
+
+    vmx_vmcs_enter(v);
+
+    switch ( type )
+    {
+    case VMX_MSR_HOST:
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, arch_vmx->host_msr_count--);
+        break;
+
+    case VMX_MSR_GUEST:
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, arch_vmx->msr_save_count--);
+
+        /* Fallthrough */
+    case VMX_MSR_GUEST_LOADONLY:
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_load_count--);
+        break;
+    }
+
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
 {
     if ( !test_and_set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap) )
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b0fccd2..cfd174c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -548,6 +548,7 @@ enum vmx_msr_list_type {
 
 int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
                 enum vmx_msr_list_type type);
+int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
 
 static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
                   ` (7 preceding siblings ...)
  2018-05-22 11:20 ` [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
@ 2018-05-22 11:20 ` Andrew Cooper
  2018-05-24 16:01   ` Roger Pau Monné
                     ` (2 more replies)
  8 siblings, 3 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 11:20 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, Tim Deegan,
	Jun Nakajima, Roger Pau Monné

Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.

SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
and vmx_update_guest_efer(), and works by altering the host SCE value to match
the setting the guest wants.  This works because, in HVM vcpu context, Xen
never needs to execute a SYSCALL or SYSRET instruction.

However, NXE has never been context switched.  Unlike SCE, NXE cannot be
context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
reserved, causing a pagefault whenever such a PTE is encountered.  This means
that the guest always has Xen's setting in effect, irrespective of the bit it
can see and modify in its virtualised view of MSR_EFER.

This isn't a major problem for production operating systems because they, like
Xen, always turn NXE on when it is available.  However, it does have an
observable effect on which guest PTE bits are valid, and whether
PFEC_insn_fetch is visible in a #PF error code.

Second generation VT-x hardware has host and guest EFER fields in the VMCS,
and support for loading and saving them automatically.  First generation VT-x
hardware needs to use MSR load/save lists to cause an atomic switch of
MSR_EFER on vmentry/exit.

Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
support when available (and MSR load/save lists on older hardware) and drop
all ad-hoc alteration of SCE.

There are two complications for shadow guests.  NXE, being a paging setting,
needs to remain under host control, but that is fine as it is also Xen which
handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
setting or the guest load list.  This doesn't matter in practice as we
intercept all writes to CR0 and reads from MSR_EFER, so can provide
architecturally consistent behaviour from the guest's point of view.

As a result of fixing EFER context switching, we can remove the Intel-special
case from hvm_nx_enabled() and let guest_walk_tables() work with the real
guest paging settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Slightly RFC.  There are some corrections going to be made to the Intel SDM,
and I'm just waiting for final confirmation.
---
 xen/arch/x86/domain.c              | 10 -----
 xen/arch/x86/hvm/vmx/vmcs.c        |  9 ++--
 xen/arch/x86/hvm/vmx/vmx.c         | 88 ++++++++++++++++++++++++++++++--------
 xen/include/asm-x86/hvm/hvm.h      |  4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  2 +
 5 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 4ff3d2f3..600d7f7 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1723,16 +1723,6 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     {
         __context_switch();
 
-        if ( is_pv_domain(nextd) &&
-             (is_idle_domain(prevd) ||
-              is_hvm_domain(prevd) ||
-              is_pv_32bit_domain(prevd) != is_pv_32bit_domain(nextd)) )
-        {
-            uint64_t efer = read_efer();
-            if ( !(efer & EFER_SCE) )
-                write_efer(efer | EFER_SCE);
-        }
-
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index e1a8f95..383098d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -342,8 +342,8 @@ static int vmx_init_vmcs_config(void)
     }
 
     min = VM_EXIT_ACK_INTR_ON_EXIT;
-    opt = VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
-          VM_EXIT_CLEAR_BNDCFGS;
+    opt = (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
+           VM_EXIT_LOAD_HOST_EFER | VM_EXIT_CLEAR_BNDCFGS);
     min |= VM_EXIT_IA32E_MODE;
     _vmx_vmexit_control = adjust_vmx_controls(
         "VMExit Control", min, opt, MSR_IA32_VMX_EXIT_CTLS, &mismatch);
@@ -383,7 +383,8 @@ static int vmx_init_vmcs_config(void)
         _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
 
     min = 0;
-    opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
+    opt = (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER |
+           VM_ENTRY_LOAD_BNDCFGS);
     _vmx_vmentry_control = adjust_vmx_controls(
         "VMEntry Control", min, opt, MSR_IA32_VMX_ENTRY_CTLS, &mismatch);
 
@@ -1148,6 +1149,8 @@ static int construct_vmcs(struct vcpu *v)
     v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
     __vmwrite(HOST_CR4, mmu_cr4_features);
+    if ( cpu_has_vmx_efer )
+        __vmwrite(HOST_EFER, read_efer());
 
     /* Host CS:RIP. */
     __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 26e4206..a9fbce9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -513,15 +513,6 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     wrmsrl(MSR_LSTAR,          v->arch.hvm_vmx.lstar);
     wrmsrl(MSR_SYSCALL_MASK,   v->arch.hvm_vmx.sfmask);
 
-    if ( (v->arch.hvm_vcpu.guest_efer ^ read_efer()) & EFER_SCE )
-    {
-        HVM_DBG_LOG(DBG_LEVEL_2,
-                    "restore guest's EFER with value %lx",
-                    v->arch.hvm_vcpu.guest_efer);
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
-    }
-
     if ( cpu_has_rdtscp )
         wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
@@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
 
 static void vmx_update_guest_efer(struct vcpu *v)
 {
-    unsigned long vm_entry_value;
+    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
+        xen_efer = read_efer();
+
+    if ( paging_mode_shadow(v->domain) )
+    {
+        /*
+         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
+         * under guest control.
+         */
+        guest_efer &= ~EFER_NX;
+        guest_efer |= xen_efer & EFER_NX;
+
+        /*
+         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
+         * on Guest Control Registers, Debug Registers and MSRs" section says:
+         *
+         *  If the "Load IA32_EFER" VM-entry control is 1, the following
+         *  checks are performed on the field for the IA32_MSR:
+         *   - Bits reserved in the IA32_EFER MSR must be 0.
+         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
+         *     the "IA-32e mode guest" VM-entry control.  It must also be
+         *     identical to bit 8 (LME) if bit 31 in the CR0 field
+         *     (corresponding to CR0.PG) is 1.
+         *
+         * Experimentally what actually happens is:
+         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
+         *     GUEST_EFER VMCS controls, or MSR load/save lists.
+         *   - Without EPT, LME being different to LMA isn't tolerated by
+         *     hardware.  As writes to CR0 are intercepted, it is safe to
+         *     leave LME clear at this point, and fix up both LME and LMA when
+         *     CR0.PG is set.
+         */
+        if ( !(guest_efer & EFER_LMA) )
+            guest_efer &= ~EFER_LME;
+    }
 
     vmx_vmcs_enter(v);
 
-    __vmread(VM_ENTRY_CONTROLS, &vm_entry_value);
-    if ( v->arch.hvm_vcpu.guest_efer & EFER_LMA )
-        vm_entry_value |= VM_ENTRY_IA32E_MODE;
+    /*
+     * The intended guest running mode is derived from VM_ENTRY_IA32E_MODE,
+     * which (architecturally) is the guest's LMA setting.
+     */
+    __vmread(VM_ENTRY_CONTROLS, &entry_ctls);
+
+    entry_ctls &= ~VM_ENTRY_IA32E_MODE;
+    if ( guest_efer & EFER_LMA )
+        entry_ctls |= VM_ENTRY_IA32E_MODE;
+
+    __vmwrite(VM_ENTRY_CONTROLS, entry_ctls);
+
+    /* We expect to use EFER loading in the common case, but... */
+    if ( likely(cpu_has_vmx_efer) )
+        __vmwrite(GUEST_EFER, guest_efer);
+
+    /* ... on Gen1 VT-x hardware, we have to use MSR load/save lists instead. */
     else
-        vm_entry_value &= ~VM_ENTRY_IA32E_MODE;
-    __vmwrite(VM_ENTRY_CONTROLS, vm_entry_value);
+    {
+        /*
+         * When the guest's choice of EFER matches Xen's, remove the load/save
+         * list entries.  They are unnecessary overhead, especially as this is
+         * expected to be the common case for 64bit guests.
+         */
+        if ( guest_efer == xen_efer )
+        {
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_HOST);
+            vmx_del_msr(v, MSR_EFER, VMX_MSR_GUEST_LOADONLY);
+        }
+        else
+        {
+            vmx_add_msr(v, MSR_EFER, xen_efer, VMX_MSR_HOST);
+            vmx_add_msr(v, MSR_EFER, guest_efer, VMX_MSR_GUEST_LOADONLY);
+        }
+    }
 
     vmx_vmcs_exit(v);
-
-    if ( v == current )
-        write_efer((read_efer() & ~EFER_SCE) |
-                   (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
 }
 
 void nvmx_enqueue_n2_exceptions(struct vcpu *v, 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index ef5e198..fcfc5cf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -296,10 +296,8 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMEP))
 #define hvm_smap_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
-/* HVM guests on Intel hardware leak Xen's NX settings into guest context. */
 #define hvm_nx_enabled(v) \
-    ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && cpu_has_nx) ||    \
-     ((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
+    ((v)->arch.hvm_vcpu.guest_efer & EFER_NX)
 #define hvm_pku_enabled(v) \
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index cfd174c..6c6897c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
 #define cpu_has_vmx_pat \
     (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
+#define cpu_has_vmx_efer \
+    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
 #define cpu_has_vmx_unrestricted_guest \
     (vmx_secondary_exec_control & SECONDARY_EXEC_UNRESTRICTED_GUEST)
 #define vmx_unrestricted_guest(v)               \
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit Andrew Cooper
@ 2018-05-22 12:53   ` Andrew Cooper
  2018-05-24 12:14   ` Roger Pau Monné
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-22 12:53 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Roger Pau Monné

On 22/05/18 12:20, Andrew Cooper wrote:
> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
> updates a host MSR load list entry with the current hardware value of
> MSR_DEBUGCTL.  This is wrong.
>
> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
> where different behaviour is needed is if Xen is debugging itself, and this
> needs setting up unconditionally for the lifetime of the VM.
>
> The `ler` command line boolean is the only way to configure any use of
> MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
> more complicated synchronisation across all the running VMs.
>
> In the exceedingly common case, this avoids the unnecessary overhead of having
> a host load entry performing the same zeroing operation that hardware has
> already performed as part of the VMExit.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

So after doing some archaeology, the last meaningful change to DEBUGCTL
handling was c/s dfa625e1 "VMX: fix DebugCtl MSR clearing" in Xen 4.5,
but the underlying logic has been broken since its introduction in
0ae9ef55a "vmx: last branch recording MSR emulation" in 2007.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure
  2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
@ 2018-05-23 16:01   ` Roger Pau Monné
  2018-05-23 17:02     ` Andrew Cooper
  2018-05-27  3:26   ` Tian, Kevin
  1 sibling, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-23 16:01 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:38PM +0100, Andrew Cooper wrote:
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 06c3179..c8a1f89 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -514,9 +514,6 @@ enum vmcs_field {
>  
>  #define VMCS_VPID_WIDTH 16
>  
> -#define VMX_GUEST_MSR 0
> -#define VMX_HOST_MSR  1
> -
>  /* VM Instruction error numbers */
>  enum vmx_insn_errno
>  {
> @@ -534,6 +531,54 @@ enum vmx_insn_errno
>      VMX_INSN_FAIL_INVALID                  = ~0,
>  };
>  
> +/* MSR load/save list infrastructure. */
> +enum vmx_msr_list_type {
> +    VMX_MSR_HOST,
> +    VMX_MSR_GUEST,
> +};
> +
> +int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
> +
> +static inline int vmx_add_host_load_msr(uint32_t msr)
> +{
> +    return vmx_add_msr(msr, VMX_MSR_HOST);
> +}
> +
> +static inline int vmx_add_guest_msr(uint32_t msr)
> +{
> +    return vmx_add_msr(msr, VMX_MSR_GUEST);
> +}
> +
> +struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
> +
> +static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
> +{
> +    struct vmx_msr_entry *ent;

const

Also I would probably do:

{
    const struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);

    if ( !ent )
        return -ESRCH;

    *val = ent->data;
    return 0;
}
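
(Purely as an illustrative sketch in the same style — vmx_write_guest_msr is
the name the patch converts to a static inline, but the body below is my
reconstruction rather than the patch's code:)

static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
{
    struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);

    if ( !ent )
        return -ESRCH;

    ent->data = val;

    return 0;
}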

With the const:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
@ 2018-05-23 16:28   ` Roger Pau Monné
  2018-05-23 16:54     ` Andrew Cooper
  2018-05-24 14:45   ` Jan Beulich
  2018-05-27  3:30   ` Tian, Kevin
  2 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-23 16:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:39PM +0100, Andrew Cooper wrote:
>  * Use an arch_vmx_struct local variable to reduce later code volume.
>  * Use start/total instead of msr_area/msr_count.  This is in preparation for
>    more finegrained handling with later changes.
>  * Use ent/end pointers (again for preparation), and to make the vmx_add_msr()
>    logic easier to follow.
>  * Make the memory allocation block of vmx_add_msr() unlikely, and calculate
>    virt_to_maddr() just once.
> 
> No practical change to functionality.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 74 ++++++++++++++++++++++++---------------------
>  1 file changed, 40 insertions(+), 34 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index a5dcf5c..f557857 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1292,48 +1292,50 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>  struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>  {
>      struct vcpu *curr = current;
> -    unsigned int msr_count;
> -    struct vmx_msr_entry *msr_area = NULL;
> +    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;

curr is used here only, so you can use current and get rid of the curr
local variable?

> +    struct vmx_msr_entry *start = NULL;
> +    unsigned int total;
>  
>      switch ( type )
>      {
>      case VMX_MSR_HOST:
> -        msr_count = curr->arch.hvm_vmx.host_msr_count;
> -        msr_area = curr->arch.hvm_vmx.host_msr_area;
> +        start    = arch_vmx->host_msr_area;
> +        total    = arch_vmx->host_msr_count;
>          break;
>  
>      case VMX_MSR_GUEST:
> -        msr_count = curr->arch.hvm_vmx.msr_count;
> -        msr_area = curr->arch.hvm_vmx.msr_area;
> +        start    = arch_vmx->msr_area;
> +        total    = arch_vmx->msr_count;

Not that I think is wrong, but why are you adding the extra spaces
after the variable name? Those assignments will already be aligned
because start and total names have the same length.

>          break;
>  
>      default:
>          ASSERT_UNREACHABLE();
>      }
>  
> -    if ( msr_area == NULL )
> +    if ( !start )
>          return NULL;
>  
> -    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
> +    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),
>                     vmx_msr_entry_key_cmp);
>  }
>  
>  int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
>  {
>      struct vcpu *curr = current;

curr seems to be used only once below in order to get hvm_vmx? In
which case it could be removed.

> -    unsigned int idx, *msr_count;
> -    struct vmx_msr_entry **msr_area, *msr_area_elem;
> +    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> +    struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;

Do you really need to initialize start here? It seems like it's
unconditionally set to *ptr before any usage.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-22 11:20 ` [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
@ 2018-05-23 16:39   ` Roger Pau Monné
  2018-05-23 16:55     ` Andrew Cooper
  2018-05-27  3:38   ` Tian, Kevin
  1 sibling, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-23 16:39 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> Instead of having multiple algorithms searching the MSR lists, implement a
> single one.  It has the semantics required by vmx_add_msr(), to identify the
> position in which an MSR should live, if it isn't already present.
> 
> There will be a marginal improvement for vmx_find_msr() by avoiding the
> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
> vmx_add_msr() by using a binary search instead of a linear search.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index f557857..e4acdc1 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
>      return 0;
>  }
>  
> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> +/*
> + * Search an MSR list looking for an MSR entry, or the slot in which it should
> + * live (to keep the data sorted) if an entry is not found.
> + *
> + * The return pointer is guarenteed to be bounded by start and end.  However,
> + * it may point at end, and may be invalid for the caller to dereference.
> + */
> +static struct vmx_msr_entry *locate_msr_entry(
> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
>  {
> -    const u32 *msr = key;
> -    const struct vmx_msr_entry *entry = elt;
> +    while ( start < end )
> +    {
> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
>  
> -    if ( *msr > entry->index )
> -        return 1;
> -    if ( *msr < entry->index )
> -        return -1;
> +        if ( msr < mid->index )
> +            end = mid;
> +        else if ( msr > mid->index )
> +            start = mid + 1;
> +        else
> +            return mid;
> +    }

This is basically an open coded version of bsearch, isn't there anyway
to adapt the current bsearch so that it could be used for both
vmx_find_msr and vmx_add_msr?

I know there will be a performance penalty for using a function
pointer for the comparator function, but this looks like code
duplication to me.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-05-23 16:28   ` Roger Pau Monné
@ 2018-05-23 16:54     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-23 16:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 23/05/18 17:28, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:39PM +0100, Andrew Cooper wrote:
>>  * Use an arch_vmx_struct local variable to reduce later code volume.
>>  * Use start/total instead of msr_area/msr_count.  This is in preparation for
>>    more finegrained handling with later changes.
>>  * Use ent/end pointers (again for preparation), and to make the vmx_add_msr()
>>    logic easier to follow.
>>  * Make the memory allocation block of vmx_add_msr() unlikely, and calculate
>>    virt_to_maddr() just once.
>>
>> No practical change to functionality.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>>  xen/arch/x86/hvm/vmx/vmcs.c | 74 ++++++++++++++++++++++++---------------------
>>  1 file changed, 40 insertions(+), 34 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index a5dcf5c..f557857 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1292,48 +1292,50 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>>  struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>>  {
>>      struct vcpu *curr = current;
>> -    unsigned int msr_count;
>> -    struct vmx_msr_entry *msr_area = NULL;
>> +    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> curr is used here only, so you can use current and get rid of the curr
> local variable?

curr disappears in patch 4 "x86/vmx: Support remote access to the MSR
lists", when v starts getting passed in from the outside.

>
>> +    struct vmx_msr_entry *start = NULL;
>> +    unsigned int total;
>>  
>>      switch ( type )
>>      {
>>      case VMX_MSR_HOST:
>> -        msr_count = curr->arch.hvm_vmx.host_msr_count;
>> -        msr_area = curr->arch.hvm_vmx.host_msr_area;
>> +        start    = arch_vmx->host_msr_area;
>> +        total    = arch_vmx->host_msr_count;
>>          break;
>>  
>>      case VMX_MSR_GUEST:
>> -        msr_count = curr->arch.hvm_vmx.msr_count;
>> -        msr_area = curr->arch.hvm_vmx.msr_area;
>> +        start    = arch_vmx->msr_area;
>> +        total    = arch_vmx->msr_count;
> Not that I think is wrong, but why are you adding the extra spaces
> after the variable name? Those assignments will already be aligned
> because start and total names have the same length.

There are changes in later patches, for which this is the correct
indentation.

>
>>          break;
>>  
>>      default:
>>          ASSERT_UNREACHABLE();
>>      }
>>  
>> -    if ( msr_area == NULL )
>> +    if ( !start )
>>          return NULL;
>>  
>> -    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
>> +    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),
>>                     vmx_msr_entry_key_cmp);
>>  }
>>  
>>  int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
>>  {
>>      struct vcpu *curr = current;
> curr seems to be used only once below in order to get hvm_vmx? In
> which case it could be removed.
>
>> -    unsigned int idx, *msr_count;
>> -    struct vmx_msr_entry **msr_area, *msr_area_elem;
>> +    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
>> +    struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
> Do you really need to initialize start here? It seems like it's
> unconditionally set to *ptr before any usage.

Yes, for safety through the default case in release builds.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-23 16:39   ` Roger Pau Monné
@ 2018-05-23 16:55     ` Andrew Cooper
  2018-05-24 10:53       ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-23 16:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 23/05/18 17:39, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
>> Instead of having multiple algorithms searching the MSR lists, implement a
>> single one.  It has the semantics required by vmx_add_msr(), to identify the
>> position in which an MSR should live, if it isn't already present.
>>
>> There will be a marginal improvement for vmx_find_msr() by avoiding the
>> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
>> vmx_add_msr() by using a binary search instead of a linear search.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
>>  1 file changed, 28 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index f557857..e4acdc1 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
>>      return 0;
>>  }
>>  
>> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>> +/*
>> + * Search an MSR list looking for an MSR entry, or the slot in which it should
>> + * live (to keep the data sorted) if an entry is not found.
>> + *
>> + * The return pointer is guarenteed to be bounded by start and end.  However,
>> + * it may point at end, and may be invalid for the caller to dereference.
>> + */
>> +static struct vmx_msr_entry *locate_msr_entry(
>> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
>>  {
>> -    const u32 *msr = key;
>> -    const struct vmx_msr_entry *entry = elt;
>> +    while ( start < end )
>> +    {
>> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
>>  
>> -    if ( *msr > entry->index )
>> -        return 1;
>> -    if ( *msr < entry->index )
>> -        return -1;
>> +        if ( msr < mid->index )
>> +            end = mid;
>> +        else if ( msr > mid->index )
>> +            start = mid + 1;
>> +        else
>> +            return mid;
>> +    }
> This is basically an open coded version of bsearch, isn't there anyway
> to adapt the current bsearch so that it could be used for both
> vmx_find_msr and vmx_add_msr?
>
> I know there will be a performance penalty for using a function
> pointer for the comparator function, but this looks like code
> duplication to me.

A third use appears in a later patch.  bsearch() doesn't have the
described property on a miss, which is necessary to maintain the lists.
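
(As an aside, a minimal sketch of the property being relied on, reusing the
start/end/ent/msr names from vmx_add_msr() — illustration only, not code being
proposed here:)

    ent = locate_msr_entry(start, end, msr);

    if ( (ent == end) || (ent->index != msr) )
    {
        /*
         * Miss: 'ent' is exactly the slot the new entry must occupy to keep
         * the list sorted.  Shuffle the tail up one slot (the backing 4k
         * page has room to grow) and fill in the new entry.
         */
        memmove(ent + 1, ent, sizeof(*ent) * (end - ent));

        ent->index = msr;
        ent->data  = 0;
    }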

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure
  2018-05-23 16:01   ` Roger Pau Monné
@ 2018-05-23 17:02     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-23 17:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 23/05/18 17:01, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:38PM +0100, Andrew Cooper wrote:
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 06c3179..c8a1f89 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -514,9 +514,6 @@ enum vmcs_field {
>>  
>>  #define VMCS_VPID_WIDTH 16
>>  
>> -#define VMX_GUEST_MSR 0
>> -#define VMX_HOST_MSR  1
>> -
>>  /* VM Instruction error numbers */
>>  enum vmx_insn_errno
>>  {
>> @@ -534,6 +531,54 @@ enum vmx_insn_errno
>>      VMX_INSN_FAIL_INVALID                  = ~0,
>>  };
>>  
>> +/* MSR load/save list infrastructure. */
>> +enum vmx_msr_list_type {
>> +    VMX_MSR_HOST,
>> +    VMX_MSR_GUEST,
>> +};
>> +
>> +int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
>> +
>> +static inline int vmx_add_host_load_msr(uint32_t msr)
>> +{
>> +    return vmx_add_msr(msr, VMX_MSR_HOST);
>> +}
>> +
>> +static inline int vmx_add_guest_msr(uint32_t msr)
>> +{
>> +    return vmx_add_msr(msr, VMX_MSR_GUEST);
>> +}
>> +
>> +struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
>> +
>> +static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
>> +{
>> +    struct vmx_msr_entry *ent;
> const
>
> Also I would probably do:
>
> {
>     const struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);
>
>     if ( !ent )
>         return -ESRCH;
>
>     *val = ent->data;
>     return 0;
> }

Done.

>
> With the const:
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks,

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-23 16:55     ` Andrew Cooper
@ 2018-05-24 10:53       ` Roger Pau Monné
  2018-05-24 10:59         ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 10:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Wed, May 23, 2018 at 05:55:50PM +0100, Andrew Cooper wrote:
> On 23/05/18 17:39, Roger Pau Monné wrote:
> > On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> >> Instead of having multiple algorithms searching the MSR lists, implement a
> >> single one.  It has the semantics required by vmx_add_msr(), to identify the
> >> position in which an MSR should live, if it isn't already present.
> >>
> >> There will be a marginal improvement for vmx_find_msr() by avoiding the
> >> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
> >> vmx_add_msr() by using a binary search instead of a linear search.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Jun Nakajima <jun.nakajima@intel.com>
> >> CC: Kevin Tian <kevin.tian@intel.com>
> >> CC: Wei Liu <wei.liu2@citrix.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> ---
> >>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
> >>  1 file changed, 28 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> >> index f557857..e4acdc1 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
> >>      return 0;
> >>  }
> >>  
> >> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> >> +/*
> >> + * Search an MSR list looking for an MSR entry, or the slot in which it should
> >> + * live (to keep the data sorted) if an entry is not found.
> >> + *
> >> + * The return pointer is guarenteed to be bounded by start and end.  However,
> >> + * it may point at end, and may be invalid for the caller to dereference.
> >> + */
> >> +static struct vmx_msr_entry *locate_msr_entry(
> >> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
> >>  {
> >> -    const u32 *msr = key;
> >> -    const struct vmx_msr_entry *entry = elt;
> >> +    while ( start < end )
> >> +    {
> >> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
> >>  
> >> -    if ( *msr > entry->index )
> >> -        return 1;
> >> -    if ( *msr < entry->index )
> >> -        return -1;
> >> +        if ( msr < mid->index )
> >> +            end = mid;
> >> +        else if ( msr > mid->index )
> >> +            start = mid + 1;
> >> +        else
> >> +            return mid;
> >> +    }
> > This is basically an open coded version of bsearch, isn't there anyway
> > to adapt the current bsearch so that it could be used for both
> > vmx_find_msr and vmx_add_msr?
> >
> > I know there will be a performance penalty for using a function
> > pointer for the comparator function, but this looks like code
> > duplication to me.
> 
> A third use appears in a later patch.  bsearch() doesn't have the
> described property on a miss, which is necessary to maintain the lists.

I would consider adding a flag to the list of parameters so that
bsearch returned the position where the item should be added in case
of a miss. You could then wrap it inside of locate_msr_entry, or get
rid of this helper altogether.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-24 10:53       ` Roger Pau Monné
@ 2018-05-24 10:59         ` Andrew Cooper
  2018-05-24 12:16           ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 10:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 24/05/18 11:53, Roger Pau Monné wrote:
> On Wed, May 23, 2018 at 05:55:50PM +0100, Andrew Cooper wrote:
>> On 23/05/18 17:39, Roger Pau Monné wrote:
>>> On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
>>>> Instead of having multiple algorithms searching the MSR lists, implement a
>>>> single one.  It has the semantics required by vmx_add_msr(), to identify the
>>>> position in which an MSR should live, if it isn't already present.
>>>>
>>>> There will be a marginal improvement for vmx_find_msr() by avoiding the
>>>> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
>>>> vmx_add_msr() by using a binary search instead of a linear search.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Jun Nakajima <jun.nakajima@intel.com>
>>>> CC: Kevin Tian <kevin.tian@intel.com>
>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 28 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> index f557857..e4acdc1 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
>>>>      return 0;
>>>>  }
>>>>  
>>>> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>>>> +/*
>>>> + * Search an MSR list looking for an MSR entry, or the slot in which it should
>>>> + * live (to keep the data sorted) if an entry is not found.
>>>> + *
>>>> + * The return pointer is guarenteed to be bounded by start and end.  However,
>>>> + * it may point at end, and may be invalid for the caller to dereference.
>>>> + */
>>>> +static struct vmx_msr_entry *locate_msr_entry(
>>>> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
>>>>  {
>>>> -    const u32 *msr = key;
>>>> -    const struct vmx_msr_entry *entry = elt;
>>>> +    while ( start < end )
>>>> +    {
>>>> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
>>>>  
>>>> -    if ( *msr > entry->index )
>>>> -        return 1;
>>>> -    if ( *msr < entry->index )
>>>> -        return -1;
>>>> +        if ( msr < mid->index )
>>>> +            end = mid;
>>>> +        else if ( msr > mid->index )
>>>> +            start = mid + 1;
>>>> +        else
>>>> +            return mid;
>>>> +    }
>>> This is basically an open coded version of bsearch, isn't there anyway
>>> to adapt the current bsearch so that it could be used for both
>>> vmx_find_msr and vmx_add_msr?
>>>
>>> I know there will be a performance penalty for using a function
>>> pointer for the comparator function, but this looks like code
>>> duplication to me.
>> A third use appears in a later patch.  bsearch() doesn't have the
>> described property on a miss, which is necessary to maintain the lists.
> I would consider adding a flag to the list of parameters so that
> bsearch returned the position where the item should be added in case
> of a miss. You could then wrap it inside of locate_msr_entry, or get
> rid of this helper altogether.

bsearch() is specified by POSIX, and C89/99, amongst other standards. 
Changing its API is not something I'm going to do.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
@ 2018-05-24 11:50   ` Roger Pau Monné
  2018-05-24 12:03     ` Andrew Cooper
  2018-05-24 14:53   ` Jan Beulich
  2018-05-27  3:47   ` Tian, Kevin
  2 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 11:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:41PM +0100, Andrew Cooper wrote:
> At the moment, all modifications of the MSR lists are in current context.
> However, future changes may need to put MSR_EFER into the lists from domctl
> hypercall context.
> 
> Plumb a struct vcpu parameter down through the infrastructure, and use
> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().  Use
> assertions to ensure that access is either in current context, or while the
> vcpu is paused.
> 
> For now it is safe to require that remote accesses are under the domctl lock.
> This will remain safe if/when the global domctl lock becomes per-domain.

I'm not sure I see the point of this sentence. From my reading of the
above test accesses will always be safe regardless of whether the
domctl lock is global or per-domain?

> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index e4acdc1..8bf54c4 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1301,13 +1301,15 @@ static struct vmx_msr_entry *locate_msr_entry(
>      return start;
>  }
>  
> -struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
> +struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
> +                                   enum vmx_msr_list_type type)
>  {
> -    struct vcpu *curr = current;
> -    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>      struct vmx_msr_entry *start = NULL, *ent, *end;
>      unsigned int total;
>  
> +    ASSERT(v == current || !vcpu_runnable(v));

I would rather do:

if ( v != current && vcpu_runnable(v) )
{
    ASSERT_UNREACHABLE();
    return NULL;
}

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-24 11:50   ` Roger Pau Monné
@ 2018-05-24 12:03     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 12:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 24/05/18 12:50, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:41PM +0100, Andrew Cooper wrote:
>> At the moment, all modifications of the MSR lists are in current context.
>> However, future changes may need to put MSR_EFER into the lists from domctl
>> hypercall context.
>>
>> Plumb a struct vcpu parameter down through the infrastructure, and use
>> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().  Use
>> assertions to ensure that access is either in current context, or while the
>> vcpu is paused.
>>
>> For now it is safe to require that remote accesses are under the domctl lock.
>> This will remain safe if/when the global domctl lock becomes per-domain.
> I'm not sure I see the point of this sentence. From my reading of the
> above test accesses will always be safe regardless of whether the
> domctl lock is global or per-domain?

It's attempting to justify why the domctl lock is ok to use here, but I
can drop the paragraph if people think it isn't helpful.

>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index e4acdc1..8bf54c4 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1301,13 +1301,15 @@ static struct vmx_msr_entry *locate_msr_entry(
>>      return start;
>>  }
>>  
>> -struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>> +struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
>> +                                   enum vmx_msr_list_type type)
>>  {
>> -    struct vcpu *curr = current;
>> -    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
>> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>>      struct vmx_msr_entry *start = NULL, *ent, *end;
>>      unsigned int total;
>>  
>> +    ASSERT(v == current || !vcpu_runnable(v));
> I would rather do:
>
> if ( v != current && vcpu_runnable(v) )
> {
>     ASSERT_UNREACHABLE();
>     return NULL;
> }

I'm not so certain.  Failing the assertion doesn't make the later code
unsafe to execute.

Even for the mutable operations, all that would happen if the assertion
failed would be corruption to the guest's view of its MSR state.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit Andrew Cooper
  2018-05-22 12:53   ` Andrew Cooper
@ 2018-05-24 12:14   ` Roger Pau Monné
  2018-05-24 12:39     ` Andrew Cooper
  2018-05-24 15:08   ` Jan Beulich
  2018-05-27  3:56   ` Tian, Kevin
  3 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 12:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:42PM +0100, Andrew Cooper wrote:
> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
> updates a host MSR load list entry with the current hardware value of
> MSR_DEBUGCTL.  This is wrong.
> 
> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
> where different behaviour is needed is if Xen is debugging itself, and this
> needs setting up unconditionally for the lifetime of the VM.
> 
> The `ler` command line boolean is the only way to configure any use of
> MSR_DEBUGCTL for Xen,

Hm, there's no documentation at all for the 'ler' option.

> so tie the host load list entry to this setting in
> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
> more complicated synchronisation across all the running VMs.
> 
> In the exceedingly common case, this avoids the unnecessary overhead of having
> a host load entry performing the same zeroing operation that hardware has
> already performed as part of the VMExit.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Notes for backporting: This change probably does want backporting, but depends
> on the previous patch "Support remote access to the MSR lists", and adds an
> extra rdmsr to the vcpu construction path (resolved in a later patch).
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 6 ++++++
>  xen/arch/x86/hvm/vmx/vmx.c  | 3 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 8bf54c4..2035a6d 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -996,6 +996,7 @@ static int construct_vmcs(struct vcpu *v)
>      struct domain *d = v->domain;
>      u32 vmexit_ctl = vmx_vmexit_control;
>      u32 vmentry_ctl = vmx_vmentry_control;
> +    int rc;
>  
>      vmx_vmcs_enter(v);
>  
> @@ -1266,6 +1267,11 @@ static int construct_vmcs(struct vcpu *v)
>      if ( cpu_has_vmx_tsc_scaling )
>          __vmwrite(TSC_MULTIPLIER, d->arch.hvm_domain.tsc_scaling_ratio);
>  
> +    /* If using host debugging, restore Xen's setting on vmexit. */
> +    if ( this_cpu(ler_msr) &&
> +         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR))  )
> +        return rc;

Isn't this missing a vmx_vmcs_exit on error?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-24 10:59         ` Andrew Cooper
@ 2018-05-24 12:16           ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 12:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Thu, May 24, 2018 at 11:59:07AM +0100, Andrew Cooper wrote:
> On 24/05/18 11:53, Roger Pau Monné wrote:
> > On Wed, May 23, 2018 at 05:55:50PM +0100, Andrew Cooper wrote:
> >> On 23/05/18 17:39, Roger Pau Monné wrote:
> >>> On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> >>>> Instead of having multiple algorithms searching the MSR lists, implement a
> >>>> single one.  It has the semantics required by vmx_add_msr(), to identify the
> >>>> position in which an MSR should live, if it isn't already present.
> >>>>
> >>>> There will be a marginal improvement for vmx_find_msr() by avoiding the
> >>>> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
> >>>> vmx_add_msr() by using a binary search instead of a linear search.
> >>>>
> >>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>> ---
> >>>> CC: Jan Beulich <JBeulich@suse.com>
> >>>> CC: Jun Nakajima <jun.nakajima@intel.com>
> >>>> CC: Kevin Tian <kevin.tian@intel.com>
> >>>> CC: Wei Liu <wei.liu2@citrix.com>
> >>>> CC: Roger Pau Monné <roger.pau@citrix.com>
> >>>> ---
> >>>>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++--------------
> >>>>  1 file changed, 28 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> index f557857..e4acdc1 100644
> >>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
> >>>>      return 0;
> >>>>  }
> >>>>  
> >>>> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> >>>> +/*
> >>>> + * Search an MSR list looking for an MSR entry, or the slot in which it should
> >>>> + * live (to keep the data sorted) if an entry is not found.
> >>>> + *
> >>>> + * The return pointer is guarenteed to be bounded by start and end.  However,
> >>>> + * it may point at end, and may be invalid for the caller to dereference.
> >>>> + */
> >>>> +static struct vmx_msr_entry *locate_msr_entry(
> >>>> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
> >>>>  {
> >>>> -    const u32 *msr = key;
> >>>> -    const struct vmx_msr_entry *entry = elt;
> >>>> +    while ( start < end )
> >>>> +    {
> >>>> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
> >>>>  
> >>>> -    if ( *msr > entry->index )
> >>>> -        return 1;
> >>>> -    if ( *msr < entry->index )
> >>>> -        return -1;
> >>>> +        if ( msr < mid->index )
> >>>> +            end = mid;
> >>>> +        else if ( msr > mid->index )
> >>>> +            start = mid + 1;
> >>>> +        else
> >>>> +            return mid;
> >>>> +    }
> >>> This is basically an open coded version of bsearch, isn't there anyway
> >>> to adapt the current bsearch so that it could be used for both
> >>> vmx_find_msr and vmx_add_msr?
> >>>
> >>> I know there will be a performance penalty for using a function
> >>> pointer for the comparator function, but this looks like code
> >>> duplication to me.
> >> A third use appears in a later patch.  bsearch() doesn't have the
> >> described property on a miss, which is necessary to maintain the lists.
> > I would consider adding a flag to the list of parameters so that
> > bsearch returned the position where the item should be added in case
> > of a miss. You could then wrap it inside of locate_msr_entry, or get
> > rid of this helper altogether.
> 
> bsearch() is specified by POSIX, and C89/99, amongst other standards. 
> Changing its API is not something I'm going to do.

Oh, didn't know that. In which case I agree. AFAICT there's no POSIX
specification for a function that could be used to add new entries
into a sorted array, so:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-24 12:14   ` Roger Pau Monné
@ 2018-05-24 12:39     ` Andrew Cooper
  2018-05-24 13:53       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 12:39 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 24/05/18 13:14, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:42PM +0100, Andrew Cooper wrote:
>> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
>> updates a host MSR load list entry with the current hardware value of
>> MSR_DEBUGCTL.  This is wrong.
>>
>> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
>> where different behaviour is needed is if Xen is debugging itself, and this
>> needs setting up unconditionally for the lifetime of the VM.
>>
>> The `ler` command line boolean is the only way to configure any use of
>> MSR_DEBUGCTL for Xen,
> Hm, there's no documentation at all for the 'ler' option.

:(  ISTR Jan introducing it for debugging purposes, but I've never used
it myself.

>
>> so tie the host load list entry to this setting in
>> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
>> more complicated synchronisation across all the running VMs.
>>
>> In the exceedingly common case, this avoids the unnecessary overhead of having
>> a host load entry performing the same zeroing operation that hardware has
>> already performed as part of the VMExit.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Notes for backporting: This change probably does want backporting, but depends
>> on the previous patch "Support remote access to the MSR lists", and adds an
>> extra rdmsr to the vcpu construction path (resolved in a later patch).
>> ---
>>  xen/arch/x86/hvm/vmx/vmcs.c | 6 ++++++
>>  xen/arch/x86/hvm/vmx/vmx.c  | 3 +--
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 8bf54c4..2035a6d 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -996,6 +996,7 @@ static int construct_vmcs(struct vcpu *v)
>>      struct domain *d = v->domain;
>>      u32 vmexit_ctl = vmx_vmexit_control;
>>      u32 vmentry_ctl = vmx_vmentry_control;
>> +    int rc;
>>  
>>      vmx_vmcs_enter(v);
>>  
>> @@ -1266,6 +1267,11 @@ static int construct_vmcs(struct vcpu *v)
>>      if ( cpu_has_vmx_tsc_scaling )
>>          __vmwrite(TSC_MULTIPLIER, d->arch.hvm_domain.tsc_scaling_ratio);
>>  
>> +    /* If using host debugging, restore Xen's setting on vmexit. */
>> +    if ( this_cpu(ler_msr) &&
>> +         (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR))  )
>> +        return rc;
> Isn't this missing a vmx_vmcs_exit on error?

It is indeed.  Will fix.

~Andrew
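
One possible shape of the fix agreed here (illustrative only; the hunk which
finally went in may differ) is simply to leave the VMCS before returning the
error:

        /* If using host debugging, restore Xen's setting on vmexit. */
        if ( this_cpu(ler_msr) &&
             (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR)) )
        {
            vmx_vmcs_exit(v);
            return rc;
        }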


* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-24 12:39     ` Andrew Cooper
@ 2018-05-24 13:53       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 13:53 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel

>>> On 24.05.18 at 14:39, <andrew.cooper3@citrix.com> wrote:
> On 24/05/18 13:14, Roger Pau Monné wrote:
>> On Tue, May 22, 2018 at 12:20:42PM +0100, Andrew Cooper wrote:
>>> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
>>> updates a host MSR load list entry with the current hardware value of
>>> MSR_DEBUGCTL.  This is wrong.
>>>
>>> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
>>> where different behaviour is needed is if Xen is debugging itself, and this
>>> needs setting up unconditionally for the lifetime of the VM.
>>>
>>> The `ler` command line boolean is the only way to configure any use of
>>> MSR_DEBUGCTL for Xen,
>> Hm, there's no documentation at all for the 'ler' option.
> 
> :(  ISTR Jan introducing it for debugging purposes, but I've never used
> it myself.

Indeed I've been using this a couple of times to have an idea how execution
led to the point where an exception was raised. The option pre-dates the
existence of xen-command-line.markdown by several releases.

Jan



* Re: [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
  2018-05-23 16:28   ` Roger Pau Monné
@ 2018-05-24 14:45   ` Jan Beulich
  2018-05-27  3:30   ` Tian, Kevin
  2 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 14:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1292,48 +1292,50 @@ static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>  struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>  {
>      struct vcpu *curr = current;
> -    unsigned int msr_count;
> -    struct vmx_msr_entry *msr_area = NULL;
> +    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;

In the interest of code volume reduction - why the arch_ prefix? There's
no arch_-less vmx anywhere afaict.

> +    struct vmx_msr_entry *start = NULL;
> +    unsigned int total;
>  
>      switch ( type )
>      {
>      case VMX_MSR_HOST:
> -        msr_count = curr->arch.hvm_vmx.host_msr_count;
> -        msr_area = curr->arch.hvm_vmx.host_msr_area;
> +        start    = arch_vmx->host_msr_area;
> +        total    = arch_vmx->host_msr_count;
>          break;
>  
>      case VMX_MSR_GUEST:
> -        msr_count = curr->arch.hvm_vmx.msr_count;
> -        msr_area = curr->arch.hvm_vmx.msr_area;
> +        start    = arch_vmx->msr_area;
> +        total    = arch_vmx->msr_count;
>          break;
>  
>      default:
>          ASSERT_UNREACHABLE();
>      }
>  
> -    if ( msr_area == NULL )
> +    if ( !start )
>          return NULL;
>  
> -    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
> +    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),

sizeof(*start) ?

Jan
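
Folding both suggestions together, the hunk under review might end up
roughly as follows (a sketch, not the committed version; the comparator and
the bsearch() call are unchanged from the existing code):

    struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
    {
        struct vcpu *curr = current;
        struct arch_vmx_struct *vmx = &curr->arch.hvm_vmx; /* no arch_ prefix */
        struct vmx_msr_entry *start = NULL;
        unsigned int total;

        switch ( type )
        {
        case VMX_MSR_HOST:
            start = vmx->host_msr_area;
            total = vmx->host_msr_count;
            break;

        case VMX_MSR_GUEST:
            start = vmx->msr_area;
            total = vmx->msr_count;
            break;

        default:
            ASSERT_UNREACHABLE();
        }

        if ( !start )
            return NULL;

        return bsearch(&msr, start, total, sizeof(*start), /* sizeof(*start) */
                       vmx_msr_entry_key_cmp);
    }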




* Re: [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
  2018-05-24 11:50   ` Roger Pau Monné
@ 2018-05-24 14:53   ` Jan Beulich
  2018-05-27  3:47   ` Tian, Kevin
  2 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 14:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> @@ -537,25 +544,27 @@ enum vmx_msr_list_type {
>      VMX_MSR_GUEST,
>  };
>  
> -int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
> +int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type);
>  
> -static inline int vmx_add_host_load_msr(uint32_t msr)
> +static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
>  {
> -    return vmx_add_msr(msr, VMX_MSR_HOST);
> +    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
>  }
>  
> -static inline int vmx_add_guest_msr(uint32_t msr)
> +static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
>  {
> -    return vmx_add_msr(msr, VMX_MSR_GUEST);
> +    return vmx_add_msr(v, msr, VMX_MSR_HOST);
>  }
>  
> -struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type);
> +struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
> +                                   enum vmx_msr_list_type type);
>  
> -static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
> +static inline int vmx_read_guest_msr(struct vcpu *v, uint32_t msr,
> +                                     uint64_t *val)

It would seem logical for the new parameter to be constified for at least
these last two.

Jan
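
In sketch form, the constification being asked for could look like this
(whether the committed series applied it exactly this way isn't visible in
this excerpt; the -ESRCH-on-miss behaviour matches the existing helper):

    struct vmx_msr_entry *vmx_find_msr(const struct vcpu *v, uint32_t msr,
                                       enum vmx_msr_list_type type);

    static inline int vmx_read_guest_msr(const struct vcpu *v, uint32_t msr,
                                         uint64_t *val)
    {
        const struct vmx_msr_entry *ent = vmx_find_msr(v, msr, VMX_MSR_GUEST);

        if ( !ent )
            return -ESRCH;

        *val = ent->data;

        return 0;
    }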




* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit Andrew Cooper
  2018-05-22 12:53   ` Andrew Cooper
  2018-05-24 12:14   ` Roger Pau Monné
@ 2018-05-24 15:08   ` Jan Beulich
  2018-05-24 15:51     ` Andrew Cooper
  2018-05-27  3:56   ` Tian, Kevin
  3 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 15:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
> updates a host MSR load list entry with the current hardware value of
> MSR_DEBUGCTL.  This is wrong.
> 
> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.

I'm pretty certain that back when I wrote this, the SDM didn't spell this out.

>  The only case
> where different behaviour is needed is if Xen is debugging itself, and this
> needs setting up unconditionally for the lifetime of the VM.
> 
> The `ler` command line boolean is the only way to configure any use of
> MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
> more complicated synchronisation across all the running VMs.
> 
> In the exceedingly common case, this avoids the unnecessary overhead of having
> a host load entry performing the same zeroing operation that hardware has
> already performed as part of the VMExit.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Notes for backporting: This change probably does want backporting, but depends
> on the previous patch "Support remote access to the MSR lists", and adds an
> extra rdmsr to the vcpu construction path (resolved in a later patch).

I wonder if for backporting purposes we couldn't stick this function
invocation somewhere else, like vmx_ctxt_switch_to() or
vmx_do_resume(). I realize the potential allocation failure is a problem here,
but for that we could either allocate the memory at the place you now
invoke vmx_add_host_load_msr(), or take the brute force approach and
crash the domain upon failure (the net effect won't be much different to
allocation failing during domain construction - the domain won't start in either
case).

I mention this because it seems to me that pulling in the previous patch
would in turn require pulling in earlier ones.

Jan
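
For reference, the brute-force backport variant described above might look
something like this.  The placement, the host_msr_area test and the use of
the pre-series single-argument vmx_add_host_load_msr() are all assumptions
for illustration:

        /*
         * e.g. in vmx_do_resume(), where v == current.  The empty
         * host_msr_area check is a crude "not yet set up" test.
         */
        if ( unlikely(this_cpu(ler_msr)) && !v->arch.hvm_vmx.host_msr_area &&
             vmx_add_host_load_msr(MSR_IA32_DEBUGCTLMSR) )
            domain_crash(v->domain);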




* Re: [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_msr_add()
  2018-05-22 11:20 ` [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_msr_add() Andrew Cooper
@ 2018-05-24 15:12   ` Jan Beulich
  2018-05-30 18:09     ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 15:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> The main purpose of this change is to allow us to set a specific MSR value,
> without needing to know whether there is already a load/save list slot for 
> it.
> Previously, callers wanting this property needed to call both vmx_add_*_msr()
> and vmx_write_*_msr() to cover both cases.
> 
> As a result of this API improvement, the default value for guest MSRs need not
> be 0, and the default for host MSRs need not be passed via hardware register.
> In practice, this cleans up the VPMU allocation logic, and avoids an MSR read
> as part of vcpu construction.

But this also means: If there already is such an entry, the previous value will
now be blindly overwritten. Are you sure this is always what is wanted?

Jan
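
For context, the shape of the API being discussed.  The vmx_add_msr()
signature is taken from the hunks quoted later in this thread; the
vmx_add_guest_msr() wrapper with a value and the VPMU-style call site are
assumed for illustration:

    int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
                    enum vmx_msr_list_type type);

        /*
         * Whether or not a slot for the MSR already exists, it holds 0
         * afterwards - the "blind overwrite" being queried.
         */
        rc = vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);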




* Re: [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries
  2018-05-22 11:20 ` [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
@ 2018-05-24 15:19   ` Jan Beulich
  2018-05-24 15:37   ` Roger Pau Monné
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2018-05-24 15:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> Currently, the VMX_MSR_GUEST type maintains completely symmetric guest load
> and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and 
> VM_ENTRY_MSR_LOAD_ADDR
> at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
> VM_ENTRY_MSR_LOAD_COUNT to the same value.
> 
> However, for MSRs which we won't let the guest have direct access to, having
> hardware save the current value on VMExit is unnecessary overhead.
> 
> To avoid this overhead, we must make the load and save lists asymmetric.  By
> making the entry load count greater than the exit store count, we can maintain
> two adjacent lists of MSRs, the first of which is saved and restored, and the
> second of which is only restored on VMEntry.
> 
> For simplicity:
>  * Both adjacent lists are still sorted by MSR index.
>  * It is undefined behaviour to insert the same MSR into both lists.

I guess for now that's good enough. Ideally, inserting a load/save entry
would purge a possible load-only entry, and inserting a load-only entry
would be a no-op when a load/save one already exists. Otherwise
different pieces of code dealing with different parts of an MSR (if we
ever gain such) would need to be tightly aware of one another (and in
particular conditionals around the insertions would need to be kept in
sync).

>  * The total size of both lists is still limited at 256 entries (one 4k page).
> 
> Split the current msr_count field into msr_{load,save}_count, and introduce a
> new VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr() to calculate
> which sublist to search, based on type.  VMX_MSR_HOST has no logical sublist,
> whereas VMX_MSR_GUEST has a sublist between 0 and the save count, while
> VMX_MSR_GUEST_LOADONLY has a sublist between the save count and the load
> count.
> 
> One subtle point is that inserting an MSR into the load-save list involves
> moving the entire load-only list, and updating both counts.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries
  2018-05-22 11:20 ` [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
  2018-05-24 15:19   ` Jan Beulich
@ 2018-05-24 15:37   ` Roger Pau Monné
  1 sibling, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 15:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:44PM +0100, Andrew Cooper wrote:
> Currently, the VMX_MSR_GUEST type maintains completely symmetric guest load
> and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and VM_ENTRY_MSR_LOAD_ADDR
> at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
> VM_ENTRY_MSR_LOAD_COUNT to the same value.
> 
> However, for MSRs which we won't let the guest have direct access to, having
> hardware save the current value on VMExit is unnecessary overhead.
> 
> To avoid this overhead, we must make the load and save lists asymmetric.  By
> making the entry load count greater than the exit store count, we can maintain
> two adjacent lists of MSRs, the first of which is saved and restored, and the
> second of which is only restored on VMEntry.
> 
> For simplicity:
>  * Both adjacent lists are still sorted by MSR index.
>  * It is undefined behaviour to insert the same MSR into both lists.
>  * The total size of both lists is still limited at 256 entries (one 4k page).
> 
> Split the current msr_count field into msr_{load,save}_count, and introduce a
> new VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr() to calculate
> which sublist to search, based on type.  VMX_MSR_HOST has no logical sublist,
> whereas VMX_MSR_GUEST has a sublist between 0 and the save count, while
> VMX_MSR_GUEST_LOADONLY has a sublist between the save count and the load
> count.
> 
> One subtle point is that inserting an MSR into the load-save list involves
> moving the entire load-only list, and updating both counts.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just one nit below.

> @@ -1423,8 +1446,11 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>          break;
>  
>      case VMX_MSR_GUEST:
> -        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_count);
> -        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_count);
> +        __vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_save_count);
> +
> +        /* Fallthrough */
> +    case VMX_MSR_GUEST_LOADONLY:
> +        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, ++arch_vmx->msr_load_count);
>          break;
>      }

Would it make sense to add something like:

ASSERT(arch_vmx->msr_save_count <= arch_vmx->msr_load_count);

Thanks.


* Re: [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-05-22 11:20 ` [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
@ 2018-05-24 15:42   ` Roger Pau Monné
  2018-05-24 15:45     ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 15:42 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On Tue, May 22, 2018 at 12:20:45PM +0100, Andrew Cooper wrote:
> Up until this point, the MSR load/save lists have only ever accumulated
> content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        | 68 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
>  2 files changed, 69 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 7bf19a0..e1a8f95 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1465,6 +1465,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>      return rc;
>  }
>  
> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
> +{
> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
> +    struct vmx_msr_entry *start = NULL, *ent, *end;
> +    unsigned int substart, subend, total;
> +
> +    ASSERT(v == current || !vcpu_runnable(v));
> +
> +    switch ( type )
> +    {
> +    case VMX_MSR_HOST:
> +        start    = arch_vmx->host_msr_area;
> +        substart = 0;
> +        subend   = arch_vmx->host_msr_count;
> +        total    = subend;
> +        break;
> +
> +    case VMX_MSR_GUEST:
> +        start    = arch_vmx->msr_area;
> +        substart = 0;
> +        subend   = arch_vmx->msr_save_count;
> +        total    = arch_vmx->msr_load_count;
> +        break;
> +
> +    case VMX_MSR_GUEST_LOADONLY:
> +        start    = arch_vmx->msr_area;
> +        substart = arch_vmx->msr_save_count;
> +        subend   = arch_vmx->msr_load_count;
> +        total    = subend;
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }

The above chunk is already in vmx_find_msr and vmx_add_msr, maybe a
static helper that sets start/substart/subend/total would be helpful
here?

Roger.
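
A sketch of the kind of helper being suggested, using the field and type
names from the patches quoted above (the helper name itself is invented):

    static void vmx_msr_list_bounds(struct arch_vmx_struct *vmx,
                                    enum vmx_msr_list_type type,
                                    struct vmx_msr_entry **start,
                                    unsigned int *substart,
                                    unsigned int *subend, unsigned int *total)
    {
        switch ( type )
        {
        case VMX_MSR_HOST:
            *start    = vmx->host_msr_area;
            *substart = 0;
            *subend   = vmx->host_msr_count;
            *total    = *subend;
            break;

        case VMX_MSR_GUEST:
            *start    = vmx->msr_area;
            *substart = 0;
            *subend   = vmx->msr_save_count;
            *total    = vmx->msr_load_count;
            break;

        case VMX_MSR_GUEST_LOADONLY:
            *start    = vmx->msr_area;
            *substart = vmx->msr_save_count;
            *subend   = vmx->msr_load_count;
            *total    = *subend;
            break;

        default:
            ASSERT_UNREACHABLE();
        }
    }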


* Re: [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists
  2018-05-24 15:42   ` Roger Pau Monné
@ 2018-05-24 15:45     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 15:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Jan Beulich, Xen-devel

On 24/05/18 16:42, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:45PM +0100, Andrew Cooper wrote:
>> Up until this point, the MSR load/save lists have only ever accumulated
>> content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>>  xen/arch/x86/hvm/vmx/vmcs.c        | 68 ++++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
>>  2 files changed, 69 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 7bf19a0..e1a8f95 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1465,6 +1465,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t val,
>>      return rc;
>>  }
>>  
>> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
>> +{
>> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>> +    struct vmx_msr_entry *start = NULL, *ent, *end;
>> +    unsigned int substart, subend, total;
>> +
>> +    ASSERT(v == current || !vcpu_runnable(v));
>> +
>> +    switch ( type )
>> +    {
>> +    case VMX_MSR_HOST:
>> +        start    = arch_vmx->host_msr_area;
>> +        substart = 0;
>> +        subend   = arch_vmx->host_msr_count;
>> +        total    = subend;
>> +        break;
>> +
>> +    case VMX_MSR_GUEST:
>> +        start    = arch_vmx->msr_area;
>> +        substart = 0;
>> +        subend   = arch_vmx->msr_save_count;
>> +        total    = arch_vmx->msr_load_count;
>> +        break;
>> +
>> +    case VMX_MSR_GUEST_LOADONLY:
>> +        start    = arch_vmx->msr_area;
>> +        substart = arch_vmx->msr_save_count;
>> +        subend   = arch_vmx->msr_load_count;
>> +        total    = subend;
>> +        break;
>> +
>> +    default:
>> +        ASSERT_UNREACHABLE();
>> +    }
> The above chunk is already in vmx_find_msr and vmx_add_msr, maybe a
> static helper that sets start/substart/subend/total would be helpful
> here?

It's actually more than that.  They are identical until after
the locate_msr_entry() call.  The problem is that I can't find a clean
way of collecting the logic which is also readable.

I'd also like to fold together the add side, but that is further
complicated by the memory allocation logic.

~Andrew


* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-24 15:08   ` Jan Beulich
@ 2018-05-24 15:51     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 24/05/18 16:08, Jan Beulich wrote:
>>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
>> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
>> updates a host MSR load list entry with the current hardware value of
>> MSR_DEBUGCTL.  This is wrong.
>>
>> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.
> I'm pretty certain that back when I wrote this, the SDM didn't spell this out.

I wondered as much.  Gen-1 VT-x is well documented as forcing the
VM_EXIT_SAVE_DEBUG_CTLS control to be set in non-root operation, which
clearly suggests that it has always had this behaviour.

>
>>  The only case
>> where different behaviour is needed is if Xen is debugging itself, and this
>> needs setting up unconditionally for the lifetime of the VM.
>>
>> The `ler` command line boolean is the only way to configure any use of
>> MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
>> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
>> more complicated synchronisation across all the running VMs.
>>
>> In the exceedingly common case, this avoids the unnecessary overhead of having
>> a host load entry performing the same zeroing operation that hardware has
>> already performed as part of the VMExit.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
>> Notes for backporting: This change probably does want backporting, but depends
>> on the previous patch "Support remote access to the MSR lists", and adds an
>> extra rdmsr to the vcpu construction path (resolved in a later patch).
> I wonder if for backporting purposes we couldn't stick this function
> invocation somewhere else, like vmx_ctxt_switch_to() or
> vmx_do_resume(). I realize the potential allocation failure is a problem here,
> but for that we could either allocate the memory at the place you now
> invoke vmx_add_host_load_msr(), or take the brute force approach and
> crash the domain upon failure (the net effect won't be much different to
> allocation failing during domain construction - the domain won't start in either
> case).
>
> I mention this because it seems to me that pulling in the previous patch
> would in turn require pulling in earlier ones.

Yeah - to backport this change, you need 6 patches from the series.

That said, I think I've come up with a much safer, faster alternative
which I can disentangle completely from this series.

I was already planning to clean up the host debugctl handling to have a
single read_mostly value.  With a trivial alternative block in the vmx
vmexit handler, we can do away with the host load entry entirely (and,
as I've been reliably informed, doing things this way is faster than
having microcode walk the host/guest load/save lists).

~Andrew
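
A very rough sketch of the alternative being described, with an invented
variable name and a plain conditional standing in for the patched
alternative() path:

    static uint64_t __read_mostly host_debugctl;  /* set when 'ler' is used */

        /* Early in vmx_vmexit_handler(), replacing the host load list entry: */
        if ( unlikely(host_debugctl) )
            wrmsrl(MSR_IA32_DEBUGCTLMSR, host_debugctl);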


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
@ 2018-05-24 16:01   ` Roger Pau Monné
  2018-05-24 16:48     ` Andrew Cooper
  2018-05-25  6:23   ` Tim Deegan
  2018-05-25  7:49   ` Jan Beulich
  2 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2018-05-24 16:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Tim Deegan, Xen-devel, Jun Nakajima

On Tue, May 22, 2018 at 12:20:46PM +0100, Andrew Cooper wrote:
> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
> 
> SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
> and vmx_update_guest_efer(), and works by altering the host SCE value to match
> the setting the guest wants.  This works because, in HVM vcpu context, Xen
> never needs to execute a SYSCALL or SYSRET instruction.
> 
> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
> context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
> reserved and cause a pagefault when encountered.  This means that the guest
> always has Xen's setting in effect, irrespective of the bit it can see and
> modify in its virtualised view of MSR_EFER.
> 
> This isn't a major problem for production operating systems because they, like
> Xen, always turn the NXE on when it is available.  However, it does have an
> observable effect on which guest PTE bits are valid, and whether
> PFEC_insn_fetch is visible in a #PF error code.
> 
> Second generation VT-x hardware has host and guest EFER fields in the VMCS,
> and support for loading and saving them automatically.  First generation VT-x
> hardware needs to use MSR load/save lists to cause an atomic switch of
> MSR_EFER on vmentry/exit.
> 
> Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
> support when available (and MSR load/save lists on older hardware) and drop
> all ad-hoc alteration of SCE.
> 
> There are two complications for shadow guests.  NXE, being a paging setting
> needs to remain under host control, but that is fine as it is also Xen which
> handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
> won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
> setting, or via the guest load list.  This doesn't matter in practice as we
> intercept all writes to CR0 and reads from MSR_EFER, so can provide
> architecturally consistent behaviour from the guests point of view.
> 
> As a result of fixing EFER context switching, we can remove the Intel-special
> case from hvm_nx_enabled() and let guest_walk_tables() work with the real
> guest paging settings.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

LGTM:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

One question below though.

> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index cfd174c..6c6897c 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>  #define cpu_has_vmx_pat \
>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
> +#define cpu_has_vmx_efer \
> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)

Don't you also need a vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER and
vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER?

Or can the presence of those two be inferred from
VM_ENTRY_LOAD_GUEST_EFER?

Thanks, Roger.


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-24 16:01   ` Roger Pau Monné
@ 2018-05-24 16:48     ` Andrew Cooper
  2018-05-25  7:27       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-24 16:48 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Tim Deegan, Xen-devel, Jun Nakajima

On 24/05/18 17:01, Roger Pau Monné wrote:
> On Tue, May 22, 2018 at 12:20:46PM +0100, Andrew Cooper wrote:
>> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
>> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
>>
>> SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
>> and vmx_update_guest_efer(), and works by altering the host SCE value to match
>> the setting the guest wants.  This works because, in HVM vcpu context, Xen
>> never needs to execute a SYSCALL or SYSRET instruction.
>>
>> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
>> context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
>> reserved and cause a pagefault when encountered.  This means that the guest
>> always has Xen's setting in effect, irrespective of the bit it can see and
>> modify in its virtualised view of MSR_EFER.
>>
>> This isn't a major problem for production operating systems because they, like
>> Xen, always turn the NXE on when it is available.  However, it does have an
>> observable effect on which guest PTE bits are valid, and whether
>> PFEC_insn_fetch is visible in a #PF error code.
>>
>> Second generation VT-x hardware has host and guest EFER fields in the VMCS,
>> and support for loading and saving them automatically.  First generation VT-x
>> hardware needs to use MSR load/save lists to cause an atomic switch of
>> MSR_EFER on vmentry/exit.
>>
>> Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
>> support when available (and MSR load/save lists on older hardware) and drop
>> all ad-hoc alteration of SCE.
>>
>> There are two complications for shadow guests.  NXE, being a paging setting
>> needs to remain under host control, but that is fine as it is also Xen which
>> handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
>> won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
>> setting, or via the guest load list.  This doesn't matter in practice as we
>> intercept all writes to CR0 and reads from MSR_EFER, so can provide
>> architecturally consistent behaviour from the guests point of view.
>>
>> As a result of fixing EFER context switching, we can remove the Intel-special
>> case from hvm_nx_enabled() and let guest_walk_tables() work with the real
>> guest paging settings.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> LGTM:
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> One question below though.
>
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index cfd174c..6c6897c 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
>>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>>  #define cpu_has_vmx_pat \
>>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
>> +#define cpu_has_vmx_efer \
>> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
> Don't you also need a vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER and
> vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER?
>
> Or can the presence of those two be inferred from
> VM_ENTRY_LOAD_GUEST_EFER?

They were introduced at the same time into hardware, so these settings
will be the same in practice.

~Andrew


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-05-24 16:01   ` Roger Pau Monné
@ 2018-05-25  6:23   ` Tim Deegan
  2018-05-25  7:49   ` Jan Beulich
  2 siblings, 0 replies; 51+ messages in thread
From: Tim Deegan @ 2018-05-25  6:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Xen-devel, Jun Nakajima,
	Roger Pau Monné

At 12:20 +0100 on 22 May (1526991646), Andrew Cooper wrote:
> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
> 
> SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
> and vmx_update_guest_efer(), and works by altering the host SCE value to match
> the setting the guest wants.  This works because, in HVM vcpu context, Xen
> never needs to execute a SYSCALL or SYSRET instruction.
> 
> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
> context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
> reserved and cause a pagefault when encountered.  This means that the guest
> always has Xen's setting in effect, irrespective of the bit it can see and
> modify in its virtualised view of MSR_EFER.
> 
> This isn't a major problem for production operating systems because they, like
> Xen, always turn the NXE on when it is available.  However, it does have an
> observable effect on which guest PTE bits are valid, and whether
> PFEC_insn_fetch is visible in a #PF error code.
> 
> Second generation VT-x hardware has host and guest EFER fields in the VMCS,
> and support for loading and saving them automatically.  First generation VT-x
> hardware needs to use MSR load/save lists to cause an atomic switch of
> MSR_EFER on vmentry/exit.
> 
> Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
> support when available (and MSR load/save lists on older hardware) and drop
> all ad-hoc alteration of SCE.
> 
> There are two complications for shadow guests.  NXE, being a paging setting
> needs to remain under host control, but that is fine as it is also Xen which
> handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
> won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
> setting, or via the guest load list.  This doesn't matter in practice as we
> intercept all writes to CR0 and reads from MSR_EFER, so can provide
> architecturally consistent behaviour from the guests point of view.
> 
> As a result of fixing EFER context switching, we can remove the Intel-special
> case from hvm_nx_enabled() and let guest_walk_tables() work with the real
> guest paging settings.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Tim Deegan <tim@xen.org>


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-24 16:48     ` Andrew Cooper
@ 2018-05-25  7:27       ` Jan Beulich
  2018-05-25  8:03         ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2018-05-25  7:27 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne
  Cc: Tim Deegan, Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel

>>> On 24.05.18 at 18:48, <andrew.cooper3@citrix.com> wrote:
> On 24/05/18 17:01, Roger Pau Monné wrote:
>> On Tue, May 22, 2018 at 12:20:46PM +0100, Andrew Cooper wrote:
>>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> @@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
>>>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>>>  #define cpu_has_vmx_pat \
>>>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
>>> +#define cpu_has_vmx_efer \
>>> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
>> Don't you also need a vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER and
>> vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER?
>>
>> Or can the presence of those two be inferred from
>> VM_ENTRY_LOAD_GUEST_EFER?
> 
> They were introduced at the same time into hardware, so these settings
> will be the same in practice.

I see other similar groups of features also aren't checked for consistency,
but wouldn't it be better to have such checks in vmx_init_vmcs_config()
(i.e. disable all three if at least one of them is unavailable)?

Jan



* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
  2018-05-24 16:01   ` Roger Pau Monné
  2018-05-25  6:23   ` Tim Deegan
@ 2018-05-25  7:49   ` Jan Beulich
  2018-05-25  8:36     ` Andrew Cooper
  2 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2018-05-25  7:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>  
>  static void vmx_update_guest_efer(struct vcpu *v)
>  {
> -    unsigned long vm_entry_value;
> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
> +        xen_efer = read_efer();
> +
> +    if ( paging_mode_shadow(v->domain) )
> +    {
> +        /*
> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
> +         * under guest control.
> +         */
> +        guest_efer &= ~EFER_NX;
> +        guest_efer |= xen_efer & EFER_NX;
> +
> +        /*
> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
> +         *
> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
> +         *  checks are performed on the field for the IA32_EFER MSR:
> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
> +         *     (corresponding to CR0.PG) is 1.
> +         *
> +         * Experimentally what actually happens is:
> +         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
> +         *     GUEST_EFER VMCS controls, or MSR load/save lists.
> +         *   - Without EPT, LME being different to LMA isn't tolerated by
> +         *     hardware.  As writes to CR0 are intercepted, it is safe to
> +         *     leave LME clear at this point, and fix up both LME and LMA when
> +         *     CR0.PG is set.
> +         */
> +        if ( !(guest_efer & EFER_LMA) )
> +            guest_efer &= ~EFER_LME;
> +    }

Why is this latter adjustments done only for shadow mode?

After the above adjustments, when guest_efer still matches
v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
intercept?

Jan




* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-25  7:27       ` Jan Beulich
@ 2018-05-25  8:03         ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-25  8:03 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne
  Cc: Tim Deegan, Kevin Tian, Wei Liu, Jun Nakajima, Xen-devel

On 25/05/2018 08:27, Jan Beulich wrote:
>>>> On 24.05.18 at 18:48, <andrew.cooper3@citrix.com> wrote:
>> On 24/05/18 17:01, Roger Pau Monné wrote:
>>> On Tue, May 22, 2018 at 12:20:46PM +0100, Andrew Cooper wrote:
>>>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>>>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>>>> @@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
>>>>      (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>>>>  #define cpu_has_vmx_pat \
>>>>      (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
>>>> +#define cpu_has_vmx_efer \
>>>> +    (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)
>>> Don't you also need a vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER and
>>> vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER?
>>>
>>> Or can the presence of those two be inferred from
>>> VM_ENTRY_LOAD_GUEST_EFER?
>> They were introduced at the same time into hardware, so these settings
>> will be the same in practice.
> I see other similar groups of features also aren't checked for consistency,
> but wouldn't it be better to have such checks in vmx_init_vmcs_config()
> (i.e. disable all three if at least one of them is unavailable)?

Correct.  Logic to this effect is coming as part of Nested Virt work.

~Andrew
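
Such a consistency fixup in vmx_init_vmcs_config() might look roughly like
this (the local variable names are assumed to follow the existing _vmx_*
convention):

        /* Treat the three EFER controls as one feature. */
        if ( !(_vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER) ||
             !(_vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER) ||
             !(_vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER) )
        {
            _vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_EFER;
            _vmx_vmexit_control  &= ~(VM_EXIT_SAVE_GUEST_EFER |
                                      VM_EXIT_LOAD_HOST_EFER);
        }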


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-25  7:49   ` Jan Beulich
@ 2018-05-25  8:36     ` Andrew Cooper
  2018-05-25 11:36       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2018-05-25  8:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

On 25/05/2018 08:49, Jan Beulich wrote:
>>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>>  
>>  static void vmx_update_guest_efer(struct vcpu *v)
>>  {
>> -    unsigned long vm_entry_value;
>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>> +        xen_efer = read_efer();
>> +
>> +    if ( paging_mode_shadow(v->domain) )
>> +    {
>> +        /*
>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>> +         * under guest control.
>> +         */
>> +        guest_efer &= ~EFER_NX;
>> +        guest_efer |= xen_efer & EFER_NX;
>> +
>> +        /*
>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>> +         *
>> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
>> +         *  checks are performed on the field for the IA32_EFER MSR:
>> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
>> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
>> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
>> +         *     (corresponding to CR0.PG) is 1.
>> +         *
>> +         * Experimentally what actually happens is:
>> +         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
>> +         *     GUEST_EFER VMCS controls, or MSR load/save lists.
>> +         *   - Without EPT, LME being different to LMA isn't tolerated by
>> +         *     hardware.  As writes to CR0 are intercepted, it is safe to
>> +         *     leave LME clear at this point, and fix up both LME and LMA when
>> +         *     CR0.PG is set.
>> +         */
>> +        if ( !(guest_efer & EFER_LMA) )
>> +            guest_efer &= ~EFER_LME;
>> +    }
> Why is this latter adjustments done only for shadow mode?

How should I go about making the comment clearer?

When EPT is active, hardware is happy with LMA  != LME.  When EPT is
disabled, hardware strictly requires LME == LMA.

This particular condition occurs architecturally on the transition into
long mode, between setting LME and setting CR0.PG, and updating EFER
controls in the naive way results in a vmentry failure.

Having spoken to Intel, they agree with my assessment that the docs
appear to be correct for Gen1 hardware, and stale for Gen2 hardware,
where fixing this was one of many parts of making Unrestricted Guest work.

> After the above adjustments, when guest_efer still matches
> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
> intercept?

In principle, yes.  We could use load/save lists, as long as we remembered to
recalculate EFER every time CR0 gets modified in the shadow path.

However, that would be a net performance penalty rather than benefit
(which is why I've gone to the effort of creating load-only lists).

In practice, EFER is written at boot and not touched again.  Having
load/save logic might avoid these vmexits, but at the cost of almost
every other vmexit needing to keep the guest_efer in sync with the
load/save list or VMCS field.

~Andrew


* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-25  8:36     ` Andrew Cooper
@ 2018-05-25 11:36       ` Jan Beulich
  2018-05-25 11:48         ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2018-05-25 11:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

>>> On 25.05.18 at 10:36, <andrew.cooper3@citrix.com> wrote:
> On 25/05/2018 08:49, Jan Beulich wrote:
>>>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
>>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, 
> unsigned int cr,
>>>  
>>>  static void vmx_update_guest_efer(struct vcpu *v)
>>>  {
>>> -    unsigned long vm_entry_value;
>>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>>> +        xen_efer = read_efer();
>>> +
>>> +    if ( paging_mode_shadow(v->domain) )
>>> +    {
>>> +        /*
>>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>>> +         * under guest control.
>>> +         */
>>> +        guest_efer &= ~EFER_NX;
>>> +        guest_efer |= xen_efer & EFER_NX;
>>> +
>>> +        /*
>>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>>> +         *
>>> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
>>> +         *  checks are performed on the field for the IA32_EFER MSR:
>>> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
>>> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>>> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
>>> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
>>> +         *     (corresponding to CR0.PG) is 1.
>>> +         *
>>> +         * Experimentally what actually happens is:
>>> +         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
>>> +         *     GUEST_EFER VMCS controls, or MSR load/save lists.
>>> +         *   - Without EPT, LME being different to LMA isn't tolerated by
>>> +         *     hardware.  As writes to CR0 are intercepted, it is safe to
>>> +         *     leave LME clear at this point, and fix up both LME and LMA when
>>> +         *     CR0.PG is set.
>>> +         */
>>> +        if ( !(guest_efer & EFER_LMA) )
>>> +            guest_efer &= ~EFER_LME;
>>> +    }
>> Why is this latter adjustments done only for shadow mode?
> 
> How should I go about making the comment clearer?
> 
> When EPT is active, hardware is happy with LMA  != LME.  When EPT is
> disabled, hardware strictly requires LME == LMA.

Part of my problem may be that "Without EPT" can have two meanings:
Hardware without EPT, or EPT disabled on otherwise capable hardware.

> This particular condition occurs architecturally on the transition into
> long mode, between setting LME and setting CR0.PG, and updating EFER
> controls in the naive way results in a vmentry failure.
> 
> Having spoken to Intel, they agree with my assessment that the docs
> appear to be correct for Gen1 hardware, and stale for Gen2 hardware,
> where fixing this was one of many parts of making Unrestricted Guest work.

This suggests you mean the former, in which case the check really
doesn't belong inside a paging_mode_shadow() conditional.

>> After the above adjustments, when guest_efer still matches
>> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
>> intercept?
> 
> In principle, yes.  We use load/save lists, as long as we remembered to
> recalculate EFER every time CR0 gets modified in the shadow path.
> 
> However, that would be a net performance penalty rather than benefit
> (which is why I've gone to the effort of creating load-only lists).
> 
> In practice, EFER is written at boot and not touched again.  Having
> load/save logic might avoid these vmexits, but at the cost of almost
> every other vmexit needing to keep the guest_efer in sync with the
> load/save list or VMCS field.

I can't seem to connect this to my question about MSR _read_ intercept.

Jan




* Re: [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context
  2018-05-25 11:36       ` Jan Beulich
@ 2018-05-25 11:48         ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-25 11:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Tim Deegan, Xen-devel, Jun Nakajima,
	Roger Pau Monne

On 25/05/18 12:36, Jan Beulich wrote:
>>>> On 25.05.18 at 10:36, <andrew.cooper3@citrix.com> wrote:
>> On 25/05/2018 08:49, Jan Beulich wrote:
>>>>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, 
>> unsigned int cr,
>>>>  
>>>>  static void vmx_update_guest_efer(struct vcpu *v)
>>>>  {
>>>> -    unsigned long vm_entry_value;
>>>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>>>> +        xen_efer = read_efer();
>>>> +
>>>> +    if ( paging_mode_shadow(v->domain) )
>>>> +    {
>>>> +        /*
>>>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>>>> +         * under guest control.
>>>> +         */
>>>> +        guest_efer &= ~EFER_NX;
>>>> +        guest_efer |= xen_efer & EFER_NX;
>>>> +
>>>> +        /*
>>>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>>>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>>>> +         *
>>>> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
>>>> +         *  checks are performed on the field for the IA32_EFER MSR:
>>>> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
>>>> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>>>> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
>>>> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
>>>> +         *     (corresponding to CR0.PG) is 1.
>>>> +         *
>>>> +         * Experimentally what actually happens is:
>>>> +         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
>>>> +         *     GUEST_EFER VMCS controls, or MSR load/save lists.
>>>> +         *   - Without EPT, LME being different to LMA isn't tolerated by
>>>> +         *     hardware.  As writes to CR0 are intercepted, it is safe to
>>>> +         *     leave LME clear at this point, and fix up both LME and LMA when
>>>> +         *     CR0.PG is set.
>>>> +         */
>>>> +        if ( !(guest_efer & EFER_LMA) )
>>>> +            guest_efer &= ~EFER_LME;
>>>> +    }
>>> Why is this latter adjustments done only for shadow mode?
>> How should I go about making the comment clearer?
>>
>> When EPT is active, hardware is happy with LMA  != LME.  When EPT is
>> disabled, hardware strictly requires LME == LMA.
> Part of my problem may be that "Without EPT" can have two meanings:
> Hardware without EPT, or EPT disabled on otherwise capable hardware.

Ah ok.  Yes - I see the confusion.  I'll see about rewording it.

>
>> This particular condition occurs architecturally on the transition into
>> long mode, between setting LME and setting CR0.PG, and updating EFER
>> controls in the naive way results in a vmentry failure.
>>
>> Having spoken to Intel, they agree with my assessment that the docs
>> appear to be correct for Gen1 hardware, and stale for Gen2 hardware,
>> where fixing this was one of many parts of making Unrestricted Guest work.
> This suggests you mean the former, in which case the check really
> doesn't belong inside a paging_mode_shadow() conditional.

Whereas what is meant is the latter.  It depends on the EPT setting in
the VMCS, rather than whether the hardware is capable.  This is
presumably for backwards compatibility.
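
Expressed as code, the hunk quoted above boils down to the following
sketch (nothing new here, just the same logic flattened out):

    /*
     * Applies when this vCPU's VMCS does not have "enable EPT" set, i.e.
     * when the domain uses shadow paging - regardless of whether the
     * hardware itself is EPT-capable.
     */
    if ( paging_mode_shadow(v->domain) && !(guest_efer & EFER_LMA) )
        guest_efer &= ~EFER_LME;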

>
>>> After the above adjustments, when guest_efer still matches
>>> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
>>> intercept?
>> In principle, yes.  We could use load/save lists, as long as we remembered to
>> recalculate EFER every time CR0 gets modified in the shadow path.
>>
>> However, that would be a net performance penalty rather than benefit
>> (which is why I've gone to the effort of creating load-only lists).
>>
>> In practice, EFER is written at boot and not touched again.  Having
>> load/save logic might avoid these vmexits, but at the cost of almost
>> every other vmexit needing to keep the guest_efer in sync with the
>> load/save list or VMCS field.
> I can't seem to connect this to my question about MSR _read_ intercept.

Oh - so it doesn't.  I read that as the read/write intercept.

Yes - probably, although I'd have to double check how it interacts with
the introspection interception settings (and the answer is almost
certainly "badly").  I've got a plan to fix this by maintaining separate
"who wants which MSR intercepted" state, and having a single
recalc_msr_intercept_bitmap() which runs on the hvm_resume() path after
any changes.
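
As a very rough sketch of that plan (everything below is hypothetical -
the msr_want[] state, tracked_msrs[] table and NR_TRACKED_MSRS bound
don't exist yet; only the {set,clear}_msr_intercept() helpers do):

    static void recalc_msr_intercept_bitmap(struct vcpu *v)
    {
        unsigned int i;

        for ( i = 0; i < NR_TRACKED_MSRS; i++ )      /* hypothetical bound */
        {
            uint32_t msr = tracked_msrs[i];          /* hypothetical table */

            /* Intercept iff anyone (Xen, introspection, ...) wants it. */
            if ( v->arch.hvm_vmx.msr_want[i] )       /* hypothetical state */
                vmx_set_msr_intercept(v, msr, VMX_MSR_RW);
            else
                vmx_clear_msr_intercept(v, msr, VMX_MSR_RW);
        }
    }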

~Andrew


* Re: [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure
  2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
  2018-05-23 16:01   ` Roger Pau Monné
@ 2018-05-27  3:26   ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2018-05-27  3:26 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 22, 2018 7:21 PM
> 
> Collect together related infrastructure in vmcs.h, rather than having it
> spread out.  Turn vmx_{read,write}_guest_msr() into static inlines, as they
> are simple enough.
> 
> Replace 'int type' with 'enum vmx_msr_list_type', and use switch
> statements
> internally.  Later changes are going to introduce a new type.
> 
> Rename the type identifiers for consistency with the other VMX_MSR_*
> constants.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

* Re: [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure
  2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
  2018-05-23 16:28   ` Roger Pau Monné
  2018-05-24 14:45   ` Jan Beulich
@ 2018-05-27  3:30   ` Tian, Kevin
  2 siblings, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2018-05-27  3:30 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 22, 2018 7:21 PM
> 
>  * Use an arch_vmx_struct local variable to reduce later code volume.
>  * Use start/total instead of msr_area/msr_count.  This is in preparation for
>    more finegrained handling with later changes.
>  * Use ent/end pointers (again for preparation), and to make the
> vmx_add_msr()
>    logic easier to follow.
>  * Make the memory allocation block of vmx_add_msr() unlikely, and
> calculate
>    virt_to_maddr() just once.
> 
> No practical change to functionality.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

* Re: [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()
  2018-05-22 11:20 ` [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
  2018-05-23 16:39   ` Roger Pau Monné
@ 2018-05-27  3:38   ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2018-05-27  3:38 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 22, 2018 7:21 PM
> 
> Instead of having multiple algorithms searching the MSR lists, implement a
> single one.  It has the semantics required by vmx_add_msr(), to identify the
> position in which an MSR should live, if it isn't already present.
> 
> There will be a marginal improvement for vmx_find_msr() by avoiding the
> function pointer calls to vmx_msr_entry_key_cmp(), and a major
> improvement for
> vmx_add_msr() by using a binary search instead of a linear search.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>, with one typo found:

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 42 ++++++++++++++++++++++++++++-----
> ---------
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c
> b/xen/arch/x86/hvm/vmx/vmcs.c
> index f557857..e4acdc1 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
>      return 0;
>  }
> 
> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> +/*
> + * Search an MSR list looking for an MSR entry, or the slot in which it
> should
> + * live (to keep the data sorted) if an entry is not found.
> + *
> + * The return pointer is guarenteed to be bounded by start and end.

guaranteed

> However,
> + * it may point at end, and may be invalid for the caller to dereference.
> + */
> +static struct vmx_msr_entry *locate_msr_entry(
> +    struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
>  {
> -    const u32 *msr = key;
> -    const struct vmx_msr_entry *entry = elt;
> +    while ( start < end )
> +    {
> +        struct vmx_msr_entry *mid = start + (end - start) / 2;
> 
> -    if ( *msr > entry->index )
> -        return 1;
> -    if ( *msr < entry->index )
> -        return -1;
> +        if ( msr < mid->index )
> +            end = mid;
> +        else if ( msr > mid->index )
> +            start = mid + 1;
> +        else
> +            return mid;
> +    }
> 
> -    return 0;
> +    return start;
>  }
> 
>  struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum
> vmx_msr_list_type type)
>  {
>      struct vcpu *curr = current;
>      struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> -    struct vmx_msr_entry *start = NULL;
> +    struct vmx_msr_entry *start = NULL, *ent, *end;
>      unsigned int total;
> 
>      switch ( type )
> @@ -1315,8 +1327,10 @@ struct vmx_msr_entry *vmx_find_msr(uint32_t
> msr, enum vmx_msr_list_type type)
>      if ( !start )
>          return NULL;
> 
> -    return bsearch(&msr, start, total, sizeof(struct vmx_msr_entry),
> -                   vmx_msr_entry_key_cmp);
> +    end = start + total;
> +    ent = locate_msr_entry(start, end, msr);
> +
> +    return ((ent < end) && (ent->index == msr)) ? ent : NULL;
>  }
> 
>  int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
> @@ -1368,10 +1382,10 @@ int vmx_add_msr(uint32_t msr, enum
> vmx_msr_list_type type)
> 
>      start = *ptr;
>      end   = start + total;
> +    ent   = locate_msr_entry(start, end, msr);
> 
> -    for ( ent = start; ent < end && ent->index <= msr; ++ent )
> -        if ( ent->index == msr )
> -            return 0;
> +    if ( (ent < end) && (ent->index == msr) )
> +        return 0;
> 
>      if ( total == (PAGE_SIZE / sizeof(*ent)) )
>          return -ENOSPC;
> --
> 2.1.4


* Re: [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
  2018-05-24 11:50   ` Roger Pau Monné
  2018-05-24 14:53   ` Jan Beulich
@ 2018-05-27  3:47   ` Tian, Kevin
  2018-05-28 15:15     ` Andrew Cooper
  2 siblings, 1 reply; 51+ messages in thread
From: Tian, Kevin @ 2018-05-27  3:47 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 22, 2018 7:21 PM
> 
> At the moment, all modifications of the MSR lists are in current context.
> However, future changes may need to put MSR_EFER into the lists from
> domctl
> hypercall context.
> 
> Plumb a struct vcpu parameter down through the infrastructure, and use
> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().
> Use
> assertions to ensure that access is either in current context, or while the
> vcpu is paused.
> 
> For now it is safe to require that remote accesses are under the domctl lock.
> This will remain safe if/when the global domctl lock becomes per-domain.
> 
> Note these expectations beside the fields in arch_vmx_struct, and reorder
> the
> fields to avoid unnecessary padding.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> To preempt any questions about spinlocks, the use of the MSR lists in the
> return-to-guest path causes checklock failures for plain spinlocks (despite it
> technically being safe to live here), and the call to alloc_xenheap_page()
> makes it impossible to use irqsave/restore variants, due to the nested
> acquisition of the heap lock.

I don't understand the above words.  How do they relate to the patch here?

> ---
>  xen/arch/x86/cpu/vpmu_intel.c      | 14 ++++++-------
>  xen/arch/x86/hvm/vmx/vmcs.c        | 40
> ++++++++++++++++++++++++++++----------
>  xen/arch/x86/hvm/vmx/vmx.c         | 24 ++++++++++++-----------
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 34 ++++++++++++++++++++-------
> -----
>  4 files changed, 72 insertions(+), 40 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/vpmu_intel.c
> b/xen/arch/x86/cpu/vpmu_intel.c
> index 207e2e7..c499e69 100644
> --- a/xen/arch/x86/cpu/vpmu_intel.c
> +++ b/xen/arch/x86/cpu/vpmu_intel.c
> @@ -455,12 +455,12 @@ static int core2_vpmu_alloc_resource(struct vcpu
> *v)
>      if ( is_hvm_vcpu(v) )
>      {
>          wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> -        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +        if ( vmx_add_host_load_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
>              goto out_err;
> 
> -        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +        if ( vmx_add_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL) )
>              goto out_err;
> -        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +        vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL, 0);
>      }
> 
>      core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
> @@ -613,7 +613,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr,
> uint64_t msr_content,
>              return -EINVAL;
> 
>          if ( is_hvm_vcpu(v) )
> -            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +            vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
>                                 &core2_vpmu_cxt->global_ctrl);
>          else
>              rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt-
> >global_ctrl);
> @@ -682,7 +682,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr,
> uint64_t msr_content,
>                  return -EINVAL;
> 
>              if ( is_hvm_vcpu(v) )
> -                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
>                                     &core2_vpmu_cxt->global_ctrl);
>              else
>                  rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt-
> >global_ctrl);
> @@ -701,7 +701,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr,
> uint64_t msr_content,
>      else
>      {
>          if ( is_hvm_vcpu(v) )
> -            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> msr_content);
> +            vmx_write_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
> msr_content);
>          else
>              wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
>      }
> @@ -735,7 +735,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr,
> uint64_t *msr_content)
>              break;
>          case MSR_CORE_PERF_GLOBAL_CTRL:
>              if ( is_hvm_vcpu(v) )
> -                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> msr_content);
> +                vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
> msr_content);
>              else
>                  rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
>              break;
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c
> b/xen/arch/x86/hvm/vmx/vmcs.c
> index e4acdc1..8bf54c4 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1301,13 +1301,15 @@ static struct vmx_msr_entry
> *locate_msr_entry(
>      return start;
>  }
> 
> -struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum
> vmx_msr_list_type type)
> +struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
> +                                   enum vmx_msr_list_type type)
>  {
> -    struct vcpu *curr = current;
> -    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>      struct vmx_msr_entry *start = NULL, *ent, *end;
>      unsigned int total;
> 
> +    ASSERT(v == current || !vcpu_runnable(v));
> +
>      switch ( type )
>      {
>      case VMX_MSR_HOST:
> @@ -1333,12 +1335,14 @@ struct vmx_msr_entry
> *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>      return ((ent < end) && (ent->index == msr)) ? ent : NULL;
>  }
> 
> -int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
> +int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type
> type)
>  {
> -    struct vcpu *curr = current;
> -    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>      struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
>      unsigned int total;
> +    int rc;
> +
> +    ASSERT(v == current || !vcpu_runnable(v));
> 
>      switch ( type )
>      {
> @@ -1357,13 +1361,18 @@ int vmx_add_msr(uint32_t msr, enum
> vmx_msr_list_type type)
>          return -EINVAL;
>      }
> 
> +    vmx_vmcs_enter(v);
> +

Why enter the VMCS so early, even before the possible page allocation?

>      /* Allocate memory on first use. */
>      if ( unlikely(!*ptr) )
>      {
>          paddr_t addr;
> 
>          if ( (*ptr = alloc_xenheap_page()) == NULL )
> -            return -ENOMEM;
> +        {
> +            rc = -ENOMEM;
> +            goto out;
> +        }
> 
>          addr = virt_to_maddr(*ptr);
> 
> @@ -1385,10 +1394,16 @@ int vmx_add_msr(uint32_t msr, enum
> vmx_msr_list_type type)
>      ent   = locate_msr_entry(start, end, msr);
> 
>      if ( (ent < end) && (ent->index == msr) )
> -        return 0;
> +    {
> +        rc = 0;
> +        goto out;
> +    }
> 
>      if ( total == (PAGE_SIZE / sizeof(*ent)) )
> -        return -ENOSPC;
> +    {
> +        rc = -ENOSPC;
> +        goto out;
> +    }
> 
>      memmove(ent + 1, ent, sizeof(*ent) * (end - ent));
> 
> @@ -1409,7 +1424,12 @@ int vmx_add_msr(uint32_t msr, enum
> vmx_msr_list_type type)
>          break;
>      }
> 
> -    return 0;
> +    rc = 0;
> +
> + out:
> +    vmx_vmcs_exit(v);
> +
> +    return rc;
>  }
> 
>  void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 123dccb..3950b12 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2818,7 +2818,7 @@ static int is_last_branch_msr(u32 ecx)
> 
>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t
> *msr_content)
>  {
> -    const struct vcpu *curr = current;
> +    struct vcpu *curr = current;
> 
>      HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
> 
> @@ -2897,7 +2897,7 @@ static int vmx_msr_read_intercept(unsigned int
> msr, uint64_t *msr_content)
>          if ( passive_domain_do_rdmsr(msr, msr_content) )
>              goto done;
> 
> -        if ( vmx_read_guest_msr(msr, msr_content) == 0 )
> +        if ( vmx_read_guest_msr(curr, msr, msr_content) == 0 )
>              break;
> 
>          if ( is_last_branch_msr(msr) )
> @@ -3109,7 +3109,7 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content)
> 
>              for ( ; (rc == 0) && lbr->count; lbr++ )
>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
> -                    if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
> +                    if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
>                      {
>                          vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
>                          if ( lbr_tsx_fixup_needed )
> @@ -3121,7 +3121,7 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content)
>          }
> 
>          if ( (rc < 0) ||
> -             (msr_content && (vmx_add_host_load_msr(msr) < 0)) )
> +             (msr_content && (vmx_add_host_load_msr(v, msr) < 0)) )
>              hvm_inject_hw_exception(TRAP_machine_check,
> X86_EVENT_NO_EC);
>          else
>              __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
> @@ -3150,7 +3150,7 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content)
>          if ( wrmsr_viridian_regs(msr, msr_content) )
>              break;
> 
> -        if ( vmx_write_guest_msr(msr, msr_content) == 0 ||
> +        if ( vmx_write_guest_msr(v, msr, msr_content) == 0 ||
>               is_last_branch_msr(msr) )
>              break;
> 
> @@ -4165,7 +4165,7 @@ static void lbr_tsx_fixup(void)
>      struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
>      struct vmx_msr_entry *msr;
> 
> -    if ( (msr = vmx_find_msr(lbr_from_start, VMX_MSR_GUEST)) != NULL )
> +    if ( (msr = vmx_find_msr(curr, lbr_from_start, VMX_MSR_GUEST)) !=
> NULL )
>      {
>          /*
>           * Sign extend into bits 61:62 while preserving bit 63
> @@ -4175,15 +4175,15 @@ static void lbr_tsx_fixup(void)
>              msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
>      }
> 
> -    if ( (msr = vmx_find_msr(lbr_lastint_from, VMX_MSR_GUEST)) != NULL )
> +    if ( (msr = vmx_find_msr(curr, lbr_lastint_from, VMX_MSR_GUEST)) !=
> NULL )
>          msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
>  }
> 
> -static void sign_extend_msr(u32 msr, int type)
> +static void sign_extend_msr(struct vcpu *v, u32 msr, int type)
>  {
>      struct vmx_msr_entry *entry;
> 
> -    if ( (entry = vmx_find_msr(msr, type)) != NULL )
> +    if ( (entry = vmx_find_msr(v, msr, type)) != NULL )
>      {
>          if ( entry->data & VADDR_TOP_BIT )
>              entry->data |= CANONICAL_MASK;
> @@ -4194,6 +4194,8 @@ static void sign_extend_msr(u32 msr, int type)
> 
>  static void bdw_erratum_bdf14_fixup(void)
>  {
> +    struct vcpu *curr = current;
> +
>      /*
>       * Occasionally, on certain Broadwell CPUs MSR_IA32_LASTINTTOIP has
>       * been observed to have the top three bits corrupted as though the
> @@ -4203,8 +4205,8 @@ static void bdw_erratum_bdf14_fixup(void)
>       * erratum BDF14. Fix up MSR_IA32_LASTINT{FROM,TO}IP by
>       * sign-extending into bits 48:63.
>       */
> -    sign_extend_msr(MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
> -    sign_extend_msr(MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
> +    sign_extend_msr(curr, MSR_IA32_LASTINTFROMIP, VMX_MSR_GUEST);
> +    sign_extend_msr(curr, MSR_IA32_LASTINTTOIP, VMX_MSR_GUEST);
>  }
> 
>  static void lbr_fixup(void)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-
> x86/hvm/vmx/vmcs.h
> index c8a1f89..f66f121 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -130,10 +130,17 @@ struct arch_vmx_struct {
>      uint64_t             sfmask;
> 
>      struct vmx_msr_bitmap *msr_bitmap;
> -    unsigned int         msr_count;
> +
> +    /*
> +     * Most accesses to the MSR host/guest load/save lists are in current
> +     * context.  However, the data can be modified by toolstack/migration
> +     * actions.  Remote access is only permitted for paused vcpus, and is
> +     * protected under the domctl lock.
> +     */
>      struct vmx_msr_entry *msr_area;
> -    unsigned int         host_msr_count;
>      struct vmx_msr_entry *host_msr_area;
> +    unsigned int         msr_count;
> +    unsigned int         host_msr_count;
> 
>      unsigned long        eoi_exitmap_changed;
>      DECLARE_BITMAP(eoi_exit_bitmap, NR_VECTORS);
> @@ -537,25 +544,27 @@ enum vmx_msr_list_type {
>      VMX_MSR_GUEST,
>  };
> 
> -int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
> +int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type
> type);
> 
> -static inline int vmx_add_host_load_msr(uint32_t msr)
> +static inline int vmx_add_guest_msr(struct vcpu *v, uint32_t msr)
>  {
> -    return vmx_add_msr(msr, VMX_MSR_HOST);
> +    return vmx_add_msr(v, msr, VMX_MSR_GUEST);
>  }
> 
> -static inline int vmx_add_guest_msr(uint32_t msr)
> +static inline int vmx_add_host_load_msr(struct vcpu *v, uint32_t msr)
>  {
> -    return vmx_add_msr(msr, VMX_MSR_GUEST);
> +    return vmx_add_msr(v, msr, VMX_MSR_HOST);
>  }
> 
> -struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum
> vmx_msr_list_type type);
> +struct vmx_msr_entry *vmx_find_msr(struct vcpu *v, uint32_t msr,
> +                                   enum vmx_msr_list_type type);
> 
> -static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
> +static inline int vmx_read_guest_msr(struct vcpu *v, uint32_t msr,
> +                                     uint64_t *val)
>  {
>      struct vmx_msr_entry *ent;
> 
> -    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
> +    if ( (ent = vmx_find_msr(v, msr, VMX_MSR_GUEST)) )
>      {
>          *val = ent->data;
>          return 0;
> @@ -564,11 +573,12 @@ static inline int vmx_read_guest_msr(uint32_t
> msr, uint64_t *val)
>      return -ESRCH;
>  }
> 
> -static inline int vmx_write_guest_msr(uint32_t msr, uint64_t val)
> +static inline int vmx_write_guest_msr(struct vcpu *v, uint32_t msr,
> +                                      uint64_t val)
>  {
>      struct vmx_msr_entry *ent;
> 
> -    if ( (ent = vmx_find_msr(msr, VMX_MSR_GUEST)) )
> +    if ( (ent = vmx_find_msr(v, msr, VMX_MSR_GUEST)) )
>      {
>          ent->data = val;
>          return 0;
> --
> 2.1.4


* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit Andrew Cooper
                     ` (2 preceding siblings ...)
  2018-05-24 15:08   ` Jan Beulich
@ 2018-05-27  3:56   ` Tian, Kevin
  2018-05-28 15:30     ` Andrew Cooper
  3 siblings, 1 reply; 51+ messages in thread
From: Tian, Kevin @ 2018-05-27  3:56 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, May 22, 2018 7:21 PM
> 
> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL,
> Xen
> updates a host MSR load list entry with the current hardware value of
> MSR_DEBUGCTL.  This is wrong.
> 
> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only
> case
> where different behaviour is needed is if Xen is debugging itself, and this
> needs setting up unconditionally for the lifetime of the VM.
> 
> The `ler` command line boolean is the only way to configure any use of
> MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting
> requires
> more complicated synchronisation across all the running VMs.
> 
> In the exceedingly common case, this avoids the unnecessary overhead of
> having
> a host load entry performing the same zeroing operation that hardware
> has
> already performed as part of the VMExit.

I didn't get "unnecessary overhead" part. if "ler' is disabled, as you
said earlier it's a bug to save/restore thus overhead doesn't matter.
If "ler" is enabled, then save/restore is anyway required then where
is saved overhead coming from?

Thanks
Kevin

* Re: [PATCH 4/9] x86/vmx: Support remote access to the MSR lists
  2018-05-27  3:47   ` Tian, Kevin
@ 2018-05-28 15:15     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-28 15:15 UTC (permalink / raw)
  To: Tian, Kevin, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

On 27/05/18 04:47, Tian, Kevin wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Tuesday, May 22, 2018 7:21 PM
>>
>> At the moment, all modifications of the MSR lists are in current context.
>> However, future changes may need to put MSR_EFER into the lists from
>> domctl
>> hypercall context.
>>
>> Plumb a struct vcpu parameter down through the infrastructure, and use
>> vmx_vmcs_{enter,exit}() for safe access to the VMCS in vmx_add_msr().
>> Use
>> assertions to ensure that access is either in current context, or while the
>> vcpu is paused.
>>
>> For now it is safe to require that remote accesses are under the domctl lock.
>> This will remain safe if/when the global domctl lock becomes per-domain.
>>
>> Note these expectations beside the fields in arch_vmx_struct, and reorder
>> the
>> fields to avoid unnecessary padding.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> To preempt any questions about spinlocks, the use of the MSR lists in the
>> return-to-guest path causes checklock failures for plain spinlocks (despite it
>> technically being safe to live here), and the call to alloc_xenheap_page()
>> makes it impossible to use irqsave/restore variants, due to the nested
>> acquisition of the heap lock.
> I don't understand the above words.  How do they relate to the patch here?

It explains why I haven't/can't introduce a spinlock to protect access,
in case someone reviewing the code asks "why not introduce a spinlock".
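
To spell the constraint out, the shape that would be needed is roughly
the following (illustration only - msr_lock is hypothetical, and this is
exactly the pattern which the lock checking rejects):

    spin_lock_irqsave(&arch_vmx->msr_lock, flags);   /* hypothetical lock */

    /* Allocate memory on first use. */
    if ( unlikely(!*ptr) )
        *ptr = alloc_xenheap_page();   /* nests the heap lock with IRQs off */

    spin_unlock_irqrestore(&arch_vmx->msr_lock, flags);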

>> @@ -1333,12 +1335,14 @@ struct vmx_msr_entry
>> *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>>      return ((ent < end) && (ent->index == msr)) ? ent : NULL;
>>  }
>>
>> -int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
>> +int vmx_add_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type
>> type)
>>  {
>> -    struct vcpu *curr = current;
>> -    struct arch_vmx_struct *arch_vmx = &curr->arch.hvm_vmx;
>> +    struct arch_vmx_struct *arch_vmx = &v->arch.hvm_vmx;
>>      struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;
>>      unsigned int total;
>> +    int rc;
>> +
>> +    ASSERT(v == current || !vcpu_runnable(v));
>>
>>      switch ( type )
>>      {
>> @@ -1357,13 +1361,18 @@ int vmx_add_msr(uint32_t msr, enum
>> vmx_msr_list_type type)
>>          return -EINVAL;
>>      }
>>
>> +    vmx_vmcs_enter(v);
>> +
> Why enter the VMCS so early, even before the possible page allocation?

Because the next thing the allocation path does is write to the MSR
load/save list fields.

The alternative would be to have an else on this if(), and a second
vmcs_enter() after the memory allocation, but as these are two one-time
allocations in uncontended paths, I didn't consider the added complexity
worth it.
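
i.e. something of this shape (sketch only, not what the patch does):

    if ( unlikely(!*ptr) )
    {
        if ( (*ptr = alloc_xenheap_page()) == NULL )
            return -ENOMEM;

        vmx_vmcs_enter(v);
        /* ... __vmwrite() virt_to_maddr(*ptr) into the list address field ... */
    }
    else
        vmx_vmcs_enter(v);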

~Andrew


* Re: [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit
  2018-05-27  3:56   ` Tian, Kevin
@ 2018-05-28 15:30     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-28 15:30 UTC (permalink / raw)
  To: Tian, Kevin, Xen-devel
  Cc: Wei Liu, Nakajima, Jun, Jan Beulich, Roger Pau Monné

On 27/05/18 04:56, Tian, Kevin wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Tuesday, May 22, 2018 7:21 PM
>>
>> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL,
>> Xen
>> updates a host MSR load list entry with the current hardware value of
>> MSR_DEBUGCTL.  This is wrong.
>>
>> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only
>> case
>> where different behaviour is needed is if Xen is debugging itself, and this
>> needs setting up unconditionally for the lifetime of the VM.
>>
>> The `ler` command line boolean is the only way to configure any use of
>> MSR_DEBUGCTL for Xen, so tie the host load list entry to this setting in
>> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting
>> requires
>> more complicated synchronisation across all the running VMs.
>>
>> In the exceedingly common case, this avoids the unnecessary overhead of
>> having
>> a host load entry performing the same zeroing operation that hardware
>> has
>> already performed as part of the VMExit.
> I didn't get "unnecessary overhead" part. if "ler' is disabled, as you
> said earlier it's a bug to save/restore thus overhead doesn't matter.

The current behaviour (bug or otherwise) means that when ler is
disabled, we end up with a host load list entry zeroing MSR_DEBUGCTL as
soon as the guest first writes to the MSR.

This causes unnecessary overhead because host load/save lists are slow.
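
The fix is to tie the host load list entry to Xen's own use of
MSR_DEBUGCTL when the VMCS is constructed - roughly along these lines,
assuming the `ler` command line boolean is reachable here as opt_ler:

    /* construct_vmcs(): Xen only cares about MSR_DEBUGCTL if `ler` is active. */
    if ( opt_ler )
        rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR);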

Irrespective, I've reworked this patch differently, to fully disentangle
it from the EFER series.  See patch 1 of my "x86/vmx: Misc fixes and
improvements" series.

~Andrew


* Re: [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_msr_add()
  2018-05-24 15:12   ` Jan Beulich
@ 2018-05-30 18:09     ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2018-05-30 18:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Xen-devel, Wei Liu, Jun Nakajima, Roger Pau Monne

On 24/05/18 16:12, Jan Beulich wrote:
>>>> On 22.05.18 at 13:20, <andrew.cooper3@citrix.com> wrote:
>> The main purpose of this change is to allow us to set a specific MSR value,
>> without needing to know whether there is already a load/save list slot for 
>> it.
>> Previously, callers wanting this property needed to call both vmx_add_*_msr()
>> and vmx_write_*_msr() to cover both cases.
>>
>> As a result of this API improvement, the default value for guest MSRs need not
>> be 0, and the default for host MSRs need not be passed via hardware register.
>> In practice, this cleans up the VPMU allocation logic, and avoids an MSR read
>> as part of vcpu construction.
> But this also means: If there already is such an entry, the previous value will
> now be blindly overwritten. Are you sure this is always what is wanted?

You are correct.  This will end up resetting the LBR MSRs whenever the
guest writes to MSR_DEBUGCTL.

Overall, this behaviour is the more useful one for callers, so I'll adjust
the MSR_DEBUGCTL case to only ever try inserting them once.
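
e.g. by guarding the insertion loop with a one-shot flag, along the
lines of the sketch below (lbr_msrs_inserted is hypothetical, and the
vmx_add_guest_msr() signature is the patch 4 one for brevity):

    if ( !v->arch.hvm_vmx.lbr_msrs_inserted )        /* hypothetical flag */
    {
        for ( ; (rc == 0) && lbr->count; lbr++ )
            for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
                if ( (rc = vmx_add_guest_msr(v, lbr->base + i)) == 0 )
                    vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);

        if ( rc == 0 )
            v->arch.hvm_vmx.lbr_msrs_inserted = true;
    }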

~Andrew


end of thread

Thread overview: 51+ messages
2018-05-22 11:20 [PATCH 0/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
2018-05-22 11:20 ` [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure Andrew Cooper
2018-05-23 16:01   ` Roger Pau Monné
2018-05-23 17:02     ` Andrew Cooper
2018-05-27  3:26   ` Tian, Kevin
2018-05-22 11:20 ` [PATCH 2/9] x86/vmx: Internal cleanup " Andrew Cooper
2018-05-23 16:28   ` Roger Pau Monné
2018-05-23 16:54     ` Andrew Cooper
2018-05-24 14:45   ` Jan Beulich
2018-05-27  3:30   ` Tian, Kevin
2018-05-22 11:20 ` [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr() Andrew Cooper
2018-05-23 16:39   ` Roger Pau Monné
2018-05-23 16:55     ` Andrew Cooper
2018-05-24 10:53       ` Roger Pau Monné
2018-05-24 10:59         ` Andrew Cooper
2018-05-24 12:16           ` Roger Pau Monné
2018-05-27  3:38   ` Tian, Kevin
2018-05-22 11:20 ` [PATCH 4/9] x86/vmx: Support remote access to the MSR lists Andrew Cooper
2018-05-24 11:50   ` Roger Pau Monné
2018-05-24 12:03     ` Andrew Cooper
2018-05-24 14:53   ` Jan Beulich
2018-05-27  3:47   ` Tian, Kevin
2018-05-28 15:15     ` Andrew Cooper
2018-05-22 11:20 ` [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit Andrew Cooper
2018-05-22 12:53   ` Andrew Cooper
2018-05-24 12:14   ` Roger Pau Monné
2018-05-24 12:39     ` Andrew Cooper
2018-05-24 13:53       ` Jan Beulich
2018-05-24 15:08   ` Jan Beulich
2018-05-24 15:51     ` Andrew Cooper
2018-05-27  3:56   ` Tian, Kevin
2018-05-28 15:30     ` Andrew Cooper
2018-05-22 11:20 ` [PATCH 6/9] x86/vmx: Pass an MSR value into vmx_msr_add() Andrew Cooper
2018-05-24 15:12   ` Jan Beulich
2018-05-30 18:09     ` Andrew Cooper
2018-05-22 11:20 ` [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries Andrew Cooper
2018-05-24 15:19   ` Jan Beulich
2018-05-24 15:37   ` Roger Pau Monné
2018-05-22 11:20 ` [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists Andrew Cooper
2018-05-24 15:42   ` Roger Pau Monné
2018-05-24 15:45     ` Andrew Cooper
2018-05-22 11:20 ` [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context Andrew Cooper
2018-05-24 16:01   ` Roger Pau Monné
2018-05-24 16:48     ` Andrew Cooper
2018-05-25  7:27       ` Jan Beulich
2018-05-25  8:03         ` Andrew Cooper
2018-05-25  6:23   ` Tim Deegan
2018-05-25  7:49   ` Jan Beulich
2018-05-25  8:36     ` Andrew Cooper
2018-05-25 11:36       ` Jan Beulich
2018-05-25 11:48         ` Andrew Cooper
